📄 directoryspider.java
字号:
package ir.webutils;

import java.util.*;
import java.net.*;
import java.io.*;

/**
 * Spider that limits itself to the directory it started in.
 *
 * @author Ted Wild and Ray Mooney
 */
public class DirectorySpider extends Spider {

    /** The starting URL given via "-u"; crawling is restricted to its directory subtree. */
    static URL firstURL;

    /**
     * Gets links from the page that are in or below the starting
     * directory. Links to other hosts, or to paths outside the start
     * directory, are removed from the returned list.
     *
     * @param page The page whose out-links are filtered.
     * @return The links on <code>page</code> that are in or below the
     *         directory of the first page.
     */
    public List getNewLinks(HTMLPage page) {
        // Raw List/ListIterator retained to stay compatible with the
        // (pre-generics) Spider superclass signature.
        List links = page.getOutLinks();
        URL url = page.getLink().getURL();
        // Hoisted out of the loop: the start directory never changes
        // while iterating, so compute it once.
        String startDirectory = getDirectory(firstURL);
        ListIterator iterator = links.listIterator();
        while (iterator.hasNext()) {
            Link link = (Link) iterator.next();
            if (!url.getHost().equals(link.getURL().getHost()))
                iterator.remove();                       // different host
            else if (!link.getURL().getPath().startsWith(startDirectory))
                iterator.remove();                       // outside start directory
        }
        return links;
    }

    /**
     * Sets the initial URL from the "-u" argument, then calls the
     * corresponding superclass method.
     *
     * @param value The value of the "-u" command line argument.
     */
    protected void handleUCommandLineOption(String value) {
        try {
            firstURL = new URL(value);
        }
        catch (MalformedURLException e) {
            // Errors belong on stderr, not stdout.
            System.err.println(e.toString());
            System.exit(-1);
        }
        super.handleUCommandLineOption(value);
    }

    /**
     * Returns the directory portion of a URL path. If the final path
     * segment looks like a file name (contains a '.'), it is stripped;
     * otherwise the whole path is treated as a directory.
     *
     * @param u The URL whose path is examined.
     * @return The directory part of <code>u</code>'s path, without a
     *         trailing slash when a file segment was stripped.
     */
    private String getDirectory(URL u) {
        String directory = u.getPath();
        int lastSlash = directory.lastIndexOf('/');
        // Only treat the path as a file if the LAST segment contains a dot.
        // The previous check matched a dot anywhere in the path, which
        // wrongly truncated directories such as "/docs.v2/guide", and the
        // unguarded substring(0, lastIndexOf("/")) could throw when the
        // path contained a dot but no slash.
        if (lastSlash >= 0 && directory.indexOf('.', lastSlash + 1) != -1)
            directory = directory.substring(0, lastSlash);
        return directory;
    }

    /**
     * Spider the web according to the following command options,
     * but only below the start URL directory.
     * <ul>
     * <li>-safe : Check for and obey robots.txt and robots META tag
     * directives.</li>
     * <li>-d &lt;directory&gt; : Store indexed files in &lt;directory&gt;.</li>
     * <li>-c &lt;count&gt; : Store at most &lt;count&gt; files.</li>
     * <li>-u &lt;url&gt; : Start at &lt;url&gt;.</li>
     * <li>-slow : Pause briefly before getting a page. This can be
     * useful when debugging.</li>
     * </ul>
     */
    public static void main(String[] args) {
        new DirectorySpider().go(args);
    }
}
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -