
📄 sitespider.java

📁 A practical search engine
💻 JAVA
package ir.webutils;

import java.util.*;
import java.net.*;
import java.io.*;

/**
 * A spider that limits itself to a given site.
 *
 * @author Ray Mooney
 */
public class SiteSpider extends Spider {

    /**
     * Gets links from the given page that are on the same host as the
     * page.
     *
     * @return A list of links on <code>page</code> that have the same
     * host as <code>url</code>.
     */
    public List getNewLinks(HTMLPage page) {
        List links = page.getOutLinks();
        URL url = page.getLink().getURL();
        ListIterator iterator = links.listIterator();
        while (iterator.hasNext()) {
            Link link = (Link) iterator.next();
            // Drop any link whose host differs from the host of the page itself.
            if (!url.getHost().equals(link.getURL().getHost()))
                iterator.remove();
        }
        return links;
    }

    /**
     * Spider the web according to the following command options,
     * but stay within the given site (same URL host).
     * <ul>
     * <li>-safe : Check for and obey robots.txt and robots META tag
     * directives.</li>
     * <li>-d &lt;directory&gt; : Store indexed files in &lt;directory&gt;.</li>
     * <li>-c &lt;count&gt; : Store at most &lt;count&gt; files.</li>
     * <li>-u &lt;url&gt; : Start at &lt;url&gt;.</li>
     * <li>-slow : Pause briefly before getting a page.  This can be
     * useful when debugging.</li>
     * </ul>
     */
    public static void main(String args[]) {
        new SiteSpider().go(args);
    }
}
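The same-host filtering done by getNewLinks can be tried without the rest of the ir.webutils package. The sketch below is only an illustration under stated assumptions: the class SameHostFilter and its methods hostOf and keepSameHost are hypothetical names that do not appear in the original source, links are plain strings rather than Link objects, hosts are parsed with java.net.URI instead of java.net.URL, and hosts are compared case-insensitively, whereas the original uses String.equals.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-alone illustration; not part of ir.webutils. */
public class SameHostFilter {

    /** Returns the host of a URL string, or null if it has none or cannot be parsed. */
    public static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /** Removes, in place, every link whose host differs from the host of pageUrl. */
    public static List<String> keepSameHost(String pageUrl, List<String> links) {
        String pageHost = hostOf(pageUrl);
        Iterator<String> it = links.iterator();
        while (it.hasNext()) {
            String linkHost = hostOf(it.next());
            // Assumption: links with a missing or unparsable host are dropped,
            // and host names are compared without regard to case.
            if (pageHost == null || linkHost == null
                    || !pageHost.equalsIgnoreCase(linkHost)) {
                it.remove();
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // The list must be mutable because the filter removes entries in place.
        List<String> links = new ArrayList<>(Arrays.asList(
                "http://example.com/a.html",
                "http://other.example.org/b.html",
                "http://example.com/dir/c.html"));
        System.out.println(keepSameHost("http://example.com/index.html", links));
    }
}
```

As in the original method, the list is modified through its iterator rather than copied, so the caller's list shrinks. Removing through the iterator is what keeps the loop from throwing a ConcurrentModificationException.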
