<html><head><title>Arachnid Web Spider Framework</title></head><body>
<h1>Arachnid Web Spider Framework</h1>
<h2>Description</h2>
Arachnid is a Java-based web spider framework.  It includes a simple HTML parser object that parses an input stream containing HTML content.  Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code that are called after each page of a Web site is parsed.  Two example spider applications are included to illustrate how to use the framework.
<h2><font color="red">Warning</font></h2>
<b>WARNING</b>: A Web spider may put a large load on a server and a network.  You may want this by design, for instance when load testing YOUR server using YOUR hosts and YOUR network.  <b>DO NOT</b> use this software to place an excessive load on someone else's hosts and network resources without explicit permission!
<h2>Author</h2>
This software was written by <a href="http://sourceforge.net/users/turingtest/">Robert Platt</a>.
<h2>Use</h2>
<ul>
<li>Build an Arachnid.jar file using build.xml and Ant.  You can also build the documentation using the 'docs' target.</li>
<li>Add the jar file to your CLASSPATH.</li>
<li>Arachnid is an abstract base class that uses the "visitor" pattern.  It has a traverse() method that walks through a Web site.  For each (valid) page in the site it calls the abstract method handleLink().  You need to derive a sub-class from Arachnid and define a handleLink() method, which will be called for every valid page in the Web site.  A PageInfo object, which contains useful information about the Web page, is passed to handleLink().
Four other methods must be defined:
  <ul>
  <li>handleBadLink() - for processing an invalid URL</li>
  <li>handleNonHTMLlink() - for processing links to non-HTML resources</li>
  <li>handleExternalLink() - for processing links that point outside the Web site</li>
  <li>handleBadIO() - for handling an I/O problem that occurs while processing a Web page</li>
  </ul>
Instantiate your sub-class and call traverse().</li>
<li>Compile your application and run it.</li>
</ul>
<h2>Example</h2>
<p>The following code uses Arachnid to generate a (very simplistic) sitemap for a Web site.</p>
<pre><code>import java.io.*;
import java.net.*;
import java.util.*;
import bplatt.spider.*;

public class SimpleSiteMapGen {
  private final String site;
  // HTML written before and after the list of links
  private static final String header = "&lt;html&gt;&lt;head&gt;&lt;title&gt;Site Map&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;ul&gt;";
  private static final String trailer = "&lt;/ul&gt;&lt;/body&gt;&lt;/html&gt;";

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("Usage: java SimpleSiteMapGen &lt;url&gt;");
      System.exit(1);
    }
    SimpleSiteMapGen s = new SimpleSiteMapGen(args[0]);
    s.generate();
  }

  public SimpleSiteMapGen(String site) { this.site = site; }

  public void generate() {
    MySpider spider;
    try {
      spider = new MySpider(site);
    } catch (MalformedURLException e) {
      System.err.println(e);
      System.err.println("Invalid URL: " + site);
      return;
    }
    System.out.println(header);
    spider.traverse();   // calls MySpider.handleLink() for each valid page
    System.out.println(trailer);
  }
}

class MySpider extends Arachnid {
  public MySpider(String base) throws MalformedURLException { super(base); }

  // Print one list item for each page that has both a URL and a non-empty title
  protected void handleLink(PageInfo p) {
    if (p.getUrl() == null) return;   // check before calling toString() to avoid a NullPointerException
    String link = p.getUrl().toString();
    String title = p.getTitle();
    if (title == null || link.length() == 0 || title.length() == 0) return;
    System.out.println("&lt;li&gt;&lt;a href=\"" + link + "\"&gt;" + title + "&lt;/a&gt;&lt;/li&gt;");
  }

  // The remaining callbacks must be defined, but this example ignores them
  protected void handleBadLink(URL url, URL parent, PageInfo p) { }
  protected void handleBadIO(URL url, URL parent) { }
  protected void handleNonHTMLlink(URL url, URL parent, PageInfo p) { }
  protected void handleExternalLink(URL url, URL parent) { }
}</code></pre>
<h2>Availability</h2>
The Arachnid Web Spider framework is available via <a href="http://sourceforge.net">SourceForge</a>.  Follow this <a href="http://sourceforge.net/projects/arachnid/">link</a> to obtain the source code.  If you don't already have a Java Virtual Machine, you can obtain one from <a href="http://java.sun.com/j2se/downloads.html">Sun Microsystems</a>.
<h2>License</h2>
The Arachnid Web Spider framework is licensed under the GNU General Public License.  See GPL.txt for details.  If you are unable or unwilling to abide by the terms of this license, please remove this code from your machine.
<h2>Support</h2>
The Arachnid Web Spider framework is distributed <b>AS IS</b>, with <b>NO SUPPORT</b>.
<p><a href="http://sourceforge.net"> <img src="http://sourceforge.net/sflogo.php?group_id=59326&amp;type=5" width="210" height="62" border="0" alt="SourceForge Logo"></a></p>
</body></html>