<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>6.&nbsp;Common needs for all configurable modules</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix developer documentation"><link rel="up" href="index.html" title="Heritrix developer documentation"><link rel="prev" href="ar01s05.html" title="5.&nbsp;Settings"><link rel="next" href="ar01s07.html" title="7.&nbsp;Some notes on the URI classes"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">6.&nbsp;Common needs for all configurable modules</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ar01s05.html">Prev</a>&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="ar01s07.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="chap_modules_common"></a>6.&nbsp;Common needs for all configurable modules</h2></div></div></div><p>As mentioned earlier, all configurable modules in Heritrix subclass ComplexType (or one of its descendants). When you write your own module, you should inherit from <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ModuleType.html" target="_top">ModuleType</a>, which is a subclass of ComplexType intended to be subclassed by all modules in Heritrix.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N10279"></a>6.1.&nbsp;Definition of a module</h3></div></div></div><p>Heritrix knows how to handle a ComplexType and how to obtain the information needed to render its part of the user interface. 
To make this happen, your module has to obey some rules.</p><div class="orderedlist"><ol type="1"><li><p>A module should always implement a constructor taking exactly one argument: the name (<a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ModuleType.html#ModuleType(java.lang.String)" target="_top">see ModuleType(String name)</a>).</p></li><li><p>All attributes you want to be configurable should be defined in the constructor of the module.</p></li></ol></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N1028B"></a>6.1.1.&nbsp;The obligatory one-argument constructor</h4></div></div></div><p>All modules need to have a constructor taking a String argument. This string is used to identify the module. When a module replaces an existing module of which there can be only one, it is important that the same name is used. In this case the constructor might choose to ignore the name string and substitute a hard-coded one. This is, for example, the case with the Frontier. The name of the Frontier should always be the string "frontier". For this reason, the Frontier interface that all Frontiers should implement has a static variable: <pre class="programlisting">public static final String ATTR_NAME = "frontier";</pre> which implementations of the Frontier use instead of the string argument submitted to the constructor. Here is the part of the default Frontier's constructor that shows how this should be done. <pre class="programlisting">public Frontier(String name) {
    // The 'name' of all frontiers should be the same (Frontier.ATTR_NAME),
    // therefore we'll ignore the supplied parameter.
    super(Frontier.ATTR_NAME, "HostQueuesFrontier. Maintains the internal" +
        " state of the crawl. It dictates the order in which URIs" +
        " will be scheduled. \nThis frontier is mostly a breadth-first" +
        " frontier, which refrains from emitting more than one" +
        " CrawlURI of the same \'key\' (host) at once, and respects" +
        " minimum-delay and delay-factor specifications for" +
        " politeness.");</pre> As shown in this example, the constructor must call the superclass's constructor. This example also shows how to set the description of a module. The description is used by the user interface to guide the user in configuring the crawl. If you don't want to set a description (strongly discouraged), the ModuleType also has a one-argument constructor taking just the name.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="N10298"></a>6.1.2.&nbsp;Defining attributes</h4></div></div></div><p>The attributes of a module that you want to be configurable must be defined in the module's constructor. For this purpose, ComplexType has the method <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ComplexType.html#addElementToDefinition(org.archive.crawler.settings.Type)" target="_top">addElementToDefinition(Type type)</a>. The argument given to this method is a definition of the attribute. The <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/Type.html" target="_top">Type</a> class is the superclass of all the attribute definitions allowed for a ModuleType. Since ComplexType, which ModuleType inherits from, is itself a subclass of Type, you can add new ModuleTypes as attributes to your module. The Type class implements configuration methods common to all Types that define an attribute on your module. The addElementToDefinition method returns the added Type so that it is easy to refine the configuration of the Type. 
Let's look at an example (also from the default Frontier) of an attribute definition.<pre class="programlisting">public final static String ATTR_MAX_OVERALL_BANDWIDTH_USAGE =
        "total-bandwidth-usage-KB-sec";
private final static Integer DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE =
        new Integer(0);
...
Type t;
t = addElementToDefinition(
    new SimpleType(ATTR_MAX_OVERALL_BANDWIDTH_USAGE,
    "The maximum average bandwidth the crawler is allowed to use. " +
    "The actual readspeed is not affected by this setting, it only " +
    "holds back new URIs from being processed when the bandwidth " +
    "usage has been to high.\n0 means no bandwidth limitation.",
    DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE));
t.setOverrideable(false);</pre> Here we add an attribute definition of type SimpleType (a subclass of Type). The SimpleType's constructor takes three arguments: name, description and a default value. Usually the name and default value are defined as constants, as they are here, but this is of course optional. The line <span><strong class="command">t.setOverrideable(false);</strong></span> tells the settings framework not to allow overrides of this attribute. For a full list of methods for configuring a Type, see the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/Type.html" target="_top">Type</a> class.</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N102B2"></a>6.2.&nbsp;Accessing attributes</h3></div></div></div><p>In most cases, a CrawlURI is available when a module needs to access its own attributes. 
The right way to make sure that all overrides and refinements are considered is then to use the method <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String,%20org.archive.crawler.datamodel.CrawlURI)" target="_top">getAttribute(String name, CrawlURI uri)</a> to get the attribute. Sometimes the context you are working in is defined by an object other than a CrawlURI; in that case, use the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.Object,%20java.lang.String)" target="_top">getAttribute(Object context, String name)</a> method to get the value. This method does its best to extract useful context information from the object. It checks whether the context is any kind of URI or a settings
