
📄 chap_modules_common.html

      object. If it can't find anything useful, the global settings are used
      as the context. If you don't have any context at all, which is the case
      in some initialization code, <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String)" target="_top">getAttribute(String
      name)</a> can be used instead.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N102C3"></a>6.3.&nbsp;Putting together a simple module</h3></div></div></div><p>From what we have learned so far, let's put together a module that
      doesn't do anything useful but shows some of the concepts.</p><pre class="programlisting">package myModule;

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.management.Attribute;
import javax.management.AttributeNotFoundException;

import org.archive.crawler.datamodel.CrawlURI;
import org.archive.crawler.settings.MapType;
import org.archive.crawler.settings.ModuleType;
import org.archive.crawler.settings.RegularExpressionConstraint;
import org.archive.crawler.settings.SimpleType;
import org.archive.crawler.settings.Type;

public class Foo extends ModuleType {
  private static Logger logger = Logger.getLogger("myModule.Foo"); <a name="simpleEx_logger" href="chap_modules_common.html#simpleEx_txt_logger"><img border="0" alt="1" src="images/callouts/1.png"></a>

  public Foo(String name) {
    // Pass the module name on to the ModuleType superclass.
    super(name);

    Type mySimpleType1 = new SimpleType(
        "name1", "Description1", new Integer(10)); <a name="simpleEx_addSimpleType" href="chap_modules_common.html#simpleEx_txt_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a>
    addElementToDefinition(mySimpleType1);

    Type mySimpleType2 = new SimpleType(
        "name2", "Description2", "defaultValue");
    addElementToDefinition(mySimpleType2);
    mySimpleType2.addConstraint(new RegularExpressionConstraint( <a name="simpleEx_addConstraint" href="chap_modules_common.html#simpleEx_txt_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a>
        ".*Val.*", Level.WARNING,
        "This field must contain 'Val' as part of the string."));

    Type myMapType = new MapType("name3", "Description3", String.class); <a name="simpleEx_addMap" href="chap_modules_common.html#simpleEx_txt_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a>
    addElementToDefinition(myMapType);
  }

  public void getMyTypeValue(CrawlURI curi) {
    try {
      int maxBandwidthKB = ((Integer) getAttribute("name1", curi)).intValue(); <a name="simpleEx_getAttribute" href="chap_modules_common.html#simpleEx_txt_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a>
    } catch (AttributeNotFoundException e) {
      logger.warning(e.getMessage());
    }
  }

  public void playWithMap(CrawlURI curi) {
    try {
      MapType myMapType = (MapType) getAttribute("name3", curi);
      myMapType.addElement(
          null, new SimpleType("name", "Description", "defaultValue")); <a name="simpleEx_addElement" href="chap_modules_common.html#simpleEx_txt_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a>
      myMapType.setAttribute(new Attribute("name", "newValue")); <a name="simpleEx_setAttribute" href="chap_modules_common.html#simpleEx_txt_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a>
    } catch (Exception e) {
      logger.warning(e.getMessage());
    }
  }
}</pre><p>This example shows several things:<div class="calloutlist"><table summary="Callout list" border="0"><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_logger"></a><a href="#simpleEx_logger"><img border="0" alt="1" src="images/callouts/1.png"></a> </td><td align="left" valign="top"><p>One thing we have not mentioned before is how general error
            logging is done. Heritrix uses the standard Java logging facility
            (java.util.logging), introduced in Java 1.4.
The convention is to initialize the logger with the class name.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addSimpleType"></a><a href="#simpleEx_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a> </td><td align="left" valign="top"><p>Here we define and add a SimpleType that takes an Integer as its
            value, with '10' as the default.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addConstraint"></a><a href="#simpleEx_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a> </td><td align="left" valign="top"><p>It is possible to add constraints on fields. In addition to
            being constrained to take only strings, this field adds a
            requirement that the string contain 'Val'. The constraint also has
            a level and a description. The user interface shows the
            description to explain why a submitted value does not satisfy the
            constraint. Three levels are honored:</p><div class="variablelist"><dl><dt><span class="term">Level.INFO</span></dt><dd><p>Values are accepted even if they don't fulfill the
                  constraint's requirement. This is used when you don't want
                  to disallow the value, but want to warn the user that the
                  value seems to be out of reasonable bounds.</p></dd><dt><span class="term">Level.WARNING</span></dt><dd><p>The value must be accepted by the constraint to be
                  valid in crawl jobs, but is legal in profiles even if it
                  isn't.
This makes it possible to put values into a
                  profile that a user should change for every crawl job
                  derived from the profile.</p></dd><dt><span class="term">Level.SEVERE</span></dt><dd><p>The value is rejected outright if it isn't
                  accepted by the constraint.</p></dd></dl></div><p>See the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/Constraint.html" target="_top">Constraint</a>
            class for more information.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addMap"></a><a href="#simpleEx_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a> </td><td align="left" valign="top"><p>This line defines a MapType allowing only Strings as
            values.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_getAttribute"></a><a href="#simpleEx_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a> </td><td align="left" valign="top"><p>An example of how to read an attribute.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addElement"></a><a href="#simpleEx_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a> </td><td align="left" valign="top"><p>Here we add a new element to the MapType. This element is
            valid for this map because its default value is a String.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_setAttribute"></a><a href="#simpleEx_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a> </td><td align="left" valign="top"><p>Now we change the value of the newly added attribute.
JMX requires that the new value be wrapped in an object of type
            Attribute, which holds both the name and the new value.</p></td></tr></table></div></p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>To make your module known to Heritrix, you need to list it in
      the appropriate file under <code class="filename">src/conf/modules</code>. For example,
      if your module is a Processor, it needs to be listed in the
      <code class="filename">Processor.options</code> file.
      The options files get built into the Heritrix jar.
      </p></div><p>If everything seems OK so far, then we are almost ready to write
      some real modules.</p></div></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ar01s05.html">Prev</a>&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="ar01s07.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">5.&nbsp;Settings&nbsp;</td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%">&nbsp;7.&nbsp;Some notes on the URI classes</td></tr></table></div></body></html>
