chap_modules_common.html

From "Web crawler open-source code" · HTML code · 175 lines · page 1 of 2

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.management.Attribute;
import javax.management.AttributeNotFoundException;

import org.archive.crawler.datamodel.CrawlURI;
import org.archive.crawler.settings.MapType;
import org.archive.crawler.settings.ModuleType;
import org.archive.crawler.settings.RegularExpressionConstraint;
import org.archive.crawler.settings.SimpleType;
import org.archive.crawler.settings.Type;

public class Foo extends ModuleType {
  private static Logger logger = Logger.getLogger("myModule.Foo"); <a name="simpleEx_logger" href="chap_modules_common.html#simpleEx_txt_logger"><img border="0" alt="1" src="images/callouts/1.png"></a>

  public Foo(String name) {
    // ModuleType needs the module name, so pass it to the superclass.
    super(name);

    // An Integer setting with a default value of 10.
    Type mySimpleType1 = new SimpleType(
                "name1", "Description1", new Integer(10)); <a name="simpleEx_addSimpleType" href="chap_modules_common.html#simpleEx_txt_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a>
    addElementToDefinition(mySimpleType1);

    // A String setting whose value must contain 'Val'.
    Type mySimpleType2 = new SimpleType(
                "name2", "Description2", "defaultValue");
    addElementToDefinition(mySimpleType2);
    mySimpleType2.addConstraint(new RegularExpressionConstraint( <a name="simpleEx_addConstraint" href="chap_modules_common.html#simpleEx_txt_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a>
                ".*Val.*", Level.WARNING,
                "This field must contain 'Val' as part of the string."));

    // A map setting that accepts only String values.
    Type myMapType = new MapType("name3", "Description3", String.class); <a name="simpleEx_addMap" href="chap_modules_common.html#simpleEx_txt_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a>
    addElementToDefinition(myMapType);
  }

  public void getMyTypeValue(CrawlURI curi) {
    try {
      // Read the setting "name1" in the context of the given URI.
      int maxBandwidthKB = ((Integer) getAttribute("name1", curi)).intValue(); <a name="simpleEx_getAttribute" href="chap_modules_common.html#simpleEx_txt_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a>
    } catch (AttributeNotFoundException e) {
      logger.warning(e.getMessage());
    }
  }

  public void playWithMap(CrawlURI curi) {
    try {
      MapType myMapType = (MapType) getAttribute("name3", curi);
      myMapType.addElement(
              null, new SimpleType("name", "Description", "defaultValue")); <a name="simpleEx_addElement" href="chap_modules_common.html#simpleEx_txt_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a>
      myMapType.setAttribute(new Attribute("name", "newValue")); <a name="simpleEx_setAttribute" href="chap_modules_common.html#simpleEx_txt_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a>
    } catch (Exception e) {
      logger.warning(e.getMessage());
    }
  }
}</pre><p>This example shows several things:<div class="calloutlist"><table summary="Callout list" border="0">
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_logger"></a><a href="#simpleEx_logger"><img border="0" alt="1" src="images/callouts/1.png"></a> </td><td align="left" valign="top"><p>One thing that we have not mentioned before is general error logging. Heritrix uses the standard Java logging facility (<code>java.util.logging</code>, introduced in Java 1.4). The convention is to name the logger after the class.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addSimpleType"></a><a href="#simpleEx_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a> </td><td align="left" valign="top"><p>Here we define and add a SimpleType that takes an Integer as its value, with '10' as the default value.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addConstraint"></a><a href="#simpleEx_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a> </td><td align="left" valign="top"><p>It is possible to add constraints on fields. In addition to being constrained to accept only strings, this field adds a requirement that the string contain 'Val'.
The constraint also has a level and a description. The user interface shows the description to explain why a submitted value does not satisfy the constraint. Three levels are honored:</p><div class="variablelist"><dl>
<dt><span class="term">Level.INFO</span></dt><dd><p>Values are accepted even if they don't fulfill the constraint's requirement. Use this when you don't want to disallow the value, but want to warn the user that the value seems to be out of reasonable bounds.</p></dd>
<dt><span class="term">Level.WARNING</span></dt><dd><p>The value must be accepted by the constraint to be valid in crawl jobs, but it is legal in profiles even if the constraint rejects it. This makes it possible to put values into a profile that a user should change for every crawl job derived from the profile.</p></dd>
<dt><span class="term">Level.SEVERE</span></dt><dd><p>The value is never allowed if it isn't accepted by the constraint.</p></dd>
</dl></div><p>See the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/Constraint.html" target="_top">Constraint</a> class for more information.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addMap"></a><a href="#simpleEx_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a> </td><td align="left" valign="top"><p>This line defines a MapType allowing only Strings as values.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_getAttribute"></a><a href="#simpleEx_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a> </td><td align="left" valign="top"><p>An example of how to read an attribute.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addElement"></a><a
href="#simpleEx_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a> </td><td align="left" valign="top"><p>Here we add a new element to the MapType. This element is valid for this map because its default value is a String.</p></td></tr>
<tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_setAttribute"></a><a href="#simpleEx_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a> </td><td align="left" valign="top"><p>Now we change the value of the newly added attribute. JMX requires that the new value be wrapped in an object of type Attribute, which holds both the name and the new value.</p></td></tr>
</table></div></p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>To make your module known to Heritrix, you need to list it in the appropriate <code class="filename">src/conf/modules</code> file. For example, if your module is a Processor, it needs to be listed in the <code class="filename">Processor.options</code> file. The options files get built into the Heritrix jar.</p>
<p>&ldquo;<span class="quote">A little known fact about Heritrix: When trying to read modules/Processor.options Heritrix will concatenate any such files it finds on the classpath. This means that if you write your own processor and wrap it in a jar you can simply include in that jar a modules/Processor.options file with just the one line needed to add your processor. Then simply add the new jar to the $HERITRIX_HOME/lib directory and you are done. No need to mess with the Heritrix binaries. For an example of how this is done, look at the code for this project: <a href="http://vefsofnun.bok.hi.is/deduplicator/index.html" target="_top">deduplicator</a></span>&rdquo; [Kristinn Sigur&eth;sson on the mailing list, <a href="http://tech.groups.yahoo.com/group/archive-crawler/message/3281" target="_top">3281</a>].
</p></div><p>If everything seems ok so far, then we are almost ready to write      some real modules.</p></div></div></body></html>
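
The three constraint levels described in the callout list can be sketched in plain Java. This is only an illustration of the acceptance rules as the text states them, not Heritrix code: the class `ConstraintLevelSketch`, its method names, and the `inProfile` flag are all hypothetical, and it assumes the regular expression must match the whole value.

```java
import java.util.logging.Level;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regular-expression constraint.
// It neither uses nor reproduces any Heritrix class.
final class ConstraintLevelSketch {
    private final Pattern pattern;
    private final Level level;

    ConstraintLevelSketch(String regex, Level level) {
        this.pattern = Pattern.compile(regex);
        this.level = level;
    }

    /** True if the whole value matches the pattern (an assumption of this sketch). */
    boolean matches(String value) {
        return pattern.matcher(value).matches();
    }

    /**
     * Applies the three levels as described in the text.
     *
     * @param value     the submitted setting value
     * @param inProfile true if the value is stored in a profile, false for a crawl job
     */
    boolean isAllowed(String value, boolean inProfile) {
        if (matches(value)) {
            return true;                 // the constraint is satisfied
        }
        if (level.equals(Level.INFO)) {
            return true;                 // accepted anyway; the user is only warned
        }
        if (level.equals(Level.WARNING)) {
            return inProfile;            // legal in profiles, invalid in crawl jobs
        }
        return false;                    // SEVERE (and anything else): never allowed
    }

    public static void main(String[] args) {
        ConstraintLevelSketch c = new ConstraintLevelSketch(".*Val.*", Level.WARNING);
        System.out.println(c.isAllowed("defaultValue", false)); // matches the pattern
        System.out.println(c.isAllowed("other", true));         // profile: tolerated
        System.out.println(c.isAllowed("other", false));        // crawl job: rejected
    }
}
```

With `Level.WARNING`, a profile can carry a placeholder value that fails the pattern, while a crawl job derived from that profile has to replace it before the value counts as valid.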
