object. If it can't find anything useful, the global settings are used as the context. If you don't have any context at all, which is the case in some initialization code, <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String)" target="_top">getAttribute(String name)</a> can be used instead.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="N102C3"></a>6.3. Putting together a simple module</h3></div></div></div><p>From what we have learned so far, let's put together a module that doesn't do anything useful but shows some of the concepts.</p><pre class="programlisting">package myModule;

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.management.Attribute;
import javax.management.AttributeNotFoundException;

import org.archive.crawler.datamodel.CrawlURI;
import org.archive.crawler.settings.MapType;
import org.archive.crawler.settings.ModuleType;
import org.archive.crawler.settings.RegularExpressionConstraint;
import org.archive.crawler.settings.SimpleType;
import org.archive.crawler.settings.Type;

public class Foo extends ModuleType {
    private static Logger logger = Logger.getLogger("myModule.Foo"); <a name="simpleEx_logger" href="chap_modules_common.html#simpleEx_txt_logger"><img border="0" alt="1" src="images/callouts/1.png"></a>

    public Foo(String name) {
        // Register the module's name and description with ModuleType.
        super(name, "Example module that only demonstrates the settings API.");

        // An Integer setting with a default value of 10.
        Type mySimpleType1 = new SimpleType(
                "name1", "Description1", new Integer(10)); <a name="simpleEx_addSimpleType" href="chap_modules_common.html#simpleEx_txt_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a>
        addElementToDefinition(mySimpleType1);

        // A String setting whose value must contain "Val".
        Type mySimpleType2 = new SimpleType(
                "name2", "Description2", "defaultValue");
        addElementToDefinition(mySimpleType2);
        mySimpleType2.addConstraint(new RegularExpressionConstraint( <a name="simpleEx_addConstraint" href="chap_modules_common.html#simpleEx_txt_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a>
                ".*Val.*", Level.WARNING,
                "This field must contain 'Val' as part of the string."));

        // A map whose values must be Strings.
        Type myMapType = new MapType("name3", "Description3", String.class); <a name="simpleEx_addMap" href="chap_modules_common.html#simpleEx_txt_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a>
        addElementToDefinition(myMapType);
    }

    public void getMyTypeValue(CrawlURI curi) {
        try {
            // Read "name1" using the URI as the settings context.
            int maxBandwidthKB = ((Integer) getAttribute("name1", curi)).intValue(); <a name="simpleEx_getAttribute" href="chap_modules_common.html#simpleEx_txt_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a>
        } catch (AttributeNotFoundException e) {
            logger.warning(e.getMessage());
        }
    }

    public void playWithMap(CrawlURI curi) {
        try {
            MapType myMapType = (MapType) getAttribute("name3", curi);
            // Add a String-valued element called "name" to the map.
            myMapType.addElement(
                    null, new SimpleType("name", "Description", "defaultValue")); <a name="simpleEx_addElement" href="chap_modules_common.html#simpleEx_txt_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a>
            // Change the value of the element that was just added.
            myMapType.setAttribute(new Attribute("name", "newValue")); <a name="simpleEx_setAttribute" href="chap_modules_common.html#simpleEx_txt_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a>
        } catch (Exception e) {
            logger.warning(e.getMessage());
        }
    }
}</pre><p>This example shows several things:<div class="calloutlist"><table summary="Callout list" border="0"><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_logger"></a><a href="#simpleEx_logger"><img border="0" alt="1" src="images/callouts/1.png"></a> </td><td align="left" valign="top"><p>One thing that we have not mentioned before is how we do general error logging. Heritrix uses the standard Java logging facility, java.util.logging, which was introduced in Java 1.4. 
The convention is to name the logger after the class.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addSimpleType"></a><a href="#simpleEx_addSimpleType"><img border="0" alt="2" src="images/callouts/2.png"></a> </td><td align="left" valign="top"><p>Here we define and add a SimpleType that holds an Integer, with 10 as its default value.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addConstraint"></a><a href="#simpleEx_addConstraint"><img border="0" alt="3" src="images/callouts/3.png"></a> </td><td align="left" valign="top"><p>It is possible to add constraints on fields. In addition to being constrained to take only strings, this field requires that the string contain 'Val'. The constraint also has a level and a description. The user interface shows the description to explain why a submitted value does not satisfy the constraint. Three levels are honored:</p><div class="variablelist"><dl><dt><span class="term">Level.INFO</span></dt><dd><p>Values are accepted even if they don't fulfill the constraint's requirement. This is used when you don't want to disallow the value, but want to warn the user that the value seems to be out of reasonable bounds.</p></dd><dt><span class="term">Level.WARNING</span></dt><dd><p>The value must be accepted by the constraint to be valid in crawl jobs, but is legal in profiles even if it isn't. 
This makes it possible to put values into a profile that a user should change for every crawl job derived from the profile.</p></dd><dt><span class="term">Level.SEVERE</span></dt><dd><p>The value is rejected outright if it isn't accepted by the constraint.</p></dd></dl></div><p>See the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/settings/Constraint.html" target="_top">Constraint</a> class for more information.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addMap"></a><a href="#simpleEx_addMap"><img border="0" alt="4" src="images/callouts/4.png"></a> </td><td align="left" valign="top"><p>This line defines a MapType allowing only Strings as values.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_getAttribute"></a><a href="#simpleEx_getAttribute"><img border="0" alt="5" src="images/callouts/5.png"></a> </td><td align="left" valign="top"><p>An example of how to read an attribute.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_addElement"></a><a href="#simpleEx_addElement"><img border="0" alt="6" src="images/callouts/6.png"></a> </td><td align="left" valign="top"><p>Here we add a new element to the MapType. This element is valid for this map because its default value is a String.</p></td></tr><tr><td align="left" valign="top" width="5%"><a name="simpleEx_txt_setAttribute"></a><a href="#simpleEx_setAttribute"><img border="0" alt="7" src="images/callouts/7.png"></a> </td><td align="left" valign="top"><p>Now we change the value of the newly added attribute. JMX requires that the new value be wrapped in an object of type Attribute, which holds both the name and the new value.</p></td></tr></table></div></p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>To make your module known to Heritrix, you need to mention it in the appropriate <code class="filename">src/conf/modules</code> file. For example, 
if your module is a Processor, it needs to be mentioned in the <code class="filename">Processor.options</code> file. The options files get built into the Heritrix jar. </p></div><p>If everything seems ok so far, then we are almost ready to write some real modules.</p></div></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ar01s05.html">Prev</a> </td><td align="center" width="20%"> </td><td align="right" width="40%"> <a accesskey="n" href="ar01s07.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">5. Settings </td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%"> 7. Some notes on the URI classes</td></tr></table></div></body></html>