</TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/MapType.html" title="class in org.archive.crawler.settings">MapType</A></B></TD><TD>This class represents a container of settings.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/ModuleAttributeInfo.html" title="class in org.archive.crawler.settings">ModuleAttributeInfo</A></B></TD><TD> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/ModuleType.html" title="class in org.archive.crawler.settings">ModuleType</A></B></TD><TD>Superclass of all modules that should be configurable.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/RegularExpressionConstraint.html" title="class in org.archive.crawler.settings">RegularExpressionConstraint</A></B></TD><TD>A constraint that checks that a value matches a regular expression.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SettingsCache.html" title="class in org.archive.crawler.settings">SettingsCache</A></B></TD><TD>This class keeps a map of host names to settings objects.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SettingsFrameworkTestCase.html" title="class in org.archive.crawler.settings">SettingsFrameworkTestCase</A></B></TD><TD>Set up a couple of settings to test different functions of the settings framework.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SettingsHandler.html" title="class in org.archive.crawler.settings">SettingsHandler</A></B></TD><TD>An instance of this class holds a hierarchy of settings.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD 
WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SimpleType.html" title="class in org.archive.crawler.settings">SimpleType</A></B></TD><TD>A type that holds a Java type.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SoftSettingsHash.html" title="class in org.archive.crawler.settings">SoftSettingsHash</A></B></TD><TD> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/SoftSettingsHash.SettingsEntry.html" title="class in org.archive.crawler.settings">SoftSettingsHash.SettingsEntry</A></B></TD><TD>The entries in this hash extend SoftReference, using the host string as the key.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/StringList.html" title="class in org.archive.crawler.settings">StringList</A></B></TD><TD>List of String values.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/TextField.html" title="class in org.archive.crawler.settings">TextField</A></B></TD><TD>Class to hold values for text fields.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/Type.html" title="class in org.archive.crawler.settings">Type</A></B></TD><TD>Interface implemented by all element types.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../org/archive/crawler/settings/XMLSettingsHandler.html" title="class in org.archive.crawler.settings">XMLSettingsHandler</A></B></TD><TD>A SettingsHandler which uses XML files as persistent storage.</TD></TR></TABLE> <P><A NAME="package_description"><!-- --></A><H2>Package org.archive.crawler.settings Description</H2><P>Provides classes for the settings framework.<p>The settings framework is designed to be a flexible way to configure a 
crawl with special treatment for subparts of the web without adding too much performance overhead.<p>At its core, the settings framework is a way to keep persistent, context-sensitive configuration settings for any class in the crawler.<p>All classes in the crawler that have configurable settings subclass <A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A> or one of its descendants. <A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A> implements the <CODE>DynamicMBean</CODE> interface. This gives you a way to ask the object which attributes it supports, and standard methods for getting and setting these attributes.<p>The entry point into the settings framework is the <A HREF="../../../../org/archive/crawler/settings/SettingsHandler.html" title="class in org.archive.crawler.settings"><CODE>SettingsHandler</CODE></A>. This class is responsible for loading settings from and saving them to persistent storage, and for interconnecting the different parts of the framework.<p><img src='doc-files/settings1.png'/><br>Figure 1. Schematic view of the Settings Framework<p><h2>Settings hierarchy</h2>The settings framework supports a hierarchy of settings. This hierarchy is built from <A HREF="../../../../org/archive/crawler/settings/CrawlerSettings.html" title="class in org.archive.crawler.settings"><CODE>CrawlerSettings</CODE></A> objects. At the top there is a settings object representing the global settings. It consists of all the settings that a crawl job needs in order to run. Beneath this global object there is one "per" settings object for each host/domain which has settings that should override the global settings (the order) for that particular host or domain.<p>When the settings framework is asked for an attribute for a specific host, it will first check whether this attribute is set for this particular host. 
If it is, the value will be returned. If not, it will go up one level at a time until it eventually reaches the order object and returns the global value. If no value is set there either (normally one would be), a hard-coded default value is returned.<p>Per domain/host settings objects contain only those settings which are to be overridden for that particular domain/host. The convention is to name the top-level object "global settings" and the objects beneath it "per settings" or "overrides" (although the refinements described next also do overriding).<p>To further complicate the picture, there are also settings objects called refinements. An object of this type belongs to a global or per settings object and overrides the settings in its owner's object if some criterion is met. Such a criterion could be that the URI in question matches a regular expression, or that the settings are consulted at a time of day that falls within a given time span.<p><h2>ComplexType hierarchy</h2>All the configurable modules in the crawler subclass <A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A> or one of its descendants. <A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A> is responsible for keeping the definition of the configurable attributes of the module. The actual values are stored in an instance of <A HREF="../../../../org/archive/crawler/settings/DataContainer.html" title="class in org.archive.crawler.settings"><CODE>DataContainer</CODE></A>. The <A HREF="../../../../org/archive/crawler/settings/DataContainer.html" title="class in org.archive.crawler.settings"><CODE>DataContainer</CODE></A> is never accessed directly from user code. 
Instead, the user accesses the attributes through methods in <A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A>. The attributes are accessed in different ways depending on whether the access comes from the user interface or from inside a running crawl.<p>When an attribute is accessed from the UI (either reading or writing), you want to make sure that you are editing the attribute in the right context. When trying to override an attribute, you don't want the settings framework to traverse up the hierarchy to the effective value for the attribute; instead you want to know that the attribute is not set on this level. To achieve this, the <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getLocalAttribute(org.archive.crawler.settings.CrawlerSettings, java.lang.String)"><CODE>ComplexType.getLocalAttribute(CrawlerSettings settings, String name)</CODE></A> and <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAttribute(org.archive.crawler.settings.CrawlerSettings, javax.management.Attribute)"><CODE>ComplexType.setAttribute(CrawlerSettings settings, Attribute attribute)</CODE></A> methods take a settings object as a parameter. These methods work only on the supplied settings object. In addition, the methods <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String)"><CODE>ComplexType.getAttribute(String)</CODE></A> and <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAttribute(javax.management.Attribute)"><CODE>ComplexType.setAttribute(Attribute attribute)</CODE></A> are there for conformance to the Java JMX specification. The latter two always work on the global settings object.<p>Getting an attribute within a crawl is different in that you always want to get a value even if it is not set in its context. 
That means that the settings framework should work its way up the settings hierarchy to find the value in effect for the context. The method <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String, org.archive.crawler.datamodel.CrawlURI)"><CODE>ComplexType.getAttribute(String name, CrawlURI uri)</CODE></A> should be used to make sure that the right context is used. Figure 2 shows how the settings framework finds the effective value given a context.<p><img src='doc-files/settings2.png'/><br>Figure 2. Flow of getting an attribute<p>Each attribute has a type. The allowed types all subclass the <A HREF="../../../../org/archive/crawler/settings/Type.html" title="class in org.archive.crawler.settings"><CODE>Type</CODE></A> class. There are three main types:<ol> <li><A HREF="../../../../org/archive/crawler/settings/SimpleType.html" title="class in org.archive.crawler.settings"><CODE>SimpleType</CODE></A></li> <li><A HREF="../../../../org/archive/crawler/settings/ListType.html" title="class in org.archive.crawler.settings"><CODE>ListType</CODE></A></li> <li><A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings"><CODE>ComplexType</CODE></A></li></ol>Except for <A HREF="../../../../org/archive/crawler/settings/SimpleType.html" title="class in org.archive.crawler.settings"><CODE>SimpleType</CODE></A>, the actual type used will be a subclass of one of these main types.<h3>SimpleType</h3>The <A HREF="../../../../org/archive/crawler/settings/SimpleType.html" title="class in org.archive.crawler.settings"><CODE>SimpleType</CODE></A> is mainly for representing Java™