<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.frontier.AdaptiveRevisitHostQueue.HQSTATE_READY">Constant Field Values</A></DL></DL><HR><A NAME="HQSTATE_BUSY"><!-- --></A><H3>HQSTATE_BUSY</H3><PRE>public static final int <B>HQSTATE_BUSY</B></PRE><DL><DD>The HQ has the maximum number of CrawlURIs currently being processed. This number is either equal to the 'valence' (maximum number of simultaneous connections to a host) or (if smaller) the total number of CrawlURIs in the HQ.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.frontier.AdaptiveRevisitHostQueue.HQSTATE_BUSY">Constant Field Values</A></DL></DL><HR><A NAME="HQSTATE_SNOOZED"><!-- --></A><H3>HQSTATE_SNOOZED</H3><PRE>public static final int <B>HQSTATE_SNOOZED</B></PRE><DL><DD>The HQ is in a suspended state until it can be woken back up.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.frontier.AdaptiveRevisitHostQueue.HQSTATE_SNOOZED">Constant Field Values</A></DL></DL><HR><A NAME="hostName"><!-- --></A><H3>hostName</H3><PRE>final java.lang.String <B>hostName</B></PRE><DL><DD>Name of the host that this AdaptiveRevisitHostQueue represents.<P><DL></DL></DL><HR><A NAME="state"><!-- --></A><H3>state</H3><PRE>int <B>state</B></PRE><DL><DD>Last known state of the HQ. All methods should use getState() to read this value and never read it directly.<P><DL></DL></DL><HR><A NAME="nextReadyTime"><!-- --></A><H3>nextReadyTime</H3><PRE>long <B>nextReadyTime</B></PRE><DL><DD>Time (in milliseconds) when the HQ will next be ready to issue a URI for processing. 
When setting this value, methods should use the setter method <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#setNextReadyTime(long)"><CODE>setNextReadyTime()</CODE></A>.<P><DL></DL></DL><HR><A NAME="wakeUpTime"><!-- --></A><H3>wakeUpTime</H3><PRE>long[] <B>wakeUpTime</B></PRE><DL><DD>Time (in milliseconds) when each URI 'slot' becomes available again.<p> Any positive value greater than the current time signifies a taken slot where the URI has completed processing but the politeness wait has not ended. <p> A zero or positive value smaller than the current time in milliseconds signifies an empty slot.<p> Any negative value signifies a slot for a URI that is being processed. <p> Methods should never write directly to this array; instead, use the <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#updateWakeUpTimeSlot(long)"><CODE>updateWakeUpTimeSlot()</CODE></A> and <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#useWakeUpTimeSlot()"><CODE>useWakeUpTimeSlot()</CODE></A> methods as needed.<P><DL></DL></DL><HR><A NAME="valence"><!-- --></A><H3>valence</H3><PRE>int <B>valence</B></PRE><DL><DD>Number of simultaneous connections permitted to this host. That is, this many URIs can be issued before the state of the HQ becomes busy; it stays busy until one of them is returned via the update method.<P><DL></DL></DL><HR><A NAME="size"><!-- --></A><H3>size</H3><PRE>long <B>size</B></PRE><DL><DD>Size of the queue. That is, the number of CrawlURIs that have been added to it, including any that are currently being processed.<P><DL></DL></DL><HR><A NAME="inProcessing"><!-- --></A><H3>inProcessing</H3><PRE>long <B>inProcessing</B></PRE><DL><DD>Number of URIs belonging to this queue that are being processed at the moment. 
This number will always be in the range 0 to valence.<P><DL></DL></DL><HR><A NAME="substats"><!-- --></A><H3>substats</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/CrawlSubstats.html" title="class in org.archive.crawler.datamodel">CrawlSubstats</A> <B>substats</B></PRE><DL><DL></DL></DL><HR><A NAME="primaryUriDB"><!-- --></A><H3>primaryUriDB</H3><PRE>protected com.sleepycat.je.Database <B>primaryUriDB</B></PRE><DL><DD>Database containing the URI priority queue, indexed by the URI string.<P><DL></DL></DL><HR><A NAME="secondaryUriDB"><!-- --></A><H3>secondaryUriDB</H3><PRE>protected com.sleepycat.je.SecondaryDatabase <B>secondaryUriDB</B></PRE><DL><DD>Secondary index into <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#primaryUriDB"><CODE>the primary DB</CODE></A>, with URIs indexed by the time when they can next be processed.<P><DL></DL></DL><HR><A NAME="processingUriDB"><!-- --></A><H3>processingUriDB</H3><PRE>protected com.sleepycat.je.Database <B>processingUriDB</B></PRE><DL><DD>A database containing those URIs that are currently being processed.<P><DL></DL></DL><HR><A NAME="classCatalog"><!-- --></A><H3>classCatalog</H3><PRE>protected com.sleepycat.bind.serial.StoredClassCatalog <B>classCatalog</B></PRE><DL><DD>Class catalog used for BDB serialization of objects.<P><DL></DL></DL><HR><A NAME="primaryKeyBinding"><!-- --></A><H3>primaryKeyBinding</H3><PRE>protected com.sleepycat.bind.EntryBinding <B>primaryKeyBinding</B></PRE><DL><DD>A binding for the serialization of the primary key (URI string).<P><DL></DL></DL><HR><A NAME="crawlURIBinding"><!-- --></A><H3>crawlURIBinding</H3><PRE>protected com.sleepycat.bind.EntryBinding <B>crawlURIBinding</B></PRE><DL><DD>A binding for the CrawlURIARWrapper object.<P><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" 
CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="AdaptiveRevisitHostQueue(java.lang.String, com.sleepycat.je.Environment, com.sleepycat.bind.serial.StoredClassCatalog, int)"><!-- --></A><H3>AdaptiveRevisitHostQueue</H3><PRE>public <B>AdaptiveRevisitHostQueue</B>(java.lang.String hostName, com.sleepycat.je.Environment env, com.sleepycat.bind.serial.StoredClassCatalog catalog, int valence) throws java.io.IOException</PRE><DL><DD>Constructor<P><DL><DT><B>Parameters:</B><DD><CODE>hostName</CODE> - Name of the host this queue represents. This name must be unique for all HQs in the same Environment.<DD><CODE>env</CODE> - Berkeley DB Environment. All BDB databases created will use it.<DD><CODE>catalog</CODE> - DB for BDB class serialization.<DD><CODE>valence</CODE> - The total number of simultaneous URIs that the HQ can issue for processing. Once this many URIs have been issued for processing, the HQ will go into <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#HQSTATE_BUSY"><CODE>busy</CODE></A> state until at least one of the URIs is <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#update(org.archive.crawler.datamodel.CrawlURI, boolean, long)"><CODE>updated</CODE></A>. The value should be greater than zero. 
Zero and negative values are treated the same as 1.<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE> - if an error occurs opening/creating the database</DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="getHostName()"><!-- --></A><H3>getHostName</H3><PRE>public java.lang.String <B>getHostName</B>()</PRE><DL><DD>Returns the HQ's name.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>the HQ's name</DL></DD></DL><HR><A NAME="add(org.archive.crawler.datamodel.CrawlURI, boolean)"><!-- --></A><H3>add</H3><PRE>public void <B>add</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, boolean overrideSetTimeOnDups) throws java.io.IOException</PRE>
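<P>The slot convention documented for the <CODE>wakeUpTime</CODE> field can be illustrated with a minimal sketch. This is not the class's actual implementation: <CODE>WakeUpSlotSketch</CODE>, <CODE>SlotState</CODE>, <CODE>classify</CODE> and <CODE>countInProcessing</CODE> are hypothetical names introduced here for illustration only. The field description does not say how a value exactly equal to the current time is treated; the sketch assumes such a slot is empty.</P>

```java
/**
 * Hypothetical illustration of the wakeUpTime slot convention.
 * Not part of AdaptiveRevisitHostQueue; all names here are assumptions.
 */
final class WakeUpSlotSketch {

    /** The three slot meanings described for the wakeUpTime field. */
    enum SlotState { EMPTY, POLITENESS_WAIT, IN_PROCESSING }

    /** Classifies one slot value against the current time (both in milliseconds). */
    static SlotState classify(long slot, long now) {
        if (slot < 0) {
            // Any negative value: a URI in this slot is being processed.
            return SlotState.IN_PROCESSING;
        }
        if (slot > now) {
            // Time in the future: processing finished, politeness wait still running.
            return SlotState.POLITENESS_WAIT;
        }
        // Zero or a time in the past: the slot is free.
        // Assumption: a value equal to 'now' is also treated as free.
        return SlotState.EMPTY;
    }

    /** Counts the slots that currently hold a URI being processed. */
    static int countInProcessing(long[] wakeUpTime, long now) {
        int count = 0;
        for (long slot : wakeUpTime) {
            if (classify(slot, now) == SlotState.IN_PROCESSING) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        // One array entry per slot; the array length plays the role of 'valence'.
        long[] wakeUpTime = { 0L, 500L, 1_500L, -1L };
        for (long slot : wakeUpTime) {
            System.out.println(slot + " -> " + classify(slot, now));
        }
        System.out.println("in processing: " + countInProcessing(wakeUpTime, now));
    }
}
```

<P>Under this reading, the count returned by <CODE>countInProcessing</CODE> can never exceed the array length, which matches the documented range of 0 to valence for <CODE>inProcessing</CODE>.</P>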