<DL><DD>Add a CrawlURI to this host queue. <p> Callers can optionally choose to have the new time of next processing override the existing value for the URI if the existing value is 'later' than the new one.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - The CrawlURI to add.<DD><CODE>overrideSetTimeOnDups</CODE> - If true, the time of next processing for the supplied URI will override any existing time already stored for it in the HQ. If false, no changes will be made to any existing values of the URI. Note: An existing time is never overridden with a later one.<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE> - if an error occurs accessing the database</DL></DD></DL><HR><A NAME="strictAdd(org.archive.crawler.datamodel.CrawlURI, boolean)"><!-- --></A><H3>strictAdd</H3><PRE>protected com.sleepycat.je.OperationStatus <B>strictAdd</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, boolean overrideDuplicates) throws com.sleepycat.je.DatabaseException</PRE><DL><DD>An internal method for adding URIs to the queue.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - The CrawlURI to add<DD><CODE>overrideDuplicates</CODE> - If true, any existing CrawlURI in the DB will be overwritten. If false, the insert into the queue is only performed if the key does not already exist.<DT><B>Returns:</B><DD>The OperationStatus object returned by the put method.<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE></DL></DD></DL><HR><A NAME="flushProcessingURIs()"><!-- --></A><H3>flushProcessingURIs</H3><PRE>protected void <B>flushProcessingURIs</B>() throws com.sleepycat.je.DatabaseException</PRE><DL><DD>Flush any CrawlURIs in the processingUriDB into the primaryUriDB. Flushed URIs keep their 'time of next fetch', and the nextReadyTime is updated if needed. 
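<P>The add and flushProcessingURIs semantics described above can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Heritrix implementation: the class <CODE>HostQueueSketch</CODE> is invented for this example, two <CODE>HashMap</CODE>s stand in for primaryUriDB and processingUriDB, each URI carries only its time of next processing in milliseconds, and slot bookkeeping is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model, not the Heritrix implementation:
// two maps stand in for primaryUriDB and processingUriDB, and each
// URI string maps to its time of next processing (milliseconds).
class HostQueueSketch {
    final Map<String, Long> primary = new HashMap<>();    // queued URIs
    final Map<String, Long> processing = new HashMap<>(); // URIs handed out by next()
    long nextReadyTime = Long.MAX_VALUE;                  // earliest time in primary

    // Mirrors add(curi, overrideSetTimeOnDups): a duplicate's time is only
    // replaced when overriding is requested AND the new time is earlier.
    void add(String uri, long timeOfNext, boolean overrideSetTimeOnDups) {
        Long existing = primary.get(uri);
        if (existing == null) {
            primary.put(uri, timeOfNext);           // new key: plain insert
        } else if (overrideSetTimeOnDups && timeOfNext < existing) {
            primary.put(uri, timeOfNext);           // never override with a later time
        }
        nextReadyTime = Math.min(nextReadyTime, primary.get(uri));
    }

    // Mirrors flushProcessingURIs(): every in-processing URI returns to the
    // queue with its time of next fetch kept, and nextReadyTime is lowered
    // if needed. The list of available slots is not modeled here.
    void flushProcessingURIs() {
        for (Map.Entry<String, Long> e : processing.entrySet()) {
            primary.put(e.getKey(), e.getValue());
            nextReadyTime = Math.min(nextReadyTime, e.getValue());
        }
        processing.clear();
    }
}
```

<P>For example, adding the same URI with time 500 and then with time 100 and <CODE>overrideSetTimeOnDups</CODE> set to true leaves 100 stored, while a later time such as 900 is ignored even when overriding is requested.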
<p> No change is made to the list of available slots.<P><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE> - if one occurs while flushing</DL></DD></DL><HR><A NAME="countCrawlURIs()"><!-- --></A><H3>countCrawlURIs</H3><PRE>protected long <B>countCrawlURIs</B>() throws com.sleepycat.je.DatabaseException</PRE><DL><DD>Count all entries in both primaryUriDB and processingUriDB. <p> This method is needed since BDB does not provide a simple way of counting entries. <p> Note: This is an expensive operation; it requires a loop through the entire queue.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>the number of distinct CrawlURIs in the HQ.<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE></DL></DD></DL><HR><A NAME="inProcessing(java.lang.String)"><!-- --></A><H3>inProcessing</H3><PRE>protected boolean <B>inProcessing</B>(java.lang.String uri) throws com.sleepycat.je.DatabaseException</PRE><DL><DD>Returns true if a CrawlURI matching the uri string is currently being processed by this HQ, false otherwise.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>uri</CODE> - URI to check<DT><B>Returns:</B><DD>true if a CrawlURI matching the uri string is currently being processed by this HQ, false otherwise.<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE></DL></DD></DL><HR><A NAME="deleteInProcessing(java.lang.String)"><!-- --></A><H3>deleteInProcessing</H3><PRE>protected void <B>deleteInProcessing</B>(java.lang.String uri) throws com.sleepycat.je.DatabaseException</PRE><DL><DD>Removes a URI from the list of URIs that belong to this HQ and are currently being processed. 
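<P>The in-processing bookkeeping described by countCrawlURIs, inProcessing and deleteInProcessing (together with their counterpart addInProcessing) can be modeled as follows. This is again a hypothetical sketch rather than the Heritrix code: <CODE>ProcessingSetSketch</CODE> is invented for this example, maps replace the two BDB databases, and the exceptions follow the documented IllegalStateException contract.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model, not the Heritrix implementation: maps stand
// in for primaryUriDB and processingUriDB (URI string -> time of next processing).
class ProcessingSetSketch {
    final Map<String, Long> primary = new HashMap<>();
    final Map<String, Long> processing = new HashMap<>();

    // Like countCrawlURIs(): distinct URIs across both stores. The union
    // visits every key, echoing the full pass the BDB version needs.
    long countCrawlURIs() {
        Set<String> all = new HashSet<>(primary.keySet());
        all.addAll(processing.keySet());
        return all.size();
    }

    // Like inProcessing(uri).
    boolean inProcessing(String uri) {
        return processing.containsKey(uri);
    }

    // Like addInProcessing(curi): rejects a URI that is already being processed.
    void addInProcessing(String uri, long timeOfNext) {
        if (processing.containsKey(uri)) {
            throw new IllegalStateException("already being processed: " + uri);
        }
        processing.put(uri, timeOfNext);
    }

    // Like deleteInProcessing(uri): the URI must be on the in-processing list.
    void deleteInProcessing(String uri) {
        if (processing.remove(uri) == null) {
            throw new IllegalStateException("not being processed: " + uri);
        }
    }
}
```

<P>Counting takes the union of both key sets, so a URI present in both stores is counted once, matching the documented "distinct" count.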
<p> If the URI is not found, an IllegalStateException is thrown.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>uri</CODE> - The URI string of the CrawlURI to delete.<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE><DD><CODE>java.lang.IllegalStateException</CODE> - if the URI was not on the list</DL></DD></DL><HR><A NAME="addInProcessing(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>addInProcessing</H3><PRE>protected void <B>addInProcessing</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi) throws com.sleepycat.je.DatabaseException, java.lang.IllegalStateException</PRE><DL><DD>Adds a CrawlURI to the list of CrawlURIs that belong to this HQ and are currently being processed.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - The CrawlURI to add to the list<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE><DD><CODE>java.lang.IllegalStateException</CODE> - if the CrawlURI is already in the list of URIs being processed.</DL></DD></DL><HR><A NAME="getCrawlURI(java.lang.String)"><!-- --></A><H3>getCrawlURI</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>getCrawlURI</B>(java.lang.String uri) throws com.sleepycat.je.DatabaseException</PRE><DL><DD>Returns the CrawlURI associated with the specified URI (string) or null if no such CrawlURI is queued in this HQ. 
If the CrawlURI is being processed, it is not considered to be <i>queued</i>, and this method will return null for any such URI.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>uri</CODE> - A string representing the URI<DT><B>Returns:</B><DD>the CrawlURI associated with the specified URI (string) or null if no such CrawlURI is queued in this HQ.<DT><B>Throws:</B><DD><CODE>com.sleepycat.je.DatabaseException</CODE> - if an error occurs reading the database</DL></DD></DL><HR><A NAME="update(org.archive.crawler.datamodel.CrawlURI, boolean, long)"><!-- --></A><H3>update</H3><PRE>public void <B>update</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, boolean needWait, long wakeupTime) throws java.lang.IllegalStateException, java.io.IOException</PRE><DL><DD>Updates a CrawlURI that has completed processing.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - The CrawlURI. This must be a CrawlURI issued by this HQ's <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#next()"><CODE>next()</CODE></A> method.<DD><CODE>needWait</CODE> - If true, the URI was processed successfully, requiring a period of suspended action on that host. If valence is &gt; 1, separate times are maintained for each slot.<DD><CODE>wakeupTime</CODE> - If the new state is <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#HQSTATE_SNOOZED"><CODE>snoozed</CODE></A>, this parameter should contain the time (in milliseconds) when it will be safe to wake the HQ up again. 
Otherwise, this parameter is ignored.<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if the CrawlURI does not match a CrawlURI issued for crawling by this HQ's <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#next()"><CODE>next()</CODE></A>.<DD><CODE>java.io.IOException</CODE> - if an error occurs accessing the database</DL></DD></DL><HR><A NAME="update(org.archive.crawler.datamodel.CrawlURI, boolean, long, boolean)"><!-- --></A><H3>update</H3><PRE>public void <B>update</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, boolean needWait, long wakeupTime, boolean forgetURI) throws java.lang.IllegalStateException, java.io.IOException</PRE><DL><DD>Updates a CrawlURI that has completed processing.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - The CrawlURI. This must be a CrawlURI issued by this HQ's <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#next()"><CODE>next()</CODE></A> method.<DD><CODE>needWait</CODE> - If true, the URI was processed successfully, requiring a period of suspended action on that host. If valence is &gt; 1, separate times are maintained for each slot.<DD><CODE>wakeupTime</CODE> - If the new state is <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#HQSTATE_SNOOZED"><CODE>snoozed</CODE></A>, this parameter should contain the time (in milliseconds) when it will be safe to wake the HQ up again. 
Otherwise, this parameter is ignored.<DD><CODE>forgetURI</CODE> - If true, the URI will be deleted from the queue.<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if the CrawlURI does not match a CrawlURI issued for crawling by this HQ's <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#next()"><CODE>next()</CODE></A>.<DD><CODE>java.io.IOException</CODE> - if an error occurs accessing the database</DL></DD></DL><HR><A NAME="next()"><!-- --></A><H3>next</H3><PRE>public <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>next</B>() throws java.lang.IllegalStateException, java.io.IOException</PRE><DL><DD>Returns the 'top' URI in the AdaptiveRevisitHostQueue. <p> The HQ state will be set to <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#HQSTATE_BUSY"><CODE>busy</CODE></A> if this method returns normally.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>a CrawlURI ready for processing<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if the HostQueue's current state is not <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitHostQueue.html#HQSTATE_READY"><CODE>ready</CODE></A><DD><CODE>java.io.IOException</CODE> - if an error occurs reading from the database</DL></DD></DL><HR><A NAME="peek()"><!-- --></A><H3>peek</H3><PRE>public <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>peek</B>() throws java.lang.IllegalStateException,