📄 frontier.html

📁 Written in Java; left over from an experiment. I was going to delete it, but I am uploading it here to share with everyone.
<TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#finishedUriCount()">finishedUriCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of URIs that have <i>finished</i> processing.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getClassKey(org.archive.crawler.datamodel.CandidateURI)">getClassKey</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A>&nbsp;cauri)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../org/archive/crawler/frontier/FrontierJournal.html" title="interface in org.archive.crawler.frontier">FrontierJournal</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getFrontierJournal()">getFrontierJournal</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../org/archive/crawler/framework/Frontier.FrontierGroup.html" title="interface in org.archive.crawler.framework">Frontier.FrontierGroup</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getGroup(org.archive.crawler.datamodel.CrawlURI)">getGroup</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the 'frontier group' (usually queue) for the given  CrawlURI.</TD></TR><TR 
BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">getInitialMarker</A></B>(java.lang.String&nbsp;regexpr,                 boolean&nbsp;inCacheOnly)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get a <code>URIFrontierMarker</code> initialized with the given regular expression at the 'start' of the Frontier.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.util.ArrayList</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getURIsList(org.archive.crawler.framework.FrontierMarker, int, boolean)">getURIsList</A></B>(<A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A>&nbsp;marker,            int&nbsp;numberOfMatches,            boolean&nbsp;verbose)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns a list of all uncrawled URIs starting from a specified marker until <code>numberOfMatches</code> is reached.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#importRecoverLog(java.lang.String, boolean)">importRecoverLog</A></B>(java.lang.String&nbsp;pathToLog,                 boolean&nbsp;retainFailures)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Recover earlier state by reading a recovery log.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" 
WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#initialize(org.archive.crawler.framework.CrawlController)">initialize</A></B>(<A HREF="../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework">CrawlController</A>&nbsp;c)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Initialize the Frontier.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#isEmpty()">isEmpty</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns true if the frontier contains no more URIs to crawl.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#kickUpdate()">kickUpdate</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Notify Frontier that it should consider updating configuration info that may have changed in external files.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#loadSeeds()">loadSeeds</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Request that the Frontier load (or reload) crawl seeds,  typically by contacting the Scope.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A></CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../org/archive/crawler/framework/Frontier.html#next()">next</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the next URI that should be processed.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#pause()">pause</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Notify Frontier that it should not release any URIs, instead holding all threads, until instructed otherwise.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#queuedUriCount()">queuedUriCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of URIs <i>queued</i> up and waiting for processing.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#schedule(org.archive.crawler.datamodel.CandidateURI)">schedule</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A>&nbsp;caURI)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Schedules a CandidateURI.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#start()">start</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Request that Frontier allow crawling to begin.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" 
VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#succeededFetchCount()">succeededFetchCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of <i>successfully</i> processed URIs.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#terminate()">terminate</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Notify Frontier that it should end the crawl, giving any worker ToeThread that asks for a next() an EndedException.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#totalBytesWritten()">totalBytesWritten</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Total number of bytes contained in all URIs that have been processed.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#unpause()">unpause</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Resumes the release of URIs to crawl, allowing worker ToeThreads to proceed.</TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_org.archive.util.Reporter"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from interface org.archive.util.<A HREF="../../../../org/archive/util/Reporter.html" title="interface in org.archive.util">Reporter</A></B></TH></TR><TR 
BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/util/Reporter.html#getReports()">getReports</A>, <A HREF="../../../../org/archive/util/Reporter.html#reportTo(java.io.PrintWriter)">reportTo</A>, <A HREF="../../../../org/archive/util/Reporter.html#reportTo(java.lang.String, java.io.PrintWriter)">reportTo</A>, <A HREF="../../../../org/archive/util/Reporter.html#singleLineLegend()">singleLineLegend</A>, <A HREF="../../../../org/archive/util/Reporter.html#singleLineReport()">singleLineReport</A>, <A HREF="../../../../org/archive/util/Reporter.html#singleLineReportTo(java.io.PrintWriter)">singleLineReportTo</A></CODE></TD></TR></TABLE>&nbsp;<P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="ATTR_NAME"><!-- --></A><H3>ATTR_NAME</H3><PRE>static final java.lang.String <B>ATTR_NAME</B></PRE><DL><DD>All URI Frontiers should have the same 'name' attribute. This constant defines that name. 
This is a name used to reference the Frontier being used in a given crawl order, and since there can only be one Frontier per crawl order, a fixed, unique name for Frontiers is optimal.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../org/archive/crawler/settings/ModuleType.html#ModuleType(java.lang.String)"><CODE>ModuleType.ModuleType(String)</CODE></A>, <A HREF="../../../../constant-values.html#org.archive.crawler.framework.Frontier.ATTR_NAME">Constant Field Values</A></DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="initialize(org.archive.crawler.framework.CrawlController)"><!-- --></A><H3>initialize</H3><PRE>void <B>initialize</B>(<A HREF="../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework">CrawlController</A>&nbsp;c)                throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A>,                       java.io.IOException</PRE><DL><DD>Initialize the Frontier. <p> This method is invoked by the CrawlController once it has created the Frontier. The constructor of the Frontier should only contain code for setting up its settings framework. 
This method should contain all other 'startup' code.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>c</CODE> - The CrawlController that created the Frontier.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE> - If provided settings are illegal or            otherwise unusable.<DD><CODE>java.io.IOException</CODE> - If there is a problem reading settings or seeds file            from disk.</DL></DD></DL><HR><A NAME="next()"><!-- --></A><H3>next</H3><PRE><A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>next</B>()              throws java.lang.InterruptedException,                     <A HREF="../../../../org/archive/crawler/framework/exceptions/EndedException.html" title="class in org.archive.crawler.framework.exceptions">EndedException</A></PRE><DL><DD>Get the next URI that should be processed. If no URI becomes available during the time specified, null will be returned.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>the next URI that should be processed.<DT><B>Throws:</B><DD><CODE>java.lang.InterruptedException</CODE><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/EndedException.html" title="class in org.archive.crawler.framework.exceptions">EndedException</A></CODE></DL></DD></DL><HR><A NAME="isEmpty()"><!-- --></A><H3>isEmpty</H3><PRE>boolean <B>isEmpty</B>()</PRE><DL><DD>Returns true if the frontier contains no more URIs to crawl. <p>That is to say that there are no more URIs either currently available (ready to be emitted), URIs belonging to deferred hosts or pending URIs in the Frontier. 
Thus this method may return false even if there is no currently available URI.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if the frontier contains no more URIs to crawl.</DL></DD></DL><HR><A NAME="schedule(org.archive.crawler.datamodel.CandidateURI)"><!-- --></A><H3>schedule</H3><PRE>void <B>schedule</B>(<A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A>&nbsp;caURI)</PRE><DL><DD>Schedules a CandidateURI. <p>This method accepts one URI and schedules it immediately. This has nothing to do with the priority of the URI being scheduled, only that it will be placed in its respective queue at once. For priority scheduling see <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html#setSchedulingDirective(int)"><CODE>CandidateURI.setSchedulingDirective(int)</CODE></A> <p>This method should be synchronized in all implementing classes.<P>
