📄 frontier.html

📁 Written in Java; left over from an experiment. I was going to delete it, but I am uploading it instead so everyone can share it.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_06) on Wed Sep 27 16:03:05 PDT 2006 --><TITLE>Frontier (Heritrix 1.10.1)</TITLE><META NAME="keywords" CONTENT="org.archive.crawler.framework.Frontier interface"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){    parent.document.title="Frontier (Heritrix 1.10.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/Frontier.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../index-all.html"><FONT 
CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../../org/archive/crawler/framework/Filter.html" title="class in org.archive.crawler.framework"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../../org/archive/crawler/framework/Frontier.FrontierGroup.html" title="interface in org.archive.crawler.framework"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../index.html?org/archive/crawler/framework/Frontier.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="Frontier.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;<A HREF="#nested_class_summary">NESTED</A>&nbsp;|&nbsp;<A HREF="#field_summary">FIELD</A>&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;<A HREF="#field_detail">FIELD</A>&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.crawler.framework</FONT><BR>Interface Frontier</H2><DL><DT><B>All Superinterfaces:</B> <DD><A HREF="../../../../org/archive/util/Reporter.html" title="interface in org.archive.util">Reporter</A></DD></DL><DL><DT><B>All Known 
Implementing Classes:</B> <DD><A HREF="../../../../org/archive/crawler/frontier/AbstractFrontier.html" title="class in org.archive.crawler.frontier">AbstractFrontier</A>, <A HREF="../../../../org/archive/crawler/frontier/AdaptiveRevisitFrontier.html" title="class in org.archive.crawler.frontier">AdaptiveRevisitFrontier</A>, <A HREF="../../../../org/archive/crawler/frontier/BdbFrontier.html" title="class in org.archive.crawler.frontier">BdbFrontier</A>, <A HREF="../../../../org/archive/crawler/frontier/DomainSensitiveFrontier.html" title="class in org.archive.crawler.frontier">DomainSensitiveFrontier</A>, <A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></DD></DL><HR><DL><DT><PRE>public interface <B>Frontier</B><DT>extends <A HREF="../../../../org/archive/util/Reporter.html" title="interface in org.archive.util">Reporter</A></DL></PRE><P>An interface for URI Frontiers. <p>A URI Frontier is a pluggable module in Heritrix that maintains the internal state of the crawl. This includes (but is not limited to): <ul>     <li>What URIs have been discovered     <li>What URIs are being processed (fetched)     <li>What URIs have been processed     <li>In what order unprocessed URIs will be processed </ul> <p>The Frontier is also responsible for enforcing any politeness restrictions that may have been applied to the crawl, such as limiting simultaneous connections to the same host, server or IP number to 1 (or any other fixed amount) and enforcing delays between connections. <p>A URI Frontier is created by the <A HREF="../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework"><CODE>CrawlController</CODE></A>, which is in turn responsible for providing access to it. 
Most significant among those modules interested in the Frontier are the <A HREF="../../../../org/archive/crawler/framework/ToeThread.html" title="class in org.archive.crawler.framework"><CODE>ToeThreads</CODE></A>, which perform the actual work of processing a URI. <p>The methods defined in this interface are those required to get URIs for processing, to report the results of processing back (both for use by the ToeThreads) and to get access to various statistical data along the way. The statistical data is of interest to <A HREF="../../../../org/archive/crawler/framework/StatisticsTracking.html" title="interface in org.archive.crawler.framework"><CODE>Statistics Tracking</CODE></A> modules. A couple of additional methods are provided for inspecting and manipulating the Frontier at runtime. <p>The statistical data exposed by this interface is: <ul>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#discoveredUriCount()"><CODE>Discovered URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#queuedUriCount()"><CODE>Queued URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#finishedUriCount()"><CODE>Finished URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#succeededFetchCount()"><CODE>Successfully processed URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#failedFetchCount()"><CODE>Failed to process URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#disregardedUriCount()"><CODE>Disregarded URIs</CODE></A>     <li> <A HREF="../../../../org/archive/crawler/framework/Frontier.html#totalBytesWritten()"><CODE>Total bytes written</CODE></A> </ul> <p>In addition, the Frontier may optionally implement an interface that exposes information about hosts. 
<p>Furthermore, any implementation of the URI Frontier should trigger <A HREF="../../../../org/archive/crawler/event/CrawlURIDispositionListener.html" title="interface in org.archive.crawler.event"><CODE>CrawlURIDispositionEvents</CODE></A> by invoking the proper methods on the <A HREF="../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework"><CODE>CrawlController</CODE></A>. Doing this allows a custom-built <A HREF="../../../../org/archive/crawler/framework/StatisticsTracking.html" title="interface in org.archive.crawler.framework"><CODE>Statistics Tracking</CODE></A> module to gather any additional data it might be interested in by examining the completed URIs. <p>All URI Frontiers inherit from <A HREF="../../../../org/archive/crawler/settings/ModuleType.html" title="class in org.archive.crawler.settings"><CODE>ModuleType</CODE></A>, and therefore creating settings follows the usual pattern of pluggable modules in Heritrix.<P><P><DL><DT><B>Author:</B></DT>  <DD>Gordon Mohr, Kristinn Sigurdsson</DD><DT><B>See Also:</B><DD><A HREF="../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework"><CODE>CrawlController</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/CrawlController.html#fireCrawledURIDisregardEvent(org.archive.crawler.datamodel.CrawlURI)"><CODE>CrawlController.fireCrawledURIDisregardEvent(CrawlURI)</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/CrawlController.html#fireCrawledURIFailureEvent(org.archive.crawler.datamodel.CrawlURI)"><CODE>CrawlController.fireCrawledURIFailureEvent(CrawlURI)</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/CrawlController.html#fireCrawledURINeedRetryEvent(org.archive.crawler.datamodel.CrawlURI)"><CODE>CrawlController.fireCrawledURINeedRetryEvent(CrawlURI)</CODE></A>, <A 
HREF="../../../../org/archive/crawler/framework/CrawlController.html#fireCrawledURISuccessfulEvent(org.archive.crawler.datamodel.CrawlURI)"><CODE>CrawlController.fireCrawledURISuccessfulEvent(CrawlURI)</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/StatisticsTracking.html" title="interface in org.archive.crawler.framework"><CODE>StatisticsTracking</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/ToeThread.html" title="class in org.archive.crawler.framework"><CODE>ToeThread</CODE></A>, <A HREF="../../../../org/archive/crawler/framework/FrontierHostStatistics.html" title="interface in org.archive.crawler.framework"><CODE>FrontierHostStatistics</CODE></A>, <A HREF="../../../../org/archive/crawler/settings/ModuleType.html" title="class in org.archive.crawler.settings"><CODE>ModuleType</CODE></A></DL><HR><P><!-- ======== NESTED CLASS SUMMARY ======== --><A NAME="nested_class_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Nested Class Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;interface</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.FrontierGroup.html" title="interface in org.archive.crawler.framework">Frontier.FrontierGroup</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Generic interface representing the internal groupings  of a Frontier's URIs -- usually queues.</TD></TR></TABLE>&nbsp;<!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" 
VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#ATTR_NAME">ATTR_NAME</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;All URI Frontiers should have the same 'name' attribute.</TD></TR></TABLE>&nbsp;<!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#averageDepth()">averageDepth</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;float</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#congestionRatio()">congestionRatio</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#considerIncluded(org.archive.net.UURI)">considerIncluded</A></B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;u)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Notify Frontier that it should consider the given UURI as if already scheduled.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT 
SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#deepestUri()">deepestUri</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#deleted(org.archive.crawler.datamodel.CrawlURI)">deleted</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Notify Frontier that a CrawlURI has been deleted outside of the normal next()/finished() lifecycle.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#deleteURIs(java.lang.String)">deleteURIs</A></B>(java.lang.String&nbsp;match)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete any URI that matches the given regular expression from the list of discovered and pending URIs.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#discoveredUriCount()">discoveredUriCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of <i>discovered</i> URIs.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../org/archive/crawler/framework/Frontier.html#disregardedUriCount()">disregardedUriCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of URIs that were scheduled at one point but have been <i>disregarded</i>.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#failedFetchCount()">failedFetchCount</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Number of URIs that <i>failed</i> to process.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/framework/Frontier.html#finished(org.archive.crawler.datamodel.CrawlURI)">finished</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;cURI)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Report a URI being processed as having finished processing.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;long</CODE></FONT></TD>
