bdbfrontier.html

From "Web Crawler Open Source Code" · HTML code · 622 lines · page 1 of 5

<A NAME="BdbFrontier(java.lang.String, java.lang.String)"><!-- --></A><H3>BdbFrontier</H3><PRE>public <B>BdbFrontier</B>(java.lang.String&nbsp;name,                   java.lang.String&nbsp;description)</PRE><DL><DD>Create the BdbFrontier<P><DL><DT><B>Parameters:</B><DD><CODE>name</CODE> - <DD><CODE>description</CODE> - </DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="createAlreadyIncluded()"><!-- --></A><H3>createAlreadyIncluded</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>createAlreadyIncluded</B>()                                       throws java.io.IOException</PRE><DL><DD>Create a UriUniqFilter that will serve as a record of already seen URIs.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#createAlreadyIncluded()">createAlreadyIncluded</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>A UURISet that will serve as a record of already seen URIs<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="deserializeAlreadySeen(java.lang.Class, java.io.File)"><!-- --></A><H3>deserializeAlreadySeen</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>deserializeAlreadySeen</B>(java.lang.Class&lt;? 
extends <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A>&gt;&nbsp;cls,                                               java.io.File&nbsp;dir)                                        throws java.io.FileNotFoundException,                                               java.io.IOException</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.FileNotFoundException</CODE><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="getQueueFor(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</PRE><DL><DD>Return the work queue for the given CrawlURI's classKey. URIs are ordered and politeness-delayed within their 'class'.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(org.archive.crawler.datamodel.CrawlURI)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI to base queue on<DT><B>Returns:</B><DD>the found or created BdbWorkQueue</DL></DD></DL><HR><A NAME="getQueueFor(java.lang.String)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(java.lang.String&nbsp;classKey)</PRE><DL><DD>Return the work queue for the given classKey, or null if no such queue exists.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A 
HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(java.lang.String)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>classKey</CODE> - key to look for<DT><B>Returns:</B><DD>the found WorkQueue</DL></DD></DL><HR><A NAME="getInitialMarker(java.lang.String, boolean)"><!-- --></A><H3>getInitialMarker</H3><PRE>public <A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A> <B>getInitialMarker</B>(java.lang.String&nbsp;regexpr,                                       boolean&nbsp;inCacheOnly)</PRE><DL><DD><B>Description copied from interface: <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">Frontier</A></CODE></B></DD><DD>Get a <code>URIFrontierMarker</code> initialized with the given regular expression at the 'start' of the Frontier.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">getInitialMarker</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>regexpr</CODE> - The regular expression that URIs within the frontier must match to be considered within the scope of this marker<DD><CODE>inCacheOnly</CODE> - If set to true, only those URIs within the frontier that are stored in cache (usually this means in memory rather than on disk, but that is an implementation detail) will be considered. Others will be entirely ignored, as if they don't exist. 
This is useful for quick peeks at the top of the URI list.<DT><B>Returns:</B><DD>A URIFrontierMark
