bdbfrontier.html
From "Open-Source Web Crawler Code" · HTML code · 622 lines · page 1 of 5
<A NAME="BdbFrontier(java.lang.String, java.lang.String)"><!-- --></A><H3>BdbFrontier</H3><PRE>public <B>BdbFrontier</B>(java.lang.String name, java.lang.String description)</PRE><DL><DD>Create the BdbFrontier<P><DL><DT><B>Parameters:</B><DD><CODE>name</CODE> - <DD><CODE>description</CODE> - </DL></DL>
<!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE>
<A NAME="createAlreadyIncluded()"><!-- --></A><H3>createAlreadyIncluded</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>createAlreadyIncluded</B>() throws java.io.IOException</PRE><DL><DD>Create a UriUniqFilter that will serve as a record of already seen URIs.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#createAlreadyIncluded()">createAlreadyIncluded</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>A UriUniqFilter that will serve as a record of already seen URIs<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL>
<HR><A NAME="deserializeAlreadySeen(java.lang.Class, java.io.File)"><!-- --></A><H3>deserializeAlreadySeen</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>deserializeAlreadySeen</B>(java.lang.Class&lt;? extends <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A>&gt; cls, java.io.File dir) throws java.io.FileNotFoundException, java.io.IOException</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.FileNotFoundException</CODE><DD><CODE>java.io.IOException</CODE></DL></DD></DL>
<HR><A NAME="getQueueFor(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</PRE><DL><DD>Return the work queue for the given CrawlURI's classKey. URIs are ordered and politeness-delayed within their 'class'.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(org.archive.crawler.datamodel.CrawlURI)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI to base queue on<DT><B>Returns:</B><DD>the found or created BdbWorkQueue</DL></DD></DL>
<HR><A NAME="getQueueFor(java.lang.String)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(java.lang.String classKey)</PRE><DL><DD>Return the work queue for the given classKey, or null if no such queue exists.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(java.lang.String)">getQueueFor</A></CODE> in class <CODE><A 
HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>classKey</CODE> - key to look for<DT><B>Returns:</B><DD>the found WorkQueue, or null if no such queue exists</DL></DD></DL>
<HR><A NAME="getInitialMarker(java.lang.String, boolean)"><!-- --></A><H3>getInitialMarker</H3><PRE>public <A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A> <B>getInitialMarker</B>(java.lang.String regexpr, boolean inCacheOnly)</PRE><DL><DD><B>Description copied from interface: <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">Frontier</A></CODE></B></DD><DD>Get a <code>URIFrontierMarker</code> initialized with the given regular expression at the 'start' of the Frontier.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">getInitialMarker</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>regexpr</CODE> - The regular expression that URIs within the frontier must match to be considered within the scope of this marker<DD><CODE>inCacheOnly</CODE> - If set to true, only those URIs within the frontier that are stored in cache (usually this means in memory rather than on disk, but that is an implementation detail) will be considered. Others will be entirely ignored, as if they don't exist. This is useful for quick peeks at the top of the URI list.<DT><B>Returns:</B><DD>A URIFrontierMark