<TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="createAlreadyIncluded()"><!-- --></A><H3>createAlreadyIncluded</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>createAlreadyIncluded</B>() throws java.io.IOException</PRE><DL><DD>Create a UriUniqFilter that will serve as record of already seen URIs.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#createAlreadyIncluded()">createAlreadyIncluded</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>A UURISet that will serve as a record of already seen URIs<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="deserializeAlreadySeen(java.lang.Class, java.io.File)"><!-- --></A><H3>deserializeAlreadySeen</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>deserializeAlreadySeen</B>(java.lang.Class cls, java.io.File dir) throws java.io.FileNotFoundException, java.io.IOException</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.FileNotFoundException</CODE><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="getQueueFor(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in 
org.archive.crawler.datamodel">CrawlURI</A> curi)</PRE><DL><DD>Return the work queue for the given CrawlURI's classKey. URIs are ordered and politeness-delayed within their 'class'.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(org.archive.crawler.datamodel.CrawlURI)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI to base queue on<DT><B>Returns:</B><DD>the found or created BdbWorkQueue</DL></DD></DL><HR><A NAME="getQueueFor(java.lang.String)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(java.lang.String classKey)</PRE><DL><DD>Return the work queue for the given classKey, or null if no such queue exists.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(java.lang.String)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>classKey</CODE> - key to look for<DT><B>Returns:</B><DD>the found WorkQueue</DL></DD></DL><HR><A NAME="getInitialMarker(java.lang.String, boolean)"><!-- --></A><H3>getInitialMarker</H3><PRE>public <A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A> <B>getInitialMarker</B>(java.lang.String regexpr, boolean inCacheOnly)</PRE><DL><DD><B>Description copied from interface: <CODE><A 
HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">Frontier</A></CODE></B></DD><DD>Get a <code>URIFrontierMarker</code> initialized with the given regular expression at the 'start' of the Frontier.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">getInitialMarker</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>regexpr</CODE> - The regular expression that URIs within the frontier must match to be considered within the scope of this marker<DD><CODE>inCacheOnly</CODE> - If set to true, only those URIs within the frontier that are stored in cache (usually this means in memory rather than on disk, but that is an implementation detail) will be considered. Others will be entirely ignored, as if they do not exist. 
This is useful for quick peeks at the top of the URI list.<DT><B>Returns:</B><DD>A URIFrontierMarker that is set for the 'start' of the frontier's URI list.</DL></DD></DL><HR><A NAME="getURIsList(org.archive.crawler.framework.FrontierMarker, int, boolean)"><!-- --></A><H3>getURIsList</H3><PRE>public java.util.ArrayList <B>getURIsList</B>(<A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A> marker, int numberOfMatches, boolean verbose)</PRE><DL><DD>Return a list of URIs.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getURIsList(org.archive.crawler.framework.FrontierMarker, int, boolean)">getURIsList</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>marker</CODE> - <DD><CODE>numberOfMatches</CODE> - <DD><CODE>verbose</CODE> - <DT><B>Returns:</B><DD>List of URIs (strings).<DT><B>See Also:</B><DD><A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework"><CODE>FrontierM