📄 bdbfrontier.html

📁 An open-source web crawler
<TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="createAlreadyIncluded()"><!-- --></A><H3>createAlreadyIncluded</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>createAlreadyIncluded</B>()                                       throws java.io.IOException</PRE><DL><DD>Create a UriUniqFilter that will serve as a record of already seen URIs.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#createAlreadyIncluded()">createAlreadyIncluded</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>A UURISet that will serve as a record of already seen URIs<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="deserializeAlreadySeen(java.lang.Class, java.io.File)"><!-- --></A><H3>deserializeAlreadySeen</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/UriUniqFilter.html" title="interface in org.archive.crawler.datamodel">UriUniqFilter</A> <B>deserializeAlreadySeen</B>(java.lang.Class&nbsp;cls,                                               java.io.File&nbsp;dir)                                        throws java.io.FileNotFoundException,                                               java.io.IOException</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.FileNotFoundException</CODE><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="getQueueFor(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" 
title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</PRE><DL><DD>Return the work queue for the given CrawlURI's classKey. URIs are ordered and politeness-delayed within their 'class'.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(org.archive.crawler.datamodel.CrawlURI)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI to base queue on<DT><B>Returns:</B><DD>the found or created BdbWorkQueue</DL></DD></DL><HR><A NAME="getQueueFor(java.lang.String)"><!-- --></A><H3>getQueueFor</H3><PRE>protected <A HREF="../../../../org/archive/crawler/frontier/WorkQueue.html" title="class in org.archive.crawler.frontier">WorkQueue</A> <B>getQueueFor</B>(java.lang.String&nbsp;classKey)</PRE><DL><DD>Return the work queue for the given classKey, or null if no such queue exists.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html#getQueueFor(java.lang.String)">getQueueFor</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/frontier/WorkQueueFrontier.html" title="class in org.archive.crawler.frontier">WorkQueueFrontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>classKey</CODE> - key to look for<DT><B>Returns:</B><DD>the found WorkQueue</DL></DD></DL><HR><A NAME="getInitialMarker(java.lang.String, boolean)"><!-- --></A><H3>getInitialMarker</H3><PRE>public <A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A> 
<B>getInitialMarker</B>(java.lang.String&nbsp;regexpr,                                       boolean&nbsp;inCacheOnly)</PRE><DL><DD><B>Description copied from interface: <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">Frontier</A></CODE></B></DD><DD>Get a <code>URIFrontierMarker</code> initialized with the given regular expression at the 'start' of the Frontier.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getInitialMarker(java.lang.String, boolean)">getInitialMarker</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>regexpr</CODE> - The regular expression that URIs within the frontier must                match to be considered within the scope of this marker<DD><CODE>inCacheOnly</CODE> - If set to true, only those URIs within the frontier                that are stored in cache (usually this means in memory                rather than on disk, but that is an implementation                detail) will be considered. Others will be entirely                ignored, as if they don't exist. 
This is useful for quick                peeks at the top of the URI list.<DT><B>Returns:</B><DD>A URIFrontierMarker that is set for the 'start' of the frontier's                URI list.</DL></DD></DL><HR><A NAME="getURIsList(org.archive.crawler.framework.FrontierMarker, int, boolean)"><!-- --></A><H3>getURIsList</H3><PRE>public java.util.ArrayList <B>getURIsList</B>(<A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework">FrontierMarker</A>&nbsp;marker,                                       int&nbsp;numberOfMatches,                                       boolean&nbsp;verbose)</PRE><DL><DD>Return list of urls.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html#getURIsList(org.archive.crawler.framework.FrontierMarker, int, boolean)">getURIsList</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/framework/Frontier.html" title="interface in org.archive.crawler.framework">Frontier</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>marker</CODE> - <DD><CODE>numberOfMatches</CODE> - <DD><CODE>verbose</CODE> - <DT><B>Returns:</B><DD>List of URIs (strings).<DT><B>See Also:</B><DD><A HREF="../../../../org/archive/crawler/framework/FrontierMarker.html" title="interface in org.archive.crawler.framework"><CODE>FrontierM
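
The methods documented above fit together roughly as follows: URIs are filtered through an already-seen record, placed on a per-classKey work queue, and later listed in pages through a marker that carries a regular expression and a position. The sketch below illustrates that flow only. It does not use the Heritrix classes; `FrontierSketch`, `Marker`, `schedule`, `findOrCreateQueue`, and `findQueue` are hypothetical stand-ins, the in-memory collections replace the BDB-backed storage, and the `inCacheOnly` and `verbose` flags are omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical in-memory stand-in; NOT the Heritrix BdbFrontier. */
class FrontierSketch {
    /** Plays the role of a FrontierMarker: a filter plus a resume position. */
    static final class Marker {
        final Pattern pattern;
        int next = 0; // index of the next URI to examine
        Marker(String regexpr) { this.pattern = Pattern.compile(regexpr); }
    }

    // Record of already seen URIs (assumption: a plain HashSet).
    private final Set<String> alreadySeen = new HashSet<>();
    // One queue per classKey, kept in insertion order.
    private final Map<String, List<String>> queues = new LinkedHashMap<>();

    /** Find the queue for a classKey, creating it if absent. */
    List<String> findOrCreateQueue(String classKey) {
        return queues.computeIfAbsent(classKey, k -> new ArrayList<>());
    }

    /** Find the queue for a classKey, or null if no such queue exists. */
    List<String> findQueue(String classKey) {
        return queues.get(classKey);
    }

    /** Queue a URI unless it was seen before; returns true if newly queued. */
    boolean schedule(String classKey, String uri) {
        if (!alreadySeen.add(uri)) {
            return false;
        }
        findOrCreateQueue(classKey).add(uri);
        return true;
    }

    /** Marker positioned at the 'start' of the URI list. */
    Marker getInitialMarker(String regexpr) {
        return new Marker(regexpr);
    }

    /** Return up to numberOfMatches matching URIs and advance the marker. */
    List<String> getURIsList(Marker marker, int numberOfMatches) {
        List<String> all = new ArrayList<>();
        for (List<String> queue : queues.values()) {
            all.addAll(queue);
        }
        List<String> matches = new ArrayList<>();
        while (marker.next < all.size() && matches.size() < numberOfMatches) {
            String uri = all.get(marker.next++);
            if (marker.pattern.matcher(uri).matches()) {
                matches.add(uri);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        FrontierSketch frontier = new FrontierSketch();
        frontier.schedule("a.example", "http://a.example/1");
        frontier.schedule("b.example", "http://b.example/1");
        frontier.schedule("a.example", "http://a.example/2");
        frontier.schedule("a.example", "http://a.example/1"); // duplicate, ignored

        Marker marker = frontier.getInitialMarker(".*a\\.example.*");
        System.out.println(frontier.getURIsList(marker, 1)); // first page
        System.out.println(frontier.getURIsList(marker, 5)); // remainder
    }
}
```

Because the marker stores its own position, repeated calls to `getURIsList` with the same marker continue where the previous call stopped instead of rescanning from the start.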
