📄 candidateuri.html

📁 Written in Java. Left over from an experiment; I was going to delete it, but uploaded it instead so everyone can share it.
forceFetch</H3><PRE>public boolean <B>forceFetch</B>()</PRE><DL><DD>If this method returns true, this URI should be fetched even though it already has been crawled. This also implies that this URI will be scheduled for crawl before any other waiting URIs for the same host. This value is used to refetch any expired robots.txt or dns-lookups.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if crawling of this URI should be forced</DL></DD></DL><HR><A NAME="setForceFetch(boolean)"><!-- --></A><H3>setForceFetch</H3><PRE>public void <B>setForceFetch</B>(boolean&nbsp;b)</PRE><DL><DD>Method to signal that this URI should be fetched even though it already has been crawled. Setting this to true also implies that this URI will be scheduled for crawl before any other waiting URIs for the same host. This value is used to refetch any expired robots.txt or dns-lookups.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>b</CODE> - set to true to enforce the crawling of this URI</DL></DD></DL><HR><A NAME="getSchedulingDirective()"><!-- --></A><H3>getSchedulingDirective</H3><PRE>public int <B>getSchedulingDirective</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Returns the schedulingDirective.</DL></DD></DL><HR><A NAME="setSchedulingDirective(int)"><!-- --></A><H3>setSchedulingDirective</H3><PRE>public void <B>setSchedulingDirective</B>(int&nbsp;schedulingDirective)</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>schedulingDirective</CODE> - The schedulingDirective to set.</DL></DD></DL><HR><A NAME="needsImmediateScheduling()"><!-- --></A><H3>needsImmediateScheduling</H3><PRE>public boolean <B>needsImmediateScheduling</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if needs immediate scheduling.</DL></DD></DL><HR><A NAME="needsSoonScheduling()"><!-- --></A><H3>needsSoonScheduling</H3><PRE>public boolean <B>needsSoonScheduling</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if needs soon 
but not top scheduling.</DL></DD></DL><HR><A NAME="getTransHops()"><!-- --></A><H3>getTransHops</H3><PRE>public int <B>getTransHops</B>()</PRE><DL><DD>Tally up the number of transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed.  In some cases, URIs with greater than zero but less than some threshold such hops are treated specially.   <p>TODO: consider moving link-count in here as well, caching calculation, and refactoring CrawlScope.exceedsMaxHops() to use this.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Transhop count.</DL></DD></DL><HR><A NAME="fromString(java.lang.String)"><!-- --></A><H3>fromString</H3><PRE>public static <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>fromString</B>(java.lang.String&nbsp;uriHopsViaString)                               throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Given a string containing a URI, then optional whitespace delimited hops-path and via info, create a CandidateURI  instance.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>uriHopsViaString</CODE> - String with a URI.<DT><B>Returns:</B><DD>A CandidateURI made from passed <code>uriHopsViaString</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A NAME="createSeedCandidateURI(org.archive.net.UURI)"><!-- --></A><H3>createSeedCandidateURI</H3><PRE>public static <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createSeedCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;uuri)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="createCandidateURI(org.archive.net.UURI, org.archive.crawler.extractor.Link)"><!-- --></A><H3>createCandidateURI</H3><PRE>public <A 
HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;baseUURI,                                       <A HREF="../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A>&nbsp;link)                                throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Utility method for creation of CandidateURIs found extracting links from this CrawlURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseUURI</CODE> - BaseUURI for <code>link</code>.<DD><CODE>link</CODE> - Link to wrap CandidateURI in.<DT><B>Returns:</B><DD>New candidateURI wrapper around <code>link</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A NAME="createCandidateURI(org.archive.net.UURI, org.archive.crawler.extractor.Link, int, boolean)"><!-- --></A><H3>createCandidateURI</H3><PRE>public <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;baseUURI,                                       <A HREF="../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A>&nbsp;link,                                       int&nbsp;scheduling,                                       boolean&nbsp;seed)                                throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Utility method for creation of CandidateURIs found extracting links from this CrawlURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseUURI</CODE> - BaseUURI for <code>link</code>.<DD><CODE>link</CODE> - Link to wrap CandidateURI 
in.<DD><CODE>scheduling</CODE> - How new CandidateURI should be scheduled.<DD><CODE>seed</CODE> - True if this CandidateURI is a seed.<DT><B>Returns:</B><DD>New CandidateURI wrapper around <code>link</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A NAME="inheritFrom(org.archive.crawler.datamodel.CandidateURI)"><!-- --></A><H3>inheritFrom</H3><PRE>protected void <B>inheritFrom</B>(<A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A>&nbsp;ancestor)</PRE><DL><DD>Inherit (copy) the relevant key-value pairs from the ancestor.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>ancestor</CODE> - </DL></DD></DL><HR><A NAME="getClassKey()"><!-- --></A><H3>getClassKey</H3><PRE>public java.lang.String <B>getClassKey</B>()</PRE><DL><DD>Get the token (usually the hostname + port) which indicates what "class" this CrawlURI should be grouped with, for the purposes of ensuring only one item of the class is processed at once, all items of the class are held for a politeness period, etc.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Token (usually the hostname) which indicates what "class" this CrawlURI should be grouped with.</DL></DD></DL><HR><A NAME="setClassKey(java.lang.String)"><!-- --></A><H3>setClassKey</H3><PRE>public void <B>setClassKey</B>(java.lang.String&nbsp;key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getAList()"><!-- --></A><H3>getAList</H3><PRE>public st.ata.util.AList <B>getAList</B>()</PRE><DL><DD><B>Deprecated.</B>&nbsp;<I>Public access will be deprecated. This method's access will change in the next release.
Use specialized accessors instead such as <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html#getString(java.lang.String)"><CODE>getString(String)</CODE></A>.</I><P><DD>The assumption is that only one thread at a time will ever be accessing a particular CandidateURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>the attribute list.</DL></DD></DL><HR><A NAME="clearAList()"><!-- --></A><H3>clearAList</H3><PRE>protected void <B>clearAList</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="putObject(java.lang.String, java.lang.Object)"><!-- --></A><H3>putObject</H3><PRE>public void <B>putObject</B>(java.lang.String&nbsp;key,                      java.lang.Object&nbsp;value)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getObject(java.lang.String)"><!-- --></A><H3>getObject</H3><PRE>public java.lang.Object <B>getObject</B>(java.lang.String&nbsp;key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getString(java.lang.String)"><!-- --></A><H3>getString</H3><PRE>public java.lang.String <B>getString</B>(java.lang.String&nbsp;key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL>
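
The fromString and getTransHops entries above describe a whitespace-delimited "URI, hops-path, via" string and a count of non-simple-link hops at the end of the path from the seed. The following is a minimal standalone sketch of those two ideas, not the Heritrix source. The class name CandidateUriSketch, its fields, and the choice of 'L' as the marker for a simple link hop are assumptions made for illustration; the real class also throws URIException and builds UURI objects, which this sketch leaves out.

```java
/**
 * Illustrative sketch only; not the org.archive.crawler.datamodel.CandidateURI
 * implementation. All names here are hypothetical.
 */
final class CandidateUriSketch {
    /** ASSUMPTION: 'L' marks a simple (navigational) link hop in a hops-path. */
    static final char SIMPLE_LINK_HOP = 'L';

    final String uri;
    final String pathFromSeed; // empty string when no hops-path was supplied
    final String via;          // null when no via was supplied

    CandidateUriSketch(String uri, String pathFromSeed, String via) {
        this.uri = uri;
        this.pathFromSeed = pathFromSeed;
        this.via = via;
    }

    /** Parse "uri [hops-path [via]]", with fields separated by whitespace. */
    static CandidateUriSketch fromString(String uriHopsViaString) {
        String[] parts = uriHopsViaString.trim().split("\\s+");
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("no URI in input");
        }
        String hops = parts.length > 1 ? parts[1] : "";
        String via = parts.length > 2 ? parts[2] : null;
        return new CandidateUriSketch(parts[0], hops, via);
    }

    /** Count hops at the end of the path that are not simple link hops. */
    int getTransHops() {
        int count = 0;
        for (int i = pathFromSeed.length() - 1; i >= 0; i--) {
            if (pathFromSeed.charAt(i) == SIMPLE_LINK_HOP) {
                break; // stop at the most recent simple link hop
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CandidateUriSketch c =
            fromString("http://example.com/a.gif LLEX http://example.com/");
        System.out.println(c.uri + " transHops=" + c.getTransHops());
    }
}
```

Scanning from the end of the path keeps the count limited to the trailing run of transitive hops, which matches the wording "at the end of this CandidateURI's pathFromSeed"; a URI given with no hops-path is treated here as having zero such hops.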