forceFetch</H3><PRE>public boolean <B>forceFetch</B>()</PRE><DL><DD>If this method returns true, this URI should be fetched even if it has already been crawled. This also implies that this URI will be scheduled for crawling before any other waiting URIs for the same host. This flag is used to refetch expired robots.txt files or DNS lookups.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if crawling of this URI should be forced.</DL></DD></DL><HR>
<A NAME="setForceFetch(boolean)"><!-- --></A><H3>setForceFetch</H3><PRE>public void <B>setForceFetch</B>(boolean b)</PRE><DL><DD>Signals that this URI should be fetched even if it has already been crawled. Setting this to true also implies that this URI will be scheduled for crawling before any other waiting URIs for the same host. This flag is used to refetch expired robots.txt files or DNS lookups.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>b</CODE> - true to force crawling of this URI.</DL></DD></DL><HR>
<A NAME="getSchedulingDirective()"><!-- --></A><H3>getSchedulingDirective</H3><PRE>public int <B>getSchedulingDirective</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>The scheduling directive.</DL></DD></DL><HR>
<A NAME="setSchedulingDirective(int)"><!-- --></A><H3>setSchedulingDirective</H3><PRE>public void <B>setSchedulingDirective</B>(int schedulingDirective)</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>schedulingDirective</CODE> - The scheduling directive to set.</DL></DD></DL><HR>
<A NAME="needsImmediateScheduling()"><!-- --></A><H3>needsImmediateScheduling</H3><PRE>public boolean <B>needsImmediateScheduling</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if this URI needs immediate scheduling.</DL></DD></DL><HR>
<A NAME="needsSoonScheduling()"><!-- --></A><H3>needsSoonScheduling</H3><PRE>public boolean <B>needsSoonScheduling</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if this URI needs soon (but not top)
scheduling.</DL></DD></DL><HR>
<A NAME="getTransHops()"><!-- --></A><H3>getTransHops</H3><PRE>public int <B>getTransHops</B>()</PRE><DL><DD>Counts the transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed. In some cases, URIs with more than zero but fewer than some threshold number of such hops are treated specially. <p>TODO: consider moving link-count in here as well, caching the calculation, and refactoring CrawlScope.exceedsMaxHops() to use this.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Transitive hop count.</DL></DD></DL><HR>
<A NAME="fromString(java.lang.String)"><!-- --></A><H3>fromString</H3><PRE>public static <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>fromString</B>(java.lang.String uriHopsViaString) throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Creates a CandidateURI instance from a string containing a URI, optionally followed by a whitespace-delimited hops path and via information.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>uriHopsViaString</CODE> - String with a URI, optionally followed by a hops path and via.<DT><B>Returns:</B><DD>A CandidateURI made from the passed <code>uriHopsViaString</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR>
<A NAME="createSeedCandidateURI(org.archive.net.UURI)"><!-- --></A><H3>createSeedCandidateURI</H3><PRE>public static <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createSeedCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A> uuri)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR>
<A NAME="createCandidateURI(org.archive.net.UURI, org.archive.crawler.extractor.Link)"><!-- --></A><H3>createCandidateURI</H3><PRE>public <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A> baseUURI, <A HREF="../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A> link) throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Utility method for creating CandidateURIs from links extracted from this CrawlURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseUURI</CODE> - Base UURI for <code>link</code>.<DD><CODE>link</CODE> - Link to wrap in a CandidateURI.<DT><B>Returns:</B><DD>New CandidateURI wrapper around <code>link</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR>
<A NAME="createCandidateURI(org.archive.net.UURI, org.archive.crawler.extractor.Link, int, boolean)"><!-- --></A><H3>createCandidateURI</H3><PRE>public <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> <B>createCandidateURI</B>(<A HREF="../../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A> baseUURI, <A HREF="../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A> link, int scheduling, boolean seed) throws org.apache.commons.httpclient.URIException</PRE><DL><DD>Utility method for creating CandidateURIs from links extracted from this CrawlURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseUURI</CODE> - Base UURI for <code>link</code>.<DD><CODE>link</CODE> - Link to wrap in a CandidateURI.<DD><CODE>scheduling</CODE> - How the new CandidateURI should be scheduled.<DD><CODE>seed</CODE> - True if the new CandidateURI is a seed.<DT><B>Returns:</B><DD>New CandidateURI wrapper around <code>link</code>.<DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A
NAME="inheritFrom(org.archive.crawler.datamodel.CandidateURI)"><!-- --></A><H3>inheritFrom</H3><PRE>protected void <B>inheritFrom</B>(<A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html" title="class in org.archive.crawler.datamodel">CandidateURI</A> ancestor)</PRE><DL><DD>Inherits (copies) the relevant key-value pairs from the ancestor.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>ancestor</CODE> - The CandidateURI to copy key-value pairs from.</DL></DD></DL><HR>
<A NAME="getClassKey()"><!-- --></A><H3>getClassKey</H3><PRE>public java.lang.String <B>getClassKey</B>()</PRE><DL><DD>Gets the token (usually the hostname + port) that indicates which "class" this CrawlURI should be grouped with, for purposes such as ensuring that only one item of the class is processed at a time and that all items of the class are held for a politeness period.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Token (usually the hostname + port) that indicates which "class" this CrawlURI should be grouped with.</DL></DD></DL><HR>
<A NAME="setClassKey(java.lang.String)"><!-- --></A><H3>setClassKey</H3><PRE>public void <B>setClassKey</B>(java.lang.String key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR>
<A NAME="getAList()"><!-- --></A><H3>getAList</H3><PRE>public st.ata.util.AList <B>getAList</B>()</PRE><DL><DD><B>Deprecated.</B> <I>Public access will be deprecated; this method's access will change in the next release. Use specialized accessors such as <A HREF="../../../../org/archive/crawler/datamodel/CandidateURI.html#getString(java.lang.String)"><CODE>getString(String)</CODE></A> instead.</I><P><DD>This assumes that only one thread at a time will ever access a particular CandidateURI.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>The attribute list.</DL></DD></DL><HR>
<A NAME="clearAList()"><!-- --></A><H3>clearAList</H3><PRE>protected void <B>clearAList</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR>
<A NAME="putObject(java.lang.String, java.lang.Object)"><!-- --></A><H3>putObject</H3><PRE>public void <B>putObject</B>(java.lang.String key, java.lang.Object value)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR>
<A NAME="getObject(java.lang.String)"><!-- --></A><H3>getObject</H3><PRE>public java.lang.Object <B>getObject</B>(java.lang.String key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR>
<A NAME="getString(java.lang.String)"><!-- --></A><H3>getString</H3><PRE>public java.lang.String <B>getString</B>(java.lang.String key)</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL>
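<P>The scheduling effect described for <CODE>forceFetch()</CODE> and <CODE>setForceFetch(boolean)</CODE> can be pictured with a minimal sketch. It models one host's waiting URIs as a plain array ordered from next-to-crawl to last, and every name in it (<CODE>ForceFetchSketch</CODE>, <CODE>schedule</CODE>, the <CODE>alreadyCrawled</CODE> flag) is hypothetical; it illustrates the documented behavior under those assumptions and is not the Heritrix frontier code.</P>

```java
/**
 * Hypothetical sketch of forceFetch semantics for a single host's queue.
 * The waiting URIs are modeled as an array ordered from next-to-crawl to last.
 */
final class ForceFetchSketch {
    private ForceFetchSketch() {}

    /**
     * Returns a new queue after offering uri to it.
     * An already-crawled URI is dropped unless forceFetch is true;
     * a forced URI is placed ahead of every other waiting URI for the host.
     */
    static String[] schedule(String[] waiting, String uri,
                             boolean alreadyCrawled, boolean forceFetch) {
        if (alreadyCrawled && !forceFetch) {
            return waiting.clone(); // nothing to do: keep the queue as it was
        }
        String[] result = new String[waiting.length + 1];
        if (forceFetch) {
            result[0] = uri; // forced URI jumps ahead of the host's waiting URIs
            System.arraycopy(waiting, 0, result, 1, waiting.length);
        } else {
            System.arraycopy(waiting, 0, result, 0, waiting.length);
            result[waiting.length] = uri; // ordinary URI waits its turn
        }
        return result;
    }
}
```

<P>For example, re-offering an expired <CODE>robots.txt</CODE> URI with both flags set places it ahead of the host's other waiting URIs, while the same call with <CODE>forceFetch</CODE> false leaves the queue unchanged.</P>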
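<P><CODE>getSchedulingDirective()</CODE> returns an <CODE>int</CODE>, and <CODE>needsImmediateScheduling()</CODE> and <CODE>needsSoonScheduling()</CODE> distinguish top-priority scheduling from soon-but-not-top scheduling. The sketch below shows one way an integer directive could back those two predicates. It assumes three hypothetical constants (<CODE>HIGH</CODE>, <CODE>MEDIUM</CODE>, <CODE>NORMAL</CODE>); the actual constant names and values are not documented in this section.</P>

```java
/**
 * Hypothetical sketch: an int scheduling directive backing the
 * needsImmediateScheduling() and needsSoonScheduling() predicates.
 * Constant names and values are illustrative assumptions.
 */
final class SchedulingDirectiveSketch {
    static final int HIGH = 1;    // assumed: top priority, schedule immediately
    static final int MEDIUM = 2;  // assumed: schedule soon, but not at the top
    static final int NORMAL = 3;  // assumed: ordinary scheduling

    private int schedulingDirective = NORMAL;

    int getSchedulingDirective() {
        return schedulingDirective;
    }

    void setSchedulingDirective(int schedulingDirective) {
        this.schedulingDirective = schedulingDirective;
    }

    /** True only for the top-priority directive. */
    boolean needsImmediateScheduling() {
        return schedulingDirective == HIGH;
    }

    /** True for the "soon but not top" directive. */
    boolean needsSoonScheduling() {
        return schedulingDirective == MEDIUM;
    }
}
```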
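<P>The counting rule in <CODE>getTransHops()</CODE> (transitive hops at the <I>end</I> of the path from the seed) can be sketched as a backwards scan. This assumes, hypothetically, that pathFromSeed is stored as a string with one character per hop, where <CODE>'L'</CODE> marks a simple link and any other character marks a transitive hop; <CODE>TransHopCounter</CODE> is an illustrative name, not part of the documented API.</P>

```java
/**
 * Hypothetical helper: counts the trailing transitive hops in a
 * path-from-seed string, assuming one character per hop and 'L'
 * for a simple (non-transitive) link.
 */
final class TransHopCounter {
    private TransHopCounter() {}

    static int countTransHops(String pathFromSeed) {
        int transCount = 0;
        // Walk backwards from the most recent hop; stop at the first simple link.
        for (int i = pathFromSeed.length() - 1; i >= 0; i--) {
            if (pathFromSeed.charAt(i) == 'L') {
                break;
            }
            transCount++;
        }
        return transCount;
    }
}
```

<P>Only the trailing run is counted, so a path such as <CODE>"LEX"</CODE> yields 2, while <CODE>"LXL"</CODE> yields 0 because its most recent hop is a simple link.</P>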
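<P><CODE>fromString(String)</CODE> is described as accepting a URI optionally followed by a whitespace-delimited hops path and via. The exact token rules are not given here, so the sketch below only shows one plausible way to split such a line: it assumes runs of whitespace as separators, at most three fields, and an empty hops path when none is given. <CODE>CandidateLine</CODE> and <CODE>parse</CODE> are hypothetical names; the sketch does not build a CandidateURI or throw <CODE>URIException</CODE>.</P>

```java
/**
 * Hypothetical holder for the three whitespace-delimited fields that
 * fromString(String) is described as accepting: URI, optional hops path,
 * optional via. Names and parsing rules are illustrative assumptions.
 */
final class CandidateLine {
    final String uri;
    final String hopsPath; // empty string when no hops path is given (assumed)
    final String via;      // null when no via is given (assumed)

    private CandidateLine(String uri, String hopsPath, String via) {
        this.uri = uri;
        this.hopsPath = hopsPath;
        this.via = via;
    }

    static CandidateLine parse(String line) {
        // Split on runs of whitespace into at most three fields.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("No URI in: \"" + line + "\"");
        }
        String hops = parts.length > 1 ? parts[1] : "";
        String via = parts.length > 2 ? parts[2] : null;
        return new CandidateLine(parts[0], hops, via);
    }
}
```

<P>With a split limit of 3, everything after the hops path is kept as the via field, so any additional trailing tokens would end up there rather than being silently dropped.</P>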
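<P><CODE>getClassKey()</CODE> groups URIs by a token that is usually the hostname plus port. The sketch below derives such a token with <CODE>java.net.URI</CODE>. The <CODE>host:port</CODE> format, the lowercasing of the host, and the omission of the port when none is given are all assumptions for illustration; the token format Heritrix actually uses may differ.</P>

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical sketch of a class-key token: lowercased host, plus
 * ":" and the port when the URI names an explicit port.
 */
final class ClassKeySketch {
    private ClassKeySketch() {}

    static String classKey(String uriString) {
        URI uri = URI.create(uriString);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URI has no host: " + uriString);
        }
        host = host.toLowerCase(Locale.ROOT);
        int port = uri.getPort(); // -1 when the URI names no explicit port
        return port == -1 ? host : host + ":" + port;
    }
}
```

<P>Under this scheme, <CODE>http://example.com/a</CODE> and <CODE>http://example.com/b</CODE> share one key and are therefore subject to the same politeness handling, while <CODE>http://example.com:8080/</CODE> gets a separate key.</P>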