crawluri.html

From "Web Crawler Open Source Code" · HTML code · 988 lines total · page 1 of 5

<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;long</CODE></FONT></TD><TD><CODE><B>ExtractorJS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorJS.html#considerStrings(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, org.archive.crawler.framework.CrawlController, boolean)">considerStrings</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,                java.lang.CharSequence&nbsp;cs,                <A HREF="../../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework">CrawlController</A>&nbsp;controller,                boolean&nbsp;handlingJSFile)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorCSS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorCSS.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorDOC.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorDOC.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in 
org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Processes a word document and extracts any hyperlinks from it.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorSWF.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorSWF.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>TrapSuppressExtractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/TrapSuppressExtractor.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorPDF.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorPDF.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected abstract 
&nbsp;void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorJS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorJS.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorUniversal.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorUniversal.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in 
org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorXML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorXML.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorURI.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Perform usual extraction on a CrawlURI</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorImpliedURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorImpliedURI.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Perform usual extraction on a CrawlURI</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) 
&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,        java.lang.CharSequence&nbsp;cs)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) &nbsp;void</CODE></FONT></TD><TD><CODE><B>JerichoExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,        java.lang.CharSequence&nbsp;cs)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorURI.html#extractLink(org.archive.crawler.datamodel.CrawlURI, org.archive.crawler.extractor.Link)">extractLink</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,            <A HREF="../../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A>&nbsp;wref)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Consider a single Link for internal URIs</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT 
SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ChangeEvaluator.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ChangeEvaluator.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>HTTPContentDigest.</B><B><A HREF="../../../../../org/archive/crawler/extractor/HTTPContentDigest.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorHTTP.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTTP.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in 
org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;boolean</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#isHtmlExpectedHere(org.archive.crawler.datamodel.CrawlURI)">isHtmlExpectedHere</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorTool.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorTool.html#outlinks(org.archive.crawler.datamodel.CrawlURI)">outlinks</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)">processEmbed</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,             java.lang.CharSequence&nbsp;value,             
java.lang.CharSequence&nbsp;context)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nb
