crawluri.html
From "Web Crawler Open Source Code" · HTML code · 988 lines total · page 1 of 5
<BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static long</CODE></FONT></TD><TD><CODE><B>ExtractorJS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorJS.html#considerStrings(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, org.archive.crawler.framework.CrawlController, boolean)">considerStrings</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs, <A HREF="../../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework">CrawlController</A> controller, boolean handlingJSFile)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorCSS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorCSS.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorDOC.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorDOC.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Processes a word document and extracts any hyperlinks from it.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorSWF.</B><B><A 
HREF="../../../../../org/archive/crawler/extractor/ExtractorSWF.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>TrapSuppressExtractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/TrapSuppressExtractor.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorPDF.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorPDF.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected abstract void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A 
HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorJS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorJS.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorUniversal.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorUniversal.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorXML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorXML.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorURI.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A 
HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Perform usual extraction on a CrawlURI</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorImpliedURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorImpliedURI.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Perform usual extraction on a CrawlURI</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) void</CODE></FONT></TD><TD><CODE><B>JerichoExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorURI.</B><B><A 
HREF="../../../../../org/archive/crawler/extractor/ExtractorURI.html#extractLink(org.archive.crawler.datamodel.CrawlURI, org.archive.crawler.extractor.Link)">extractLink</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, <A HREF="../../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A> wref)</CODE><BR> Consider a single Link for internal URIs</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ChangeEvaluator.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ChangeEvaluator.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>HTTPContentDigest.</B><B><A HREF="../../../../../org/archive/crawler/extractor/HTTPContentDigest.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" 
WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorHTTP.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTTP.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#isHtmlExpectedHere(org.archive.crawler.datamodel.CrawlURI)">isHtmlExpectedHere</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorTool.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorTool.html#outlinks(org.archive.crawler.datamodel.CrawlURI)">outlinks</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)">processEmbed</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence value, 
java.lang.CharSequence context)</CODE><BR> &nb