Perform usual extraction on a CrawlURI</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorDOC.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorDOC.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Processes a Word document and extracts any hyperlinks from it.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorCSS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorCSS.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected abstract void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" 
CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorURI.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorURI.html#extractLink(org.archive.crawler.datamodel.CrawlURI, org.archive.crawler.extractor.Link)">extractLink</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, <A HREF="../../../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A> wref)</CODE><BR> Consider a single Link for internal URIs.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>HTTPContentDigest.</B><B><A HREF="../../../../../org/archive/crawler/extractor/HTTPContentDigest.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>ExtractorHTTP.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTTP.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A 
HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ChangeEvaluator.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ChangeEvaluator.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B>Extractor.</B><B><A HREF="../../../../../org/archive/crawler/extractor/Extractor.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#isHtmlExpectedHere(org.archive.crawler.datamodel.CrawlURI)">isHtmlExpectedHere</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> Test whether this HTML is so unexpected (e.g., in place of a GIF URI) that it shouldn't be scanned for links.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorTool.</B><B><A 
HREF="../../../../../org/archive/crawler/extractor/ExtractorTool.html#outlinks(org.archive.crawler.datamodel.CrawlURI)">outlinks</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)">processEmbed</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence value, java.lang.CharSequence context)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence, char)">processEmbed</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence value, java.lang.CharSequence context, char hopType)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processGeneralTag(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)">processGeneralTag</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> 
curi, java.lang.CharSequence element, java.lang.CharSequence cs)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processLink(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)">processLink</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence value, java.lang.CharSequence context)</CODE><BR> Handle generic HREF cases.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processMeta(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">processMeta</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Process metadata tags.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processScript(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, int)">processScript</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>AggressiveExtractorHTML.</B><B><A 
HREF="../../../../../org/archive/crawler/extractor/AggressiveExtractorHTML.html#processScript(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, int)">processScript</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processScriptCode(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">processScriptCode</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B>ExtractorHTML.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorHTML.html#processStyle(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, int)">processStyle</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> Process style text.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static long</CODE></FONT></TD><TD><CODE><B>ExtractorCSS.</B><B><A HREF="../../../../../org/archive/crawler/extractor/ExtractorCSS.html#processStyleCode(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, org.archive.crawler.framework.CrawlController)">processStyleCode</A></B>(<A HREF="../../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in 
org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs, <A HREF="../../../../../org/archive/crawler/framework/CrawlController.html" title="class in org.archive.crawler.framework">Crawl