A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/HashCrawlMapper.html" title="class in org.archive.crawler.processor">HashCrawlMapper</A></B></CODE><BR> Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/LexicalCrawlMapper.html" title="class in org.archive.crawler.processor">LexicalCrawlMapper</A></B></CODE><BR> A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).</TD></TR></TABLE> <P><A NAME="org.archive.crawler.processor.recrawl"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/processor/recrawl/package-summary.html">org.archive.crawler.processor.recrawl</A></FONT></TH></TR></TABLE> <P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A 
HREF="../../../../../org/archive/crawler/processor/recrawl/package-summary.html">org.archive.crawler.processor.recrawl</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/FetchHistoryProcessor.html" title="class in org.archive.crawler.processor.recrawl">FetchHistoryProcessor</A></B></CODE><BR> Maintain a history of fetch information inside the CrawlURI's attributes.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/PersistLoadProcessor.html" title="class in org.archive.crawler.processor.recrawl">PersistLoadProcessor</A></B></CODE><BR> Load CrawlURI attributes stored by an earlier crawl from persistent storage for consultation during a recrawl.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/PersistLogProcessor.html" title="class in org.archive.crawler.processor.recrawl">PersistLogProcessor</A></B></CODE><BR> Log CrawlURI attributes from latest fetch for consultation by a later recrawl.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/PersistOnlineProcessor.html" title="class in org.archive.crawler.processor.recrawl">PersistOnlineProcessor</A></B></CODE><BR> Common superclass for persisting Processors which directly store/load to persistence (as opposed to logging for batch load later).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> 
class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/PersistProcessor.html" title="class in org.archive.crawler.processor.recrawl">PersistProcessor</A></B></CODE><BR> Superclass for Processors which utilize BDB-JE for URI state (including most notably history) persistence.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/recrawl/PersistStoreProcessor.html" title="class in org.archive.crawler.processor.recrawl">PersistStoreProcessor</A></B></CODE><BR> Store CrawlURI attributes from latest fetch to persistent storage for consultation by a later recrawl.</TD></TR></TABLE> <P><A NAME="org.archive.crawler.writer"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/writer/package-summary.html">org.archive.crawler.writer</A></FONT></TH></TR></TABLE> <P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/writer/package-summary.html">org.archive.crawler.writer</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/ARCWriterProcessor.html" title="class in org.archive.crawler.writer">ARCWriterProcessor</A></B></CODE><BR> 
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/ExperimentalV10WARCWriterProcessor.html" title="class in org.archive.crawler.writer">ExperimentalV10WARCWriterProcessor</A></B></CODE><BR> <B>Deprecated.</B> <I>See <A HREF="../../../../../org/archive/io/warc/v10/ExperimentalWARCWriter.html" title="class in org.archive.io.warc.v10"><CODE>ExperimentalWARCWriter</CODE></A></I></TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/ExperimentalWARCWriterProcessor.html" title="class in org.archive.crawler.writer">ExperimentalWARCWriterProcessor</A></B></CODE><BR> Experimental WARCWriterProcessor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/Kw3WriterProcessor.html" title="class in org.archive.crawler.writer">Kw3WriterProcessor</A></B></CODE><BR> Processor module that writes the results of successful fetches to files on disk.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/MirrorWriterProcessor.html" title="class in org.archive.crawler.writer">MirrorWriterProcessor</A></B></CODE><BR> Processor module that writes the results of successful fetches to files on disk.</TD></TR></TABLE> <P><HR><!-- ======= START OF BOTTOM NAVBAR ====== --><A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" 
title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework"><FONT CLASS="NavBarFont1"><B>Class</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Use</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> PREV NEXT</FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../../index.html?org/archive/crawler/framework//class-useProcessor.html" target="_top"><B>FRAMES</B></A> <A HREF="Processor.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../../allclasses-noframe.html"><B>All 
Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR></TABLE><A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= --><HR>Copyright © 2003-2007 Internet Archive. All Rights Reserved.</BODY></HTML>