<TD><CODE><B>ProcessorChain.</B><B><A HREF="../../../../../org/archive/crawler/framework/ProcessorChain.html#getFirstProcessor()">getFirstProcessor</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the first processor in the chain.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></CODE></FONT></TD><TD><CODE><B>ProcessorChain.</B><B><A HREF="../../../../../org/archive/crawler/framework/ProcessorChain.html#getProcessor(java.lang.Class)">getProcessor</A></B>(java.lang.Class&nbsp;classType)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the first processor that is of class <code>classType</code> or a subclass of it.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></CODE></FONT></TD><TD><CODE><B>Processor.</B><B><A HREF="../../../../../org/archive/crawler/framework/Processor.html#spawn(int)">spawn</A></B>(int&nbsp;serialNum)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Methods in <A HREF="../../../../../org/archive/crawler/framework/package-summary.html">org.archive.crawler.framework</A> with parameters of type <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT 
SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B>Processor.</B><B><A HREF="../../../../../org/archive/crawler/framework/Processor.html#setDefaultNextProcessor(org.archive.crawler.framework.Processor)">setDefaultNextProcessor</A></B>(<A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A>&nbsp;nextProcessor)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set the default next processor in the chain.</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.postprocessor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/postprocessor/package-summary.html">org.archive.crawler.postprocessor</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/postprocessor/package-summary.html">org.archive.crawler.postprocessor</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/ContentBasedWaitEvaluator.html" title="class in org.archive.crawler.postprocessor">ContentBasedWaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A WaitEvaluator that compares the CrawlURI's content type to a configurable regular expression.</TD></TR><TR 
BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/CrawlStateUpdater.html" title="class in org.archive.crawler.postprocessor">CrawlStateUpdater</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/FrontierScheduler.html" title="class in org.archive.crawler.postprocessor">FrontierScheduler</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Schedules the CandidateURIs carried by the passed CrawlURI with the Frontier.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/ImageWaitEvaluator.html" title="class in org.archive.crawler.postprocessor">ImageWaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A specialized ContentBasedWaitEvaluator preset to match image content types.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/LinksScoper.html" title="class in org.archive.crawler.postprocessor">LinksScoper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Determine which extracted links are within scope.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../../org/archive/crawler/postprocessor/LowDiskPauseProcessor.html" title="class in org.archive.crawler.postprocessor">LowDiskPauseProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Processor module that uses 'df -k' (where available and producing the expected output format, as on Linux) to monitor available disk space, and pauses the crawl if free space on the monitored filesystems falls below configured thresholds.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/SupplementaryLinksScoper.html" title="class in org.archive.crawler.postprocessor">SupplementaryLinksScoper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Run CandidateURI links carried in the passed CrawlURI through a filter and 'handle' rejections.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/TextWaitEvaluator.html" title="class in org.archive.crawler.postprocessor">TextWaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A specialized ContentBasedWaitEvaluator preset to match text content types.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/WaitEvaluator.html" title="class in org.archive.crawler.postprocessor">WaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A processor that determines when a URI should be revisited next.</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.prefetch"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR 
BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/prefetch/package-summary.html">org.archive.crawler.prefetch</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/prefetch/package-summary.html">org.archive.crawler.prefetch</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/PreconditionEnforcer.html" title="class in org.archive.crawler.prefetch">PreconditionEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ensures that the preconditions for a fetch (such as a DNS lookup, or acquiring and respecting a robots.txt policy) are satisfied before a URI is passed to subsequent stages.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/Preselector.html" title="class in org.archive.crawler.prefetch">Preselector</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If configured to recheck the crawl's scope, decides whether a CrawlURI should be processed at all.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../../org/archive/crawler/prefetch/QuotaEnforcer.html" title="class in org.archive.crawler.prefetch">QuotaEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A simple quota enforcer that blocks processing of a CrawlURI whose host, server, or frontier group is already over its quotas.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/RuntimeLimitEnforcer.html" title="class in org.archive.crawler.prefetch">RuntimeLimitEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A processor to enforce runtime limits on crawls.</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.processor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/processor/package-summary.html">org.archive.crawler.processor</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/processor/package-summary.html">org.archive.crawler.processor</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/BeanShellProcessor.html" title="class in org.archive.crawler.processor">BeanShellProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A 
processor which runs a BeanShell script on the CrawlURI.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/CrawlMapper.html" title="class in org.archive.crawler.processor">CrawlMapper</A></B></CODE><BR>
