
📄 processor.html

📁 Written in Java and left over from an experiment. I was going to delete it, but uploaded it here to share with everyone.
<TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/LinksScoper.html" title="class in org.archive.crawler.postprocessor">LinksScoper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Determine which extracted links are within scope.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/LowDiskPauseProcessor.html" title="class in org.archive.crawler.postprocessor">LowDiskPauseProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Processor module which uses 'df -k', where available and with the expected output format (on Linux), to monitor available  disk space and pause the crawl if free space on  monitored  filesystems falls below certain thresholds.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/SupplementaryLinksScoper.html" title="class in org.archive.crawler.postprocessor">SupplementaryLinksScoper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Run CandidateURI links carried in the passed CrawlURI through a filter and 'handle' rejections.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/postprocessor/TextWaitEvaluator.html" title="class in org.archive.crawler.postprocessor">TextWaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A specialized ContentBasedWaitEvaluator.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../../org/archive/crawler/postprocessor/WaitEvaluator.html" title="class in org.archive.crawler.postprocessor">WaitEvaluator</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A processor that determines when a URI should be revisited next.</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.prefetch"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/prefetch/package-summary.html">org.archive.crawler.prefetch</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/prefetch/package-summary.html">org.archive.crawler.prefetch</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/PreconditionEnforcer.html" title="class in org.archive.crawler.prefetch">PreconditionEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ensures the preconditions for a fetch -- such as DNS lookup  or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../../org/archive/crawler/prefetch/Preselector.html" title="class in org.archive.crawler.prefetch">Preselector</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/QuotaEnforcer.html" title="class in org.archive.crawler.prefetch">QuotaEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A simple quota enforcer.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/prefetch/RuntimeLimitEnforcer.html" title="class in org.archive.crawler.prefetch">RuntimeLimitEnforcer</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A processor to enforce runtime limits on crawls.</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.processor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/processor/package-summary.html">org.archive.crawler.processor</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A 
HREF="../../../../../org/archive/crawler/processor/package-summary.html">org.archive.crawler.processor</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/BeanShellProcessor.html" title="class in org.archive.crawler.processor">BeanShellProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A processor which runs a BeanShell script on the CrawlURI.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/CrawlMapper.html" title="class in org.archive.crawler.processor">CrawlMapper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/HashCrawlMapper.html" title="class in org.archive.crawler.processor">HashCrawlMapper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/processor/LexicalCrawlMapper.html" title="class in org.archive.crawler.processor">LexicalCrawlMapper</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A simple crawl splitter/mapper, 
dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).</TD></TR></TABLE>&nbsp;<P><A NAME="org.archive.crawler.writer"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2">Uses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/writer/package-summary.html">org.archive.crawler.writer</A></FONT></TH></TR></TABLE>&nbsp;<P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableSubHeadingColor"><TH ALIGN="left" COLSPAN="2">Subclasses of <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A> in <A HREF="../../../../../org/archive/crawler/writer/package-summary.html">org.archive.crawler.writer</A></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/ARCWriterProcessor.html" title="class in org.archive.crawler.writer">ARCWriterProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/ExperimentalWARCWriterProcessor.html" title="class in 
org.archive.crawler.writer">ExperimentalWARCWriterProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Experimental WARCWriterProcessor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;class</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../../org/archive/crawler/writer/MirrorWriterProcessor.html" title="class in org.archive.crawler.writer">MirrorWriterProcessor</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Processor module that writes the results of successful fetches to   files on disk.</TD></TR></TABLE>&nbsp;<P><HR><!-- ======= START OF BOTTOM NAVBAR ====== --><A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework"><FONT CLASS="NavBarFont1"><B>Class</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Use</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../../deprecated-list.html"><FONT 
CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;PREV&nbsp;&nbsp;NEXT</FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../../index.html?org/archive/crawler/framework//class-useProcessor.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="Processor.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR></TABLE><A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= --><HR>Copyright &copy; 2003-2006 Internet Archive. All Rights Reserved.</BODY></HTML>
