writerpoolprocessor.html
From "Open-source web crawler code" · HTML code · 1,138 lines in total · page 1 of 5
<DL><DD>Writes a CrawlURI and its associated data to store file. Currently this method understands the following uri types: dns, http, and https.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/Processor.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></CODE> in class <CODE><A HREF="../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI to process.</DL></DD></DL><HR><A NAME="checkBytesWritten()"><!-- --></A><H3>checkBytesWritten</H3><PRE>protected void <B>checkBytesWritten</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="shouldWrite(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>shouldWrite</H3><PRE>protected boolean <B>shouldWrite</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</PRE><DL><DD>Whether the given CrawlURI should be written to archive files. 
Annotates CrawlURI with a reason for any negative answer.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI<DT><B>Returns:</B><DD>true if URI should be written; false otherwise</DL></DD></DL><HR><A NAME="getHostAddress(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>getHostAddress</H3><PRE>protected java.lang.String <B>getHostAddress</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</PRE><DL><DD>Returns the IP address of the given URI in a form suitable for recording (as in a classic ARC 5-field header line).<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - CrawlURI<DT><B>Returns:</B><DD>String of IP address</DL></DD></DL><HR><A NAME="getAttributeUnchecked(java.lang.String)"><!-- --></A><H3>getAttributeUnchecked</H3><PRE>public java.lang.Object <B>getAttributeUnchecked</B>(java.lang.String name)</PRE><DL><DD>Version of getAttributes that catches and logs exceptions and returns null if the attribute cannot be fetched.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>name</CODE> - Attribute name.<DT><B>Returns:</B><DD>Attribute or null.</DL></DD></DL><HR><A NAME="getMaxSize()"><!-- --></A><H3>getMaxSize</H3><PRE>public long <B>getMaxSize</B>()</PRE><DL><DD>Max size we want files to be (bytes). Default is ARCConstants.DEFAULT_MAX_ARC_FILE_SIZE. 
Note that ARC files will usually be bigger than maxSize; they'll be maxSize + length to next boundary.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>ARC maximum size.</DL></DD></DL><HR><A NAME="getPrefix()"><!-- --></A><H3>getPrefix</H3><PRE>public java.lang.String <B>getPrefix</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getOutputDirs()"><!-- --></A><H3>getOutputDirs</H3><PRE>public java.util.List<java.io.File> <B>getOutputDirs</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="isCompressed()"><!-- --></A><H3>isCompressed</H3><PRE>public boolean <B>isCompressed</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getPoolMaximumActive()"><!-- --></A><H3>getPoolMaximumActive</H3><PRE>public int <B>getPoolMaximumActive</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Returns the poolMaximumActive.</DL></DD></DL><HR><A NAME="getPoolMaximumWait()"><!-- --></A><H3>getPoolMaximumWait</H3><PRE>public int <B>getPoolMaximumWait</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>Returns the poolMaximumWait.</DL></DD></DL><HR><A NAME="getSuffix()"><!-- --></A><H3>getSuffix</H3><PRE>public java.lang.String <B>getSuffix</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getMaxToWrite()"><!-- --></A><H3>getMaxToWrite</H3><PRE>public long <B>getMaxToWrite</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="crawlEnding(java.lang.String)"><!-- --></A><H3>crawlEnding</H3><PRE>public void <B>crawlEnding</B>(java.lang.String sExitMessage)</PRE><DL><DD><B>Description copied from interface: <CODE><A HREF="../../../../org/archive/crawler/event/CrawlStatusListener.html#crawlEnding(java.lang.String)">CrawlStatusListener</A></CODE></B></DD><DD>Called when a CrawlController is ending a crawl (for any reason)<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A 
HREF="../../../../org/archive/crawler/event/CrawlStatusListener.html#crawlEnding(java.lang.String)">crawlEnding</A></CODE> in interface <CODE><A HREF="../../../../org/archive/crawler/event/CrawlStatusListener.html" title="interface in org.archive.crawler.event">CrawlStatusListener</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>sExitMessage</CODE> - Type of exit. Should be one of the STATUS constants defined in CrawlJob.<DT><B>See Also:</B><DD><A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin"><CODE>CrawlJob</CODE></A></DL></DD></DL><HR><A NAME="crawlEnded(java.lang.String)"><!-- --></A><H3>crawlEnded</H3><PRE>public void <B>crawlEnded</B>(java.lang.String&n