crawljobhandler.html

From "Web crawler open-source code" · HTML code · 1,532 lines total · page 1 of 5

<DT><B>Parameters:</B><DD><CODE>orderFile</CODE> - Order file to use as the template for the new job.<DD><CODE>name</CODE> - The name of the new job.<DD><CODE>description</CODE> - Description of the job.<DD><CODE>seeds</CODE> - The contents of the new settings' seed file.<DT><B>Returns:</B><DD>The new crawl job.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE> - If a problem occurs creating the             settings.</DL></DD></DL><HR><A NAME="checkDirectory(java.io.File)"><!-- --></A><H3>checkDirectory</H3><PRE>protected void <B>checkDirectory</B>(java.io.File&nbsp;dir)                       throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE></DL></DD></DL><HR><A NAME="createNewJob(java.io.File, java.lang.String, java.lang.String, java.lang.String, int)"><!-- --></A><H3>createNewJob</H3><PRE>protected <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>createNewJob</B>(java.io.File&nbsp;orderFile,                                java.lang.String&nbsp;name,                                java.lang.String&nbsp;description,                                java.lang.String&nbsp;seeds,                                int&nbsp;priority)                         throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in 
org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE></DL></DD></DL><HR><A NAME="newProfile(org.archive.crawler.admin.CrawlJob, java.lang.String, java.lang.String, java.lang.String)"><!-- --></A><H3>newProfile</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>newProfile</B>(<A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A>&nbsp;baseOn,                           java.lang.String&nbsp;name,                           java.lang.String&nbsp;description,                           java.lang.String&nbsp;seeds)                    throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A>,                           java.io.IOException</PRE><DL><DD>Creates a new profile. The new profile will be returned and also registered as the handler's 'new job'. 
The new profile will be based on the settings provided but created in a new location on disk.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseOn</CODE> - A CrawlJob (with a valid settings handler) to use as the            template for the new profile.<DD><CODE>name</CODE> - The name of the new profile.<DD><CODE>description</CODE> - Description of the new profile.<DD><CODE>seeds</CODE> - The contents of the new profile's seed file.<DT><B>Returns:</B><DD>The new profile.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE><DD><CODE>java.io.IOException</CODE></DL></DD></DL><HR><A NAME="createSettingsHandler(java.io.File, java.lang.String, java.lang.String, java.lang.String, java.io.File, org.archive.crawler.admin.CrawlJobErrorHandler, java.lang.String, java.lang.String)"><!-- --></A><H3>createSettingsHandler</H3><PRE>protected <A HREF="../../../../org/archive/crawler/settings/XMLSettingsHandler.html" title="class in org.archive.crawler.settings">XMLSettingsHandler</A> <B>createSettingsHandler</B>(java.io.File&nbsp;orderFile,                                                   java.lang.String&nbsp;name,                                                   java.lang.String&nbsp;description,                                                   java.lang.String&nbsp;seeds,                                                   java.io.File&nbsp;newSettingsDir,                                                   <A HREF="../../../../org/archive/crawler/admin/CrawlJobErrorHandler.html" title="class in org.archive.crawler.admin">CrawlJobErrorHandler</A>&nbsp;errorHandler,                                                   java.lang.String&nbsp;filename,                                                   java.lang.String&nbsp;seedfile)                                            throws <A 
HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD>Creates a new settings handler based on an existing job. All the settings files of the 'based on' job are copied to the specified directory.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>orderFile</CODE> - Order file to base new order file on.  Cannot be null.<DD><CODE>name</CODE> - Name for the new settings.<DD><CODE>description</CODE> - Description of the new settings.<DD><CODE>seeds</CODE> - The contents of the new settings' seed file.<DD><CODE>newSettingsDir</CODE> - <DD><CODE>errorHandler</CODE> - <DD><CODE>filename</CODE> - Name of new order file.<DD><CODE>seedfile</CODE> - Name of new seeds file.<DT><B>Returns:</B><DD>The new settings handler.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE> - If there are problems reading the 'based on'             configuration, or writing the new configuration or its             seed file.</DL></DD></DL><HR><A NAME="updateRecoveryPaths(java.io.File, org.archive.crawler.settings.SettingsHandler, java.lang.String)"><!-- --></A><H3>updateRecoveryPaths</H3><PRE>protected void <B>updateRecoveryPaths</B>(java.io.File&nbsp;recover,                                   <A HREF="../../../../org/archive/crawler/settings/SettingsHandler.html" title="class in org.archive.crawler.settings">SettingsHandler</A>&nbsp;sh,                                   java.lang.String&nbsp;jobName)                            throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in 
org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>recover</CODE> - Source to use when recovering. Can be the full path to a recovery log            or the full path to a checkpoint source directory.<DD><CODE>sh</CODE> - Settings Handler to update.<DD><CODE>jobName</CODE> - Name of this job.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE></DL></DD></DL><HR><A NAME="discardNewJob()"><!-- --></A><H3>discardNewJob</H3><PRE>public void <B>discardNewJob</B>()</PRE><DL><DD>Discard the handler's 'new job'. This will remove any files/directories written to disk.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="getNewJob()"><!-- --></A><H3>getNewJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>getNewJob</B>()</PRE><DL><DD>Get the handler's 'new job'.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>The handler's 'new job'.</DL></DD></DL><HR><A NAME="isRunning()"><!-- --></A><H3>isRunning</H3><PRE>public boolean <B>isRunning</B>()</PRE><DL><DD>Is the crawler accepting crawl jobs to run?<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if the next available CrawlJob will be crawled. False otherwise.</DL></DD></DL><HR><A NAME="isCrawling()"><!-- --></A><H3>isCrawling</H3><PRE>public boolean <B>isCrawling</B>()</PRE><DL><DD>Is a crawl job being crawled?<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if a job is actually being crawled (even if it is paused).         
False if no job is being crawled.</DL></DD></DL><HR><A NAME="startCrawler()"><!-- --></A><H3>startCrawler</H3><PRE>public void <B>startCrawler</B>()</PRE><DL><DD>Allow jobs to be crawled.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="stopCrawler()"><!-- --></A><H3>stopCrawler</H3><PRE>public void <B>stopCrawler</B>()</PRE><DL><DD>Stop future jobs from being crawled. This action will not affect the current job.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="startNextJob()"><!-- --></A><H3>startNextJob</H3><PRE>protected final void <B>startNextJob</B>()</PRE><DL><DD>Start the next crawl job. If a job is already running, this method does nothing.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="startNextJobInternal()"><!-- --></A><H3>startNextJobInternal</H3><PRE>protected void <B>startNextJobInternal</B>()</PRE><DL><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="kickUpdate()"><!-- --></A><H3>kickUpdate</H3><PRE>public void <B>kickUpdate</B>()</PRE><DL><DD>Forward a 'kick' update to the current job, if any.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="loadOptions(java.lang.String)"><!-- --></A><H3>loadOptions</H3><PRE>public static java.util.ArrayList&lt;java.lang.String&gt; <B>loadOptions</B>(java.lang.String&nbsp;file)                                                         throws java.io.IOException</PRE><DL><DD>Loads options from a file. Typically these are a list of available modules that can be plugged into some part of the configuration. For example: Processors, Frontiers, Filters, etc. Leading and trailing spaces are trimmed from each line.  
<p>Options are loaded from the CLASSPATH.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>file</CODE> - the name of the option file (without path!)<DT><B>Returns:</B><DD>The option file with each option line as a separate entry in the         ArrayList.<DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE> - when there is trouble reading the file.</DL></DD></DL><HR><A NAME="getInitialMarker(java.lang.String, boolean)"
