📄 crawljobhandler.html

📁 An open-source web crawler
<DD>Returns a List of all known profiles.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>a List of all known profiles.</DL></DD></DL><HR><A NAME="addJob(org.archive.crawler.admin.CrawlJob)"><!-- --></A><H3>addJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>addJob</B>(<A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A>&nbsp;job)</PRE><DL><DD>Submit a job to the handler. The job will be scheduled for crawling. At present, scheduling does not take the job's priority into consideration.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>job</CODE> - A new job for the handler.<DT><B>Returns:</B><DD>CrawlJob that was added or null.</DL></DD></DL><HR><A NAME="getDefaultProfile()"><!-- --></A><H3>getDefaultProfile</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>getDefaultProfile</B>()</PRE><DL><DD>Returns the default profile. If no default profile has been set, it returns the first profile that was set/loaded and still exists. If no profiles exist, it returns null.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>the default profile.</DL></DD></DL><HR><A NAME="setDefaultProfile(org.archive.crawler.admin.CrawlJob)"><!-- --></A><H3>setDefaultProfile</H3><PRE>public void <B>setDefaultProfile</B>(<A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A>&nbsp;profile)</PRE><DL><DD>Set the default profile.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>profile</CODE> - The new default profile. The following must apply to it.
profile.isProfile() should return true and this.getProfiles() should contain it.</DL></DD></DL><HR><A NAME="getPendingJobs()"><!-- --></A><H3>getPendingJobs</H3><PRE>public java.util.List <B>getPendingJobs</B>()</PRE><DL><DD>Returns a List of all pending jobs.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>A List of all pending jobs. No promises are made about the order of the list.</DL></DD></DL><HR><A NAME="getCurrentJob()"><!-- --></A><H3>getCurrentJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>getCurrentJob</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>The job currently being crawled.</DL></DD></DL><HR><A NAME="getCompletedJobs()"><!-- --></A><H3>getCompletedJobs</H3><PRE>public java.util.List <B>getCompletedJobs</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>A List of all finished jobs.</DL></DD></DL><HR><A NAME="getJob(java.lang.String)"><!-- --></A><H3>getJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>getJob</B>(java.lang.String&nbsp;jobUID)</PRE><DL><DD>Returns the job with the given UID. 
It doesn't matter whether the job is pending, currently running, finished, new, or a profile.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>jobUID</CODE> - The unique ID of the job.<DT><B>Returns:</B><DD>The job with the UID, or null if no such job is found.</DL></DD></DL><HR><A NAME="terminateCurrentJob()"><!-- --></A><H3>terminateCurrentJob</H3><PRE>public boolean <B>terminateCurrentJob</B>()</PRE><DL><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>True if we terminated a current job (False if there was no job to terminate).</DL></DD></DL><HR><A NAME="deleteJob(java.lang.String)"><!-- --></A><H3>deleteJob</H3><PRE>public void <B>deleteJob</B>(java.lang.String&nbsp;jobUID)</PRE><DL><DD>The specified job will be removed from the pending queue, or aborted if it is currently running. It will be placed in the list of completed jobs with appropriate status info. If the job is already in the completed list or no job with the given UID is found, no action will be taken.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>jobUID</CODE> - The UID (unique ID) of the job that is to be deleted.</DL></DD></DL><HR><A NAME="pauseJob()"><!-- --></A><H3>pauseJob</H3><PRE>public void <B>pauseJob</B>()</PRE><DL><DD>Cause the current job to pause. If no current job is crawling, this method will have no effect.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="resumeJob()"><!-- --></A><H3>resumeJob</H3><PRE>public void <B>resumeJob</B>()</PRE><DL><DD>Cause the current job to resume crawling if it was paused. Will have no effect if the current job was not paused or if there is no current job. If the current job is still waiting to pause, this will not take effect until the job has actually paused. 
At that point it will immediately resume crawling.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><HR><A NAME="checkpointJob()"><!-- --></A><H3>checkpointJob</H3><PRE>public void <B>checkpointJob</B>()                   throws java.lang.IllegalStateException</PRE><DL><DD>Cause the current job to write a checkpoint to disk. Currently requires the job to already be paused.<P><DD><DL></DL></DD><DD><DL><DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - Thrown if the crawl is not paused.</DL></DD></DL><HR><A NAME="getNextJobUID()"><!-- --></A><H3>getNextJobUID</H3><PRE>public java.lang.String <B>getNextJobUID</B>()</PRE><DL><DD>Returns a unique job ID. <p> No two calls to this method (on the same instance of this class) can ever return the same value. <br> Currently implemented to return a time stamp, though that is subject to change.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>A unique job ID.<DT><B>See Also:</B><DD><A HREF="../../../../org/archive/util/ArchiveUtils.html#TIMESTAMP17"><CODE>ArchiveUtils.TIMESTAMP17</CODE></A></DL></DD></DL><HR><A NAME="newJob(org.archive.crawler.admin.CrawlJob, java.lang.String, java.lang.String, java.lang.String, java.lang.String, int)"><!-- --></A><H3>newJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>newJob</B>(<A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A>&nbsp;baseOn,                       java.lang.String&nbsp;recovery,                       java.lang.String&nbsp;name,                       java.lang.String&nbsp;description,                       java.lang.String&nbsp;seeds,                       int&nbsp;priority)                throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD>Creates a new job. 
The new job will be returned and also registered as the handler's 'new job'. The new job will be based on the settings provided but created in a new location on disk.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>baseOn</CODE> - A CrawlJob (with a valid settings handler) to use as the template for the new job.<DD><CODE>recovery</CODE> - Whether to preinitialize the new job as a recovery of the <code>baseOn</code> job. The string holds RECOVER_LOG if the recovery is to be based on the recover.gz log -- see RecoveryJournal in the frontier package -- or it holds the name of the checkpoint to use when recovering.<DD><CODE>name</CODE> - The name of the new job.<DD><CODE>description</CODE> - Description of the job.<DD><CODE>seeds</CODE> - The contents of the new settings' seed file.<DD><CODE>priority</CODE> - The priority of the new job.<DT><B>Returns:</B><DD>The new crawl job.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE> - If a problem occurs creating the settings.</DL></DD></DL><HR><A NAME="newJob(java.io.File, java.lang.String, java.lang.String, java.lang.String)"><!-- --></A><H3>newJob</H3><PRE>public <A HREF="../../../../org/archive/crawler/admin/CrawlJob.html" title="class in org.archive.crawler.admin">CrawlJob</A> <B>newJob</B>(java.io.File&nbsp;orderFile,                       java.lang.String&nbsp;name,                       java.lang.String&nbsp;description,                       java.lang.String&nbsp;seeds)                throws <A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></PRE><DL><DD>Creates a new job. The new job will be returned and also registered as the handler's 'new job'. 
The new job will be based on the settings provided but created in a new location on disk.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>orderFile</CODE> - Order file to use as the template for the new job.<DD><CODE>name</CODE> - The name of the new job.<DD><CODE>description</CODE> - Description of the job.<DD><CODE>seeds</CODE> - The contents of the new settings' seed file.<DT><B>Returns:</B><DD>The new crawl job.<DT><B>Throws:</B><DD><CODE><A HREF="../../../../org/archive/crawler/framework/exceptions/FatalConfigurationException.html" title="class in org.archive.crawler.framework.exceptions">FatalConfigurationException</A></CODE> - If a problem occurs creating the settings.</DL></DD></DL><HR><A NAME="checkDirectory(java.io.File)"><!-- --></A><H3>checkDirectory</H3><PRE>