crawljobhandler.html

An open-source web crawler

Page 1 of 5
checkDirectory

protected void checkDirectory(java.io.File dir)
                       throws FatalConfigurationException

    Throws:
        FatalConfigurationException

createNewJob

protected CrawlJob createNewJob(java.io.File orderFile,
                                java.lang.String name,
                                java.lang.String description,
                                java.lang.String seeds,
                                int priority)
                         throws FatalConfigurationException

    Throws:
        FatalConfigurationException

newProfile

public CrawlJob newProfile(CrawlJob baseOn,
                           java.lang.String name,
                           java.lang.String description,
                           java.lang.String seeds)
                    throws FatalConfigurationException,
                           java.io.IOException

    Creates a new profile. The new profile will be returned and also
    registered as the handler's 'new job'. The new profile will be based on
    the settings provided but created in a new location on disk.

    Parameters:
        baseOn - A CrawlJob (with a valid settings handler) to use as the
                 template for the new profile.
        name - The name of the new profile.
        description - Description of the new profile.
        seeds - The contents of the new profile's seed file.
    Returns:
        The new profile.
    Throws:
        FatalConfigurationException
        java.io.IOException
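A minimal usage sketch for newProfile follows. It assumes a CrawlJobHandler and a base profile CrawlJob are already available (how they are obtained depends on how the crawler is wired up); the class name and the name/description/seed literals are illustrative only.

import java.io.IOException;

import org.archive.crawler.admin.CrawlJob;
import org.archive.crawler.admin.CrawlJobHandler;
import org.archive.crawler.framework.exceptions.FatalConfigurationException;

public class NewProfileSketch {
    /**
     * Derives a new profile from an existing one. Both arguments are assumed
     * to exist already; the string literals below are placeholders.
     */
    static CrawlJob createProfile(CrawlJobHandler handler, CrawlJob baseProfile)
            throws FatalConfigurationException, IOException {
        // The returned profile is also registered as the handler's 'new job'
        // and is written to a new location on disk.
        return handler.newProfile(
                baseProfile,
                "example-profile",                       // name
                "Profile derived from an existing one",  // description
                "http://example.com/\n");                // seed file contents
    }
}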
createSettingsHandler

protected XMLSettingsHandler createSettingsHandler(java.io.File orderFile,
                                                   java.lang.String name,
                                                   java.lang.String description,
                                                   java.lang.String seeds,
                                                   java.io.File newSettingsDir,
                                                   CrawlJobErrorHandler errorHandler,
                                                   java.lang.String filename,
                                                   java.lang.String seedfile)
                                            throws FatalConfigurationException

    Creates a new settings handler based on an existing job. Essentially, all
    the settings files of the job it is based on are copied to the specified
    directory.

    Parameters:
        orderFile - Order file to base the new order file on. Cannot be null.
        name - Name for the new settings.
        description - Description of the new settings.
        seeds - The contents of the new settings' seed file.
        newSettingsDir
        errorHandler
        filename - Name of the new order file.
        seedfile - Name of the new seeds file.
    Returns:
        The new settings handler.
    Throws:
        FatalConfigurationException - If there are problems reading the
            'based on' configuration, or writing the new configuration or its
            seed file.
updateRecoveryPaths

protected void updateRecoveryPaths(java.io.File recover,
                                   SettingsHandler sh,
                                   java.lang.String jobName)
                            throws FatalConfigurationException

    Parameters:
        recover - Source to use when recovering. Can be a full path to a
                  recovery log or a full path to a checkpoint source directory.
        sh - Settings handler to update.
        jobName - Name of this job.
    Throws:
        FatalConfigurationException

discardNewJob

public void discardNewJob()

    Discard the handler's 'new job'. This will remove any files/directories
    written to disk.

getNewJob

public CrawlJob getNewJob()

    Get the handler's 'new job'.

    Returns:
        the handler's 'new job'

isRunning

public boolean isRunning()

    Is the crawler accepting crawl jobs to run?

    Returns:
        True if the next available CrawlJob will be crawled. False otherwise.

isCrawling

public boolean isCrawling()

    Is a crawl job being crawled?

    Returns:
        True if a job is actually being crawled (even if it is paused).
        False if no job is being crawled.

startCrawler

public void startCrawler()

    Allow jobs to be crawled.

stopCrawler

public void stopCrawler()

    Stop future jobs from being crawled. This action will not affect the
    current job.
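A brief sketch of how the status and start/stop methods above fit together. The handler instance is assumed to already exist, and the surrounding class is purely illustrative.

import org.archive.crawler.admin.CrawlJobHandler;

public class CrawlerLifecycleSketch {
    // Stops new jobs from starting, without touching whatever is crawling now.
    static void drain(CrawlJobHandler handler) {
        if (handler.isRunning()) {
            // Future jobs will no longer be started; the current job,
            // if any, keeps going.
            handler.stopCrawler();
        }
        if (handler.isCrawling()) {
            System.out.println("A job is still being crawled (possibly paused).");
        }
    }

    // Re-enables crawling of queued jobs.
    static void resume(CrawlJobHandler handler) {
        if (!handler.isRunning()) {
            handler.startCrawler();
        }
    }
}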
startNextJob

protected final void startNextJob()

    Start the next crawl job. If a job is already running this method will do
    nothing.

startNextJobInternal

protected void startNextJobInternal()

kickUpdate

public void kickUpdate()

    Forward a 'kick' update to the current job, if any.

loadOptions

public static java.util.ArrayList loadOptions(java.lang.String file)
                                       throws java.io.IOException

    Loads options from a file. Typically these are a list of available modules
    that can be plugged into some part of the configuration, for example
    Processors, Frontiers, Filters etc. Leading and trailing spaces are trimmed
    from each line.

    Options are loaded from the CLASSPATH.

    Parameters:
        file - the name of the option file (without path!)
    Returns:
        The option file with each option line as a separate entry in the
        ArrayList.
    Throws:
        java.io.IOException - when there is trouble reading the file.

getInitialMarker

public FrontierMarker getInitialMarker(java.lang.String regexpr,
                                       boolean inCacheOnly)

    Returns a URIFrontierMarker for the current, paused, job. If there is no
    current job or it is not paused, null will be returned.

    Parameters:
        regexpr - A regular expression that each URI must match in order to be
                  considered 'within' the marker.
        inCacheOnly - Limit marker scope to 'cached' URIs.
    Returns:
        a URIFrontierMarker for the current job.
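A short sketch of loadOptions usage. The option-file name below is an assumed placeholder; pass whatever *.options file is actually available on the CLASSPATH in your deployment.

import java.io.IOException;
import java.util.ArrayList;

import org.archive.crawler.admin.CrawlJobHandler;

public class LoadOptionsSketch {
    public static void main(String[] args) throws IOException {
        // "Processor.options" is an illustrative file name; the file must be
        // resolvable on the CLASSPATH by name only (no path).
        ArrayList options = CrawlJobHandler.loadOptions("Processor.options");
        for (Object option : options) {
            System.out.println(option);  // one trimmed option per line
        }
    }
}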
