
📄 frontier.html

📁 Written in Java and left over from an experiment. I was going to delete it, but I am uploading it instead for everyone to share.
<a name="312" href="#312">312</a> <em>     */</em><a name="313" href="#313">313</a>     <strong>public</strong> <a href="../../../../org/archive/crawler/framework/FrontierMarker.html">FrontierMarker</a> getInitialMarker(String regexpr,<a name="314" href="#314">314</a>                                               <strong>boolean</strong> inCacheOnly);<a name="315" href="#315">315</a> <a name="316" href="#316">316</a>     <em>/**<em>*</em></em><a name="317" href="#317">317</a> <em>     * Returns a list of all uncrawled URIs starting from a specified marker</em><a name="318" href="#318">318</a> <em>     * until &lt;code>numberOfMatches&lt;/code> is reached.</em><a name="319" href="#319">319</a> <em>     *</em><a name="320" href="#320">320</a> <em>     * &lt;p>Any encountered URI that has not been successfully crawled, terminally</em><a name="321" href="#321">321</a> <em>     * failed, disregarded or is currently being processed is included. As</em><a name="322" href="#322">322</a> <em>     * there may be duplicates in the frontier, there may also be duplicates</em><a name="323" href="#323">323</a> <em>     * in the report. Thus this includes both discovered and pending URIs.</em><a name="324" href="#324">324</a> <em>     *</em><a name="325" href="#325">325</a> <em>     * &lt;p>The list is a set of strings containing the URI strings. If verbose is</em><a name="326" href="#326">326</a> <em>     * true the string will include some additional information (path to URI</em><a name="327" href="#327">327</a> <em>     * and parent).</em><a name="328" href="#328">328</a> <em>     *</em><a name="329" href="#329">329</a> <em>     * &lt;p>The &lt;code>URIFrontierMarker&lt;/code> will be advanced to the position at</em><a name="330" href="#330">330</a> <em>     * which its maximum number of matches found is reached. Reusing it for</em><a name="331" href="#331">331</a> <em>     * subsequent calls will thus effectively get the 'next' batch. Making</em><a name="332" href="#332">332</a> <em>     * any changes to the frontier can invalidate the marker.</em><a name="333" href="#333">333</a> <em>     *</em><a name="334" href="#334">334</a> <em>     * &lt;p>While the order returned is consistent, it does &lt;i>not&lt;/i> have any</em><a name="335" href="#335">335</a> <em>     * explicit relation to the likely order in which they may be processed.</em><a name="336" href="#336">336</a> <em>     *</em><a name="337" href="#337">337</a> <em>     * &lt;p>&lt;b>Warning:&lt;/b> It is unsafe to make changes to the frontier while</em><a name="338" href="#338">338</a> <em>     * this method is executing. The crawler should be in a paused state before</em><a name="339" href="#339">339</a> <em>     * invoking it.</em><a name="340" href="#340">340</a> <em>     *</em><a name="341" href="#341">341</a> <em>     * @param marker</em><a name="342" href="#342">342</a> <em>     *            A marker specifying from what position in the Frontier the</em><a name="343" href="#343">343</a> <em>     *            list should begin.</em><a name="344" href="#344">344</a> <em>     * @param numberOfMatches</em><a name="345" href="#345">345</a> <em>     *            how many URIs to add at most to the list before returning it</em><a name="346" href="#346">346</a> <em>     * @param verbose</em><a name="347" href="#347">347</a> <em>     *            if set to true the strings returned will contain additional</em><a name="348" href="#348">348</a> <em>     *            information about each URI beyond their names.</em><a name="349" href="#349">349</a> <em>     * @return a list of all pending URIs falling within the specification</em><a name="350" href="#350">350</a> <em>     *            of the marker</em><a name="351" href="#351">351</a> <em>     * @throws InvalidFrontierMarkerException when the</em><a name="352" href="#352">352</a> <em>     *            &lt;code>URIFrontierMarker&lt;/code> does not match the internal</em><a name="353" 
href="#353">353</a> <em>     *            state of the frontier. Tolerance for this can vary</em><a name="354" href="#354">354</a> <em>     *            considerably from one URIFrontier implementation to the next.</em><a name="355" href="#355">355</a> <em>     * @see FrontierMarker</em><a name="356" href="#356">356</a> <em>     * @see #getInitialMarker(String, boolean)</em><a name="357" href="#357">357</a> <em>     */</em><a name="358" href="#358">358</a>     <strong>public</strong> ArrayList getURIsList(<a href="../../../../org/archive/crawler/framework/FrontierMarker.html">FrontierMarker</a> marker,<a name="359" href="#359">359</a>                                  <strong>int</strong> numberOfMatches,<a name="360" href="#360">360</a>                                  <strong>boolean</strong> verbose)<a name="361" href="#361">361</a>                              throws InvalidFrontierMarkerException;<a name="362" href="#362">362</a> <a name="363" href="#363">363</a>     <em>/**<em>*</em></em><a name="364" href="#364">364</a> <em>     * Delete any URI that matches the given regular expression from the list</em><a name="365" href="#365">365</a> <em>     * of discovered and pending URIs. This does not prevent them from being</em><a name="366" href="#366">366</a> <em>     * rediscovered.</em><a name="367" href="#367">367</a> <em>     *</em><a name="368" href="#368">368</a> <em>     * &lt;p>Any encountered URI that has not been successfully crawled, terminally</em><a name="369" href="#369">369</a> <em>     * failed, disregarded or is currently being processed is considered to be</em><a name="370" href="#370">370</a> <em>     * a pending URI.</em><a name="371" href="#371">371</a> <em>     *</em><a name="372" href="#372">372</a> <em>     * &lt;p>&lt;b>Warning:&lt;/b> It is unsafe to make changes to the frontier while</em><a name="373" href="#373">373</a> <em>     * this method is executing. 
The crawler should be in a paused state before</em><a name="374" href="#374">374</a> <em>     * invoking it.</em><a name="375" href="#375">375</a> <em>     *</em><a name="376" href="#376">376</a> <em>     * @param match A regular expression; any URI that matches it will be</em><a name="377" href="#377">377</a> <em>     *              deleted.</em><a name="378" href="#378">378</a> <em>     * @return The number of URIs deleted</em><a name="379" href="#379">379</a> <em>     */</em><a name="380" href="#380">380</a>     <strong>public</strong> <strong>long</strong> deleteURIs(String match);<a name="381" href="#381">381</a> <a name="382" href="#382">382</a>     <em>/**<em>*</em></em><a name="383" href="#383">383</a> <em>     * Notify Frontier that a CrawlURI has been deleted outside of the</em><a name="384" href="#384">384</a> <em>     * normal next()/finished() lifecycle. </em><a name="385" href="#385">385</a> <em>     * </em><a name="386" href="#386">386</a> <em>     * @param curi Deleted CrawlURI.</em><a name="387" href="#387">387</a> <em>     */</em><a name="388" href="#388">388</a>     <strong>public</strong> <strong>void</strong> deleted(<a href="../../../../org/archive/crawler/datamodel/CrawlURI.html">CrawlURI</a> curi);<a name="389" href="#389">389</a> <a name="390" href="#390">390</a>     <em>/**<em>*</em></em><a name="391" href="#391">391</a> <em>     * Notify Frontier that it should consider the given UURI as if</em><a name="392" href="#392">392</a> <em>     * already scheduled.</em><a name="393" href="#393">393</a> <em>     * </em><a name="394" href="#394">394</a> <em>     * @param u UURI instance to add to the Already Included set.</em><a name="395" href="#395">395</a> <em>     */</em><a name="396" href="#396">396</a>     <strong>public</strong> <strong>void</strong> considerIncluded(<a href="../../../../org/archive/net/UURI.html">UURI</a> u);<a name="397" href="#397">397</a> <a name="398" href="#398">398</a>     <em>/**<em>*</em></em><a name="399" href="#399">399</a> <em>     * Notify Frontier that it should consider updating configuration</em><a name="400" href="#400">400</a> <em>     * info that may have changed in external files.</em><a name="401" href="#401">401</a> <em>     */</em><a name="402" href="#402">402</a>     <strong>public</strong> <strong>void</strong> kickUpdate();<a name="403" href="#403">403</a> <a name="404" href="#404">404</a>     <em>/**<em>*</em></em><a name="405" href="#405">405</a> <em>     * Notify Frontier that it should not release any URIs, instead</em><a name="406" href="#406">406</a> <em>     * holding all threads, until instructed otherwise. </em><a name="407" href="#407">407</a> <em>     */</em><a name="408" href="#408">408</a>     <strong>public</strong> <strong>void</strong> pause();<a name="409" href="#409">409</a> <a name="410" href="#410">410</a>     <em>/**<em>*</em></em><a name="411" href="#411">411</a> <em>     * Resumes the release of URIs to crawl, allowing worker</em><a name="412" href="#412">412</a> <em>     * ToeThreads to proceed. </em><a name="413" href="#413">413</a> <em>     */</em><a name="414" href="#414">414</a>     <strong>public</strong> <strong>void</strong> unpause();<a name="415" href="#415">415</a> <a name="416" href="#416">416</a>     <em>/**<em>*</em></em><a name="417" href="#417">417</a> <em>     * Notify Frontier that it should end the crawl, giving</em><a name="418" href="#418">418</a> <em>     * any worker ToeThread that asks for a next() an </em><a name="419" href="#419">419</a> <em>     * EndedException. </em><a name="420" href="#420">420</a> <em>     */</em><a name="421" href="#421">421</a>     <strong>public</strong> <strong>void</strong> terminate();<a name="422" href="#422">422</a>     <a name="423" href="#423">423</a>     <em>/**<em>*</em></em><a name="424" href="#424">424</a> <em>     * @return Return the instance of {@link FrontierJournal} that</em><a name="425" href="#425">425</a> <em>     * this Frontier is using.  
May be null if no journaling.</em><a name="426" href="#426">426</a> <em>     */</em><a name="427" href="#427">427</a>     <strong>public</strong> <a href="../../../../org/archive/crawler/frontier/FrontierJournal.html">FrontierJournal</a> getFrontierJournal();<a name="428" href="#428">428</a>     <a name="429" href="#429">429</a>     <em>/**<em>*</em></em><a name="430" href="#430">430</a> <em>     * @param cauri CandidateURI for which we're to calculate and</em><a name="431" href="#431">431</a> <em>     * set class key.</em><a name="432" href="#432">432</a> <em>     * @return Classkey for &lt;code>cauri&lt;/code>.</em><a name="433" href="#433">433</a> <em>     */</em><a name="434" href="#434">434</a>     <strong>public</strong> String getClassKey(<a href="../../../../org/archive/crawler/datamodel/CandidateURI.html">CandidateURI</a> cauri);<a name="435" href="#435">435</a> <a name="436" href="#436">436</a>     <em>/**<em>*</em></em><a name="437" href="#437">437</a> <em>     * Request that the Frontier load (or reload) crawl seeds, </em><a name="438" href="#438">438</a> <em>     * typically by contacting the Scope. </em><a name="439" href="#439">439</a> <em>     */</em><a name="440" href="#440">440</a>     <strong>public</strong> <strong>void</strong> loadSeeds();<a name="441" href="#441">441</a> <a name="442" href="#442">442</a>     <em>/**<em>*</em></em><a name="443" href="#443">443</a> <em>     * Request that Frontier allow crawling to begin. Usually</em><a name="444" href="#444">444</a> <em>     * just unpauses Frontier, if paused. </em><a name="445" href="#445">445</a> <em>     */</em><a name="446" href="#446">446</a>     <strong>public</strong> <strong>void</strong> start();<a name="447" href="#447">447</a> <a name="448" href="#448">448</a>     <em>/**<em>*</em></em><a name="449" href="#449">449</a> <em>     * Get the 'frontier group' (usually queue) for the given </em><a name="450" href="#450">450</a> <em>     * CrawlURI. 
</em><a name="451" href="#451">451</a> <em>     * @param curi CrawlURI to find matching group</em><a name="452" href="#452">452</a> <em>     * @return FrontierGroup for the CrawlURI</em><a name="453" href="#453">453</a> <em>     */</em><a name="454" href="#454">454</a>     <strong>public</strong> FrontierGroup getGroup(<a href="../../../../org/archive/crawler/datamodel/CrawlURI.html">CrawlURI</a> curi);<a name="455" href="#455">455</a>     <a name="456" href="#456">456</a>     <em>/**<em>*</em></em><a name="457" href="#457">457</a> <em>     * Generic interface representing the internal groupings </em><a name="458" href="#458">458</a> <em>     * of a Frontier's URIs -- usually queues. Currently only </em><a name="459" href="#459">459</a> <em>     * offers the HasCrawlSubstats interface. </em><a name="460" href="#460">460</a> <em>     */</em><a name="461" href="#461">461</a>     <strong>public</strong> <strong>interface</strong> FrontierGroup <strong>extends</strong> CrawlSubstats.HasCrawlSubstats {<a name="462" href="#462">462</a> <a name="463" href="#463">463</a>     }<a name="464" href="#464">464</a> }</pre><hr/><div id="footer">This page was automatically generated by <a href="http://maven.apache.org/">Maven</a></div></body></html>
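
The Javadoc for getInitialMarker and getURIsList above describes a cursor-style listing: a marker is created for a regular expression, each call returns at most numberOfMatches URIs and advances the marker, and changing the frontier can invalidate the marker. The following is only a minimal sketch of that calling pattern. ToyFrontier, Marker and listAll are hypothetical stand-ins invented for illustration, not the Heritrix classes; the sketch also assumes full-string regex matching and omits the inCacheOnly and verbose flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-ins for illustration only; not the Heritrix API. */
class FrontierPagingSketch {

    /** Remembers the regex and the index of the next unread pending entry. */
    static final class Marker {
        final Pattern pattern;
        int nextIndex = 0;

        Marker(String regexpr) {
            this.pattern = Pattern.compile(regexpr);
        }
    }

    /** Toy in-memory frontier that only holds pending URI strings. */
    static final class ToyFrontier {
        private final List<String> pending = new ArrayList<>();

        void schedule(String uri) {
            pending.add(uri);
        }

        Marker getInitialMarker(String regexpr) {
            return new Marker(regexpr);
        }

        /** Returns up to numberOfMatches matching URIs and advances the marker. */
        List<String> getURIsList(Marker marker, int numberOfMatches) {
            List<String> batch = new ArrayList<>();
            while (marker.nextIndex < pending.size() && batch.size() < numberOfMatches) {
                String uri = pending.get(marker.nextIndex++);
                if (marker.pattern.matcher(uri).matches()) {
                    batch.add(uri);
                }
            }
            return batch;
        }

        /** Deletes every pending URI that fully matches the regex; returns the count. */
        long deleteURIs(String match) {
            Pattern p = Pattern.compile(match);
            int before = pending.size();
            pending.removeIf(u -> p.matcher(u).matches());
            return before - pending.size();
        }
    }

    /** Pages through the frontier, reusing one marker, until a batch comes back empty. */
    static List<String> listAll(ToyFrontier frontier, String regexpr, int batchSize) {
        List<String> all = new ArrayList<>();
        Marker marker = frontier.getInitialMarker(regexpr);
        List<String> batch;
        while (!(batch = frontier.getURIsList(marker, batchSize)).isEmpty()) {
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        ToyFrontier frontier = new ToyFrontier();
        frontier.schedule("http://example.org/a.html");
        frontier.schedule("http://example.org/logo.png");
        frontier.schedule("http://example.org/b.html");
        frontier.schedule("http://example.org/c.html");

        System.out.println(listAll(frontier, ".*\\.html", 2));
        long removed = frontier.deleteURIs(".*\\.png");
        System.out.println("deleted " + removed);
        // listAll takes a fresh marker, because the deletion may have invalidated older ones.
        System.out.println(listAll(frontier, ".*", 2));
    }
}
```

In this sketch an empty batch signals the end of the listing. A real frontier implementation may signal exhaustion differently, and per the warning in the Javadoc the crawler should be paused before listing or deleting.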
