<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>9.&nbsp;Writing a Filter</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix developer documentation"><link rel="up" href="index.html" title="Heritrix developer documentation"><link rel="prev" href="ar01s08.html" title="8.&nbsp;Writing a Frontier"><link rel="next" href="ar01s10.html" title="10.&nbsp;Writing a Scope"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">9.&nbsp;Writing a Filter</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ar01s08.html">Prev</a>&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="ar01s10.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="writefilter"></a>9.&nbsp;Writing a Filter</h2></div></div></div><p>Filters<sup>[<a href="#ftn.footnote_scope_problems">3</a>]</sup> are modules that take a CrawlURI and determine whether it matches the filter's criteria. A filter returns true if the URI matches and false otherwise.</p><p>Filters can be used in several places in the crawler, most notably in the Scope. They are also used in processors. Filters applied to processors always filter URIs out: any URI matching a filter on a processor effectively skips that processor. 
This can be useful, for instance, to disable link extraction on documents coming from a specific section of a given website.</p><p>All filters should subclass the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/framework/Filter.html" target="_top">Filter</a> class. Creating a filter is just a matter of implementing the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/framework/Filter.html#innerAccepts(java.lang.Object)" target="_top">innerAccepts(Object)</a> method. Because of the planned overhaul of the scopes and filters, we will not provide an extensive example of how to create a filter at this point. It should be fairly easy, though, to follow the directions in the javadoc. For your filter to show in the application interface, you'll need to edit <code class="filename">src/conf/modules/Filter.options</code>.</p></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ar01s08.html">Prev</a>&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="ar01s10.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">8.&nbsp;Writing a Frontier&nbsp;</td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%">&nbsp;10.&nbsp;Writing a Scope</td></tr></table></div></body></html>
