<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>9. Writing a Filter</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix developer documentation"><link rel="up" href="index.html" title="Heritrix developer documentation"><link rel="prev" href="ar01s08.html" title="8. Writing a Frontier"><link rel="next" href="ar01s10.html" title="10. Writing a Scope"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">9. Writing a Filter</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="ar01s08.html">Prev</a> </td><th align="center" width="60%"> </th><td align="right" width="20%"> <a accesskey="n" href="ar01s10.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="writefilter"></a>9. Writing a Filter</h2></div></div></div><p>Filters<sup>[<a href="#ftn.footnote_scope_problems">3</a>]</sup> are modules that take a CrawlURI and determine whether it matches the filter's criteria. A filter returns true if the URI matches and false otherwise.</p><p>A filter can be used in several places in the crawler. The most notable use is in the Scope. Filters are also used in processors. Filters applied to processors always filter URIs out: any URI matching a filter on a processor skips that processor. This is useful, for instance, to disable link extraction on documents coming from a specific section of a given website.</p><p>All filters should subclass the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/framework/Filter.html" target="_top">Filter</a> class. 
Creating a filter is just a matter of implementing the <a href="http://crawler.archive.org/apidocs/org/archive/crawler/framework/Filter.html#innerAccepts(java.lang.Object)" target="_top">innerAccepts(Object)</a> method. Because of the planned overhaul of the scopes and filters, we will not provide an extensive example of how to create a filter at this point. It should be fairly easy, though, to follow the directions in the Javadoc. For your filter to appear in the application interface, you need to edit <code class="filename">src/conf/modules/Filter.options</code>.</p></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%"><a accesskey="p" href="ar01s08.html">Prev</a> </td><td align="center" width="20%"> </td><td align="right" width="40%"> <a accesskey="n" href="ar01s10.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">8. Writing a Frontier </td><td align="center" width="20%"><a accesskey="h" href="index.html">Home</a></td><td valign="top" align="right" width="40%"> 10. Writing a Scope</td></tr></table></div></body></html>