<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>7. Release 1.6.0 - 12/01/2005</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix Release Notes"><link rel="up" href="index.html" title="Heritrix Release Notes"><link rel="prev" href="1_8_0.html" title="6. Release 1.8.0 - 05/05/2006"><link rel="next" href="1_4_0.html" title="8. Release 1.4.0 - 04/28/2005"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">7. Release 1.6.0 - 12/01/2005</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="1_8_0.html">Prev</a> </td><th align="center" width="60%"> </th><td align="right" width="20%"> <a accesskey="n" href="1_4_0.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="1_6_0"></a>7. Release 1.6.0 - 12/01/2005</h2></div></div></div><div class="abstract"><p class="title"><b>Abstract</b></p><p>Release 1.6.0 offers improved remote control and monitoring via JMX, a crawl-checkpointing facility, and experimental support for three features: Bloom filter already-included testing, partitioning a crawl across multiple independent crawlers, and per-host/domain/queue-grouping collection quotas. Performance and stability in large crawls are also improved. Among tracked issues, it includes 39 requested enhancements and fixes 96 reported bugs.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="1_6_0_limitations"></a>7.1. Known Limitations/Issues</h3></div></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="bdb_nfs"></a>7.1.1. 
java.io.IOException: No locks available</h4></div></div></div><p>BDB will complain 'No locks available' when the crawler is built or run on an NFS mount. The workaround is to avoid building or running on an NFS-mounted volume.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="bdb_64bit"></a>7.1.2. OutOfMemoryError in 64bit JVMs</h4></div></div></div><p>BDB 2.0.90 can overgrow its intended cache size because it misestimates instance sizes under 64bit Java VMs, which may be a major contributor to early Heritrix OutOfMemoryError problems on 64bit systems. A workaround is to cut the assigned percentage by 1/3 to 1/2. For example, change the 'bdb-cache-percent' setting to '40' or '30' (instead of the default of 60%, which applies when no value is set here).</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="1_6_0_changes"></a>7.2. Changes</h3></div></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="postselector"></a>7.2.1. Postselector</h4></div></div></div><p>The Postselector has been refactored out of existence. Its responsibilities have been parcelled out to two new Processors: LinksScoper and FrontierScheduler. LinksScoper is responsible for scope checking of extracted links. FrontierScheduler schedules URIs with the Frontier.</p><p>This change was made so that processors can be introduced between the scope-checking and Frontier-scheduling steps.</p><p>Because of this change, order files from Heritrix 1.4.0 or earlier must be updated before they can be used with Heritrix 1.6.0: replace Postselector references with LinksScoper and FrontierScheduler references. (Referencing a non-existent Postselector in an order file usually shows up as a -50 fetch status in crawl.log.)</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="wui_console"></a>7.2.2. 
Web Console</h4></div></div></div><p>The layout and terminology of the web Console and header have been changed, and new readouts have been added. Most notably, "Crawler Status" and "Job Status" information has been moved to separate boxes, with the controls for each at the top of its box, near the current status information. Also, the "Crawling"/"Stopped" distinction in the crawler -- whether available pending jobs are started when possible -- has been renamed "Crawling Jobs"/"Holding Jobs" for clarity.</p></div><p> <div class="table"><a name="N10A5B"></a><p class="title"><b>Table 5. All Tracked Changes</b></p><table summary="All Tracked Changes" border="1"><colgroup><col><col><col><col><col><col></colgroup><thead><tr><th>ID</th><th>Type</th><th>Summary</th><th>Open Date</th><th>By</th><th>Filer</th></tr></thead><tbody><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=806831&group_id=73833&atid=539102" target="_top">806831</a> </td><td>Add</td><td>XMLExtractor (XML/RSS)</td><td>2003-09-15</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=983051&group_id=73833&atid=539102" target="_top">983051</a> </td><td>Add</td><td>annotate what robots.txt would have precluded</td><td>2004-06-30</td><td>karl-ia</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1069331&group_id=73833&atid=539102" target="_top">1069331</a> </td><td>Add</td><td>hold paused crawl at 'end', allowing all in-progress ops</td><td>2004-11-19</td><td>karl-ia</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1081774&group_id=73833&atid=539102" target="_top">1081774</a> </td><td>Add</td><td>need way to delete overrides</td><td>2004-12-08</td><td>karl-ia</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1104696&group_id=73833&atid=539102" target="_top">1104696</a> 
</td><td>Add</td><td>Confusion: CrawlController and CrawlJob States</td><td>2005-01-18</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1108006&group_id=73833&atid=539102" target="_top">1108006</a> </td><td>Add</td><td>alerts should show current processor</td><td>2005-01-23</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1108520&group_id=73833&atid=539102" target="_top">1108520</a> </td><td>Add</td><td>SURT needs facelift</td><td>2005-01-24</td><td>gojomo</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1119616&group_id=73833&atid=539102" target="_top">1119616</a> </td><td>Add</td><td>Decompose Postselector to Scoping and Scheduling components</td><td>2005-02-09</td><td>stack-sf</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1122692&group_id=73833&atid=539102" target="_top">1122692</a> </td><td>Add</td><td>[contribution] New fixed number of queues policy</td><td>2005-02-14</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1173597&group_id=73833&atid=539102" target="_top">1173597</a> </td><td>Add</td><td>jmx api additions</td><td>2005-03-30</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1176934&group_id=73833&atid=539102" target="_top">1176934</a> </td><td>Add</td><td>[contrib] Generalize/Refactor BDB Frontier</td><td>2005-04-05</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1180630&group_id=73833&atid=539102" target="_top">1180630</a> </td><td>Add</td><td>[contrib] UI stacktrace dump (Depends on JDK150)</td><td>2005-04-11</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a 
href="http://sourceforge.net/tracker/index.php?func=detail&aid=1183376&group_id=73833&atid=539102" target="_top">1183376</a> </td><td>Add</td><td>Post 1.4 Deprecate filter scope and remove post 1.6.</td><td>2005-04-14</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1190974&group_id=73833&atid=539102" target="_top">1190974</a> </td><td>Add</td><td>Quick resume without real recovery / Checkpointing</td><td>2005-04-27</td><td>karl-ia</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1196602&group_id=73833&atid=539102" target="_top">1196602</a> </td><td>Add</td><td>[contrib] Show estimated remaining time</td><td>2005-05-06</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1200205&group_id=73833&atid=539102" target="_top">1200205</a> </td><td>Add</td><td>add 'exhausted' queue count to frontier report</td><td>2005-05-11</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1204644&group_id=73833&atid=539102" target="_top">1204644</a> </td><td>Add</td><td>add 'memory used' to progress-statistics.log</td><td>2005-05-18</td><td>kristinn_sig</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1205583&group_id=73833&atid=539102" target="_top">1205583</a> </td><td>Add</td><td>add CandidateURI parameter to UriUniqFilter.forget()</td><td>2005-05-20</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1207866&group_id=73833&atid=539102" target="_top">1207866</a> </td><td>Add</td><td>[contrib] ThreadLocal-version of TextUtil.getMatcher</td><td>2005-05-24</td><td>gojomo</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1207898&group_id=73833&atid=539102" 
target="_top">1207898</a> </td><td>Add</td><td>[contrib] WorkQueueFrontier: Store allQueues in RAM if poss.</td><td>2005-05-24</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208293&group_id=73833&atid=539102" target="_top">1208293</a> </td><td>Add</td><td>List based URIRegExprFilter</td><td>2005-05-25</td><td>kristinn_sig</td><td>kristinn_sig</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208510&group_id=73833&atid=539102" target="_top">1208510</a> </td><td>Add</td><td>[rfe-contrib] Add Stacktrace dump to ToeThread.report()</td><td>2005-05-25</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208747&group_id=73833&atid=539102" target="_top">1208747</a> </td><td>Add</td><td>CrawlURI serialization bloated; should be slimmed</td><td>2005-05-25</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208757&group_id=73833&atid=539102" target="_top">1208757</a> </td><td>Add</td><td>Cookies are thread traffic jam and memory hog</td><td>2005-05-25</td><td>gojomo</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208770&group_id=73833&atid=539102" target="_top">1208770</a> </td><td>Add</td><td>garbage hot spot: SerialBinding & FastOutputStream.bump()</td><td>2005-05-25</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1211217&group_id=73833&atid=539102" target="_top">1211217</a> </td><td>Add</td><td>[contrib] Add debugging aid for BDB