<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>7.&nbsp;Release 1.6.0 - 12/01/2005</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix Release Notes"><link rel="up" href="index.html" title="Heritrix Release Notes"><link rel="prev" href="1_8_0.html" title="6.&nbsp;Release 1.8.0 - 05/05/2006"><link rel="next" href="1_4_0.html" title="8.&nbsp;Release 1.4.0 - 04/28/2005"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">7.&nbsp;Release 1.6.0 - 12/01/2005</th></tr><tr><td align="left" width="20%"><a accesskey="p" href="1_8_0.html">Prev</a>&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="1_4_0.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="1_6_0"></a>7.&nbsp;Release 1.6.0 - 12/01/2005</h2></div></div></div><div class="abstract"><p class="title"><b>Abstract</b></p><p>Release 1.6.0 offers improved remote control and monitoring via      JMX, a crawl-checkpointing facility, and experimental support for Bloom      filter already-included testing, partitioning a crawl across multiple      independent crawlers, and per-host/domain/queue-grouping collection      quotas. Performance and stability in large crawls are also improved.
Among tracked issues, it includes 39 requested enhancements and fixes 96      reported bugs.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="1_6_0_limitations"></a>7.1.&nbsp;Known Limitations/Issues</h3></div></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="bdb_nfs"></a>7.1.1.&nbsp;java.io.IOException: No locks available</h4></div></div></div><p>BDB complains 'No locks available' when the crawler is        built or run on an NFS mount. The workaround is to avoid building or        running on an NFS-mounted volume.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="bdb_64bit"></a>7.1.2.&nbsp;OutOfMemoryError in 64bit JVMs</h4></div></div></div><p>BDB 2.0.90 can overgrow its intended cache size due to a        misestimation of instance sizes under 64bit Java VMs, which may be a        major contributor to early Heritrix OutOfMemoryError problems on 64bit        systems. A workaround is to cut the cache percentage by one-third to        one-half: for example, set 'bdb-cache-percent' to '40' or '30'        instead of the default of 60% that applies when no value is set.</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="1_6_0_changes"></a>7.2.&nbsp;Changes</h3></div></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="postselector"></a>7.2.1.&nbsp;Postselector</h4></div></div></div><p>The Postselector has been refactored out of existence. Its        responsibilities have been parcelled out to two new Processors:        LinksScoper and FrontierScheduler. LinksScoper is responsible for        scope checking of extracted links.
FrontierScheduler schedules URIs        with the Frontier.</p><p>This change was made to allow the introduction of processors between        the scope-checking and Frontier-scheduling steps.</p><p>Because of this change, order files from Heritrix 1.4.0 or        earlier must be updated, with Postselector references replaced by        LinksScoper and FrontierScheduler references, before they can be used        with Heritrix 1.6.0. (Referencing a non-existent Postselector in an        order file usually shows up as a -50 fetch status in crawl.log.)</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="wui_console"></a>7.2.2.&nbsp;Web Console</h4></div></div></div><p>The layout and terminology of the web Console and header have        been changed, and new readouts have been added. Most notably, "Crawler Status"        and "Job Status" information have been moved to separate boxes, with        the controls for each at the top of their respective boxes, near the        current status information.
Also, the "Crawling"/"Stopped" distinction        in the crawler (whether pending jobs are started as soon as        possible) has been renamed "Crawling Jobs"/"Holding Jobs" for        clarity.</p></div><p>        <div class="table"><a name="N10A5B"></a><p class="title"><b>Table&nbsp;5.&nbsp;All Tracked Changes</b></p><table summary="All Tracked Changes" border="1"><colgroup><col><col><col><col><col><col></colgroup><thead><tr><th>ID</th><th>Type</th><th>Summary</th><th>Open Date</th><th>By</th><th>Filer</th></tr></thead><tbody><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=806831&group_id=73833&atid=539102" target="_top">806831</a>                </td><td>Add</td><td>XMLExtractor (XML/RSS)</td><td>2003-09-15</td><td>gojomo</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=983051&group_id=73833&atid=539102" target="_top">983051</a>                </td><td>Add</td><td>annotate what robots.txt would have precluded</td><td>2004-06-30</td><td>karl-ia</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1069331&group_id=73833&atid=539102" target="_top">1069331</a>                </td><td>Add</td><td>hold paused crawl at 'end', allowing all in-progress                ops</td><td>2004-11-19</td><td>karl-ia</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1081774&group_id=73833&atid=539102" target="_top">1081774</a>                </td><td>Add</td><td>need way to delete overrides</td><td>2004-12-08</td><td>karl-ia</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1104696&group_id=73833&atid=539102" target="_top">1104696</a>                </td><td>Add</td><td>Confusion: CrawlController and CrawlJob 
States</td><td>2005-01-18</td><td>nobody</td><td>stack-sf</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1108006&group_id=73833&atid=539102" target="_top">1108006</a>                </td><td>Add</td><td>alerts should show current processor</td><td>2005-01-23</td><td>gojomo</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1108520&group_id=73833&atid=539102" target="_top">1108520</a>                </td><td>Add</td><td>SURT needs facelift</td><td>2005-01-24</td><td>gojomo</td><td>stack-sf</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1119616&group_id=73833&atid=539102" target="_top">1119616</a>                </td><td>Add</td><td>Decompose Postselector to Scoping and Scheduling                components</td><td>2005-02-09</td><td>stack-sf</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1122692&group_id=73833&atid=539102" target="_top">1122692</a>                </td><td>Add</td><td>[contribution] New fixed number of queues                policy</td><td>2005-02-14</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1173597&group_id=73833&atid=539102" target="_top">1173597</a>                </td><td>Add</td><td>jmx api additions</td><td>2005-03-30</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1176934&group_id=73833&atid=539102" target="_top">1176934</a>                </td><td>Add</td><td>[contrib] Generalize/Refactor BDB Frontier</td><td>2005-04-05</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1180630&group_id=73833&atid=539102" target="_top">1180630</a>                
</td><td>Add</td><td>[contrib] UI stacktrace dump (Depends on                JDK150)</td><td>2005-04-11</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1183376&group_id=73833&atid=539102" target="_top">1183376</a>                </td><td>Add</td><td>Post 1.4 Deprecate filter scope and remove post                1.6.</td><td>2005-04-14</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1190974&group_id=73833&atid=539102" target="_top">1190974</a>                </td><td>Add</td><td>Quick resume without real recovery /                Checkpointing</td><td>2005-04-27</td><td>karl-ia</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1196602&group_id=73833&atid=539102" target="_top">1196602</a>                </td><td>Add</td><td>[contrib] Show estimated remaining time</td><td>2005-05-06</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1200205&group_id=73833&atid=539102" target="_top">1200205</a>                </td><td>Add</td><td>add 'exhausted' queue count to frontier report</td><td>2005-05-11</td><td>gojomo</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1204644&group_id=73833&atid=539102" target="_top">1204644</a>                </td><td>Add</td><td>add 'memory used' to progress-statistics.log</td><td>2005-05-18</td><td>kristinn_sig</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1205583&group_id=73833&atid=539102" target="_top">1205583</a>                </td><td>Add</td><td>add CandidateURI parameter to                
UriUniqFilter.forget()</td><td>2005-05-20</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1207866&group_id=73833&atid=539102" target="_top">1207866</a>                </td><td>Add</td><td>[contrib] ThreadLocal-version of                TextUtil.getMatcher</td><td>2005-05-24</td><td>gojomo</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1207898&group_id=73833&atid=539102" target="_top">1207898</a>                </td><td>Add</td><td>[contrib] WorkQueueFrontier: Store allQueues in RAM if                poss.</td><td>2005-05-24</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208293&group_id=73833&atid=539102" target="_top">1208293</a>                </td><td>Add</td><td>List based URIRegExprFilter</td><td>2005-05-25</td><td>kristinn_sig</td><td>kristinn_sig</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208510&group_id=73833&atid=539102" target="_top">1208510</a>                </td><td>Add</td><td>[rfe-contrib] Add Stacktrace dump to                ToeThread.report()</td><td>2005-05-25</td><td>stack-sf</td><td>ck-heritrix</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208747&group_id=73833&atid=539102" target="_top">1208747</a>                </td><td>Add</td><td>CrawlURI serialization bloated; should be                slimmed</td><td>2005-05-25</td><td>gojomo</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208757&group_id=73833&atid=539102" target="_top">1208757</a>                </td><td>Add</td><td>Cookies are thread traffic jam and memory hog</td><td>2005-05-25</td><td>gojomo</td><td>stack-sf</td></tr><tr><td>                  <a 
href="http://sourceforge.net/tracker/index.php?func=detail&aid=1208770&group_id=73833&atid=539102" target="_top">1208770</a>                </td><td>Add</td><td>garbage hot spot: SerialBinding &amp;                FastOutputStream.bump()</td><td>2005-05-25</td><td>gojomo</td><td>gojomo</td></tr><tr><td>                  <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1211217&group_id=73833&atid=539102" target="_top">1211217</a>                </td><td>Add</td><td>[contrib] Add debugging aid for BDB
