OutOfMemoryErrors (OOMEs) are now possible; longer if more heap is assigned. Where 10k hosts was once the upper bound on narrow domain- or host-scoped crawls, it should now be possible to crawl 500k+ hosts using the default heap size.</p><p>Long-running crawls that encounter hundreds of thousands of hosts over the life of a crawl, or crawls started with hundreds of thousands of seeds, continue to throw OutOfMemoryErrors because a few RAM-based data structures in Heritrix still grow without bound: the lists of queue names, and internal structures inside third-party libraries used by Heritrix. We intend to address these last few items in a later release.</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="ibmjvmredux"></a>8.2.6. IBM JVM Redux</h4></div></div></div><p>In testing with <code class="literal">IBM JVM 1.4.2 (Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142sr1a-20050209 (JIT enabled: jitc)))</code> using Heritrix 1.4.0, the SSL problem described in <a href="1_2_0.html#ibmjvm" title="9.1.1. IBM JVM">Section 9.1.1, “IBM JVM”</a> is no longer present. (All of our crawling over the last couple of months has been done on the latest Sun 1.5.0 JVMs.)</p></div><p> <div class="table"><a name="N11481"></a><p class="title"><b>Table 6. 
Changes</b></p><table summary="Changes" border="1"><colgroup><col><col><col><col><col><col></colgroup><thead><tr><th>ID</th><th>Type</th><th>Summary</th><th>Open Date</th><th>By</th><th>Filer</th></tr></thead><tbody><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=958061" target="_top">958061</a> </td><td>Add</td><td>[Post 1.0] New scoping model</td><td>2004-05-21</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1165205" target="_top">1165205</a> </td><td>Add</td><td>Add links to issue tracking/RFE to Heritrix' webapp</td><td>2005-03-17</td><td>nobody</td><td>ck-heritrix</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1119580" target="_top">1119580</a> </td><td>Add</td><td>Integrate revisiting frontier</td><td>2005-02-09</td><td>kristinn_sig</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1093609" target="_top">1093609</a> </td><td>Add</td><td>One-click recover</td><td>2004-12-30</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1078008" target="_top">1078008</a> </td><td>Add</td><td>Enable crawl-end at target compressed-ARC-data size</td><td>2004-12-02</td><td>stack-sf</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=934577" target="_top">934577</a> </td><td>Add</td><td>Need 'delete profile' option (like delete job)</td><td>2004-04-13</td><td>kristinn_sig</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1058302" target="_top">1058302</a> </td><td>Add</td><td>A 'dat' maker; A script to dump 
links</td><td>2004-11-01</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1114133" target="_top">1114133</a> </td><td>Add</td><td>Add referer header</td><td>2005-02-01</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1143892" target="_top">1143892</a> </td><td>Add</td><td>[contribution] SingleConnectionManager, range and close hdrs</td><td>2005-02-18</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1055766" target="_top">1055766</a> </td><td>Add</td><td>Dates in logs are unreadable.</td><td>2004-10-27</td><td>gojomo</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1111656" target="_top">1111656</a> </td><td>Add</td><td>Extractors should not extract if links already extracted</td><td>2005-01-28</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1047437" target="_top">1047437</a> </td><td>Add</td><td>Pause and alert on low-disk conditions</td><td>2004-10-14</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1104916" target="_top">1104916</a> </td><td>Add</td><td>Add info to candidateURI before scheduling</td><td>2005-01-18</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=953994" target="_top">953994</a> </td><td>Add</td><td>Change arc download dir mid-crawl</td><td>2004-05-14</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=894467" 
target="_top">894467</a> </td><td>Add</td><td>Stopping, pausing, checkpointing from command line/scripts</td><td>2004-02-10</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1096737" target="_top">1096737</a> </td><td>Add</td><td>[jmx] client pword and always start jmx server</td><td>2005-01-05</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1090663" target="_top">1090663</a> </td><td>Add</td><td>Move BDB to core of Heritrix</td><td>2004-12-23</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1092769" target="_top">1092769</a> </td><td>Add</td><td>[ARCReader] If garbage on end of record, report and skip it</td><td>2004-12-29</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1078016" target="_top">1078016</a> </td><td>Add</td><td>'Economic' frontier which defers low-value URIs</td><td>2004-12-02</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1002704" target="_top">1002704</a> </td><td>Add</td><td>Evaluate Berkeley DB Frontier</td><td>2004-08-03</td><td>gojomo</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1083315" target="_top">1083315</a> </td><td>Add</td><td>Update commons-pool, commons-collections, itext jars</td><td>2004-12-10</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=988276" target="_top">988276</a> </td><td>Add</td><td>ARC writer pool config. 
to write multiple disks</td><td>2004-07-09</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1078714" target="_top">1078714</a> </td><td>Add</td><td>Command-line insertion of URLs</td><td>2004-12-03</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1069105" target="_top">1069105</a> </td><td>Add</td><td>Make auto seed add on redirect optional (if happens at all)</td><td>2004-11-18</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1002707" target="_top">1002707</a> </td><td>Add</td><td>Fix heritrix shutdown (From Luca)</td><td>2004-08-03</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1065736" target="_top">1065736</a> </td><td>Add</td><td>Recovery should optionally retain failures ('Ff')</td><td>2004-11-13</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1057064" target="_top">1057064</a> </td><td>Add</td><td>HTTPRecorder's default buffer sizes should be configurable</td><td>2004-10-29</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539102&aid=1045817" target="_top">1045817</a> </td><td>Add</td><td>Untangle heritrix from jetty</td><td>2004-10-12</td><td>stack-sf</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1036720" target="_top">1036720</a> </td><td>Fix</td><td>NPE in ArcWriterProcessor.writeDns()</td><td>2004-09-28</td><td>stack-sf</td><td>gojomo</td></tr><tr><td> <a 
href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1178927" target="_top">1178927</a> </td><td>Fix</td><td>'submodules' map-edits not working for overrides/refinements</td><td>2005-04-07</td><td>gojomo</td><td>gojomo</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1179530" target="_top">1179530</a> </td><td>Fix</td><td>NPE in FastBufferedOutputStream.close</td><td>2005-04-08</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1184102" target="_top">1184102</a> </td><td>Fix</td><td>Frontier queues total still goes minus</td><td>2005-04-15</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1179527" target="_top">1179527</a> </td><td>Fix</td><td>ARCWriter AsynchronousCloseException</td><td>2005-04-08</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1096855" target="_top">1096855</a> </td><td>Fix</td><td>CME adding filters while crawling</td><td>2005-01-05</td><td>nobody</td><td>stack-sf</td></tr><tr><td> <a href="http://sourceforge.net/tracker/index.php?func=detail&group_id=73833&atid=539099&aid=1080378" target="_top">1080378</a> </td><td>Fix</td><td>job config: settings 'remove'-component-then-submit lost job</td><td>2004-12-06</td><td>nobody</td><td>gojomo</td></tr><tr><td>