type of the <code class="literal">path</code> changed from <code class="literal">string</code> to <code class="literal">stringList</code>):<pre class="programlisting">+++ order.xml 2005-02-01 13:12:34.000000000 -0800
@@ -162,7 +162,9 @@
       &lt;string name="prefix"&gt;BT&lt;/string&gt;
       &lt;string name="suffix"&gt;&lt;/string&gt;
       &lt;integer name="max-size-bytes"&gt;100000000&lt;/integer&gt;
-      &lt;string name="path"&gt;arcs&lt;/string&gt;
+      &lt;stringList name="path"&gt;
+        &lt;string&gt;arcs&lt;/string&gt;
+      &lt;/stringList&gt;
       &lt;integer name="pool-max-active"&gt;5&lt;/integer&gt;
       &lt;integer name="pool-max-wait"&gt;300000&lt;/integer&gt;
     &lt;/newObject&gt;</pre> </p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="cme_frontier"></a>5.1.4. <a href="https://sourceforge.net/tracker/?func=detail&atid=539099&aid=1119644&group_id=73833" target="_top">[ 1119644 ] frontier ConcurrentModificationException</a></h4></div></div></div><p>Sometimes you'll get a ConcurrentModificationException when you view or refresh the Frontier's report page. The workaround is to retry; the page should eventually come up. </p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="arcfile_suffix"></a>5.1.5. New ARC file suffix</h4></div></div></div><p>Pre-release 1.2.0, ARC files that the crawler was still writing to were distinguished by an '.open' suffix, which was removed when the crawler finished writing. A new suffix -- '.invalid' -- has been introduced; the crawler uses it to mark ARC files it considers suspect, usually because an IOException was thrown while writing an ARC Record. Such ARCs need to be checked for validity. Run <code class="literal">% gzip -t</code> and <code class="literal">% ARCReader --strict</code> against all files with an '.invalid' suffix -- and against any unclosed '.open' files present after a crawl has ended -- to check for corruption. 
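As a rough illustration of the gzip half of that check, the Python sketch below walks a directory and reports which suspect files fail to decompress cleanly. It is not part of Heritrix: the function names are hypothetical, and it assumes the ARCs are gzip-compressed and carry the suffix at the very end of the file name. It approximates what <code class="literal">gzip -t</code> verifies and does not replace <code class="literal">ARCReader --strict</code>, which validates the ARC records themselves.

```python
import gzip
import zlib
from pathlib import Path

# Suffixes the text above says need checking once a crawl has ended.
SUSPECT_SUFFIXES = (".invalid", ".open")


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the file decompresses to its end without error.

    This only tests the gzip container; it says nothing about whether
    the ARC records inside are well formed.
    """
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches,
        # EOFError covers truncated files.
        return False


def check_suspect_arcs(arc_dir):
    """Map each suspect file under arc_dir (relative path) to True/False."""
    root = Path(arc_dir)
    results = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name.endswith(SUSPECT_SUFFIXES):
            results[str(path.relative_to(root))] = gzip_is_intact(path)
    return results
```

Calling <code class="literal">check_suspect_arcs("arcs")</code> would return a dictionary keyed by file name; any entry mapped to <code class="literal">False</code> is a candidate for closer inspection with ARCReader.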
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="1149470"></a>5.1.6. DNS lookups fail (-6 in crawl.log)</h4></div></div></div><p> <a href="https://sourceforge.net/tracker/index.php?func=detail&aid=1149470&group_id=73833&atid=539099" target="_top">[1149470] all DNS attempts fail -6</a> discusses badly formatted DNS records returned on the Windows platform that Heritrix fails to parse, and includes a pointer to a mailing list discussion of failed lookups on non-English Windows. The issue also describes a workaround. </p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="1178102"></a>5.1.7. FatalConfigurationException creating new job based on old</h4></div></div></div><p>Older Sun JVMs -- pre-beta3 versions of the Sun JVM 1.5.0, for instance -- had an issue copying files using NIO. Try upgrading your JVM. See <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1178102&group_id=73833&atid=539099" target="_top">[1178102] FCE on creation of new job based on job w/ overrides</a> for more on this. </p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="oome142"></a>5.1.8. OutOfMemoryErrors (OOMEs)</h4></div></div></div><p>Unusual pages -- pages of unorthodox structure, pages that contain thousands upon thousands of links -- will on occasion produce OOMEs.</p><p>Memory usage when running multiple jobs in series has improved (see <a href="1_2_0.html#oome_pending_jobs" title="6.1.3. Running more than one job in series throws OOME">Section 6.1.3, “Running more than one job in series throws OOME”</a>), but starting up a new job after a long-running job can still prompt OOMEs. The workaround for now is to restart Heritrix between big jobs.</p></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="1_4_0_changes"></a>5.2. 
Changes</h3></div></div></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="bdbfrontier"></a>5.2.1. Berkeley DB Based Frontier</h4></div></div></div><p>The BdbFrontier -- a frontier that keeps its queues of