quotas. Performance and stability in large crawls are also improved. Among tracked issues, it includes 39 requested enhancements and fixes 96 reported bugs. See <a href="articles/releasenotes/1_6_0.html">Heritrix Release Notes</a> for detail and <a href="articles/releasenotes/1_6_0.html#1_6_0_limitations">Known Limitations</a>: e.g., you will again need to <a href="articles/releasenotes/1_6_0.html#postselector">tweak your old order files</a> to make them work with the new release.</p></div><div class="subsection"><a name="Release_1_4_0_04_28_2005"></a><h3>Release 1.4.0 04/28/2005</h3><p>Much improved memory usage, new experimental scoping/filter model, and a new revisiting frontier. Over 90 bugs fixed. See <a href="articles/releasenotes/1_4_0.html">Heritrix Release Notes</a> for detail and <a href="articles/releasenotes/1_4_0.html#1_4_0_limitations">Known Limitations</a>: e.g., you cannot use your old order files with the new release.</p></div><div class="subsection"><a name="Release_1_2_0_11_16_2004"></a><h3>Release 1.2.0 11/16/2004</h3><p>Added IP-based politeness, configurable URI-canonicalization, and mid-fetch abort. Lots of bug fixes. See <a href="articles/releasenotes/1_2_0.html">Heritrix Release Notes</a> for detail and Known Limitations (in particular, https fetching requires the Sun JDK, and the UI throws an OOME if jobs run in series). </p></div><div class="subsection"><a name="Release_1_0_4_09_22_2004"></a><h3>Release 1.0.4 09/22/2004</h3><p>Bug fix. Crawl.log and ARC metadata lines could have whitespace in URIs and mimetype fields. See <a href="articles/releasenotes/1_0_4.html">Heritrix Release Notes</a> for detail and Known Limitations. </p></div><div class="subsection"><a name="Release_1_0_2_09_14_2004"></a><h3>Release 1.0.2 09/14/2004</h3><p>Bug fixes. See <a href="articles/releasenotes/1_0_2.html">Heritrix Release Notes</a> for detail and known limitations. 
</p></div><div class="subsection"><a name="Release_1_0_0_08_06_2004"></a><h3>Release 1.0.0 08/06/2004</h3><p>Added new prefix ('SURT') scope and filter, compression of recovery log, mass adding of URIs to running crawler, crawling via an HTTP proxy, adding of headers to request, improved out-of-the-box defaults, hash of content to crawl log and to arcreader output, and many bug fixes. See <a href="articles/releasenotes/1_0_0.html">Heritrix Release Notes</a> for detail and known limitations. </p></div><div class="subsection"><a name="1_0_0_first_release_candidate__0_10_0_06_04_2004"></a><h3>1.0.0 first release candidate, 0.10.0 06/04/2004</h3><p>Release for second heritrix workshop, Copenhagen 06/2004 (1.0.0 first release candidate). Added site-first prioritization, fixed link extraction of multibyte URIs, added metadata to arcs as xml, changed arc naming template, new user and developer manuals, added basic/digest auth and http post/get login facility, and added help to UI. Bug fixes. See <a href="articles/releasenotes/0_10_0.html">Heritrix Release Notes</a> for detail and known limitations. </p></div><div class="subsection"><a name="Release_0_8_1_05_28_2004"></a><h3>Release 0.8.1 05/28/2004</h3><p>Fixes to build with maven rc2+. </p></div><div class="subsection"><a name="Release_0_8_0_05_24_2004"></a><h3>Release 0.8.0 05/24/2004</h3><p>Release (and branch heritrix-0_8 made at the heritrix-0_7_1 tag) because of ConcurrentModificationExceptions if tens of seeds supplied and to fix domain-scope leakage. Also, made continuous build publicly available, incorporated integration selftest into build, made it a maven-build only (ant-build no longer supported), added day/night configurations (refinements), ameliorated too-many-open files, added exploit of http-header content-type charset creating character streams, and heritrix now crawls ssl sites. UI improvements include red start by bad configuration, precompilation, and delineation of advanced settings. 
See <a href="articles/releasenotes/0_8_0.html">Heritrix Release Notes</a> for detail. </p></div><div class="subsection"><a name="Release_0_6_0_03_25_2004"></a><h3>Release 0.6.0 03/25/2004</h3><p>Release made in advance of radical frontier changes. Added bandwidth throttle, operator 'diary', settable robots expiration, crawler cookie pre-population, and changing of certain options mid-crawl. Many UI improvements including UI display of critical exceptions, UI description of job-order options, and improved reporting. Optimizations. Updated httpclient lib to 2.0 release and jmx libs to 1.2.1. See <a href="articles/releasenotes/0_6_0.html">Heritrix Release Notes</a> for detail. </p></div><div class="subsection"><a name="Point_Release_0_4_1_02_12_2004"></a><h3>Point Release 0.4.1 02/12/2004</h3><p>Released <a href="http://sourceforge.net/project/showfiles.php?group_id=73833&package_id=73980" class="externalLink" title="External Link">heritrix-0.4.1</a> to fix <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=895955&group_id=73833&atid=539099" class="externalLink" title="External Link">URIRegExpFilter retains memory</a>.</p></div><div class="subsection"><a name="Release_0_4_0_02_10_2004"></a><h3>Release 0.4.0 02/10/2004</h3><p>Release made for heritrix workshop, San Francisco, 02/2004. New MBEAN-based configuration, extensive UI revamp, first unit tests and integration selftest framework added, pooling of ARCWriters, new cmd-line start scripts, httpclient lib update (2.0RC3) and bugfixes. See <a href="articles/releasenotes/0_4_0.html">Heritrix Release Notes</a> for detail. 
</p></div><div class="subsection"><a name="First_Release_01_05_2004"></a><h3>First Release 01/05/2004</h3><p>Today we made our first 'official' heritrix release, <a href="http://sourceforge.net/project/showfiles.php?group_id=73833&package_id=73980" class="externalLink" title="External Link">heritrix-0.2.0</a>.</p></div></div></div></div><div class="clear"><hr></hr></div><div id="footer"><div class="xleft"><a href="http://sourceforge.net/projects/archive-crawler/" class="externalLink" title="External Link"> <img src="http://sourceforge.net/sflogo.php?group_id=archive-crawler&type=1" border="0" alt="sf logo"></img></a></div><div class="xright">© 2003-2007, Internet Archive</div><div class="clear"><hr></hr></div></div></body></html>