      quotas. Performance and stability in large crawls are also improved.      Among tracked issues, it includes 39 requested enhancements and fixes 96       reported bugs. See      <a href="articles/releasenotes/1_6_0.html">Heritrix Release Notes</a>       for detail and      <a href="articles/releasenotes/1_6_0.html#1_6_0_limitations">Known      Limitations</a>: e.g. Again you will need to       <a href="articles/releasenotes/1_6_0.html#postselector">tweak your old order      files</a> to make them work with the new release.</p></div><div class="subsection"><a name="Release_1_4_0_04_28_2005"></a><h3>Release 1.4.0 04/28/2005</h3><p>Much improved memory usage, new experimental scoping/filter model,      and a new revisiting frontier.  Over 90 bugs fixed. See      <a href="articles/releasenotes/1_4_0.html">Heritrix Release Notes</a>       for detail and      <a href="articles/releasenotes/1_4_0.html#1_4_0_limitations">Known      Limitations</a>: e.g. You cannot use your old order files with the new      release.</p></div><div class="subsection"><a name="Release_1_2_0_11_16_2004"></a><h3>Release 1.2.0 11/16/2004</h3><p>Added IP-based politeness, configurable URI-canonicalization,        and mid-fetch abort.  Lots of bug fixes.  See        <a href="articles/releasenotes/1_2_0.html">Heritrix Release Notes</a>         for detail and Known Limitations (In particular, https fetching        requires SUN JDK and UI throws OOME if jobs run in series).        </p></div><div class="subsection"><a name="Release_1_0_4_09_22_2004"></a><h3>Release 1.0.4 09/22/2004</h3><p>Bug fix. Crawl.log and ARC metadata lines could have whitespace        in URIs and mimetype fields.  See        <a href="articles/releasenotes/1_0_4.html">Heritrix Release Notes</a>         for detail and Known Limitations.        </p></div><div class="subsection"><a name="Release_1_0_2_09_14_2004"></a><h3>Release 1.0.2 09/14/2004</h3><p>Bug fixes.         
See <a href="articles/releasenotes/1_0_2.html">Heritrix Release        Notes</a> for detail and known limitations.        </p></div><div class="subsection"><a name="Release_1_0_0_08_06_2004"></a><h3>Release 1.0.0 08/06/2004</h3><p>Added new prefix ('SURT') scope and filter,        compression of recovery log,        mass adding of URIs to running crawler,        crawling via a http proxy,         adding of headers to request,        improved out-of-the-box defaults,        hash of content to crawl log and to arcreader output,        and many bug fixes.        See <a href="articles/releasenotes/1_0_0.html">Heritrix Release        Notes</a> for detail and known limitations.        </p></div><div class="subsection"><a name="1_0_0_first_release_candidate__0_10_0_06_04_2004"></a><h3>1.0.0 first release candidate, 0.10.0 06/04/2004</h3><p>Release for second heritrix workshop, Copenhagen 06/2004        (1.0.0 first release candidate). Added site-first prioritization,        fixed link extraction of multibyte URIs, added metadata to arcs as xml,        changed arc naming template, new user and developer manuals,        added basic/digest auth and http post/get login facility, and added        help to UI. Bug fixes.         See <a href="articles/releasenotes/0_10_0.html">Heritrix Release        Notes</a> for detail and known limitations.        </p></div><div class="subsection"><a name="Release_0_8_1_05_28_2004"></a><h3>Release 0.8.1 05/28/2004</h3><p>Fixes to build with maven rc2+.        </p></div><div class="subsection"><a name="Release_0_8_0_05_24_2004"></a><h3>Release 0.8.0 05/24/2004</h3><p>Release (and branch heritrix-0_8 made at the heritrix-0_7_1 tag)        because of concurrentmodificationexceptions if tens of seeds supplied        and to fix domain-scope leakage. 
Also, made continuous build        publicly available, incorporated integration selftest into build,        made it a maven-build only (ant-build no longer supported), added        day/night configurations (refinements), ameliorated too-many-open        files, added exploit of http-header content-type charset creating        character streams, and heritrix now crawls ssl sites. UI improvements        include red start by bad configuration, precompilation, and        delineation of advanced settings.         See <a href="articles/releasenotes/0_8_0.html">Heritrix Release        Notes</a> for detail.        </p></div><div class="subsection"><a name="Release_0_6_0_03_25_2004"></a><h3>Release 0.6.0 03/25/2004</h3><p>Release made in advance of radical frontier changes.        Added bandwidth throttle, operator 'diary', settable robots expiration,        crawler cookie pre-population, and changing of certain options        mid-crawl. Many UI improvements including UI display of critical        exceptions, UI description of job-order options, and improved        reporting.  Optimizations.  Updated httpclient lib to 2.0 release and        jmx libs to 1.2.1.        See <a href="articles/releasenotes/0_6_0.html">Heritrix Release        Notes</a> for detail.        </p></div><div class="subsection"><a name="Point_Release_0_4_1_02_12_2004"></a><h3>Point Release 0.4.1 02/12/2004</h3><p>Released <a href="http://sourceforge.net/project/showfiles.php?group_id=73833&amp;package_id=73980" class="externalLink" title="External Link">heritrix-0.4.1</a> to fix         <a href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=895955&amp;group_id=73833&amp;atid=539099" class="externalLink" title="External Link">URIRegExpFilter retains memory</a>.</p></div><div class="subsection"><a name="Release_0_4_0_02_10_2004"></a><h3>Release 0.4.0 02/10/2004</h3><p>Release made for heritrix workshop, San Francisco, 02/2004.        
New MBEAN-based configuration, extensive UI revamp, first unit        tests and integration selftest framework added, pooling of        ARCWriters, new cmd-line start scripts, httpclient lib update (2.0RC3)        and bugfixes.        See <a href="articles/releasenotes/0_4_0.html">Heritrix Release        Notes</a> for detail.        </p></div><div class="subsection"><a name="First_Release_01_05_2004"></a><h3>First Release 01/05/2004</h3><p>Today we made our first 'official' heritrix release,         <a href="http://sourceforge.net/project/showfiles.php?group_id=73833&amp;package_id=73980" class="externalLink" title="External Link">heritrix-0.2.0</a>.</p></div></div></div></div><div class="clear"><hr></hr></div><div id="footer"><div class="xleft"><a href="http://sourceforge.net/projects/archive-crawler/" class="externalLink" title="External Link">            <img src="http://sourceforge.net/sflogo.php?group_id=archive-crawler&amp;type=1" border="0" alt="sf logo"></img></a></div><div class="xright">© 2003-2007, Internet Archive</div><div class="clear"><hr></hr></div></div></body></html>
