
📄 index.html

📁 Written in Java; left over from an experiment. I meant to delete it, but I am uploading it instead so everyone can share it.
<html><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Heritrix developer documentation</title><link href="../docbook.css" rel="stylesheet" type="text/css"><meta content="DocBook XSL Stylesheets V1.67.2" name="generator"><link rel="start" href="index.html" title="Heritrix developer documentation"><link rel="next" href="ar01s01.html" title="1.&nbsp;Introduction"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table summary="Navigation header" width="100%"><tr><th align="center" colspan="3">Heritrix developer documentation</th></tr><tr><td align="left" width="20%">&nbsp;</td><th align="center" width="60%">&nbsp;</th><td align="right" width="20%">&nbsp;<a accesskey="n" href="ar01s01.html">Next</a></td></tr></table><hr></div><div class="article" lang="en" id="N10001"><div class="titlepage"><div><div><h2 class="title"><a name="N10001"></a>Heritrix developer documentation</h2></div><div><div class="authorgroup"><h3 class="corpauthor">Internet Archive</h3><h4 class="editedby">Edited by</h4><h3 class="editor"><span class="firstname">John Erik</span> <span class="surname">Halse</span></h3><div class="author"><h3 class="author"><span class="firstname">Gordon</span> <span class="surname">Mohr</span></h3></div><div class="author"><h3 class="author"><span class="firstname">Kristinn</span> <span class="surname">Sigur&#273;sson</span></h3></div><div class="author"><h3 class="author"><span class="firstname">Michael</span> <span class="surname">Stack</span></h3></div><div class="author"><h3 class="author"><span class="firstname">Paul</span> <span class="surname">Jack</span></h3></div></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="ar01s01.html">1. Introduction</a></span></dt><dt><span class="sect1"><a href="ar01s02.html">2. 
Obtaining and building Heritrix</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s02.html#N10034">2.1. Obtaining Heritrix</a></span></dt><dt><span class="sect2"><a href="ar01s02.html#N1004E">2.2. Building Heritrix</a></span></dt><dt><span class="sect2"><a href="ar01s02.html#N100A2">2.3. Running Heritrix</a></span></dt><dt><span class="sect2"><a href="ar01s02.html#eclipse">2.4. Eclipse</a></span></dt><dt><span class="sect2"><a href="ar01s02.html#N100B8">2.5. Integration self test</a></span></dt><dt><span class="sect2"><a href="ar01s02.html#N100C9">2.6. cruisecontrol</a></span></dt></dl></dd><dt><span class="sect1"><a href="conventions.html">3. Coding conventions</a></span></dt><dd><dl><dt><span class="sect2"><a href="conventions.html#N100F8">3.1. Tightenings on the SUN conventions</a></span></dt><dt><span class="sect2"><a href="conventions.html#N1011B">3.2. Long versus int</a></span></dt><dt><span class="sect2"><a href="conventions.html#N10120">3.3. Unit tests code in same package</a></span></dt><dt><span class="sect2"><a href="conventions.html#log_messages">3.4. CVS Log Message Format</a></span></dt></dl></dd><dt><span class="sect1"><a href="ar01s04.html">4. Overview of the crawler</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s04.html#N10151">4.1. The CrawlController</a></span></dt><dt><span class="sect2"><a href="ar01s04.html#N10156">4.2. The Frontier</a></span></dt><dt><span class="sect2"><a href="ar01s04.html#N1016B">4.3. ToeThreads</a></span></dt><dt><span class="sect2"><a href="ar01s04.html#N10170">4.4. Processors</a></span></dt></dl></dd><dt><span class="sect1"><a href="ar01s05.html">5. Settings</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s05.html#N101E4">5.1. Settings hierarchy</a></span></dt><dt><span class="sect2"><a href="ar01s05.html#N101F3">5.2. ComplexType hierarchy</a></span></dt></dl></dd><dt><span class="sect1"><a href="chap_modules_common.html">6. 
Common needs for all configurable modules</a></span></dt><dd><dl><dt><span class="sect2"><a href="chap_modules_common.html#N10279">6.1. Definition of a module</a></span></dt><dt><span class="sect2"><a href="chap_modules_common.html#N102B2">6.2. Accessing attributes</a></span></dt><dt><span class="sect2"><a href="chap_modules_common.html#N102C3">6.3. Putting together a simple module</a></span></dt></dl></dd><dt><span class="sect1"><a href="ar01s07.html">7. Some notes on the URI classes</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s07.html#urischemes">7.1. Supported Schemes (UnsupportedUriSchemeException)</a></span></dt><dt><span class="sect2"><a href="ar01s07.html#N1037A">7.2. The CrawlURI's Attribute list</a></span></dt><dt><span class="sect2"><a href="ar01s07.html#N10383">7.3. The recorder streams</a></span></dt></dl></dd><dt><span class="sect1"><a href="ar01s08.html">8. Writing a Frontier</a></span></dt><dt><span class="sect1"><a href="writefilter.html">9. Writing a Filter</a></span></dt><dt><span class="sect1"><a href="ar01s10.html">10. Writing a Scope</a></span></dt><dt><span class="sect1"><a href="ar01s11.html">11. Writing a Processor</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s11.html#editingCURI">11.1. Accessing and updating the CrawlURI</a></span></dt><dt><span class="sect2"><a href="ar01s11.html#httprecorder">11.2. The HttpRecorder</a></span></dt><dt><span class="sect2"><a href="ar01s11.html#N105EF">11.3. An example processor</a></span></dt><dt><span class="sect2"><a href="ar01s11.html#N1066E">11.4. Things to keep in mind when writing a processor</a></span></dt></dl></dd><dt><span class="sect1"><a href="ar01s12.html">12. Writing a Statistics Tracker</a></span></dt><dd><dl><dt><span class="sect2"><a href="ar01s12.html#N106AA">12.1. AbstractTracker</a></span></dt><dt><span class="sect2"><a href="ar01s12.html#N106B7">12.2. Provided StatisticsTracker</a></span></dt></dl></dd><dt><span class="sect1"><a href="arcs.html">13. 
Internet Archive ARC files</a></span></dt><dd><dl><dt><span class="sect2"><a href="arcs.html#arcnaming">13.1. ARC File Naming</a></span></dt><dt><span class="sect2"><a href="arcs.html#arcreader">13.2. Reading arc files</a></span></dt><dt><span class="sect2"><a href="arcs.html#arcwriter">13.3. Writing arc files</a></span></dt><dt><span class="sect2"><a href="arcs.html#searching_arcs">13.4. Searching ARCS</a></span></dt></dl></dd><dt><span class="appendix"><a href="apa.html">A. Future changes in the API</a></span></dt><dd><dl><dt><span class="sect1"><a href="refactor_HTTPRecorder.html">1. The org.archive.util.HTTPRecorder class</a></span></dt><dt><span class="sect1"><a href="refactor_frontier_dispositions.html">2. The Frontiers handling of dispositions</a></span></dt></dl></dd><dt><span class="appendix"><a href="release_numbering.html">B. Version and Release Numbering</a></span></dt><dt><span class="appendix"><a href="apc.html">C. Making a Heritrix Release</a></span></dt><dt><span class="appendix"><a href="apd.html">D. Settings XML Schema</a></span></dt><dt><span class="appendix"><a href="profiling.html">E. Profiling Heritrix</a></span></dt><dt><span class="bibliography"><a href="bi01.html">Bibliography</a></span></dt></dl></div></div><div class="navfooter"><hr><table summary="Navigation footer" width="100%"><tr><td align="left" width="40%">&nbsp;</td><td align="center" width="20%">&nbsp;</td><td align="right" width="40%">&nbsp;<a accesskey="n" href="ar01s01.html">Next</a></td></tr><tr><td valign="top" align="left" width="40%">&nbsp;</td><td align="center" width="20%">&nbsp;</td><td valign="top" align="right" width="40%">&nbsp;1.&nbsp;Introduction</td></tr></table></div></body></html>
