⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 extractortool.html

📁 用JAVA编写的,在做实验的时候留下来的,本来想删的,但是传上来,大家分享吧
💻 HTML
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_06) on Wed Sep 27 16:03:11 PDT 2006 --><TITLE>ExtractorTool (Heritrix 1.10.1)</TITLE><META NAME="keywords" CONTENT="org.archive.crawler.extractor.ExtractorTool class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){    parent.document.title="ExtractorTool (Heritrix 1.10.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/ExtractorTool.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../../org/archive/crawler/extractor/ExtractorSWF.ExtractorTagParser.html" title="class in org.archive.crawler.extractor"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../../org/archive/crawler/extractor/ExtractorUniversal.html" title="class in org.archive.crawler.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../index.html?org/archive/crawler/extractor/ExtractorTool.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="ExtractorTool.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;FIELD&nbsp;|&nbsp;<A HREF="#constructor_summary">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;FIELD&nbsp;|&nbsp;<A HREF="#constructor_detail">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.crawler.extractor</FONT><BR>Class ExtractorTool</H2><PRE>java.lang.Object  <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><B>org.archive.crawler.extractor.ExtractorTool</B></PRE><HR><DL><DT><PRE>public class <B>ExtractorTool</B><DT>extends java.lang.Object</DL></PRE><P>Run named extractors against passed ARC file. This extractor tool runs suboptimally.  It takes each ARC file record, writes it to a new scratch file, and then it runs each listed extractor against the scratch.  It works in this manner because extractors want CharSequence, being able to refer to characters by absolute position, but ARCs are compressed streams.  The work to get a CharSequence on an underlying compressed stream has not been done.  Other issues are need to setup CrawlerSetting environment so extractors can run.<P><P><DL><DT><B>Version:</B></DT>  <DD>$Date: 2005/07/18 17:29:23 $, $Revision: 1.4 $</DD><DT><B>Author:</B></DT>  <DD>stack</DD></DL><HR><P><!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#ExtractorTool()">ExtractorTool</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#ExtractorTool(java.lang.String[], java.lang.String)">ExtractorTool</A></B>(java.lang.String[]&nbsp;e,              java.lang.String&nbsp;scratch)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR></TABLE>&nbsp;<!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#extract(java.lang.String)">extract</A></B>(java.lang.String&nbsp;resource)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#getCrawlURI(org.archive.io.arc.ARCRecord, org.archive.util.HttpRecorder)">getCrawlURI</A></B>(<A HREF="../../../../org/archive/io/arc/ARCRecord.html" title="class in org.archive.io.arc">ARCRecord</A>&nbsp;record,            <A HREF="../../../../org/archive/util/HttpRecorder.html" title="class in org.archive.util">HttpRecorder</A>&nbsp;hr)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#main(java.lang.String[])">main</A></B>(java.lang.String[]&nbsp;args)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#outlinks(org.archive.crawler.datamodel.CrawlURI)">outlinks</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait</CODE></TD></TR></TABLE>&nbsp;<P><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="ExtractorTool()"><!-- --></A><H3>ExtractorTool</H3><PRE>public <B>ExtractorTool</B>()              throws java.lang.Exception</PRE><DL><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DL><HR><A NAME="ExtractorTool(java.lang.String[], java.lang.String)"><!-- --></A><H3>ExtractorTool</H3><PRE>public <B>ExtractorTool</B>(java.lang.String[]&nbsp;e,                     java.lang.String&nbsp;scratch)              throws java.lang.Exception</PRE><DL><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="extract(java.lang.String)"><!-- --></A><H3>extract</H3><PRE>public void <B>extract</B>(java.lang.String&nbsp;resource)             throws java.io.IOException,                    org.apache.commons.httpclient.URIException,                    java.lang.InterruptedException</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE><DD><CODE>org.apache.commons.httpclient.URIException</CODE><DD><CODE>java.lang.InterruptedException</CODE></DL></DD></DL><HR><A NAME="outlinks(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>outlinks</H3><PRE>protected void <B>outlinks</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi)</PRE><DL><DD><DL></DL></DD></DL><HR><A NAME="getCrawlURI(org.archive.io.arc.ARCRecord, org.archive.util.HttpRecorder)"><!-- --></A><H3>getCrawlURI</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>getCrawlURI</B>(<A HREF="../../../../org/archive/io/arc/ARCRecord.html" title="class in org.archive.io.arc">ARCRecord</A>&nbsp;record,                               <A HREF="../../../../org/archive/util/HttpRecorder.html" title="class in org.archive.util">HttpRecorder</A>&nbsp;hr)                        throws org.apache.commons.httpclient.URIException</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A NAME="main(java.lang.String[])"><!-- --></A><H3>main</H3><PRE>public static void <B>main</B>(java.lang.String[]&nbsp;args)                 throws java.lang.Exception</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DD></DL><!-- ========= END OF CLASS DATA ========= --><HR><!-- ======= START OF BOTTOM NAVBAR ====== --><A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/ExtractorTool.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../../org/archive/crawler/extractor/ExtractorSWF.ExtractorTagParser.html" title="class in org.archive.crawler.extractor"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../../org/archive/crawler/extractor/ExtractorUniversal.html" title="class in org.archive.crawler.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../index.html?org/archive/crawler/extractor/ExtractorTool.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="ExtractorTool.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;FIELD&nbsp;|&nbsp;<A HREF="#constructor_summary">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;FIELD&nbsp;|&nbsp;<A HREF="#constructor_detail">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= --><HR>Copyright &copy; 2003-2006 Internet Archive. All Rights Reserved.</BODY></HTML>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -