📄 extractortool.html
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_06) on Wed Sep 27 16:03:11 PDT 2006 --><TITLE>ExtractorTool (Heritrix 1.10.1)</TITLE><META NAME="keywords" CONTENT="org.archive.crawler.extractor.ExtractorTool class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){ parent.document.title="ExtractorTool (Heritrix 1.10.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="class-use/ExtractorTool.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../org/archive/crawler/extractor/ExtractorSWF.ExtractorTagParser.html" title="class in org.archive.crawler.extractor"><B>PREV CLASS</B></A> <A HREF="../../../../org/archive/crawler/extractor/ExtractorUniversal.html" title="class in org.archive.crawler.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../index.html?org/archive/crawler/extractor/ExtractorTool.html" target="_top"><B>FRAMES</B></A> <A HREF="ExtractorTool.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2"> SUMMARY: NESTED | FIELD | <A HREF="#constructor_summary">CONSTR</A> | <A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL: FIELD | <A HREF="#constructor_detail">CONSTR</A> | <A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.crawler.extractor</FONT><BR>Class ExtractorTool</H2><PRE>java.lang.Object <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><B>org.archive.crawler.extractor.ExtractorTool</B></PRE><HR><DL><DT><PRE>public class <B>ExtractorTool</B><DT>extends java.lang.Object</DL></PRE><P>Run named extractors against passed ARC file. This extractor tool runs suboptimally. It takes each ARC file record, writes it to a new scratch file, and then it runs each listed extractor against the scratch. It works in this manner because extractors want CharSequence, being able to refer to characters by absolute position, but ARCs are compressed streams. The work to get a CharSequence on an underlying compressed stream has not been done. Other issues are need to setup CrawlerSetting environment so extractors can run.<P><P><DL><DT><B>Version:</B></DT> <DD>$Date: 2005/07/18 17:29:23 $, $Revision: 1.4 $</DD><DT><B>Author:</B></DT> <DD>stack</DD></DL><HR><P><!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#ExtractorTool()">ExtractorTool</A></B>()</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#ExtractorTool(java.lang.String[], java.lang.String)">ExtractorTool</A></B>(java.lang.String[] e, java.lang.String scratch)</CODE><BR> </TD></TR></TABLE> <!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#extract(java.lang.String)">extract</A></B>(java.lang.String resource)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#getCrawlURI(org.archive.io.arc.ARCRecord, org.archive.util.HttpRecorder)">getCrawlURI</A></B>(<A HREF="../../../../org/archive/io/arc/ARCRecord.html" title="class in org.archive.io.arc">ARCRecord</A> record, <A HREF="../../../../org/archive/util/HttpRecorder.html" title="class in org.archive.util">HttpRecorder</A> hr)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#main(java.lang.String[])">main</A></B>(java.lang.String[] args)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorTool.html#outlinks(org.archive.crawler.datamodel.CrawlURI)">outlinks</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</CODE><BR> </TD></TR></TABLE> <A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait</CODE></TD></TR></TABLE> <P><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="ExtractorTool()"><!-- --></A><H3>ExtractorTool</H3><PRE>public <B>ExtractorTool</B>() throws java.lang.Exception</PRE><DL><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DL><HR><A NAME="ExtractorTool(java.lang.String[], java.lang.String)"><!-- --></A><H3>ExtractorTool</H3><PRE>public <B>ExtractorTool</B>(java.lang.String[] e, java.lang.String scratch) throws java.lang.Exception</PRE><DL><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="extract(java.lang.String)"><!-- --></A><H3>extract</H3><PRE>public void <B>extract</B>(java.lang.String resource) throws java.io.IOException, org.apache.commons.httpclient.URIException, java.lang.InterruptedException</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>java.io.IOException</CODE><DD><CODE>org.apache.commons.httpclient.URIException</CODE><DD><CODE>java.lang.InterruptedException</CODE></DL></DD></DL><HR><A NAME="outlinks(org.archive.crawler.datamodel.CrawlURI)"><!-- --></A><H3>outlinks</H3><PRE>protected void <B>outlinks</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi)</PRE><DL><DD><DL></DL></DD></DL><HR><A NAME="getCrawlURI(org.archive.io.arc.ARCRecord, org.archive.util.HttpRecorder)"><!-- --></A><H3>getCrawlURI</H3><PRE>protected <A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> <B>getCrawlURI</B>(<A HREF="../../../../org/archive/io/arc/ARCRecord.html" title="class in org.archive.io.arc">ARCRecord</A> record, <A HREF="../../../../org/archive/util/HttpRecorder.html" title="class in org.archive.util">HttpRecorder</A> hr) throws org.apache.commons.httpclient.URIException</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>org.apache.commons.httpclient.URIException</CODE></DL></DD></DL><HR><A NAME="main(java.lang.String[])"><!-- --></A><H3>main</H3><PRE>public static void <B>main</B>(java.lang.String[] args) throws java.lang.Exception</PRE><DL><DD><DL><DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE></DL></DD></DL><!-- ========= END OF CLASS DATA ========= --><HR><!-- ======= START OF BOTTOM NAVBAR ====== --><A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="class-use/ExtractorTool.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../org/archive/crawler/extractor/ExtractorSWF.ExtractorTagParser.html" title="class in org.archive.crawler.extractor"><B>PREV CLASS</B></A> <A HREF="../../../../org/archive/crawler/extractor/ExtractorUniversal.html" title="class in org.archive.crawler.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../index.html?org/archive/crawler/extractor/ExtractorTool.html" target="_top"><B>FRAMES</B></A> <A HREF="ExtractorTool.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2"> SUMMARY: NESTED | FIELD | <A HREF="#constructor_summary">CONSTR</A> | <A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL: FIELD | <A HREF="#constructor_detail">CONSTR</A> | <A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= --><HR>Copyright © 2003-2006 Internet Archive. All Rights Reserved.</BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -