
📄 linkextractor.html

📁 Written in Java. Left over from an experiment; I was going to delete it, but I am uploading it to share with everyone instead.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_06) on Wed Sep 27 16:03:14 PDT 2006 --><TITLE>LinkExtractor (Heritrix 1.10.1)</TITLE><META NAME="keywords" CONTENT="org.archive.extractor.LinkExtractor interface"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){    parent.document.title="LinkExtractor (Heritrix 1.10.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/LinkExtractor.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../index-all.html"><FONT 
CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../org/archive/extractor/RegexpCSSLinkExtractor.html" title="class in org.archive.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../index.html?org/archive/extractor/LinkExtractor.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="LinkExtractor.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;FIELD&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;FIELD&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.extractor</FONT><BR>Interface LinkExtractor</H2><DL><DT><B>All Superinterfaces:</B> <DD>java.util.Iterator</DD></DL><DL><DT><B>All Known Implementing Classes:</B> <DD><A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html" title="class in org.archive.extractor">CharSequenceLinkExtractor</A>, <A 
HREF="../../../org/archive/extractor/RegexpCSSLinkExtractor.html" title="class in org.archive.extractor">RegexpCSSLinkExtractor</A>, <A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html" title="class in org.archive.extractor">RegexpHTMLLinkExtractor</A>, <A HREF="../../../org/archive/extractor/RegexpJSLinkExtractor.html" title="class in org.archive.extractor">RegexpJSLinkExtractor</A></DD></DL><HR><DL><DT><PRE>public interface <B>LinkExtractor</B><DT>extends java.util.Iterator</DL></PRE><P>LinkExtractor is a general interface for classes which, when given an InputStream and Charset, can scan for Links and return them via an Iterator interface. Implementors may in fact complete all extraction on the first hasNext(), then trickle Links out from an internal collection, depending on whether the link-extraction technique used is amenable to incremental scanning. ROUGH DRAFT IN PROGRESS / incomplete... untested...<P><P><DL><DT><B>Author:</B></DT>  <DD>gojomo</DD></DL><HR><P><!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/LinkExtractor.html#nextLink()">nextLink</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Alternative to Iterator.next() which returns type Link.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../org/archive/extractor/LinkExtractor.html#reset()">reset</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Discard all state and release any used resources.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/LinkExtractor.html#setup(org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A></B>(<A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;sourceandbase,      java.io.InputStream&nbsp;content,      java.nio.charset.Charset&nbsp;charset,      <A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor">ExtractErrorListener</A>&nbsp;listener)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Convenience version of above for common case where source and base are  same.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/LinkExtractor.html#setup(org.archive.net.UURI, org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A></B>(<A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;source,      <A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;base,      java.io.InputStream&nbsp;content,      java.nio.charset.Charset&nbsp;charset,      <A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor">ExtractErrorListener</A>&nbsp;listener)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Setup the LinkExtractor to operate on the given stream and charset, 
considering the given contextURI as the initial 'base' URI for resolving relative URIs.</TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_java.util.Iterator"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from interface java.util.Iterator</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>hasNext, next, remove</CODE></TD></TR></TABLE>&nbsp;<P><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="setup(org.archive.net.UURI, org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)"><!-- --></A><H3>setup</H3><PRE>void <B>setup</B>(<A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;source,           <A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;base,           java.io.InputStream&nbsp;content,           java.nio.charset.Charset&nbsp;charset,           <A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor">ExtractErrorListener</A>&nbsp;listener)</PRE><DL><DD>Setup the LinkExtractor to operate on the given stream and charset, considering the given contextURI as the initial 'base' URI for resolving relative URIs. 
May be called to 'reset' a LinkExtractor to start with new input.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>source</CODE> - source URI<DD><CODE>base</CODE> - base URI (usually the source URI) for URI derelativizing<DD><CODE>content</CODE> - input stream of content to scan for links<DD><CODE>charset</CODE> - Charset to consult to decode stream to characters<DD><CODE>listener</CODE> - ExtractErrorListener to notify, rather than raising   exception through extraction loop</DL></DD></DL><HR><A NAME="setup(org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)"><!-- --></A><H3>setup</H3><PRE>void <B>setup</B>(<A HREF="../../../org/archive/net/UURI.html" title="class in org.archive.net">UURI</A>&nbsp;sourceandbase,           java.io.InputStream&nbsp;content,           java.nio.charset.Charset&nbsp;charset,           <A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor">ExtractErrorListener</A>&nbsp;listener)</PRE><DL><DD>Convenience version of above for common case where source and base are  same.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>sourceandbase</CODE> - URI to use as source and base for derelativizing<DD><CODE>content</CODE> - input stream of content to scan for links<DD><CODE>charset</CODE> - Charset to consult to decode stream to characters<DD><CODE>listener</CODE> - ExtractErrorListener to notify, rather than raising   exception through extraction loop</DL></DD></DL><HR><A NAME="nextLink()"><!-- --></A><H3>nextLink</H3><PRE><A HREF="../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A> <B>nextLink</B>()</PRE><DL><DD>Alternative to Iterator.next() which returns type Link.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>a discovered Link</DL></DD></DL><HR><A NAME="reset()"><!-- --></A><H3>reset</H3><PRE>void <B>reset</B>()</PRE><DL><DD>Discard all state and 
release any used resources.<P><DD><DL></DL></DD><DD><DL></DL></DD></DL><!-- ========= END OF CLASS DATA ========= --><HR><!-- ======= START OF BOTTOM NAVBAR ====== --><A NAME="navbar_bottom"><!-- --></A><A HREF="#skip-navbar_bottom" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_bottom_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="class-use/LinkExtractor.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../org/archive/extractor/ExtractErrorListener.html" title="interface in org.archive.extractor"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A 
HREF="../../../org/archive/extractor/RegexpCSSLinkExtractor.html" title="class in org.archive.extractor"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../index.html?org/archive/extractor/LinkExtractor.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="LinkExtractor.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;FIELD&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;FIELD&nbsp;|&nbsp;CONSTR&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_bottom"></A><!-- ======== END OF BOTTOM NAVBAR ======= --><HR>Copyright &copy; 2003-2006 Internet Archive. All Rights Reserved.</BODY></HTML>
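
The interface description above notes that implementors may complete all extraction on the first hasNext() and then trickle links out from an internal collection. The following is a minimal sketch of that pattern, not the Heritrix implementation. It assumes stand-in types so that it compiles without Heritrix: java.net.URI replaces both UURI and Link, the ExtractErrorListener parameter is omitted, and the class name, the href-only regular expression, and the demo helper are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a LinkExtractor implementation (assumption, not Heritrix code).
public class EagerLinkExtractorSketch implements Iterator<URI> {
    // Simplified assumption: only double-quoted href attributes count as links.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    private URI base;
    private InputStream content;
    private Charset charset;
    private Queue<URI> pending; // stays null until the first hasNext() call

    // Mirrors the four-argument setup(...) in spirit; the listener argument is left out.
    public void setup(URI sourceAndBase, InputStream content, Charset charset) {
        reset();
        this.base = sourceAndBase;
        this.content = content;
        this.charset = charset;
    }

    // Discard all state so the instance can be set up again with new input.
    public void reset() {
        base = null;
        content = null;
        charset = null;
        pending = null;
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            pending = extractAll(); // all extraction happens here, once
        }
        return !pending.isEmpty();
    }

    @Override
    public URI next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.remove();
    }

    // Typed alternative to next(), analogous to nextLink() in the interface.
    public URI nextLink() {
        return next();
    }

    private Queue<URI> extractAll() {
        Queue<URI> found = new ArrayDeque<>();
        if (content == null) {
            return found; // setup(...) was never called, or reset() ran
        }
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(content, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            return found; // a real implementation would notify its error listener
        }
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            try {
                found.add(base.resolve(m.group(1))); // derelativize against the base URI
            } catch (IllegalArgumentException e) {
                // skip values that are not valid URI references
            }
        }
        return found;
    }

    // Hypothetical helper: runs one extraction over a string and collects the results.
    public static List<String> demo(String html) {
        EagerLinkExtractorSketch extractor = new EagerLinkExtractorSketch();
        extractor.setup(URI.create("http://example.org/dir/page.html"),
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        while (extractor.hasNext()) {
            out.add(extractor.nextLink().toString());
        }
        extractor.reset();
        return out;
    }
}
```

Deferring the work to the first hasNext() keeps setup cheap and lets a caller discard an extractor it never iterates. An incremental implementation would instead scan only far enough to find the next link on each call.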
