regexphtmllinkextractor.html

From "Web Crawler Open Source Code" · HTML code · 820 lines total · page 1 of 3

<A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#RegexpHTMLLinkExtractor()">RegexpHTMLLinkExtractor</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR></TABLE>&nbsp;<!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#findNextLink()">findNextLink</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Scan to the next link(s), if any, loading it into the next buffer.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected static&nbsp;<A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html" title="class in org.archive.extractor">CharSequenceLinkExtractor</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#newDefaultInstance()">newDefaultInstance</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;long</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processEmbed(java.lang.CharSequence, java.lang.CharSequence)">processEmbed</A></B>(java.lang.CharSequence&nbsp;value,             java.lang.CharSequence&nbsp;context)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processGeneralTag(java.lang.CharSequence, java.lang.CharSequence)">processGeneralTag</A></B>(java.lang.CharSequence&nbsp;element,                  java.lang.CharSequence&nbsp;cs)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processLink(java.lang.CharSequence, java.lang.CharSequence)">processLink</A></B>(java.lang.CharSequence&nbsp;value,            java.lang.CharSequence&nbsp;context)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processMeta(java.lang.CharSequence)">processMeta</A></B>(java.lang.CharSequence&nbsp;cs)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processScript(java.lang.CharSequence, 
int)">processScript</A></B>(java.lang.CharSequence&nbsp;sequence,              int&nbsp;endOfOpenTag)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processScriptCode(java.lang.CharSequence)">processScriptCode</A></B>(java.lang.CharSequence&nbsp;cs)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected &nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processStyle(java.lang.CharSequence, int)">processStyle</A></B>(java.lang.CharSequence&nbsp;sequence,             int&nbsp;endOfOpenTag)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#reset()">reset</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Discard all state.</TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_org.archive.extractor.CharSequenceLinkExtractor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.extractor.<A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html" title="class in org.archive.extractor">CharSequenceLinkExtractor</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A 
HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#charSequenceFrom(java.io.InputStream, java.nio.charset.Charset)">charSequenceFrom</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#createCharSequenceFrom(java.io.InputStream, java.nio.charset.Charset)">createCharSequenceFrom</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#extract(java.lang.CharSequence, org.archive.net.UURI, org.archive.net.UURI, java.util.List, org.archive.extractor.ExtractErrorListener)">extract</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#hasNext()">hasNext</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#next()">next</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#nextLink()">nextLink</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#remove()">remove</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, java.lang.CharSequence, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, org.archive.net.UURI, java.lang.CharSequence, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A></CODE></TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class 
java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait</CODE></TD></TR></TABLE>&nbsp;<P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="honorRobots"><!-- --></A><H3>honorRobots</H3><PRE>boolean <B>honorRobots</B></PRE><DL><DL></DL></DL><HR><A NAME="extractInlineCss"><!-- --></A><H3>extractInlineCss</H3><PRE>boolean <B>extractInlineCss</B></PRE><DL><DL></DL></DL><HR><A NAME="extractInlineJs"><!-- --></A><H3>extractInlineJs</H3><PRE>boolean <B>extractInlineJs</B></PRE><DL><DL></DL></DL><HR><A NAME="next"><!-- --></A><H3>next</H3><PRE>protected java.util.LinkedList&lt;<A HREF="../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A>&gt; <B>next</B></PRE><DL><DL></DL></DL><HR><A NAME="tags"><!-- --></A><H3>tags</H3><PRE>protected java.util.regex.Matcher <B>tags</B></PRE><DL><DL></DL></DL><HR><A NAME="RELEVANT_TAG_EXTRACTOR"><!-- --></A><H3>RELEVANT_TAG_EXTRACTOR</H3><PRE>static final java.lang.String <B>RELEVANT_TAG_EXTRACTOR</B></PRE><DL><DD>Compiled relevant tag extractor. 
<p> This pattern extracts one of: <ul><li> (1) a whole &lt;script&gt;...&lt;/script&gt; element, or</li><li> (2) a whole &lt;style&gt;...&lt;/style&gt; element, or</li><li> (3) a &lt;meta ...&gt; tag, or</li><li> (4) any other open tag with at least one attribute (e.g., it matches "&lt;a href='boo'&gt;" but not "&lt;/a&gt;" or "&lt;br&gt;")</li></ul> <p> Capture groups: <ul><li> 1: SCRIPT SRC=foo&gt;boo&lt;/SCRIPT</li><li> 2: just the script open tag</li><li> 3: STYLE TYPE=moo&gt;zoo&lt;/STYLE</li><li> 4: just the style open tag</li><li> 5: the entire other tag, without the enclosing '&lt;' and '&gt;'</li><li> 6: the element name</li><li> 7: META</li><li> 8: !-- comment --</li></ul><P><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.RELEVANT_TAG_EXTRACTOR">Constant Field Values</A></DL></DL><HR><A NAME="EACH_ATTRIBUTE_EXTRACTOR"><!-- --></A><H3>EACH_ATTRIBUTE_EXTRACTOR</H3><PRE>static final java.lang.String <B>EACH_ATTRIBUTE_EXTRACTOR</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.EACH_ATTRIBUTE_EXTRACTOR">Constant Field Values</A></DL></DL><HR><A NAME="LIKELY_URI_PATH"><!-- --></A><H3>LIKELY_URI_PATH</H3><PRE>static final java.lang.String <B>LIKELY_URI_PATH</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.LIKELY_URI_PATH">Constant Field Values</A></DL></DL><HR><A NAME="ESCAPED_AMP"><!-- --></A><H3>ESCAPED_AMP</H3><PRE>static final java.lang.String <B>ESCAPED_AMP</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.ESCAPED_AMP">Constant Field Values</A></DL></DL><HR><A NAME="AMP"><!-- --></A><H3>AMP</H3><PRE>static final java.lang.String <B>AMP</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.AMP">Constant Field Values</A></DL></DL><HR><A NAME="WHITESPACE"><!-- --></A><H3>WHITESPACE</H3><PRE>static final java.lang.String 
<B>WHITESPACE</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.WHITESPACE">Constant Field Values</A></DL></DL><HR><A NAME="CLASSEXT"><!-- --></A><H3>CLASSEXT</H3><PRE>static final java.lang.String <B>CLASSEXT</B></PRE>
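
The capture-group layout documented for RELEVANT_TAG_EXTRACTOR can be illustrated with a small, self-contained sketch. The pattern below is a hypothetical one written only to reproduce the eight documented groups; the actual constant value is listed under Constant Field Values and may differ. The class name `TagGroupSketch` and the method `classify` are likewise assumptions made for illustration and are not part of `org.archive.extractor`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagGroupSketch {
    // Hypothetical pattern mirroring the documented group numbering.
    // It is NOT taken from the library; see Constant Field Values for the real one.
    static final Pattern TAGS = Pattern.compile(
        "(?is)<(?:((script[^>]*+)>.*?</script)"   // 1: whole script, 2: script open tag
        + "|((style[^>]*+)>.*?</style)"            // 3: whole style, 4: style open tag
        + "|(((meta)|(?:\\w+))\\s+[^>]*+)"         // 5: other tag, 6: element, 7: META
        + "|(!--.*?--))>");                        // 8: comment

    // Reports which documented case each match fell into.
    static List<String> classify(CharSequence html) {
        List<String> out = new ArrayList<>();
        Matcher m = TAGS.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("script: " + m.group(2));
            } else if (m.group(3) != null) {
                out.add("style: " + m.group(4));
            } else if (m.group(7) != null) {
                // Check group 7 before group 5, since a META tag sets both.
                out.add("meta: " + m.group(5));
            } else if (m.group(5) != null) {
                out.add("tag " + m.group(6) + ": " + m.group(5));
            } else if (m.group(8) != null) {
                out.add("comment");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href='boo'>x</a><br><!-- c --><meta name=robots>"
            + "<script src=foo>boo</script>";
        for (String line : classify(html)) {
            System.out.println(line);
        }
    }
}
```

In this sketch the closing tag `</a>` and the attribute-less `<br>` produce no match, which is consistent with the behaviour the field description gives for case (4).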
