<A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#RegexpHTMLLinkExtractor()">RegexpHTMLLinkExtractor</A></B>()</CODE><BR> </TD></TR></TABLE> <!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#findNextLink()">findNextLink</A></B>()</CODE><BR> Scan to the next link(s), if any, loading them into the next buffer.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected static <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html" title="class in org.archive.extractor">CharSequenceLinkExtractor</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#newDefaultInstance()">newDefaultInstance</A></B>()</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processEmbed(java.lang.CharSequence, java.lang.CharSequence)">processEmbed</A></B>(java.lang.CharSequence value, java.lang.CharSequence context)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" 
VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processGeneralTag(java.lang.CharSequence, java.lang.CharSequence)">processGeneralTag</A></B>(java.lang.CharSequence element, java.lang.CharSequence cs)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processLink(java.lang.CharSequence, java.lang.CharSequence)">processLink</A></B>(java.lang.CharSequence value, java.lang.CharSequence context)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processMeta(java.lang.CharSequence)">processMeta</A></B>(java.lang.CharSequence cs)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processScript(java.lang.CharSequence, int)">processScript</A></B>(java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processScriptCode(java.lang.CharSequence)">processScriptCode</A></B>(java.lang.CharSequence cs)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#processStyle(java.lang.CharSequence, int)">processStyle</A></B>(java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../org/archive/extractor/RegexpHTMLLinkExtractor.html#reset()">reset</A></B>()</CODE><BR> Discard all state.</TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.extractor.CharSequenceLinkExtractor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.extractor.<A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html" title="class in org.archive.extractor">CharSequenceLinkExtractor</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#charSequenceFrom(java.io.InputStream, java.nio.charset.Charset)">charSequenceFrom</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#createCharSequenceFrom(java.io.InputStream, java.nio.charset.Charset)">createCharSequenceFrom</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#extract(java.lang.CharSequence, org.archive.net.UURI, org.archive.net.UURI, java.util.List, org.archive.extractor.ExtractErrorListener)">extract</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#hasNext()">hasNext</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#next()">next</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#nextLink()">nextLink</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#remove()">remove</A>, <A 
HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, java.lang.CharSequence, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, org.archive.net.UURI, java.lang.CharSequence, org.archive.extractor.ExtractErrorListener)">setup</A>, <A HREF="../../../org/archive/extractor/CharSequenceLinkExtractor.html#setup(org.archive.net.UURI, org.archive.net.UURI, java.io.InputStream, java.nio.charset.Charset, org.archive.extractor.ExtractErrorListener)">setup</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait</CODE></TD></TR></TABLE> <P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="honorRobots"><!-- --></A><H3>honorRobots</H3><PRE>boolean <B>honorRobots</B></PRE><DL><DL></DL></DL><HR><A NAME="extractInlineCss"><!-- --></A><H3>extractInlineCss</H3><PRE>boolean <B>extractInlineCss</B></PRE><DL><DL></DL></DL><HR><A NAME="extractInlineJs"><!-- --></A><H3>extractInlineJs</H3><PRE>boolean <B>extractInlineJs</B></PRE><DL><DL></DL></DL><HR><A NAME="next"><!-- --></A><H3>next</H3><PRE>protected 
java.util.LinkedList&lt;<A HREF="../../../org/archive/crawler/extractor/Link.html" title="class in org.archive.crawler.extractor">Link</A>&gt; <B>next</B></PRE><DL><DL></DL></DL><HR><A NAME="tags"><!-- --></A><H3>tags</H3><PRE>protected java.util.regex.Matcher <B>tags</B></PRE><DL><DL></DL></DL><HR><A NAME="RELEVANT_TAG_EXTRACTOR"><!-- --></A><H3>RELEVANT_TAG_EXTRACTOR</H3><PRE>static final java.lang.String <B>RELEVANT_TAG_EXTRACTOR</B></PRE><DL><DD>Regular expression used to extract the relevant tags. <p> This pattern matches one of: <ul><li> (1) a whole &lt;script&gt;...&lt;/script&gt; element, <li> (2) a whole &lt;style&gt;...&lt;/style&gt; element, <li> (3) a &lt;meta ...&gt; tag, or <li> (4) any other open tag with at least one attribute (e.g. it matches "&lt;a href='boo'&gt;" but not "&lt;/a&gt;" or "&lt;br&gt;").</ul> <p> Groups: <ul><li> 1: SCRIPT SRC=foo&gt;boo&lt;/SCRIPT <li> 2: just the script open tag <li> 3: STYLE TYPE=moo&gt;zoo&lt;/STYLE <li> 4: just the style open tag <li> 5: the entire other tag, without '&lt;' '&gt;' <li> 6: element <li> 7: META <li> 8: !-- comment --</ul><P><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.RELEVANT_TAG_EXTRACTOR">Constant Field Values</A></DL></DL><HR><A NAME="EACH_ATTRIBUTE_EXTRACTOR"><!-- --></A><H3>EACH_ATTRIBUTE_EXTRACTOR</H3><PRE>static final java.lang.String <B>EACH_ATTRIBUTE_EXTRACTOR</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.EACH_ATTRIBUTE_EXTRACTOR">Constant Field Values</A></DL></DL><HR><A NAME="LIKELY_URI_PATH"><!-- --></A><H3>LIKELY_URI_PATH</H3><PRE>static final java.lang.String <B>LIKELY_URI_PATH</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.LIKELY_URI_PATH">Constant Field Values</A></DL></DL><HR><A NAME="ESCAPED_AMP"><!-- --></A><H3>ESCAPED_AMP</H3><PRE>static final java.lang.String <B>ESCAPED_AMP</B></PRE><DL><DL><DT><B>See Also:</B><DD><A 
HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.ESCAPED_AMP">Constant Field Values</A></DL></DL><HR><A NAME="AMP"><!-- --></A><H3>AMP</H3><PRE>static final java.lang.String <B>AMP</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.AMP">Constant Field Values</A></DL></DL><HR><A NAME="WHITESPACE"><!-- --></A><H3>WHITESPACE</H3><PRE>static final java.lang.String <B>WHITESPACE</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../constant-values.html#org.archive.extractor.RegexpHTMLLinkExtractor.WHITESPACE">Constant Field Values</A></DL></DL><HR><A NAME="CLASSEXT"><!-- --></A><H3>CLASSEXT</H3><PRE>static final java.lang.String <B>CLASSEXT</B></PRE>
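<P>The group layout documented for RELEVANT_TAG_EXTRACTOR can be illustrated with a short Java sketch. The pattern in it is a hypothetical simplification written for illustration, not the actual value of the constant (see Constant Field Values for that); only its group numbering follows the documented list. The class RelevantTagSketch and its method classify are illustrative names and are not part of org.archive.extractor.</P>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not part of org.archive.extractor): a simplified
 * relevant-tag pattern whose capture groups follow the numbering documented
 * for RELEVANT_TAG_EXTRACTOR. It is NOT the library's actual constant.
 */
final class RelevantTagSketch {

    // (?i) = case-insensitive, (?s) = '.' also matches line terminators.
    static final Pattern RELEVANT_TAG = Pattern.compile(
        "(?is)<(?:((script[^>]*+)>.*?</script)"   // 1: whole script element, 2: script open tag
        + "|((style[^>]*+)>.*?</style)"           // 3: whole style element, 4: style open tag
        + "|(((meta)|(?:\\w+))\\s+[^>]*+)"        // 5: other tag, 6: element name, 7: META
        + "|(!--.*?--))>");                       // 8: comment

    /** Returns one label per match, based on which capture group fired. */
    static List<String> classify(CharSequence html) {
        List<String> labels = new ArrayList<>();
        Matcher m = RELEVANT_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                labels.add("script: " + m.group(2));
            } else if (m.group(3) != null) {
                labels.add("style: " + m.group(4));
            } else if (m.group(7) != null) {
                // Check META before the generic tag: group 5 is also set here.
                labels.add("meta: " + m.group(5));
            } else if (m.group(5) != null) {
                labels.add("tag " + m.group(6) + ": " + m.group(5));
            } else {
                labels.add("comment: " + m.group(8));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String html = "<html><head><script src=\"a.js\"></script>"
            + "<style type=\"text/css\">p{}</style>"
            + "<meta http-equiv=\"refresh\" content=\"0; url=/x\"></head>"
            + "<body><a href='boo'>x</a><br><!-- note --></body></html>";
        for (String label : classify(html)) {
            System.out.println(label);
        }
    }
}
```

<P>Running main prints one label each for the script element, the style element, the meta tag, the anchor open tag and the comment. Attribute-less tags such as &lt;html&gt; and &lt;br&gt;, and closing tags such as &lt;/a&gt;, produce no match, in line with the description of pattern case (4).</P>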