extractorhtml.html

From "Web Crawler Open-Source Code" · HTML code · 1,082 lines total · page 1 of 5

<A NAME="FRAME"><!-- --></A><H3>FRAME</H3>
<PRE>static final java.lang.String <B>FRAME</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.FRAME">Constant Field Values</A></DL></DL>
<HR>
<A NAME="IFRAME"><!-- --></A><H3>IFRAME</H3>
<PRE>static final java.lang.String <B>IFRAME</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.IFRAME">Constant Field Values</A></DL></DL>
<HR>
<A NAME="ATTR_TREAT_FRAMES_AS_EMBED_LINKS"><!-- --></A><H3>ATTR_TREAT_FRAMES_AS_EMBED_LINKS</H3>
<PRE>public static final java.lang.String <B>ATTR_TREAT_FRAMES_AS_EMBED_LINKS</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.ATTR_TREAT_FRAMES_AS_EMBED_LINKS">Constant Field Values</A></DL></DL>
<HR>
<A NAME="ATTR_IGNORE_FORM_ACTION_URLS"><!-- --></A><H3>ATTR_IGNORE_FORM_ACTION_URLS</H3>
<PRE>public static final java.lang.String <B>ATTR_IGNORE_FORM_ACTION_URLS</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.ATTR_IGNORE_FORM_ACTION_URLS">Constant Field Values</A></DL></DL>
<HR>
<A NAME="ATTR_EXTRACT_JAVASCRIPT"><!-- --></A><H3>ATTR_EXTRACT_JAVASCRIPT</H3>
<PRE>public static final java.lang.String <B>ATTR_EXTRACT_JAVASCRIPT</B></PRE>
<DL><DD>whether to try finding links in JavaScript; default true<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.ATTR_EXTRACT_JAVASCRIPT">Constant Field Values</A></DL></DL>
<HR>
<A NAME="ATTR_OVERLY_EAGER_LINK_DETECTION"><!-- --></A><H3>ATTR_OVERLY_EAGER_LINK_DETECTION</H3>
<PRE>public static final java.lang.String <B>ATTR_OVERLY_EAGER_LINK_DETECTION</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.ATTR_OVERLY_EAGER_LINK_DETECTION">Constant Field Values</A></DL></DL>
<HR>
<A NAME="ATTR_IGNORE_UNEXPECTED_HTML"><!-- --></A><H3>ATTR_IGNORE_UNEXPECTED_HTML</H3>
<PRE>public static final java.lang.String <B>ATTR_IGNORE_UNEXPECTED_HTML</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.ATTR_IGNORE_UNEXPECTED_HTML">Constant Field Values</A></DL></DL>
<HR>
<A NAME="numberOfCURIsHandled"><!-- --></A><H3>numberOfCURIsHandled</H3>
<PRE>protected long <B>numberOfCURIsHandled</B></PRE>
<DL><DL></DL></DL>
<HR>
<A NAME="numberOfLinksExtracted"><!-- --></A><H3>numberOfLinksExtracted</H3>
<PRE>protected long <B>numberOfLinksExtracted</B></PRE>
<DL><DL></DL></DL>
<HR>
<A NAME="JAVASCRIPT"><!-- --></A><H3>JAVASCRIPT</H3>
<PRE>static final java.lang.String <B>JAVASCRIPT</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.JAVASCRIPT">Constant Field Values</A></DL></DL>
<HR>
<A NAME="NON_HTML_PATH_EXTENSION"><!-- --></A><H3>NON_HTML_PATH_EXTENSION</H3>
<PRE>static final java.lang.String <B>NON_HTML_PATH_EXTENSION</B></PRE>
<DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.NON_HTML_PATH_EXTENSION">Constant Field Values</A></DL></DL>

<!-- ========= CONSTRUCTOR DETAIL ======== -->
<A NAME="constructor_detail"><!-- --></A>
<TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY="">
<TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR>
</TABLE>
<A NAME="ExtractorHTML(java.lang.String)"><!-- --></A><H3>ExtractorHTML</H3>
<PRE>public <B>ExtractorHTML</B>(java.lang.String&nbsp;name)</PRE>
<DL></DL>
<HR>
<A NAME="ExtractorHTML(java.lang.String, java.lang.String)"><!-- --></A><H3>ExtractorHTML</H3>
<PRE>public <B>ExtractorHTML</B>(java.lang.String&nbsp;name,
                     java.lang.String&nbsp;description)</PRE>
<DL></DL>

<!-- ============ METHOD DETAIL ========== -->
<A NAME="method_detail"><!-- --></A>
<TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY="">
<TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR>
</TABLE>
<A NAME="processGeneralTag(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)"><!-- --></A><H3>processGeneralTag</H3>
<PRE>protected void <B>processGeneralTag</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,
                                 java.lang.CharSequence&nbsp;element,
                                 java.lang.CharSequence&nbsp;cs)</PRE>
<DL><DD><DL></DL></DD><DD><DL></DL></DD></DL>
<HR>
<A NAME="processScriptCode(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)"><!-- --></A><H3>processScriptCode</H3>
<PRE>protected void <B>processScriptCode</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,
                                 java.lang.CharSequence&nbsp;cs)</PRE>
<DL><DD>Extract the (java)script source in the given CharSequence.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - source CrawlURI<DD><CODE>cs</CODE> - CharSequence of javascript code</DL></DD></DL>
<HR>
<A NAME="processLink(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)"><!-- --></A><H3>processLink</H3>
<PRE>protected void <B>processLink</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,
                           java.lang.CharSequence&nbsp;value,
                           java.lang.CharSequence&nbsp;context)</PRE>
<DL><DD>Handle generic HREF cases.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>curi</CODE> - <DD><CODE>value</CODE> - <DD><CODE>context</CODE> - </DL></DD></DL>
<HR>
<A NAME="processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence)"><!-- --></A><H3>processEmbed</H3>
<PRE>protected final void <B>processEmbed</B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A>&nbsp;curi,
                                  java.lang.CharSequence&nbsp;value,
                                  java.lang.CharSequence&nbsp;context)</PRE>
<DL><DD><DL></DL></DD><DD><DL></DL></DD></DL>
<HR>
<A NAME="processEmbed(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, java.lang.CharSequence, char)"><!-- --></A><H3>processEmbed</H3>
