jerichoextractorhtml.html
From "Web Crawler Open Source Code" · HTML code · 617 lines · page 1 of 4
<!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected long</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#numberOfFormsProcessed">numberOfFormsProcessed</A></B></CODE><BR> </TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.crawler.extractor.ExtractorHTML"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from class org.archive.crawler.extractor.<A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html" title="class in org.archive.crawler.extractor">ExtractorHTML</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#APPLET">APPLET</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#ATTR_EXTRACT_JAVASCRIPT">ATTR_EXTRACT_JAVASCRIPT</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#ATTR_IGNORE_FORM_ACTION_URLS">ATTR_IGNORE_FORM_ACTION_URLS</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#ATTR_IGNORE_UNEXPECTED_HTML">ATTR_IGNORE_UNEXPECTED_HTML</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#ATTR_OVERLY_EAGER_LINK_DETECTION">ATTR_OVERLY_EAGER_LINK_DETECTION</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#ATTR_TREAT_FRAMES_AS_EMBED_LINKS">ATTR_TREAT_FRAMES_AS_EMBED_LINKS</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#BASE">BASE</A>, <A 
HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#CLASSEXT">CLASSEXT</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#EACH_ATTRIBUTE_EXTRACTOR">EACH_ATTRIBUTE_EXTRACTOR</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#FRAME">FRAME</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#IFRAME">IFRAME</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#JAVASCRIPT">JAVASCRIPT</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#LIKELY_URI_PATH">LIKELY_URI_PATH</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#LINK">LINK</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#MAX_ATTR_VAL_LENGTH">MAX_ATTR_VAL_LENGTH</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#NON_HTML_PATH_EXTENSION">NON_HTML_PATH_EXTENSION</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#numberOfCURIsHandled">numberOfCURIsHandled</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#numberOfLinksExtracted">numberOfLinksExtracted</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#RELEVANT_TAG_EXTRACTOR">RELEVANT_TAG_EXTRACTOR</A>, <A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#WHITESPACE">WHITESPACE</A></CODE></TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.crawler.framework.Processor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from class org.archive.crawler.framework.<A HREF="../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A 
HREF="../../../../org/archive/crawler/framework/Processor.html#ATTR_DECIDE_RULES">ATTR_DECIDE_RULES</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#ATTR_ENABLED">ATTR_ENABLED</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#attrDecideRules">attrDecideRules</A></CODE></TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.crawler.settings.ComplexType"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from class org.archive.crawler.settings.<A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings">ComplexType</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/settings/ComplexType.html#definition">definition</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#definitionMap">definitionMap</A></CODE></TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.crawler.datamodel.CoreAttributeConstants"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from interface org.archive.crawler.datamodel.<A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html" title="interface in org.archive.crawler.datamodel">CoreAttributeConstants</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_ANNOTATIONS">A_ANNOTATIONS</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_CONTENT_DIGEST">A_CONTENT_DIGEST</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_CONTENT_TYPE">A_CONTENT_TYPE</A>, <A 
HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_CREDENTIAL_AVATARS_KEY">A_CREDENTIAL_AVATARS_KEY</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_DELAY_FACTOR">A_DELAY_FACTOR</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_DISTANCE_FROM_SEED">A_DISTANCE_FROM_SEED</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_DNS_FETCH_TIME">A_DNS_FETCH_TIME</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_DNS_SERVER_IP_LABEL">A_DNS_SERVER_IP_LABEL</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_ETAG_HEADER">A_ETAG_HEADER</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_FETCH_BEGAN_TIME">A_FETCH_BEGAN_TIME</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_FETCH_COMPLETED_TIME">A_FETCH_COMPLETED_TIME</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_FETCH_HISTORY">A_FETCH_HISTORY</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_FORCE_RETIRE">A_FORCE_RETIRE</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_HERITABLE_KEYS">A_HERITABLE_KEYS</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_HTML_BASE">A_HTML_BASE</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_HTTP_PROXY_HOST">A_HTTP_PROXY_HOST</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_HTTP_PROXY_PORT">A_HTTP_PROXY_PORT</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_HTTP_TRANSACTION">A_HTTP_TRANSACTION</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_LAST_MODIFIED_HEADER">A_LAST_MODIFIED_HEADER</A>, <A 
HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_LOCALIZED_ERRORS">A_LOCALIZED_ERRORS</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_META_ROBOTS">A_META_ROBOTS</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_MINIMUM_DELAY">A_MINIMUM_DELAY</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_MIRROR_PATH">A_MIRROR_PATH</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_PREREQUISITE_URI">A_PREREQUISITE_URI</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_REFERENCE_LENGTH">A_REFERENCE_LENGTH</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_RETRY_DELAY">A_RETRY_DELAY</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_RRECORD_SET_LABEL">A_RRECORD_SET_LABEL</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_RUNTIME_EXCEPTION">A_RUNTIME_EXCEPTION</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_SOURCE_TAG">A_SOURCE_TAG</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#A_STATUS">A_STATUS</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#HEADER_TRUNC">HEADER_TRUNC</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#LENGTH_TRUNC">LENGTH_TRUNC</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#TIMER_TRUNC">TIMER_TRUNC</A>, <A HREF="../../../../org/archive/crawler/datamodel/CoreAttributeConstants.html#TRUNC_SUFFIX">TRUNC_SUFFIX</A></CODE></TD></TR></TABLE> <!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT 
SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#JerichoExtractorHTML(java.lang.String)">JerichoExtractorHTML</A></B>(java.lang.String name)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#JerichoExtractorHTML(java.lang.String, java.lang.String)">JerichoExtractorHTML</A></B>(java.lang.String name, java.lang.String description)</CODE><BR> </TD></TR></TABLE> <!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>(package private) void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#extract(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">extract</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Run extractor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#processForm(org.archive.crawler.datamodel.CrawlURI, au.id.jericho.lib.html.Element)">processForm</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, au.id.jericho.lib.html.Element element)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" 
VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#processGeneralTag(org.archive.crawler.datamodel.CrawlURI, au.id.jericho.lib.html.Element, au.id.jericho.lib.html.Attributes)">processGeneralTag</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, au.id.jericho.lib.html.Element element, au.id.jericho.lib.html.Attributes attributes)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#processMeta(org.archive.crawler.datamodel.CrawlURI, au.id.jericho.lib.html.Element)">processMeta</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, au.id.jericho.lib.html.Element element)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#processScript(org.archive.crawler.datamodel.CrawlURI, au.id.jericho.lib.html.Element)">processScript</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, au.id.jericho.lib.html.Element element)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#processStyle(org.archive.crawler.datamodel.CrawlURI, au.id.jericho.lib.html.Element)">processStyle</A></B>(<A 
HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, au.id.jericho.lib.html.Element element)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/JerichoExtractorHTML.html#report()">report</A></B>()</CODE><BR> Compiles and returns a report (in human readable form) about the status of the processor.</TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.extractor.ExtractorHTML"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.extractor.<A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html" title="class in org.archive.crawler.extractor">ExtractorHTML</A></B></TH>