extractorhtml.html
From "Web Crawler Open-Source Code" · HTML code · 1,082 lines total · page 1 of 5
<BR> Handle generic HREF cases.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#processMeta(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">processMeta</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Process metadata tags.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#processScript(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, int)">processScript</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> </TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#processScriptCode(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence)">processScriptCode</A></B>(<A HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence cs)</CODE><BR> Extract the (java)script source in the given CharSequence.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>protected void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#processStyle(org.archive.crawler.datamodel.CrawlURI, java.lang.CharSequence, int)">processStyle</A></B>(<A 
HREF="../../../../org/archive/crawler/datamodel/CrawlURI.html" title="class in org.archive.crawler.datamodel">CrawlURI</A> curi, java.lang.CharSequence sequence, int endOfOpenTag)</CODE><BR> Process style text.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../org/archive/crawler/extractor/ExtractorHTML.html#report()">report</A></B>()</CODE><BR> Compiles and returns a report (in human readable form) about the status of the processor.</TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.extractor.Extractor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.extractor.<A HREF="../../../../org/archive/crawler/extractor/Extractor.html" title="class in org.archive.crawler.extractor">Extractor</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/extractor/Extractor.html#innerProcess(org.archive.crawler.datamodel.CrawlURI)">innerProcess</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.framework.Processor"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.framework.<A HREF="../../../../org/archive/crawler/framework/Processor.html" title="class in org.archive.crawler.framework">Processor</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/framework/Processor.html#checkForInterrupt()">checkForInterrupt</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#finalTasks()">finalTasks</A>, <A 
HREF="../../../../org/archive/crawler/framework/Processor.html#getController()">getController</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#getDecideRule(java.lang.Object)">getDecideRule</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#getDefaultNextProcessor(org.archive.crawler.datamodel.CrawlURI)">getDefaultNextProcessor</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#initialTasks()">initialTasks</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#innerRejectProcess(org.archive.crawler.datamodel.CrawlURI)">innerRejectProcess</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#isContentToProcess(org.archive.crawler.datamodel.CrawlURI)">isContentToProcess</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#isExpectedMimeType(java.lang.String, java.lang.String)">isExpectedMimeType</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#isHttpTransactionContentToProcess(org.archive.crawler.datamodel.CrawlURI)">isHttpTransactionContentToProcess</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#kickUpdate()">kickUpdate</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#process(org.archive.crawler.datamodel.CrawlURI)">process</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#rulesAccept(org.archive.crawler.deciderules.DecideRule, java.lang.Object)">rulesAccept</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#rulesAccept(java.lang.Object)">rulesAccept</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#setDefaultNextProcessor(org.archive.crawler.framework.Processor)">setDefaultNextProcessor</A>, <A HREF="../../../../org/archive/crawler/framework/Processor.html#spawn(int)">spawn</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.settings.ModuleType"><!-- --></A><TABLE BORDER="1" WIDTH="100%" 
CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.settings.<A HREF="../../../../org/archive/crawler/settings/ModuleType.html" title="class in org.archive.crawler.settings">ModuleType</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/settings/ModuleType.html#addElement(org.archive.crawler.settings.CrawlerSettings, org.archive.crawler.settings.Type)">addElement</A>, <A HREF="../../../../org/archive/crawler/settings/ModuleType.html#listUsedFiles(java.util.List)">listUsedFiles</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.settings.ComplexType"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.settings.<A HREF="../../../../org/archive/crawler/settings/ComplexType.html" title="class in org.archive.crawler.settings">ComplexType</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/settings/ComplexType.html#addElementToDefinition(org.archive.crawler.settings.Type)">addElementToDefinition</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#checkValue(org.archive.crawler.settings.CrawlerSettings, java.lang.String, java.lang.Object)">checkValue</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#earlyInitialize(org.archive.crawler.settings.CrawlerSettings)">earlyInitialize</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAbsoluteName()">getAbsoluteName</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.Object, java.lang.String)">getAttribute</A>, <A 
HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String)">getAttribute</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttribute(java.lang.String, org.archive.crawler.datamodel.CrawlURI)">getAttribute</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttributeInfo(org.archive.crawler.settings.CrawlerSettings, java.lang.String)">getAttributeInfo</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttributeInfo(java.lang.String)">getAttributeInfo</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttributeInfoIterator(java.lang.Object)">getAttributeInfoIterator</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getAttributes(java.lang.String[])">getAttributes</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getDataContainerRecursive(org.archive.crawler.settings.ComplexType.Context)">getDataContainerRecursive</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getDataContainerRecursive(org.archive.crawler.settings.ComplexType.Context, java.lang.String)">getDataContainerRecursive</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getDefaultValue()">getDefaultValue</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getDescription()">getDescription</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getElementFromDefinition(java.lang.String)">getElementFromDefinition</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getLegalValues()">getLegalValues</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getLocalAttribute(org.archive.crawler.settings.CrawlerSettings, java.lang.String)">getLocalAttribute</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getMBeanInfo()">getMBeanInfo</A>, <A 
HREF="../../../../org/archive/crawler/settings/ComplexType.html#getMBeanInfo(java.lang.Object)">getMBeanInfo</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getParent()">getParent</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getPreservedFields()">getPreservedFields</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getSettingsHandler()">getSettingsHandler</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getUncheckedAttribute(java.lang.Object, java.lang.String)">getUncheckedAttribute</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#getValue()">getValue</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#globalSettings()">globalSettings</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#invoke(java.lang.String, java.lang.Object[], java.lang.String[])">invoke</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#isInitialized()">isInitialized</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#isOverridden(org.archive.crawler.settings.CrawlerSettings, java.lang.String)">isOverridden</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#iterator(java.lang.Object)">iterator</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#removeElementFromDefinition(java.lang.String)">removeElementFromDefinition</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAsOrder(org.archive.crawler.settings.SettingsHandler)">setAsOrder</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAttribute(javax.management.Attribute)">setAttribute</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAttribute(org.archive.crawler.settings.CrawlerSettings, javax.management.Attribute)">setAttribute</A>, <A 
HREF="../../../../org/archive/crawler/settings/ComplexType.html#setAttributes(javax.management.AttributeList)">setAttributes</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setDescription(java.lang.String)">setDescription</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#setPreservedFields(java.lang.String[])">setPreservedFields</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#toString()">toString</A>, <A HREF="../../../../org/archive/crawler/settings/ComplexType.html#unsetAttribute(org.archive.crawler.settings.CrawlerSettings, java.lang.String)">unsetAttribute</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_org.archive.crawler.settings.Type"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class org.archive.crawler.settings.<A HREF="../../../../org/archive/crawler/settings/Type.html" title="class in org.archive.crawler.settings">Type</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/crawler/settings/Type.html#addConstraint(org.archive.crawler.settings.Constraint)">addConstraint</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#equals(java.lang.Object)">equals</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#getConstraints()">getConstraints</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#getLegalValueType()">getLegalValueType</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#isExpertSetting()">isExpertSetting</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#isOverrideable()">isOverrideable</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#isTransient()">isTransient</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#setExpertSetting(boolean)">setExpertSetting</A>, <A 
HREF="../../../../org/archive/crawler/settings/Type.html#setLegalValueType(java.lang.Class)">setLegalValueType</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#setOverrideable(boolean)">setOverrideable</A>, <A HREF="../../../../org/archive/crawler/settings/Type.html#setTransient(boolean)">setTransient</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_javax.management.Attribute"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class javax.management.Attribute</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>getName</CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait</CODE></TD></TR></TABLE> <P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="RELEVANT_TAG_EXTRACTOR"><!-- --></A><H3>RELEVANT_TAG_EXTRACTOR</H3><PRE>static final java.lang.String <B>RELEVANT_TAG_EXTRACTOR</B></PRE><DL><DL></DL></DL><HR><A NAME="MAX_ATTR_VAL_LENGTH"><!-- --></A><H3>MAX_ATTR_VAL_LENGTH</H3><PRE>static final int <B>MAX_ATTR_VAL_LENGTH</B></PRE><DL><DL></DL></DL><HR><A NAME="EACH_ATTRIBUTE_EXTRACTOR"><!-- --></A><H3>EACH_ATTRIBUTE_EXTRACTOR</H3><PRE>static final java.lang.String <B>EACH_ATTRIBUTE_EXTRACTOR</B></PRE><DL><DL></DL></DL><HR><A NAME="LIKELY_URI_PATH"><!-- 
--></A><H3>LIKELY_URI_PATH</H3><PRE>static final java.lang.String <B>LIKELY_URI_PATH</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.LIKELY_URI_PATH">Constant Field Values</A></DL></DL><HR><A NAME="WHITESPACE"><!-- --></A><H3>WHITESPACE</H3><PRE>static final java.lang.String <B>WHITESPACE</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.WHITESPACE">Constant Field Values</A></DL></DL><HR><A NAME="CLASSEXT"><!-- --></A><H3>CLASSEXT</H3><PRE>static final java.lang.String <B>CLASSEXT</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.CLASSEXT">Constant Field Values</A></DL></DL><HR><A NAME="APPLET"><!-- --></A><H3>APPLET</H3><PRE>static final java.lang.String <B>APPLET</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.APPLET">Constant Field Values</A></DL></DL><HR><A NAME="BASE"><!-- --></A><H3>BASE</H3><PRE>static final java.lang.String <B>BASE</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.BASE">Constant Field Values</A></DL></DL><HR><A NAME="LINK"><!-- --></A><H3>LINK</H3><PRE>static final java.lang.String <B>LINK</B></PRE><DL><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#org.archive.crawler.extractor.ExtractorHTML.LINK">Constant Field Values</A></DL></DL><HR>