
📄 html::parser.3

📁 Implementation code for the DDNS protocol module of the networking part of a video surveillance system. Corrections are welcome.
with a \s-1TRUE\s0 argument and disabled with a \s-1FALSE\s0 argument.  The
attribute value is left unchanged if no argument is given.  The return
value from each method is the old attribute value.
.PP
Methods that can be used to get and/or set parser options are:
.ie n .IP "$p\->attr_encoded" 4
.el .IP "\f(CW$p\fR\->attr_encoded" 4
.IX Item "$p->attr_encoded"
.PD 0
.ie n .IP "$p\fR\->attr_encoded( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->attr_encoded( \f(CW$bool\fR )" 4
.IX Item "$p->attr_encoded( $bool )"
.PD
By default, the \f(CW\*(C`attr\*(C'\fR and \f(CW@attr\fR argspecs will have general
entities for attribute values decoded.  Enabling this attribute leaves
entities alone.
.ie n .IP "$p\fR\->boolean_attribute_value( \f(CW$val )" 4
.el .IP "\f(CW$p\fR\->boolean_attribute_value( \f(CW$val\fR )" 4
.IX Item "$p->boolean_attribute_value( $val )"
This method sets the value reported for boolean attributes inside \s-1HTML\s0
start tags.  By default, the name of the attribute is also used as its
value.  This affects the values reported for \f(CW\*(C`tokens\*(C'\fR and \f(CW\*(C`attr\*(C'\fR
argspecs.
.ie n .IP "$p\->case_sensitive" 4
.el .IP "\f(CW$p\fR\->case_sensitive" 4
.IX Item "$p->case_sensitive"
.PD 0
.ie n .IP "$p\fR\->case_sensitive( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->case_sensitive( \f(CW$bool\fR )" 4
.IX Item "$p->case_sensitive( $bool )"
.PD
By default, tagnames and attribute names are down-cased.  Enabling this
attribute leaves them as found in the \s-1HTML\s0 source document.
.ie n .IP "$p\->closing_plaintext" 4
.el .IP "\f(CW$p\fR\->closing_plaintext" 4
.IX Item "$p->closing_plaintext"
.PD 0
.ie n .IP "$p\fR\->closing_plaintext( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->closing_plaintext( \f(CW$bool\fR )" 4
.IX Item "$p->closing_plaintext( $bool )"
.PD
By default, the \*(L"plaintext\*(R" element can never be closed. Everything up to
the end of the document is parsed in \s-1CDATA\s0 mode.  This historical
behaviour is what at least \s-1MSIE\s0 does.
Enabling this attribute makes
the closing \*(L"</plaintext>\*(R" tag effective and the parsing process will resume
after seeing this tag.  This emulates gecko-based browsers.
.ie n .IP "$p\->empty_element_tags" 4
.el .IP "\f(CW$p\fR\->empty_element_tags" 4
.IX Item "$p->empty_element_tags"
.PD 0
.ie n .IP "$p\fR\->empty_element_tags( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->empty_element_tags( \f(CW$bool\fR )" 4
.IX Item "$p->empty_element_tags( $bool )"
.PD
By default, empty element tags are not recognized as such and the \*(L"/\*(R"
before \*(L">\*(R" is just treated like a normal name character (unless
\&\f(CW\*(C`strict_names\*(C'\fR is enabled).  Enabling this attribute makes
\&\f(CW\*(C`HTML::Parser\*(C'\fR recognize these tags.
.Sp
Empty element tags look like start tags, but end with the character
sequence \*(L"/>\*(R" instead of \*(L">\*(R".  When recognized by \f(CW\*(C`HTML::Parser\*(C'\fR they
cause an artificial end event in addition to the start event.  The
\&\f(CW\*(C`text\*(C'\fR for the artificial end event will be empty and the \f(CW\*(C`tokenpos\*(C'\fR
array will be undefined even though the token array will have one
element containing the tag name.
.ie n .IP "$p\->marked_sections" 4
.el .IP "\f(CW$p\fR\->marked_sections" 4
.IX Item "$p->marked_sections"
.PD 0
.ie n .IP "$p\fR\->marked_sections( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->marked_sections( \f(CW$bool\fR )" 4
.IX Item "$p->marked_sections( $bool )"
.PD
By default, section markings like <![CDATA[...]]> are treated like
ordinary text.
When this attribute is enabled section markings are
honoured.
.Sp
There are currently no events associated with the marked section
markup, but the text can be returned as \f(CW\*(C`skipped_text\*(C'\fR.
.ie n .IP "$p\->strict_comment" 4
.el .IP "\f(CW$p\fR\->strict_comment" 4
.IX Item "$p->strict_comment"
.PD 0
.ie n .IP "$p\fR\->strict_comment( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->strict_comment( \f(CW$bool\fR )" 4
.IX Item "$p->strict_comment( $bool )"
.PD
By default, comments are terminated by the first occurrence of \*(L"\-\->\*(R".
This is the behaviour of most popular browsers (like Mozilla, Opera and
\&\s-1MSIE\s0), but it is not correct according to the official \s-1HTML\s0
standard.  Officially, you need an even number of \*(L"\-\-\*(R" tokens before
the closing \*(L">\*(R" is recognized and there may not be anything but
whitespace between an even and an odd \*(L"\-\-\*(R".
.Sp
The official behaviour is enabled by enabling this attribute.
.Sp
Enabling of 'strict_comment' also disables recognizing these forms as
comments:
.Sp
.Vb 2
\&  </ comment>
\&  <! comment>
.Ve
.ie n .IP "$p\->strict_end" 4
.el .IP "\f(CW$p\fR\->strict_end" 4
.IX Item "$p->strict_end"
.PD 0
.ie n .IP "$p\fR\->strict_end( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->strict_end( \f(CW$bool\fR )" 4
.IX Item "$p->strict_end( $bool )"
.PD
By default, attributes and other junk are allowed to be present on end tags in a
manner that emulates \s-1MSIE\s0's behaviour.
.Sp
The official behaviour is enabled with this attribute.
If enabled,
only whitespace is allowed between the tagname and the final \*(L">\*(R".
.ie n .IP "$p\->strict_names" 4
.el .IP "\f(CW$p\fR\->strict_names" 4
.IX Item "$p->strict_names"
.PD 0
.ie n .IP "$p\fR\->strict_names( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->strict_names( \f(CW$bool\fR )" 4
.IX Item "$p->strict_names( $bool )"
.PD
By default, almost anything is allowed in tag and attribute names.
This is the behaviour of most popular browsers and allows us to parse
some broken tags with invalid attribute values like:
.Sp
.Vb 1
\&   <IMG SRC=newprevlstGr.gif ALT=[PREV LIST] BORDER=0>
.Ve
.Sp
By default, \*(L"\s-1LIST\s0]\*(R" is parsed as a boolean attribute, not as
part of the \s-1ALT\s0 value as was clearly intended.  This is also what
Mozilla sees.
.Sp
The official behaviour is enabled by enabling this attribute.  If
enabled, it will cause the tag above to be reported as text
since \*(L"\s-1LIST\s0]\*(R" is not a legal attribute name.
.ie n .IP "$p\->unbroken_text" 4
.el .IP "\f(CW$p\fR\->unbroken_text" 4
.IX Item "$p->unbroken_text"
.PD 0
.ie n .IP "$p\fR\->unbroken_text( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->unbroken_text( \f(CW$bool\fR )" 4
.IX Item "$p->unbroken_text( $bool )"
.PD
By default, blocks of text are given to the text handler as soon as
possible (but the parser takes care always to break text at a
boundary between whitespace and non-whitespace so single words and
entities can always be decoded safely).  This might create breaks that
make it hard to do transformations on the text. When this attribute is
enabled, blocks of text are always reported in one piece.
This will
delay the text event until the following (non-text) event has been
recognized by the parser.
.Sp
Note that the \f(CW\*(C`offset\*(C'\fR argspec will give you the offset of the first
segment of text and \f(CW\*(C`length\*(C'\fR is the combined length of the segments.
Since there might be ignored tags in between, these numbers can't be
used to directly index in the original document file.
.ie n .IP "$p\->utf8_mode" 4
.el .IP "\f(CW$p\fR\->utf8_mode" 4
.IX Item "$p->utf8_mode"
.PD 0
.ie n .IP "$p\fR\->utf8_mode( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->utf8_mode( \f(CW$bool\fR )" 4
.IX Item "$p->utf8_mode( $bool )"
.PD
Enable this option when parsing raw undecoded \s-1UTF\-8\s0.  This tells the
parser that the entities expanded for strings reported by \f(CW\*(C`attr\*(C'\fR,
\&\f(CW@attr\fR and \f(CW\*(C`dtext\*(C'\fR should be expanded as decoded \s-1UTF\-8\s0 so they end
up compatible with the surrounding text.
.Sp
If \f(CW\*(C`utf8_mode\*(C'\fR is enabled then it is an error to pass strings
containing characters with code above 255 to the \fIparse()\fR method, and
the \fIparse()\fR method will croak if you try.
.Sp
Example: The Unicode character \*(L"\ex{2665}\*(R" is \*(L"\exE2\ex99\exA5\*(R" when \s-1UTF\-8\s0
encoded.  The character can also be represented by the entity
\&\*(L"&hearts;\*(R" or \*(L"&#x2665;\*(R".  If we feed the parser:
.Sp
.Vb 1
\&  $p\->parse("\exE2\ex99\exA5&hearts;");
.Ve
.Sp
then \f(CW\*(C`dtext\*(C'\fR will be reported as \*(L"\exE2\ex99\exA5\ex{2665}\*(R" without
\&\f(CW\*(C`utf8_mode\*(C'\fR enabled, but as \*(L"\exE2\ex99\exA5\exE2\ex99\exA5\*(R" when enabled.
The latter string is what you want.
.Sp
This option is only available with perl\-5.8 or better.
.ie n .IP "$p\->xml_mode" 4
.el .IP "\f(CW$p\fR\->xml_mode" 4
.IX Item "$p->xml_mode"
.PD 0
.ie n .IP "$p\fR\->xml_mode( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->xml_mode( \f(CW$bool\fR )" 4
.IX Item "$p->xml_mode( $bool )"
.PD
Enabling this attribute changes the parser to allow some \s-1XML\s0
constructs.
This enables the behaviour controlled individually by
the \f(CW\*(C`case_sensitive\*(C'\fR, \f(CW\*(C`empty_element_tags\*(C'\fR, \f(CW\*(C`strict_names\*(C'\fR and
\&\f(CW\*(C`xml_pic\*(C'\fR attributes and also suppresses special treatment of
elements that are parsed as \s-1CDATA\s0 for \s-1HTML\s0.
.ie n .IP "$p\->xml_pic" 4
.el .IP "\f(CW$p\fR\->xml_pic" 4
.IX Item "$p->xml_pic"
.PD 0
.ie n .IP "$p\fR\->xml_pic( \f(CW$bool )" 4
.el .IP "\f(CW$p\fR\->xml_pic( \f(CW$bool\fR )" 4
.IX Item "$p->xml_pic( $bool )"
.PD
By default, \fIprocessing instructions\fR are terminated by \*(L">\*(R". When
this attribute is enabled, processing instructions are terminated by
\&\*(L"?>\*(R" instead.
.PP
As markup and text are recognized, handlers are invoked.  The following
method is used to set up handlers for different events:
.ie n .IP "$p\fR\->handler( event => \e&subroutine, \f(CW$argspec )" 4
.el .IP "\f(CW$p\fR\->handler( event => \e&subroutine, \f(CW$argspec\fR )" 4
.IX Item "$p->handler( event => &subroutine, $argspec )"
.PD 0
.ie n .IP "$p\fR\->handler( event => \f(CW$method_name\fR, \f(CW$argspec )" 4
.el .IP "\f(CW$p\fR\->handler( event => \f(CW$method_name\fR, \f(CW$argspec\fR )" 4
.IX Item "$p->handler( event => $method_name, $argspec )"
.ie n .IP "$p\fR\->handler( event => \e@accum, \f(CW$argspec )" 4
.el .IP "\f(CW$p\fR\->handler( event => \e@accum, \f(CW$argspec\fR )" 4
.IX Item "$p->handler( event => @accum, $argspec )"
.ie n .IP "$p\->handler( event => """" );" 4
.el .IP "\f(CW$p\fR\->handler( event => ``'' );" 4
.IX Item "$p->handler( event => """" );"
.ie n .IP "$p\->handler( event => undef );" 4
.el .IP "\f(CW$p\fR\->handler( event => undef );" 4
.IX Item "$p->handler( event => undef );"
.ie n .IP "$p\->handler( event );" 4
.el .IP "\f(CW$p\fR\->handler( event );" 4
.IX Item "$p->handler( event );"
.PD
This method assigns a subroutine, method, or array to handle an event.
.Sp
Event is one of \f(CW\*(C`text\*(C'\fR, \f(CW\*(C`start\*(C'\fR, \f(CW\*(C`end\*(C'\fR, \f(CW\*(C`declaration\*(C'\fR,
\f(CW\*(C`comment\*(C'\fR,
\&\f(CW\*(C`process\*(C'\fR, \f(CW\*(C`start_document\*(C'\fR, \f(CW\*(C`end_document\*(C'\fR or \f(CW\*(C`default\*(C'\fR.
.Sp
The \f(CW\*(C`\e&subroutine\*(C'\fR is a reference to a subroutine which is called to handle
the event.
.Sp
The \f(CW$method_name\fR is the name of a method of \f(CW$p\fR which is called to handle
the event.
.Sp
The \f(CW@accum\fR is an array that will hold the event information as
sub-arrays.
.Sp
If the second argument is "", the event is ignored.
If it is undef, the default handler is invoked for the event.
.Sp
The \f(CW$argspec\fR is a string that describes the information to be reported
for the event.  Any requested information that does not apply to a
specific event is passed as \f(CW\*(C`undef\*(C'\fR.  If argspec is omitted, then it
is left unchanged.
.Sp
The return value from \f(CW$p\fR\->handler is the old callback routine or a
reference to the accumulator array.
.Sp
Any return values from handler callback routines/methods are always
ignored.  A handler callback can request parsing to be aborted by
invoking the \f(CW$p\fR\->eof method.  A handler callback is not allowed to
invoke the \f(CW$p\fR\->\fIparse()\fR or \f(CW$p\fR\->\fIparse_file()\fR method.
An exception will
be raised if it tries.
.Sp
Examples:
.Sp
.Vb 1
\&    $p\->handler(start =>  "start", \*(Aqself, attr, attrseq, text\*(Aq );
.Ve
.Sp
This causes the \*(L"start\*(R" method of object \f(CW$p\fR to be called for 'start' events.
The callback signature is \f(CW$p\fR\->start(\e%attr, \e@attr_seq, \f(CW$text\fR).
.Sp
.Vb 1
\&    $p\->handler(start =>  \e&start, \*(Aqattr, attrseq, text\*(Aq );
.Ve
.Sp
This causes subroutine \fIstart()\fR to be called for 'start' events.
The callback signature is start(\e%attr, \e@attr_seq, \f(CW$text\fR).
.Sp
.Vb 1
\&    $p\->handler(start =>  \e@accum, \*(Aq"S", attr, attrseq, text\*(Aq );
.Ve
.Sp
This causes 'start' event information to be saved in \f(CW@accum\fR.
The array elements will be ['S', \e%attr, \e@attr_seq, \f(CW$text\fR].
.Sp
.Vb 1
\&   $p\->handler(start => "");
.Ve
.Sp
This causes 'start' events to be ignored.  It also suppresses
invocations of any default handler for start events.  It is in most
cases equivalent to \f(CW$p\fR\->handler(start => sub {}), but is more
efficient.  It is different from the empty-sub-handler in that
\&\f(CW\*(C`skipped_text\*(C'\fR is not reset by it.
.Sp
.Vb 1
\&   $p\->handler(start => undef);
.Ve
.Sp
This causes no handler to be associated with start events.
If there is a default handler it will be invoked.
.PP
Filters based on tags can be set up to limit the number of events
reported.  The main bottleneck during parsing is often the huge number
of callbacks made from the parser.  Applying filters can improve
performance significantly.
.PP
The following methods control filters:
.ie n .IP "$p\fR\->ignore_elements( \f(CW@tags )" 4
.el .IP "\f(CW$p\fR\->ignore_elements( \f(CW@tags\fR )" 4
.IX Item "$p->ignore_elements( @tags )"
Both the \f(CW\*(C`start\*(C'\fR event and the \f(CW\*(C`end\*(C'\fR event as well as any events that
