📄 html::parser.3
.Vb 1
\&  </A>
.Ve
.ie n .IP """end_document""" 4
.el .IP "\f(CWend_document\fR" 4
.IX Item "end_document"
This event is triggered when \f(CW$p\fR\->eof is called and after any remaining
text is flushed.  There is no document text associated with this event.
.ie n .IP """process""" 4
.el .IP "\f(CWprocess\fR" 4
.IX Item "process"
This event is triggered when processing instruction markup is
recognized.
.Sp
The format and content of processing instructions are system and
application dependent.
.Sp
Examples:
.Sp
.Vb 2
\&  <? HTML processing instructions >
\&  <? XML processing instructions ?>
.Ve
.ie n .IP """start""" 4
.el .IP "\f(CWstart\fR" 4
.IX Item "start"
This event is triggered when a start tag is recognized.
.Sp
Example:
.Sp
.Vb 1
\&  <A HREF="http://www.perl.com/">
.Ve
.ie n .IP """start_document""" 4
.el .IP "\f(CWstart_document\fR" 4
.IX Item "start_document"
This event is triggered before any other events for a new document.  A
handler for it can be used to initialize stuff.  There is no document
text associated with this event.
.ie n .IP """text""" 4
.el .IP "\f(CWtext\fR" 4
.IX Item "text"
This event is triggered when plain text (characters) is recognized.
The text may contain multiple lines.  A sequence of text may be broken
between several text events unless \f(CW$p\fR\->unbroken_text is enabled.
.Sp
The parser will make sure that it does not break a word or a sequence
of whitespace between two text events.
.Sh "Unicode"
.IX Subsection "Unicode"
The \f(CW\*(C`HTML::Parser\*(C'\fR can parse Unicode strings when running under
perl\-5.8 or better.  If Unicode is passed to \f(CW$p\fR\->\fIparse()\fR then chunks
of Unicode will be reported to the handlers.  The offset and length
argspecs will also report their position in terms of characters.
.PP
It is safe to parse raw undecoded \s-1UTF\-8\s0 if you either avoid decoding
entities and make sure to not use \fIargspecs\fR that do, or enable the
\&\f(CW\*(C`utf8_mode\*(C'\fR for the parser.  Parsing of undecoded \s-1UTF\-8\s0 might be
useful when parsing from a file where you need the reported offsets
and lengths to match the byte offsets in the file.
.PP
If a filename is passed to \f(CW$p\fR\->\fIparse_file()\fR then the file will be read
in binary mode.  This will be fine if the file contains only \s-1ASCII\s0 or
Latin\-1 characters.  If the file contains \s-1UTF\-8\s0 encoded text then care
must be taken when decoding entities as described in the previous
paragraph, but better is to open the file with the \s-1UTF\-8\s0 layer so that
it is decoded properly:
.PP
.Vb 2
\&   open(my $fh, "<:utf8", "index.html") || die "...: $!";
\&   $p\->parse_file($fh);
.Ve
.PP
If the file contains text encoded in a charset besides \s-1ASCII\s0, Latin\-1
or \s-1UTF\-8\s0 then decoding will always be needed.
.SH "VERSION 2 COMPATIBILITY"
.IX Header "VERSION 2 COMPATIBILITY"
When an \f(CW\*(C`HTML::Parser\*(C'\fR object is constructed with no arguments, a set
of handlers is automatically provided that is compatible with the old
HTML::Parser version 2 callback methods.
.PP
This is equivalent to the following method calls:
.PP
.Vb 10
\&   $p\->handler(start   => "start",   "self, tagname, attr, attrseq, text");
\&   $p\->handler(end     => "end",     "self, tagname, text");
\&   $p\->handler(text    => "text",    "self, text, is_cdata");
\&   $p\->handler(process => "process", "self, token0, text");
\&   $p\->handler(comment =>
\&             sub {
\&                 my($self, $tokens) = @_;
\&                 for (@$tokens) {$self\->comment($_);}},
\&             "self, tokens");
\&   $p\->handler(declaration =>
\&             sub {
\&                 my $self = shift;
\&                 $self\->declaration(substr($_[0], 2, \-1));},
\&             "self, text");
.Ve
.PP
Setting up these handlers can also be requested with the \*(L"api_version =>
2\*(R" constructor option.
.SH "SUBCLASSING"
.IX Header "SUBCLASSING"
The \f(CW\*(C`HTML::Parser\*(C'\fR class is subclassable.  Parser objects are plain
hashes and \f(CW\*(C`HTML::Parser\*(C'\fR reserves only hash keys that start with
\&\*(L"_hparser\*(R".  The parser state can be set up by invoking the \fIinit()\fR
method, which takes the same arguments as \fInew()\fR.
.SH "EXAMPLES"
.IX Header "EXAMPLES"
The first simple example shows how you might strip out comments from
an \s-1HTML\s0 document.  We achieve this by setting up a comment handler that
does nothing and a default handler that will print out anything else:
.PP
.Vb 4
\&  use HTML::Parser;
\&  HTML::Parser\->new(default_h => [sub { print shift }, \*(Aqtext\*(Aq],
\&                    comment_h => [""],
\&                   )\->parse_file(shift || die) || die $!;
.Ve
.PP
An alternative implementation is:
.PP
.Vb 5
\&  use HTML::Parser;
\&  HTML::Parser\->new(end_document_h => [sub { print shift },
\&                                       \*(Aqskipped_text\*(Aq],
\&                    comment_h      => [""],
\&                   )\->parse_file(shift || die) || die $!;
.Ve
.PP
This will in most cases be much more efficient since only a single
callback will be made.
.PP
The next example prints out the text that is inside the <title>
element of an \s-1HTML\s0 document.  Here we start by setting up a start
handler.  When it sees the title start tag it enables a text handler
that prints any text found and an end handler that will terminate
parsing as soon as the title end tag is seen:
.PP
.Vb 1
\&  use HTML::Parser ();
\&
\&  sub start_handler
\&  {
\&    return if shift ne "title";
\&    my $self = shift;
\&    $self\->handler(text => sub { print shift }, "dtext");
\&    $self\->handler(end  => sub { shift\->eof if shift eq "title"; },
\&                           "tagname,self");
\&  }
\&
\&  my $p = HTML::Parser\->new(api_version => 3);
\&  $p\->handler( start => \e&start_handler, "tagname,self");
\&  $p\->parse_file(shift || die) || die $!;
\&  print "\en";
.Ve
.PP
More examples are found in the \fIeg/\fR directory of the \f(CW\*(C`HTML\-Parser\*(C'\fR
distribution: the program \f(CW\*(C`hrefsub\*(C'\fR shows how you can edit all links
found in a document; the program \f(CW\*(C`htextsub\*(C'\fR shows how to edit the text only; the
program \f(CW\*(C`hstrip\*(C'\fR shows how you can strip out certain tags/elements
and/or attributes; and the program \f(CW\*(C`htext\*(C'\fR shows how to obtain the
plain text, but not any script/style content.
.PP
You can browse the \fIeg/\fR directory online from the \fI[Browse]\fR link on
the http://search.cpan.org/~gaas/HTML\-Parser/ page.
.SH "BUGS"
.IX Header "BUGS"
The <style> and <script> sections do not end with the first \*(L"</\*(R", but
need the complete corresponding end tag.  The standard behaviour is
not really practical.
.PP
When the \fIstrict_comment\fR option is enabled, we still recognize
comments where there is something other than whitespace between even
and odd \*(L"\-\-\*(R" markers.
.PP
Once \f(CW$p\fR\->boolean_attribute_value has been set, there is no way to
restore the default behaviour.
.PP
There is currently no way to get both quote characters
into the same literal argspec.
.PP
Empty tags, e.g. \*(L"<>\*(R" and \*(L"</>\*(R", are not recognized.  \s-1SGML\s0 allows them
to repeat the previous start tag or close the previous start tag
respectively.
.PP
\&\s-1NET\s0 tags, e.g. \*(L"code/.../\*(R" are not recognized.  This is \s-1SGML\s0
shorthand for \*(L"<code>...</code>\*(R".
.PP
Unclosed start or end tags, e.g. \*(L"<tt<b>...</b</tt>\*(R" are not
recognized.
.SH "DIAGNOSTICS"
.IX Header "DIAGNOSTICS"
The following messages may be produced by HTML::Parser.  The notation
in this listing is the same as used in perldiag:
.IP "Not a reference to a hash" 4
.IX Item "Not a reference to a hash"
(F) The object blessed into or subclassed from HTML::Parser is not a
hash as required by the HTML::Parser methods.
.ie n .IP "Bad signature in parser state object at %p" 4
.el .IP "Bad signature in parser state object at \f(CW%p\fR" 4
.IX Item "Bad signature in parser state object at %p"
(F) The _hparser_xs_state element does not refer to a valid state structure.
Something must have changed the internal value
stored in this hash element, or the memory has been overwritten.
.IP "_hparser_xs_state element is not a reference" 4
.IX Item "_hparser_xs_state element is not a reference"
(F) The _hparser_xs_state element has been destroyed.
.IP "Can't find '_hparser_xs_state' element in HTML::Parser hash" 4
.IX Item "Can't find '_hparser_xs_state' element in HTML::Parser hash"
(F) The _hparser_xs_state element is missing from the parser hash.
It was either deleted, or not created when the object was created.
.ie n .IP "\s-1API\s0 version %s\fR not supported by HTML::Parser \f(CW%s" 4
.el .IP "\s-1API\s0 version \f(CW%s\fR not supported by HTML::Parser \f(CW%s\fR" 4
.IX Item "API version %s not supported by HTML::Parser %s"
(F) The constructor option 'api_version' with an argument greater than
or equal to 4 is reserved for future extensions.
.IP "Bad constructor option '%s'" 4
.IX Item "Bad constructor option '%s'"
(F) An unknown constructor option key was passed to the \fInew()\fR or
\&\fIinit()\fR methods.
.IP "Parse loop not allowed" 4
.IX Item "Parse loop not allowed"
(F) A handler invoked the \fIparse()\fR or \fIparse_file()\fR method.
This is not permitted.
.IP "marked sections not supported" 4
.IX Item "marked sections not supported"
(F) The \f(CW$p\fR\->\fImarked_sections()\fR method was invoked in an HTML::Parser
module that was compiled without support for marked sections.
.IP "Unknown boolean attribute (%d)" 4
.IX Item "Unknown boolean attribute (%d)"
(F) Something is wrong with the internal logic that set up aliases for
boolean attributes.
.IP "Only code or array references allowed as handler" 4
.IX Item "Only code or array references allowed as handler"
(F) The second argument for \f(CW$p\fR\->handler must be either a subroutine
reference, the name of a subroutine or method, or a reference to an
array.
.ie n .IP "No handler for %s events" 4
.el .IP "No handler for \f(CW%s\fR events" 4
.IX Item "No handler for %s events"
(F) The first argument to \f(CW$p\fR\->handler must be a valid event name; i.e. one
of \*(L"start\*(R", \*(L"end\*(R", \*(L"text\*(R", \*(L"process\*(R", \*(L"declaration\*(R" or \*(L"comment\*(R".
.ie n .IP "Unrecognized identifier %s in argspec" 4
.el .IP "Unrecognized identifier \f(CW%s\fR in argspec" 4
.IX Item "Unrecognized identifier %s in argspec"
(F) The identifier is not a known argspec name.
Use one of the names mentioned in the argspec section above.
.IP "Literal string is longer than 255 chars in argspec" 4
.IX Item "Literal string is longer than 255 chars in argspec"
(F) The current implementation limits the length of literals in
an argspec to 255 characters.  Make the literal shorter.
.IP "Backslash reserved for literal string in argspec" 4
.IX Item "Backslash reserved for literal string in argspec"
(F) The backslash character \*(L"\e\*(R" is not allowed in argspec literals.
It is reserved to permit quoting inside a literal in a later version.
.IP "Unterminated literal string in argspec" 4
.IX Item "Unterminated literal string in argspec"
(F) The terminating quote character for a literal was not found.
.IP "Bad argspec (%s)" 4
.IX Item "Bad argspec (%s)"
(F) Only identifier names, literals, spaces and commas
are allowed in argspecs.
.IP "Missing comma separator in argspec" 4
.IX Item "Missing comma separator in argspec"
(F) Identifiers in an argspec must be separated with \*(L",\*(R".
.IP "Parsing of undecoded \s-1UTF\-8\s0 will give garbage when decoding entities" 4
.IX Item "Parsing of undecoded UTF-8 will give garbage when decoding entities"
(W) The first chunk parsed appears to contain undecoded \s-1UTF\-8\s0 and one
or more argspecs that decode entities are used for the callback
handlers.
.Sp
The result of decoding will be a mix of encoded and decoded characters
for any entities that expand to characters with code above 127.  This
is not a good thing.
.Sp
The solution is to use the \fIEncode::encode_utf8()\fR on the data before
feeding it to the \f(CW$p\fR\->\fIparse()\fR.  For \f(CW$p\fR\->\fIparse_file()\fR pass a file that
has been opened in \*(L":utf8\*(R" mode.
.Sp
The parser can process raw undecoded \s-1UTF\-8\s0 sanely if the \f(CW\*(C`utf8_mode\*(C'\fR
is enabled or if the \*(L"attr\*(R", \*(L"@attr\*(R" or \*(L"dtext\*(R" argspecs are avoided.
.IP "Parsing string decoded with wrong endianess" 4
.IX Item "Parsing string decoded with wrong endianess"
(W) The first character in the document is U+FFFE.  This is not a
legal Unicode character but a byte swapped \s-1BOM\s0.  The result of parsing
will likely be garbage.
.IP "Parsing of undecoded \s-1UTF\-32\s0" 4
.IX Item "Parsing of undecoded UTF-32"
(W) The parser found the Unicode \s-1UTF\-32\s0 \s-1BOM\s0 signature at the start
of the document.  The result of parsing will likely be garbage.
.IP "Parsing of undecoded \s-1UTF\-16\s0" 4
.IX Item "Parsing of undecoded UTF-16"
(W) The parser found the Unicode \s-1UTF\-16\s0 \s-1BOM\s0 signature at the start of
the document.  The result of parsing will likely be garbage.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
HTML::Entities, HTML::PullParser, HTML::TokeParser, HTML::HeadParser,
HTML::LinkExtor, HTML::Form
.PP
HTML::TreeBuilder (part of the \fIHTML-Tree\fR distribution)
.PP
http://www.w3.org/TR/html4
.PP
More information about marked sections and processing instructions may
be found at \f(CW\*(C`http://www.sgml.u\-net.com/book/sgml\-8.htm\*(C'\fR.
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
.Vb 2
\& Copyright 1996\-2007 Gisle Aas. All rights reserved.
\& Copyright 1999\-2000 Michael A. Chase.  All rights reserved.
.Ve
.PP
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
