📄 parser.pm

Implementation code for the DDNS protocol module of the networking part of a video surveillance system. Corrections are welcome.
package HTML::Parser;

# Copyright 1996-2007, Gisle Aas.
# Copyright 1999-2000, Michael A. Chase.
#
# This library is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.

use strict;
use vars qw($VERSION @ISA);

$VERSION = '3.56';  # $Date: 2007/01/12 09:18:31 $

require HTML::Entities;

require XSLoader;
XSLoader::load('HTML::Parser', $VERSION);

sub new
{
    my $class = shift;
    my $self = bless {}, $class;
    return $self->init(@_);
}


sub init
{
    my $self = shift;
    $self->_alloc_pstate;

    my %arg = @_;
    my $api_version = delete $arg{api_version} || (@_ ? 3 : 2);
    if ($api_version >= 4) {
        require Carp;
        Carp::croak("API version $api_version not supported " .
                    "by HTML::Parser $VERSION");
    }

    if ($api_version < 3) {
        # Set up method callbacks compatible with HTML-Parser-2.xx
        $self->handler(text    => "text",    "self,text,is_cdata");
        $self->handler(end     => "end",     "self,tagname,text");
        $self->handler(process => "process", "self,token0,text");
        $self->handler(start   => "start",
                                  "self,tagname,attr,attrseq,text");

        $self->handler(comment =>
                       sub {
                           my($self, $tokens) = @_;
                           for (@$tokens) {
                               $self->comment($_);
                           }
                       }, "self,tokens");

        $self->handler(declaration =>
                       sub {
                           my $self = shift;
                           $self->declaration(substr($_[0], 2, -1));
                       }, "self,text");
    }

    if (my $h = delete $arg{handlers}) {
        $h = {@$h} if ref($h) eq "ARRAY";
        while (my($event, $cb) = each %$h) {
            $self->handler($event => @$cb);
        }
    }

    # In the end we try to assume plain attribute or handler
    while (my($option, $val) = each %arg) {
        if ($option =~ /^(\w+)_h$/) {
            $self->handler($1 => @$val);
        }
        elsif ($option =~ /^(text|start|end|process|declaration|comment)$/) {
            require Carp;
            Carp::croak("Bad constructor option '$option'");
        }
        else {
            $self->$option($val);
        }
    }

    return $self;
}


sub parse_file
{
    my($self, $file) = @_;
    my $opened;
    if (!ref($file) && ref(\$file) ne "GLOB") {
        # Assume $file is a filename
        local(*F);
        open(F, $file) || return undef;
        binmode(F);  # should we? good for byte counts
        $opened++;
        $file = *F;
    }
    my $chunk = '';
    while (read($file, $chunk, 512)) {
        $self->parse($chunk) || last;
    }
    close($file) if $opened;
    $self->eof;
}


sub netscape_buggy_comment  # legacy
{
    my $self = shift;
    require Carp;
    Carp::carp("netscape_buggy_comment() is deprecated.  " .
               "Please use the strict_comment() method instead");
    my $old = !$self->strict_comment;
    $self->strict_comment(!shift) if @_;
    return $old;
}

# set up method stubs
sub text { }
*start       = \&text;
*end         = \&text;
*comment     = \&text;
*declaration = \&text;
*process     = \&text;

1;

__END__


=head1 NAME

HTML::Parser - HTML parser class

=head1 SYNOPSIS

 use HTML::Parser ();

 # Create parser object
 $p = HTML::Parser->new( api_version => 3,
                         start_h => [\&start, "tagname, attr"],
                         end_h   => [\&end,   "tagname"],
                         marked_sections => 1,
                       );

 # Parse document text chunk by chunk
 $p->parse($chunk1);
 $p->parse($chunk2);
 #...
 $p->eof;                 # signal end of document

 # Parse directly from file
 $p->parse_file("foo.html");
 # or
 open(my $fh, "<:utf8", "foo.html") || die;
 $p->parse_file($fh);

=head1 DESCRIPTION

Objects of the C<HTML::Parser> class will recognize markup and
separate it from plain text (alias data content) in HTML
documents.  As different kinds of markup and text are recognized, the
corresponding event handlers are invoked.

C<HTML::Parser> is not a generic SGML parser.  We have tried to
make it able to deal with the HTML that is actually "out there", and
it normally parses as closely as possible to the way the popular web
browsers do it instead of strictly following one of the many HTML
specifications from W3C.
Where there is disagreement, there is often
an option that you can enable to get the official behaviour.

The document to be parsed may be supplied in arbitrary chunks.  This
makes on-the-fly parsing as documents are received from the network
possible.

If event driven parsing does not feel right for your application, you
might want to use C<HTML::PullParser>.  This is an C<HTML::Parser>
subclass that allows a more conventional program structure.


=head1 METHODS

The following method is used to construct a new C<HTML::Parser> object:

=over

=item $p = HTML::Parser->new( %options_and_handlers )

This class method creates a new C<HTML::Parser> object and
returns it.  Key/value argument pairs may be provided to assign event
handlers or initialize parser options.  The handlers and parser
options can also be set or modified later by the method calls described below.

If a top level key is in the form "<event>_h" (e.g., "text_h") then it
assigns a handler to that event, otherwise it initializes a parser
option. The event handler specification value must be an array
reference.  Multiple handlers may also be assigned with the
'handlers => [%handlers]' option.  See examples below.

If new() is called without any arguments, it will create a parser that
uses callback methods compatible with version 2 of C<HTML::Parser>.
See the section on "version 2 compatibility" below for details.

The special constructor option 'api_version => 2' can be used to
initialize version 2 callbacks while still setting other options and
handlers.  The 'api_version => 3' option can be used if you don't want
to set any options and don't want to fall back to v2 compatible
mode.

Examples:

 $p = HTML::Parser->new(api_version => 3,
                        text_h => [ sub {...}, "dtext" ]);

This creates a new parser object with a text event handler subroutine
that receives the original text with general entities decoded.

 $p = HTML::Parser->new(api_version => 3,
                        start_h => [ 'my_start', "self,tokens" ]);

This creates a new parser object with a start event handler method
that receives the $p and the tokens array.

 $p = HTML::Parser->new(api_version => 3,
                        handlers => { text => [\@array, "event,text"],
                                      comment => [\@array, "event,text"],
                                    });

This creates a new parser object that stores the event type and the
original text in @array for text and comment events.

=back

The following methods feed the HTML document
to the C<HTML::Parser> object:

=over

=item $p->parse( $string )

Parse $string as the next chunk of the HTML document.  The return
value is normally a reference to the parser object (i.e. $p).
Handlers invoked should not attempt to modify the $string in-place until
$p->parse returns.

If an invoked event handler aborts parsing by calling $p->eof, then
$p->parse() will return a FALSE value.

=item $p->parse( $code_ref )

If a code reference is passed as the argument to be parsed, then the
chunks to be parsed are obtained by invoking this function repeatedly.
Parsing continues until the function returns an empty (or undefined)
result.  When this happens $p->eof is automatically signaled.

Parsing will also abort if one of the event handlers calls $p->eof.

The effect of this is the same as:

 while (1) {
    my $chunk = &$code_ref();
    if (!defined($chunk) || !length($chunk)) {
        $p->eof;
        return $p;
    }
    $p->parse($chunk) || return undef;
 }

But it is more efficient as this loop runs internally in XS code.

=item $p->parse_file( $file )

Parse text directly from a file.  The $file argument can be a
filename, an open file handle, or a reference to an open file
handle.

If $file contains a filename and the file can't be opened, then the
method returns an undefined value and $!
tells why it failed.
Otherwise the return value is a reference to the parser object.

If a file handle is passed as the $file argument, then the file will
normally be read until EOF, but not closed.

If an invoked event handler aborts parsing by calling $p->eof,
then $p->parse_file() may not have read the entire file.

On systems with multi-byte line terminators, the values passed for the
offset and length argspecs may be too low if parse_file() is called on
a file handle that is not in binary mode.

If a filename is passed in, then parse_file() will open the file in
binary mode.

=item $p->eof

Signals the end of the HTML document.  Calling the $p->eof method
outside a handler callback will flush any remaining buffered text
(which triggers the C<text> event if there is any remaining text).

Calling $p->eof inside a handler will terminate parsing at that point
and cause $p->parse to return a FALSE value.  This also terminates
parsing by $p->parse_file().

After $p->eof has been called, the parse() and parse_file() methods
can be invoked to feed new documents to the parser object.

The return value from eof() is a reference to the parser object.

=back


Most parser options are controlled by boolean attributes.
Each boolean attribute is enabled by calling the corresponding method
with a TRUE argument and disabled with a FALSE argument.  The
attribute value is left unchanged if no argument is given.  The return
value from each method is the old attribute value.

Methods that can be used to get and/or set parser options are:

=over

=item $p->attr_encoded

=item $p->attr_encoded( $bool )

By default, the C<attr> and C<@attr> argspecs will have general
entities for attribute values decoded.  Enabling this attribute leaves
entities alone.

=item $p->boolean_attribute_value( $val )

This method sets the value reported for boolean attributes inside HTML
start tags.  By default, the name of the attribute is also used as its
value.
This affects the values reported for C<tokens> and C<attr>
argspecs.

=item $p->case_sensitive

=item $p->case_sensitive( $bool )

By default, tagnames and attribute names are down-cased.  Enabling this
attribute leaves them as found in the HTML source document.

=item $p->closing_plaintext

=item $p->closing_plaintext( $bool )

By default, the "plaintext" element can never be closed. Everything up to
the end of the document is parsed in CDATA mode.  This historical
behaviour is what at least MSIE does.  Enabling this attribute makes
the closing "</plaintext>" tag effective and the parsing process will resume
after seeing this tag.  This emulates gecko-based browsers.

=item $p->empty_element_tags

=item $p->empty_element_tags( $bool )

By default, empty element tags are not recognized as such and the "/"
before ">" is just treated like a normal name character (unless
C<strict_names> is enabled).  Enabling this attribute makes
C<HTML::Parser> recognize these tags.

Empty element tags look like start tags, but end with the character
sequence "/>" instead of ">".  When recognized by C<HTML::Parser> they
cause an artificial end event in addition to the start event.  The
C<text> for the artificial end event will be empty and the C<tokenpos>
array will be undefined even though the token array will have one
element containing the tag name.

=item $p->marked_sections

=item $p->marked_sections( $bool )

By default, section markings like <![CDATA[...]]> are treated like
ordinary text.  When this attribute is enabled section markings are
honoured.

There are currently no events associated with the marked section
markup, but the text can be returned as C<skipped_text>.

=item $p->strict_comment

=item $p->strict_comment( $bool )

By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera and
MSIE), but it is not correct according to the official HTML
standard.
Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".

The official behaviour is enabled by enabling this attribute.

Enabling of 'strict_comment' also disables recognizing these forms as
comments:

  </ comment>
  <! comment>


=item $p->strict_end

=item $p->strict_end( $bool )

By default, attributes and other junk are allowed to be present on end tags in a
manner that emulates MSIE's behaviour.

The official behaviour is enabled with this attribute.  If enabled,
only whitespace is allowed between the tagname and the final ">".
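
The constructor, the chunked parse() calls, eof() and the boolean option methods documented above can be tried together in a small script. The following is only a minimal sketch, assuming HTML::Parser is installed; the variable names @tags and $text and the sample markup are hypothetical and chosen for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical collectors used only for this illustration.
my @tags;
my $text = '';

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @tags, "start:$_[0]" }, "tagname" ],
    end_h   => [ sub { push @tags, "end:$_[0]" },   "tagname" ],
    text_h  => [ sub { $text .= $_[0] },            "dtext"   ],
);

# Boolean options are enabled by passing a true value; the call
# returns the previous setting of the attribute.
my $old = $p->case_sensitive(1);

# Feed the document in two arbitrary chunks, then signal the end so
# that any buffered text is flushed to the text handler.
$p->parse("<P>Tom &amp; ");
$p->parse("Jerry</P>");
$p->eof;

print join(" ", @tags), "\n";
print "$text\n";
```

The text handler appends to a single string because text may be delivered in more than one event when the document arrives in chunks, so the pieces are only meaningful once joined.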
