📄 pullparser.pm

📁 source of perl for linux application,
💻 PM
📖 第 1 页 / 共 2 页
字号:
上一页 12
            "\n  Dropping back to seeking-head1-content mode...\n";        }      }    }        elsif($state == 2) {      # seeking start of para (which must immediately follow)      if($token->is_start and $content_containers{ $token->tagname }) {        DEBUG and print "  Found start of Para.  Accumulating content...\n";        $para_text_content = '';        ++$state;      } else {        DEBUG and print         "  Didn't see an immediately subsequent start-Para.  Reseeking H1\n";        $state = 0;      }    }        elsif($state == 3) {      # accumulating text until end of Para      if( $token->is_text ) {        DEBUG and print "   Adding \"", $token->text, "\" to para-content.\n";        $para_text_content .= $token->text;        # and keep looking              } elsif( $token->is_end and $content_containers{ $token->tagname } ) {        DEBUG and print "  Found end of Para.  Considering content: ",          $para_text_content, "\n";        if( $para_text_content =~ m/\S/          and ($max_content_length           ? (length($para_text_content) <= $max_content_length)           : 1)        ) {          # Some minimal sanity constraints, I think.          DEBUG and print "  It looks contentworthy, I guess.  Using it.\n";          $title = $para_text_content;          last;        } else {          DEBUG and print "  Doesn't look at all contentworthy!\n  Giving up.\n";          undef $title;          last;        }      }    }        else {      die "IMPOSSIBLE STATE $state!\n";  # should never happen    }      }    # Put it all back!  $self->unget_token(@to_unget);    if(DEBUG) {    if(defined $title) { print "  Returing title <$title>\n" }    else { print "Returning title <>\n" }  }    return '' unless defined $title;  $title =~ s/^\s+//;  return $title;}#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@##  Methods that actually do work at parse-time:sub _handle_element_start {  my $self = shift;   # leaving ($element_name, $attr_hash_r)  DEBUG > 2 and print "++ $_[0] (", map("<$_> ", %{$_[1]}), ")\n";    push @{ $self->{'token_buffer'} },       $self->{'start_token_class'}->new(@_);  return;}sub _handle_text {  my $self = shift;   # leaving ($text)  DEBUG > 2 and print "== $_[0]\n";  push @{ $self->{'token_buffer'} },       $self->{'text_token_class'}->new(@_);  return;}sub _handle_element_end {  my $self = shift;   # leaving ($element_name);  DEBUG > 2 and print "-- $_[0]\n";  push @{ $self->{'token_buffer'} },        $self->{'end_token_class'}->new(@_);  return;}#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@1;__END__=head1 NAMEPod::Simple::PullParser -- a pull-parser interface to parsing Pod=head1 SYNOPSIS my $parser = SomePodProcessor->new; $parser->set_source( "whatever.pod" ); $parser->run;Or: my $parser = SomePodProcessor->new; $parser->set_source( $some_filehandle_object ); $parser->run;Or: my $parser = SomePodProcessor->new; $parser->set_source( \$document_source ); $parser->run;Or: my $parser = SomePodProcessor->new; $parser->set_source( \@document_lines ); $parser->run;And elsewhere: require 5; package SomePodProcessor; use strict; use base qw(Pod::Simple::PullParser);  sub run {   my $self = shift;  Token:   while(my $token = $self->get_token) {     ...process each token...   } }=head1 DESCRIPTIONThis class is for using Pod::Simple to build a Pod processor -- butone that uses an interface based on a stream of token objects,instead of based on events.This is a subclass of L<Pod::Simple> and inherits all its methods.A subclass of Pod::Simple::PullParser should define a C<run> methodthat calls C<< $token = $parser->get_token >> to pull tokens.See the source for Pod::Simple::RTF for an example of a formatterthat uses Pod::Simple::PullParser.=head1 METHODS=over=item my $token = $parser->get_tokenThis returns the next token object (which will be of a subclass ofL<Pod::Simple::PullParserToken>), or undef if the parser-stream has hitthe end of the document.=item $parser->unget_token( $token )=item $parser->unget_token( $token1, $token2, ... )This restores the token object(s) to the front of the parser stream.=backThe source has to be set before you can parse anything.  The lowest-levelway is to call C<set_source>:=over=item $parser->set_source( $filename )=item $parser->set_source( $filehandle_object )=item $parser->set_source( \$document_source )=item $parser->set_source( \@document_lines )=backOr you can call these methods, which Pod::Simple::PullParser has definedto work just like Pod::Simple's same-named methods:=over=item $parser->parse_file(...)=item $parser->parse_string_document(...)=item $parser->filter(...)=item $parser->parse_from_file(...)=backFor those to work, the Pod-processing subclass ofPod::Simple::PullParser has to have defined a $parser->run method --so it is advised that all Pod::Simple::PullParser subclasses do so.See the Synopsis above, or the source for Pod::Simple::RTF.Authors of formatter subclasses might find these methods useful tocall on a parser object that you haven't started pulling tokensfrom yet:=over=item my $title_string = $parser->get_titleThis tries to get the title string out of $parser, by getting some tokens,and scanning them for the title, and then ungetting them so that you canprocess the token-stream from the beginning.For example, suppose you have a document that starts out:  =head1 NAME    Hoo::Boy::Wowza -- Stuff B<wow> yeah!$parser->get_title on that document will return "Hoo::Boy::Wowza --Stuff wow yeah!".In cases where get_title can't find the title, it will return empty-string("").=item my $title_string = $parser->get_short_titleThis is just like get_title, except that it returns just the modulename, ifthe title seems to be of the form "SomeModuleName -- description".For example, suppose you have a document that starts out:  =head1 NAME    Hoo::Boy::Wowza -- Stuff B<wow> yeah!then $parser->get_short_title on that document will return"Hoo::Boy::Wowza".But if the document starts out:  =head1 NAME    Hooboy, stuff B<wow> yeah!then $parser->get_short_title on that document will return "Hooboy,stuff wow yeah!".If the title can't be found, then get_short_title returns empty-string("").=item $author_name   = $parser->get_authorThis works like get_title except that it returns the contents of the"=head1 AUTHOR\n\nParagraph...\n" section, assuming that that sectionisn't terribly long.(This method tolerates "AUTHORS" instead of "AUTHOR" too.)=item $description_name = $parser->get_descriptionThis works like get_title except that it returns the contents of the"=head1 PARAGRAPH\n\nParagraph...\n" section, assuming that that sectionisn't terribly long.=item $version_block = $parser->get_versionThis works like get_title except that it returns the contents ofthe "=head1 VERSION\n\n[BIG BLOCK]\n" block.  Note that this does NOTreturn the module's C<$VERSION>!!=back=head1 NOTEYou don't actually I<have> to define a C<run> method.  If you'rewriting a Pod-formatter class, you should define a C<run> just sothat users can call C<parse_file> etc, but you don't I<have> to.And if you're not writing a formatter class, but are instead justwriting a program that does something simple with a Pod::PullParserobject (and not an object of a subclass), then there's no reason tobother subclassing to add a C<run> method.=head1 SEE ALSOL<Pod::Simple>L<Pod::Simple::PullParserToken> -- and its subclassesL<Pod::Simple::PullParserStartToken>,L<Pod::Simple::PullParserTextToken>, andL<Pod::Simple::PullParserEndToken>.L<HTML::TokeParser>, which inspired this.=head1 COPYRIGHT AND DISCLAIMERSCopyright (c) 2002 Sean M. Burke.  All rights reserved.This library is free software; you can redistribute it and/or modify itunder the same terms as Perl itself.This program is distributed in the hope that it will be useful, butwithout any warranty; without even the implied warranty ofmerchantability or fitness for a particular purpose.=head1 AUTHORSean M. Burke C<sburke@cpan.org>=cutJUNK:sub _old_get_title {  # some witchery in here  my $self = $_[0];  my $title;  my @to_unget;  while(1) {    push @to_unget, $self->get_token;    unless(defined $to_unget[-1]) { # whoops, short doc!      pop @to_unget;      last;    }    DEBUG and print "-Got token ", $to_unget[-1]->dump, "\n";    (DEBUG and print "Too much in the buffer.\n"),     last if @to_unget > 25; # sanity        my $pattern = '';    if( #$to_unget[-1]->type eq 'end'        #and $to_unget[-1]->tagname eq 'Para'        #and        ($pattern = join('',         map {;            ($_->type eq 'start') ? ("<" . $_->tagname .">")          : ($_->type eq 'end'  ) ? ("</". $_->tagname .">")          : ($_->type eq 'text' ) ? ($_->text =~ m<^([A-Z]+)$>s ? $1 : 'X')          : "BLORP"         } @to_unget       )) =~ m{<head1>NAME</head1><Para>(X|</?[BCIFLS]>)+</Para>$}s    ) {      # Whee, it fits the pattern      DEBUG and print "Seems to match =head1 NAME pattern.\n";      $title = '';      foreach my $t (reverse @to_unget) {        last if $t->type eq 'start' and $t->tagname eq 'Para';        $title = $t->text . $title if $t->type eq 'text';      }      undef $title if $title =~ m<^\s*$>; # make sure it's contentful!      last;    } elsif ($pattern =~ m{<head(\d)>(.+)</head\d>$}      and !( $1 eq '1' and $2 eq 'NAME' )    ) {      # Well, it fits a fallback pattern      DEBUG and print "Seems to match NAMEless pattern.\n";      $title = '';      foreach my $t (reverse @to_unget) {        last if $t->type eq 'start' and $t->tagname =~ m/^head\d$/s;        $title = $t->text . $title if $t->type eq 'text';      }      undef $title if $title =~ m<^\s*$>; # make sure it's contentful!      last;          } else {      DEBUG and $pattern and print "Leading pattern: $pattern\n";    }  }    # Put it all back:  $self->unget_token(@to_unget);    if(DEBUG) {    if(defined $title) { print "  Returing title <$title>\n" }    else { print "Returning title <>\n" }  }    return '' unless defined $title;  return $title;}
上一页 12
💿 文件大小 15230 K
👤 上传用户 gankai1983
📂 所属分类 Linux/Unix编程
🏷️ 相关标签

#application #source #linux #perl
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -