<html><head><title>A Handler Base Class (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch05_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">5.5. A Handler Base Class</h2><p>SAX<a name="INDEX-408" /> <a name="INDEX-409" /> <a name="INDEX-410" /> doesn't distinguish between different elements; it leaves that burden up to you. You have to sort out the element name in the <tt class="literal">start_element()</tt> handler, and maybe use a stack to keep track of element hierarchy. Don't you wish there were some way to abstract that stuff? Ken MacLeod has done just that with his <tt class="literal">XML::Handler::Subs</tt><a name="INDEX-411" /> module.</p><p>This module defines an object that branches handler calls to more specific handlers. 
If you want a handler that deals only with <tt class="literal">&lt;title&gt;</tt> elements, you can write that handler and it will be called. The name of a handler dealing with a start tag must begin with <tt class="literal">s_</tt>, followed by the element's name (replace special characters with an underscore). End tag handlers are the same, but start with <tt class="literal">e_</tt> instead of <tt class="literal">s_</tt>.</p><p>That's not all. The base object also has a built-in stack and provides accessor methods to check where you are in the document. The <tt class="literal">$self-&gt;{Names}</tt> variable refers to a stack of element names. Use the method <tt class="literal">in_element( $name )</tt> to test whether the innermost open element is named <tt class="literal">$name</tt>, and <tt class="literal">within_element( $name )</tt> to test whether the parser is anywhere inside an element named <tt class="literal">$name</tt>, however deeply nested.</p><p>To try this out, let's write a program that does something element-specific. Given a well-formed HTML file, the program outputs everything inside an <tt class="literal">&lt;h1&gt;</tt> element, even inline elements used for emphasis. The code, shown in <a href="ch05_05.htm#perlxml-CHP-5-EX-7">Example 5-7</a>, is breathtakingly simple.</p><a name="perlxml-CHP-5-EX-7" /><div class="example"><h4 class="objtitle">Example 5-7. A program subclassing the handler base </h4><blockquote><pre class="code">use XML::Parser::PerlSAX;
use XML::Handler::Subs;

#
# initialize the parser
#
my $parser = XML::Parser::PerlSAX-&gt;new( Handler =&gt; H1_grabber-&gt;new( ) );
$parser-&gt;parse( Source =&gt; {SystemId =&gt; shift @ARGV} );

##
## Handler object: H1_grabber
##
package H1_grabber;
use base( 'XML::Handler::Subs' );

sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

#
# handle start of document
#
sub start_document {
    my $self = shift;
    # let the base class set up its element stack first
    $self-&gt;SUPER::start_document( @_ );
    print "Summary of file:\n";
}

#
# handle start of &lt;h1&gt;: output bracket as delineator
#
sub s_h1 {
    print "[";
}

#
# handle end of &lt;h1&gt;: output bracket as delineator
#
sub e_h1 {
    print "]\n";
}

#
# handle character data
#
sub characters {
    my( $self, $props ) = @_;
    my $data = $props-&gt;{Data};
    # true anywhere below an &lt;h1&gt;, not only directly inside it
    print $data if( $self-&gt;within_element( 'h1' ));
}</pre></blockquote></div><p>Let's feed the program a test file: </p><blockquote><pre class="code">&lt;html&gt;
  &lt;head&gt;&lt;title&gt;The Life and Times of Fooby&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;h1&gt;Fooby as a child&lt;/h1&gt;
    &lt;p&gt;...&lt;/p&gt;
    &lt;h1&gt;Fooby grows up&lt;/h1&gt;
    &lt;p&gt;...&lt;/p&gt;
    &lt;h1&gt;Fooby is in &lt;em&gt;big&lt;/em&gt; trouble!&lt;/h1&gt;
    &lt;p&gt;...&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</pre></blockquote><p>This is what we get on the other side: </p><blockquote><pre class="code">Summary of file:
[Fooby as a child]
[Fooby grows up]
[Fooby is in big trouble!]</pre></blockquote><p>Even the text inside the <tt class="literal">&lt;em&gt;</tt> element was included, thanks to the call to <tt class="literal">within_element( )</tt>, which checks the whole element stack rather than only the innermost element. <tt class="literal">XML::Handler::Subs</tt> is definitely a useful module to have when doing SAX processing.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a 
href="ch05_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">5.4. Drivers for Non-XML Sources</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">5.6. XML::Handler::YAWriter as a Base Handler Class</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>