<html><head><title>Putting Parsers to Work (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch03_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">3.4. Putting Parsers to Work</h2><p>Enough<a name="INDEX-238" /> tinkering with the parser's internal details. We want to see what you can do with the stuff you get from parsers. We've already seen an example of a complete, parser-built tree structure in <a href="ch03_02.htm#perlxml-CHP-3-EX-3">Example 3-3</a>, so let's do something with the other type. We'll take an XML event stream and make it drive processing by plugging it into some code to handle the events. 
It may not be the most useful tool in the world, but it will serve well enough to show you how real-world XML processing programs are written.</p><p><tt class="literal">XML::Parser</tt><a name="INDEX-239" /> (with <a name="INDEX-240" />Expat running underneath) is at the input end of our program. Expat subscribes to the event-based parsing school we described earlier. Rather than loading your whole XML document into memory and then turning around to see what it hath wrought, it stops every time it encounters a discrete chunk of data or markup, such as an angle-bracketed tag or a literal string inside an element. It then checks to see if our program wants to react to it in any way.</p><p>Your first responsibility is to give the parser an interface to the pertinent bits of code that handle events. Each type of event is handled by a different subroutine, or <em class="emphasis">handler</em><a name="INDEX-241" />. We register our handlers with the parser by setting the <tt class="literal">Handlers</tt> option at initialization time. <a href="ch03_04.htm#perlxml-CHP-3-EX-5">Example 3-5</a> shows the entire process.</p><a name="perlxml-CHP-3-EX-5" /><div class="example"><h4 class="objtitle">Example 3-5. 
A stream-based XML processor </h4><blockquote><pre class="code">use XML::Parser;

# initialize the parser
my $parser = XML::Parser->new( Handlers =>
                                     {
                                      Start=>\&handle_start,
                                      End=>\&handle_end,
                                     });
$parser->parsefile( shift @ARGV );

my @element_stack;                # remember which elements are open

# process a start-of-element event: print message about element
#
sub handle_start {
    my( $expat, $element, %attrs ) = @_;

    # ask the expat object about our position
    my $line = $expat->current_line;

    print "I see a $element element starting on line $line!\n";

    # remember this element and its starting position by pushing a
    # little hash onto the element stack
    push( @element_stack, { element=>$element, line=>$line });

    if( %attrs ) {
        print "It has these attributes:\n";
        while( my( $key, $value ) = each( %attrs )) {
            print "\t$key => $value\n";
        }
    }
}

# process an end-of-element event
#
sub handle_end {
    my( $expat, $element ) = @_;

    # We'll just pop from the element stack with blind faith that
    # we'll get the correct closing element, unlike what our
    # homebrewed well-formedness checker did, since XML::Parser will
    # scream bloody murder if any well-formedness errors creep in.
    my $element_record = pop( @element_stack );
    print "I see that the $element element that started on line ",
              $$element_record{ line }, " is closing now.\n";
}</pre></blockquote></div><p>It's easy to see how this process works. We've written two handler subroutines called <tt class="literal">handle_start( )</tt> and <tt class="literal">handle_end( )</tt> and <em class="emphasis">registered</em> each with a particular event in the call to <tt class="literal">new( )</tt>. When we call <tt class="literal">parsefile( )</tt>, the parser knows it has handlers for a start-of-element event and an end-of-element event. Every time the parser trips over an element start tag, it calls the first handler and gives it information about that element (element name and attributes). 
Similarly, any end tag it encounters leads to a call of the other handler with similar element-specific information.</p><p>Note that the parser also gives each handler a reference called <tt class="literal">$expat</tt>. This is a reference to the <tt class="literal">XML::Parser::Expat</tt> object, a low-level interface to Expat. It has access to interesting information that might be useful to a program, such as line numbers and element depth. We've taken advantage of this fact, using the line number to dazzle users with our amazing powers of document analysis.</p><p>Want to see it run? Here's how the output looks after processing the customer database document from <a href="ch01_02.htm#perlxml-CHP-1-EX-1">Example 1-1</a>: </p><blockquote><pre class="code">I see a spam-document element starting on line 1!
It has these attributes:
        version => 3.5
        timestamp => 2002-05-13 15:33:45
I see a customer element starting on line 3!
I see a first-name element starting on line 4!
I see that the first-name element that started on line 4 is closing now.
I see a surname element starting on line 5!
I see that the surname element that started on line 5 is closing now.
I see a address element starting on line 6!
I see a street element starting on line 7!
I see that the street element that started on line 7 is closing now.
I see a city element starting on line 8!
I see that the city element that started on line 8 is closing now.
I see a state element starting on line 9!
I see that the state element that started on line 9 is closing now.
I see a zip element starting on line 10!
I see that the zip element that started on line 10 is closing now.
I see that the address element that started on line 6 is closing now.
I see a email element starting on line 12!
I see that the email element that started on line 12 is closing now.
I see a age element starting on line 13!
I see that the age element that started on line 13 is closing now.
I see that the customer element that started on line 3 is closing now.
  [... snipping other customers for brevity's sake ...]
I see that the spam-document element that started on line 1 is closing now.</pre></blockquote><p>Here we used the element stack again. We didn't actually need to store the elements' names ourselves; one of the methods you can call on the <tt class="literal">XML::Parser::Expat</tt> object returns the current <em class="emphasis">context list</em><a name="INDEX-242" />, a list of all the currently open elements our parser has probed into, ending with the innermost one. However, a stack proved to be a useful way to store additional information like line numbers. It shows off the fact that you can let events build up structures of arbitrary complexity -- the "memory" of the document's past.</p><p>There are many more event types than we handle here. We don't do anything with character data, comments, or processing instructions, for example. However, for the purpose of this example, we don't need to go into those event types. We'll have more exhaustive examples of event processing in the next chapter, anyway.</p><p>Before we close the topic of event processing, we want to mention one thing: the Simple API for XML, more commonly known as <a name="INDEX-243" /><a name="INDEX-244" />SAX. It's very similar to the event processing model we've seen so far, but the difference is that it's a widely adopted de facto standard (it grew out of the xml-dev mailing list rather than a formal standards body). Being a standard means that it has a standardized, canonical set of events. How these events should be presented for processing is also standardized. The cool thing about it is that with a standard interface, you can hook up different program components like Legos and it will all work. If you don't like one parser, just plug in another (and sophisticated tools like the<a name="INDEX-245" /> <tt class="literal">XML::SAX</tt> module family can even help you pick a parser based on the features you need). Get your XML data from a database, a file, or your mother's shopping list; it shouldn't matter where it comes from. 
SAX is very exciting for the Perl community because we've long been criticized for our lack of standards compliance and general barbarism. Now we can be criticized for only one of those things. You can expect a nice, thorough discussion on SAX (specifically, <a name="INDEX-246" />PerlSAX, our<a name="INDEX-247" /> beloved<a name="INDEX-248" /> language's mutation thereof) in <a href="ch05_01.htm">Chapter 5, "SAX"</a>.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch03_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">3.3. Stream-Based Versus Tree-Based Processing</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">3.5. XML::LibXML</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. 
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>