Perl & XML, by Erik T. Ray and Jason McIntosh. ISBN 0-596-00205-X. First Edition, published April
<html><head><title>XML::Parser (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch03_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">3.2. XML::Parser</h2><p>Writing <a name="INDEX-199" />a <a name="INDEX-200" />parser requires a lot of work. You can't be sure if you've covered everything without a lot of testing. Unless you're a mutant who loves to write efficient, low-level parser code, your program will probably be slow and resource-intensive. The good news is that a wide variety of free, high-quality, and easy-to-use XML parser packages (written by friendly mutants) already exist to help you. People have bashed Perl and XML together for years, and you have a barnful of conveniently pre-invented wheels at your disposal.</p><p>Where do Perl programmers go to find ready-made modules to use in their programs? 
They go to the <a name="INDEX-201" />Comprehensive Perl Archive Network (CPAN), a many-mirrored public resource full of free, open-source Perl code. If you aren't familiar with using CPAN, you must change your isolationist ways and learn to become a programmer of the world. You'll find a multitude of modules authored by folks who have walked the path of Perl and XML before you, and who've chosen to share the tools they've made with the rest of the world.</p><a name="ch03-7-fm2xml" /><blockquote><p><b>TIP:</b> Don't think of CPAN as a catalog of ready-made solutions for all specific XML problems. Rather, look at it as a toolbox or a source of building blocks you can assemble and configure to craft a solution. While some modules specialize in popular XML applications like RSS and SOAP, most are more general-purpose. Chances are, you won't find a module that specifically addresses your needs. You'll more likely take one of the general XML modules and adapt it somehow. We'll show that this process is painless and reveal several ways to configure general modules to your particular application.</p></blockquote><p>XML parsers differ from one another in two major ways. First, they differ in their <em class="emphasis">parsing style</em><a name="INDEX-202" /><a name="INDEX-203" />, which is how the parser works with XML. There are a few different strategies, such as building a data structure or creating an event stream. Another attribute of parsers, called <em class="emphasis">standards-completeness</em><a name="INDEX-204" />, is a spectrum ranging from ad hoc on one extreme to an exhaustive, standards-based solution on the other. The balance on the latter axis is slowly moving from the eccentric, nonstandard side toward the other end as the Perl community agrees on how to implement major standards like SAX and DOM.</p><p>The <tt class="literal">XML::Parser</tt> module is the great-grandpappy of all Perl-based XML processors. It is a multifaceted parser, offering a handful of different parsing styles. 
On the standards axis, it's closer to ad hoc than standards-compliant; however, being the first efficient XML parser to appear on the Perl horizon, it has a dear place in our hearts and is still very useful. While <tt class="literal">XML::Parser</tt> uses a nonstandard API and has a reputation for getting a bit persnickety over some issues, it <em class="emphasis">works</em>. It parses documents with reasonable speed and flexibility, and as all Perl hackers know, people tend to glom onto the first usable solution that appears on the radar, no matter how ugly it is. Thus, nearly all of the first few years' worth of Perl and XML modules and programs based themselves on <tt class="literal">XML::Parser</tt>.</p><p>Since 2001 or so, however, other low-level parsing modules have emerged that base themselves on faster and more standards-compliant core libraries. We'll touch on these modules shortly. However, we'll start out with an examination of <tt class="literal">XML::Parser</tt>, giving a nod to its venerability and functionality.</p><p>In the early days of XML, a skilled programmer named James <a name="INDEX-205" />Clark wrote an XML parser library in <a name="INDEX-206" />C and called it <a name="INDEX-207" />Expat.<a href="#FOOTNOTE-15">[15]</a> Fast, efficient, and very stable, it became the parser of choice among early adopters of XML. To bring XML into the Perl realm, Larry <a name="INDEX-208" />Wall wrote a low-level API for it and called the module <tt class="literal">XML::Parser::Expat</tt><a name="INDEX-209" />. Then he built a layer on top of that, <tt class="literal">XML::Parser</tt>, to serve as a general-purpose parser for everybody. Now maintained by Clark Cooper, <tt class="literal">XML::Parser</tt> has served as the foundation of many XML modules.</p><blockquote class="footnote"> <a name="FOOTNOTE-15" /><p>[15] James Clark is a big name in the XML community. He tirelessly promotes the standard with his free tools and involvement with the W3C. 
You can see his work at <a href="http://www.jclark.com/">http://www.jclark.com/</a>. Clark is also editor of the XSLT and XPath recommendation documents at <a href="http://www.w3.org/">http://www.w3.org/</a>.</p> </blockquote><p>The C underpinnings are the secret to <tt class="literal">XML::Parser</tt>'s success. We've seen how to write a basic parser in Perl. If you apply our previous example to a large XML document, you'll wait a long time before it finishes. Others have written complete XML parsers in Perl that are portable to any system, but you'll find much better performance in a compiled C parser like Expat. Fortunately, as with every other Perl module based on C code (and there are actually lots of these modules because they're not too hard to make, thanks to Perl's standard <a name="INDEX-210" />XS library),<a href="#FOOTNOTE-16">[16]</a> it's easy to forget you're driving Expat around when you use <tt class="literal">XML::Parser</tt>.</p><blockquote class="footnote"> <a name="FOOTNOTE-16" /><p>[16] See <tt class="literal">man perlxs</tt> or Chapter 25 of O'Reilly's <em class="emphasis">Programming Perl, Third Edition</em> for more information.</p></blockquote><a name="perlxml-CHP-3-SECT-2.1" /><div class="sect2"><h3 class="sect2">3.2.1. Example: Well-Formedness Checker Revisited</h3><p>To <a name="INDEX-211" />show how <tt class="literal">XML::Parser</tt><a name="INDEX-212" /> might be used, let's return to the well-formedness checker problem. It's very easy to create this tool with <tt class="literal">XML::Parser</tt>, as shown in <a href="ch03_02.htm#perlxml-CHP-3-EX-2">Example 3-2</a>.</p><a name="perlxml-CHP-3-EX-2" /><div class="example"><h4 class="objtitle">Example 3-2. 
Well-formedness checker using XML::Parser </h4><blockquote><pre class="code">use XML::Parser;

my $xmlfile = shift @ARGV;              # the file to parse

# initialize parser object and parse the file
my $parser = XML::Parser-&gt;new( ErrorContext =&gt; 2 );
eval { $parser-&gt;parsefile( $xmlfile ); };

# report any error that stopped parsing, or announce success
if( $@ ) {
    $@ =~ s/at \/.*?$//s;               # remove module line number
    print STDERR "\nERROR in '$xmlfile':\n$@\n";
} else {
    print STDERR "'$xmlfile' is well-formed\n";
}</pre></blockquote></div><p>Here's how this program works. First, we create a new <tt class="literal">XML::Parser</tt> object to do the parsing. Using an object rather than a static function call means that we can configure the parser once and then process multiple files without the overhead of repeatedly recreating the parser. The object retains your settings and keeps the Expat parser routine alive for as long as you want to parse files, and then cleans everything up when you're done.</p><p>Next, we call the <tt class="literal">parsefile()</tt><a name="INDEX-213" /> method inside an <tt class="literal">eval</tt><a name="INDEX-214" /> block because <tt class="literal">XML::Parser</tt> tends to be a little overzealous when dealing with parse errors. If we didn't use an <tt class="literal">eval</tt> block, our program would <tt class="literal">die</tt> before we had a chance to do any cleanup. We check the variable <tt class="literal">$@</tt> for content in case there was an error. If there was, we remove the line number of the module at which the parse method "died" and then print out the message.</p><p>When initializing the parser object, we set an option <tt class="literal">ErrorContext =&gt; 2</tt>. <tt class="literal">XML::Parser</tt> has several options you can set to control parsing. This one is a directive sent straight to the Expat parser that remembers the context in which errors occur and saves two lines of context on either side of the error. 
When we print out the error message, it tells us what line the error happened on and prints out the region of text with an arrow pointing to the offending mistake.</p><p>Here's an example of our checker choking on a syntactic faux pas (where we decided to name our program <em class="emphasis">xwf</em> as an XML well-formedness checker):</p><blockquote><pre class="code">$ <tt class="userinput"><b>xwf ch01.xml</b></tt>
ERROR in 'ch01.xml':
not well-formed (invalid token) at line 66, column 22, byte 2354:
&lt;chapter id="dorothy-in-oz"&gt;
&lt;title&gt;Lions, Tigers &amp; Bears&lt;/title&gt;
=====================^</pre></blockquote><p>Notice how simple it is to set up the parser and get powerful
