<html><head><title>XML::PYX (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch04_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">4.5. XML::PYX</h2><p>In<a name="INDEX-341" /> <a name="INDEX-342" />the Perl universe, standard APIs have been slow to catch on, for several reasons. CPAN, the vast storehouse of publicly offered modules, grows organically, with no central authority to approve submissions. Also, because XML is a relative newcomer on the data format scene, the Perl community has only begun to work out standard solutions for it.</p><p>We can characterize the first era of XML hacking in Perl as the age of nonstandard parsers. It was a time when documentation was scarce and modules were experimental. There was much creativity and innovation, and just as much idiosyncrasy and quirkiness.
Surprisingly, many of the tools that first appeared on the horizon were quite useful. It's fascinating territory for historians and developers alike.</p><p><tt class="literal">XML::PYX</tt> is one of these early parsers. Streams naturally lend themselves to the concept of pipelines, where data output from one program can be plugged into another, creating a chain of processors. There's no reason why XML can't be handled that way, so an innovative and elegant processing style has evolved around this concept. Essentially, the XML is repackaged as a stream of easily recognizable and transformable symbols, a conversion that can even be done by a command-line utility.</p><p>One example of this repackaging is PYX, a symbolic encoding of XML markup that is friendly to text processing languages like Perl. It cleverly presents each XML event on a separate line. Many Unix programs, like <em class="emphasis">awk</em><a name="INDEX-343" /> and <em class="emphasis">grep</em><a name="INDEX-344" />, are line oriented, so they work well with PYX. Perl is happy with lines too.</p><p><a href="ch04_05.htm#perlxml-CHP-4-TABLE-1">Table 4-1</a> summarizes the notation of PYX.</p><a name="perlxml-CHP-4-TABLE-1" /><h4 class="objtitle">Table 4-1. PYX notation </h4><table border="1"><tr><th><p>Symbol</p></th><th><p>Represents</p></th></tr><tr><td><p><a name="INDEX-345" /> <a name="INDEX-346" />(</p></td><td><p>An element start tag</p></td></tr><tr><td><p>)</p></td><td><p>An element end tag</p></td></tr><tr><td><p><a name="INDEX-347" /><a name="INDEX-348" />-</p></td><td><p>Character data</p></td></tr><tr><td><p><a name="INDEX-349" /> <a name="INDEX-350" />A</p></td><td><p>An attribute</p></td></tr><tr><td><p><a name="INDEX-351" /> <a name="INDEX-352" />?</p></td><td><p>A processing instruction</p></td></tr></table><p>For every event coming through the stream, PYX starts a new line, beginning with one of the five symbols shown in <a href="ch04_05.htm#perlxml-CHP-4-TABLE-1">Table 4-1</a>.
The symbol is followed by the element name or whatever other data is pertinent. Special characters are escaped with a backslash, as you would see in Perl code; a newline in character data, for example, appears as <tt class="literal">\n</tt>.</p><p>Here's how a parser would convert an XML document into PYX notation. The following XML is the parser's input:</p><blockquote><pre class="code">&lt;shoppinglist&gt;
  &lt;!-- brand is not important --&gt;
  &lt;item&gt;toothpaste&lt;/item&gt;
  &lt;item&gt;rocket engine&lt;/item&gt;
  &lt;item optional="yes"&gt;caviar&lt;/item&gt;
&lt;/shoppinglist&gt;</pre></blockquote><p>As PYX, it would look like this:</p><blockquote><pre class="code">(shoppinglist
-\n
(item
-toothpaste
)item
-\n
(item
-rocket engine
)item
-\n
(item
Aoptional yes
-caviar
)item
-\n
)shoppinglist</pre></blockquote><p>Notice that the comment didn't come through in the PYX translation. PYX is a little simplistic in some ways, omitting some details of the markup. It will not alert you to CDATA sections, although it will let their content pass through. Perhaps the most serious loss is character entity references, which disappear from the stream. You should make sure you don't need that information before working with PYX.</p><p>Matt <a name="INDEX-353" />Sergeant has written a module, <tt class="literal">XML::PYX</tt>, which parses XML and translates it into PYX. The compact program in <a href="ch04_05.htm#perlxml-CHP-4-EX-2">Example 4-2</a> strips out all XML element tags, leaving only the character data.</p><a name="perlxml-CHP-4-EX-2" /><div class="example"><h4 class="objtitle">Example 4-2. PYX parser </h4><blockquote><pre class="code">use strict;
use warnings;
use XML::PYX;

# initialize parser and generate PYX (one event per line)
my $parser = XML::PYX::Parser->new;
die "usage: $0 file.xml\n" unless defined $ARGV[0];
my $pyx = $parser->parsefile( $ARGV[0] );

# filter out the tags: keep only character-data lines, which start
# with '-', and print what follows the '-' ($' is the text after
# the match), turning escaped \n sequences back into newlines
foreach ( split( /\n/, $pyx ) ) {
  if ( /^-/ ) {
    ( my $text = $' ) =~ s/\\n/\n/g;
    print $text;
  }
}</pre></blockquote></div><p>PYX is an interesting alternative to SAX and DOM for quick-and-dirty XML processing. It's useful for simple tasks like element counting, separating content from markup, and reporting simple events.
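</p><p>As a rough illustration of the element counting and content separation just mentioned, here is a minimal sketch in plain Perl. It assumes the PYX text is already in hand, so instead of calling <tt class="literal">XML::PYX</tt> it types in the shopping-list translation from earlier as a string. The subroutine name <tt class="literal">summarize_pyx</tt> is hypothetical, and the sketch assumes that the only escape it needs to decode is <tt class="literal">\n</tt> for a newline.</p><blockquote><pre class="code">use strict;
use warnings;

# Hypothetical helper: reads PYX text (one event per line), counts
# start tags per element name, and collects the character data.
# Assumes newlines in character data are escaped as the two
# characters \n; no other escapes are decoded here.
sub summarize_pyx {
    my ($pyx) = @_;
    my (%count, @text);
    for my $line (split /\n/, $pyx) {
        next unless length $line;
        my $sym  = substr($line, 0, 1);   # the PYX event symbol
        my $rest = substr($line, 1);      # name or data after it
        if ($sym eq '(') {
            $count{$rest}++;              # element start tag
        }
        elsif ($sym eq '-') {
            $rest =~ s/\\n/\n/g;          # decode escaped newlines
            push @text, $rest;            # character data
        }
        # ')' end tags, 'A' attributes and '?' PIs are ignored here
    }
    return (\%count, join('', @text));
}

# The PYX translation of the shopping list, typed in by hand
# (single quotes keep '-\n' as a literal backslash and n)
my $pyx = join "\n",
    '(shoppinglist', '-\n',
    '(item', '-toothpaste', ')item', '-\n',
    '(item', '-rocket engine', ')item', '-\n',
    '(item', 'Aoptional yes', '-caviar', ')item', '-\n',
    ')shoppinglist';

my ($count, $text) = summarize_pyx($pyx);
for my $name (sort keys %$count) {
    print "$name: $count->{$name}\n";
}
print "Character data:$text";</pre></blockquote><p>Run as is, the script reports three <tt class="literal">item</tt> elements and one <tt class="literal">shoppinglist</tt> element, then prints the three items' text on separate lines. Because each event is one line whose first character says what it is, the whole job is a <tt class="literal">split</tt> and a few string comparisons, with no tree to build and no callbacks to register.</p><p>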
However, it lacks sophistication, which makes it less attractive<a name="INDEX-354" /> <a name="INDEX-355" /> for complex processing.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch04_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">4.4. Stream Applications</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">4.6. XML::Parser</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>