📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
<html><head><title>XML::LibXML (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head>
<body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch03_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div>
<h2 class="sect1">3.5. XML::LibXML</h2>
<p><tt class="literal">XML::LibXML</tt><a name="INDEX-249" />, like <tt class="literal">XML::Parser</tt>, is an interface to a library written in <a name="INDEX-250" />C.
Called <tt class="literal">libxml2</tt><a name="INDEX-251" />, it's part of the <a name="INDEX-252" />GNOME project.<a href="#FOOTNOTE-17">[17]</a> Unlike <tt class="literal">XML::Parser</tt>, this new parser supports a major standard for XML tree processing known as the Document Object Model (<a name="INDEX-253" /> <a name="INDEX-254" />DOM).</p>
<blockquote class="footnote"> <a name="FOOTNOTE-17" /><p>[17] For downloads and documentation, see <a href="http://www.libxml.org/">http://www.libxml.org/</a>.</p> </blockquote>
<p>DOM is another much-ballyhooed XML standard. It does for <a name="INDEX-255" />tree processing what SAX does for event streams. If you have your heart set on climbing trees in your program and you think there's a likelihood that it might be reused or applied to different data sources, you're better off using something standard and interchangeable. Again, we're happy to delve into DOM in a future chapter and get you thinking in standards-compliant ways. That topic is coming up in <a href="ch07_01.htm">Chapter 7, "DOM"</a>.</p>
<p>Now we want to show you an example of another parser in action. We'd be remiss if we focused on just one kind of parser when so many are out there. Again, we'll show you a basic example, nothing fancy, just to show you how to invoke the parser and tame its power. Let's write another document analysis tool like we did in <a href="ch03_04.htm#perlxml-CHP-3-EX-5">Example 3-5</a>, this time printing a frequency distribution of elements in a document.</p>
<p><a href="ch03_05.htm#perlxml-CHP-3-EX-6">Example 3-6</a> shows the program. It's a vanilla parser run because we haven't set any options yet. Essentially, the parser parses the filehandle and returns a DOM object, which is nothing more than a tree structure of well-designed objects.
Our program finds the document element, and then traverses the entire tree one element at a time, all the while updating the hash of frequency counters.</p>
<a name="perlxml-CHP-3-EX-6" /><div class="example"><h4 class="objtitle">Example 3-6. A frequency distribution program </h4><blockquote><pre class="code">use XML::LibXML;
use IO::Handle;

# initialize the parser
my $parser = new XML::LibXML;

# open a filehandle and parse
my $fh = new IO::Handle;
if( $fh-&gt;fdopen( fileno( STDIN ), "r" )) {
    my $doc = $parser-&gt;parse_fh( $fh );
    my %dist;
    &amp;proc_node( $doc-&gt;getDocumentElement, \%dist );
    foreach my $item ( sort keys %dist ) {
        print "$item: ", $dist{ $item }, "\n";
    }
    $fh-&gt;close;
}

# process an XML tree node: if it's an element, update the
# distribution list and process all its children
#
sub proc_node {
    my( $node, $dist ) = @_;
    # node types are integer constants, so compare them numerically
    return unless( $node-&gt;nodeType == &amp;XML_ELEMENT_NODE );
    $dist-&gt;{ $node-&gt;nodeName } ++;
    foreach my $child ( $node-&gt;getChildnodes ) {
        &amp;proc_node( $child, $dist );
    }
}</pre></blockquote></div>
<p>Note that instead of using a simple path to a file, we use a filehandle object of the <tt class="literal">IO::Handle</tt> class. Perl filehandles, as you probably know, are magic and subtle beasties, capable of passing into your code characters from a wide variety of sources, including files on disk, open network sockets, keyboard input, databases, and just about everything else capable of outputting data. Once you define a filehandle's source, it gives you the same interface for reading from it as does every other filehandle. This dovetails nicely with our XML-based ideology, where we want code to be as flexible and reusable as possible. After all, XML doesn't care where it comes from, so why should we pigeonhole it with one source type?</p>
<p>The parser object returns a document object after parsing.
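</p>
<p>As a small sketch of that point (ours, not part of Example 3-6), the same parser can be fed from a string and from a filehandle, and either way it hands back a document object. The sample data in <tt class="literal">$xml</tt> is made up, and the in-memory filehandle is only an assumption standing in for a file, socket, or any other source a filehandle might wrap.</p>
<blockquote><pre class="code">use strict;
use warnings;
use XML::LibXML;
use IO::Handle;

my $xml    = '&lt;memo&gt;&lt;to&gt;reader&lt;/to&gt;&lt;/memo&gt;';   # made-up sample data
my $parser = XML::LibXML-&gt;new;

# source 1: a plain string
my $from_string = $parser-&gt;parse_string( $xml );

# source 2: a filehandle (in-memory here; a file or socket handle
# would be read through the same interface)
open( my $fh, '&lt;', \$xml ) or die "cannot open in-memory handle: $!";
my $from_fh = $parser-&gt;parse_fh( $fh );
close $fh;

# both results are document objects with the same root element
foreach my $doc ( $from_string, $from_fh ) {
    print ref( $doc ), ": root is ", $doc-&gt;documentElement-&gt;nodeName, "\n";
}</pre></blockquote>
<p>The code that inspects the tree does not need to know which source the document came from. In Example 3-6, the document object is the one stored in <tt class="literal">$doc</tt>.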
This object has a method that returns a reference to the document element -- the element at the very root of the whole tree. We take this reference and feed it to a recursive subroutine, <tt class="literal">proc_node( )</tt>, which happily munches on elements and scribbles into a hash variable every time it sees an element. <a name="INDEX-256" />Recursion is an efficient way to write programs that process XML because the structure of documents is somewhat fractal: the same rules for elements apply at any depth or position in the document, including the root element that represents the entire document (modulo its prologue). Note the "node type" check, which distinguishes between elements and other parts of a document (such as pieces of text or processing instructions).</p>
<p>For every element the routine looks at, it has to call the object's <tt class="literal">getChildnodes()</tt><a name="INDEX-257" /> method to continue processing on its children. This call is an essential difference between stream-based and tree-based methodologies. Instead of having an event stream take the steering wheel of our program and push data at it, thus calling subroutines and code blocks in a (somewhat) unpredictable order, our program now has the responsibility of navigating through the document under its own power. Traditionally, we start at the root element and go downward, processing children in order from first to last. However, because we, not the parser, are in control now, we can scan through the document in any way we want. We could go backwards, we could scan just a part of the document, we could jump around, making multiple passes through the tree -- the sky's the limit.
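</p>
<p>To make that freedom concrete, here is a minimal sketch (ours, not from the original program) that visits children from last to first and refuses to descend below a chosen depth. The sample document and the helper name <tt class="literal">walk_backwards</tt> are hypothetical; the sketch uses <tt class="literal">childNodes</tt> and <tt class="literal">documentElement</tt>, the shorter names <tt class="literal">XML::LibXML</tt> offers for the same operations.</p>
<blockquote><pre class="code">use strict;
use warnings;
use XML::LibXML;

# a tiny made-up document so the sketch is self-contained
my $xml = '&lt;chapter&gt;&lt;title&gt;T&lt;/title&gt;'
        . '&lt;section&gt;&lt;para&gt;a&lt;/para&gt;&lt;para&gt;b&lt;/para&gt;&lt;/section&gt;'
        . '&lt;note/&gt;&lt;/chapter&gt;';
my $doc = XML::LibXML-&gt;new-&gt;parse_string( $xml );

# record element names, last child first, going no deeper than $max_depth
sub walk_backwards {
    my( $node, $depth, $max_depth, $seen ) = @_;
    return unless $node-&gt;nodeType == XML_ELEMENT_NODE;
    push @$seen, $node-&gt;nodeName;
    return if $depth &gt;= $max_depth;
    foreach my $child ( reverse $node-&gt;childNodes ) {
        walk_backwards( $child, $depth + 1, $max_depth, $seen );
    }
}

my @seen;
walk_backwards( $doc-&gt;documentElement, 0, 1, \@seen );
print join( ' ', @seen ), "\n";</pre></blockquote>
<p>With a depth limit of 1, this prints <tt class="literal">chapter note section title</tt>: the root, then its children in reverse order, with the <tt class="literal">para</tt> elements skipped because they sit one level too deep. A stream parser could not reorder or skip content like this without buffering it first.</p>
<p>Back to Example 3-6, which walks the whole tree from the top down.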
Here's the result from processing a small chapter coded in DocBook XML:</p>
<blockquote><pre class="code">$ xfreq &lt; ch03.xml
chapter: 1
citetitle: 2
firstterm: 16
footnote: 6
foreignphrase: 2
function: 10
itemizedlist: 2
listitem: 21
literal: 29
note: 1
orderedlist: 1
para: 77
programlisting: 9
replaceable: 1
screen: 1
section: 6
sgmltag: 8
simplesect: 1
systemitem: 2
term: 6
title: 7
variablelist: 1
varlistentry: 6
xref: 2</pre></blockquote>
<p>The program is only a few lines of code, but it sure does a lot of work. Again, thanks to the C library underneath<a name="INDEX-258" />, it's quite speedy.</p>
<hr width="684" align="left" />
<div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch03_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">3.4. Putting Parsers to Work</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">3.6. XML::XPath</td></tr></table></div>
<hr width="684" align="left" />
<img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates.
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>
