<html><head><title>XML::LibXML (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch03_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">3.5. XML::LibXML</h2><p><tt class="literal">XML::LibXML</tt><a name="INDEX-249" />, like <tt class="literal">XML::Parser</tt>, is an interface to a library written in <a name="INDEX-250" />C. 
Called <tt class="literal">libxml2</tt><a name="INDEX-251" />, it's part of the <a name="INDEX-252" />GNOME project.<a href="#FOOTNOTE-17">[17]</a> Unlike <tt class="literal">XML::Parser</tt>, this new parser supports a major standard for XML tree processing known as the Document Object Model (<a name="INDEX-253" /><a name="INDEX-254" />DOM).</p><blockquote class="footnote"> <a name="FOOTNOTE-17" /><p>[17] For downloads and documentation, see <a href="http://www.libxml.org/">http://www.libxml.org/</a>.</p> </blockquote><p>DOM is another much-ballyhooed XML standard. It does for <a name="INDEX-255" />tree processing what SAX does for event streams. If you have your heart set on climbing trees in your program and you think there's a likelihood that it might be reused or applied to different data sources, you're better off using something standard and interchangeable. Again, we're happy to delve into DOM in a future chapter and get you thinking in standards-compliant ways. That topic is coming up in <a href="ch07_01.htm">Chapter 7, "DOM"</a>.</p><p>Now we want to show you an example of another parser in action. We'd be remiss if we focused on just one kind of parser when so many are out there. Again, we'll show you a basic example, nothing fancy, just to show you how to invoke the parser and tame its power. Let's write another document analysis tool like we did in <a href="ch03_04.htm#perlxml-CHP-3-EX-5">Example 3-5</a>, this time printing a frequency distribution of elements in a document.</p><p><a href="ch03_05.htm#perlxml-CHP-3-EX-6">Example 3-6</a> shows the program. It's a vanilla parser run because we haven't set any options yet. Essentially, the parser parses the filehandle and returns a DOM object, which is nothing more than a tree structure of well-designed objects. 
Our program finds the document element, and then traverses the entire tree one element at a time, all the while updating the hash of frequency counters.</p><a name="perlxml-CHP-3-EX-6" /><div class="example"><h4 class="objtitle">Example 3-6. A frequency distribution program </h4><blockquote><pre class="code">use XML::LibXML;
use IO::Handle;

# initialize the parser
my $parser = new XML::LibXML;

# open a filehandle and parse
my $fh = new IO::Handle;
if( $fh-&gt;fdopen( fileno( STDIN ), "r" )) {
    my $doc = $parser-&gt;parse_fh( $fh );
    my %dist;
    &amp;proc_node( $doc-&gt;getDocumentElement, \%dist );
    foreach my $item ( sort keys %dist ) {
        print "$item: ", $dist{ $item }, "\n";
    }
    $fh-&gt;close;
}

# process an XML tree node: if it's an element, update the
# distribution list and process all its children
#
sub proc_node {
    my( $node, $dist ) = @_;
    # node types are integers, so compare numerically
    return unless( $node-&gt;nodeType == &amp;XML_ELEMENT_NODE );
    $dist-&gt;{ $node-&gt;nodeName } ++;
    foreach my $child ( $node-&gt;getChildnodes ) {
        &amp;proc_node( $child, $dist );
    }
}</pre></blockquote></div><p>Note that instead of using a simple path to a file, we use a filehandle object of the <tt class="literal">IO::Handle</tt> class. Perl filehandles, as you probably know, are magic and subtle beasties, capable of passing into your code characters from a wide variety of sources, including files on disk, open network sockets, keyboard input, databases, and just about everything else capable of outputting data. Once you define a filehandle's source, it gives you the same interface for reading from it as does every other filehandle. This dovetails nicely with our XML-based ideology, where we want code to be as flexible and reusable as possible. After all, XML doesn't care where it comes from, so why should we pigeonhole it with one source type?</p><p>The parser object returns a document object after parsing. 
This object has a method that returns a reference to the document element -- the element at the very root of the whole tree. We take this reference and feed it to a recursive subroutine, <tt class="literal">proc_node( )</tt>, which happily munches on elements and scribbles into a hash variable every time it sees an element. <a name="INDEX-256" />Recursion is an efficient way to write programs that process XML because the structure of documents is somewhat fractal: the same rules for elements apply at any depth or position in the document, including the root element that represents the entire document (modulo its prologue). Note the "node type" check, which distinguishes between elements and other parts of a document (such as pieces of text or processing instructions).</p><p>For every element the routine looks at, it has to call the object's <tt class="literal">getChildnodes()</tt><a name="INDEX-257" /> method to continue processing on its children. This call is an essential difference between stream-based and tree-based methodologies. Instead of having an event stream take the steering wheel of our program and push data at it, thus calling subroutines and code blocks in a (somewhat) unpredictable order, our program now has the responsibility of navigating through the document under its own power. Traditionally, we start at the root element and go downward, processing children in order from first to last. However, because we, not the parser, are in control now, we can scan through the document in any way we want. We could go backwards, we could scan just a part of the document, we could jump around, making multiple passes through the tree -- the sky's the limit. 
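</p><p>Because our own code drives the traversal, the alternative orders mentioned above take only a few lines each. What follows is a minimal sketch rather than one of the book's numbered examples: the inline sample document and the variable names are assumptions made purely for illustration. It first visits the root's element children from last to first, and then counts elements in just one subtree, using an explicit stack instead of recursion.</p><blockquote><pre class="code">use strict;
use warnings;
use XML::LibXML;

# a tiny inline document (hypothetical sample data)
my $xml = '&lt;doc&gt;&lt;a&gt;&lt;b/&gt;&lt;/a&gt;&lt;c/&gt;&lt;a/&gt;&lt;/doc&gt;';

my $parser = XML::LibXML-&gt;new;
my $doc    = $parser-&gt;parse_string( $xml );
my $root   = $doc-&gt;getDocumentElement;

# going backwards: visit the root's element children from last to first
my @reversed;
for ( my $node = $root-&gt;lastChild; $node; $node = $node-&gt;previousSibling ) {
    next unless $node-&gt;nodeType == XML_ELEMENT_NODE;
    push @reversed, $node-&gt;nodeName;
}
print "last to first: @reversed\n";

# scanning just a part: count elements in the first child's subtree only,
# keeping the nodes still to be visited on an explicit stack
my %dist;
my @stack = ( $root-&gt;firstChild );
while ( my $node = pop @stack ) {
    next unless $node-&gt;nodeType == XML_ELEMENT_NODE;
    $dist{ $node-&gt;nodeName }++;
    push @stack, $node-&gt;getChildnodes;
}
print "$_: $dist{$_}\n" for sort keys %dist;</pre></blockquote><p>With this sample input, the first loop should print <tt class="literal">a c a</tt>, and the subtree count should report one <tt class="literal">a</tt> and one <tt class="literal">b</tt>. The stack version visits the same nodes that a recursive routine would, though not necessarily in document order, which doesn't matter when all we do is count.</p><p>Now back to the frequency program in Example 3-6. 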
Here's the result from processing a small chapter coded in DocBook XML:</p><blockquote><pre class="code">$ xfreq &lt; ch03.xml
chapter: 1
citetitle: 2
firstterm: 16
footnote: 6
foreignphrase: 2
function: 10
itemizedlist: 2
listitem: 21
literal: 29
note: 1
orderedlist: 1
para: 77
programlisting: 9
replaceable: 1
screen: 1
section: 6
sgmltag: 8
simplesect: 1
systemitem: 2
term: 6
title: 7
variablelist: 1
varlistentry: 6
xref: 2</pre></blockquote><p>The program is only a few lines of code, but it sure does a lot of work. Again, thanks to the C library underneath<a name="INDEX-258" />, it's quite speedy.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch03_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">3.4. Putting Parsers to Work</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">3.6. XML::XPath</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. 
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>