⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 ch06_01.htm

📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
💻 HTM
字号:
<html><head><title>Tree Processing (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_07.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch06_02.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h1 class="chapter">Chapter 6. Tree Processing</h1><div class="htmltoc"><h4 class="tochead">Contents:</h4>  <p> <a href="#perlxml-CHP-6-SECT-1">XML Trees</a><br /><a href="ch06_02.htm">XML::Simple</a><br /><a href="ch06_03.htm">XML::Parser's Tree Mode</a><br /><a href="ch06_04.htm">XML::SimpleObject</a><br /><a href="ch06_05.htm">XML::TreeBuilder</a><br /><a href="ch06_06.htm">XML::Grove</a><br /></p></div><p>Having<a name="INDEX-478" /></a> done just about all we can do withstreams, it's time to move on to another style ofXML processing. Instead of letting the XML fly past the program onetiny piece at a time, we will capture the whole document in memoryand <em class="emphasis">then</em> start working on it. Having anin-memory representation built behind the scenes for us makes our jobmuch easier, although it tends to require more memory and CPU cycles.</p><p>This chapter is an overview of programming with persistent XMLobjects, better known as<a name="INDEX-479" /></a><em class="emphasis">tree processing</em>. It looks at a variety ofdifferent modules and strategies for building and accessing XMLtrees, including the rigorous, standard Document Object Model (DOM),fast access to internal document parts with XPath, and efficient treeprocessing methods.</p><div class="sect1"><a name="perlxml-CHP-6-SECT-1" /></a><h2 class="sect1">6.1. XML Trees</h2><p>Every XML document can be represented as a collection of data objectslinked in an acyclic structure called a tree. Each object, or<em class="emphasis">node</em><a name="INDEX-480" /></a>, is a small piece of the document, suchas an element, a piece of text, or a processing instruction. Onenode, called the <em class="emphasis">root</em>, links to other nodes,and so on down to nodes that aren't linked toanything. Graph this image out and it looks like a big, bushytree -- hence the name.</p><p>A tree structure representing a piece of XML is a handy thing tohave. Since a tree is acyclic (it has no circular links), you can usesimple traversal methods that won't get stuck ininfinite loops. Like a filesystem directory tree, you can representthe location of a node easily in simple shorthand. Like real trees,you can break a piece off and treat it like a smaller tree -- atree is just a collection of subtrees joined by a root node. Best ofall, you have all the information in one place and search through itlike a database.</p><p>For the programmer, a tree makes life much easier. Stream processing,you will recall, remembers fleeting details to use later inconstructing another data structure or printing out information. Thiswork is tedious, and can be downright horrible for very complexdocuments. If you have to combine pieces of information fromdifferent parts of the document, then you might go mad. If you have atree containing the document, though, all the details are right infront of you. You only need to write code to sift through the nodesand pull out what you need.</p><p>Of course, you don't get anything good for free.There is a penalty for having easy access to every point in adocument. Building the tree in the first place takes time andprecious CPU cycles, and even more if you use object-oriented methodcalls. There is also a memory tax to pay, since each object in thetree takes up some space. With very large documents (trees withmillions of nodes are not unheard of), you could bring your poormachine down to its knees with a tree processing program. On theaverage, though, processing trees can get you pretty good results(especially with a little optimizing, as we show later in thechapter), so don't give up just yet.</p><p>As we talk about trees, we will frequently use genealogical terms todescribe relationships between nodes. A <a name="INDEX-481" /></a>container node is said to bethe <em class="emphasis">parent</em><a name="INDEX-482" /></a> of the nodes it branches to, each ofwhich may be called a <em class="emphasis">child</em> of the containernode. Likewise, the terms<em class="emphasis">descendant</em><a name="INDEX-483" /></a>,<em class="emphasis">ancestor</em><a name="INDEX-484" /></a>, and<em class="emphasis">sibling</em><a name="INDEX-485" /></a> mean pretty much what you think theywould. So two sibling nodes share the same parent node, and all nodeshave the root node as their ancestor.</p><p>There are several different species of trees, depending on theimplementation you're talking about. Each speciesmodels the document in a slightly different way. For example, do youconsider an entity reference to be a separate node from text, orwould you include the reference in the same package? You have to payattention to the individual scheme of each module. <a href="ch06_01.htm#perlxml-CHP-6-TABLE-1">Table 6-1</a> shows a common selection of node types.</p><a name="perlxml-CHP-6-TABLE-1" /></a><h4 class="objtitle">Table 6-1. Typical node type definitions </h4><table border="1"><tr><th><p>Type</p></th><th><p>Properties</p></th></tr><tr><td><p><a name="INDEX-486" /></a>Element</p></td><td><p>Name, attributes, references to children</p></td></tr><tr><td><p><a name="INDEX-487" /></a>Namespace</p></td><td><p>Prefix name, URI</p></td></tr><tr><td><p><a name="INDEX-488" /></a>Character data</p></td><td><p>String of characters</p></td></tr><tr><td><p><a name="INDEX-489" /></a>Processinginstruction</p></td><td><p>Target, Data</p></td></tr><tr><td><p><a name="INDEX-490" /></a>Comment</p></td><td><p>String of characters</p></td></tr><tr><td><p><a name="INDEX-491" /></a>CDATA section</p></td><td><p>String of characters</p></td></tr><tr><td><p><a name="INDEX-492" /></a>Entity reference</p></td><td><p>Name, Replacement text (or System ID and/or Public ID)</p></td></tr></table><p><p>In addition to this set, some implementations define node types forthe DTD, allowing a programmer to access declarations for elements,entities, notations, and attributes. Nodes may also exist for the XMLdeclaration and document type declarations.</p></div><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_07.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch06_02.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">5.7. XML::SAX: The Second Generation</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">6.2. XML::Simple</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -