Perl & XML, by Erik T. Ray and Jason McIntosh. ISBN 0-596-00205-X. First Edition, published April
<html><head><title>XPath (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch08_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">8.2. XPath</h2><p>Imagine<a name="INDEX-679" /><a name="INDEX-680" /> that you have an army of monkeys at your disposal. You say to them, "I want you to get me a banana frappe from the ice cream parlor on Massachusetts Avenue just north of Porter Square." Not being very smart monkeys, they go out and bring back every beverage they can find, leaving you to taste them all to figure out which is the one you wanted. To retrain them, you send them out to night school to learn a rudimentary language, and in a few months you repeat the request. Now the monkeys follow your directions, identify the exact item you want, and return with it.</p><p>We've just described the kind of problem XPath was designed to solve. 
XPath is one of the most useful technologies supporting XML. It provides an interface to find nodes in a purely descriptive way, so you don't have to write code to hunt them down yourself. You merely specify the kind of nodes that interest you and an XPath processor will retrieve them for you. Suddenly, XML goes from being a vast, confusing pile of nodes to a well-indexed filing cabinet of data.</p><p>Consider the XML document in <a href="ch08_02.htm#perlxml-CHP-8-EX-4">Example 8-4</a>. </p><a name="perlxml-CHP-8-EX-4" /><div class="example"><h4 class="objtitle">Example 8-4. A preferences file </h4><blockquote><pre class="code">&lt;plist&gt;
  &lt;dict&gt;
    &lt;key&gt;DefaultDirectory&lt;/key&gt;
    &lt;string&gt;/usr/local/fooby&lt;/string&gt;
    &lt;key&gt;RecentDocuments&lt;/key&gt;
    &lt;array&gt;
      &lt;string&gt;/Users/bobo/docs/menu.pdf&lt;/string&gt;
      &lt;string&gt;/Users/slappy/pagoda.pdf&lt;/string&gt;
      &lt;string&gt;/Library/docs/Baby.pdf&lt;/string&gt;
    &lt;/array&gt;
    &lt;key&gt;BGColor&lt;/key&gt;
    &lt;string&gt;sage&lt;/string&gt;
  &lt;/dict&gt;
&lt;/plist&gt;</pre></blockquote></div><p>This document is a typical preferences file for a program with a series of data keys and values. Nothing in it is too complex. To obtain the value of the key <tt class="literal">BGColor</tt>, you'd have to locate the <tt class="literal">&lt;key&gt;</tt> element containing the word "BGColor" and step ahead to the next sibling element, a <tt class="literal">&lt;string&gt;</tt>. Finally, you would read the value of the text node inside. In DOM, you might do it as shown in <a href="ch08_02.htm#perlxml-CHP-8-EX-5">Example 8-5</a>.</p><a name="perlxml-CHP-8-EX-5" /><div class="example"><h4 class="objtitle">Example 8-5. 
Program to get a preferred color </h4><blockquote><pre class="code">use XML::DOM;    # exports node-type constants such as ELEMENT_NODE

sub get_bgcolor {
    my ( $doc ) = @_;    # an XML::DOM::Document
    foreach my $key ( $doc-&gt;getElementsByTagName( 'key' ) ) {
        next unless $key-&gt;getFirstChild-&gt;getData eq 'BGColor';
        # The node right after &lt;/key&gt; may be a whitespace text node,
        # so step ahead until we reach an element (the &lt;string&gt;).
        my $value = $key-&gt;getNextSibling;
        $value = $value-&gt;getNextSibling
            while $value &amp;&amp; $value-&gt;getNodeType != ELEMENT_NODE;
        return unless $value;
        # Read the text node inside the &lt;string&gt; element.
        return $value-&gt;getFirstChild-&gt;getData;
    }
    return;
}</pre></blockquote></div><p>Writing one routine like this isn't too bad, but imagine if you had to do hundreds of queries like it. And this program was for a relatively simple document; imagine how complex the code could be for one that was many levels deep. It would be nice to have a shorthand way of doing the same thing, say, on one line of code. Such a syntax would be much easier to read, write, and debug. This is where XPath comes in.</p><p>XPath is a language for expressing a path to a node or set of nodes anywhere in a document. It's simple, expressive, and standard (backed by the W3C, the folks who brought you XML).<a href="#FOOTNOTE-28">[28]</a> You'll see it used in XSLT for matching rules to nodes, and in XPointer, a technology for linking XML documents to resources. You can also find it in many Perl modules, as we'll show you soon.</p><blockquote class="footnote"> <a name="FOOTNOTE-28" /><p>[28] The recommendation is on the Web at <a href="http://www.w3.org/TR/xpath.html">http://www.w3.org/TR/xpath.html</a>.</p></blockquote><p>The most common kind of XPath <a name="INDEX-681" />expression is the <em class="emphasis">location path</em><a name="INDEX-682" />, which consists of some number of path <em class="emphasis">steps</em>, each extending the path a little closer to the goal. Starting from an absolute, known position (for example, the root of the document), the steps "walk" across the document tree to arrive at a node or set of nodes. 
The syntax looks much like a filesystem path, with steps separated by slash characters<a name="INDEX-683" /> <a name="INDEX-684" />(/).</p><p>This location path shows how to find that color value in our last example:</p><blockquote><pre class="code">/plist/dict/key[text()='BGColor']/following-sibling::*[1]/text()</pre></blockquote><p>A location path is processed by starting at an absolute location in the document and moving to a new <a name="INDEX-685" />node (or nodes) with each step. At any point in the search, a<a name="INDEX-686" /> <em class="emphasis">current node</em> serves as the context for the next step. If multiple nodes match the next step, the search branches and the processor maintains a set of current nodes. Here's how the location path shown above would be processed:</p><ul><li><p>Start at the root node (one level above the root element). </p></li><li><p>Move to a <tt class="literal">&lt;plist&gt;</tt><a name="INDEX-687" /> element that is a child of the current node.</p></li><li><p>Move to a <tt class="literal">&lt;dict&gt;</tt><a name="INDEX-688" /> element that is a child of the current node.</p></li><li><p>Move to a <tt class="literal">&lt;key&gt;</tt><a name="INDEX-689" /> element that is a child of the current node and that has the value <tt class="literal">BGColor</tt>.</p></li><li><p>Move to the first element among the following siblings of the current node (here, the <tt class="literal">&lt;string&gt;</tt> right after the key). </p></li><li><p>Return any text nodes that are children of the current node. </p></li></ul><p>Because node searches can branch if multiple nodes match, we sometimes have to add a test condition to a step to restrict the eligible candidates. A test condition was necessary for the <tt class="literal">&lt;key&gt;</tt> step, where multiple nodes would otherwise have matched, so we added one requiring the value of the element to be <tt class="literal">BGColor</tt>. 
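</p><p>To make the branching concrete, here is a minimal sketch that evaluates this location path with and without its test condition. It assumes the CPAN module XML::XPath is installed and that <a href="ch08_02.htm#perlxml-CHP-8-EX-4">Example 8-4</a> has been saved as <tt class="literal">prefs.xml</tt>, a hypothetical filename; treat it as an illustration rather than a finished program.</p><blockquote><pre class="code">use strict;
use warnings;
use XML::XPath;    # CPAN module, assumed to be installed

# Hypothetical file holding the document from Example 8-4
my $xp = XML::XPath-&gt;new( filename =&gt; 'prefs.xml' );

# With the test, only the BGColor key survives the &lt;key&gt; step, so
# findvalue() returns the text of the single &lt;string&gt; that follows it.
my $color = $xp-&gt;findvalue(
    "/plist/dict/key[text()='BGColor']/following-sibling::*[1]/text()" );
print "BGColor: $color\n";

# Without the test, every &lt;key&gt; matches and the search branches:
# we collect the text children of the element after each key,
# including any whitespace-only text inside &lt;array&gt;.
my @texts = $xp-&gt;findnodes(
    '/plist/dict/key/following-sibling::*[1]/text()' );
printf "%d text nodes without the test\n", scalar @texts;
print "  [", $_-&gt;string_value, "]\n" for @texts;</pre></blockquote><p>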
Without the test, we would have received all text nodes from all siblings immediately following a <tt class="literal">&lt;key&gt;</tt> element.</p><p>This location path matches all three <tt class="literal">&lt;key&gt;</tt> elements in the document:</p><blockquote><pre class="code">/plist/dict/key</pre></blockquote><p>There are many kinds of test conditions, but all of them result in a boolean true/false answer. You can test the position (where a node is in the list), the existence of children and attributes, numeric comparisons, and all kinds of boolean expressions using the <tt class="literal">and</tt> and <tt class="literal">or</tt> operators. Sometimes a test consists of only a number, which is shorthand for specifying an index into a node list, so the test <tt class="literal">[1]</tt> (short for <tt class="literal">[position()=1]</tt>) says, "take only the first node that matches."</p><p>You can link multiple tests inside the brackets with<a name="INDEX-690" /> boolean operators. Alternatively, you can chain tests with multiple sets of brackets, which works like an <tt class="literal">and</tt> operator. Every path step has an implicit test that prunes the search tree of blind alleys: if at any point a step turns up zero matching nodes, the search along that branch terminates.</p><p>Along with boolean tests, you can shape a location path with directives called <em class="emphasis">axes</em>. An axis is like a compass needle that tells the processor which direction to travel. Instead of the default <tt class="literal">child</tt> axis, which descends from the current node to its children, you can make it go up to the parent and ancestors or laterally among its siblings. The axis is written as a prefix to the step with a <a name="INDEX-691" /> <a name="INDEX-692" />double colon (<tt class="literal">::</tt>). In our last example, we used the <tt class="literal">following-sibling</tt> axis to jump from the current node to its next-door neighbor.</p><p>A step is not limited to frolicking with elements. You can specify different kinds of nodes, including attributes, text, processing instructions, and comments, or leave it generic with a selector for any node type. 
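</p><p>The test conditions and axes described above can be tried out with a short sketch along the same lines. Again, it assumes the CPAN module XML::XPath and a hypothetical <tt class="literal">prefs.xml</tt> holding <a href="ch08_02.htm#perlxml-CHP-8-EX-4">Example 8-4</a>; each query prints the string value of every node it selects.</p><blockquote><pre class="code">use strict;
use warnings;
use XML::XPath;    # CPAN module, assumed to be installed

my $xp = XML::XPath-&gt;new( filename =&gt; 'prefs.xml' );    # hypothetical file

my @queries = (
    # A bare number is a position test: [1] means [position()=1]
    '/plist/dict/array/string[1]',
    # last() selects the final node in the current list
    '/plist/dict/array/string[last()]',
    # Two tests joined with "or" inside one pair of brackets
    "/plist/dict/key[text()='BGColor' or text()='DefaultDirectory']",
    # Chained brackets: both tests must hold, as with "and"
    "/plist/dict/array/string[starts-with(text(), '/Users')][contains(text(), 'slappy')]",
    # Axes: climb from the first &lt;string&gt; to its parent &lt;array&gt;, then
    # step back to the nearest preceding sibling element, its &lt;key&gt;
    '/plist/dict/array/string[1]/parent::*/preceding-sibling::*[1]',
);

foreach my $path (@queries) {
    print "$path\n";
    print "    ", $_-&gt;string_value, "\n" for $xp-&gt;findnodes($path);
}</pre></blockquote><p>Against the sample document, the last query should come back with the <tt class="literal">RecentDocuments</tt> key, which shows how axes let a path move up and sideways as well as down.</p><p>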
You can specify the node type in many ways, some of which are listed here:</p><a name="ch08-3-fm2xml" /><table border="1"><tr><th><p>Symbol</p></th><th><p>Matches</p></th></tr><tr><td><p><tt class="literal">node()</tt><a name="INDEX-693" /></p></td><td><p>Any node</p></td></tr><tr><td><p><tt class="literal">text()</tt><a name="INDEX-694" /></p></td><td><p>A text node</p></td></tr><tr><td><p><tt class="literal">element::foo</tt></p></td><td><p>An element named <tt class="literal">foo</tt></p></td></tr><tr><td><p><tt class="literal">foo</tt></p></td><td><p>An element named <tt class="literal">foo</tt></p></td></tr><tr><td><p><tt class="literal">attribute::foo</tt></p></td><td><p>An attribute named <tt class="literal">foo</tt></p></td></tr><tr><td><p><tt class="literal">@foo</tt></p></td><td><p>An attribute named <tt class="literal">foo</tt></p></td></tr><tr><td><p><tt class="literal">@*</tt></p></td><td><p>Any attribute</p></td></tr><tr><td><p><tt class="literal">*</tt></p></td><td><p>Any element</p></td></tr><tr><td><p><tt class="literal">.</tt></p></td><td><p>The current node</p></td></tr><tr><td><p><tt class="literal">..</tt></p></td><td><p>The parent of the current node</p></td></tr><tr><td><p><tt class="literal">/</tt></p></td><td><p>The root node</p></td></tr><tr><td><p><tt class="literal">/*</tt></p></td>