📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
<td><p>The root element</p></td></tr><tr><td><p><tt class="literal">//foo</tt></p></td><td><p>An element <tt class="literal">foo</tt> at any level</p></td></tr></table><p><p>Since the thing you're most likely to select in a location path step is an element, the default node type is an element. But there are reasons to use another node type. In our example location path, we used <tt class="literal">text( )</tt> to return just the text node for the <tt class="literal">&lt;value&gt;</tt> element.</p>
<p>Most steps are<a name="INDEX-695" /> <em class="emphasis">relative locators</em><a name="INDEX-696" /> because they define where to go relative to the previous locator. Although location paths are composed mostly of relative locators, they always start with an<a name="INDEX-697" /> <em class="emphasis">absolute locator</em>, which describes a definite point in the document. This locator comes in two flavors: <tt class="literal">id( )</tt><a name="INDEX-698" />, which starts at an element with a given ID attribute, and <tt class="literal">root( )</tt><a name="INDEX-699" />, which starts at the root node of the document (an abstract node that is the parent of the document element). You will frequently see the shorthand "<tt class="literal">/</tt>" starting a path, indicating that <tt class="literal">root( )</tt> is being used.</p>
<p>Now that we've trained our monkeys to understand XPath, let's give it a whirl with Perl. The <tt class="literal">XML::XPath</tt><a name="INDEX-700" /> module, written by Matt<a name="INDEX-701" /> Sergeant of<a name="INDEX-702" /> <tt class="literal">XML::LibXML</tt> fame, is a solid implementation of XPath. We've written a program in <a href="ch08_02.htm#perlxml-CHP-8-EX-6">Example 8-6</a> that takes two command-line arguments: a file and an XPath location path. It prints each node that matches the path as a string.</p>
<a name="perlxml-CHP-8-EX-6" /><div class="example"><h4 class="objtitle">Example 8-6. 
A program that uses XPath </h4><blockquote><pre class="code">use XML::XPath;
use XML::XPath::XMLParser;

# create an object to parse the file and field XPath queries
my $xpath = XML::XPath-&gt;new( filename =&gt; shift @ARGV );

# apply the path from the command line and get back a list of matches
my $nodeset = $xpath-&gt;find( shift @ARGV );

# print each node in the list
foreach my $node ( $nodeset-&gt;get_nodelist ) {
  print XML::XPath::XMLParser::as_string( $node ) . "\n";
}</pre></blockquote></div><p>That example was simple. Now we need a datafile. Check out <a href="ch08_02.htm#perlxml-CHP-8-EX-7">Example 8-7</a>. </p><a name="perlxml-CHP-8-EX-7" /><div class="example"><h4 class="objtitle">Example 8-7. An XML datafile </h4><blockquote><pre class="code">&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE inventory [
  &lt;!ENTITY poison "&lt;note&gt;danger: poisonous!&lt;/note&gt;"&gt;
  &lt;!ENTITY endang "&lt;note&gt;endangered species&lt;/note&gt;"&gt;
]&gt;
&lt;!-- Rivenwood Arboretum inventory --&gt;
&lt;inventory date="2001.9.4"&gt;
  &lt;category type="tree"&gt;
    &lt;item id="284"&gt;
      &lt;name style="latin"&gt;Carya glabra&lt;/name&gt;
      &lt;name style="common"&gt;Pignut Hickory&lt;/name&gt;
      &lt;location&gt;east quadrangle&lt;/location&gt;
      &amp;endang;
    &lt;/item&gt;
    &lt;item id="222"&gt;
      &lt;name style="latin"&gt;Toxicodendron vernix&lt;/name&gt;
      &lt;name style="common"&gt;Poison Sumac&lt;/name&gt;
      &lt;location&gt;west promenade&lt;/location&gt;
      &amp;poison;
    &lt;/item&gt;
  &lt;/category&gt;
  &lt;category type="shrub"&gt;
    &lt;item id="210"&gt;
      &lt;name style="latin"&gt;Cornus racemosa&lt;/name&gt;
      &lt;name style="common"&gt;Gray Dogwood&lt;/name&gt;
      &lt;location&gt;south lawn&lt;/location&gt;
    &lt;/item&gt;
    &lt;item id="104"&gt;
      &lt;name style="latin"&gt;Alnus rugosa&lt;/name&gt;
      &lt;name style="common"&gt;Speckled Alder&lt;/name&gt;
      &lt;location&gt;east quadrangle&lt;/location&gt;
      &amp;endang;
    &lt;/item&gt;
  &lt;/category&gt;
&lt;/inventory&gt;</pre></blockquote></div><p>The first test uses the path <tt class="literal">/inventory/category/item/name</tt>:</p>
<blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "/inventory/category/item/name"</b></tt>
&lt;name style="latin"&gt;Carya glabra&lt;/name&gt;
&lt;name style="common"&gt;Pignut Hickory&lt;/name&gt;
&lt;name style="latin"&gt;Toxicodendron vernix&lt;/name&gt;
&lt;name style="common"&gt;Poison Sumac&lt;/name&gt;
&lt;name style="latin"&gt;Cornus racemosa&lt;/name&gt;
&lt;name style="common"&gt;Gray Dogwood&lt;/name&gt;
&lt;name style="latin"&gt;Alnus rugosa&lt;/name&gt;
&lt;name style="common"&gt;Speckled Alder&lt;/name&gt;</pre></blockquote>
<p>Every <tt class="literal">&lt;name&gt;</tt> element was found and printed. Let's get more specific with the path <tt class="literal">/inventory/category/item/name[@style='latin']</tt>:</p>
<blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "/inventory/category/item/name[@style='latin']"</b></tt>
&lt;name style="latin"&gt;Carya glabra&lt;/name&gt;
&lt;name style="latin"&gt;Toxicodendron vernix&lt;/name&gt;
&lt;name style="latin"&gt;Cornus racemosa&lt;/name&gt;
&lt;name style="latin"&gt;Alnus rugosa&lt;/name&gt;</pre></blockquote>
<p>Now let's use an ID attribute as a starting point with the path <tt class="literal">//item[@id='222']/note</tt>. (If we had declared the attribute <tt class="literal">id</tt> as type ID in a DTD, we'd be able to use the path <tt class="literal">id('222')/note</tt>. We didn't, but this alternate method works just as well.)</p>
<blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "//item[@id='222']/note"</b></tt>
&lt;note&gt;danger: poisonous!&lt;/note&gt;</pre></blockquote>
<p>How about ditching the element tags? 
To do so, use this: </p><blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "//item[@id='222']/note/text( )"</b></tt>
danger: poisonous!</pre></blockquote>
<p>When was this inventory last updated? </p>
<blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "/inventory/@date"</b></tt>
 date="2001.9.4"</pre></blockquote>
<p>With XPath, you can go hog wild! Here's the path a silly monkey might take through the tree:</p>
<blockquote><pre class="code">&gt; <tt class="userinput"><b>grabber.pl data.xml "//*[@id='104']/parent::*/preceding-sibling::*/child::*[2]/name[not(@style='latin')]/node( )"</b></tt>
Poison Sumac</pre></blockquote>
<p>The monkey started on the element with the attribute <tt class="literal">id='104'</tt>, climbed up a level, jumped to the previous element, climbed down to the second child element, found a <tt class="literal">&lt;name&gt;</tt> whose <tt class="literal">style</tt><a name="INDEX-703" /> attribute was not set to <tt class="literal">'latin'</tt>, and hopped on the child of that element, which happened to be the text node with the value <tt class="literal">Poison Sumac</tt>.</p>
<p>We have just seen how to use XPath expressions to locate and return a set of nodes. The implementation we are about to see is even more powerful. <tt class="literal">XML::Twig</tt>, an ingenious module by Michel Rodriguez, is quite Perlish in the way it uses XPath expressions. It uses a hash to map them to subroutines, so you can have functions called automatically for certain types of nodes.</p>
<p>The program in <a href="ch08_02.htm#perlxml-CHP-8-EX-8">Example 8-8</a> shows how this works. When you initialize the <tt class="literal">XML::Twig</tt> object, you can set a bunch of handlers in a hash, where the keys are XPath expressions. 
During the parsing stage, as the tree is built, these handlers are called for appropriate nodes.</p>
<p>As you look at <a href="ch08_02.htm#perlxml-CHP-8-EX-8">Example 8-8</a>, you'll notice that <a name="INDEX-704" /> <a name="INDEX-705" />at-sign (<tt class="literal">@</tt>) characters are escaped. This is because <tt class="literal">@</tt> can cause a little confusion with XPath expressions living in a Perl context. In XPath, <tt class="literal">@foo</tt> refers to an attribute named <tt class="literal">foo</tt>, not an array named <tt class="literal">foo</tt>. Keep this distinction in mind when going over the XPath examples in this book and when writing your own XPath for Perl to use: in double-quoted strings, you must escape the <tt class="literal">@</tt> characters so Perl doesn't try to interpolate arrays in the middle of your expressions.</p>
<p>If your code does so much work with Perl arrays and XPath attribute references that it's unclear which <tt class="literal">@</tt> characters are which, consider referring to attributes in longhand, using the "attribute" XPath axis: <tt class="literal">attribute::foo</tt>. This raises the issue of the double colon and its different meanings in Perl and XPath. Since XPath has only a few hardcoded axes, however, and they're always expressed in lowercase, they're easier to tell apart at a glance.</p>
<a name="perlxml-CHP-8-EX-8" /><div class="example"><h4 class="objtitle">Example 8-8. 
How twig handlers work </h4><blockquote><pre class="code">use XML::Twig;

# buffers for holding text
my $catbuf = '';
my $itembuf = '';

# initialize parser with handlers for node processing
my $twig = XML::Twig-&gt;new( TwigHandlers =&gt; {
                             "/inventory/category"    =&gt; \&amp;category,
                             "name[\@style='latin']"  =&gt; \&amp;latin_name,
                             "name[\@style='common']" =&gt; \&amp;common_name,
                             "category/item"          =&gt; \&amp;item,
                           });

# parse, handling nodes on the way
$twig-&gt;parsefile( shift @ARGV );

# handle a category element: print its header, then the buffered items
sub category {
  my( $tree, $elem ) = @_;
  print "CATEGORY: ", $elem-&gt;att( 'type' ), "\n\n", $catbuf;
  $catbuf = '';
}

# handle an item element: move the buffered names into the category buffer
sub item {
  my( $tree, $elem ) = @_;
  $catbuf .= "Item: " . $elem-&gt;att( 'id' ) . "\n" . $itembuf . "\n";
  $itembuf = '';
}

# handle a latin name
sub latin_name {
  my( $tree, $elem ) = @_;
  $itembuf .= "Latin name: " . $elem-&gt;text . "\n";
}

# handle a common name
sub common_name {
  my( $tree, $elem ) = @_;
  $itembuf .= "Common name: " . $elem-&gt;text . "\n";
}</pre></blockquote></div>
<p>Our program takes a datafile like the one shown in <a href="ch08_02.htm#perlxml-CHP-8-EX-7">Example 8-7</a> and outputs a summary report. Note that since a handler is called only after an element is completely built, the overall order of handler calls may not be what you expect. The handlers for children are called before their parent. 
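</p>
<p>To see that calling order in isolation, here is a minimal sketch that is not part of the example program. It assumes a tiny inline document in place of a datafile, and the names <tt class="literal">$xml</tt>, <tt class="literal">$note</tt>, and <tt class="literal">@order</tt> are ours, chosen only for illustration. It uses the lowercase <tt class="literal">twig_handlers</tt> spelling of the handler option and records each element's tag as its handler fires.</p>
<blockquote><pre class="code">use strict;
use warnings;
use XML::Twig;

# a tiny document, inlined so the sketch is self-contained (an assumption)
my $xml = '&lt;category type="tree"&gt;&lt;item id="284"&gt;'
        . '&lt;name&gt;Carya glabra&lt;/name&gt;&lt;/item&gt;&lt;/category&gt;';

# record the tag of each element at the moment its handler is called
my @order;
my $note = sub {
  my( $tree, $elem ) = @_;
  push @order, $elem-&gt;tag;
};

# the same handler is registered for all three element names
my $twig = XML::Twig-&gt;new( twig_handlers =&gt; {
                             category =&gt; $note,
                             item     =&gt; $note,
                             name     =&gt; $note,
                           });
$twig-&gt;parse( $xml );

# list the tags in the order the handlers ran
print join( ' ', @order ), "\n";</pre></blockquote>
<p>Run as is, it should print <tt class="literal">name item category</tt>, because each handler fires only once its element's end tag has been parsed.</p>
<p>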
For that reason, we need to buffer their output and sort it out at the appropriate time.</p>
<p>The result comes out like this: </p>
<blockquote><pre class="code">CATEGORY: tree

Item: 284
Latin name: Carya glabra
Common name: Pignut Hickory

Item: 222
Latin name: Toxicodendron vernix
Common name: Poison Sumac

CATEGORY: shrub

Item: 210
Latin name: Cornus racemosa
Common name: Gray Dogwood

Item: 104
Latin name: Alnus rugosa
Common name: Speckled Alder</pre></blockquote>
<p>XPath makes the task of locating nodes in a document and describing types of nodes for processing ridiculously simple. It cuts down on the amount of code you have to write because climbing around the tree to sample different parts is all taken care of. It's easier to read than code too. We're happy with it, and because it is a standard, we'll be seeing more uses for it in many<a name="INDEX-706" /> <a name="INDEX-707" /> modules to come.</p>
<hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch08_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">8. Beyond Trees: XPath, XSLT, and More</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">8.3. XSLT</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. 
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>