<html><head><title>XML::LibXML (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch08_01.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">7.4. XML::LibXML</h2><p>Matt Sergeant's <tt class="literal">XML::LibXML</tt><a name="INDEX-668" /> module is an interface to the GNOME project's <a name="INDEX-669" />LibXML library. It's quickly becoming a popular implementation of DOM, demonstrating speed and completeness over the older <tt class="literal">XML::Parser</tt>-based modules. It also implements Level 2 DOM, which means it has support for namespaces.</p><p>So far, we haven't worked much with namespaces. A lot of people opt to avoid them. They add a new level of complexity to markup and code, since you have to handle both local names and prefixes. However, namespaces are becoming more important in XML, and sooner or later, we all will have to deal with them. 
The popular transformation language XSLT uses namespaces to distinguish between tags that are instructions and tags that are data (i.e., which elements should be output and which should be used to control the output).</p><p>You'll even see namespaces used in good old HTML. Namespaces provide a way to import specialized markup into documents, such as equations into regular HTML pages. The MathML language (<a href="http://www.w3.org/Math/">http://www.w3.org/Math/</a>) does just that. <a href="ch07_04.htm#perlxml-CHP-7-EX-1">Example 7-1</a> shows an HTML document that incorporates MathML with namespaces.</p><a name="perlxml-CHP-7-EX-1" /><div class="example"><h4 class="objtitle">Example 7-1. A document with namespaces </h4><blockquote><pre class="code">&lt;html&gt;
&lt;body xmlns:eq="http://www.w3.org/1998/Math/MathML"&gt;
&lt;h1&gt;Billybob's Theory&lt;/h1&gt;
&lt;p&gt;
It is well-known that cats cannot be herded easily. That is, they do
not tend to run in a straight line for any length of time unless they
really want to. A cat forced to run in a straight line against its
will has an increasing probability, with distance, of deviating from
the line just to spite you, given by this formula:&lt;/p&gt;
&lt;p&gt;
  &lt;!-- P = 1 - 1/(x^2) --&gt;
  &lt;eq:math&gt;
    &lt;eq:mi&gt;P&lt;/eq:mi&gt;&lt;eq:mo&gt;=&lt;/eq:mo&gt;&lt;eq:mn&gt;1&lt;/eq:mn&gt;&lt;eq:mo&gt;-&lt;/eq:mo&gt;
    &lt;eq:mfrac&gt;
      &lt;eq:mn&gt;1&lt;/eq:mn&gt;
      &lt;eq:msup&gt;
        &lt;eq:mi&gt;x&lt;/eq:mi&gt;
        &lt;eq:mn&gt;2&lt;/eq:mn&gt;
      &lt;/eq:msup&gt;
    &lt;/eq:mfrac&gt;
  &lt;/eq:math&gt;
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></blockquote></div><p>The tags with <tt class="literal">eq:</tt> prefixes are part of a namespace identified by the URI <a href="http://www.w3.org/1998/Math/MathML">http://www.w3.org/1998/Math/MathML</a>, which is declared in an attribute of the <tt class="literal">&lt;body&gt;</tt> element. Using a namespace helps the browser discern between what is native to HTML and what is not. Browsers that understand MathML route the qualified elements to their equation formatter instead of the regular HTML formatter.</p><p>Some browsers are confused by the MathML tags and render unpredictable results. 
One particularly useful utility is a program that detects and removes namespace-qualified elements that would gum up an older HTML processor. The following example uses DOM2 to sift through a document and strip out all elements that have a namespace prefix.</p><p>The first step is to parse the file:</p><blockquote><pre class="code">use XML::LibXML;

my $parser = XML::LibXML-&gt;new( );
my $doc = $parser-&gt;parse_file( shift @ARGV );</pre></blockquote><p>Next, we locate the document element and run a recursive subroutine on it to ferret out the namespace-qualified elements. Afterwards, we print out the document:</p><blockquote><pre class="code">my $mathuri = 'http://www.w3.org/1998/Math/MathML';
my $root = $doc-&gt;getDocumentElement;
&amp;purge_nselems( $root );
print $doc-&gt;toString;</pre></blockquote><p>This routine takes an element node and, if it has a namespace prefix, removes it from its parent's content list. Otherwise, it goes on to process the descendants:</p><blockquote><pre class="code">sub purge_nselems {
    my $elem = shift;
    # Only element nodes can carry a namespace prefix.
    return unless( ref( $elem ) =~ /Element/ );
    if( $elem-&gt;prefix ) {
        # Namespace-qualified: detach the element and its whole subtree.
        my $parent = $elem-&gt;parentNode;
        $parent-&gt;removeChild( $elem );
    } elsif( $elem-&gt;hasChildNodes ) {
        # Copy the children into a list first, since some may be removed.
        my @children = $elem-&gt;getChildnodes;
        foreach my $child ( @children ) {
            &amp;purge_nselems( $child );
        }
    }
}</pre></blockquote><p>You might have noticed that this DOM implementation adds some Perlish conveniences over the recommended DOM interface. The call to <tt class="literal">getChildnodes</tt>, in a list context, returns a Perl list instead of a more cumbersome <tt class="literal">NodeList</tt> object. Called in a scalar context, it would return the number of child nodes for that node, so <tt class="literal">NodeList</tt>s aren't really used at all.</p><p>Simplifications like this are common in the Perl world, and no one really seems to mind. The emphasis is usually on ease of use over rigorous object-oriented protocol. 
Of course, one would hope that all DOM implementations in the Perl world adopt the same conventions, which is why many long discussions on the <em class="emphasis">perl-xml</em> mailing list try to decide the best way to adopt standards. A current debate discusses how to implement SAX2 (which supports namespaces) in the most logical, Perlish way.</p><p>Matt Sergeant has stocked the <tt class="literal">XML::LibXML</tt> package with other goodies. The <tt class="literal">Node</tt> class has a method called <tt class="literal">findnodes()</tt><a name="INDEX-670" /> <a name="INDEX-671" />, which takes an XPath expression as an argument, allowing retrieval of nodes in more flexible ways than permitted by the ordinary DOM interface. The parser has options that control how pedantically it runs, how entities are resolved, and whether whitespace is significant. One can also opt to use special handlers for unparsed entities. Overall, this module is<a name="INDEX-672" /> excellent for DOM<a name="INDEX-673" /> programming.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch08_01.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">7.3. XML::DOM</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">8. 
Beyond Trees: XPath, XSLT, and More</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>