<html><head><title>XML::DOM (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch07_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div>
<h2 class="sect1">7.3. XML::DOM</h2>
<p>Enno <a name="INDEX-652" />Derksen's <tt class="literal">XML::DOM</tt><a name="INDEX-653" /> <a name="INDEX-654" />module is a good place to start exploring DOM in Perl. It's a complete implementation of Level 1 DOM with a few extra features thrown in for convenience. <tt class="literal">XML::DOM::Parser</tt> extends <tt class="literal">XML::Parser</tt> to build a document tree, stores that tree in an <tt class="literal">XML::DOM::Document</tt> object, and returns a reference to the object. This reference gives you complete access to the tree. The rest, we happily report, works pretty much as you'd expect.</p>
<p>Here's a program that uses DOM to process an<a name="INDEX-655" /> XHTML file. 
It looks inside <tt class="literal">&lt;p&gt;</tt> elements for the word "monkeys," replacing every instance with a link to <tt class="literal">monkeystuff.com</tt>. Sure, you could do it with a regular expression substitution, but this example is valuable because it shows how to search for and create new nodes, and read and change values, all in the unique DOM style.</p>
<p>The first part of the program creates a parser object and gives it a file to parse with the call to<a name="INDEX-656" /> <tt class="literal">parsefile( )</tt>:</p>
<blockquote><pre class="code">use XML::DOM;

process_file( shift @ARGV );

sub process_file {
    my $infile = shift;
    my $dom_parser = XML::DOM::Parser-&gt;new;           # create a parser object
    my $doc = $dom_parser-&gt;parsefile( $infile );      # make it parse a file
    add_links( $doc );                                # perform our changes
    print $doc-&gt;toString;                             # output the tree again
    $doc-&gt;dispose;                                    # clean up memory
}</pre></blockquote>
<p>The <tt class="literal">parsefile( )</tt> method returns a reference to an <tt class="literal">XML::DOM::Document</tt> object, which is our gateway to the nodes inside. We pass this reference along to a routine called <tt class="literal">add_links( )</tt><a name="INDEX-657" />, which will do all the processing we require. Finally, we output the tree with a call to <tt class="literal">toString( )</tt><a name="INDEX-658" />, and then dispose of the object. 
This last step performs necessary cleanup, because circular references between nodes could otherwise result in a memory leak.</p>
<p>The next part burrows into the tree to start processing paragraphs:</p>
<blockquote><pre class="code">sub add_links {
    my $doc = shift;

    # find all the &lt;p&gt; elements
    my $paras = $doc-&gt;getElementsByTagName( "p" );
    for( my $i = 0; $i &lt; $paras-&gt;getLength; $i++ ) {
        my $para = $paras-&gt;item( $i );

        # for each child of a &lt;p&gt;, if it is a text node, process it
        my @children = $para-&gt;getChildNodes;
        foreach my $node ( @children ) {
            # node types are numeric constants, so compare with ==
            fix_text( $node ) if( $node-&gt;getNodeType == TEXT_NODE );
        }
    }
}</pre></blockquote>
<p>The <tt class="literal">add_links( )</tt><a name="INDEX-659" /> routine starts with a call to the document object's <tt class="literal">getElementsByTagName( )</tt><a name="INDEX-660" /> method. It returns an <tt class="literal">XML::DOM::NodeList</tt> object containing all matching <tt class="literal">&lt;p&gt;</tt>s in the document (multilevel searching is so convenient), from which we can select nodes by index using <tt class="literal">item( )</tt><a name="INDEX-661" />.</p>
<p>The bit we're interested in will be hiding inside a text node inside the <tt class="literal">&lt;p&gt;</tt> element, so we have to iterate over the children to find text nodes and process them. The call to <tt class="literal">getChildNodes( )</tt><a name="INDEX-662" /> gives us the child nodes, either in a <tt class="literal">generic</tt> Perl list (when called in a list context) or in another <tt class="literal">XML::DOM::NodeList</tt> object; for variety's sake, we've selected the first option. For each node, we test its type with a call to <tt class="literal">getNodeType</tt> and compare the result to <tt class="literal">XML::DOM</tt>'s constant for text nodes, provided by<a name="INDEX-663" /> <tt class="literal">TEXT_NODE( )</tt>. 
Nodes that pass the test are sent off to a routine for some node massaging.</p>
<p>The last part of the program targets text nodes and splits them around the word "monkeys" to create a link:</p>
<blockquote><pre class="code">sub fix_text {
    my $node = shift;
    my $text = $node-&gt;getNodeValue;
    if( $text =~ /(monkeys)/i ) {

        # split the text node into 2 text nodes around the monkey word
        my( $pre, $orig, $post ) = ( $`, $1, $' );
        my $tnode = $node-&gt;getOwnerDocument-&gt;createTextNode( $pre );
        $node-&gt;getParentNode-&gt;insertBefore( $tnode, $node );
        $node-&gt;setNodeValue( $post );

        # insert an &lt;a&gt; element between the two nodes
        my $link = $node-&gt;getOwnerDocument-&gt;createElement( 'a' );
        $link-&gt;setAttribute( 'href', 'http://www.monkeystuff.com/' );
        $tnode = $node-&gt;getOwnerDocument-&gt;createTextNode( $orig );
        $link-&gt;appendChild( $tnode );
        $node-&gt;getParentNode-&gt;insertBefore( $link, $node );

        # recurse on the rest of the text node
        # in case the word appears again
        fix_text( $node );
    }
}</pre></blockquote>
<p>First, the routine grabs the node's text value by calling its <tt class="literal">getNodeValue( )</tt> method. DOM specifies redundant accessor methods used to get and set values or names, either through the generic <tt class="literal">Node</tt><a name="INDEX-664" /> class or through the more specific class's methods. Instead of <tt class="literal">getNodeValue( )</tt><a name="INDEX-665" />, we could have used <tt class="literal">getData( )</tt>, which is specific to the text node class. For some nodes, such as elements, there is no defined value, so the generic <tt class="literal">getNodeValue( )</tt> method would return an undefined value.</p>
<p>Next, we slice the node in two. We do this by creating a new text node and inserting it before the existing one. 
After we set the text values of each node, the first will contain everything before the word "monkeys," and the other will have everything after the word. Note the use of the <tt class="literal">XML::DOM::Document</tt> object as a factory to create the new text node. This DOM feature takes care of many administrative tasks behind the scenes, making the genesis of new nodes painless.</p>
<p>After that step, we create an <tt class="literal">&lt;a&gt;</tt> element and insert it between the text nodes. Like all good links, it needs a place to put the URL, so we set it up with an <tt class="literal">href</tt> attribute. To have something to click on, the link needs text, so we create a text node with the word "monkeys" and append it to the element's child list. Then the routine will recurse on the text node after the link in case there are more instances of "monkeys" to process.</p>
<p>Does it work? Running the program on this file:</p>
<blockquote><pre class="code">&lt;html&gt;
&lt;head&gt;&lt;title&gt;Why I like Monkeys&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;&lt;h1&gt;Why I like Monkeys&lt;/h1&gt;
&lt;h2&gt;Monkeys are Cute&lt;/h2&gt;
&lt;p&gt;Monkeys are &lt;b&gt;cute&lt;/b&gt;. They are like small, hyper versions of
ourselves. They can make funny facial expressions and stick out their
tongues.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></blockquote>
<p>produces this<a name="INDEX-666" /> <a name="INDEX-667" /> output:</p>
<blockquote><pre class="code">&lt;html&gt;
&lt;head&gt;&lt;title&gt;Why I like Monkeys&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;&lt;h1&gt;Why I like Monkeys&lt;/h1&gt;
&lt;h2&gt;Monkeys are Cute&lt;/h2&gt;
&lt;p&gt;&lt;a href="http://www.monkeystuff.com/"&gt;Monkeys&lt;/a&gt; are &lt;b&gt;cute&lt;/b&gt;. They are like small, hyper versions of
ourselves.
They can make funny facial expressions and stick out their
tongues.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></blockquote>
<hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch07_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">7.2. DOM Class Interface Reference</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">7.4. XML::LibXML</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>