<?label 14.4. Writing an XML Parser?><html><head><title>Writing an XML Parser (CGI Programming with Perl)</title><link href="../style/style1.css" type="text/css" rel="stylesheet" /><meta name="DC.Creator" content="Scott Guelich, Gunther Birznieks and Shishir Gundavaram" /><meta scheme="MIME" content="text/xml" name="DC.Format" /><meta content="en-US" name="DC.Language" /><meta content="O'Reilly &amp; Associates, Inc." name="DC.Publisher" /><meta scheme="ISBN" name="DC.Source" content="1565924193L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="CGI Programming with Perl" /><meta content="Text.Monograph" name="DC.Type" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" alt="Book Home" usemap="#banner-map" border="0" /><map name="banner-map"><area alt="CGI Programming with Perl" href="index.htm" coords="0,0,466,65" shape="rect" /><area alt="Search this book" href="jobjects/fsearch.htm" coords="467,0,514,18" shape="rect" /></map><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch14_03.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm">CGI Programming with Perl</a></td><td width="172" valign="top" align="right"><a href="ch14_05.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><hr align="left" width="515" /><h2 class="sect1">14.4. Writing an XML Parser</h2><p>The <a name="INDEX-2820" /><a name="INDEX-2821" /><a name="INDEX-2822" />XML parser example builds on the XML::Parser library available on CPAN. XML::Parser is an interface to a library written in <a name="INDEX-2823" /><a name="INDEX-2824" />C called <em class="emphasis">expat</em> by James Clark. Larry Wall wrote the first XML::Parser prototype for Perl. Since then, Clark Cooper has continued to develop and maintain XML::Parser.
In this section, we will write a simple middleware application using XML.</p><p>The latest versions of Netscape have a feature called<a name="INDEX-2825" /> <a name="INDEX-2,826" />"What's Related". When the user clicks on the What's Related button, the Netscape browser takes the URL that the user is currently viewing and looks up related URLs in a <a name="INDEX-2827" /> <a name="INDEX-2,828" /><a name="INDEX-2829" />search engine. Most users don't know that the Netscape browser is actually doing this through an XML-based search engine. Dave Winer originally wrote an article with accompanying Frontier code to access the What's Related search engine at <a href="http://nirvana.userland.com/whatsRelated/">http://nirvana.userland.com/whatsRelated/</a>.</p><p>Netscape maintains a server that takes URLs and returns the related URL information in an XML format. Netscape wisely chose XML because they did not intend for users to interact directly with this server using HTML forms. Instead, they expected users to choose "What's Related" as a menu item and then have the Netscape browser do the XML parsing.</p><p>In other words, the Netscape "What's Related" web server is actually serving as a middleware layer between the search engine database and the Netscape browser itself. We will write a CGI frontend to the Netscape application that serves up this XML to demonstrate the XML parser. We will also go one step further and automatically reissue the "What's Related" query for each URL returned.</p><p>Before we jump into the Perl code, we need to take a look at the <a name="INDEX-2830" />XML that is typically returned from the Netscape server. In this example, we did a search on What's Related to <a href="http://www.eff.org/">http://www.eff.org/</a>, the web site that houses the <a name="INDEX-2831" />Electronic Frontier Foundation.
Here is the returned XML:</p><blockquote><pre class="code">&lt;RDF:RDF&gt;
&lt;RelatedLinks&gt;
&lt;aboutPage href="http://www.eff.org:80/"/&gt;
&lt;child href="http://www.privacy.org/ipc" name="Internet Privacy Coalition"/&gt;
&lt;child href="http://epic.org/" name="Electronic Privacy Information Center"/&gt;
&lt;child href="http://www.ciec.org/" name="Citizens Internet Empowerment Coalition"/&gt;
&lt;child href="http://www.cdt.org/" name="The Center for Democracy and Technology"/&gt;
&lt;child href="http://www.freedomforum.org/" name="FREE! The Freedom Forum Online. News about free press"/&gt;
&lt;child href="http://www.vtw.org/speech" name="VTW Focus on Internet Censorship legislation"/&gt;
&lt;child href="http://www.privacyrights.org/" name="Privacy Rights Clearinghouse"/&gt;
&lt;child href="http://www.privacy.org/pi" name="Privacy International Home Page"/&gt;
&lt;child href="http://www.epic.org/" name="Electronic Privacy Information Center"/&gt;
&lt;child href="http://www.anonymizer.com/" name="Anonymizer, Inc."/&gt;
&lt;/RelatedLinks&gt;
&lt;/RDF:RDF&gt;</pre></blockquote><p>This example is a little different from our plain XML example earlier. First, there is no DTD. Also, notice that the document is surrounded with an unusual tag, RDF:<a name="INDEX-2832" /><a name="INDEX-2,833" /><a name="INDEX-2834" />RDF. This document is actually in an XML-based format called Resource Description Framework, or RDF. RDF describes resource data, such as the data from <a name="INDEX-2835" />search engines, in a way that is standard across data domains.</p><p>This XML is relatively straightforward. The &lt;aboutPage&gt; tag contains a reference to the original URL we were searching. Each &lt;child&gt; tag contains a reference to one related URL and its title.
The &lt;RelatedLinks&gt; tag, nested directly inside &lt;RDF:RDF&gt;, encloses all of these elements <a name="INDEX-2836" /><a name="INDEX-2837" /><a name="INDEX-2838" />as the <a name="INDEX-2839" />root data structure.</p><hr align="left" width="515" /><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch14_03.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0" /></a></td><td width="172" valign="top" align="right"><a href="ch14_05.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr><tr><td width="172" valign="top" align="left">14.3. Document Type Definition</td><td width="171" valign="top" align="center"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0" /></a></td><td width="172" valign="top" align="right">14.5. CGI Gateway to XML Middleware</td></tr></table></div><hr align="left" width="515" /><img src="../gifs/navbar.gif" alt="Library Navigation Links" usemap="#library-map" border="0" /><p><font size="-1"><a href="copyrght.htm">Copyright © 2001</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area href="../index.htm" coords="1,1,83,102" shape="rect" /><area href="../lnut/index.htm" coords="81,0,152,95" shape="rect" /><area href="../run/index.htm" coords="172,2,252,105" shape="rect" /><area href="../apache/index.htm" coords="238,2,334,95" shape="rect" /><area href="../sql/index.htm" coords="336,0,412,104" shape="rect" /><area href="../dbi/index.htm" coords="415,0,507,101" shape="rect" /><area href="../cgi/index.htm" coords="511,0,601,99" shape="rect" /></map></body></html>