<html><head><title>XML::RSS (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch09_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch09_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">9.2. XML::RSS </h2><p>By <a name="INDEX-722" /><em class="emphasis">helper modules</em>, we<a name="INDEX-723" /> mean more focused versions of the XML processors we've already pawed through in our Perl and XML toolbox. In a way, <tt class="literal">XML::Parser</tt> and its ilk are helper applications, since they save you from approaching each XML-chomping job with Perl's built-in file-reading functions and regular expressions by turning documents into immediately useful objects or event streams. 
Also, <tt class="literal">XML::Writer</tt> and friends replace plain old <tt class="literal">print</tt> statements with a more abstract and safer way to create XML documents.</p><p>However, the XML modules we cover now offer their services in a very specific direction. By using one of these modules in your program, you establish that you plan to use XML, but only a small, clearly defined subsection of it. By submitting to this restriction, you get to use (and create) software modules that handle all the toil of working with raw XML, presenting the main part of your code with methods and routines specific only to the application at hand.</p><p>For our example, we'll look at <tt class="literal">XML::RSS</tt> -- a little number by Jonathan Eisenzopf.</p><a name="perlxml-CHP-9-SECT-2.1" /><div class="sect2"><h3 class="sect2">9.2.1. Introduction to RSS </h3><p><a name="INDEX-724" /> <a name="INDEX-725" /> <a name="INDEX-726" />RSS (short for Rich Site Summary or Really Simple Syndication, depending upon whom you ask) is one of the first XML applications whose use became rapidly popular on a global scale, thanks to the Web. While RSS itself is little more than an agreed-upon way to summarize web page content, it gives the administrators of news sites, web logs, and any other frequently updated web site a standard and sweat-free way of telling the world what's new. A program that can parse RSS can do whatever it likes with such a document, perhaps telling its masters by mail or by web page what interesting things it has learned in its travels. 
A special type of RSS program is an aggregator, a program that collects RSS from various sources and then knits it together into new RSS documents combining the information, so that lazier RSS-parsing programs won't have to travel so far.</p><p>Current popular aggregators include<a name="INDEX-727" /> Netscape, by way of its customizable my.netscape.com site (which was, in fact, the birthplace of the earliest RSS versions) and Dave Winer's <a href="http://www.scripting.com">http://www.scripting.com</a> (whose aggregator has a public frontend at <a href="http://aggregator.userland.com/register">http://aggregator.userland.com/register</a>). These aggregators, in turn, share what they pick up as RSS, which makes them one-stop RSS shops for other interested entities. Web sites that collect and present links to new stuff around the Web, such as the O'Reilly Network's Meerkat (<a href="http://meerkat.oreillynet.com">http://meerkat.oreillynet.com</a>), hit these aggregators often to get information on RSS-enabled web sites, and then present it to their own users.</p></div><a name="perlxml-CHP-9-SECT-2.2" /><div class="sect2"><h3 class="sect2">9.2.2. Using XML::RSS </h3><p>The <tt class="literal">XML::RSS</tt><a name="INDEX-728" /> module is useful whether you're coming or going. It can parse RSS documents that you hand it, or it can help you write your own RSS documents. Naturally, you can combine these abilities to parse a document, modify it, and then write it out again; the module uses a simple and well-documented object model to represent documents in memory, just like the tree-based modules we've seen so far. You can think of this sort of XML helper module as a tricked-out version of a familiar general XML tool.</p><p>In the following examples, we'll work with a notional web log, a frequently updated and Web-readable personal column or journal. 
RSS lends itself to web logs, letting them quickly summarize their most recent entries within a single RSS document.</p><p>Here are a couple of web log entries (admittedly sampling from the shallow end of the concept's notional pool, but it works for short examples). First, here is how they might look in a web browser:</p><blockquote><pre class="code">Oct 18, 2002 19:07:06

Today I asked lab monkey 45-X how he felt about his recent chess
victory against Dr. Baker. He responded by biting my kneecap. (The
monkey did, I mean.) I think this could lead to a communications
breakthrough. As well as painful swelling, which is unfortunate.

Oct 27, 2002 22:56:11

On a tangential note, Dr. Xing's research of purple versus green monkey
trans-sociopolitical impact seems to be stalled, having gained no
ground for several weeks. Today she learned that her lab assistant
never mentioned on his job application that he was colorblind. Oh well.</pre></blockquote><p>Here they are again, as an RSS v1.0 document: </p><blockquote><pre class="code">&lt;?xml version="1.0" encoding="UTF-8"?&gt;

&lt;rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
&gt;

&lt;channel rdf:about="http://www.jmac.org/linklog/"&gt;
&lt;title&gt;Link's Log&lt;/title&gt;
&lt;link&gt;http://www.jmac.org/linklog/&lt;/link&gt;
&lt;description&gt;Dr. Lance Link's online research journal&lt;/description&gt;
&lt;dc:language&gt;en-us&lt;/dc:language&gt;
&lt;dc:rights&gt;Copyright 2002 by Dr. Lance Link&lt;/dc:rights&gt;
&lt;dc:date&gt;2002-10-27T23:59:15+05:00&lt;/dc:date&gt;
&lt;dc:publisher&gt;llink@jmac.org&lt;/dc:publisher&gt;
&lt;dc:creator&gt;llink@jmac.org&lt;/dc:creator&gt;
&lt;dc:subject&gt;llink&lt;/dc:subject&gt;
&lt;syn:updatePeriod&gt;daily&lt;/syn:updatePeriod&gt;
&lt;syn:updateFrequency&gt;1&lt;/syn:updateFrequency&gt;
&lt;syn:updateBase&gt;2002-03-03T00:00:00+05:00&lt;/syn:updateBase&gt;
&lt;items&gt;
 &lt;rdf:Seq&gt;
  &lt;rdf:li rdf:resource="http://www.jmac.org/linklog?2002-10-27#22:56:11" /&gt;
  &lt;rdf:li rdf:resource="http://www.jmac.org/linklog?2002-10-18#19:07:06" /&gt;
 &lt;/rdf:Seq&gt;
&lt;/items&gt;
&lt;/channel&gt;

&lt;item rdf:about="http://www.jmac.org/linklog?2002-10-27#22:56:11"&gt;
&lt;title&gt;2002-10-27 22:56:11&lt;/title&gt;
&lt;link&gt;http://www.jmac.org/linklog?2002-10-27#22:56:11&lt;/link&gt;
&lt;description&gt;
On a tangential note, Dr. Xing's research of purple versus green monkey
trans-sociopolitical impact seems to be stalled, having gained no
ground for several weeks. Today she learned that her lab assistant
never mentioned on his job application that he was colorblind. Oh well.
&lt;/description&gt;
&lt;/item&gt;

&lt;item rdf:about="http://www.jmac.org/linklog?2002-10-18#19:07:06"&gt;
&lt;title&gt;2002-10-18 19:07:06&lt;/title&gt;
&lt;link&gt;http://www.jmac.org/linklog?2002-10-18#19:07:06&lt;/link&gt;
&lt;description&gt;
Today I asked lab monkey 45-X how he felt about his recent chess
victory against Dr. Baker. He responded by biting my kneecap. (The
monkey did, I mean.) I think this could lead to a communications
breakthrough. As well as painful swelling, which is unfortunate.
&lt;/description&gt;
&lt;/item&gt;

&lt;/rdf:RDF&gt;</pre></blockquote><p>Note RSS 1.0's use of various metadata-enabling namespaces before it gets into the meat of laying out the actual content.<a href="#FOOTNOTE-30">[30]</a> The curious may wish to point their web browsers at the URIs with which they identify themselves, since they are good little namespaces who put their documentation where their mouth is. 
("dc" is the Dublin Core, a standard set of elements for describing a document's source. "syn" points to a syndication namespace -- itself a sub-project by the RSS people -- holding a handful of elements that state how often a source refreshes itself with new content.) Then the whole document is wrapped up in an RDF element.</p><blockquote class="footnote"> <a name="FOOTNOTE-30" /><p>[30] I am careful to specify the RSS version here because RSS Version 0.9 and 0.91 documents are much simpler in structure, eschewing namespaces and RDF-encapsulated metadata in favor of a simple list of <tt class="literal">&lt;item&gt;</tt> elements wrapped in an <tt class="literal">&lt;rss&gt;</tt> element. For this reason, many people prefer to use pre-1.0 RSS, and socially astute RSS software can read from and write to all these versions. <tt class="literal">XML::RSS</tt> can do this and, as a side effect, allows easy conversion between these different versions (given a single original document).</p> </blockquote><a name="perlxml-CHP-9-SECT-2.2.1" /><div class="sect3"><h3 class="sect3">9.2.2.1. Parsing </h3><p>Using <tt class="literal">XML::RSS</tt><a name="INDEX-729" /><a name="INDEX-730" /> to read an existing document ought to look familiar if you've read the preceding chapters, and is quite simple:</p><blockquote><pre class="code">use strict;
use warnings;
use XML::RSS;

# Accept file from user arguments
my @rss_docs = @ARGV;

# For now, we'll assume they're all files on disk...
foreach my $rss_doc (@rss_docs) {
    # First, create a new RSS object that will represent the parsed doc
    my $rss = XML::RSS->new;
    # Now parse that puppy
    $rss->parsefile($rss_doc);
    # And that's all. Do whatever else we may want here.
}</pre></blockquote></div><a name="perlxml-CHP-9-SECT-2.2.2" /><div class="sect3"><h3 class="sect3">9.2.2.2. 
Inheriting from XML::Parser </h3><p>If that <tt class="literal">parsefile</tt> method looked familiar, it had good reason: it's the same one used by grandpappy <tt class="literal">XML::Parser</tt><a name="INDEX-731" />, both in word and deed.</p><p><tt class="literal">XML::RSS</tt> takes direct advantage of <tt class="literal">XML::Parser</tt>'s inheritability right off the bat, placing that module into its <tt class="literal">@ISA</tt> array before getting down to business with all that map definition.</p><p>It shouldn't surprise those familiar with object-oriented Perl programming that, while <tt class="literal">XML::RSS</tt> chooses to define its own <tt class="literal">new</tt> method, that method does little more than invoke <tt class="literal">SUPER::new</tt>. In doing so, it lets <tt class="literal">XML::Parser</tt> initialize itself as it sees fit. Let's look at some code from that module itself -- specifically its constructor, <tt class="literal">new</tt>, which we invoked in our example:</p><blockquote><pre class="code">sub new {
    my $class = shift;
    my $self = $class->SUPER::new(Namespaces    => 1,
                                  NoExpand      => 1,
                                  ParseParamEnt => 0,
                                  Handlers      => { Char    => \&handle_char,
                                                     XMLDecl => \&handle_dec,
                                                     Start   => \&handle_start})