📄 ch05_07.htm

📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
<html><head><title>XML::SAX: The Second Generation (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch06_01.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div>
<h2 class="sect1">5.7. XML::SAX: The Second Generation</h2>
<p>The<a name="INDEX-414" /> proliferation of <a name="INDEX-415" />SAX parsers presents two problems: how to keep them all synchronized with the standard API and how to keep them organized on your system. <tt class="literal">XML::SAX</tt>, a marvelous team effort by Matt <a name="INDEX-416" />Sergeant, Kip <a name="INDEX-417" />Hampton, and Robin <a name="INDEX-418" />Berjon, solves both problems at once. As a bonus, it also includes support for SAX Level 2 that previous modules lacked.</p>
<p>"What," you ask, "do you mean about keeping all the modules synchronized with the API?" 
All along, we've touted the wonders of using a standard like SAX to ensure that modules are really interchangeable. But here's the rub: in Perl, there's more than one way to implement SAX. SAX was originally designed for Java, which has a wonderful interface type of class that nails down things like what type of argument to pass to which method. There's nothing like that in Perl.</p>
<p>This wasn't as much of a problem with the older SAX modules we've been talking about so far. They all support SAX Level 1, which is fairly simple. However, a new crop of modules that support SAX2 is breaking the surface. SAX2 is more complex because it introduces namespaces to the mix. An element event handler should receive both the namespace prefix and the local name of the element. How should this information be passed in parameters? Do you keep them together in the same string like <tt class="literal">foo:bar</tt>? Or do you separate them into two parameters?</p>
<p>This debate created a lot of heat on the <em class="emphasis">perl-xml</em> mailing list until a few members decided to hammer out a specification for "Perlish" SAX (we'll see in a moment how to use this new API for SAX2). To encourage others to adhere to this convention, <tt class="literal">XML::SAX</tt> includes a class called <tt class="literal">XML::SAX::ParserFactory</tt><a name="INDEX-419" />. A <em class="emphasis">factory</em><a name="INDEX-420" /> is an object whose sole purpose is to generate objects of a specific type -- in this case, parsers. <tt class="literal">XML::SAX::ParserFactory</tt> is a useful way to handle housekeeping chores related to the parsers, such as registering their options and initialization requirements. Tell the factory what kind of parser you want and it doles out a copy to you.</p>
<p><tt class="literal">XML::SAX</tt> represents a shift in the way XML and Perl work together. It builds on the work of the past, including all the best features of previous modules, while avoiding many of the mistakes. 
To ensure that modules are truly compatible, the kit provides a base class for parsers, abstracting out most of the mundane work that all parsers have to do and leaving the developer only the work that is unique to a particular parser. It also creates an abstract interface for users of parsers, allowing them to keep the plethora of modules organized with a registry that is indexed by properties to make it easy to find the right one with a simple query. It's a bold step and carries a lot of heft, so be prepared for a lot of information and detail in this section. We think it will be worth your while.</p>
<a name="perlxml-CHP-5-SECT-7.1" /><div class="sect2"><h3 class="sect2">5.7.1. XML::SAX::ParserFactory</h3>
<p>We start with the parser selection interface, <tt class="literal">XML::SAX::ParserFactory</tt>. For those of you who have used DBI, this class will feel very similar. It's a front end to all the SAX parsers on your system. You simply request a new parser from the factory and it will dig one up for you. Let's say you want to use any SAX parser with your handler package <tt class="literal">XML::SAX::MyHandler</tt>.</p>
<p>Here's how to fetch the parser and use it to read a file:</p>
<blockquote><pre class="code">use XML::SAX::ParserFactory;
use XML::SAX::MyHandler;

# Create the handler, then let the factory pick a parser for it
my $handler = XML::SAX::MyHandler-&gt;new;
my $parser = XML::SAX::ParserFactory-&gt;parser( Handler =&gt; $handler );
$parser-&gt;parse_uri( "foo.xml" );</pre></blockquote>
<p>The parser you get depends on the order in which you've installed the modules. By default, the factory returns the most recently installed parser that supports all the features specified with <tt class="literal">RequiredFeatures</tt>, if any. But maybe you don't want that one. No problem; <tt class="literal">XML::SAX</tt> maintains a registry of SAX parsers that you can choose from. Every time you install a new SAX parser, it registers itself so you can call upon it with <tt class="literal">ParserFactory</tt>. 
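</p>
<p>As a rough, self-contained sketch of the same idea, the script below asks the factory for whatever parser it prefers and reports which class it handed back. The inline handler package <tt class="literal">MyHandler</tt> and the sample XML string are our own illustrative assumptions, not part of the example above; <tt class="literal">parse_string( )</tt> is used in place of <tt class="literal">parse_uri( )</tt> so that no file is needed:</p>
<blockquote><pre class="code">#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical inline handler: it only counts start_element events
package MyHandler;
use base 'XML::SAX::Base';

sub start_document {
    my ( $self, $doc ) = @_;
    $self-&gt;{ count } = 0;
}

sub start_element {
    my ( $self, $element ) = @_;
    $self-&gt;{ count }++;
}

package main;
use XML::SAX::ParserFactory;

my $handler = MyHandler-&gt;new;
my $parser  = XML::SAX::ParserFactory-&gt;parser( Handler =&gt; $handler );

# Which parser did the factory pick on this system?
print "Using parser: ", ref( $parser ), "\n";

$parser-&gt;parse_string( '&lt;doc&gt;&lt;item/&gt;&lt;item/&gt;&lt;/doc&gt;' );
print "Elements seen: $handler-&gt;{ count }\n";</pre></blockquote>
<p>The class name printed will vary from one system to another, since it depends on which parsers are registered. 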
If you know you have the <tt class="literal">XML::SAX::BobsParser</tt> parser installed, you can require an instance of it by setting the variable <tt class="literal">$XML::SAX::ParserPackage</tt> as follows:</p>
<blockquote><pre class="code">use XML::SAX::ParserFactory;
use XML::SAX::MyHandler;

my $handler = XML::SAX::MyHandler-&gt;new;

# Ask for a specific parser package and a minimum version
$XML::SAX::ParserPackage = "XML::SAX::BobsParser( 1.24 )";
my $parser = XML::SAX::ParserFactory-&gt;parser( Handler =&gt; $handler );</pre></blockquote>
<p>Setting <tt class="literal">$XML::SAX::ParserPackage</tt> to <tt class="literal">XML::SAX::BobsParser( 1.24 )</tt> returns an instance of the package. Internally, <tt class="literal">ParserFactory</tt> calls <tt class="literal">require( )</tt> on that parser and then calls its <tt class="literal">new()</tt><a name="INDEX-421" /> class method. The <tt class="literal">1.24</tt> in the variable setting specifies a minimum version number for the parser. If that version isn't on your system, an exception will be thrown.</p>
<p>To see a list of all the parsers available to <tt class="literal">XML::SAX</tt>, call the <tt class="literal">parsers( )</tt> method:</p>
<blockquote><pre class="code">use XML::SAX;

my @parsers = @{XML::SAX-&gt;parsers( )};

# Print each parser's name followed by its feature list
foreach my $p ( @parsers ) {
    print "\n", $p-&gt;{ Name }, "\n";
    foreach my $f ( sort keys %{$p-&gt;{ Features }} ) {
        print "$f =&gt; ", $p-&gt;{ Features }-&gt;{ $f }, "\n";
    }
}</pre></blockquote>
<p>It returns a reference to a list of hashes, with each hash containing information about a parser, including the name and a hash of features. 
When we ran the program above we were told that <tt class="literal">XML::SAX</tt> had two registered parsers, each supporting namespaces:</p>
<blockquote><pre class="code">XML::LibXML::SAX::Parser
http://xml.org/sax/features/namespaces =&gt; 1

XML::SAX::PurePerl
http://xml.org/sax/features/namespaces =&gt; 1</pre></blockquote>
<p>At the time this book was written, these parsers were the only two parsers included with <tt class="literal">XML::SAX</tt>. <tt class="literal">XML::LibXML::SAX::Parser</tt> is a SAX API for the <em class="emphasis">libxml2</em><a name="INDEX-422" /> library we use in <a href="ch06_01.htm">Chapter 6, "Tree Processing"</a>. To use it, you'll need to have <em class="emphasis">libxml2</em>, a compiled, dynamically linked library written in C, installed on your system. It's fast, but unless you can find a binary or compile it yourself, it isn't very portable. <tt class="literal">XML::SAX::PurePerl</tt> is, as the name suggests, a parser written completely in Perl. As such, it's completely portable because you can run it wherever Perl is installed. This starter set of parsers already gives you some different options.</p>
<p>The feature list associated with each parser is important because it allows a user to select a parser based on a set of criteria. For example, suppose you wanted a parser that did validation and supported namespaces. 
You could request one by calling the factory's <tt class="literal">require_feature( )</tt> method:</p>
<blockquote><pre class="code">my $factory = XML::SAX::ParserFactory-&gt;new;
$factory-&gt;require_feature( 'http://xml.org/sax/features/validation' );
$factory-&gt;require_feature( 'http://xml.org/sax/features/namespaces' );
my $parser = $factory-&gt;parser( Handler =&gt; $handler );</pre></blockquote>
<p>Alternatively, you can pass such information to the factory in its constructor method:</p>
<blockquote><pre class="code">my $factory = XML::SAX::ParserFactory-&gt;new(
    RequiredFeatures =&gt; {
        'http://xml.org/sax/features/validation' =&gt; 1,
        'http://xml.org/sax/features/namespaces' =&gt; 1,
    }
);
my $parser = $factory-&gt;parser( Handler =&gt; $handler );</pre></blockquote>
<p>If multiple parsers pass the test, the most recently installed one is used. However, if the factory can't find a parser to fit your requirements, it simply throws an exception.</p>
<p>To add more SAX modules to the registry, you only need to download and install them. Their installer packages should know about <tt class="literal">XML::SAX</tt> and automatically register the modules with it. To add a module of your own, you can use <tt class="literal">XML::SAX</tt>'s <tt class="literal">add_parser()</tt><a name="INDEX-423" /> with a list of module names. Make sure it follows the conventions of SAX modules by subclassing <tt class="literal">XML::SAX::Base</tt>. Later, we'll show you how to write a parser, install it, and add<a name="INDEX-424" /> it to the registry.</p></div>
<a name="perlxml-CHP-5-SECT-7.2" /><div class="sect2"><h3 class="sect2">5.7.2. SAX2 Handler Interface</h3>
<p>Once<a name="INDEX-425" /> you've selected<a name="INDEX-426" /> a parser, the next step is to code up a handler package to catch the parser's event stream, much like the SAX modules we've seen so far. 
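</p>
<p>A minimal sketch of such a handler package might look like the following. We reuse the package name <tt class="literal">XML::SAX::MyHandler</tt> from the earlier examples, but the body is our own assumption rather than a listing from this chapter. It assumes the Perl SAX2 convention that each element event arrives as a hash reference with <tt class="literal">Name</tt>, <tt class="literal">Prefix</tt>, <tt class="literal">LocalName</tt>, and <tt class="literal">NamespaceURI</tt> keys, and it needs Perl 5.10 or later for the <tt class="literal">//</tt> operator:</p>
<blockquote><pre class="code">package XML::SAX::MyHandler;
use 5.010;
use strict;
use warnings;
use base 'XML::SAX::Base';

# Called for each opening tag; $element is a hash reference
sub start_element {
    my ( $self, $element ) = @_;
    printf "start: %s (prefix '%s', local name '%s', namespace '%s')\n",
        $element-&gt;{ Name },
        $element-&gt;{ Prefix }       // '',
        $element-&gt;{ LocalName }    // '',
        $element-&gt;{ NamespaceURI } // '';
}

# Called for each closing tag
sub end_element {
    my ( $self, $element ) = @_;
    print "end:   $element-&gt;{ Name }\n";
}

1;</pre></blockquote>
<p>Because the package inherits from <tt class="literal">XML::SAX::Base</tt>, any event it doesn't override is simply handled by the base class. 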
<tt class="literal">XML::SAX</tt> specifies events and their properties in exquisite detail and in large numbers. This specification gives your handler considerable control while ensuring absolute conformance to the API.</p>
<p>The types of supported event handlers fall into several groups. The ones we are most familiar with include the <em class="emphasis">content handlers</em><a name="INDEX-427" />, including those for elements and general document information, <em class="emphasis">entity resolvers</em>, and <em class="emphasis">lexical handlers</em><a name="INDEX-428" /> that handle CDATA sections and comments. <a name="INDEX-429" /><em class="emphasis">DTD handlers</em> and<a name="INDEX-430" /> <em class="emphasis">declaration handlers</em> take care of everything outside of the document element, including element and entity declarations. <tt class="literal">XML::SAX</tt> adds a new group, the <em class="emphasis">error handlers</em><a name="INDEX-431" />, to catch and process any exceptions that may occur during parsing.</p>
<p>One important new facet to this class of parsers is that they recognize namespaces. This recognition is one of the innovations of SAX2. Previously, SAX parsers treated a qualified name as a single unit: a combined namespace prefix and local name. Now you can tease out the namespaces, see where their scope begins and ends, and do more than you could before.</p>
<a name="perlxml-CHP-5-SECT-7.2.1" /><div class="sect3"><h3 class="sect3">5.7.2.1. Content event handlers</h3>
<p>Focusing on the content of the document, these handlers are the most likely ones to be implemented in a SAX handling program. Note the useful addition of a document locator reference, which gives the handler a special window into the machinations of the parser. 
The support for namespaces is also new.</p>
<dl><a name="INDEX-432" /><dt><b><tt class="literal">set_document_locator(</tt> <em class="replaceable">locator</em> <tt class="literal">)</tt></b></dt>
<dd><p>Called at the beginning of parsing, this method lets the parser tell the handler where the events are coming from. The <em class="replaceable">locator</em> parameter is a reference to a hash containing these properties:</p>
<dl><dt><b><tt class="literal">PublicID</tt></b></dt><dd><p>The public identifier of the current entity being parsed. </p></dd>
<dt><b><tt class="literal">SystemID</tt></b></dt><dd><p>The system identifier of the current entity being parsed. </p></dd>
<dt><b><tt class="literal">LineNumber</tt></b></dt><dd><p>The line number of the current entity being parsed. </p></dd>
<dt><b><tt class="literal">ColumnNumber</tt></b></dt><dd><p>The last position in the line currently being parsed. </p></dd></dl>
<p>The hash is continuously updated with the latest information. If your handler doesn't like the information it's being fed and decides to abort, it can check the locator to construct a meaningful message to the user about where in the source document an error was found. A SAX parser isn't required to give a locator, though it is
