
📄 ch05_07.htm

📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
</dd></dl></dd></dl></div><a name="perlxml-CHP-5-SECT-7.2.2" /><div class="sect3"><h3 class="sect3">5.7.2.2. Entity resolver</h3>
<p>By default, XML parsers resolve external entity references without your program ever knowing they were there. You may occasionally want to override that behavior. For example, you may have a special way of resolving public identifiers, or the entities may be entries in a database. Whatever the reason, if you implement this handler, the parser will call it before attempting to resolve the entity on its own.</p>
<p>The argument to <tt class="literal">resolve_entity()</tt><a name="INDEX-444" /><a name="INDEX-445" /> is a hash with two properties: <tt class="literal">PublicId</tt>, a public identifier for the entity, and <tt class="literal">SystemId</tt>, the system-specific location of the entity, such as a filesystem path or a URI. If the public identifier is <tt class="literal">undef</tt>, then none was given, but a system identifier will always<a name="INDEX-446" /> be present.</p></div>
<a name="perlxml-CHP-5-SECT-7.2.3" /><div class="sect3"><h3 class="sect3">5.7.2.3. Lexical event handlers</h3>
<p><a name="INDEX-447" /><a name="INDEX-448" /><a name="INDEX-449" />Implementation of this group of events is optional. Most applications don't need to see these events, so not all parsers report them. However, a few very complete ones will.
If you want to be able to duplicate the original source XML down to the very comments and CDATA sections, then you need a parser that supports these event handlers.</p>
<p>They include:</p>
<ul><li><p><tt class="literal">start_dtd( )</tt><a name="INDEX-450" /> and <tt class="literal">end_dtd( )</tt><a name="INDEX-451" />, for marking the boundaries of the document type definition</p></li>
<li><p><tt class="literal">start_entity( )</tt><a name="INDEX-452" /> and <tt class="literal">end_entity( )</tt><a name="INDEX-453" />, for delineating the region of a resolved entity reference</p></li>
<li><p><tt class="literal">start_cdata( )</tt><a name="INDEX-454" /> and <a name="INDEX-455" /><tt class="literal">end_cdata( )</tt>, to describe the range of a CDATA section</p></li>
<li><p><tt class="literal">comment( )</tt><a name="INDEX-456" />, announcing a lexical comment that would otherwise be ignored by parsers</p></li></ul></div>
<a name="perlxml-CHP-5-SECT-7.2.4" /><div class="sect3"><h3 class="sect3">5.7.2.4. Error event handlers and catching exceptions</h3>
<p><tt class="literal">XML::SAX</tt> lets you customize your error handling with this group of handlers. Each handler takes one argument, called an exception, that describes the error in detail. The particular handler called represents the severity of the error, as defined by the W3C recommendation for parser behavior. There are three types:</p>
<dl><a name="INDEX-457" /><dt><b><tt class="literal">warning( )</tt></b></dt><dd><p>This is the least serious of the exception handlers. It represents any error that is not bad enough to halt parsing. For example, an ID reference without a matching ID would elicit a warning, but allow the parser to keep grinding on. If you don't implement this handler, the parser will ignore the exception and keep going.</p></dd>
<a name="INDEX-458" /><dt><b><tt class="literal">error( )</tt></b></dt><dd><p>This kind of error is considered serious, but recoverable. A validity error falls in this category.
The parser should still trundle on, generating events, unless your application decides to call it quits. In the absence of a handler, the parser usually continues parsing.</p></dd>
<a name="INDEX-459" /><dt><b><tt class="literal">fatal_error( )</tt></b></dt><dd><p>A fatal error might cause the parser to abort parsing. The parser is under no obligation to continue, but might do so just to collect more error messages. The exception could be a syntax error that makes the document non-well-formed XML, or it might be an entity that can't be resolved. In any case, this is the highest level of error reporting provided in <tt class="literal">XML::SAX</tt>.</p></dd></dl>
<p>According to the XML specification, conformant parsers must stop normal processing when they encounter a well-formedness error; validity errors only have to be reported. In Perl SAX, halting results in a call to <tt class="literal">die()</tt>. That's not the end of the story, however. Even though the parse session has died, your program doesn't have to die with it: wrap the call in the <tt class="literal">eval{}</tt> construct to catch the exception, like this:</p>
<blockquote><pre class="code">eval { $parser-&gt;parse_uri( $uri ) };
if( $@ ) {
  # yikes! handle error here...
}</pre></blockquote>
<p>When the parser throws a SAX exception, the <tt class="literal">$@</tt> variable holds a blessed hash of properties that piece together the story about why parsing failed.</p>
<p>These properties include:</p>
<dl><dt><b><tt class="literal">Message</tt></b></dt><dd><p>A text description of what happened</p></dd>
<dt><b><tt class="literal">ColumnNumber</tt></b></dt><dd><p>The number of characters into the line where the error occurred, if this error is a parse error</p></dd>
<dt><b><tt class="literal">LineNumber</tt></b></dt><dd><p>The line on which the error happened, if the exception was thrown while parsing</p></dd>
<dt><b><tt class="literal">PublicId</tt></b></dt><dd><p>A public identifier for the entity in which the error occurred, if this error is a parse error</p></dd>
<dt><b><tt class="literal">SystemId</tt></b></dt><dd><p>A system identifier pointing to the offending entity, if a parse error occurred</p></dd></dl>
<p>Not all thrown exceptions indicate that a failure to parse occurred. Sometimes the parser throws an exception<a name="INDEX-460" /><a name="INDEX-461" /> because of a bad feature setting.</p></div></div>
<a name="perlxml-CHP-5-SECT-7.3" /><div class="sect2"><h3 class="sect2">5.7.3. SAX2 Parser Interface</h3>
<p><a name="INDEX-462" /><a name="INDEX-463" />After you've written a handler package, you need to create an instance of the parser, set its features, and run it on the XML source. This section discusses the standard interface for <tt class="literal">XML::SAX</tt> parsers.</p>
<p>The <a name="INDEX-464" /><tt class="literal">parse( )</tt> method, which gets the parsing process rolling, takes a hash of options as an argument. Here you can assign handlers, set features, and define the data source to be parsed.
For example, the following line sets both the handler package and the source document to parse:</p>
<blockquote><pre class="code">$parser-&gt;parse( Handler =&gt; $handler,
                Source  =&gt; { SystemId =&gt; "data.xml" } );</pre></blockquote>
<p>The <tt class="literal">Handler</tt> property sets a generic set of handlers that will be used by default. However, each class of handlers has its own assignment slot that will be checked before <tt class="literal">Handler</tt>. These settings include: <tt class="literal">ContentHandler</tt>, <tt class="literal">DTDHandler</tt>, <tt class="literal">EntityResolver</tt>, and <tt class="literal">ErrorHandler</tt>. All of these settings are optional. If you don't assign a handler, the parser will silently ignore events and handle errors in its own way.</p>
<p>The <tt class="literal">Source</tt><a name="INDEX-465" /> parameter is a hash used by a parser to hold all the information about the XML being input. It has the following properties:</p>
<dl><dt><b><tt class="literal">CharacterStream</tt></b></dt><dd><p>This kind of filehandle works in Perl Version 5.7.2 and higher using PerlIO. No encoding translation should be necessary. Use the <tt class="literal">read( )</tt> function to get a number of characters from it, or use <tt class="literal">sysread( )</tt> to get a number of bytes. If the <tt class="literal">CharacterStream</tt> property is set, the parser ignores <tt class="literal">ByteStream</tt> or <tt class="literal">SystemId</tt>.</p></dd>
<dt><b><tt class="literal">ByteStream</tt></b></dt><dd><p>This property sets a byte stream to be read. If <tt class="literal">CharacterStream</tt> is set, this property is ignored. However, it supersedes <tt class="literal">SystemId</tt>.
The <tt class="literal">Encoding</tt> property should be set along with this property.</p></dd>
<dt><b><tt class="literal">PublicId</tt></b></dt><dd><p>This property is optional, but if the application submits a public identifier, it is stored here.</p></dd>
<dt><b><tt class="literal">SystemId</tt></b></dt><dd><p>This string represents a system-specific location for a document, such as a URI or filesystem path. Even if the source is a character stream or byte stream, this parameter is still useful because it can serve as the base against which relative external entity references are resolved.</p></dd>
<dt><b><tt class="literal">Encoding</tt></b></dt><dd><p>The character encoding, if known, is stored here.</p></dd></dl>
<p>Any other options you want to set are in the set of <em class="emphasis">features</em> defined for SAX2. For example, you can tell a parser that you are interested in special treatment for namespaces. One way to set features is by defining the <tt class="literal">Features</tt> property in the options hash given to the <tt class="literal">parse( )</tt> method. Another way is with the method <tt class="literal">set_feature( )</tt>. For example, here's how you would turn on validation in a validating parser using both methods:</p>
<blockquote><pre class="code">$parser-&gt;parse( Features =&gt; { 'http://xml.org/sax/features/validation' =&gt; 1 } );
$parser-&gt;set_feature( 'http://xml.org/sax/features/validation', 1 );</pre></blockquote>
<p>For a complete list of features defined for SAX2, see the documentation at <a href="http://sax.sourceforge.net/apidoc/org/xml/sax/package-summary.html">http://sax.sourceforge.net/apidoc/org/xml/sax/package-summary.html</a>. You can also define your own features if your parser has special abilities others don't.
To see what features your parser supports, call <tt class="literal">get_features( )</tt>, which returns a list; <tt class="literal">get_feature( )</tt> with a <tt class="literal">name</tt> parameter reports the setting of a specific feature.</p></div>
<a name="perlxml-CHP-5-SECT-7.4" /><div class="sect2"><h3 class="sect2">5.7.4. Example: A Driver</h3>
<p>Making your own SAX parser is simple, as most of the work is handled by a base class, <tt class="literal">XML::SAX::Base</tt>. All you have to do is create a subclass of this class and override anything that isn't taken care of by default. Not only is it convenient to do this, but it will result in code that is much safer and more reliable than if you tried to create it from scratch. For example, checking whether the handler package implements the handler you want to call is done for you automatically.</p>
<p>The next example shows just how easy it is to create a parser that works with <tt class="literal">XML::SAX</tt>. It's a driver, similar to the kind we saw in <a href="ch05_04.htm#perlxml-CHP-5-SECT-4">Section 5.4, "Drivers for Non-XML Sources"</a>, except that instead of
