<html><head><title>External Entity Resolution (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch05_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">5.3. External Entity Resolution</h2><p>By<a name="INDEX-398" /> <a name="INDEX-399" /><a name="INDEX-400" /> default, the parser substitutes all entity references with their actual values for you. Usually that's what you want it to do, but sometimes, as in the case of our filter example, you'd rather keep the entity references in place. As we saw, keeping the entity references is pretty easy to do; just include an <tt class="literal">entity_reference( )</tt> handler method to override that behavior by outputting the references again. What we haven't seen yet is how to override the default handling of external entity references. 
Again, the parser wants to replace the references with their values by locating the files and inserting their contents into the stream. Would you ever want to change that behavior, and if so, how would you do it?</p><p>Storing documents in multiple files is convenient, especially for really large documents. For example, suppose you have a big book to write in XML and you want to store each chapter in its own file. You can do so easily with external entities. Here's an example:</p><blockquote><pre class="code"><?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY intro-chapter SYSTEM "chapters/intro.xml">
  <!ENTITY pasta-chapter SYSTEM "chapters/pasta.xml">
  <!ENTITY stirfry-chapter SYSTEM "chapters/stirfry.xml">
  <!ENTITY soups-chapter SYSTEM "chapters/soups.xml">
]>
<book>
  <title>The Bonehead Cookbook</title>
  &intro-chapter;
  &pasta-chapter;
  &stirfry-chapter;
  &soups-chapter;
</book></pre></blockquote><p>The previous filter example would diligently resolve the external entity references for you and output the entire book in one piece. Your file separation scheme would be lost, and you'd have to edit the resulting file to break it back into multiple files. Fortunately, we can override the resolution of external entity references using a handler called <tt class="literal">resolve_entity( )</tt>.</p><p>This handler receives a hash with four properties: <tt class="literal">Name</tt>, the entity's name; <tt class="literal">SystemId</tt> and <tt class="literal">PublicId</tt>, identifiers that help you locate the file containing the entity's text; and <tt class="literal">Base</tt>, which helps resolve relative URLs, if any exist. Unlike the other handlers, this one should return a value to tell the parser what to do. Returning <tt class="literal">undef</tt> tells the parser to load the external entity as it normally would. Otherwise, you need to return a hash reference describing an alternative source from which the entity should be loaded. 
The hash is the same type you would pass to the object's <tt class="literal">parse( )</tt> method, with keys like <tt class="literal">SystemId</tt> to give it a filename or URL, or <tt class="literal">String</tt> to give it a string of text. For example:</p><blockquote><pre class="code">sub resolve_entity {
    my( $self, $props ) = @_;
    # Read the file ourselves if the entity has a system identifier we can open
    if( defined( $props->{ SystemId })
        and open( my $ent, '<', $props->{ SystemId })) {
        # Mark where the file begins and ends with processing instructions
        my $entval = '<?start-file ' . $props->{ SystemId } . '?>';
        while( <$ent> ) { $entval .= $_; }
        close $ent;
        $entval .= '<?end-file ' . $props->{ SystemId } . '?>';
        return { String => $entval };
    } else {
        # Fall back to the parser's default entity loading
        return undef;
    }
}</pre></blockquote><p>This routine opens the entity resource, if it's in a file it can find, and gives it to the parser as a string. Before handing it over, it attaches a processing instruction before and after the entity text, marking the boundaries of the file. Later, you can write a routine to look for the PIs and separate the files back out again.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch05_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">5.2. DTD Handlers</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">5.4. Drivers for Non-XML Sources</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. 
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>