Perl & XML, by Erik T. Ray and Jason McIntosh, ISBN 0-596-00205-X, First Edition, published April
<html><head><title>Entities (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch02_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch02_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">2.5. Entities</h2><p>For<a name="INDEX-94" /><a name="INDEX-95" /> your authoring convenience, XML has another feature called <em class="emphasis">entities</em>. An entity is useful when you need a placeholder for text or markup that would be inconvenient or impossible to just type in. It's a piece of XML set aside from your document;<a href="#FOOTNOTE-8">[8]</a> you use an <em class="emphasis">entity reference</em><a name="INDEX-96" /> to stand in for it. An XML processor must resolve every entity reference, substituting the entity's replacement text, at the time of parsing. 
Therefore, every referenced entity must be declared somewhere so that the processor knows how to resolve it.</p><blockquote class="footnote"> <a name="FOOTNOTE-8" /><p>[8] Technically, the whole document is one entity, called the <em class="emphasis">document entity</em>. However, people usually use the term "entity" to refer to a subset of the document. </p> </blockquote><p>The <em class="emphasis">Document Type Definition</em><a name="INDEX-97" /> (DTD) is the place to declare an entity. It has two parts: the <em class="emphasis">internal subset</em><a name="INDEX-98" />, which is part of your document, and the <em class="emphasis">external subset</em><a name="INDEX-99" /> <a name="INDEX-100" />, which lives in another document. (Often, people talk about the external subset as "the DTD" and call the internal subset "the internal subset," even though both subsets together make up the whole DTD.) In both places, the method for declaring entities is the same. The document in <a href="ch02_05.htm#perlxml-CHP-2-EX-3">Example 2-3</a> shows how this feature works.</p><a name="perlxml-CHP-2-EX-3" /><div class="example"><h4 class="objtitle">Example 2-3. A document with entity declarations </h4><blockquote><pre class="code">&lt;!DOCTYPE memo
  SYSTEM "/xml-dtds/memo.dtd"
[
  &lt;!ENTITY companyname "Willy Wonka's Chocolate Factory"&gt;
  &lt;!ENTITY healthplan  SYSTEM "hp.txt"&gt;
]&gt;
&lt;memo&gt;
  &lt;to&gt;All Oompa-loompas&lt;/to&gt;
  &lt;para&gt;
    &amp;companyname; has a new owner and CEO, Charlie Bucket. Since
    our name, &amp;companyname;, has considerable brand recognition,
    the board has decided not to change it. However, at Charlie's
    request, we will be changing our healthcare provider to the
    more comprehensive &amp;Uuml;mpacare, which has better facilities
    for 'Loompas (text of the plan to follow). Thank you for working
    at &amp;companyname;!
  &lt;/para&gt;
  &amp;healthplan;
&lt;/memo&gt;</pre></blockquote></div><p>Let's examine the new material in this example. 
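</p><p>Before walking through the declarations, it may help to see what a processor actually does with them. The following is a minimal sketch in Python, assuming the standard library's <tt class="literal">xml.etree.ElementTree</tt> parser rather than any Perl module. It parses a trimmed copy of Example 2-3 that keeps only the internal entity <tt class="literal">companyname</tt>, since that parser does not fetch external DTDs or external entities; the helpers <tt class="literal">para_text</tt> and <tt class="literal">is_well_formed</tt> are hypothetical names chosen for this illustration.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Example 2-3 (an assumption of this sketch): the external
# subset and the external entity "healthplan" are left out, because
# ElementTree's expat-based parser does not fetch external resources.
MEMO = """<!DOCTYPE memo [
  <!ENTITY companyname "Willy Wonka's Chocolate Factory">
]>
<memo>
  <to>All Oompa-loompas</to>
  <para>Thank you for working at &companyname;!</para>
</memo>"""


def para_text(xml_text: str) -> str:
    """Hypothetical helper: parse a memo and return its <para> text.

    Entity references are replaced while parsing, so the string returned
    here already contains the replacement text, not "&companyname;".
    """
    root = ET.fromstring(xml_text)
    return root.find("para").text


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(para_text(MEMO))
    # Thank you for working at Willy Wonka's Chocolate Factory!
    # This document has no DTD, so "companyname" is undeclared.
    print(is_well_formed("<memo>&companyname;</memo>"))  # False
```

<p>The application never sees <tt class="literal">&amp;companyname;</tt>, only its replacement text. In a document with no DTD at all, a reference to an undeclared entity is a fatal error, which is why the second call reports failure.</p><p>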
At the top is the document type declaration, a special markup instruction that contains a lot of important information, including the internal subset and a path to the external subset. Like all declarative markup (i.e., markup that defines something new), it starts with an exclamation point just after the opening angle bracket (<tt class="literal">&lt;!</tt>) and is followed by a keyword, <tt class="literal">DOCTYPE</tt><a name="INDEX-101" />. After that keyword is the name of an element that will be used to contain the document. We call that element the <em class="emphasis">root element</em><a name="INDEX-102" /> <a name="INDEX-103" /> or <em class="emphasis">document element</em>. The element name is followed by a path to the external subset, given by <tt class="literal">SYSTEM "/xml-dtds/memo.dtd"</tt>, and the internal subset of declarations, enclosed in <a name="INDEX-104" /> <a name="INDEX-105" />square brackets ([ ]).</p><p>The external subset is used for declarations that will be used in many documents, so it naturally resides in another file. The internal subset is best used for declarations that are local to the document. They may override declarations in the external subset or contain new ones. As you see in the example, two entities are declared in the internal subset. An entity declaration has two parameters: the entity name and its replacement text (or, for an external entity, the location of that text). The entities are named <tt class="literal">companyname</tt> and <tt class="literal">healthplan</tt>.</p><p>These entities are called<a name="INDEX-106" /> <em class="emphasis">general entities</em>: you, the author, declare them for use in the content of your document, as opposed to <em class="emphasis">parameter entities</em>, which are declared with a <tt class="literal">%</tt> and used only inside the DTD. Replacement text for general entities can come from two different places. The first entity declaration defines the text within the declaration itself. The second points to another file where the text resides. It uses a <em class="emphasis">system identifier</em><a name="INDEX-107" /> to specify the file's location, acting much like a URL used by a web browser to find a page to load. 
In this case, the file is loaded by an XML processor, and its contents are inserted and parsed as part of the document wherever the entity is referenced. Such an entity is called an <em class="emphasis">external entity</em>.</p><p>If you look closely at the example, you'll see markup instructions of the form <tt class="literal">&amp;</tt><em class="replaceable">name</em>;. The<a name="INDEX-108" /><a name="INDEX-109" /> ampersand (<tt class="literal">&amp;</tt>) begins an entity reference, <em class="replaceable">name</em> is the name of the entity being referenced, and the semicolon ends the reference. The same reference can be used repeatedly, making it a convenient way to insert repetitive text or markup, as we do with the entity <tt class="literal">companyname</tt>.</p><p>An entity can contain markup as well as text, as is the case with <tt class="literal">healthplan</tt> (actually, we don't know what's in that entity because it's in another file, but since it's going to be a large document, you can assume it will have markup as well as text). An entity can even contain references to other entities, to any nesting level you want. The only restriction is that entities can't contain themselves, at any level, lest you create a circular definition that the XML processor can never finish expanding. Some XML technologies, such as XSLT, do let you have fun with recursive logic, but think of entity references as code constants: a circular entity reference is a well-formedness error, and a parser that encounters one will reject the document.</p><p>Finally, the entity behind the <tt class="literal">&amp;Uuml;</tt> reference is declared somewhere in the external subset to fill in for a character that the chocolate factory's ancient text-editing programs have trouble rendering, in this case a capital "U" with an umlaut over it: &Uuml;. Since the referenced entity is one character wide, the reference in this case is almost more of an alias than a pointer. 
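</p><p>A small sketch can confirm that such a reference behaves like an alias for a single character. It again assumes Python's <tt class="literal">xml.etree.ElementTree</tt>; because that parser would not load the memo's external DTD, this hypothetical document declares <tt class="literal">Uuml</tt> in its internal subset instead, and compares the result with the numeric character references described next. The helper <tt class="literal">root_text</tt> is a hypothetical name.</p>

```python
import xml.etree.ElementTree as ET

# Assumption of this sketch: Uuml is declared in the internal subset,
# standing in for the declaration in the memo's external DTD, which
# ElementTree would not load.
ALIASED = """<!DOCTYPE memo [
  <!ENTITY Uuml "&#x00DC;">
]>
<memo>&Uuml;mpacare</memo>"""

# The same word written with numeric character references instead.
HEX_REF = "<memo>&#x00DC;mpacare</memo>"   # hexadecimal form
DEC_REF = "<memo>&#220;mpacare</memo>"     # decimal form


def root_text(xml_text: str) -> str:
    """Hypothetical helper: parse a document and return the root's text."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    texts = [root_text(doc) for doc in (ALIASED, HEX_REF, DEC_REF)]
    print(len(set(texts)))        # 1: all three spellings give one string
    print(hex(ord(texts[0][0])))  # 0xdc: the alias is a single character
```

<p>Whichever spelling the author chooses, the application receives the same string; only the source markup differs.</p><p>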
The usual way to handle unusual characters (the way that's built into the XML specification) involves using a numeric <a name="INDEX-110" /> <a name="INDEX-111" /> <a name="INDEX-112" />character reference, which, in this case, would be <tt class="literal">&amp;#x00DC;</tt> (or <tt class="literal">&amp;#220;</tt> in decimal). <tt class="literal">0x00DC</tt> is the hexadecimal equivalent of the number 220, which is the position (code point) of the U-umlaut character in Unicode (the character set used natively by XML, which we cover in more detail in the next section).</p><p>However, since an abbreviated descriptive name like <tt class="literal">Uuml</tt> is generally easier to remember than the arcane <tt class="literal">00DC</tt>, some XML users prefer to use these types of aliases by placing declarations such as this one into their<a name="INDEX-113" /> <a name="INDEX-114" /> documents' DTDs:</p><blockquote><pre class="code">&lt;!ENTITY Uuml "&amp;#x00DC;"&gt;</pre></blockquote><p>XML recognizes only five built-in, named entity references, shown in <a href="ch02_05.htm#perlxml-CHP-2-TABLE-1">Table 2-1</a>. They're not actually references, but are escapes for five punctuation marks that have special meaning for XML.</p><a name="perlxml-CHP-2-TABLE-1" /><h4 class="objtitle">Table 2-1. XML entity references</h4><table border="1"><tr><th><p>Character</p></th><th><p>Entity reference</p></th></tr><tr><td><p><tt class="literal">&lt;</tt></p></td><td><p><tt class="literal">&amp;lt;</tt></p></td></tr><tr><td><p><tt class="literal">&gt;</tt></p></td><td><p><tt class="literal">&amp;gt;</tt></p></td></tr><tr><td><p><tt class="literal">&amp;</tt></p></td><td><p><tt class="literal">&amp;amp;</tt></p></td></tr><tr><td><p><tt class="literal">"</tt></p></td><td><p><tt class="literal">&amp;quot;</tt></p></td></tr><tr><td><p><tt class="literal">'</tt></p></td><td><p><tt class="literal">&amp;apos;</tt></p></td></tr></table><p>The only two of these references that must be used throughout any XML document, in place of the literal characters, are <tt class="literal">&amp;lt;</tt> and <tt class="literal">&amp;amp;</tt>. 
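</p><p>These rules can be checked with a quick sketch, again assuming Python's <tt class="literal">xml.etree.ElementTree</tt> and a hypothetical helper <tt class="literal">well_formed</tt> that reports whether parsing raised <tt class="literal">ParseError</tt>. The five predefined references resolve without any declaration, a bare <tt class="literal">&lt;</tt> or <tt class="literal">&amp;</tt> in text is rejected, and a bare <tt class="literal">&gt;</tt> is accepted.</p>

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if ElementTree's parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# All five predefined references, with no DTD or declarations at all.
ESCAPED = "<p>&lt; &gt; &amp; &quot; &apos;</p>"

if __name__ == "__main__":
    print(ET.fromstring(ESCAPED).text)  # < > & " '
    print(well_formed("<p>2 < 3</p>"))  # False: "<" is read as markup
    print(well_formed("<p>AT&T</p>"))   # False: "&" starts a reference
    print(well_formed("<p>3 > 2</p>"))  # True: a lone ">" is allowed
```

<p>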
Element tags and entity references can appear at any point in a document. No parser could guess, for example, whether a <tt class="literal">&lt;</tt> character is used as a less-than math symbol or as a genuine XML token; it will always assume the latter and will report a malformed document if this assumption proves false.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch02_04.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch02_06.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">2.4. Spacing</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">2.6. Unicode, Character Sets, and Encodings</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>
