<!DOCTYPE HTML PUBLIC "html.dtd"><HTML><HEAD><TITLE>Presenting XML:Physical Structures in XML Documents:EarthWeb Inc.-</TITLE><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><SCRIPT><!--function displayWindow(url, width, height) { var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');}//--></SCRIPT></HEAD><BODY BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000"><TD WIDTH="540" VALIGN="TOP"><!-- ISBN=1575213346 //--><!-- TITLE=Presenting XML//--><!-- AUTHOR=Richard Light//--><!-- PUBLISHER=Macmillan Computer Publishing//--><!-- IMPRINT=Sams//--><!-- CHAPTER=07 //--><!-- PAGES=0109-0122 //--><P><CENTER><A HREF="0113-0116.html">Previous</A> | <A HREF="../ewtoc.html">Table of Contents</A> | <A HREF="0121-0122.html">Next</A></CENTER></P><A NAME="PAGENUM-117"><P>Page 117</P></A><P>minimum, any Web-aware XML application 
will have an entity manager that is able to find the following:</P>
<UL>
<LI> Entities contained within your XML documents
<LI> Any resource that can be addressed by a URL
</UL>
<H3><A NAME="ch07_ 14">Synchronicity of Logical and Physical Structures</A></H3>
<P>I mentioned in Chapter 5 that the logical and physical structures in an XML document need to nest neatly within each other. Now that I have described those logical structures in Chapter 6, I can be more precise about the rules.</P>
<P>The first text entity you encounter when reading an XML document is called the document entity. This is the starting point for an XML processor, and it acts as the root of the tree of entities specified within the XML document.</P>
<H4><A NAME="ch07_ 15">What's Allowed in a Logical Structure?</A></H4>
<P>First, look at this question from the point of view of logical structures. All tags and elements must be completely inside a single entity. In particular, the document element must start and end in the same file; you can't switch files halfway through a document. However, you can have references to other entities inside an element. (This is another type of nesting: more suitcases.) The following is a document entity that describes a complete (three-chapter) book.</P>
<!-- CODE //-->
<PRE>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE doc SYSTEM "mydoc.dtd" [
&lt;!ENTITY chapter1 SYSTEM "chap1.xml"&gt;
&lt;!ENTITY chapter2 SYSTEM "chap2.xml"&gt;
&lt;!ENTITY chapter3 SYSTEM "chap3.xml"&gt;
]&gt;
&lt;doc&gt;
&lt;body&gt;
&amp;chapter1;
&amp;chapter2;
&amp;chapter3;
&lt;/body&gt;
&lt;/doc&gt;</PRE>
<!-- END CODE //-->
<P>Here, each of the three chapters is held in its own entity: the files chap1.xml, chap2.xml, and chap3.xml.
Note how the document element &lt;doc&gt; and its subelement &lt;body&gt; both start and end within the document entity, and the embedded entities nest neatly inside the &lt;body&gt; element.</P>
<P>The other logical structures I have mentioned (comments and processing instructions), together with two I will soon introduce (character references and entity references), must also be contained entirely within a single entity.</P>
<A NAME="PAGENUM-118"><P>Page 118</P></A>
<H4><A NAME="ch07_ 16">What's Allowed in a Physical Structure?</A></H4>
<P>Looking at it from the physical structures' point of view, a text entity must contain a whole number of logical structures, possibly together with stray character data that is not inside any element in the entity. (This includes the very common case in which the whole entity contains nothing but character data.) So the following three examples are valid text entities:</P>
<!-- CODE //-->
<PRE>&lt;chapter&gt;&lt;head&gt;1. Beginnings&lt;/head&gt;&lt;p&gt; .... &lt;/p&gt;&lt;/chapter&gt;

Museum &lt;emph&gt;Documentation&lt;/emph&gt; Association

Museum Documentation Association</PRE>
<!-- END CODE //-->
<P>The following two examples are not valid text entities, because they contain elements that are incomplete:</P>
<!-- CODE SNIP //-->
<PRE>&lt;chapter&gt;&lt;head&gt;1. Beginnings&lt;/head&gt;

Museum &lt;emph&gt;Documentation Association</PRE>
<!-- END CODE SNIP //-->
<H3><A NAME="ch07_ 17">Predefined Entities</A></H3>
<P>XML provides the following predefined entities:</P>
<UL>
<LI> amp: Ampersand (&amp;)
<LI> lt: Less than, or opening angle bracket (&lt;)
<LI> gt: Greater than, or closing angle bracket (&gt;)
<LI> apos: Apostrophe, or single quote (')
<LI> quot: Quotation mark (")
</UL>
<P>An XML document that uses only these entities, without declaring them, can still be well-formed. (However, to be valid, it needs to declare these default entities if they are used. Also, it must give them the same single-character values that they would have by default.)</P>
<H3><A NAME="ch07_ 18">Character References</A></H3>
<P>A character reference is a code for a specific character in the ISO 10646/Unicode character set. This is useful when you cannot enter the character from your keyboard. The reference is to the character's character number in ISO 10646/Unicode. It can be expressed either as an ordinary (decimal) number or in hexadecimal.</P>
<A NAME="PAGENUM-119"><P>Page 119</P></A>
<P>The syntax of character references is &amp;# for decimal or &amp;#x for hexadecimal, followed by the character number, and terminated by a semicolon (;). For example, the character reference for the copyright symbol (&copy;) is &amp;#169; or &amp;#xA9;.</P>
<H3><A NAME="ch07_ 19">Character Encoding in XML Text Entities</A></H3>
<P>XML uses the ISO 10646 character set (also called Unicode). This provides a range of encoding schemes, starting with 8-bit characters and moving on to 16-bit or even 32-bit representations of each character. All XML processors are expected to support the UTF-8 and UCS-2 schemes.</P>
<P>Obviously, there is a trade-off between the size of each character and the number of possible characters you can represent.
You won't benefit from using 16-bit character encoding (which gives more than 65,000 possible characters) if you never require anything except standard ASCII characters, which fit in a single 8-bit byte, because all your documents will be twice the size they would be otherwise.</P>
<P>Every XML text entity must declare and then stick to a single encoding scheme. However, each text entity in an XML document can use a different encoding for the characters it contains. This means that you can declare separate entities to hold sections of an XML document that contain, for example, Cyrillic or Arabic characters, and assign the 16-bit UCS-2 encoding to these sections. Meanwhile, the rest of the document can use the more efficient 8-bit encoding.</P>
<P>By default, the ISO 10646 UTF-8 encoding is assumed. The first part of this encoding (the first 128 characters) matches standard ASCII, so you do not need to take any special action on character encoding unless you have some unusual characters to encode.</P>
<P>You should also bear in mind that switching to a different encoding is not the only way to represent characters that fall outside the ASCII range. You can reference any character by quoting its ISO 10646 character number in a character reference, or you can declare and then reference an entity that represents the character in question. These techniques are probably best if you have only a sprinkling of non-ASCII characters to mark up.</P>
<A NAME="PAGENUM-120"><P>Page 120</P></A>
<P>If an XML text entity is encoded in UCS-2, it must start with an appropriate encoding signature, the Byte Order Mark, which is the character with hexadecimal value FEFF. This is not considered to be part of the markup or character data of the XML document.</P>
<P>XML provides a more generalized method of signaling encoding schemes: the encoding declaration. This takes the form of a processing instruction (PI). It is part of the XML declaration for the document entity and is a special encoding PI for any other entity.
Every XML text entity that is not in UTF-8 or UCS-2 must begin with the following declaration:</P>
<!-- CODE SNIP //-->
<PRE>&lt;?xml encoding="[encodingDesc]" ?&gt;</PRE>
<!-- END CODE SNIP //-->
<P>In the declaration, [encodingDesc] is a Name consisting of only Latin alphabetic characters (A to Z and a to z), digits, full stops, hyphens, and underscores. The following values are recognized:</P>
<UL>
<LI> UTF-8
<LI> UTF-16
<LI> ISO-10646-UCS-2
<LI> ISO-10646-UCS-4
<LI> ISO-8859-1 to -9
<LI> ISO-2022-JP
<LI> Shift_JIS
<LI> EUC-JP
</UL>
<P>Here are two examples:</P>
<!-- CODE SNIP //-->
<PRE>&lt;?xml encoding="UTF-8"?&gt;
&lt;?xml encoding="ISO-10646-UCS-4"?&gt;</PRE>
<!-- END CODE SNIP //-->
<P>The idea behind limiting the encoding description to Latin characters is to make it possible for XML processors to read the encoding description unambiguously.</P>
<P>Having recognized the encoding of an XML text entity, an XML processor might not be able to process that encoding. If this is the case, it has the option of treating the entity as a binary entity instead, or of abandoning the attempt to read the entity.</P>
</TD></TR></TABLE></BODY></HTML>