<!DOCTYPE HTML PUBLIC "html.dtd">
<HTML>
<HEAD>
<TITLE>Presenting XML:Morphing Existing HTML into XML:EarthWeb Inc.-</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<SCRIPT>
<!--
function displayWindow(url, width, height) {
    var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>
<BODY BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000">
<TD WIDTH="540" VALIGN="TOP">
<!-- ISBN=1575213346 //-->
<!-- TITLE=Presenting XML//-->
<!-- AUTHOR=Richard Light//-->
<!-- PUBLISHER=Macmillan Computer Publishing//-->
<!-- IMPRINT=Sams//-->
<!-- CHAPTER=12 //-->
<!-- PAGES=0213-0234 //-->
<!-- UNASSIGNED1 //-->
<!-- UNASSIGNED2 //-->
<P><CENTER><A HREF="0213-0218.html">Previous</A> | <A HREF="../ewtoc.html">Table of Contents</A> | <A HREF="0223-0226.html">Next</A></CENTER></P>
<A NAME="PAGENUM-219"></A>
<P>Listing 12.3.
The sample HTML page as a well-formed XML document.</P>
<!-- CODE //-->
<PRE>
&lt;?XML version="1.0"?&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Morphing existing HTML into XML&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;hr/&gt;
&lt;a href="l12_1.htm"&gt;&lt;IMG ALIGN="MIDDLE" SRC="home.gif"
alt="[home page]"/&gt;&lt;/a&gt;
&lt;a href="l12html1.htm"&gt;&lt;IMG ALIGN="MIDDLE" SRC="html.gif"
alt="[HTML]"/&gt;&lt;/a&gt;
&lt;a href="l12xml1.htm"&gt;&lt;IMG ALIGN="MIDDLE" SRC="xml.gif"
alt="[XML]"/&gt;&lt;/a&gt;
&lt;hr/&gt;
&lt;h1&gt;Morphing existing HTML into XML&lt;/h1&gt;
&lt;p&gt;We will start our XML practicals by taking a fairly typical
Web page, written in HTML, and converting it to XML. The reason
for doing this is &lt;i&gt;not&lt;/i&gt; to suggest that all HTML
pages will require this treatment! Rather, it's a good way to
explore the differences in approach between HTML and XML.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;tt&gt;SGML short-cuts&lt;/tt&gt;&lt;/b&gt; are probably to blame for much
of the incorrect HTML that we see [&lt;a href="l12note1.htm"&gt;Note 1&lt;/a&gt;]&lt;/p&gt;
&lt;p&gt;We will:
&lt;ul compact="compact"&gt;
&lt;li&gt;make our web page well-formed&lt;/li&gt;
&lt;li&gt;update the HTML DTD for XML and make our page valid&lt;/li&gt;
&lt;li&gt;use XML features to enhance our page&lt;/li&gt;
&lt;/ul&gt;&lt;/p&gt;
&lt;p&gt;&lt;tt&gt;Page last updated July 8th 1997 by Richard Light&lt;/tt&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</PRE>
<!-- END CODE //-->
<H3><A NAME="ch12_ 8">Toward Validity: Adding a DOCTYPE Declaration</A></H3>
<P>As you saw in the section "Well-Formed and Valid Documents" in Chapter 5, "The XML Approach," converting an HTML page into a well-formed XML document achieves only the first level of XML conformance. If you want to go further and make the sample page into a valid XML document, you need to declare the DTD to which the sample page conforms. In itself, that isn't difficult to do; you simply add a DOCTYPE declaration to the start of the document (but after the XML declaration):</P>
<!-- CODE SNIP //-->
<PRE>&lt;!DOCTYPE HTML SYSTEM "HTML2_X.DTD"&gt;</PRE>
<!-- END CODE SNIP //-->
<A NAME="PAGENUM-220"></A>
<P>This states that the document type is HTML, and the DTD is held in the file "HTML2_X.DTD".
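</P>
<P>Put together, the opening lines of the sample page would then read roughly as follows. This is only a sketch assembled from Listing 12.3 and the declaration above. It assumes that the document type name HTML is matched to the lowercase html root element without regard to case; a case-sensitive XML parser would need the two names to agree exactly.</P>
<!-- CODE SNIP //-->
<PRE>
&lt;?XML version="1.0"?&gt;
&lt;!DOCTYPE HTML SYSTEM "HTML2_X.DTD"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Morphing existing HTML into XML&lt;/title&gt;
&lt;/head&gt;
</PRE>
<!-- END CODE SNIP //-->
<P>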
Let's note a couple of differences here.</P>
<P>First, the (external) DTD is declared using a SYSTEM name. You might be accustomed to seeing the HTML DTD declared as PUBLIC, as in this example:</P>
<!-- CODE SNIP //-->
<PRE>&lt;!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"&gt;</PRE>
<!-- END CODE SNIP //-->
<P>It is pretty standard SGML practice to use a PUBLIC identifier to refer to the DTD that a document uses.</P>
<P>In SGML, a PUBLIC identifier (to be precise, a formal PUBLIC identifier) is carefully constructed. It tells you who is responsible (in this case, the IETF). It tells you it's a DTD and it assigns a unique name to that DTD. The one thing a PUBLIC identifier doesn't do is tell you where to find the DTD itself!</P>
<P>XML takes a more pragmatic view. First, it doesn't attempt to control or interpret the value of the PUBLIC identifier. Second, it insists that you say where the DTD can be located by providing a SYSTEM identifier as well as a PUBLIC identifier. In the example, the SYSTEM name is a URL representing a local filename, but it could equally have been the URL of an "official" HTML DTD stored on W3C's Web site.</P>
<TABLE BGCOLOR="#FFFF99"><TR><TD>Note:</TD></TR><TR><TD><BLOCKQUOTE>You can provide a PUBLIC name in XML only if it is followed by a SYSTEM name:<BR>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 2.0 FOR XML//EN" "HTML2_X.DTD"&gt;<BR></BLOCKQUOTE></TD></TR></TABLE>
<P>The second difference is in the filename: "HTML2_X.DTD". This is different from the filenames normally assigned to the HTML 2.0 DTD, and for a good reason: it refers to a different DTD. Before you can make your sample page into a valid XML document, you need an XML-valid DTD to check it against. The existing HTML DTDs will not work with XML, for several reasons that I cover in the next section.</P>
<H3><A NAME="ch12_ 9">Creating an XML-Valid HTML DTD</A></H3>
<P>This section reviews the changes that need to be made to the HTML 2.0 DTD to make it XML-compatible.
If your interest is solely at the level of document authoring, you might like to skip this section.</P>
<A NAME="PAGENUM-221"></A>
<TABLE BGCOLOR="#FFFF99"><TR><TD>Note:</TD></TR><TR><TD><BLOCKQUOTE>The updated HTML 2.0 DTD for XML is on the Web site that accompanies this book. It represents my own attempt to upgrade the DTD produced by W3C, and it has no official status. Hopefully, the W3C will look at the issue of supporting HTML as an XML application, at which point W3C might come up with an official set of DTDs.</BLOCKQUOTE></TD></TR></TABLE>
<H4><A NAME="ch12_ 10">Omitted Tag Minimization Rules</A></H4>
<P>In SGML you can omit start-tags and end-tags from your document and allow SGML-aware software to infer their presence, but not just anywhere. The HTML DTD contains omitted tag minimization parameters that state which tags can be omitted without causing an error. These consist of a pair of characters, one for the start-tag and one for the end-tag, which are either - to indicate that the tag must be present or O (capital O, not zero) to indicate that it can be omitted.</P>
<P>Anyway, this is all rather academic for XML; tag omission is outlawed. So all these omitted tag minimization rules must come out of the DTD. Otherwise, the DTD itself will give rise to parsing errors. For example, the element declaration</P>
<!-- CODE SNIP //-->
<PRE>&lt;!ELEMENT HTML O O (%html.content;)&gt;</PRE>
<!-- END CODE SNIP //-->
<P>becomes</P>
<!-- CODE SNIP //-->
<PRE>&lt;!ELEMENT HTML (%html.content;)&gt;</PRE>
<!-- END CODE SNIP //-->
<H4><A NAME="ch12_ 11">Grouped Element and Attribute List Declarations</A></H4>
<P>In SGML, it is allowable to provide a single declaration for a whole set of element types or attribute lists.
With parameter entities, this can lead to a very elegant declaration, such as the following:</P>
<!-- CODE SNIP //-->
<PRE>&lt;!ELEMENT (%font;|%phrase;) - - (%text;)*&gt;</PRE>
<!-- END CODE SNIP //-->
<P>After %font; and %phrase; have been expanded, this declaration deals with no fewer than 10 different elements.</P>
<P>In XML, this isn't allowed. Every element type and attribute list must have its own separate declaration:</P>
<A NAME="PAGENUM-222"></A>
<!-- CODE SNIP //-->
<PRE>
&lt;!ELEMENT TT (%text;)*&gt;
&lt;!ELEMENT CODE (%text;)*&gt;
&lt;!ELEMENT SAMP (%text;)*&gt;
&lt;!ELEMENT KBD (%text;)*&gt;
</PRE>
<!-- END CODE SNIP //-->
<H4><A NAME="ch12_ 12">Inclusion and Exclusion Exceptions</A></H4>
<P>SGML lets you declare that certain element types can float anywhere within another element. These are called inclusion exceptions. Conversely, element types can be barred from appearing within other elements by exclusion exceptions. Exceptions are declared after the element type's content model proper. For example, the declaration</P>
<!-- CODE SNIP //-->
<PRE>&lt;!ELEMENT (DIR|MENU) - - (LI)+ -(%block;)&gt;</PRE>
<!-- END CODE SNIP //-->
<P>states that block element types are not allowed anywhere within DIR or MENU elements. This prohibition will affect the LI subelements and any children they might have.</P>
<P>XML doesn't allow such nonsense; all content models need to be complete and explicit. This requirement posed one of the two biggest problems that I faced in converting the HTML DTD for XML, because inclusions and exclusions at a high level in the document structure can affect the content models of many other element types.</P>
<P>In this case, I created a special element type, SIMPLELI, which has a different content model from LI, because block element types are not allowed within it. However, this introduces a new element type, which is not upward-compatible with the existing DTD. It also has no effect on the children of SIMPLELI elements, which could still introduce block element types.
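</P>
<P>As a rough illustration of this approach, the replacement declarations might look something like the following. This is a sketch only, not the text of the updated DTD: that DIR and MENU each get their own declaration follows from the rule against grouped declarations, but the use of SIMPLELI in place of LI in their content models, and the (%text;)* content model shown for SIMPLELI, are assumptions.</P>
<!-- CODE SNIP //-->
<PRE>
&lt;!-- Hypothetical sketch: the exclusion -(%block;) gives way to explicit content models --&gt;
&lt;!ELEMENT DIR (SIMPLELI)+&gt;
&lt;!ELEMENT MENU (SIMPLELI)+&gt;
&lt;!ELEMENT SIMPLELI (%text;)*&gt;
</PRE>
<!-- END CODE SNIP //-->
<P>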
A simpler approach would be to remove the exclusion and accept that the XML DTD is less restrictive than the original.</P>
<P>Inclusions are potentially harder to deal with, because by ignoring them you are removing possibilities that are allowed by the current DTD. For example, the FORM element type has an exclusion and three inclusions:</P>
<!-- CODE SNIP //-->
<PRE>&lt;!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)&gt;</PRE>
<!-- END CODE SNIP //-->
<P>In this case, I changed the content model so that the inclusions can appear as immediate children of the FORM element type:</P>
<!-- CODE //-->
<PRE>
&lt;!ENTITY % form.content "(%heading;|%forms.block;|HR|ADDRESS|IMG|INPUT|SELECT|TEXTAREA)*"&gt;
&lt;!ELEMENT FORM (%form.content;)&gt;
</PRE>
<!-- END CODE //-->
<P>Again, this change will apply only to the immediate children of the FORM element.</P>
</TD></TR></TABLE></BODY></HTML>