<!DOCTYPE HTML PUBLIC "html.dtd"><HTML><HEAD><TITLE>Presenting XML:The XML Advantage:EarthWeb Inc.-</TITLE><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><SCRIPT><!--function displayWindow(url, width, height) { var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');}//--></SCRIPT></HEAD><BODY BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000"><TD WIDTH="540" VALIGN="TOP"><!-- <CENTER><TABLE><TR><TD><FORM METHOD="GET" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-foldocsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE="Glossary Search"></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD><TD><IMG SRC="http://www.itknowledge.com/images/dotclear.gif" WIDTH="15" HEIGHT="1"></TD><TD><FORM METHOD="POST" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-subscriptionsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE=" Book Search "></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="backlink" TYPE="hidden" VALUE="http://search.itknowledge.com:80/excite/AT-subscriptionquery.html"><INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD></TR></TABLE></CENTER> --><!-- ISBN=1575213346 //--><!-- TITLE=Presenting XML//--><!-- AUTHOR=Richard Light//--><!-- PUBLISHER=Macmillan Computer Publishing//--><!-- IMPRINT=Sams//--><!-- CHAPTER=03 //--><!-- PAGES=0037-0050 //--><!-- UNASSIGNED1 //--><!-- UNASSIGNED2 //--><P><CENTER><A HREF="0037-0040.html">Previous</A> | <A HREF="../ewtoc.html">Table of Contents</A> | <A HREF="0044-0046.html">Next</A></CENTER></P><A NAME="PAGENUM-41"><P>Page 41</P></A><P>Therefore, it could be found by any of the following 
queries:</P><!-- CODE SNIP //--><PRE>the word "quoted" within a &lt;q&gt; element
the content of all &lt;q&gt; elements
the word "quoted" within a &lt;p&gt; within a &lt;div4&gt;</PRE><!-- END CODE SNIP //-->
<P>In other words, its data content and its context can both be used, alone or in combination, as search criteria. And, of course, attribute values can also be used to refine the query.</P>
<P>This opens up the possibility of vastly improved searching of free-text documents. You have already seen that the current generation of Web search engines is great on recall ("Yes! 15,000 hits!") but pretty poor on precision ("Why should I be interested in this?"). XML offers built-in context sensitivity for text retrieval.</P>
<H4><A NAME="ch03_ 6">Document Classes</A></H4>
<P>All HTML pages share a common core set of tags that will be the same no matter what type of HTML is in use (for example, head, title, and p). In the same way, XML documents that conform to a particular application will form classes. These will be either tightly controlled groupings, with every document instance conforming to a single DTD, or looser groupings around a family of related DTDs. In both cases, document classes will bring advantages.</P>
<P>When a document class is tightly controlled by a single DTD, application software can treat the rules expressed by that DTD rather like a database schema. This means that all documents in the class can be subjected to very specific processing, as long as they are all valid XML documents, and this holds even when the DTD covers a wide range of possible uses. For example, the XML metadata initiatives that are currently on the table (Meta Content Framework and XML-Data) would both provide information about Web sites and other information resources in a form that search engines and other automated agents could use directly. 
To quote the MCF proposal, "For interoperability and efficiency, schemata designed to serve different applications should share as much as possible in the way of data structures, syntax, and vocabulary."</P>
<P>If you have a class of documents conforming to a single DTD, you have to do the job of setting up the complete XML application only once, when you develop style sheets, link protocols, and possibly write custom software. Every document that conforms to that DTD can then benefit from its associated application.</P>
<A NAME="PAGENUM-42"><P>Page 42</P></A>
<P>Even where there is a looser affiliation of document types, there are benefits in belonging to a class. A good example is the Text Encoding Initiative, a well-established SGML framework for Humanities documents that will shortly be available in XML form. Here, there isn't a single TEI DTD. Instead, there is a framework for designing your own DTD, using the Chicago Pizza Model:</P>
<UL><LI> Choose your pizza base (the basic type of document you are dealing with: prose, poetry, or drama).
<LI> Spread on the cheese and tomato (a set of elements common to all bases).
<LI> Pick toppings to suit your taste (specialized elements for linking, textual analysis, and so on).</UL>
<P>The key point here is the "cheese and tomato" layer. Every TEI-based pizza (make that DTD) will use the same element types for common features such as paragraphs, lists, and bibliographic references.</P>
<P>Therefore, it is possible to design a generic XS style sheet for the common TEI elements, which will be useful no matter how you choose to design your TEI application. The modular design of XS allows you to plug in a base style sheet and then add styles for the elements that are unique to your particular use of TEI. Interestingly, XS is also relaxed about which element types you specify in your style sheets. 
It is quite possible to have a too-broad style sheet that includes elements you are not using in your DTD.</P>
<P>In a similar spirit, all TEI applications that use fancy linking will use the same Links "topping," so it will be possible to write support for this into browser software.</P>
<TABLE BGCOLOR="#FFFF99"><TR><TD>Note:</TD></TR><TR><TD><BLOCKQUOTE>This has already happened in the SGML world, where Synex's Viewport engine, which is used in the well-known Panorama browser from SoftQuad, offers native support for TEI extended pointers. These are the precursors of XML XPointers. See Chapter 9, "Linking with XML," for details.</BLOCKQUOTE></TD></TR></TABLE>
<P>Finally, it is still possible to gain the benefits of structural precision when running cross-document searches within a loosely grouped class of documents.</P>
<A NAME="PAGENUM-43"><P>Page 43</P></A>
<H4><A NAME="ch03_ 7">Chunk Delivery</A></H4>
<P>One of the problems with using HTML for all your Web content is that sometimes you want to write long documents. There are two ways to deal with this. You could produce a 500KB page, which takes your users forever to download. Or you could put in a few happy hours designing a set of pages into which the document can be broken down, with appropriate buttons so that users can navigate the pages and (hopefully) never get lost. Frames help a bit, but not all of your users will have frames support.</P>
<P>XML doesn't provide any magic to download documents more quickly than HTML can, but it does have a few tricks up its sleeve that help in this area.</P>
<P>For a start, XML's entity mechanism lets you create big documents in bite-sized chunks. You can write your magnum opus in separate sections and have a master document that dynamically pulls the sections together. However, this is mainly a convenience at the authoring stage, because the user is still looking at (and downloading) one big document.</P>
<P>A second approach is to use XML-Link's XPointers to link in the sections of the document. 
This is better because the linked sections are not treated by the XML processor as part of the core document. The XML processor can simply read the core document, indicate that the links exist, and let the user decide which ones to follow. This is rather like the set of linked pages you might design in HTML, except that the navigation between the linked sections will be supported natively by the XML browser without any extra effort on your part.</P>
<P>Finally, it is possible to deliver arbitrary chunks from a large XML document even when it is stored as one big file. Like HTML, XML supports the # separator for parts within the current page, and it works the same way:</P>
<!-- CODE SNIP //--><PRE>&lt;a href="http://www.mysite.com/home.htm#pets"&gt;&lt;/a&gt;</PRE><!-- END CODE SNIP //-->
<P>First, you download the whole file referenced by the URL: <A HREF="http://www.mysite.com/home.htm">http://www.mysite.com/home.htm</A>. Then, when the element marked with name="pets" is read, you display the document, starting at that point. The only trouble is that with long documents, you can stare at an empty screen for a long time before that section turns up.</P>
<P>XML has an additional part separator that avoids most of this problem. The part separator ?XML-XPTR= can be used in place of #. It says, "I only want you</P>
<P><CENTER><A HREF="0037-0040.html">Previous</A> | <A HREF="../ewtoc.html">Table of Contents</A> | <A HREF="0044-0046.html">Next</A></CENTER></P></TD></TR></TABLE></BODY></HTML>