<html><head><title>An XML Recap (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch01_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch02_02.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h1 class="chapter">Chapter 2. 
An XML Recap</h1><div class="htmltoc"><h4 class="tochead">Contents:</h4><p><a href="ch02_01.htm">A Brief History of XML</a><br /><a href="ch02_02.htm">Markup, Elements, and Structure</a><br /><a href="ch02_03.htm">Namespaces</a><br /><a href="ch02_04.htm">Spacing</a><br /><a href="ch02_05.htm">Entities</a><br /><a href="ch02_06.htm">Unicode, Character Sets, and Encodings</a><br /><a href="ch02_07.htm">The XML Declaration</a><br /><a href="ch02_08.htm">Processing Instructions and Other Markup</a><br /><a href="ch02_09.htm">Free-Form XML and Well-Formed Documents</a><br /><a href="ch02_10.htm">Declaring Elements and Attributes</a><br /><a href="ch02_11.htm">Schemas</a><br /><a href="ch02_12.htm">Transformations</a><br /></p></div><p><a name="INDEX-43" />XML<a name="INDEX-44" /> is a revolutionary (and evolutionary) markup language. It combines the generalized markup power of <a name="INDEX-45" /> <a name="INDEX-46" />SGML with the simplicity of free-form markup and well-formedness rules. Its unambiguous structure and predictable syntax make it a very easy and attractive format to process with computer programs.</p><p>You are free, with XML, to design your own <a name="INDEX-47" />markup language that best fits your data. You can select element names that make sense to you, rather than use tags that are overloaded and presentation-heavy. If you like, you can formalize the language by using element and attribute declarations in the <a name="INDEX-48" /> <a name="INDEX-49" />DTD.</p><p>XML has syntactic shortcuts such as entities, comments, processing instructions, and CDATA sections. It allows you to group elements and attributes by namespace to further organize the vocabulary of your documents. The <tt class="literal">xml:space</tt> attribute lets you regulate whitespace, sometimes a tricky issue in markup in which human readability is as important as correct formatting.</p><p>Some very useful technologies are available to help you maintain and mutate your documents. 
Schemas, like DTDs, can measure the validity of XML as compared to a canonical model. Schemas go even further by enforcing patterns in character data and improving content model syntax. XSLT is a rich language for transforming documents into different forms. It could be an easier way to work with XML than having to write a program, but isn't always.</p><p>This chapter gives a quick recap of XML, where it came from, how it's structured, and how to work with it. If you choose to skip this chapter (because you already know XML or because you're impatient to start writing code), that's fine; just remember that it's here if you need it.</p><div class="sect1"><a name="perlxml-CHP-2-SECT-1" /><h2 class="sect1">2.1. A Brief History of XML</h2><p>Early text processing was closely tied to the machines that displayed it. Sophisticated formatting was tied to a particular device -- or rather, a class of devices called printers.</p><p>Take <a name="INDEX-50" />troff, for example. Troff was a very popular text formatting language included in most Unix distributions. It was revolutionary because it allowed high-quality formatting without a typesetting machine.</p><p>Troff mixes formatting instructions with data. The instructions are symbols composed of characters, with a special syntax so a troff interpreter can tell the two apart. For example, the symbol <tt class="literal">\fI</tt> changes the current font style to italic. Without the backslash character, it would be treated as data. This mixture of instructions and data is called <em class="emphasis">markup</em>.</p><p>Troff can be even more detailed than that. The instruction <tt class="literal">.sp</tt> <tt class="literal">18p</tt> tells the formatter to insert 18 points of vertical space at the point in the document where the instruction appears. Beyond aesthetics, we can't tell just by looking at it what purpose this spacing serves; it gives a very specific instruction to the processor that can't be interpreted in any other way. 
This instruction is fine if you only want to prepare a document for printing in a specific style. If you want to make changes, though, it can be quite painful.</p><p>Suppose you've marked up a book in troff so that every newly defined term is in boldface. Your document has thousands of bold font instructions in it. You're happy and ready to send it to the printer when suddenly, you get a call from the design department. They tell you that the design has changed and they now want the new terms to be formatted as italic. Now you have a problem. You have to turn every bold instruction for a new term into an italic instruction.</p><p>Your first thought is to open the document in your editor and do a search-and-replace maneuver. But, to your horror, you realize that new terms aren't the only places where you used bold font instructions. You also used them for emphasis and for proper nouns, meaning that a global replace would also mangle these instances, which you definitely don't want. You can change the right instructions only by going through them one at a time, which could take hours, if not days.</p><p>No matter how smart you make a formatting language like troff, it still has the same problem: it's inherently presentational. A <em class="emphasis">presentational</em><a name="INDEX-51" /> markup language describes content in terms of how to format it. Troff specifies details about fonts and spacing, but it never tells you what something is. Using troff makes the document less useful in some ways. It's hard to search through troff and come back with the last paragraph of the third section of a book, for example. The presentational markup gets in the way of any task other than its specific purpose: to format the document for printing.</p><p>We can characterize troff, then, as a <em class="emphasis">destination format</em><a name="INDEX-52" />. It's not good for anything but a specific end purpose. What other kind of format could there be? 
Is there an "origin" format -- that is, something that doesn't dictate any particular formatting but still packages the data in a useful way? People began to ask this key question in the late 1960s when they devised the concept of <em class="emphasis">generic coding</em><a name="INDEX-53" />: marking up content in a presentation-agnostic way, using descriptive tags rather than formatting instructions.</p><p>The <a name="INDEX-54" /> <a name="INDEX-55" />Graphic Communications Association (GCA) started a project called GenCode to explore this new area, developing ways to encode documents in generic tags and assemble documents from multiple pieces -- a precursor to hypertext. IBM's <a name="INDEX-56" /> <a name="INDEX-57" />Generalized Markup Language (GML), developed by Charles <a name="INDEX-58" />Goldfarb, Edward <a name="INDEX-59" />Mosher, and Raymond <a name="INDEX-60" />Lorie, built on this concept.<a href="#FOOTNOTE-3">[3]</a> As a result of this work, IBM could edit, view on a terminal, print, and search through the same source material using different programs. You can imagine that this benefit would be important for a company that churned out millions of pages of documentation per year.</p><blockquote class="footnote"><a name="FOOTNOTE-3" /><p>[3] Cute fact: the initials of these researchers also spell out