📄 ch02_02.htm

📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
💻 HTM
字号:
<html><head><title>Markup, Elements, and Structure (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch02_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch02_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">2.2. Markup, Elements, and Structure</h2><p>A <a name="INDEX-70" />markup language<a name="INDEX-71" /><a name="INDEX-72" /> provides a way to embed instructions insidedata to help a computer program process the data. Most markupschemes, such as troff,<a name="INDEX-73" />TeX, and HTML, haveinstructions that are optimized for one purpose, such as formattingthe document to be printed or to be displayed on a computer screen.These languages rely on a<em class="emphasis">presentational</em><a name="INDEX-74" /> description of data, which controlstypeface, font size, color, or other media-specific properties.Although such markup can result in nicely formatted documents, it canbe like a prison for your data, consigning it to one format forever;you won't be able to extract your data for otherpurposes without significant work.</p><p>That's where XML comes in. It's ageneric markup language that describes data according to itsstructure and purpose, rather than with specific formattinginstructions. The actual presentation information is stored somewhereelse, such as in a stylesheet. What's left is afunctional description of the parts of your document, which issuitable for many different kinds of processing. With proper use ofXML, your document will be ready for an unlimited variety ofapplications and purposes.</p><p>Now let's review the basic components of XML. Itsmost important feature is the <em class="emphasis">element</em>.Elements are encapsulated regions of data that serve a unique role inyour document. For example, consider a typical book, composed of apreface, chapters, appendixes, and an index. In XML, marking up eachof these sections as a unique element within the book would beappropriate. Elements may themselves be divided into other elements;you might find the chapter's title, paragraphs,examples, and sections all marked up as elements. This divisioncontinues as deeply as necessary, so even a paragraph can containelements such as emphasized text, quotations, and hypertext links.</p><p>Besides dividing text into a hierarchy of regions, elements associatea label and other properties with the data. Every element has a name,or <em class="emphasis">elementtype</em><a name="INDEX-75" />,usually describing its function in the document. Thus, a chapterelement could be called a "chapter"(or "chapt" or"ch" -- whatever you fancy). Anelement can include other information besides the type, using aname-value pair called an<em class="emphasis">attribute</em><a name="INDEX-76" />. Together, an element'stype and attributes distinguish it from other elements in thedocument.</p><p><a href="ch02_02.htm#perlxml-CHP-2-EX-1">Example 2-1</a> shows a typical piece of XML. </p><a name="perlxml-CHP-2-EX-1" /><div class="example"><h4 class="objtitle">Example 2-1. An XML fragment </h4><blockquote><pre class="code">&lt;list id="eriks-todo-47"&gt;  &lt;title&gt;Things to Do This Week&lt;/title&gt;  &lt;item&gt;clean the aquarium&lt;/item&gt;  &lt;item&gt;mow the lawn&lt;/item&gt;  &lt;item priority="important"&gt;save the whales&lt;/item&gt;&lt;/list&gt;</pre></blockquote></div><p>This is, as you've probably guessed, a to-do listwith three items and a title. Anyone who has worked with HTML willrecognize the markup. The pieces of text surrounded by<a name="INDEX-77" /> <a name="INDEX-78" />angle brackets("<tt class="literal">&lt;</tt>"and"<tt class="literal">&gt;</tt>")are called <em class="emphasis">tags</em><a name="INDEX-79" />, and they act as bookends forelements. Every nonempty element must have both a start and end tag,each containing the element type label. The start tag can optionallycontain a number of attributes (name-value pairs like<tt class="literal">priority="important"</tt>). Thus, the markup is prettyclear and unambiguous -- even a human can read it.</p><p>A human can read it, but more importantly, a computer program canread it very easily. The framers of XML have taken great care toensure that XML is easy to read by all XML processors, regardless ofthe types of tags used or the context. If your markup follows all theproper syntactic rules, then the XML is absolutely unambiguous. Thismakes processing it much easier, since you don'thave to add code to handle unclear situations.</p><p>Consider HTML, as it was originally defined (an application ofXML's predecessor, SGML).<a href="#FOOTNOTE-5">[5]</a> For certain elements, it was acceptable to omit the endtag, and it's usually possible to tell from thecontext where an element should end. Even so, making code robustenough to handle every ambiguous situation comes at the price ofcomplexity and inaccurate output from bad guessing. Now imagine howit would be if the same processor had to handle any element type, notjust the HTML elements. Generic XML processors can'tmake assumptions about how elements should be arranged. An ambiguoussituation, such as the omission of an end tag, would be disastrous.</p><blockquote class="footnote"> <a name="FOOTNOTE-5" /><p>[5]Currently,XHTML is an XML-legal variant of HTML that HTML authors areencouraged to adopt in support of coming XML tools. XML enablesdifferent kinds of markup to be processed by the same programs (e.g.,editors, syntax-checkers, or formatters). HTML will soon be joined onthe Web by such XML-derived languages as DocBook and MathML. </p></blockquote><p>Any piece of XML can be represented in a diagram called a<em class="emphasis">tree</em><a name="INDEX-80" />, a structure familiar to mostprogrammers. At the top (since trees in computer science grow upsidedown) is the root element. The elements that are contained one leveldown branch from it. Each element may contain elements at stilldeeper levels, and so on, until you reach the bottom, or"leaves" of the tree. The leavesconsist of either data (text) or empty elements. An element at anylevel can be thought of as the root of its own tree (or subtree, ifyou prefer to call it that). A tree diagram of the previous exampleis shown in <a href="ch02_02.htm#perlxml-CHP-2-FIG-1">Figure 2-1</a>.</p><a name="perlxml-CHP-2-FIG-1" /><div class="figure"><img height="161" alt="Figure 2-1" src="figs/pxml.0201.gif" width="456" /></div><h4 class="objtitle">Figure 2-1. A to-do list represented as a tree structure</h4><p>Besides the arboreal analogy, it's also useful tospeak of XML genealogically. Here, we describe anelement's content (both data and elements) as itsdescendants, and the elements that contain it as its ancestors. Inour list example, each <tt class="literal">&lt;item&gt;</tt> element is achild of the same parent, the <tt class="literal">&lt;list&gt;</tt>element, and a sibling of the others. (We generallydon't carry the terminology too far, as talkingabout third cousins twice-removed can make your head hurt.)We<a name="INDEX-81" /><a name="INDEX-82" /> willuse both the tree and family terminology to describe<a name="INDEX-83" />element<a name="INDEX-84" /> relationships<a name="INDEX-85" /> throughout thebook.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch02_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch02_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">2. An XML Recap</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">2.3. Namespaces</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -