<html><head><title>XML Gotchas (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch01_05.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch02_01.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">1.6. XML Gotchas</h2><p>This section introduces topics we think you should keep in mind as you read the book. They are the source of many of the problems you'll encounter when working with <a name="INDEX-32" />XML.</p><dl><dt><i>Well-formedness</i></dt><dd><p>XML has built-in quality control. A document has to pass some minimal syntax rules in order to be blessed as well-formed XML. Most parsers fail to handle a document that breaks any of these rules, so you should make sure any data you input is of sufficient quality.</p></dd><a name="INDEX-33" /><dt><i>Character encodings</i></dt><dd><p>Now that we're in the 21st century, we have to pay attention to things like character encodings. 
Gone are the days when you could be content knowing only about <a name="INDEX-34" />ASCII, the little character set that could. <a name="INDEX-35" />Unicode is the new king, presiding over all major character sets of the world. XML prefers to work with Unicode, but there are many ways to represent it, including Perl's favorite Unicode encoding, UTF-8. You usually won't have to think about it, but you should still be aware of the potential.</p></dd><dt><i>Namespaces</i></dt><dd><p>Not everyone works with or even knows about <a name="INDEX-36" />namespaces. It's a feature in XML whose usefulness is not immediately obvious, yet it is creeping into our reality slowly but surely. These devices categorize markup and declare tags to be from different places. With them, you can mix and match document types, blurring the distinctions between them. Equations in HTML? Markup as data in XSLT? Yes, and namespaces are the reason. Older modules don't have special support for namespaces, but the newer generation will. Keep it in mind.</p></dd><dt><i>Declarations</i></dt><dd><p><a name="INDEX-37" />Declarations aren't part of the document per se; they just define pieces of it. That makes them weird, and something you might not pay enough attention to. Remember that documents often use DTDs and have declarations for such things as entities and attributes. If you forget, you could end up breaking something.</p></dd><dt><i>Entities</i></dt><dd><p><a name="INDEX-38" />Entities and entity references seem simple enough: they stand in for content that you'd rather not type in at that moment. Maybe the content is in another file, or maybe it contains characters that are difficult to type. The concept is simple, but the execution can be a royal pain. Sometimes you want to resolve references and sometimes you'd rather keep them there. Sometimes a parser wants to see the declarations; at other times it doesn't care. Entities can contain other entities to an arbitrary depth. 
They're tricky little beasties and we guarantee that if you don't give careful thought to how you're going to handle them, they will haunt you.</p></dd><dt><i>Whitespace</i></dt><dd><p>According to XML, anything that isn't a markup tag is significant character data. This fact can lead to some surprising results. For example, it isn't always clear what should happen with <a name="INDEX-39" />whitespace. By default, an XML <a name="INDEX-40" />processor will preserve all of it -- even the newlines you put after tags to make them more readable or the spaces you use to indent text. Some parsers will give you options to ignore space in certain circumstances, but there are no hard and fast rules.</p></dd></dl><p>In the end, Perl and XML are well suited for each other. There may be a few traps and pitfalls along the way, but with the generosity of various module developers, your path toward Perl/XML enlightenment should be<a name="INDEX-41" /> well<a name="INDEX-42" /> lit.</p><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch01_05.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch02_01.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">1.5. Keep in Mind...</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">2. An XML Recap</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. 
All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>