Perl & XML, by Erik T. Ray and Jason McIntosh. ISBN 0-596-00205-X. First Edition, published April
<html><head><title>Document Validation (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl &amp; XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch03_08.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">3.7. Document Validation</h2><p>Being<a name="INDEX-263" /><a name="INDEX-264" /> well-formed is a minimal requirement for XML everywhere. However, XML processors have to accept a lot on blind faith. If we try to build a document to meet some specific XML application's specifications, it doesn't do us any good if a content generator slips in a strange element we've never seen before and the parser lets it go by with nary a whimper. Luckily, a higher level of quality control is available to us when we need to check for things like that. 
It's called document validation.</p><p><em class="emphasis">Validation</em> is a sophisticated way of comparing a <em class="emphasis">document instance</em><a name="INDEX-265" /> <a name="INDEX-266" /> against a template or grammar specification. It can restrict the number and type of elements a document can use and control where they go. With a sufficiently expressive schema language, it can even regulate the patterns of character data in any element or attribute. A <em class="emphasis">validating parser</em><a name="INDEX-267" /> tells you whether a document is valid or not, when given a DTD or schema to check against.</p><p>Remember that you don't need to validate every XML document that passes over your desk. DTDs and other validation schemes shine when working with specific XML-based markup languages (such as XHTML for web pages, MathML for equations, or CaveML for spelunking), which have strict rules about which elements and attributes go where; in those cases, an automated way to draw attention to something fishy in the document structure is a real feature.</p><p>However, validation usually isn't crucial when you use Perl and XML to perform a less specific task, such as tossing together XML documents on the fly based on some other, less sane data format, or ripping apart and analyzing existing XML documents.</p><p>Basically, if you feel that validation is a needless step for the job at hand, you're probably right. However, if you knowingly generate or modify some flavor of XML that needs to stick to a defined standard, then taking the extra step or three necessary to perform document validation is probably wise. Your toolbox, naturally, gives you lots of ways to do this. Read on.</p><a name="perlxml-CHP-3-SECT-7.1" /><div class="sect2"><h3 class="sect2">3.7.1. DTDs</h3><p><a name="INDEX-268" />Document type definitions (DTDs) are documents written in a special markup language defined in the XML specification, though they themselves are not XML. 
Everything within these documents is a declaration starting with a <tt class="literal">&lt;!</tt><a name="INDEX-269" /> <a name="INDEX-270" /> delimiter, and declarations come in four flavors: elements, attributes, entities, and notations.</p><p><a href="ch03_07.htm#perlxml-CHP-3-EX-8">Example 3-8</a> is a very simple DTD. </p><a name="perlxml-CHP-3-EX-8" /><div class="example"><h4 class="objtitle">Example 3-8. A wee little DTD </h4><blockquote><pre class="code">&lt;!ELEMENT memo (to, from, message)&gt;
&lt;!ATTLIST memo priority (urgent|normal|info) 'normal'&gt;
&lt;!ENTITY % text-only "(#PCDATA)*"&gt;
&lt;!ELEMENT to %text-only;&gt;
&lt;!ELEMENT from %text-only;&gt;
&lt;!ELEMENT message (#PCDATA | emphasis)*&gt;
&lt;!ELEMENT emphasis %text-only;&gt;
&lt;!ENTITY myname "Bartholomus Chiggin McNugget"&gt;</pre></blockquote></div><p>This DTD declares five elements, an attribute for the <tt class="literal">&lt;memo&gt;</tt> element, a parameter entity to make other declarations cleaner, and an entity that can be used inside a document instance. Based on this information, a validating parser can reject or approve a document. The following document would pass muster:</p><blockquote><pre class="code">&lt;!DOCTYPE memo SYSTEM "/dtdstuff/memo.dtd"&gt;
&lt;memo priority="info"&gt;
  &lt;to&gt;Sara Bellum&lt;/to&gt;
  &lt;from&gt;&amp;myname;&lt;/from&gt;
  &lt;message&gt;Stop reading memos and get back to work!&lt;/message&gt;
&lt;/memo&gt;</pre></blockquote><p>If you removed the <tt class="literal">&lt;to&gt;</tt> element from the document, it would suddenly become invalid. A well-formedness checker wouldn't give a hoot about missing elements. Thus, you see the value of validation.</p><p>Because DTDs are so easy to parse, some general XML processors include the ability to validate the documents they parse against DTDs. <tt class="literal">XML::LibXML</tt> is one such parser. 
A very simple validating parser is shown in <a href="ch03_07.htm#perlxml-CHP-3-EX-9">Example 3-9</a>.</p><a name="perlxml-CHP-3-EX-9" /><div class="example"><h4 class="objtitle">Example 3-9. A validating parser </h4><blockquote><pre class="code">use strict;
use warnings;
use XML::LibXML;
use IO::Handle;

# initialize the parser
my $parser = XML::LibXML-&gt;new;

# open a filehandle on STDIN and parse
my $fh = IO::Handle-&gt;new;
if( $fh-&gt;fdopen( fileno( STDIN ), "r" )) {
    # parse_fh() dies on malformed input, so trap that with eval
    my $doc = eval { $parser-&gt;parse_fh( $fh ) };
    if( $doc and $doc-&gt;is_valid ) {
        print "Yup, it's valid.\n";
    } else {
        print "Yikes! Validity error.\n";
    }
    $fh-&gt;close;
}</pre></blockquote></div><p>This parser would be simple to add to any program that requires valid input documents. Unfortunately, it doesn't give any information about what specific problem makes the document invalid (e.g., an element in an improper place), so you wouldn't want to use it as a general-purpose validity-checking tool.<a href="#FOOTNOTE-19">[19]</a> T. J. Mather's <tt class="literal">XML::Checker</tt> is a better module for reporting specific validation errors.</p><blockquote class="footnote"><a name="FOOTNOTE-19" /><p>[19] The authors prefer to use a command-line tool called <em class="emphasis">nsgmls</em>, available from <a href="http://www.jclark.com/">http://www.jclark.com/</a>. Public web sites, such as <a href="http://www.stg.brown.edu/service/xmlvalid/">http://www.stg.brown.edu/service/xmlvalid/</a>, can also validate arbitrary documents. Note that, in these cases, the XML document must have a DOCTYPE declaration, whose system identifier (if it has one) must contain a resolvable URL and not a path on your local system.</p> </blockquote></div><a name="perlxml-CHP-3-SECT-7.2" /><div class="sect2"><h3 class="sect2">3.7.2. Schemas</h3><p>DTDs<a name="INDEX-271" /> have limitations; they can't check what kind of character data is in an element or whether it matches a particular pattern. 
What if you wanted a parser to tell you if a <tt class="literal">&lt;date&gt;</tt> element has the wrong format for a date, or if it contains a street address by mistake? For that, you need a solution such as XML Schema. XML Schema is a second generation of DTD and brings more power and flexibility to validation.</p><p>As noted in <a href="ch02_01.htm">Chapter 2, "An XML Recap"</a>, XML Schema enjoys the dubious distinction of being the most controversial member of the W3C's family of XML-related specifications (at least among hackers). Many people like the concept of schemas, but many don't approve of XML Schema's particular approach, which is seen as too cumbersome or constraining to be used effectively.</p><p>Alternatives to XML Schema include<a name="INDEX-272" /><a name="INDEX-273" /> OASIS-Open's RELAX NG (<a href="http://www.oasis-open.org/committees/relax-ng/">http://www.oasis-open.org/committees/relax-ng/</a>) and Rick Jelliffe's<a name="INDEX-274" /> Schematron (<a href="http://www.ascc.net/xml/resource/schematron/schematron.html">http://www.ascc.net/xml/resource/schematron/schematron.html</a>). Like XML Schema, these specifications define XML-based languages used to describe other XML-based languages, and a program that understands a given schema language can use a schema written in it to validate other XML documents. We find Schematron particularly interesting because it has had a Perl module attached to it for a while (in the form of Kip Hampton's <tt class="literal">XML::Schematron</tt> family).</p><p>Schematron also appeals to many Perl and XML hackers because it builds on existing popular XML technologies that already have venerable Perl implementations. Schematron defines a very simple language with which you list and group together assertions, based on XPath expressions, about what things should look like. Instead of a forward-looking grammar that must list and define everything that can possibly appear in the document, you can choose to validate a fraction of it. 
You can also choose to have elements and attributes validate based on conditions involving anything anywhere else in the document (wherever an XPath expression can reach). In practice, a Schematron document looks and feels like an XSLT stylesheet, and with good reason: it's intended to be fully implementable by way of XSLT. In fact, two of the <tt class="literal">XML::Schematron</tt><a name="INDEX-275" /> Perl modules work by first transforming the user-specified schema document into an XSLT stylesheet, which they then simply pass through an XSLT processor.</p><p>Schematron lacks any kind of built-in data typing, so you can't, for example, do a one-word check to insist that an attribute conforms to the W3C date format. You can, however, have your Perl program make a separate pass, using any method you'd like (perhaps the <tt class="literal">XML::XPath</tt> module), to comb through date attributes and run a good old Perl regular expression on them. Also note that no schema language will ever provide a way to query an element's content against a database, or perform any other action outside the realm of the document. This is where mixing Perl and schemas can come in very handy.</p></div><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch03_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch03_08.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">3.6. XML::XPath</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">3.8. 
XML::Writer</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm" /><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm" /><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm" /><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm" /><area shape="rect" coords="354,1,446,115" href="../prog/index.htm" /><area shape="rect" coords="448,0,526,132" href="../tk/index.htm" /><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm" /><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm" /></map></body></html>