<html><head><title>Drivers for Non-XML Sources (Perl and XML)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Erik T. Ray and Jason McIntosh" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="059600205XL" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl and XML" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl & XML" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch05_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div>
<h2 class="sect1">5.4. Drivers for Non-XML Sources</h2>
<p>The<a name="INDEX-401" /> <a name="INDEX-402" /> filter example used a file containing an XML document as an input source. This example shows just one of many ways to use SAX. Another popular use is to read data from a driver, which is a program that generates a stream of data from a non-XML source, such as a database. A SAX driver converts the data stream into a sequence of SAX events that we can process the way we did previously. What makes this so cool is that we can use the same code regardless of where the data came from. The SAX event stream abstracts the data and markup so we don't have to worry about it.
Changing the program to work with files or other drivers would be trivial.</p>
<p>To see a driver in action, we will write a program that uses Ilya<a name="INDEX-403" /> Sterin's module <tt class="literal">XML::SAXDriver::Excel</tt><a name="INDEX-404" /> to convert <a name="INDEX-405" />Microsoft Excel spreadsheets into XML documents. This example shows how a data stream can be processed in a pipeline fashion to ultimately arrive in the form we want. A <tt class="literal">Spreadsheet::ParseExcel</tt> object reads the file and generates a generic data stream, which an <tt class="literal">XML::SAXDriver::Excel</tt> object translates into a SAX event stream. This stream is then output as XML by our program.</p>
<p>Here's a test Excel spreadsheet, represented as a table:</p>
<a name="ch05-5-fm2xml" /><table border="1">
<tr><td> </td><td><p>A</p></td><td><p>B</p></td></tr>
<tr><td><p>1</p></td><td><p>baseballs</p></td><td><p>55</p></td></tr>
<tr><td><p>2</p></td><td><p>tennisballs</p></td><td><p>33</p></td></tr>
<tr><td><p>3</p></td><td><p>pingpong balls</p></td><td><p>12</p></td></tr>
<tr><td><p>4</p></td><td><p>footballs</p></td><td><p>77</p></td></tr>
</table>
<p><p>The SAX driver will create new elements for us, giving us the names in the form of arguments to handler method calls. We will just print them out as they come and see how the driver structures the document. <a href="ch05_04.htm#perlxml-CHP-5-EX-6">Example 5-6</a> is a simple program that does this.</p>
<a name="perlxml-CHP-5-EX-6" /><div class="example"><h4 class="objtitle">Example 5-6.
Excel parsing program </h4><blockquote><pre class="code">use XML::SAXDriver::Excel;

# get the file name to process
die( "Must specify an input file" ) unless( @ARGV );
my $file = shift @ARGV;
print "Parsing $file...\n";

# initialize the parser
my $handler = new Excel_SAX_Handler;
my %props = ( Source =&gt; { SystemId =&gt; $file },
              Handler =&gt; $handler );
my $driver = XML::SAXDriver::Excel-&gt;new( %props );

# start parsing
$driver-&gt;parse( %props );

# The handler package we define to print out the XML
# as we receive SAX events.
package Excel_SAX_Handler;

# initialize the package
sub new {
    my $type = shift;
    my $self = {@_};
    return bless( $self, $type );
}

# create the outermost element
sub start_document {
    print "&lt;doc&gt;\n";
}

# end the document element
sub end_document {
    print "&lt;/doc&gt;\n";
}

# handle any character data
sub characters {
    my( $self, $properties ) = @_;
    my $data = $properties-&gt;{'Data'};
    print $data if defined($data);
}

# start a new element, outputting the start tag
sub start_element {
    my( $self, $properties ) = @_;
    my $name = $properties-&gt;{'Name'};
    print "&lt;$name&gt;";
}

# end the new element
sub end_element {
    my( $self, $properties ) = @_;
    my $name = $properties-&gt;{'Name'};
    print "&lt;/$name&gt;";
}</pre></blockquote></div>
<p>As you can see, the handler methods look very similar to those used in the previous SAX example. All that has changed is what we do with the arguments. Now let's see what the output looks like when we run it on the test file:</p>
<blockquote><pre class="code">&lt;doc&gt;
&lt;records&gt;
  &lt;record&gt;
    &lt;column1&gt;baseballs&lt;/column1&gt;
    &lt;column2&gt;55&lt;/column2&gt;
  &lt;/record&gt;
  &lt;record&gt;
    &lt;column1&gt;tennisballs&lt;/column1&gt;
    &lt;column2&gt;33&lt;/column2&gt;
  &lt;/record&gt;
  &lt;record&gt;
    &lt;column1&gt;pingpong balls&lt;/column1&gt;
    &lt;column2&gt;12&lt;/column2&gt;
  &lt;/record&gt;
  &lt;record&gt;
    &lt;column1&gt;footballs&lt;/column1&gt;
    &lt;column2&gt;77&lt;/column2&gt;
  &lt;/record&gt;
  &lt;record&gt;
Use of uninitialized value in print at conv line 39.
    &lt;column1&gt;&lt;/column1&gt;
Use of uninitialized value in print at conv line 39.
    &lt;column2&gt;&lt;/column2&gt;
  &lt;/record&gt;
&lt;/records&gt;&lt;/doc&gt;</pre></blockquote>
<p>The driver did most of the work in creating elements and formatting the data. All we did was output the packages it gave us in the form of method calls. It wrapped the whole document in <tt class="literal">&lt;records&gt;</tt>, making our use of <tt class="literal">&lt;doc&gt;</tt> superfluous. (In the next revision of the code, we'll make the <tt class="literal">start_document()</tt> and <tt class="literal">end_document()</tt> methods output nothing.) Each row of the spreadsheet is encapsulated in a <tt class="literal">&lt;record&gt;</tt> element. Finally, the two columns are differentiated with <tt class="literal">&lt;column1&gt;</tt> and <tt class="literal">&lt;column2&gt;</tt> labels. All in all, not a bad job.</p>
<p>You can see that with a minimal amount of effort on our part, we have harnessed the power of SAX to do some complex work converting from one format to another. The driver actually automates the conversion, but it gives us enough flexibility in interpreting the events so that we can reject bad data (the empty row, for example) or rename elements. We can even perform complex processing, such as adding up values or sorting<a name="INDEX-406" /> <a name="INDEX-407" /> rows.</p>
<hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch05_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch05_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">5.3. External Entity Resolution</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">5.5.
A Handler Base Class</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>