
📄 ch05_07.htm

📁 Perl & XML. by Erik T. Ray and Jason McIntosh ISBN 0-596-00205-X First Edition, published April
turning Excel documents into XML, it reads from web server log files. The parser turns a line like this from a log file:</p>
<blockquote><pre class="code">10.16.251.137 - - [26/Mar/2000:20:30:52 -0800] "GET /index.html HTTP/1.0" 200 16171</pre></blockquote>
<p>into this snippet of XML:</p>
<blockquote><pre class="code">&lt;entry&gt;
&lt;ip&gt;10.16.251.137&lt;/ip&gt;
&lt;date&gt;26/Mar/2000:20:30:52 -0800&lt;/date&gt;
&lt;req&gt;GET /index.html HTTP/1.0&lt;/req&gt;
&lt;stat&gt;200&lt;/stat&gt;
&lt;size&gt;16171&lt;/size&gt;
&lt;/entry&gt;</pre></blockquote>
<p><a href="ch05_07.htm#perlxml-CHP-5-EX-8">Example 5-8</a> implements the <tt class="literal">XML::SAX</tt> driver for web logs. The first subroutine in the package is <tt class="literal">parse( )</tt>. Ordinarily, you wouldn't write your own <tt class="literal">parse()</tt> method because the base class does that for you, but it assumes that you want to input some form of XML, which is not the case for drivers. Thus, we shadow that routine with one of our own, specifically trained to handle web server log files.</p>
<a name="perlxml-CHP-5-EX-8" /><div class="example"><h4 class="objtitle">Example 5-8. 
Web log SAX driver </h4><blockquote><pre class="code">package LogDriver;

require 5.005_62;
use strict;
use XML::SAX::Base;
our @ISA = ('XML::SAX::Base');
our $VERSION = '0.01';

# Shadow the base class's parse( ): read a log file instead of XML and
# wrap all of its entries in a single &lt;server-log&gt; element.
sub parse {
    my $self = shift;
    my $file = shift;
    if( open( F, $file )) {
        $self-&gt;SUPER::start_element({ Name =&gt; 'server-log' });
        while( &lt;F&gt; ) {
            $self-&gt;_process_line( $_ );
        }
        close F;
        $self-&gt;SUPER::end_element({ Name =&gt; 'server-log' });
    }
}

# Split one log line into its fields and emit an &lt;entry&gt; element that
# holds one child element per field. Lines that do not match are skipped.
sub _process_line {
    my $self = shift;
    my $line = shift;

    if( $line =~
          /(\S+)\s\S+\s\S+\s\[([^\]]+)\]\s\"([^\"]+)\"\s(\d+)\s(\d+)/ ) {
        my( $ip, $date, $req, $stat, $size ) = ( $1, $2, $3, $4, $5 );

        $self-&gt;SUPER::start_element({ Name =&gt; 'entry' });

        $self-&gt;SUPER::start_element({ Name =&gt; 'ip' });
        $self-&gt;SUPER::characters({ Data =&gt; $ip });
        $self-&gt;SUPER::end_element({ Name =&gt; 'ip' });

        $self-&gt;SUPER::start_element({ Name =&gt; 'date' });
        $self-&gt;SUPER::characters({ Data =&gt; $date });
        $self-&gt;SUPER::end_element({ Name =&gt; 'date' });

        $self-&gt;SUPER::start_element({ Name =&gt; 'req' });
        $self-&gt;SUPER::characters({ Data =&gt; $req });
        $self-&gt;SUPER::end_element({ Name =&gt; 'req' });

        $self-&gt;SUPER::start_element({ Name =&gt; 'stat' });
        $self-&gt;SUPER::characters({ Data =&gt; $stat });
        $self-&gt;SUPER::end_element({ Name =&gt; 'stat' });

        $self-&gt;SUPER::start_element({ Name =&gt; 'size' });
        $self-&gt;SUPER::characters({ Data =&gt; $size });
        $self-&gt;SUPER::end_element({ Name =&gt; 'size' });

        $self-&gt;SUPER::end_element({ Name =&gt; 'entry' });
    }
}

1;</pre></blockquote></div>
<p>Since web logs are line oriented (one entry per line), it makes sense to create a subroutine that handles a single line, <tt class="literal">_process_line( )</tt>. 
All it has to do is break down the web log entry into component parts and package them in XML elements. The <tt class="literal">parse( )</tt> routine simply chops the document into separate lines and feeds them into the line processor one at a time.</p>
<p>Notice that we don't call event handlers in the handler package directly. Rather, we pass the data through routines in the base class, using it as an abstraction layer between the parser and the handler. This is convenient for you, the parser developer, because you don't have to check if the handler package is listening for that type of event. Again, the base class is looking out for us, making our lives easier.</p>
<p>Let's test the parser now. Assuming that you have this module already installed (don't worry, we'll cover the topic of installing <tt class="literal">XML::SAX</tt> parsers in the next section), writing a program that uses it is easy. <a href="ch05_07.htm#perlxml-CHP-5-EX-9">Example 5-9</a> creates a handler package and applies it to the parser we just developed.</p>
<a name="perlxml-CHP-5-EX-9" /><div class="example"><h4 class="objtitle">Example 5-9. 
A program to test the SAX driver </h4><blockquote><pre class="code">use XML::SAX::ParserFactory;
use LogDriver;

my $handler = MyHandler-&gt;new;
my $parser = XML::SAX::ParserFactory-&gt;parser( Handler =&gt; $handler );
$parser-&gt;parse( shift @ARGV );

package MyHandler;

# initialize object with options
#
sub new {
    my $class = shift;
    my $self = {@_};
    return bless( $self, $class );
}

# print a start tag; the container elements get a line break after it
sub start_element {
    my $self = shift;
    my $data = shift;
    print "&lt;", $data-&gt;{Name}, "&gt;";
    print "\n" if( $data-&gt;{Name} eq 'entry' );
    print "\n" if( $data-&gt;{Name} eq 'server-log' );
}

# print an end tag (note the slash) followed by a line break
sub end_element {
    my $self = shift;
    my $data = shift;
    print "&lt;/", $data-&gt;{Name}, "&gt;\n";
}

# print character data as is
sub characters {
    my $self = shift;
    my $data = shift;
    print $data-&gt;{Data};
}</pre></blockquote></div>
<p>We use <tt class="literal">XML::SAX::ParserFactory</tt> to demonstrate how a parser can be selected once it is registered. If you wish, you can define attributes for the parser so that subsequent queries can select it based on those properties rather than its name.</p>
<p>The handler package is not terribly complicated; it turns the events into an XML character stream. Each handler receives a hash reference as an argument through which you can access each object's properties by the appropriate key. An element's name, for example, is stored under the hash key <tt class="literal">Name</tt>. It all works pretty<a name="INDEX-466" /> <a name="INDEX-467" /> <a name="INDEX-468" /> much as you would expect.</p></div>
<a name="perlxml-CHP-5-SECT-7.5" /><div class="sect2"><h3 class="sect2">5.7.5. Installing Your Own Parser</h3>
<p>Our<a name="INDEX-469" /> coverage of <tt class="literal">XML::SAX</tt> wouldn't be complete without showing you how to create an installation package that adds a parser to the registry automatically. Adding a parser is very easy with the <em class="emphasis">h2xs</em> utility. 
Though it was originally made to facilitate extensions to Perl written in C, it is invaluable in other ways.</p>
<p>Here, we will use it to create something much like the module installers you've downloaded from CPAN.<a href="#FOOTNOTE-26">[26]</a> </p>
<blockquote class="footnote"><a name="FOOTNOTE-26" /><p>[26]For a helpful tutorial on using <em class="emphasis">h2xs</em>, see O'Reilly's <em class="citetitle">The Perl Cookbook</em> by Tom Christiansen and Nat Torkington.</p></blockquote>
<p>First, we start a new project with the following command:</p>
<blockquote><pre class="code">h2xs -AX -n LogDriver</pre></blockquote>
<p><em class="emphasis">h2xs</em> automatically creates a directory called <em class="filename">LogDriver</em><a name="INDEX-470" />, stocked with several files.</p>
<dl>
<dt><i><em class="filename">LogDriver.pm</em></i></dt><dd><p>A stub for our module, ready to be filled out with subroutines.</p></dd><a name="INDEX-471" />
<dt><i><em class="filename">Makefile.PL</em></i></dt><dd><p>A Perl program that generates a <em class="filename">Makefile</em> for installing the module. (Look familiar, CPAN users?)</p></dd>
<dt><i><em class="filename">test.pl</em></i></dt><dd><p>A stub for adding test code to check on the success of installation.</p></dd>
<dt><i><em class="filename">Changes</em>, <em class="filename">MANIFEST</em></i></dt><dd><p>Other files used to aid in installation and give information to users.</p></dd>
</dl>
<p><em class="filename">LogDriver.pm</em>, the module to be installed, doesn't need much extra code to make <em class="emphasis">h2xs</em> happy. It only needs a variable, <tt class="literal">$VERSION</tt>, since <em class="emphasis">h2xs</em> is (justifiably) finicky about that information.</p>
<p>As you know from installing CPAN modules, the first thing you do when opening an installer archive is run the command <tt class="literal">perl Makefile.PL</tt>. 
Running this command generates a file called <em class="filename">Makefile</em>, which configures the installer to your system. Then you can run <tt class="literal">make</tt> and <tt class="literal">make install</tt> to load the module in the right place.</p>
<p>Any deviation from the default behavior of the installer must be coded in the <em class="filename">Makefile.PL</em> program. Untouched, it looks like this:</p>
<blockquote><pre class="code">use ExtUtils::MakeMaker;
WriteMakefile(
    'NAME'                =&gt; 'LogDriver',         # module name
    'VERSION_FROM'        =&gt; 'LogDriver.pm',      # finds version
);</pre></blockquote>
<p>The argument to<a name="INDEX-472" /> <tt class="literal">WriteMakefile()</tt> is a hash of properties about the module, used in generating a <em class="filename">Makefile</em>. We can add more properties here to make the installer do more sophisticated things than just copy a module onto the system. For our parser, we want to add this line:</p>
<blockquote><pre class="code">'PREREQ_PM' =&gt; { 'XML::SAX' =&gt; 0 }</pre></blockquote>
<p>Adding this line triggers a check during installation to see if <tt class="literal">XML::SAX</tt> exists on the system. If not, the installation aborts with an error message. We don't want to install our parser until there is a framework to accept it.</p>
<p>This subroutine should also be added to <em class="filename">Makefile.PL</em>:</p>
<blockquote><pre class="code">sub MY::install {
    package MY;
    # start from the install section that MakeMaker would normally write
    my $script = shift-&gt;SUPER::install(@_);
    # make the install target depend on our extra target
    $script =~ s/install :: (.*)$/install :: $1 install_sax_driver/m;
    # append the extra target, which registers the parser with XML::SAX
    $script .= &lt;&lt;"INSTALL";

install_sax_driver :
\t\@\$(PERL) -MXML::SAX -e "XML::SAX-&gt;add_parser(q(\$(NAME)))-&gt;save_parsers( )"

INSTALL
    return $script;
}</pre></blockquote>
<p>This example adds<a name="INDEX-473" /> <a name="INDEX-474" /> the parser to the list<a name="INDEX-475" /> maintained by<a name="INDEX-476" /> <tt class="literal">XML::SAX</tt>. 
Now you can install your<a name="INDEX-477" /> module.</p></div>
<p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p>
</body></html>
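
The field extraction that _process_line( ) performs in Example 5-8 can be tried on its own, without XML::SAX installed. The following is a minimal sketch, not part of the book's driver: the helper name parse_log_line and the hash keys are assumptions chosen to mirror the element names, while the pattern is the one used in the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): apply the same pattern that
# _process_line( ) uses and return the captured fields as a hash
# reference, or nothing when the line does not match.
sub parse_log_line {
    my $line = shift;
    if ( $line =~ /(\S+)\s\S+\s\S+\s\[([^\]]+)\]\s"([^"]+)"\s(\d+)\s(\d+)/ ) {
        return { ip => $1, date => $2, req => $3, stat => $4, size => $5 };
    }
    return;
}

# The sample line from the text above.
my $sample = '10.16.251.137 - - [26/Mar/2000:20:30:52 -0800] '
           . '"GET /index.html HTTP/1.0" 200 16171';

my $entry = parse_log_line($sample);
print "$_: $entry->{$_}\n" for qw( ip date req stat size );
```

Because the last two fields are matched with (\d+), a log line whose size field is not numeric (servers commonly write "-" when no body was sent) would not match and would be skipped silently, just as in the driver.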

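The substitution inside MY::install is easier to follow when it is run against a small piece of text. The sketch below applies the same pattern to a simplified stand-in for the install section; the sample Makefile text and the helper name add_install_dependency are assumptions for illustration, and the section that MakeMaker really generates is longer and differs between versions.

```perl
use strict;
use warnings;

# Hypothetical helper: append one more prerequisite to the first
# "install ::" rule found in a chunk of Makefile text, using the same
# pattern as the MY::install override.
sub add_install_dependency {
    my ( $script, $target ) = @_;
    $script =~ s/install :: (.*)$/install :: $1 $target/m;
    return $script;
}

# Simplified stand-in for MakeMaker's install section (an assumption).
my $makefile = "install :: pure_install doc_install\n"
             . "\t\$(NOECHO) \$(NOOP)\n";

print add_install_dependency( $makefile, 'install_sax_driver' );
```

With the /m modifier, $ matches at the end of the first line, so only the prerequisite list of that rule is extended and the recipe line below it is left untouched.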