<?label 14.5. CGI Gateway to XML Middleware?><html><head><title>CGI Gateway to XML Middleware (CGI Programming with Perl)</title><link href="../style/style1.css" type="text/css" rel="stylesheet" /><meta name="DC.Creator" content="Scott Guelich, Gunther Birznieks and Shishir Gundavaram" /><meta scheme="MIME" content="text/xml" name="DC.Format" /><meta content="en-US" name="DC.Language" /><meta content="O'Reilly &amp; Associates, Inc." name="DC.Publisher" /><meta scheme="ISBN" name="DC.Source" content="1565924193L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="CGI Programming with Perl" /><meta content="Text.Monograph" name="DC.Type" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" alt="Book Home" usemap="#banner-map" border="0" /><map name="banner-map"><area alt="CGI Programming with Perl" href="index.htm" coords="0,0,466,65" shape="rect" /><area alt="Search this book" href="jobjects/fsearch.htm" coords="467,0,514,18" shape="rect" /></map><div class="navbar"><table border="0" width="515"><tr><td width="172" valign="top" align="left"><a href="ch14_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td width="171" valign="top" align="center"><a href="index.htm">CGI Programming with Perl</a></td><td width="172" valign="top" align="right"><a href="ch15_01.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><hr align="left" width="515" /><h2 class="sect1">14.5. CGI Gateway to XML Middleware</h2><p>The following <a name="INDEX-2840" /> <a name="INDEX-2,841" /> <a name="INDEX-2,842" />CGI script acts as a gateway that parses the XML returned by the Netscape What's Related server. Given a URL, it prints all the related URLs. It then queries the What's Related server again for each URL in that list and displays the URLs related to them as well. 
From this point onward, we refer to the URLs related to this first set of related URLs as second-level related URLs. <a href="ch14_05.htm#ch14-37712">Figure 14-2</a> shows the initial query screen, and <a href="ch14_05.htm#ch14-41053">Figure 14-3</a> shows the results of a sample query. <a href="ch14_05.htm#ch14-34273">Example 14-4</a> shows the HTML for the initial form.</p><a name="ch14-37712" /><div class="figure"><img width="457" src="figs/cgi2.1402.gif" height="188" alt="Figure 14-2" /></div><h4 class="objtitle">Figure 14-2. Search form for the "What's Related" CGI script</h4><a name="ch14-41053" /><div class="figure"><img width="457" src="figs/cgi2.1403.gif" height="580" alt="Figure 14-3" /></div><h4 class="objtitle">Figure 14-3. "What's Related to What's Related" results from querying http://www.eff.org/</h4><a name="ch14-34273" /><div class="example"><h4 class="objtitle">Example 14-4. whats_related.html</h4><a name="INDEX-2844" /><blockquote><pre class="code">&lt;HTML&gt;
&lt;HEAD&gt;
  &lt;TITLE&gt;What's Related To What's Related Query&lt;/TITLE&gt;
&lt;/HEAD&gt;

&lt;BODY BGCOLOR="#ffffff"&gt;
  &lt;H1&gt;Enter URL To Search:&lt;/H1&gt;
  &lt;HR&gt;
  &lt;FORM METHOD="POST"&gt;
    &lt;INPUT TYPE="text" NAME="url" SIZE=30&gt;&lt;P&gt;
    &lt;INPUT TYPE="submit" NAME="submit_query" VALUE="Submit Query"&gt;
  &lt;/FORM&gt;
&lt;/BODY&gt;
&lt;/HTML&gt;</pre></blockquote></div><p>Two Perl modules provide the core services for talking to the search engine: retrieving its data and translating it. First, the library for web programming (<a name="INDEX-2845" /><a name="INDEX-2846" />LWP) is used to fetch data from the search engine. Because the What's Related server responds to GET requests, we can use the simplified <a name="INDEX-2847" />LWP::Simple interface rather than the full-blown LWP API. Then <a name="INDEX-2848" /><a name="INDEX-2849" />XML::Parser processes the retrieved data so that we can manipulate the XML using Perl data structures. 
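</p><p>The two-level lookup itself does not depend on CGI. The following core-Perl sketch mirrors the record layout that <tt class="function">get_whats_related_to_whats_related</tt> builds in <a href="ch14_05.htm#ch14-20388">Example 14-5</a>: each record is a reference to an array holding a URL, its title, and a reference to an array of second-level records. So that it runs offline, the network fetch and XML parsing are replaced by a hypothetical <tt class="function">lookup_related</tt> function backed by a made-up in-memory table; the URLs and titles are invented and do not come from the What's Related server.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the What's Related server: maps a URL to a
# list of [ url, title ] pairs. The real script gets these pairs by
# fetching XML with LWP::Simple and parsing it with XML::Parser.
my %FAKE_SERVER = (
    "http://a.example/" => [ [ "http://b.example/", "Site B" ],
                             [ "http://c.example/", "Site C" ] ],
    "http://b.example/" => [ [ "http://d.example/", "Site D" ] ],
);

# HYPOTHETICAL lookup: returns the related [ url, title ] records for one URL.
# Each pair is copied so callers can add fields without altering the table.
sub lookup_related {
    my $url = shift;
    return map { [ @$_ ] } @{ $FAKE_SERVER{$url} || [] };
}

# Same shape as get_whats_related_to_whats_related in Example 14-5:
# element [2] of each record holds that URL's second-level records.
sub related_to_related {
    my $url = shift;
    my @related = lookup_related( $url );
    foreach my $record ( @related ) {
        $record->[2] = [ lookup_related( $record->[0] ) ];
    }
    return @related;
}

# Print each related URL, with its second-level related URLs indented.
my @related = related_to_related( "http://a.example/" );
foreach my $record ( @related ) {
    print "$record->[1] ($record->[0])\n";
    print "    $_->[1] ($_->[0])\n" for @{ $record->[2] };
}
```

<p>Because the same lookup is repeated for every first-level result, this approach (like Example 14-5) makes one request for the submitted URL plus one request per related URL, so response time grows with the number of first-level results.</p><p>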
The code is shown in <a href="ch14_05.htm#ch14-20388">Example 14-5</a>.</p><a name="ch14-20388" /><div class="example"><h4 class="objtitle">Example 14-5. whats_related.cgi</h4><a name="INDEX-2850" /><blockquote><pre class="code">#!/usr/bin/perl -wT

use strict;

use constant WHATS_RELATED_URL => "http://www-rl.netscape.com/wtgn?";
use vars qw( @RECORDS $RELATED_RECORDS );

use CGI;
use CGI::Carp qw( fatalsToBrowser );
use XML::Parser;
use LWP::Simple;

my $q = new CGI( );

if ( $q->param( "url" ) ) {
    display_whats_related_to_whats_related( $q );
} else {
    # No URL yet: send the user to the search form (Example 14-4).
    # Assumes the form is installed as /whats_related.html.
    print $q->redirect( "/whats_related.html" );
}

sub display_whats_related_to_whats_related {
    my $q = shift;
    my $url = $q->param( "url" );
    my $scriptname = $q->script_name;
    
    print $q->header( "text/html" ),
          $q->start_html( "What's Related To What's Related Query" ),
          $q->h1( "What's Related To What's Related Query" ),
          $q->hr,
          $q->start_ul;
    
    # Each record is [ url, title, [ second-level records ] ].
    my @related = get_whats_related_to_whats_related( $url );
    
    foreach ( @related ) {
        # "[*]" reruns this script for that URL; the title links to the URL itself.
        print $q->a( { -href => "$scriptname?url=$_->[0]" }, "[*]" ),
              $q->a( { -href => "$_->[0]" }, $_->[1] );
        
        my @subrelated = @{$_->[2]};
        
        if ( @subrelated ) {
            print $q->start_ul;
            foreach ( @subrelated ) {
                print $q->a( { -href => "$scriptname?url=$_->[0]" }, "[*]" ),
                      $q->a( { -href => "$_->[0]" }, $_->[1] );
            }
            print $q->end_ul;
        } else {
            print $q->p( "No Related Items Were Found" );
        }
    }
    
    if ( ! @related ) {
        print $q->p( "No Related Items Were Found. Sorry." );
    }
    
    print $q->end_ul,
          $q->p( "[*] = Go to What's Related To That URL." ),
          $q->hr,
          $q->start_form( -method => "GET" ),
            $q->p( "Enter Another URL To Search:",
                   $q->textfield( -name => "url", -size => 30 ),
                   $q->submit( -name => "submit_query", -value => "Submit Query" ) ),
          $q->end_form,
          $q->end_html;
}

sub get_whats_related_to_whats_related {
    my $url = shift;
    
    my @related = get_whats_related( $url );
    
    # Look up the second-level related URLs for each first-level result.
    my $record;
    foreach $record ( @related ) {
        $record->[2] = [ get_whats_related( $record->[0] ) ];
    }
    return @related;
}

sub get_whats_related {
    my $url = shift;
    
    my $parser = new XML::Parser( Handlers => { Start => \&handle_start } );
    my $data = get( WHATS_RELATED_URL . $url );
    die "Unable to retrieve What's Related data for $url\n" unless defined $data;
    
    # Clean up the server's output so XML::Parser will accept it:
    # escape ampersands, turn stray double quotes inside attribute values
    # into single quotes, strip tags and stray angle brackets from
    # attribute values, and drop 8-bit characters.
    $data =~ s/&amp;/&amp;amp;/g;
    while ( $data =~ s|(=\"[^"]*)\"([^/ ])|$1'$2|g ) { };
    while ( $data =~ s|(=\"[^"]*)&lt;[^"]*>|$1|g ) { };
    while ( $data =~ s|(=\"[^"]*)&lt;|$1|g ) { };
    while ( $data =~ s|(=\"[^"]*)>|$1|g ) { };
    $data =~ s/[\x80-\xFF]//g;
    
    # Globals shared with the handle_start callback; localized per call.
    local @RECORDS = ( );
    local $RELATED_RECORDS = 1;
    
    $parser->parse( $data );
    
    sub handle_start {
        my $expat = shift;
        my $element = shift;
        my %attributes = @_;
        
        if ( $element eq "child" ) {
            my $href = $attributes{"href"};
            # If the href wraps another URL, keep only the last "http..." part.
            $href =~ s/http.*http(.*)/http$1/;
            
            # Skip unnamed entries, "Smart Browsing" entries, and everything
            # after the server reports that there are no related items.
            if ( $attributes{"name"} &amp;&amp;
                 $attributes{"name"} !~ /smart browsing/i &amp;&amp;
                 $RELATED_RECORDS ) {
                
                if ( $attributes{"name"} =~ /no related/i ) {
                    $RELATED_RECORDS = 0;
                } else {
                    my $fields = [ $href, $attributes{"name"} ];
                    push @RECORDS, $fields;
                }
            }
        }
    }
    
    return @RECORDS;
}</pre></blockquote></div><p>This script starts like most of our others, except that we declare <tt class="literal">@RECORDS</tt><a name="INDEX-2851" /> and <tt class="literal">$RELATED_RECORDS</tt> as global variables that temporarily hold state while the XML document is parsed. Because XML::Parser calls <tt class="function">handle_start</tt> as a callback, these package variables (localized with <tt class="literal">local</tt> in <tt class="function">get_whats_related</tt>) are how the handler passes its results back. In particular, <tt class="literal">@RECORDS</tt> collects the URLs and titles of the related URLs, and <tt class="literal">$RELATED_RECORDS</tt> is a flag that starts out true and is cleared when Netscape's What's Related server reports that it has no related documents, after which further entries are ignored. <tt class="literal">WHATS_RELATED_URL</tt> is a constant that contains the URL of Netscape's What's
Related server.</p><p>In addition to the <a name="INDEX-2852" /> <a name="INDEX-2,853" /> <a name="INDEX-2,854" /><a name="INDEX-2855" /><a name="INDEX-2856" />CGI.pm module, we use CGI::Carp with the <tt class="literal">fatalsToBrowser</tt> option so that fatal errors are echoed to the browser for easier <a name="INDEX-2857" />debugging. This is important because <a name="INDEX-2858" /><a name="INDEX-2859" />XML::Parser dies when it encounters a parsing error. XML::Parser is the heart of the program: it extracts the related items from the XML data. LWP::Simple is a simplified interface to LWP, a library for retrieving data from a <a name="INDEX-2860" />URL.</p><p>We create a CGI object and then check whether we received a <em class="emphasis">url</em> parameter. If so, we process the query by calling a <a name="INDEX-2861" />subroutine, <tt class="function">display_whats_related_to_whats_related</tt>, which displays "What's Related to What's Related" for that URL; otherwise, we simply redirect the user to the HTML form.</p><p>The <tt class="function">display_whats_related_to_whats_related</tt> subroutine generates the HTML that lists the URLs related to the submitted URL, along with their second-level related URLs.</p><p>We declare a lexical variable called <tt class="literal">@related</tt>. This data structure holds all the related URL information returned by the <tt class="function">get_whats_related_to_whats_related</tt><a name="INDEX-2862" /> subroutine.</p><p>More specifically, <tt class="literal">@related</tt><a name="INDEX-2863" /><a name="INDEX-2864" /> contains references to the related URLs, which in turn contain references to second-level related URLs. <tt class="literal">@related</tt> contains references to