<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<!-- This document was created from RTF source by rtftohtml version 3.0.1 -->
<META NAME="GENERATOR" Content="Symantec Visual Page 1.0">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">
<TITLE>Agents: Web Scanning, Mirroring, and Background Tasks</TITLE>
</HEAD>
<BODY BACKGROUND="r2harch.gif" TEXT="#000000" BGCOLOR="#FFFFFF">
<H2 ALIGN="CENTER"><A HREF="ch08.htm"><IMG SRC="blanprev.gif" WIDTH="37" HEIGHT="37"
ALIGN="BOTTOM" BORDER="2"></A><A HREF="index-1.htm"><IMG SRC="blantoc.gif" WIDTH="42"
HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><A HREF="ch10.htm"><IMG SRC="blannext.gif"
WIDTH="45" HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><BR>
<BR>
<BR>
<FONT COLOR="#0000AA">9</FONT><BR>
<A NAME="Heading1"></A><FONT COLOR="#000077">Agents: Web Scanning, Mirroring, and Background
Tasks</FONT></H2>
<P ALIGN="CENTER">by Brian Deng
<H2 ALIGN="CENTER">
<HR>
</H2>
<UL>
<LI><A HREF="#Heading1">Agents: Web Scanning, Mirroring, and Background Tasks</A>
<UL>
<LI><A HREF="#Heading2">Retrieving Specific Documents from the Web</A>
<UL>
<LI><A HREF="#Heading3">Stock Quotes on the Hour</A>
</UL>
<LI><A HREF="#Heading4">Listing 9.1. HTML returned by the Security APL Quote Server.</A>
<LI><A HREF="#Heading5">Listing 9.2. Automatic stock quote retriever (getquote.pl).</A>
<LI><A HREF="#Heading6">Listing 9.3. Subroutine to extract the quote information.</A>
<UL>
<LI><A HREF="#Heading7">Adapting the Code for General Purpose Use</A>
</UL>
<LI><A HREF="#Heading8">Listing 9.4. General purpose URL retriever going through
a firewall.</A>
<LI><A HREF="#Heading9">Generating Web Indexes</A>
<UL>
<LI><A HREF="#Heading10">Web Robots: Spiders</A>
</UL>
<LI><A HREF="#Heading11">Listing 9.5. The crawlIt() main function of the Web spider.</A>
<LI><A HREF="#Heading13">Listing 9.6. Converting a relative URL to an absolute URL.</A>
<LI><A HREF="#Heading14">Listing 9.7. Writing the title and URL to the log file.</A>
<LI><A HREF="#Heading15">Listing 9.8. Specifying the starting points and stopping
points.</A>
<LI><A HREF="#Heading17">Mirroring Remote Sites</A>
<LI><A HREF="#Heading18">Listing 9.9. Modified function to convert relative URLs
to absolute URLs.</A>
<LI><A HREF="#Heading19">Listing 9.10. Modified crawlIt() function for mirroring
a site.</A>
<LI><A HREF="#Heading20">Summary</A>
</UL>
</UL>
<P>
<HR>
</P>
<UL>
<LI>Retrieving Specific Documents from the Web
<P>
<LI>Generating Web Indexes
<P>
<LI>Mirroring Remote Sites
</UL>
<P>This chapter focuses on agents that use HTTP to perform automated tasks. Many
Webmaster responsibilities, such as detecting stale links, generating usage reports,
building search indexes, and mirroring sites, are easily automated using Perl. In
addition to these server-related background tasks, client-side automation is also
useful, for example retrieving up-to-the-minute information such as news headlines
or stock quotes.</P>
<P>This chapter shows you how to leverage existing Perl modules to make these automated
tasks even easier. These are just a few examples, but you can apply what you learn
here toward some tasks specific to your needs.
<H3 ALIGN="CENTER"><A NAME="Heading2"></A><FONT COLOR="#000077">Retrieving Specific
Documents from the Web</FONT></H3>
<P>Everyone who surfs the Web is retrieving documents from it. The Web browser
provides a convenient front end for this kind of interactive retrieval. You can also
retrieve documents automatically by issuing HTTP requests from a Perl script. The
most common example of this is retrieving stock quotes. You can think of Web servers
as the information providers and the user agents as the information retrievers.
Suppose a Web server provides up-to-the-minute news, sports scores, stock quotes,
and so on. You can write a fairly simple Perl script to monitor these Web sites and
provide you with that up-to-date information.
<H4 ALIGN="CENTER"><A NAME="Heading3"></A><FONT COLOR="#000077">Stock Quotes on the
Hour</FONT></H4>
<P>Stock quotes are, of course, the most obvious application for retrieving information
from the Web. Public Web pages are available from which you can get the latest stock
prices at the click of a button. This example shows you how to write your own customized
Perl script to tell you the current stock price every hour on the hour. You can simply
feed it stock symbols, and it retrieves the information, parses it, and displays
only what you are interested in.</P>
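<P>As a rough illustration of the "every hour on the hour" timing (a sketch only,
not the chapter's own listing), a script could compute how long to sleep until the
next top of the hour. The helper name seconds_until_next_hour is hypothetical, and
the calculation assumes the local clock runs without leap seconds or sub-hour
adjustments.</P>
<PRE><FONT COLOR="#0066FF">#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): number of seconds from the
# given epoch time until the next top of the hour, in local time.
# Exactly on the hour it returns 3600, that is, the following hour.
sub seconds_until_next_hour {
    my ($now) = @_;
    my ($sec, $min) = (localtime($now))[0, 1];
    return 3600 - ($min * 60 + $sec);
}

# Report how long a quote checker would wait right now.
my $wait = seconds_until_next_hour(time);
print "Next quote check in $wait seconds\n";
</FONT></PRE>
<P>A long-running script could call sleep with this value inside a loop and fetch
the quotes each time it wakes up; a cron entry that runs the script hourly is an
equally reasonable alternative.</P>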
<P>One Web site that provides stock quotes is the Security APL Quote Server. The
URL for obtaining quotes is <A HREF="http://qs.secapl.com/cgi-bin/qso">http://qs.secapl.com/cgi-bin/qso</A>.
After spending some time figuring out the format of the data coming back, it's easy
to come up with regular expressions for extracting the price data, percent of fluctuation,
date, and time. To specify a list of quotes to retrieve in the URL, append the string
"?tick=symbol1+symbol2". This string contains the parameter list that is
passed to the quote serving CGI script. This particular site allows you to specify
up to five stock symbols at a time. The data coming back contains the stock quotes
separated by a horizontal line tag, &lt;HR&gt;. Each quote begins and ends with the
pre-formatted text tags, &lt;PRE&gt; and &lt;/PRE&gt;. The HTML in Listing 9.1 is
a sample returned by the quote server for two stock symbols. Figure 9.1 shows this
page in Netscape.
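<P>The query-string format described above could be assembled along these lines.
This is a minimal sketch rather than the chapter's code: the helper name quote_urls
is hypothetical, and it assumes ticker symbols are plain alphanumerics that need no
URL escaping. Because the site accepts at most five symbols per request, the sketch
splits longer lists into batches.</P>
<PRE><FONT COLOR="#0066FF">use strict;
use warnings;

my $QUOTE_URL   = 'http://qs.secapl.com/cgi-bin/qso';   # URL given in the text
my $MAX_SYMBOLS = 5;                                    # limit stated in the text

# Hypothetical helper: returns one URL per batch of up to five symbols,
# each of the form ...qso?tick=symbol1+symbol2
sub quote_urls {
    my @symbols = @_;
    my @urls;
    while (my @batch = splice(@symbols, 0, $MAX_SYMBOLS)) {
        push @urls, $QUOTE_URL . '?tick=' . join('+', @batch);
    }
    return @urls;
}

print "$_\n" for quote_urls(qw(ADBE NSCP));
</FONT></PRE>
<P>For the two symbols used in this chapter, the sketch prints a single URL ending
in ?tick=ADBE+NSCP.</P>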
<H3 ALIGN="CENTER"><A NAME="Heading4"></A><FONT COLOR="#000077">Listing 9.1. HTML
returned by the Security APL Quote Server.</FONT></H3>
<PRE><FONT COLOR="#0066FF">&lt;HTML&gt;&lt;HEAD&gt;
&lt;TITLE&gt;Security APL Quote Server&lt;/TITLE&gt;
&lt;LINK REV="made" HREF="dhp@secapl.com"&gt;&lt;/HEAD&gt;
&lt;CENTER&gt;
&lt;NOBR&gt;
&lt;A HREF=http://www.secapl.com/HOMELink&gt;
&lt;IMG ALIGN=CENTER BORDER=0 SRC=http://www.secapl.com/qsImages/apl.gif
ALT=" Security APL" HEIGHT=80 WIDTH=90&gt;&lt;/A&gt;
&lt;B&gt;
&lt;FONT SIZE=+3&gt; S&lt;/FONT&gt;&lt;FONT SIZE=+1&gt;ecurity &lt;FONT SIZE=+3&gt;APL&lt;/FONT&gt;
&lt;FONT SIZE=+3&gt;Q&lt;/FONT&gt;uote&lt;FONT SIZE=+3&gt;S&lt;/FONT&gt;erver&lt;/FONT&gt;
&lt;/B&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/ADLink&gt;
&lt;IMG SRC=http://www.secapl.com/qsImages/barron.gif WIDTH=600
HEIGHT=60 BORDER=0 ALT="Barrons" BORDER=0&gt;&lt;/A&gt;&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;FONT SIZE=-1&gt;
&lt;I&gt;If your browser does not support tables, see the
&lt;A HREF=http://www.secapl.com/cgi-bin/qso&gt;&lt;B&gt;alternate Quote Server&lt;/B&gt;
&lt;/A&gt; page.&lt;/I&gt;
&lt;/FONT&gt;
&lt;P&gt;
&lt;FORM METHOD="POST" ACTION="http://qs.secapl.com/cgi-bin/qso"&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/ticks.html&gt;
&lt;B&gt;Ticker Symbols&lt;/A&gt; : &lt;/B&gt;
&lt;I&gt;(Up to 5 tickers may be entered separated by spaces)&lt;/I&gt;&lt;BR&gt;
&lt;DD&gt;&lt;INPUT NAME="tick" SIZE=30 MAXLENGTH=50&gt;
&lt;B&gt;&lt;FONT COLOR=0000FF&gt;&lt;INPUT TYPE="submit" VALUE=" Get Quotes "&gt;
&lt;/FONT&gt;&lt;/B&gt;&lt;/FORM&gt;
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol : ADBE Exchange : NASDAQ
Description : ADOBE SYSTEMS INC
Last Traded at: 35.1250 Date/Time : Jul 05 1:01:34
$ Change : -0.1250 % Change : -0.35
Volume : 330300 # of Trades : 251
Bid : 35.1250 Ask : 35.2500
Day Low : 35.1250 Day High : 35.7500
52 Week Low : 30.0000 52 Week High: 74.2500
&lt;/PRE&gt;&lt;CENTER&gt;
&lt;A HREF="http://www.secapl.com/cgi-bin/edgarlink?'ADBE'"&gt;WWW hyperlinks
&lt;/A&gt; for the symbol ADBE are available
including those from the
&lt;A HREF="http://town.hall.org/edgar/edgar.html"&gt;EDGAR Dissemination Project.&lt;/A&gt;
&lt;HR&gt;
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol : NSCP Exchange : NASDAQ
Description : NETSCAPE COMMUNICATIONS CP
Last Traded at: 58.2500 Date/Time : Jul 05 1:00:36
$ Change : -2.7500 % Change : -4.51
Volume : 1000900 # of Trades : 803
Bid : 58.2500 Ask : 59.0000
Day Low : 57.5000 Day High : 60.2500
52 Week Low : 22.8750 52 Week High: 87.0000
&lt;/PRE&gt;&lt;CENTER&gt;
&lt;A HREF="http://www.secapl.com/cgi-bin/edgarlink?'NSCP'"&gt;
WWW hyperlinks&lt;/A&gt; for the symbol NSCP are available
including those from the
&lt;A HREF="http://town.hall.org/edgar/edgar.html"&gt;EDGAR Dissemination Project.&lt;/A&gt;
&lt;HR&gt;
&lt;P&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/mw.html&gt;
&lt;FONT SIZE=+1&gt;&lt;B&gt;Market Watch&lt;/B&gt;&lt;/FONT&gt;&lt;/A&gt; A Detailed Look at Market Activity
&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/PAWLink&gt;
&lt;IMG SRC=http://www.secapl.com/qsImages/qs.gif HEIGHT=30 WIDTH=600 BORDER=0
ALT=&gt;&lt;/A&gt;&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;B&gt;
&lt;A HREF=http://www.secapl.com/PORTVUELink&gt; PORTVUE&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/NEWLink.html&gt; WhatsNew&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/PAWLink&gt;PAWWS&lt;/A&gt; -
&lt;FONT COLOR=777777&gt;QuoteServer&lt;/FONT&gt; -
&lt;A HREF=http://www.secapl.com/PODIUMLink&gt;PODIUM&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/SPONSORLink&gt;Sponsored Sites&lt;/A&gt;
&lt;/B&gt;
&lt;HR WIDTH=600 SIZE=1 NOSHADE&gt;
&lt;FONT SIZE=+1&gt;
&lt;B&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/search.html&gt;Ticker Search&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/secapl/qsq1.html&gt;Questionnaire&lt;/A&gt; -
&lt;A HREF=http://pawws.com/C_phtml/calculators.shtml&gt;Financial Calculators&lt;/A&gt;
&lt;HR WIDTH=100 SIZE=1 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/NEWLink&gt;What's New&lt;/A&gt;&lt;/B&gt;&lt;/FONT&gt;
-- Jun 4 1996: &lt;A HREF=http://www.secapl.com/APLACCESSLink&gt;Security APLACCESS
&lt;/A&gt; - electronic statement delivery via the Internet
&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=1 NOSHADE&gt;
&lt;HR WIDTH=100 SIZE=1 NOSHADE&gt;
&lt;FONT SIZE=-1&gt;
&lt;A HREF=http://www.secapl.com/HOMELink&gt;
&lt;B&gt;Security APL&lt;/B&gt;&lt;/A&gt;
&lt;I&gt;and&lt;/I&gt; &lt;B&gt;&lt;A HREF=http://www.secapl.com/NAQLink&gt;
North American Quotations, Inc.&lt;/B&gt;&lt;/A&gt;
&lt;I&gt;make no claims concerning the validity&lt;BR&gt;
of the information provided herein, and will not be held liable
for any use thereof.&lt;/I&gt;
&lt;/B&gt;
&lt;/FONT&gt;
&lt;/NOBR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;I&gt;&lt;A HREF=http://www.secapl.com/HOMELink&gt;Security APL&lt;/A&gt;&lt;BR&gt;
&lt;A HREF=mailto:g.www@secapl.com&gt;g.www@secapl.com&lt;/A&gt;&lt;/I&gt;
&lt;/CENTER&gt;&lt;/BODY&gt;&lt;/HTML&gt;
</FONT></PRE>
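<P>Given a page in the shape of Listing 9.1, regular expressions of the kind
mentioned earlier might look like the following. This is a hedged sketch and not
the chapter's own extraction routine: the helper name parse_quotes and the hash
keys are hypothetical, and the patterns assume the field labels appear exactly as
in the sample. The embedded sample is trimmed from the listing so the sketch runs
without contacting the server.</P>
<PRE><FONT COLOR="#0066FF">use strict;
use warnings;

# Hypothetical helper (not the chapter's code): takes the page as one string
# and returns a list of hash references, one per &lt;PRE&gt;...&lt;/PRE&gt; block.
sub parse_quotes {
    my ($html) = @_;
    my @quotes;
    while ($html =~ m{&lt;PRE&gt;(.*?)&lt;/PRE&gt;}sig) {
        my $block = $1;
        my %q;
        ($q{symbol})  = $block =~ /Symbol\s*:\s*(\S+)/;
        ($q{price})   = $block =~ /Last Traded at\s*:\s*([\d.]+)/;
        ($q{change})  = $block =~ /\$ Change\s*:\s*([-+]?[\d.]+)/;
        ($q{percent}) = $block =~ /% Change\s*:\s*([-+]?[\d.]+)/;
        ($q{date}, $q{time}) =
            $block =~ /Date\/Time\s*:\s*(\w+\s+\d+)\s+([\d:]+)/;
        # Skip any block that does not look like a quote
        push @quotes, \%q if defined $q{symbol};
    }
    return @quotes;
}

# Trimmed sample in the shape of Listing 9.1
my $sample = &lt;&lt;'END';
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol : ADBE Exchange : NASDAQ
Last Traded at: 35.1250 Date/Time : Jul 05 1:01:34
$ Change : -0.1250 % Change : -0.35
&lt;/PRE&gt;&lt;CENTER&gt;
&lt;HR&gt;
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol : NSCP Exchange : NASDAQ
Last Traded at: 58.2500 Date/Time : Jul 05 1:00:36
$ Change : -2.7500 % Change : -4.51
&lt;/PRE&gt;&lt;CENTER&gt;
END

for my $q (parse_quotes($sample)) {
    print "$q-&gt;{symbol}: $q-&gt;{price} ($q-&gt;{percent}%) at $q-&gt;{date} $q-&gt;{time}\n";
}
</FONT></PRE>
<P>Matching each &lt;PRE&gt; block separately keeps the fields of one symbol from
being confused with those of the next. If the server changes its labels or layout,
the patterns would need to be adjusted.</P>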
<P>To make our lives a lot easier, we won't attempt to submit the URL request by
using raw socket calls, even though this can be done in Perl. Instead, let's use
the WWW libraries available from CPAN. Two important modules are HTTP::Request
and HTTP::Response. Here, you'll use the HTTP::Request module to package up the URL
request and the HTTP::Response module to handle the data coming back. The other module