<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">



<HTML>







<HEAD>



<!-- This document was created from RTF source by rtftohtml version 3.0.1 -->







	<META NAME="GENERATOR" Content="Symantec Visual Page 1.0">



	<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=iso-8859-1">



	<TITLE>Agents: Web Scanning, Mirroring, and Background Tasks</TITLE>



</HEAD>







<BODY BACKGROUND="r2harch.gif" TEXT="#000000" BGCOLOR="#FFFFFF">







<H2 ALIGN="CENTER"><A HREF="ch08.htm"><IMG SRC="blanprev.gif" WIDTH="37" HEIGHT="37"
ALIGN="BOTTOM" BORDER="2"></A><A HREF="index-1.htm"><IMG SRC="blantoc.gif" WIDTH="42"
HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><A HREF="ch10.htm"><IMG SRC="blannext.gif"
WIDTH="45" HEIGHT="37" ALIGN="BOTTOM" BORDER="2"></A><BR>



<BR>



<BR>



<FONT COLOR="#0000AA">9</FONT><BR>



<A NAME="Heading1"></A><FONT COLOR="#000077">Agents: Web Scanning, Mirroring, and Background
Tasks</FONT></H2>



<P ALIGN="CENTER">by Brian Deng



<H2 ALIGN="CENTER">



<HR>



</H2>







<UL>



	<LI><A HREF="#Heading1">Agents: Web Scanning, Mirroring, and Background Tasks</A>



	<UL>



		<LI><A HREF="#Heading2">Retrieving Specific Documents from the Web</A>



		<UL>



			<LI><A HREF="#Heading3">Stock Quotes on the Hour</A>



		</UL>



		<LI><A HREF="#Heading4">Listing 9.1. HTML returned by the Security APL Quote Server.</A>



		<LI><A HREF="#Heading5">Listing 9.2. Automatic stock quote retriever (getquote.pl).</A>



		<LI><A HREF="#Heading6">Listing 9.3. Subroutine to extract the quote information.</A>



		<UL>



			<LI><A HREF="#Heading7">Adapting the Code for General Purpose Use</A>



		</UL>



		<LI><A HREF="#Heading8">Listing 9.4. General purpose URL retriever going through



		a firewall.</A>



		<LI><A HREF="#Heading9">Generating Web Indexes</A>



		<UL>



			<LI><A HREF="#Heading10">Web Robots (Spiders)</A>



		</UL>



		<LI><A HREF="#Heading11">Listing 9.5. The crawlIt() main function of the Web spider.</A>



		<LI><A HREF="#Heading13">Listing 9.6. Converting a relative URL to an absolute URL.</A>



		<LI><A HREF="#Heading14">Listing 9.7. Writing the title and URL to the log file.</A>



		<LI><A HREF="#Heading15">Listing 9.8. Specifying the starting points and stopping



		points.</A>



		<LI><A HREF="#Heading17">Mirroring Remote Sites</A>



		<LI><A HREF="#Heading18">Listing 9.9. Modified function to convert relative URLs



		to absolute URLs.</A>



		<LI><A HREF="#Heading19">Listing 9.10. Modified crawlIt() function for mirroring



		a site.</A>



		<LI><A HREF="#Heading20">Summary</A>



	</UL>



</UL>







<P>



<HR>



</P>







<UL>



	<LI>Retrieving Specific Documents from the Web



	<P>



	<LI>Generating Web Indexes



	<P>



	<LI>Mirroring Remote Sites



</UL>







<P>This chapter focuses on agents that use HTTP to perform automated tasks. Many
Webmaster responsibilities, such as detecting stale links, generating usage reports,
building search indexes, and mirroring sites, are easily automated using Perl. Beyond
these server-related background tasks, client-side automation is also useful, for
example retrieving up-to-the-minute information such as news headlines or stock quotes.</P>



<P>This chapter shows you how to leverage existing Perl modules to make these automated
tasks even easier. The examples here are only a starting point; you can apply what you
learn to tasks specific to your own needs.</P>



<H3 ALIGN="CENTER"><A NAME="Heading2"></A><FONT COLOR="#000077">Retrieving Specific



Documents from the Web</FONT></H3>



<P>Retrieving documents from the Web is what everyone does when they surf the Web.
The Web browser provides a convenient front end for this kind of interactive
retrieval. You can also retrieve documents automatically by speaking HTTP from
within a Perl script; retrieving stock quotes is a common example. Think of Web servers
as the information providers and user agents as the information retrievers. Suppose
a Web server provides up-to-the-minute news, sports scores, stock quotes, and so on.
You can write a fairly simple Perl script to monitor such a site and report
that up-to-date information to you.</P>
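<P>As a rough illustration of retrieving a document from a script, the sketch below
fetches a single URL and returns its body. It is an assumption-laden example, not the
chapter's own code: it uses HTTP::Tiny, a core module since Perl 5.14, rather than the
libwww-perl modules, purely so that it runs without extra installation. The helper name
fetch_page() and the 30-second timeout are hypothetical choices.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14; swapped in here for libwww-perl

# fetch_page() is a hypothetical helper name. It returns the body of the page
# at $url, or undef if the request did not succeed.
sub fetch_page {
    my ($url) = @_;
    my $http     = HTTP::Tiny->new(timeout => 30);   # timeout value is an assumption
    my $response = $http->get($url);
    return $response->{success} ? $response->{content} : undef;
}

# Fetch the URL given on the command line, if any.
if (@ARGV) {
    my $page = fetch_page($ARGV[0]);
    print defined $page ? $page : "Request failed\n";
}
```

<P>Run it with a URL as its only argument. Returning undef on failure keeps the caller's
error handling to a single defined() check.</P>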



<H4 ALIGN="CENTER"><A NAME="Heading3"></A><FONT COLOR="#000077">Stock Quotes on the



Hour</FONT></H4>



<P>Stock quotes are an obvious application for retrieving information
from the Web. Public Web pages are available from which you can get the latest stock
prices at the click of a button. This example shows you how to write your own customized
Perl script that reports current stock prices every hour, on the hour. You
feed it stock symbols, and it retrieves the page, parses it, and displays
only the fields you are interested in.</P>
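<P>One way to run something &quot;every hour on the hour&quot; is to sleep until the next
top of the hour and then do the work. The following is a minimal sketch of that idea;
the names seconds_until_next_hour() and run_hourly() are hypothetical, and the task to
run is passed in as a code reference.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# seconds_until_next_hour() is a hypothetical helper: given an epoch time,
# it returns how many seconds remain until the next top of the hour
# (3600 if the time given is exactly on the hour).
sub seconds_until_next_hour {
    my ($now) = @_;
    my ($sec, $min) = localtime($now);    # first two fields are seconds, minutes
    return 3600 - ($min * 60 + $sec);
}

# run_hourly() is also hypothetical. It loops forever, waking at each top of
# the hour to call the supplied code reference.
sub run_hourly {
    my ($task) = @_;
    while (1) {
        sleep(seconds_until_next_hour(time));
        $task->();
    }
}

print "Next run in ", seconds_until_next_hour(time), " seconds\n";

# Example use (not called here because it never returns):
#   run_hourly(sub { print "Fetching quotes...\n" });
```

<P>On UNIX systems, a cron entry that starts the script once an hour is a common
alternative to keeping a long-running loop alive.</P>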



<P>One Web site that provides stock quotes is the Security APL Quote Server. The
URL for obtaining quotes is <A HREF="http://qs.secapl.com/cgi-bin/qso">http://qs.secapl.com/cgi-bin/qso</A>.
Once you have worked out the format of the data coming back, it's easy
to come up with regular expressions for extracting the price, percent change,
date, and time. To specify a list of quotes to retrieve, append the string
&quot;?tick=symbol1+symbol2&quot; to the URL. This query string carries the parameter list that is
passed to the quote-serving CGI script. This particular site allows you to specify
up to five stock symbols at a time. In the data coming back, the stock quotes are
separated by a horizontal rule tag, &lt;HR&gt;, and each quote is enclosed in the
preformatted text tags &lt;PRE&gt; and &lt;/PRE&gt;. The HTML in Listing 9.1 is
a sample returned by the quote server for two stock symbols. Figure 9.1 shows this
page in Netscape.</P>
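<P>The query string format described above can be assembled with a few lines of Perl.
This is only a sketch: build_quote_url() is a hypothetical helper name, and the check
that symbols contain only letters, digits, and dots is an assumption made so that no
URL escaping is needed.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL and the five-symbol limit both come from the text above.
my $QUOTE_URL   = 'http://qs.secapl.com/cgi-bin/qso';
my $MAX_SYMBOLS = 5;

# build_quote_url() is a hypothetical helper, not part of any module.
# It joins the ticker symbols with "+" and appends them as the "tick" parameter.
sub build_quote_url {
    my @symbols = @_;
    die "No ticker symbols given\n" unless @symbols;
    die "At most $MAX_SYMBOLS symbols per request\n" if @symbols > $MAX_SYMBOLS;

    # Assumption: symbols are plain letters, digits, or dots, so the URL
    # needs no escaping. Anything else is rejected.
    for my $symbol (@symbols) {
        die "Unexpected characters in symbol '$symbol'\n"
            unless $symbol =~ /^[A-Za-z0-9.]+$/;
    }
    return $QUOTE_URL . '?tick=' . join('+', @symbols);
}

print build_quote_url('ADBE', 'NSCP'), "\n";
```

<P>Rejecting a sixth symbol in the script avoids sending a request that the site
would not honor in full.</P>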



<H3 ALIGN="CENTER"><A NAME="Heading4"></A><FONT COLOR="#000077">Listing 9.1. HTML



returned by the Security APL Quote Server.</FONT></H3>



<PRE><FONT COLOR="#0066FF">&lt;HTML&gt;&lt;HEAD&gt;
&lt;TITLE&gt;Security APL Quote Server&lt;/TITLE&gt;
&lt;LINK REV=&quot;made&quot; HREF=&quot;dhp@secapl.com&quot;&gt;&lt;/HEAD&gt;
&lt;CENTER&gt;
&lt;NOBR&gt;
&lt;A HREF=http://www.secapl.com/HOMELink&gt;
&lt;IMG ALIGN=CENTER BORDER=0 SRC=http://www.secapl.com/qsImages/apl.gif 
     ALT=&quot; Security APL&quot; HEIGHT=80 WIDTH=90&gt;&lt;/A&gt;
&lt;B&gt;
&lt;FONT SIZE=+3&gt; S&lt;/FONT&gt;&lt;FONT SIZE=+1&gt;ecurity &lt;FONT SIZE=+3&gt;APL&lt;/FONT&gt;
&lt;FONT SIZE=+3&gt;Q&lt;/FONT&gt;uote&lt;FONT SIZE=+3&gt;S&lt;/FONT&gt;erver&lt;/FONT&gt;
&lt;/B&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/ADLink&gt;
&lt;IMG SRC=http://www.secapl.com/qsImages/barron.gif WIDTH=600 
 HEIGHT=60 BORDER=0 ALT=&quot;Barrons&quot; BORDER=0&gt;&lt;/A&gt;&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;FONT SIZE=-1&gt;
&lt;I&gt;If your browser does not support tables, see the 
&lt;A HREF=http://www.secapl.com/cgi-bin/qso&gt;&lt;B&gt;alternate Quote Server&lt;/B&gt;
&lt;/A&gt; page.&lt;/I&gt;
&lt;/FONT&gt;
&lt;P&gt;
&lt;FORM METHOD=&quot;POST&quot; ACTION=&quot;http://qs.secapl.com/cgi-bin/qso&quot;&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/ticks.html&gt;
&lt;B&gt;Ticker Symbols&lt;/A&gt; : &lt;/B&gt;
&lt;I&gt;(Up to 5 tickers may be entered separated by spaces)&lt;/I&gt;&lt;BR&gt;
&lt;DD&gt;&lt;INPUT NAME=&quot;tick&quot; SIZE=30 MAXLENGTH=50&gt;
&lt;B&gt;&lt;FONT COLOR=0000FF&gt;&lt;INPUT TYPE=&quot;submit&quot; VALUE=&quot;  Get Quotes  &quot;&gt;
&lt;/FONT&gt;&lt;/B&gt;&lt;/FORM&gt;
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol        : ADBE         Exchange    : NASDAQ
Description   : ADOBE SYSTEMS INC                       
Last Traded at: 35.1250      Date/Time   : Jul 05  1:01:34
$ Change      : -0.1250      % Change    : -0.35   

Volume        : 330300       # of Trades : 251      
Bid           : 35.1250      Ask         : 35.2500  
Day Low       : 35.1250      Day High    : 35.7500  
52 Week Low   : 30.0000      52 Week High: 74.2500  
&lt;/PRE&gt;&lt;CENTER&gt;
&lt;A HREF=&quot;http://www.secapl.com/cgi-bin/edgarlink?'ADBE'&quot;&gt;WWW hyperlinks
&lt;/A&gt; for the symbol ADBE are available
including those from the
&lt;A HREF=&quot;http://town.hall.org/edgar/edgar.html&quot;&gt;EDGAR Dissemination Project.&lt;/A&gt;
&lt;HR&gt;
&lt;/CENTER&gt;&lt;PRE&gt;
Symbol        : NSCP         Exchange    : NASDAQ
Description   : NETSCAPE COMMUNICATIONS CP              
Last Traded at: 58.2500      Date/Time   : Jul 05  1:00:36
$ Change      : -2.7500      % Change    : -4.51   

Volume        : 1000900      # of Trades : 803      
Bid           : 58.2500      Ask         : 59.0000  
Day Low       : 57.5000      Day High    : 60.2500  
52 Week Low   : 22.8750      52 Week High: 87.0000  
&lt;/PRE&gt;&lt;CENTER&gt;
&lt;A HREF=&quot;http://www.secapl.com/cgi-bin/edgarlink?'NSCP'&quot;&gt;
WWW hyperlinks&lt;/A&gt; for the symbol NSCP are available
including those from the
&lt;A HREF=&quot;http://town.hall.org/edgar/edgar.html&quot;&gt;EDGAR Dissemination Project.&lt;/A&gt;
&lt;HR&gt;
&lt;P&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/mw.html&gt;
&lt;FONT SIZE=+1&gt;&lt;B&gt;Market Watch&lt;/B&gt;&lt;/FONT&gt;&lt;/A&gt; A Detailed Look at Market Activity
&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/PAWLink&gt;
&lt;IMG SRC=http://www.secapl.com/qsImages/qs.gif HEIGHT=30 WIDTH=600 BORDER=0 
ALT=&gt;&lt;/A&gt;&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;B&gt;
&lt;A HREF=http://www.secapl.com/PORTVUELink&gt; PORTVUE&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/NEWLink.html&gt; WhatsNew&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/PAWLink&gt;PAWWS&lt;/A&gt; -
&lt;FONT COLOR=777777&gt;QuoteServer&lt;/FONT&gt; -
&lt;A HREF=http://www.secapl.com/PODIUMLink&gt;PODIUM&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/SPONSORLink&gt;Sponsored Sites&lt;/A&gt;
&lt;/B&gt;
&lt;HR WIDTH=600 SIZE=1 NOSHADE&gt;
&lt;FONT SIZE=+1&gt;
&lt;B&gt;
&lt;A HREF=http://www.secapl.com/secapl/quoteserver/search.html&gt;Ticker Search&lt;/A&gt; -
&lt;A HREF=http://www.secapl.com/secapl/qsq1.html&gt;Questionnaire&lt;/A&gt; -
&lt;A HREF=http://pawws.com/C_phtml/calculators.shtml&gt;Financial Calculators&lt;/A&gt;
&lt;HR WIDTH=100 SIZE=1 NOSHADE&gt;
&lt;A HREF=http://www.secapl.com/NEWLink&gt;What's New&lt;/A&gt;&lt;/B&gt;&lt;/FONT&gt;
  -- Jun 4 1996: &lt;A HREF=http://www.secapl.com/APLACCESSLink&gt;Security APLACCESS
&lt;/A&gt; - electronic statement delivery via the Internet
&lt;BR&gt;
&lt;HR WIDTH=600 SIZE=1 NOSHADE&gt;
&lt;HR WIDTH=100 SIZE=1 NOSHADE&gt;
&lt;FONT SIZE=-1&gt;
&lt;A HREF=http://www.secapl.com/HOMELink&gt;
&lt;B&gt;Security APL&lt;/B&gt;&lt;/A&gt;
&lt;I&gt;and&lt;/I&gt; &lt;B&gt;&lt;A HREF=http://www.secapl.com/NAQLink&gt;
North American Quotations, Inc.&lt;/B&gt;&lt;/A&gt;
&lt;I&gt;make no claims concerning the validity&lt;BR&gt;
of the information provided herein, and will not be held liable 
for any use thereof.&lt;/I&gt;
&lt;/B&gt;
&lt;/FONT&gt;
&lt;/NOBR&gt;
&lt;HR WIDTH=600 SIZE=3 NOSHADE&gt;
&lt;I&gt;&lt;A HREF=http://www.secapl.com/HOMELink&gt;Security APL&lt;/A&gt;&lt;BR&gt;
&lt;A HREF=mailto:g.www@secapl.com&gt;g.www@secapl.com&lt;/A&gt;&lt;/I&gt;
&lt;/CENTER&gt;&lt;/BODY&gt;&lt;/HTML&gt;
</FONT></PRE>
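<P>Given output shaped like Listing 9.1, the fields of interest can be pulled out of each
&lt;PRE&gt; block with a handful of regular expressions. The following is a minimal
sketch under that assumption, not the author's own extraction routine; parse_quotes()
and the hash keys it fills are hypothetical names, and the sample data is abbreviated
from Listing 9.1.</P>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_quotes() is a hypothetical helper. It takes the HTML text returned by
# the quote server and returns a list of hash references, one per <PRE> block.
sub parse_quotes {
    my ($html) = @_;
    my @quotes;

    # Each quote sits between <PRE> and </PRE>; /s lets "." span newlines.
    while ($html =~ m{<PRE>(.*?)</PRE>}gis) {
        my $block = $1;
        my %quote;
        ($quote{symbol})  = $block =~ /Symbol\s*:\s*(\S+)/;
        ($quote{price})   = $block =~ /Last Traded at\s*:\s*([\d.]+)/;
        ($quote{when})    = $block =~ m{Date/Time\s*:\s*(\w+\s+\d+\s+[\d:]+)};
        ($quote{percent}) = $block =~ /% Change\s*:\s*([-+]?[\d.]+)/;
        push @quotes, \%quote if defined $quote{symbol};
    }
    return @quotes;
}

# Sample data abbreviated from Listing 9.1.
my $sample = <<'END_HTML';
</CENTER><PRE>
Symbol        : ADBE         Exchange    : NASDAQ
Last Traded at: 35.1250      Date/Time   : Jul 05  1:01:34
$ Change      : -0.1250      % Change    : -0.35
</PRE><CENTER>
<HR>
</CENTER><PRE>
Symbol        : NSCP         Exchange    : NASDAQ
Last Traded at: 58.2500      Date/Time   : Jul 05  1:00:36
$ Change      : -2.7500      % Change    : -4.51
</PRE><CENTER>
END_HTML

for my $q (parse_quotes($sample)) {
    printf "%-6s %9s  %6s%%  %s\n", @{$q}{qw(symbol price percent when)};
}
```

<P>Matching on the field labels rather than on column positions keeps the sketch working
if the server changes its spacing, though it would still break if the labels themselves
changed.</P>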



<P>To keep things simple, we won't submit the URL request by
using raw socket calls, although Perl can do that. Instead, let's use
the libwww-perl (LWP) modules available from CPAN. Two important modules are HTTP::Request



and HTTP::Response. Here, you'll use the HTTP::Request module to package up the URL



request and the HTTP::Response module to handle the data coming back. The other module
