📁 by Randal L. Schwartz and Tom Phoenix ISBN 0-596-00132-0 Third Edition, published July 2001. (See
<html><head><title>The LWP Modules (Perl in a Nutshell, 2nd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Stephen Spainhour" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596002416L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Perl in a Nutshell, 2nd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Java and XSLT" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch20_01.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0" /></a></td><td align="center" valign="top" width="228" /><td align="right" valign="top" width="228"><a href="ch20_03.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0" /></a></td></tr></table></div><h2 class="sect1">20.2. The LWP Modules</h2><p>The LWP modules provide the core functionality for web programming in Perl. 
They contain the foundations for networking applications, protocol implementations, media type definitions, and debugging ability.</p><p><a name="INDEX-2388" /><a name="INDEX-2389" /><a name="INDEX-2390" /><a name="INDEX-2391" /><a name="INDEX-2392" />The modules LWP::Simple and LWP::UserAgent define client applications that implement network connections, send requests, and receive response data from servers. LWP::RobotUA is another client application used to build automated web searchers following a specified set of guidelines.</p><p>LWP::UserAgent is the primary module used in applications built with LWP. With it, you can build your own robust web client. It is also the base class for the Simple and RobotUA modules. These two modules provide a specialized set of functions for creating clients.</p><p>Additional LWP modules provide the building blocks required for web communications, but you often don't need to use them<a name="INDEX-2393" /> directly in your applications. LWP::Protocol implements the actual socket connections with the appropriate protocol. The most common protocol is HTTP, but mail protocols (such as SMTP), FTP for file transfers, and others can be used across networks.</p><p><a name="INDEX-2394" /><a name="INDEX-2395" />LWP::MediaTypes implements the MIME definitions for media type identification and mapping to file extensions. The LWP::Debug module provides functions to help you debug your LWP applications.</p><p>The following sections describe the RobotUA, Simple, and UserAgent modules of LWP.</p><a name="perlnut2-CHP-20-SECT-2.1" /><div class="sect2"><h3 class="sect2">20.2.1. LWP::RobotUA Sections</h3><p><a name="INDEX-2396" /><a name="INDEX-2397" />The Robot User Agent (LWP::RobotUA) is a subclass of LWP::UserAgent and is used to create robot client applications. A robot application requests resources in an automated fashion. Robots perform such activities as searching, mirroring, and surveying. 
Some robots collect statistics, while others wander the Web and summarize their findings for a search engine.</p><p>The LWP::RobotUA module defines methods to help program robot applications and observes the Robot Exclusion Standards, which web server administrators can define on their web site to keep robots away from certain (or all) areas of the site.</p><p><a name="INDEX-2398" />The constructor for an LWP::RobotUA object looks like this:</p><blockquote><pre class="code">$rob = LWP::RobotUA-&gt;new(<em class="replaceable"><tt>agent_name</tt></em>, <em class="replaceable"><tt>email</tt></em>, [$<em class="replaceable"><tt>rules</tt></em>]);</pre></blockquote><p>The first parameter, <em class="replaceable"><tt>agent_name</tt></em>, is the user agent identifier used for the value of the User-Agent header in the request. The second parameter is the email address of the person using the robot, and the optional third parameter is a reference to a WWW::RobotRules object, which is used to store the robot rules for a server. If you omit the third parameter, the LWP::RobotUA module requests the <em class="emphasis">robots.txt</em> file from every server it contacts and generates its own WWW::RobotRules object.</p><p>Since LWP::RobotUA is a subclass of LWP::UserAgent, the LWP::UserAgent methods are used to perform the basic client activities. 
The following methods are defined by LWP::RobotUA for robot-related functionality.</p><a name="INDEX-2399" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>as_string</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;as_string( )</pre><p><a name="INDEX-2399" />Returns a human-readable string that describes the robot's status.</p></div><a name="INDEX-2400" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>delay</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;delay ([<em class="replaceable">time</em>])</pre><p><a name="INDEX-2400" />Sets or returns the specified <em class="replaceable"><tt>time</tt></em> (in minutes) to wait between requests. 
The default value is <tt class="literal">1</tt>.</p></div><a name="INDEX-2401" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>host_wait</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;host_wait(<em class="replaceable">netloc</em>)</pre><p><a name="INDEX-2401" />Returns the number of seconds the robot must wait before it can request another resource from the server identified by <em class="replaceable"><tt>netloc</tt></em>.</p></div><a name="INDEX-2402" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>no_visits</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;no_visits(<em class="replaceable">netloc</em>)</pre><p><a name="INDEX-2402" />Returns the number of visits to a given server. <em class="replaceable"><tt>netloc</tt></em> is of the form <em class="replaceable"><tt>user:password@host:port</tt></em>. 
The user, password, and port are optional.</p></div><a name="INDEX-2403" /><a name="INDEX-2404" /><a name="INDEX-2405" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>rules</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;rules([$<em class="replaceable">rules</em>])</pre><p><a name="INDEX-2403" />Sets or returns the WWW::RobotRules object <tt class="literal">$</tt><em class="replaceable"><tt>rules</tt></em>, which is used when determining if the module is allowed access to a particular resource.<a name="INDEX-2404" /><a name="INDEX-2405" /> </p></div><a name="INDEX-2406" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>use_sleep</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">rob</em>-&gt;use_sleep ([<em class="replaceable">boolean</em>])</pre><p><a name="INDEX-2406" />Determines whether the user agent should <tt class="literal">sleep( )</tt> if requests arrive too fast. The default is true. If set to false, an internal SERVICE_UNAVAILABLE response is generated, with a Retry-After header indicating when it is permissible to send another request to this server. With no arguments, returns the current value of this flag.</p></div></div><a name="perlnut2-CHP-20-SECT-2.2" /><div class="sect2"><h3 class="sect2">20.2.2. LWP::Simple</h3><p><a name="INDEX-2407" /><a name="INDEX-2408" />LWP::Simple provides an easy-to-use interface for creating a web client, although it is only capable of performing basic retrieval functions. An object constructor is not used for this class; it defines functions for retrieving information from a specified URL and interpreting the status codes from the requests.</p><p>This module isn't named Simple for nothing. 
The following shows how to use it to get a web page and save it to a file:</p><blockquote><pre class="code">use LWP::Simple;
$homepage = 'oreilly_com.html';
$status = getstore('http://www.oreilly.com/', $homepage);
print("hooray") if is_success($status);</pre></blockquote><p><a name="INDEX-2409" /><a name="INDEX-2410" />The retrieving functions <tt class="literal">get</tt> and <tt class="literal">head</tt> return the URL's contents and header contents, respectively. The other retrieving functions return the HTTP status code of the request. The status codes are returned as the constants from the HTTP::Status module, which is also where the <tt class="literal">is_success</tt> and <tt class="literal">is_error</tt> functions are obtained. See <a href="ch20_03.htm#perlnut2-CHP-20-SECT-3.4">Section 20.3.4, "HTTP::Status"</a> for a listing of the response codes.</p><p>The user agent identifier produced by LWP::Simple is <tt class="literal">LWP::Simple</tt><em class="replaceable"><tt>/n.nn</tt></em>, in which <em class="replaceable"><tt>n.nn</tt></em> is the version number of LWP being used.</p><p>The following are the functions exported by LWP::Simple.</p><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>get</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>get (<em class="replaceable">url</em>)</pre><p>Returns the contents of the specified <em class="replaceable">url</em>. Upon failure, <tt class="literal">get</tt> returns <tt class="literal">undef</tt>. 
Because failure is signaled only by returning <tt class="literal">undef</tt>, there is no way of accessing the HTTP status code or headers returned by the server.</p></div><a name="INDEX-2411" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>getprint</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>getprint (<em class="replaceable">url</em>)</pre><p><a name="INDEX-2411" />Prints the contents of <em class="replaceable"><tt>url</tt></em> on standard output and returns the HTTP status code given by the server.</p></div><a name="INDEX-2412" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>getstore</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>getstore (<em class="replaceable">url</em>, <em class="replaceable">file</em>)</pre><p><a name="INDEX-2412" />Stores the contents of the specified <em class="replaceable"><tt>url</tt></em> into <em class="replaceable"><tt>file</tt></em> and returns the HTTP status code given by the server.</p></div><a name="INDEX-2413" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>head</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>head (<em class="replaceable">url</em>)</pre><p><a name="INDEX-2413" />Returns header information about the specified <em class="replaceable"><tt>url</tt></em> in the form of: <tt class="literal">($content_type, $document_length, $modified_time, $expires, $server)</tt>. 
Upon failure, <tt class="literal">head</tt> returns an empty list.</p></div><a name="INDEX-2414" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>is_error</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>is_error (<em class="replaceable">code</em>)</pre><p><a name="INDEX-2414" />Given a status <em class="replaceable"><tt>code</tt></em> from <tt class="literal">getprint</tt>, <tt class="literal">getstore</tt>, or <tt class="literal">mirror</tt>, returns true if the request was not successful.</p></div><a name="INDEX-2415" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>is_success</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>is_success (<em class="replaceable">code</em>)</pre><p><a name="INDEX-2415" />Given a status <em class="replaceable"><tt>code</tt></em> from <tt class="literal">getprint</tt>, <tt class="literal">getstore</tt>, or <tt class="literal">mirror</tt>, returns true if the request was successful.</p></div><a name="INDEX-2416" /><a name="INDEX-2417" /><a name="INDEX-2418" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>mirror</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>mirror (<em class="replaceable">url</em>, <em class="replaceable">file</em>)</pre><p><a name="INDEX-2416" />Copies the contents of the specified <em class="replaceable"><tt>url</tt></em> into <em class="replaceable"><tt>file</tt></em>, when the modification time or length of the online version is different from that of the named file.<a name="INDEX-2417" /><a name="INDEX-2418" /> </p></div></div><a name="perlnut2-CHP-20-SECT-2.3" /><div class="sect2"><h3 
class="sect2">20.2.3. LWP::UserAgent</h3>