<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../displayToc.js"></script>
<script language="JavaScript" src="../tocParas.js"></script>
<script language="JavaScript" src="../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../scineplex.css">
<title>lwptut -- An LWP Tutorial</title>
<link rel="stylesheet" href="../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>

<body>

<script>writelinks('__top__',1);</script>
<h1><a>lwptut -- An LWP Tutorial</a></h1>
<p><a name="__index__"></a></p>

<!-- INDEX BEGIN -->

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#description">DESCRIPTION</a></li>
	<ul>

		<li><a href="#getting_documents_with_lwp__simple">Getting documents with LWP::Simple</a></li>
		<li><a href="#the_basics_of_the_lwp_class_model">The Basics of the LWP Class Model</a></li>
		<li><a href="#adding_other_http_request_headers">Adding Other HTTP Request Headers</a></li>
		<li><a href="#enabling_cookies">Enabling Cookies</a></li>
		<li><a href="#posting_form_data">Posting Form Data</a></li>
		<li><a href="#sending_get_form_data">Sending GET Form Data</a></li>
		<li><a href="#absolutizing_urls">Absolutizing URLs</a></li>
		<li><a href="#other_browser_attributes">Other Browser Attributes</a></li>
		<li><a href="#writing_polite_robots">Writing Polite Robots</a></li>
		<li><a href="#using_proxies">Using Proxies</a></li>
		<li><a href="#http_authentication">HTTP Authentication</a></li>
		<li><a href="#accessing_https_urls">Accessing HTTPS URLs</a></li>
		<li><a href="#getting_large_documents">Getting Large Documents</a></li>
	</ul>

	<li><a href="#see_also">SEE ALSO</a></li>
	<li><a href="#copyright">COPYRIGHT</a></li>
	<li><a href="#author">AUTHOR</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>lwptut -- An LWP Tutorial</p>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>LWP (short for &quot;Library for WWW in Perl&quot;) is a very popular group of
Perl modules for accessing data on the Web. Like most Perl
module distributions, each of LWP's component modules comes with
documentation that is a complete reference to its interface. However,
there are so many modules in LWP that it's hard to know where to start
looking for information on how to do even the simplest, most common
things.</p>
<p>Really introducing you to using LWP would require a whole book -- a book
that just happens to exist, called <em>Perl &amp; LWP</em>. But this article
should give you a taste of how you can go about some common tasks with
LWP.</p>
<p>
</p>
<h2><a name="getting_documents_with_lwp__simple">Getting documents with LWP::Simple</a></h2>
<p>If you just want to get what's at a particular URL, the simplest way
to do it is to use LWP::Simple's functions.</p>
<p>In a Perl program, you can call its <code>get($url)</code> function.  It will try
getting that URL's content.  If it works, then it'll return the
content; but if there's some error, it'll return undef.</p>
<pre>
  <span class="keyword">my</span> <span class="variable">$url</span> <span class="operator">=</span> <span class="string">'http://freshair.npr.org/dayFA.cfm?todayDate=current'</span><span class="operator">;</span>
    <span class="comment"># Just an example: the URL for the most recent /Fresh Air/ show</span>
</pre>
<pre>
  <span class="keyword">use</span> <span class="variable">LWP::Simple</span><span class="operator">;</span>
  <span class="keyword">my</span> <span class="variable">$content</span> <span class="operator">=</span> <span class="variable">get</span> <span class="variable">$url</span><span class="operator">;</span>
  <span class="keyword">die</span> <span class="string">"Couldn't get $url"</span> <span class="keyword">unless</span> <span class="keyword">defined</span> <span class="variable">$content</span><span class="operator">;</span>
</pre>
<pre>
  <span class="comment"># Then go do things with $content, like this:</span>
</pre>
<pre>
  <span class="keyword">if</span><span class="operator">(</span><span class="variable">$content</span> <span class="operator">=~</span> <span class="regex">m/jazz/i</span><span class="operator">)</span> <span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"They're talking about jazz today on Fresh Air!\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
  <span class="keyword">else</span> <span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Fresh Air is apparently jazzless today.\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
</pre>
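<p>To see both outcomes of <code>get</code> without depending on a live web site, here is a minimal sketch that fetches <code>file://</code> URLs instead, which LWP handles through the same interface. The temporary file, its text, and the <code>.missing</code> name are made up for illustration; in practice you would pass a real <code>http://</code> URL.</p>

```perl
use strict;
use warnings;
use LWP::Simple qw(get);      # get() returns the content, or undef on any error
use File::Temp qw(tempfile);
use URI::file;                # turns a local path into a file:// URL

# A hypothetical local "page", so this runs without network access.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "Tonight on the show: a conversation about jazz.\n";
close $fh or die "Can't close $path: $!";

my $url     = URI::file->new_abs($path)->as_string;
my $content = get($url);
die "Couldn't get $url" unless defined $content;

if ($content =~ m/jazz/i) {
    print "They're talking about jazz today!\n";
}
else {
    print "Apparently jazzless today.\n";
}

# A URL that can't be fetched yields undef; LWP::Simple does not say why.
my $missing = get(URI::file->new_abs("$path.missing")->as_string);
print defined($missing) ? "Unexpected content!\n" : "get() returned undef\n";
```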
<p>The handiest variant on <code>get</code> is <code>getprint</code>, which is useful in Perl
one-liners.  If it can get the page whose URL you provide, it sends it
to STDOUT; otherwise it complains to STDERR.</p>
<pre>
  % perl -MLWP::Simple -e &quot;getprint '<a href="http://cpan.org/RECENT">http://cpan.org/RECENT</a>'&quot;</pre>
<p>That is the URL of a plaintext file that lists new files in CPAN in
the past two weeks.  You can easily make it part of a tidy little
shell command, like this one that mails you the list of new
<code>Acme::</code> modules:</p>
<pre>
  % perl -MLWP::Simple -e &quot;getprint '<a href="http://cpan.org/RECENT">http://cpan.org/RECENT</a>'&quot;  \
     | grep &quot;/by-module/Acme&quot; | mail -s &quot;New Acme modules! Joy!&quot; $USER</pre>
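<p>If you would rather do the filtering in Perl than in the shell, a sketch along these lines works. The subroutine name <code>acme_lines</code> and the two sample lines are hypothetical, and the demo reads a local <code>file://</code> copy so that it runs offline.</p>

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use File::Temp qw(tempfile);
use URI::file;

# Fetch a RECENT-style listing and keep only the lines that mention
# /by-module/Acme, which is what the shell pipeline's grep does.
sub acme_lines {
    my ($url) = @_;
    my $listing = get($url);
    die "Couldn't get $url\n" unless defined $listing;
    return grep { m{/by-module/Acme} } split /\n/, $listing;
}

# Offline demo with a hypothetical two-line listing in a temporary file.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "authors/id/X/XY/XYZ/Foo-Bar-1.00.tar.gz\n",
            "modules/by-module/Acme/Acme-Example-0.01.tar.gz\n";
close $fh or die "Can't close $path: $!";

my @hits = acme_lines(URI::file->new_abs($path)->as_string);
print "$_\n" for @hits;
```

<p>Against the real listing you would call <code>acme_lines('http://cpan.org/RECENT')</code> and print or mail the result as you like.</p>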
<p>There are other useful functions in LWP::Simple, including <code>head</code>, which runs a HEAD request on a URL (useful for checking links, or
getting the last-revised time of a URL), and <code>getstore</code> and <code>mirror</code>, which save or mirror a URL to a local file. See <a href="../lib/LWP/Simple.html">the LWP::Simple documentation</a> for the full details, or chapter 2 of <em>Perl
&amp; LWP</em> for more examples.</p>
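<p>As a rough illustration of LWP::Simple's <code>head</code> and <code>getstore</code> functions, the sketch below runs them against a temporary local file through a <code>file://</code> URL, so no network is needed; the file and its contents are made up. <code>mirror</code> is left out here; see the LWP::Simple documentation for its behavior.</p>

```perl
use strict;
use warnings;
use LWP::Simple qw(head getstore);
use HTTP::Status qw(is_success);
use File::Spec;
use File::Temp qw(tempfile tempdir);
use URI::file;

# Hypothetical source document, reached through a file:// URL so that no
# network is needed; the calls look the same with an http:// URL.
my ($fh, $src) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "hello, world\n";
close $fh or die "Can't close $src: $!";
my $url = URI::file->new_abs($src)->as_string;

# In list context, head() returns the content type, document length,
# last-modified time (epoch seconds), expiry time, and server name,
# or an empty list if the request fails.
my ($type, $length, $modified) = head($url)
    or die "HEAD request for $url failed\n";
print "type=$type length=$length modified=", scalar(localtime $modified), "\n";

# getstore() saves the content to a local file and returns the HTTP status code.
my $dest = File::Spec->catfile(tempdir(CLEANUP => 1), 'copy.txt');
my $rc   = getstore($url, $dest);
die "getstore failed with status $rc\n" unless is_success($rc);
print "Saved $url to $dest (status $rc)\n";
```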
<p>
</p>
<h2><a name="the_basics_of_the_lwp_class_model">The Basics of the LWP Class Model</a></h2>
<p>LWP::Simple's functions are handy for simple cases, but they
don't support cookies or authorization, don't support setting header
lines in the HTTP request, and generally don't support reading header lines
in the HTTP response (notably the full HTTP error message, in case of an
error). To get at all those features, you'll have to use the full LWP
class model.</p>
<p>While LWP consists of dozens of classes, the two main ones that you have to
understand are <a href="../lib/LWP/UserAgent.html">LWP::UserAgent</a> and <a href="../lib/HTTP/Response.html">HTTP::Response</a>. LWP::UserAgent
is a class for &quot;virtual browsers&quot;, which you use for performing requests,
and <a href="../lib/HTTP/Response.html">HTTP::Response</a> is a class for the responses (or error messages)
that you get back from those requests.</p>
<p>The basic idiom is <code>$response = $browser-&gt;get($url)</code>, or more fully
illustrated:</p>
<pre>
  <span class="comment"># Early in your program:</span>
  
  <span class="keyword">use</span> <span class="variable">LWP</span> <span class="number">5.64</span><span class="operator">;</span> <span class="comment"># Loads all important LWP classes, and makes</span>
                <span class="comment">#  sure your version is reasonably recent.</span>
</pre>
<pre>
  <span class="keyword">my</span> <span class="variable">$browser</span> <span class="operator">=</span> <span class="variable">LWP::UserAgent</span><span class="operator">-&gt;</span><span class="variable">new</span><span class="operator">;</span>
  
  <span class="operator">...</span>
  
  <span class="comment"># Then later, whenever you need to make a get request:</span>
  <span class="keyword">my</span> <span class="variable">$url</span> <span class="operator">=</span> <span class="string">'http://freshair.npr.org/dayFA.cfm?todayDate=current'</span><span class="operator">;</span>
  
  <span class="keyword">my</span> <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">get</span><span class="operator">(</span> <span class="variable">$url</span> <span class="operator">);</span>
  <span class="keyword">die</span> <span class="string">"Can't get $url -- "</span><span class="operator">,</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">status_line</span>
   <span class="keyword">unless</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">is_success</span><span class="operator">;</span>
</pre>
<pre>
  <span class="keyword">die</span> <span class="string">"Hey, I was expecting HTML, not "</span><span class="operator">,</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">content_type</span>
   <span class="keyword">unless</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">content_type</span> <span class="keyword">eq</span> <span class="string">'text/html'</span><span class="operator">;</span>
     <span class="comment"># or whatever content-type you're equipped to deal with</span>
</pre>
<pre>
  <span class="comment"># Otherwise, process the content somehow:</span>
  
  <span class="keyword">if</span><span class="operator">(</span><span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">decoded_content</span> <span class="operator">=~</span> <span class="regex">m/jazz/i</span><span class="operator">)</span> <span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"They're talking about jazz today on Fresh Air!\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
  <span class="keyword">else</span> <span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Fresh Air is apparently jazzless today.\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
</pre>
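<p>The idiom above needs a live server. To try it offline, one option (an assumption of this sketch, not part of the idiom itself) is to register a <code>request_send</code> handler with <code>add_handler</code>, which newer LWP releases provide but the 5.64 baseline above does not: if the handler returns an HTTP::Response, LWP uses it instead of going to the network. The <code>example.com</code> URLs, the canned pages, and the <code>check_page</code> helper are all hypothetical.</p>

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $browser = LWP::UserAgent->new;

# Stand-in for a real server (an assumption of this sketch): when a
# request_send handler returns an HTTP::Response, LWP uses that response
# instead of sending the request over the network.
$browser->add_handler(request_send => sub {
    my ($request) = @_;
    if ($request->uri->path eq '/today') {
        return HTTP::Response->new(200, 'OK',
            ['Content-Type' => 'text/html; charset=utf-8'],
            'Tonight on the show: jazz.');
    }
    return HTTP::Response->new(404, 'Not Found',
        ['Content-Type' => 'text/plain'], "No such page\n");
});

# The tutorial's checks, returning a message instead of dying.
sub check_page {
    my ($url) = @_;
    my $response = $browser->get($url);
    return "Can't get $url -- " . $response->status_line
        unless $response->is_success;
    return "Expected HTML, not " . $response->content_type
        unless $response->content_type eq 'text/html';
    return $response->decoded_content =~ m/jazz/i ? 'jazz' : 'jazzless';
}

print check_page('http://example.com/today'), "\n";
print check_page('http://example.com/missing'), "\n";
```

<p>Note that <code>content_type</code> is compared in scalar context, where it returns just the MIME type with any <code>charset</code> parameter stripped off.</p>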
<p>There are two objects involved: <code>$browser</code>, which holds an object of
class LWP::UserAgent, and then the <code>$response</code> object, which is of
class HTTP::Response. You really need only one browser object per
program; but every time you make a request, you get back a new
HTTP::Response object, which will have some interesting attributes:</p>
<ul>
<li>
<p>A status code indicating
success or failure
(which you can test with <code>$response-&gt;is_success</code>).</p>
</li>
<li>
<p>An HTTP status
line that is hopefully informative if there's failure (which you can
see with <code>$response-&gt;status_line</code>,
returning something like &quot;404 Not Found&quot;).</p>
</li>
<li>
<p>A MIME content-type like &quot;text/html&quot;, &quot;image/gif&quot;,
&quot;application/xml&quot;, etc., which you can see with
<code>$response-&gt;content_type</code>.</p>
</li>
<li>
<p>The actual content of the response, in <code>$response-&gt;decoded_content</code>.