If the response is HTML, that's where the HTML source will be; if
it's a GIF, then <code>$response-&gt;decoded_content</code> will be the binary
GIF data.</p>
</li>
<li>
<p>And dozens of other convenient and more specific methods that are
documented in the docs for <a href="../HTTP/Response.html">the HTTP::Response manpage</a>, its superclass
<a href="../HTTP/Message.html">the HTTP::Message manpage</a>, and <a href="../HTTP/Headers.html">the HTTP::Headers manpage</a>.</p>
</li>
</ul>
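<p>You can try these methods without touching the network by constructing an HTTP::Response object by hand. The following is only a sketch. It assumes HTTP::Response (part of the HTTP-Message distribution that LWP uses) is installed, and the status, header, and body values are invented for illustration. In list context, <code>content_type</code> also returns the header's parameters, so the sketch calls it in scalar context to get just the MIME type:</p>
<pre>
  use strict;
  use warnings;
  use HTTP::Response;

  # Build a response by hand: status code, message, headers, body.
  # All of these values are invented for illustration.
  my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
    'Hello, world',
  );

  my $type = $response->content_type;     # scalar context: "text/plain" only
  my $body = $response->decoded_content;  # body decoded using its charset

  print "Success? ", ($response->is_success ? "yes" : "no"), "\n";
  print "Status line: ", $response->status_line, "\n";  # 200 OK
  print "Type: $type\n";
  print "Body: $body\n";
</pre>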
<p>
</p>
<h2><a name="adding_other_http_request_headers">Adding Other HTTP Request Headers</a></h2>
<p>The most commonly used syntax for requests is <code>$response =
$browser-&gt;get($url)</code>, but in truth, you can add extra HTTP header
lines to the request by adding a list of key-value pairs after the URL,
like so:</p>
<pre>
  <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">get</span><span class="operator">(</span> <span class="variable">$url</span><span class="operator">,</span> <span class="variable">$key1</span><span class="operator">,</span> <span class="variable">$value1</span><span class="operator">,</span> <span class="variable">$key2</span><span class="operator">,</span> <span class="variable">$value2</span><span class="operator">,</span> <span class="operator">...</span> <span class="operator">);</span>
</pre>
<p>For example, here's how to send some more Netscape-like headers, in case
you're dealing with a site that would otherwise reject your request:</p>
<pre>
  <span class="keyword">my</span> <span class="variable">@ns_headers</span> <span class="operator">=</span> <span class="operator">(</span>
   <span class="string">'User-Agent'</span> <span class="operator">=&gt;</span> <span class="string">'Mozilla/4.76 [en] (Win98; U)'</span><span class="operator">,</span>
   <span class="string">'Accept'</span> <span class="operator">=&gt;</span> <span class="string">'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*'</span><span class="operator">,</span>
   <span class="string">'Accept-Charset'</span> <span class="operator">=&gt;</span> <span class="string">'iso-8859-1,*,utf-8'</span><span class="operator">,</span>
   <span class="string">'Accept-Language'</span> <span class="operator">=&gt;</span> <span class="string">'en-US'</span><span class="operator">,</span>
  <span class="operator">);</span>
</pre>
<pre>
  <span class="operator">...</span>
  
  <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">get</span><span class="operator">(</span><span class="variable">$url</span><span class="operator">,</span> <span class="variable">@ns_headers</span><span class="operator">);</span>
</pre>
<p>If you weren't reusing that array, you could just go ahead and do this:</p>
<pre>
  <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">get</span><span class="operator">(</span><span class="variable">$url</span><span class="operator">,</span>
   <span class="string">'User-Agent'</span> <span class="operator">=&gt;</span> <span class="string">'Mozilla/4.76 [en] (Win98; U)'</span><span class="operator">,</span>
   <span class="string">'Accept'</span> <span class="operator">=&gt;</span> <span class="string">'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*'</span><span class="operator">,</span>
   <span class="string">'Accept-Charset'</span> <span class="operator">=&gt;</span> <span class="string">'iso-8859-1,*,utf-8'</span><span class="operator">,</span>
   <span class="string">'Accept-Language'</span> <span class="operator">=&gt;</span> <span class="string">'en-US'</span><span class="operator">,</span>
  <span class="operator">);</span>
</pre>
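<p>You can check exactly which header lines a key-value list like <code>@ns_headers</code> produces by building the request yourself without sending it. In recent versions of LWP, <code>$browser-&gt;get(...)</code> passes its arguments to the <code>GET</code> function from HTTP::Request::Common, so this sketch calls that function directly. The URL is a placeholder, and only two of the headers are repeated:</p>
<pre>
  use strict;
  use warnings;
  use HTTP::Request::Common qw(GET);

  my @ns_headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept-Language' => 'en-US',
  );

  # Build (but don't send) a GET request carrying the extra header lines.
  # www.example.com is a placeholder URL.
  my $request = GET('http://www.example.com/', @ns_headers);

  # as_string shows the request line plus every header that would go out.
  print $request->as_string;
  print "Language header: ", $request->header('Accept-Language'), "\n";
</pre>
<p>Passing the finished <code>$request</code> to <code>$browser-&gt;request($request)</code> would send it. That should be roughly equivalent to calling <code>get</code> with the same arguments.</p>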
<p>If the 'User-Agent' line is the only one you ever change, you can instead change
the <code>$browser</code> object's default value for it from &quot;libwww-perl/5.65&quot; (or the like)
to whatever you like, using the LWP::UserAgent <code>agent</code> method:</p>
<pre>
   <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">agent</span><span class="operator">(</span><span class="string">'Mozilla/4.76 [en] (Win98; U)'</span><span class="operator">);</span>
</pre>
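<p>Here is a small sketch of how <code>agent</code> behaves, assuming a reasonably recent LWP::UserAgent. According to its documentation, if the string you pass ends in a space, LWP appends its own default string to it. That is handy when you want to identify your program but still mention libwww-perl. The product token <code>MyApp/0.1</code> is made up:</p>
<pre>
  use strict;
  use warnings;
  use LWP::UserAgent;

  my $browser = LWP::UserAgent->new;
  print "Default: ", $browser->agent, "\n";   # begins with "libwww-perl/"

  # Replace the User-Agent string entirely.
  $browser->agent('Mozilla/4.76 [en] (Win98; U)');
  my $replaced = $browser->agent;

  # A trailing space asks LWP to append its default string.
  # "MyApp/0.1" is a hypothetical product token.
  $browser->agent('MyApp/0.1 ');
  my $appended = $browser->agent;

  print "Replaced: $replaced\n";
  print "Appended: $appended\n";
</pre>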
<p>
</p>
<h2><a name="enabling_cookies">Enabling Cookies</a></h2>
<p>A default LWP::UserAgent object acts like a browser with its cookie
support turned off. You turn it on by setting its <code>cookie_jar</code>
attribute, which you can do in several ways. A &quot;cookie jar&quot; is an object representing
a little database of all the HTTP cookies that a browser knows about. It can correspond to a
file on disk (the way Netscape uses its <em>cookies.txt</em> file), or it can
be just an in-memory object that starts out empty, and whose collection of
cookies disappears once the program is finished running.</p>
<p>To give a browser an in-memory empty cookie jar, you set its <code>cookie_jar</code>
attribute like so:</p>
<pre>
  <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">cookie_jar</span><span class="operator">(</span><span class="operator">{}</span><span class="operator">);</span>
</pre>
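<p>Once set, the jar is an HTTP::Cookies object. LWP fills it from <code>Set-Cookie</code> headers in responses and uses it to add <code>Cookie</code> headers to later requests. To see that happen offline, this sketch stores a cookie by hand with HTTP::Cookies' <code>set_cookie</code> method, then asks the jar which header it would add to a request. The host, cookie name, and value are invented:</p>
<pre>
  use strict;
  use warnings;
  use LWP::UserAgent;
  use HTTP::Request;

  my $browser = LWP::UserAgent->new;
  $browser->cookie_jar({});          # empty, in-memory jar
  my $jar = $browser->cookie_jar;    # now an HTTP::Cookies object

  # Store a cookie as if www.example.com had set it (all values invented).
  $jar->set_cookie(
    0,                   # cookie version
    'session', 'abc123', # name and value
    '/',                 # path
    'www.example.com',   # domain
    undef,               # port
    0,                   # path_spec
    0,                   # secure
    3600,                # max age in seconds
    0,                   # discard
  );

  # Ask the jar which Cookie header a request to that host would carry.
  my $request = HTTP::Request->new(GET => 'http://www.example.com/');
  $jar->add_cookie_header($request);
  print "Cookie: ", $request->header('Cookie'), "\n";
</pre>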
<p>To give the browser a cookie jar that is read from a file on disk and saved
back to it when the program is finished running, set the <code>cookie_jar</code> attribute
like this:</p>
<pre>
  <span class="keyword">use</span> <span class="variable">HTTP::Cookies</span><span class="operator">;</span>
  <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">cookie_jar</span><span class="operator">(</span> <span class="variable">HTTP::Cookies</span><span class="operator">-&gt;</span><span class="variable">new</span><span class="operator">(</span>
    <span class="string">'file'</span> <span class="operator">=&gt;</span> <span class="string">'/some/where/cookies.lwp'</span><span class="operator">,</span>
        <span class="comment"># where to read/write cookies</span>
    <span class="string">'autosave'</span> <span class="operator">=&gt;</span> <span class="number">1</span><span class="operator">,</span>
        <span class="comment"># save it to disk when done</span>
  <span class="operator">));</span>
</pre>
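<p>With <code>autosave</code> turned on, the jar writes itself out when the object is destroyed. The sketch below shows that round trip in a temporary directory instead of <em>/some/where</em>, using a made-up cookie. One caution: cookies flagged &quot;discard&quot; (typically session cookies) are skipped when saving unless you also pass <code>'ignore_discard' =&gt; 1</code>, so this cookie is given a one-hour lifetime:</p>
<pre>
  use strict;
  use warnings;
  use File::Spec;
  use File::Temp qw(tempdir);
  use HTTP::Cookies;

  # A throwaway directory stands in for /some/where.
  my $dir  = tempdir(CLEANUP => 1);
  my $file = File::Spec->catfile($dir, 'cookies.lwp');

  {
    my $jar = HTTP::Cookies->new(file => $file, autosave => 1);
    # Invented cookie that expires in an hour, so it is not discarded.
    $jar->set_cookie(0, 'session', 'abc123', '/', 'www.example.com',
                     undef, 0, 0, 3600, 0);
  }   # $jar goes out of scope here, and autosave writes the file

  # A second jar pointed at the same file loads the saved cookie.
  my $reloaded = HTTP::Cookies->new(file => $file);
  print $reloaded->as_string;
</pre>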
<p>The file written this way is in an LWP-specific format. If you want to access the
cookies in your Netscape cookies file, you can use the
HTTP::Cookies::Netscape class:</p>
<pre>
  <span class="keyword">use</span> <span class="variable">HTTP::Cookies::Netscape</span><span class="operator">;</span>
    <span class="comment"># load the Netscape cookie-file class explicitly</span>
  
  <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">cookie_jar</span><span class="operator">(</span> <span class="variable">HTTP::Cookies::Netscape</span><span class="operator">-&gt;</span><span class="variable">new</span><span class="operator">(</span>
    <span class="string">'file'</span> <span class="operator">=&gt;</span> <span class="string">'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt'</span><span class="operator">,</span>
        <span class="comment"># where to read cookies</span>
  <span class="operator">));</span>
</pre>
<p>You could add an <code>'autosave' =&gt; 1</code> line as in the earlier example, but at
the time of writing, it's uncertain whether Netscape might discard some of
the cookies you write back to disk.</p>
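<p>You don't need a real Netscape profile for testing. The format is plain text: a header line, then one cookie per line in seven tab-separated fields. This sketch writes a tiny file like that to a temporary directory (the cookie is invented) and reads it back without <code>autosave</code>. It then uses the jar's <code>scan</code> method to list what was loaded:</p>
<pre>
  use strict;
  use warnings;
  use File::Spec;
  use File::Temp qw(tempdir);
  use HTTP::Cookies::Netscape;

  # Fields per line: domain, include-subdomains flag, path, secure flag,
  # expiry (epoch seconds), name, value. The cookie is invented.
  my $dir     = tempdir(CLEANUP => 1);
  my $file    = File::Spec->catfile($dir, 'cookies.txt');
  my $expires = time() + 3600;

  open(my $fh, '>', $file) or die "Can't write $file: $!";
  print {$fh} "# Netscape HTTP Cookie File\n";
  print {$fh} join("\t", 'www.example.com', 'FALSE', '/', 'FALSE',
                   $expires, 'session', 'abc123'), "\n";
  close($fh) or die "Can't close $file: $!";

  # Read it without autosave, so the file is never rewritten.
  my $jar = HTTP::Cookies::Netscape->new(file => $file);

  # scan() calls the code ref once for each stored cookie.
  my @found;
  $jar->scan(sub {
    my ($version, $key, $val, $path, $domain) = @_;
    push @found, "$domain $path $key=$val";
  });
  print "$_\n" for @found;
</pre>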
<p>
</p>
<h2><a name="posting_form_data">Posting Form Data</a></h2>
<p>Many HTML forms send data to their server using an HTTP POST request, which
you can send with this syntax:</p>
<pre>
 <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">post</span><span class="operator">(</span> <span class="variable">$url</span><span class="operator">,</span>
   <span class="operator">[</span>
     <span class="string">formkey1</span> <span class="operator">=&gt;</span> <span class="variable">value1</span><span class="operator">,</span> 
     <span class="string">formkey2</span> <span class="operator">=&gt;</span> <span class="variable">value2</span><span class="operator">,</span> 
     <span class="operator">...</span>
   <span class="operator">]</span><span class="operator">,</span>
 <span class="operator">);</span>
</pre>
<p>Or if you need to send HTTP headers:</p>
<pre>
 <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">post</span><span class="operator">(</span> <span class="variable">$url</span><span class="operator">,</span>
   <span class="operator">[</span>
     <span class="string">formkey1</span> <span class="operator">=&gt;</span> <span class="variable">value1</span><span class="operator">,</span> 
     <span class="string">formkey2</span> <span class="operator">=&gt;</span> <span class="variable">value2</span><span class="operator">,</span> 
     <span class="operator">...</span>
   <span class="operator">]</span><span class="operator">,</span>
   <span class="string">headerkey1</span> <span class="operator">=&gt;</span> <span class="variable">value1</span><span class="operator">,</span> 
   <span class="string">headerkey2</span> <span class="operator">=&gt;</span> <span class="variable">value2</span><span class="operator">,</span> 
 <span class="operator">);</span>
</pre>
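<p>As with <code>get</code>, you can inspect a POST request without sending it. In recent LWP versions, <code>post</code> builds its request with the <code>POST</code> function from HTTP::Request::Common, so this sketch calls that function directly. The URL, form keys, and <code>Referer</code> value are placeholders:</p>
<pre>
  use strict;
  use warnings;
  use HTTP::Request::Common qw(POST);

  # Build (but don't send) a POST request with form pairs and one header.
  # The URL, form keys, and Referer value are placeholders.
  my $request = POST(
    'http://www.example.com/search',
    [ formkey1 => 'value1', formkey2 => 'value2' ],
    'Referer' => 'http://www.example.com/',
  );

  print $request->method, "\n";                # POST
  print scalar($request->content_type), "\n";  # application/x-www-form-urlencoded
  print $request->content, "\n";               # the URL-encoded form pairs
</pre>
<p>The array reference keeps the form pairs in the order you list them, which a hash reference would not guarantee.</p>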
<p>For example, the following program makes a search request to AltaVista
(by sending some form data via an HTTP POST request), and extracts from
the HTML the report of the number of matches:</p>
<pre>
  <span class="keyword">use</span> <span class="variable">strict</span><span class="operator">;</span>
  <span class="keyword">use</span> <span class="variable">warnings</span><span class="operator">;</span>
  <span class="keyword">use</span> <span class="variable">LWP</span> <span class="number">5.64</span><span class="operator">;</span>
  <span class="keyword">my</span> <span class="variable">$browser</span> <span class="operator">=</span> <span class="variable">LWP::UserAgent</span><span class="operator">-&gt;</span><span class="variable">new</span><span class="operator">;</span>
  
  <span class="keyword">my</span> <span class="variable">$word</span> <span class="operator">=</span> <span class="string">'tarragon'</span><span class="operator">;</span>
  
  <span class="keyword">my</span> <span class="variable">$url</span> <span class="operator">=</span> <span class="string">'http://www.altavista.com/sites/search/web'</span><span class="operator">;</span>
  <span class="keyword">my</span> <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">post</span><span class="operator">(</span> <span class="variable">$url</span><span class="operator">,</span>
    <span class="operator">[</span> <span class="string">'q'</span> <span class="operator">=&gt;</span> <span class="variable">$word</span><span class="operator">,</span>  <span class="comment"># the AltaVista query string</span>
      <span class="string">'pg'</span> <span class="operator">=&gt;</span> <span class="string">'q'</span><span class="operator">,</span> <span class="string">'avkw'</span> <span class="operator">=&gt;</span> <span class="string">'tgz'</span><span class="operator">,</span> <span class="string">'kl'</span> <span class="operator">=&gt;</span> <span class="string">'XX'</span><span class="operator">,</span>
    <span class="operator">]</span>
  <span class="operator">);</span>
  <span class="keyword">die</span> <span class="string">"$url error: "</span><span class="operator">,</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">status_line</span>
   <span class="keyword">unless</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">is_success</span><span class="operator">;</span>
  <span class="keyword">die</span> <span class="string">"Weird content type at $url -- "</span><span class="operator">,</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">content_type</span>
   <span class="keyword">unless</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">content_type</span> <span class="keyword">eq</span> <span class="string">'text/html'</span><span class="operator">;</span>
</pre>
<pre>
  <span class="keyword">if</span><span class="operator">(</span> <span class="variable">$response</span><span class="operator">-&gt;</span><span class="variable">decoded_content</span> <span class="operator">=~</span> <span class="regex">m{AltaVista found ([0-9,]+) results}</span> <span class="operator">)</span> <span class="operator">{</span>
    <span class="comment"># The substring will be like "AltaVista found 2,345 results"</span>
    <span class="keyword">print</span> <span class="string">"$word: $1\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
  <span class="keyword">else</span> <span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Couldn't find the match-string in the response\n"</span><span class="operator">;</span>
  <span class="operator">}</span>
</pre>
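<p>The number captured in <code>$1</code> still contains its commas, so Perl won't treat &quot;2,345&quot; as 2345 in arithmetic. Here is the same match run against a made-up string (no network needed), with the commas stripped out first:</p>
<pre>
  use strict;
  use warnings;

  # An invented stand-in for $response->decoded_content.
  my $html = 'Results: AltaVista found 2,345 results for tarragon';

  my $count;
  if ($html =~ m{AltaVista found ([0-9,]+) results}) {
    ($count = $1) =~ tr/,//d;   # drop the thousands separators
    print "Matches: $count\n";
  }
  else {
    print "Couldn't find the match-string\n";
  }
</pre>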
<p>
</p>
<h2><a name="sending_get_form_data">Sending GET Form Data</a></h2>
<p>Some HTML forms convey their form data not by sending the data
in an HTTP POST request, but by making a normal GET request with
the data stuck on the end of the URL.  For example, if you went to
<code>imdb.com</code> and ran a search on &quot;Blade Runner&quot;, the URL you'd see
in your browser window would be:</p>
<pre>
  <a href="http://us.imdb.com/Tsearch?title=Blade%20Runner&amp;restrict=Movies+and+TV">http://us.imdb.com/Tsearch?title=Blade%20Runner&amp;restrict=Movies+and+TV</a></pre>
<p>To run the same search with LWP, you'd use this idiom, which involves
the URI class:</p>
<pre>
  <span class="keyword">use</span> <span class="variable">URI</span><span class="operator">;</span>
  <span class="keyword">my</span> <span class="variable">$url</span> <span class="operator">=</span> <span class="variable">URI</span><span class="operator">-&gt;</span><span class="variable">new</span><span class="operator">(</span> <span class="string">'http://us.imdb.com/Tsearch'</span> <span class="operator">);</span>
    <span class="comment"># makes an object representing the URL</span>
  
  <span class="variable">$url</span><span class="operator">-&gt;</span><span class="variable">query_form</span><span class="operator">(</span>  <span class="comment"># And here the form data pairs:</span>
    <span class="string">'title'</span>    <span class="operator">=&gt;</span> <span class="string">'Blade Runner'</span><span class="operator">,</span>
    <span class="string">'restrict'</span> <span class="operator">=&gt;</span> <span class="string">'Movies and TV'</span><span class="operator">,</span>
  <span class="operator">);</span>
  
  <span class="keyword">my</span> <span class="variable">$response</span> <span class="operator">=</span> <span class="variable">$browser</span><span class="operator">-&gt;</span><span class="variable">get</span><span class="operator">(</span><span class="variable">$url</span><span class="operator">);</span>
</pre>
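<p>To see the URL this builds, print the URI object, which stringifies to the full address. This offline sketch repeats the example above. Note that <code>query_form</code> encodes spaces as <code>+</code> rather than <code>%20</code>, which servers generally read the same way in a query string:</p>
<pre>
  use strict;
  use warnings;
  use URI;

  my $url = URI->new('http://us.imdb.com/Tsearch');
  $url->query_form(
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
  );

  # Prints the URL with the escaped form pairs appended after the "?".
  print "$url\n";

  # Called with no arguments, query_form returns the decoded pairs.
  my %form = $url->query_form;
  print "title is: $form{title}\n";   # Blade Runner
</pre>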
<p>See chapter 5 of <em>Perl &amp; LWP</em> for a longer discussion of HTML forms
and of form data, and chapters 6 through 9 for a longer discussion of
extracting data from HTML.</p>
<p>
</p>
<h2><a name="absolutizing_urls">Absolutizing URLs</a></h2>
<p>The URI class that we just mentioned above provides all sorts of methods
for accessing and modifying parts of URLs (such as asking what sort of URL it
is with <code>$url-&gt;scheme</code>, asking what host it refers to with
<code>$url-&gt;host</code>, and so on), as described in <a href="../lib/URI.html">the docs for the URI class</a>.  However, the methods of most immediate interest
are the <code>query_form</code> method seen above, and now the <code>new_abs</code> method
for taking a probably-relative URL string (like &quot;../foo.html&quot;) and getting
back an absolute URL (like &quot;http://www.perl.com/stuff/foo.html&quot;), as
shown here:</p>
