.PP
For example, here's how to send some more Netscape-like headers, in case
you're dealing with a site that would otherwise reject your request:
.PP
.Vb 6
\&  my @ns_headers = (
\&   \*(AqUser\-Agent\*(Aq => \*(AqMozilla/4.76 [en] (Win98; U)\*(Aq,
\&   \*(AqAccept\*(Aq => \*(Aqimage/gif, image/x\-xbitmap, image/jpeg, image/pjpeg, image/png, */*\*(Aq,
\&   \*(AqAccept\-Charset\*(Aq => \*(Aqiso\-8859\-1,*,utf\-8\*(Aq,
\&   \*(AqAccept\-Language\*(Aq => \*(Aqen\-US\*(Aq,
\&  );
\&
\&  ...
\&
\&  $response = $browser\->get($url, @ns_headers);
.Ve
.PP
If you weren't reusing that array, you could just go ahead and do this:
.PP
.Vb 6
\&  $response = $browser\->get($url,
\&   \*(AqUser\-Agent\*(Aq => \*(AqMozilla/4.76 [en] (Win98; U)\*(Aq,
\&   \*(AqAccept\*(Aq => \*(Aqimage/gif, image/x\-xbitmap, image/jpeg, image/pjpeg, image/png, */*\*(Aq,
\&   \*(AqAccept\-Charset\*(Aq => \*(Aqiso\-8859\-1,*,utf\-8\*(Aq,
\&   \*(AqAccept\-Language\*(Aq => \*(Aqen\-US\*(Aq,
\&  );
.Ve
.PP
If you were only ever changing the 'User\-Agent' line, you could just change
the \f(CW$browser\fR object's default line from \*(L"libwww\-perl/5.65\*(R" (or the like)
to whatever you like, using the LWP::UserAgent \f(CW\*(C`agent\*(C'\fR method:
.PP
.Vb 1
\&   $browser\->agent(\*(AqMozilla/4.76 [en] (Win98; U)\*(Aq);
.Ve
.Sh "Enabling Cookies"
.IX Subsection "Enabling Cookies"
A default LWP::UserAgent object acts like a browser with its cookie
support turned off. There are various ways of turning it on, by setting
its \f(CW\*(C`cookie_jar\*(C'\fR attribute. A \*(L"cookie jar\*(R" is an object representing
a little database of all
the \s-1HTTP\s0 cookies that a browser can know about.
It can correspond to a
file on disk (the way Netscape uses its \fIcookies.txt\fR file), or it can
be just an in-memory object that starts out empty, and whose collection of
cookies will disappear once the program is finished running.
.PP
To give a browser an in-memory empty cookie jar, you set its \f(CW\*(C`cookie_jar\*(C'\fR
attribute like so:
.PP
.Vb 1
\&  $browser\->cookie_jar({});
.Ve
.PP
To give it a cookie jar that will be read from a file on disk, and will be saved
to it when the program is finished running, set the \f(CW\*(C`cookie_jar\*(C'\fR attribute
like this:
.PP
.Vb 7
\&  use HTTP::Cookies;
\&  $browser\->cookie_jar( HTTP::Cookies\->new(
\&    \*(Aqfile\*(Aq => \*(Aq/some/where/cookies.lwp\*(Aq,
\&        # where to read/write cookies
\&    \*(Aqautosave\*(Aq => 1,
\&        # save it to disk when done
\&  ));
.Ve
.PP
That file will be in an LWP-specific format. If you want to access the
cookies in your Netscape cookies file, you can use the
HTTP::Cookies::Netscape class:
.PP
.Vb 2
\&  use HTTP::Cookies;
\&    # yes, loads HTTP::Cookies::Netscape too
\&
\&  $browser\->cookie_jar( HTTP::Cookies::Netscape\->new(
\&    \*(Aqfile\*(Aq => \*(Aqc:/Program Files/Netscape/Users/DIR\-NAME\-HERE/cookies.txt\*(Aq,
\&        # where to read cookies
\&  ));
.Ve
.PP
You could add an \f(CW\*(C`\*(Aqautosave\*(Aq => 1\*(C'\fR line as further above, but at
time of writing, it's uncertain whether Netscape might discard some of
the cookies you could be writing back to disk.
.Sh "Posting Form Data"
.IX Subsection "Posting Form Data"
Many \s-1HTML\s0 forms send data to their server using an \s-1HTTP\s0 \s-1POST\s0 request, which
you can send with this syntax:
.PP
.Vb 7
\&  $response = $browser\->post( $url,
\&    [
\&      formkey1 => value1,
\&      formkey2 => value2,
\&      ...
\&    ],
\&  );
.Ve
.PP
Or if you need to send \s-1HTTP\s0 headers:
.PP
.Vb 9
\&  $response = $browser\->post( $url,
\&    [
\&      formkey1 => value1,
\&      formkey2 => value2,
\&      ...
\&    ],
\&    headerkey1 => value1,
\&    headerkey2 => value2,
\&  );
.Ve
.PP
For example, the following program makes a search request to AltaVista
(by sending
some form data via an \s-1HTTP\s0 \s-1POST\s0 request), and extracts from
the \s-1HTML\s0 the report of the number of matches:
.PP
.Vb 4
\&  use strict;
\&  use warnings;
\&  use LWP 5.64;
\&  my $browser = LWP::UserAgent\->new;
\&
\&  my $word = \*(Aqtarragon\*(Aq;
\&
\&  my $url = \*(Aqhttp://www.altavista.com/sites/search/web\*(Aq;
\&  my $response = $browser\->post( $url,
\&    [ \*(Aqq\*(Aq => $word,  # the Altavista query string
\&      \*(Aqpg\*(Aq => \*(Aqq\*(Aq, \*(Aqavkw\*(Aq => \*(Aqtgz\*(Aq, \*(Aqkl\*(Aq => \*(AqXX\*(Aq,
\&    ]
\&  );
\&  die "$url error: ", $response\->status_line
\&   unless $response\->is_success;
\&  die "Weird content type at $url \-\- ", $response\->content_type
\&   unless $response\->content_type eq \*(Aqtext/html\*(Aq;
\&
\&  if( $response\->decoded_content =~ m{AltaVista found ([0\-9,]+) results} ) {
\&    # The substring will be like "AltaVista found 2,345 results"
\&    print "$word: $1\en";
\&  }
\&  else {
\&    print "Couldn\*(Aqt find the match\-string in the response\en";
\&  }
.Ve
.Sh "Sending \s-1GET\s0 Form Data"
.IX Subsection "Sending GET Form Data"
Some \s-1HTML\s0 forms convey their form data not by sending the data
in an \s-1HTTP\s0 \s-1POST\s0 request, but by making a normal \s-1GET\s0 request with
the data stuck on the end of the \s-1URL\s0.
For example, if you went to
\&\f(CW\*(C`imdb.com\*(C'\fR and ran a search on \*(L"Blade Runner\*(R", the \s-1URL\s0 you'd see
in your browser window would be:
.PP
.Vb 1
\&  http://us.imdb.com/Tsearch?title=Blade%20Runner&restrict=Movies+and+TV
.Ve
.PP
To run the same search with \s-1LWP\s0, you'd use this idiom, which involves
the \s-1URI\s0 class:
.PP
.Vb 3
\&  use URI;
\&  my $url = URI\->new( \*(Aqhttp://us.imdb.com/Tsearch\*(Aq );
\&    # makes an object representing the URL
\&
\&  $url\->query_form(  # And here the form data pairs:
\&    \*(Aqtitle\*(Aq    => \*(AqBlade Runner\*(Aq,
\&    \*(Aqrestrict\*(Aq => \*(AqMovies and TV\*(Aq,
\&  );
\&
\&  my $response = $browser\->get($url);
.Ve
.PP
See chapter 5 of \fIPerl & \s-1LWP\s0\fR for a longer discussion of \s-1HTML\s0 forms
and of form data, and chapters 6 through 9 for a longer discussion of
extracting data from \s-1HTML\s0.
.Sh "Absolutizing URLs"
.IX Subsection "Absolutizing URLs"
The \s-1URI\s0 class that we just mentioned above provides all sorts of methods
for accessing and modifying parts of URLs (such as asking what sort of \s-1URL\s0 it
is with \f(CW\*(C`$url\->scheme\*(C'\fR, and asking what host it refers to with
\&\f(CW\*(C`$url\->host\*(C'\fR, and so on), as described in the docs for the \s-1URI\s0
class.
However, the methods of most immediate interest
are the \f(CW\*(C`query_form\*(C'\fR method seen above, and now the \f(CW\*(C`new_abs\*(C'\fR method
for taking a probably-relative \s-1URL\s0 string (like \*(L"../foo.html\*(R") and getting
back an absolute \s-1URL\s0 (like \*(L"http://www.perl.com/stuff/foo.html\*(R"), as
shown here:
.PP
.Vb 2
\&  use URI;
\&  $abs = URI\->new_abs($maybe_relative, $base);
.Ve
.PP
For example, consider this program that matches URLs in the \s-1HTML\s0
list of new modules in \s-1CPAN:\s0
.PP
.Vb 4
\&  use strict;
\&  use warnings;
\&  use LWP;
\&  my $browser = LWP::UserAgent\->new;
\&
\&  my $url = \*(Aqhttp://www.cpan.org/RECENT.html\*(Aq;
\&  my $response = $browser\->get($url);
\&  die "Can\*(Aqt get $url \-\- ", $response\->status_line
\&   unless $response\->is_success;
\&
\&  my $html = $response\->decoded_content;
\&  while( $html =~ m/<A HREF=\e"(.*?)\e"/g ) {
\&    print "$1\en";
\&  }
.Ve
.PP
When run, it emits output that starts out something like this:
.PP
.Vb 7
\&  MIRRORING.FROM
\&  RECENT
\&  RECENT.html
\&  authors/00whois.html
\&  authors/01mailrc.txt.gz
\&  authors/id/A/AA/AASSAD/CHECKSUMS
\&  ...
.Ve
.PP
However, if you actually want to have those be absolute URLs, you
can use the \s-1URI\s0 module's \f(CW\*(C`new_abs\*(C'\fR method, by changing the \f(CW\*(C`while\*(C'\fR
loop to this:
.PP
.Vb 3
\&  while( $html =~ m/<A HREF=\e"(.*?)\e"/g ) {
\&    print URI\->new_abs( $1, $response\->base ) ,"\en";
\&  }
.Ve
.PP
(The \f(CW\*(C`$response\->base\*(C'\fR method from HTTP::Response
is for returning what \s-1URL\s0
should be used for resolving relative URLs \*(-- it's usually just
the same as the \s-1URL\s0 that you requested.)
.PP
That program then emits nicely absolute URLs:
.PP
.Vb 7
\&  http://www.cpan.org/MIRRORING.FROM
\&  http://www.cpan.org/RECENT
\&  http://www.cpan.org/RECENT.html
\&  http://www.cpan.org/authors/00whois.html
\&  http://www.cpan.org/authors/01mailrc.txt.gz
\&  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
\&  ...
.Ve
.PP
See chapter 4 of \fIPerl & \s-1LWP\s0\fR for a longer discussion of
\s-1URI\s0 objects.
.PP
Of course, using a regexp to match hrefs is a bit simplistic, and for
more robust programs, you'll probably want to use an HTML-parsing module
like HTML::LinkExtor or HTML::TokeParser or even maybe
HTML::TreeBuilder.
.Sh "Other Browser Attributes"
.IX Subsection "Other Browser Attributes"
LWP::UserAgent objects have many attributes for controlling how they
work. Here are a few notable ones:
.IP "\(bu" 4
\&\f(CW\*(C`$browser\->timeout(15);\*(C'\fR
.Sp
This sets this browser object to give up on requests that don't answer
within 15 seconds.
.IP "\(bu" 4
\&\f(CW\*(C`$browser\->protocols_allowed( [ \*(Aqhttp\*(Aq, \*(Aqgopher\*(Aq] );\*(C'\fR
.Sp
This sets this browser object to not speak any protocols other than \s-1HTTP\s0
and gopher. If it tries accessing any other kind of \s-1URL\s0 (like an \*(L"ftp:\*(R"
or \*(L"mailto:\*(R" or \*(L"news:\*(R" \s-1URL\s0), then it won't actually try connecting, but
instead will immediately return an error code 500, with a message like
\&\*(L"Access to 'ftp' URIs has been disabled\*(R".
.IP "\(bu" 4
\&\f(CW\*(C`use LWP::ConnCache; $browser\->conn_cache(LWP::ConnCache\->new());\*(C'\fR
.Sp
This tells the browser object to try using the \s-1HTTP/1\s0.1 \*(L"Keep-Alive\*(R"
feature, which speeds up requests by reusing the same socket connection
for multiple requests to the same server.
.IP "\(bu" 4
\&\f(CW\*(C`$browser\->agent( \*(AqSomeName/1.23 (more info here maybe)\*(Aq )\*(C'\fR
.Sp
This changes how the browser object will identify itself in
the default \*(L"User-Agent\*(R" line in its \s-1HTTP\s0 requests. By default,
it'll send \*(L"libwww\-perl/\fIversionnumber\fR\*(R", like
\&\*(L"libwww\-perl/5.65\*(R". You can change that to something more descriptive
like this:
.Sp
.Vb 1
\&  $browser\->agent( \*(AqSomeName/3.14 (contact@robotplexus.int)\*(Aq );
.Ve
.Sp