📄 lwptut.pod

  http://www.cpan.org/authors/00whois.html
  http://www.cpan.org/authors/01mailrc.txt.gz
  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
  ...

See chapter 4 of I<Perl & LWP> for a longer discussion of URI objects.

Of course, using a regexp to match hrefs is a bit simplistic, and for
more robust programs, you'll probably want to use an HTML-parsing module
like L<HTML::LinkExtor> or L<HTML::TokeParser> or even maybe
L<HTML::TreeBuilder>.

=for comment
 ##########################################################################

=head2 Other Browser Attributes

LWP::UserAgent objects have many attributes for controlling how they
work.  Here are a few notable ones:

=over

=item *

C<< $browser->timeout(15); >>

This sets this browser object to give up on requests that don't answer
within 15 seconds.

=item *

C<< $browser->protocols_allowed( [ 'http', 'gopher'] ); >>

This sets this browser object to not speak any protocols other than HTTP
and gopher. If it tries accessing any other kind of URL (like an "ftp:"
or "mailto:" or "news:" URL), then it won't actually try connecting, but
instead will immediately return an error code 500, with a message like
"Access to 'ftp' URIs has been disabled".

=item *

C<< use LWP::ConnCache; $browser->conn_cache(LWP::ConnCache->new()); >>

This tells the browser object to try using the HTTP/1.1 "Keep-Alive"
feature, which speeds up requests by reusing the same socket connection
for multiple requests to the same server.

=item *

C<< $browser->agent( 'SomeName/1.23 (more info here maybe)' ) >>

This changes how the browser object will identify itself in
the default "User-Agent" line in its HTTP requests.  By default,
it'll send "libwww-perl/I<versionnumber>", like
"libwww-perl/5.65".
You can change that to something more descriptive
like this:

  $browser->agent( 'SomeName/3.14 (contact@robotplexus.int)' );

Or if need be, you can go in disguise, like this:

  $browser->agent( 'Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)' );

=item *

C<< push @{ $browser->requests_redirectable }, 'POST'; >>

This tells this browser to obey redirection responses to POST requests
(like most modern interactive browsers), even though the HTTP RFC says
that should not normally be done.

=back

For more options and information, see
L<the full documentation for LWP::UserAgent|LWP::UserAgent>.

=for comment
 ##########################################################################

=head2 Writing Polite Robots

If you want to make sure that your LWP-based program respects F<robots.txt>
files and doesn't make too many requests too fast, you can use the
LWP::RobotUA class instead of the LWP::UserAgent class.

The LWP::RobotUA class is just like LWP::UserAgent, and you can use it
like so:

  use LWP::RobotUA;
  my $browser = LWP::RobotUA->new('YourSuperBot/1.34', 'you@yoursite.com');
    # Your bot's name and your email address

  my $response = $browser->get($url);

But LWP::RobotUA adds these features:

=over

=item *

If the F<robots.txt> on C<$url>'s server forbids you from accessing
C<$url>, then the C<$browser> object (assuming it's of class LWP::RobotUA)
won't actually request it, but instead will give you back (in
C<$response>) a 403 error with a message "Forbidden by robots.txt".
That is, if you have this line:

  die "$url -- ", $response->status_line, "\nAborted"
   unless $response->is_success;

then the program would die with an error message like this:

  http://whatever.site.int/pith/x.html -- 403 Forbidden by robots.txt
  Aborted at whateverprogram.pl line 1234

=item *

If this C<$browser> object sees that the last time it talked to
C<$url>'s server was too recent, then it will pause (via C<sleep>) to
avoid making too many requests too often.
By default it pauses for one minute, but you can control the length of
the pause with the C<< $browser->delay( I<minutes> ) >> attribute.
For example, this code:

  $browser->delay( 7/60 );

...means that this browser will pause when it needs to avoid talking to
any given server more than once every 7 seconds.

=back

For more options and information, see
L<the full documentation for LWP::RobotUA|LWP::RobotUA>.

=for comment
 ##########################################################################

=head2 Using Proxies

In some cases, you will want to (or will have to) use proxies for
accessing certain sites and/or using certain protocols. This is most
commonly the case when your LWP program is running (or could be running)
on a machine that is behind a firewall.

To make a browser object use proxies that are defined in the usual
environment variables (C<HTTP_PROXY>, etc.), just call the C<env_proxy>
method on a user-agent object before you go making any requests on it.
Specifically:

  use LWP::UserAgent;
  my $browser = LWP::UserAgent->new;

  # And before you go making any requests:
  $browser->env_proxy;

For more information on proxy parameters, see
L<the LWP::UserAgent documentation|LWP::UserAgent>, specifically the
C<proxy>, C<env_proxy>, and C<no_proxy> methods.

=for comment
 ##########################################################################

=head2 HTTP Authentication

Many web sites restrict access to documents by using "HTTP
Authentication".
This isn't just any form of "enter your password"
restriction, but is a specific mechanism where the HTTP server sends the
browser an HTTP code that says "That document is part of a protected
'realm', and you can access it only if you re-request it and add some
special authorization headers to your request".

For example, the Unicode.org admins stop email-harvesting bots from
harvesting the contents of their mailing list archives, by protecting
them with HTTP Authentication, and then publicly stating the username
and password (at C<http://www.unicode.org/mail-arch/>) -- namely
username "unicode-ml" and password "unicode".

Consider this URL, which is part of the protected
area of the web site:

  http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html

If you access that with a browser, you'll get a prompt
like "Enter username and password for 'Unicode-MailList-Archives' at server
'www.unicode.org'".

In LWP, if you just request that URL, like this:

  use LWP;
  my $browser = LWP::UserAgent->new;

  my $url =
   'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
  my $response = $browser->get($url);

  die "Error: ", $response->header('WWW-Authenticate') || 'Error accessing',
    #  (the 'WWW-Authenticate' header carries the scheme and realm name)
    "\n ", $response->status_line, "\n at $url\n Aborting"
   unless $response->is_success;

Then you'll get this error:

  Error: Basic realm="Unicode-MailList-Archives"
   401 Authorization Required
   at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
   Aborting at auth1.pl line 9.  [or wherever]

...because the C<$browser> doesn't know the username and password
for that realm ("Unicode-MailList-Archives") at that host
("www.unicode.org").  The simplest way to let the browser know about this
is to use the C<credentials> method to let it know about a username and
password that it can try using for that realm at that host.
The syntax is:

  $browser->credentials(
    'servername:portnumber',
    'realm-name',
    'username' => 'password'
  );

In most cases, the port number is 80, the default TCP/IP port for HTTP; and
you usually call the C<credentials> method before you make any requests.
For example:

  $browser->credentials(
    'reports.mybazouki.com:80',
    'web_server_usage_reports',
    'plinky' => 'banjo123'
  );

So if we add the following to the program above, right after the
C<< $browser = LWP::UserAgent->new; >> line...

  $browser->credentials(  # add this to our $browser 's "key ring"
    'www.unicode.org:80',
    'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode'
  );

...then when we run it, the request succeeds, instead of causing the
C<die> to be called.

=for comment
 ##########################################################################

=head2 Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL
would -- if your LWP installation has HTTPS support (via an appropriate
Secure Sockets Layer library).  For example:

  use LWP;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked!  I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the
response will be unsuccessful, and you'll get this error message:

  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.
   [or whatever program and line]

If your LWP installation I<does> have HTTPS support installed, then the
response should be successful, and you should be able to consult
C<$response> just like with any normal HTTP response.

For information about installing HTTPS support for your LWP
installation, see the helpful F<README.SSL> file that comes in the
libwww-perl distribution.

=for comment
 ##########################################################################

=head2 Getting Large Documents

When you're requesting a large (or at least potentially large) document,
a problem with the normal way of using the request methods (like
C<< $response = $browser->get($url) >>) is that the response object in
memory will have to hold the whole document -- I<in memory>. If the
response is a thirty-megabyte file, this is likely to be quite an
imposition on this process's memory usage.

A notable alternative is to have LWP save the content to a file on disk,
instead of saving it up in memory.  This is the syntax to use:

  $response = $browser->get($url,
                         ':content_file' => $filespec,
                      );

For example,

  $response = $browser->get('http://search.cpan.org/',
                         ':content_file' => '/tmp/sco.html'
                      );

When you use this C<:content_file> option, the C<$response> will have
all the normal header lines, but C<< $response->content >> will be
empty.

Note that this ":content_file" option isn't supported under older
versions of LWP, so you should consider adding C<use LWP 5.66;> to check
the LWP version, if you think your program might run on systems with
older versions.

If you need to be compatible with older LWP versions, then use
this syntax, which does the same thing:

  use HTTP::Request::Common;
  $response = $browser->request( GET($url), $filespec );

=for comment
 ##########################################################################

=head1 SEE ALSO

Remember, this article is just the most rudimentary introduction to
LWP -- to learn more about LWP and
LWP-related tasks, you really
must read from the following:

=over

=item *

L<LWP::Simple> -- simple functions for getting/heading/mirroring URLs

=item *

L<LWP> -- overview of the libwww-perl modules

=item *

L<LWP::UserAgent> -- the class for objects that represent "virtual browsers"

=item *

L<HTTP::Response> -- the class for objects that represent the response to
an LWP request, as in C<< $response = $browser->get(...) >>

=item *

L<HTTP::Message> and L<HTTP::Headers> -- classes that provide more methods
to HTTP::Response.

=item *

L<URI> -- class for objects that represent absolute or relative URLs

=item *

L<URI::Escape> -- functions for URL-escaping and URL-unescaping strings
(like turning "this & that" to and from "this%20%26%20that").

=item *

L<HTML::Entities> -- functions for HTML-escaping and HTML-unescaping strings
(like turning "C. & E. BrontE<euml>" to and from "C. &amp; E. Bront&euml;")

=item *

L<HTML::TokeParser> and L<HTML::TreeBuilder> -- classes for parsing HTML

=item *

L<HTML::LinkExtor> -- class for finding links in HTML documents

=item *

The book I<Perl & LWP> by Sean M. Burke.  O'Reilly & Associates, 2002.
ISBN: 0-596-00178-9.  C<http://www.oreilly.com/catalog/perllwp/>

=back

=head1 COPYRIGHT

Copyright 2002, Sean M. Burke.  You can redistribute this document and/or
modify it, but only under the same terms as Perl itself.

=head1 AUTHOR

Sean M. Burke C<sburke@cpan.org>

=for comment
 ##########################################################################

=cut

# End of Pod