<HTML><HEAD><TITLE>Recipe 20.8. Finding Fresh Links (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:45:59Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch20_01.htm" TITLE="20. Web Automation">
<LINK REL="prev" HREF="ch20_08.htm" TITLE="20.7. Finding Stale Links">
<LINK REL="next" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates"></HEAD>
<BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_08.htm" TITLE="20.7. Finding Stale Links"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 20.7. Finding Stale Links" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch20_01.htm" TITLE="20. Web Automation"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 20.9. Creating HTML Templates" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch20-37341">20.8. 
Finding Fresh Links</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-861">Problem<A CLASS="indexterm" NAME="ch20-idx-1000002657-0"></A></A></H3>
<P CLASS="para"><A CLASS="indexterm" NAME="ch20-idx-1000003777-0"></A><A CLASS="indexterm" NAME="ch20-idx-1000003777-1"></A>Given a list of URLs, you want to determine which have been most recently modified.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-867">Solution</A></H3>
<P CLASS="para">The program in <A CLASS="xref" HREF="ch20_09.htm#ch20-35690" TITLE="surl">Example 20.6</A> reads URLs from standard input, sorts them by last-modified date, and prints them to standard output with those dates prepended.</P>
<DIV CLASS="example"><H4 CLASS="example"><A CLASS="title" NAME="ch20-35690">Example 20.6: surl</A></H4>
<PRE CLASS="programlisting">#!/usr/bin/perl -w
# surl - sort URLs by their last modification date

use LWP::UserAgent;
use HTTP::Request;
use URI::URL qw(url);

my($url, %Date);
my $ua = LWP::UserAgent-&gt;new();

# read one URL per line from the files named in @ARGV, or from STDIN
while ( $url = url(scalar &lt;&gt;) ) {
    my $ans;
    # handle only file, http, and https URLs
    next unless $url-&gt;scheme =~ /^(file|https?)$/;
    $ans = $ua-&gt;request(HTTP::Request-&gt;new("HEAD", $url));
    if ($ans-&gt;is_success) {
        $Date{$url} = $ans-&gt;last_modified || 0;  # 0 means unknown
    } else {
        print STDERR "$url: Error [", $ans-&gt;code, "] ", $ans-&gt;message, "!\n";
    }
}

# print newest first; URLs with no date sort last
foreach $url ( sort { $Date{$b} &lt;=&gt; $Date{$a} } keys %Date ) {
    printf "%-25s %s\n", $Date{$url}
         ? (scalar localtime $Date{$url})
         : "&lt;NONE SPECIFIED&gt;", $url;
}</PRE></DIV></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-923">Discussion</A></H3>
<P CLASS="para">The <A CLASS="indexterm" NAME="ch20-idx-1000002659-0"></A>surl script works more like a traditional filter program. It reads one URL per line from standard input. (Actually, it reads from &lt;<CODE CLASS="literal">ARGV</CODE>&gt;, which defaults to STDIN if <CODE CLASS="literal">@ARGV</CODE> is empty.) The last-modified date of each URL is fetched using a HEAD request. That date is stored in a hash, with the URL as the key. 
Then a simple sort by value is run on the hash to reorder the URLs by date, newest first. On output, the internal date (seconds since the epoch) is converted into <CODE CLASS="literal">localtime</CODE> format.</P>
<P CLASS="para">Here's an example that uses the xurl program from the earlier recipe to extract the URLs, then pipes that program's output into surl.</P>
<PRE CLASS="programlisting">% xurl http://www.perl.com/ | surl | head
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Mon Apr 20 06:16:02 1998  http://electriclichen.com/linux/srom.html</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Fri Apr 17 13:38:51 1998  http://www.oreilly.com/</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Fri Mar 13 12:16:47 1998  http://www2.binevolve.com/</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Sun Mar  8 21:01:27 1998  http://www.perl.org/</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Tue Nov 18 13:41:32 1997  http://www.perl.com/universal/header.map</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Wed Oct  1 12:55:13 1997  http://www.songline.com/</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Sun Aug 17 21:43:51 1997  http://www.perl.com/graphics/perlhome_header.jpg</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Sun Aug 17 21:43:47 1997  http://www.perl.com/graphics/perl_id_313c.gif</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Sun Aug 17 21:43:46 1997  http://www.perl.com/graphics/ora_logo.gif</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Sun Aug 17 21:43:44 1997  http://www.perl.com/graphics/header-nav.gif</I></CODE></B></CODE></PRE>
<P CLASS="para">Having a variety of small programs that each do one thing and that can be combined into more powerful constructs is the hallmark of good programming. 
You could even argue that xurl should work on files, and that some other program should actually fetch the URL's contents over the Web to feed into xurl, churl, or surl. That program would probably be called gurl, except that such a program already exists: the LWP module suite has a program called lwp-request with aliases HEAD, GET, and POST for running those operations from shell <A CLASS="indexterm" NAME="ch20-idx-1000003779-0"></A><A CLASS="indexterm" NAME="ch20-idx-1000003779-1"></A>scripts.<A CLASS="indexterm" NAME="ch20-idx-1000002653-0"></A><A CLASS="indexterm" NAME="ch20-idx-1000002653-1"></A><A CLASS="indexterm" NAME="ch20-idx-1000002653-2"></A><A CLASS="indexterm" NAME="ch20-idx-1000002653-3"></A><A CLASS="indexterm" NAME="ch20-idx-1000002653-4"></A></P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-955">See Also</A></H3>
<P CLASS="para">The documentation for the CPAN modules LWP::UserAgent, HTTP::Request, and URI::URL; <A CLASS="xref" HREF="ch20_08.htm" TITLE="Finding Stale Links">Recipe 20.7</A></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_08.htm" TITLE="20.7. Finding Stale Links"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 20.7. Finding Stale Links" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 20.9. Creating HTML Templates" BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">20.7. 
Finding Stale Links</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">20.9. Creating HTML Templates</TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright © 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>