By Tom Christiansen and Nathan Torkington. ISBN 1-56592-243-3. First Edition, published August 1998.
<HTML><HEAD><TITLE>Recipe 20.10. Mirroring Web Pages (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:46:01Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch20_01.htm" TITLE="20. Web Automation">
<LINK REL="prev" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates">
<LINK REL="next" HREF="ch20_12.htm" TITLE="20.11. Creating a Robot">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p>
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 20.9. Creating HTML Templates" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch20_01.htm" TITLE="20. Web Automation"></A></FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_12.htm" TITLE="20.11. Creating a Robot"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 20.11. Creating a Robot" BORDER="0"></A></TD>
</TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch20-21335">20.10. 
Mirroring Web Pages</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-1205">Problem<A CLASS="indexterm" NAME="ch20-idx-1000002666-0"></A><A CLASS="indexterm" NAME="ch20-idx-1000002666-1"></A><A CLASS="indexterm" NAME="ch20-idx-1000002666-2"></A><A CLASS="indexterm" NAME="ch20-idx-1000002666-3"></A></A></H3>
<P CLASS="para">You want to keep a local copy of a web page up-to-date.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-1211">Solution</A></H3>
<P CLASS="para">Use LWP::Simple's <CODE CLASS="literal">mirror</CODE> function:</P>
<PRE CLASS="programlisting">use LWP::Simple;
mirror($URL, $local_filename);</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-1221">Discussion</A></H3>
<P CLASS="para">Although closely related to the <CODE CLASS="literal">get</CODE> function discussed in <A CLASS="xref" HREF="ch20_02.htm" TITLE="Fetching a URL from a Perl Script">Recipe 20.1</A>, the <CODE CLASS="literal">mirror</CODE> function doesn't download the file unconditionally. It adds the <CODE CLASS="literal">If-Modified-Since</CODE><A CLASS="indexterm" NAME="ch20-idx-1000003787-0"></A> header to the GET request it creates, so the server will not transfer the file unless it has changed since the local copy was last modified.</P>
<P CLASS="para">The <CODE CLASS="literal">mirror</CODE> function mirrors only a single page, not a full tree. To mirror a set of pages, use this recipe in conjunction with <A CLASS="xref" HREF="ch20_04.htm" TITLE="Extracting URLs">Recipe 20.3</A>. For mirroring an entire remote tree, the w3mir program, also available on CPAN, is a good solution.</P>
<P CLASS="para">Be careful! It's possible (and easy) to write programs that run amok and begin downloading all web pages on the net. This is not only poor etiquette, it's also an infinite task, since some pages are dynamically generated. 
It could also get you into trouble with someone who doesn't want their pages downloaded en masse.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch20-pgfId-1231">See Also</A></H3>
<P CLASS="para">The documentation for the CPAN module LWP::Simple; the HTTP specification at <A CLASS="systemitem.url" HREF="http://www.w3.org/pub/WWW/Protocols/HTTP/">http://www.w3.org/pub/WWW/Protocols/HTTP/</A></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer">
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_10.htm" TITLE="20.9. Creating HTML Templates"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 20.9. Creating HTML Templates" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch20_12.htm" TITLE="20.11. Creating a Robot"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 20.11. Creating a Robot" BORDER="0"></A></TD>
</TR><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">20.9. Creating HTML Templates</TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">20.11. Creating a Robot</TD>
</TR></TABLE>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p>
<map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
