<HTML><HEAD><TITLE>Recipe 6.21. Program: urlify (Perl Cookbook)</TITLE><META NAME="DC.title" CONTENT="Perl Cookbook"><META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington"><META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc."><META NAME="DC.date" CONTENT="1999-07-02T01:35:08Z"><META NAME="DC.type" CONTENT="Text.Monograph"><META NAME="DC.format" CONTENT="text/html" SCHEME="MIME"><META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN"><META NAME="DC.language" CONTENT="en-US"><META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0"><LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments"><LINK REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching"><LINK REL="prev" HREF="ch06_21.htm" TITLE="6.20. Matching Abbreviations"><LINK REL="next" HREF="ch06_23.htm" TITLE="6.22. Program: tcgrep"></HEAD><BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_21.htm" TITLE="6.20. Matching Abbreviations"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.20. Matching Abbreviations" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_23.htm" TITLE="6.22. Program: tcgrep"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.22. Program: tcgrep" BORDER="0"></A></TD></TR></TABLE></DIV><DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch06-chap06_program_0">6.21. 
Program: urlify</A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="ch06-idx-1000007729-0"></A><A CLASS="indexterm" NAME="ch06-idx-1000007729-1"></A><A CLASS="indexterm" NAME="ch06-idx-1000007729-2"></A><A CLASS="indexterm" NAME="ch06-idx-1000007729-3"></A><A CLASS="indexterm" NAME="ch06-idx-1000007729-4"></A>This program wraps HTML links around URLs in text files. It doesn't handle every possible URL, but it does catch the most common ones, and it tries hard to keep end-of-sentence punctuation out of the marked-up URL.</P>
<P CLASS="para">It is a typical Perl filter, so you can feed it data on standard input:</P>
<PRE CLASS="programlisting">% gunzip -c ~/mail/archive.gz | urlify &gt; archive.urlified</PRE>
<P CLASS="para">or name the files to process on the command line:</P>
<PRE CLASS="programlisting">% urlify ~/mail/*.inbox &gt; ~/allmail.urlified</PRE>
<P CLASS="para">The program is shown in <A CLASS="xref" HREF="ch06_22.htm#ch06-24264" TITLE="urlify">Example 6.13</A>.</P>
<DIV CLASS="example"><H4 CLASS="example"><A CLASS="title" NAME="ch06-24264">Example 6.13: urlify</A></H4>
<PRE CLASS="programlisting">#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&amp;%@!\-';
$punc = '.:?\-';
$any  = &quot;${ltrs}${gunk}${punc}&quot;;

while (&lt;&gt;) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
     }{&lt;A HREF=&quot;$1&quot;&gt;$1&lt;/A&gt;}igox;
    print;
}<A CLASS="indexterm" NAME="ch06-idx-1000007731-0"></A><A CLASS="indexterm" NAME="ch06-idx-1000007731-1"></A><A CLASS="indexterm" NAME="ch06-idx-1000007731-2"></A><A CLASS="indexterm" NAME="ch06-idx-1000007731-3"></A><A CLASS="indexterm" NAME="ch06-idx-1000007731-4"></A></PRE></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_21.htm" TITLE="6.20. Matching Abbreviations"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.20. Matching Abbreviations" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_23.htm" TITLE="6.22. Program: tcgrep"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.22. Program: tcgrep" BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">6.20. Matching Abbreviations</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">6.22. Program: tcgrep</TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. 
All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
