<HTML><HEAD><TITLE>Recipe 6.10. Speeding Up Interpolated Matches (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen & Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly & Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:34:24Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching">
<LINK REL="prev" HREF="ch06_10.htm" TITLE="6.9. Matching Shell Globs as Regular Expressions">
<LINK REL="next" HREF="ch06_12.htm" TITLE="6.11. Testing for a Valid Pattern">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_10.htm" TITLE="6.9. Matching Shell Globs as Regular Expressions"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.9. Matching Shell Globs as Regular Expressions" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_12.htm" TITLE="6.11. Testing for a Valid Pattern"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.11. 
Testing for a Valid Pattern" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch06-42168">6.10. Speeding Up Interpolated Matches</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1139">Problem<A CLASS="indexterm" NAME="ch06-idx-1000007620-0"></A></A></H3>
<P CLASS="para">You want your function or program to take one or more regular expressions as arguments, but doing so seems to run slower than using literals.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1145">Solution</A></H3>
<P CLASS="para"><A CLASS="indexterm" NAME="ch06-idx-1000007622-0"></A>To overcome this bottleneck, if you have only one pattern whose value won't change during the entire run of a program, store it in a string and use <CODE CLASS="literal">/$pattern/o</CODE>.</P>
<PRE CLASS="programlisting">while ($line = &lt;&gt;) {
    if ($line =~ /$pattern/o) {
        # do something
    }
}</PRE>
<P CLASS="para">If you have more than one pattern, however, that won't work. Use one of the three techniques outlined in the Discussion for a speed-up of an order of magnitude or so.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1163">Discussion</A></H3>
<P CLASS="para">When Perl compiles a program, it converts patterns into an internal form. This conversion occurs at compile time for patterns without variables in them, but at run time for those that do contain variables. That means that interpolating variables into patterns, as in <CODE CLASS="literal">/$pattern/</CODE>, can slow your program down. This is particularly noticeable when <CODE CLASS="literal">$pattern</CODE> changes often.</P>
<P CLASS="para">The <CODE CLASS="literal">/o</CODE> modifier is a promise from the script's author that the values of any variables interpolated into that pattern will not change - or that if they do, Perl should disregard any such changes. 
Given such a promise, Perl need only interpolate the variable and compile the pattern the first time it encounters the match. But if the interpolated variable were to change, Perl wouldn't notice. Make sure to use <CODE CLASS="literal">/o</CODE> only on unchanging variables, or else wrong answers will result.</P>
<P CLASS="para">Using <CODE CLASS="literal">/o</CODE> on patterns without interpolated variables does not speed anything up. The <CODE CLASS="literal">/o</CODE> modifier is also of no help when you have an unknown number of regular expressions and need to check one or more strings against all of these patterns. Nor is it of any use when the interpolated variable is a function argument, since each call of the function gives the variable a new value.</P>
<P CLASS="para"><A CLASS="xref" HREF="ch06_11.htm#ch06-22028" TITLE="popgrep1">Example 6.4</A> shows the slow but straightforward technique for matching many patterns against many lines. The array <CODE CLASS="literal">@popstates</CODE> contains the standard two-letter abbreviations for some of the places in the heartland of North America where we normally refer to soft drinks as <I CLASS="firstterm">pop</I> (<I CLASS="firstterm">soda</I> to us means either plain soda water or else handmade delicacies from the soda fountain at the corner drugstore, preferably with ice cream). The goal is to print out any line of input that contains any of those places, matching them at word boundaries only. 
It doesn't use <CODE CLASS="literal">/o</CODE> because the variable that holds the pattern keeps changing.</P>
<DIV CLASS="example"><H4 CLASS="example"><A CLASS="title" NAME="ch06-22028">Example 6.4: popgrep1</A></H4>
<PRE CLASS="programlisting">#!/usr/bin/perl
# <A CLASS="indexterm" NAME="ch06-idx-1000008674-0"></A>popgrep1 - grep for abbreviations of places that say "pop"
# version 1: slow but obvious way
@popstates = qw(CO ON MI WI MN);
LINE: while (defined($line = &lt;&gt;)) {
    for $state (@popstates) {
        if ($line =~ /\b$state\b/) {
            print $line;
            next LINE;
        }
    }
}</PRE></DIV>
<P CLASS="para">Such a direct, obvious, brute-force approach is also horribly slow because it has to recompile all patterns with each line of input. Three different ways of addressing this are described in this section. One builds a string of Perl code and <CODE CLASS="literal">eval</CODE>s it; one caches the internal representations of regular expressions in closures; and one uses the Regexp module from CPAN to hold compiled regular expressions.</P>
<P CLASS="para">The traditional way to get Perl to speed up a multiple match is to build up a string containing the code and <CODE CLASS="literal">eval</CODE> <CODE CLASS="literal">"$code"</CODE> it. 
<A CLASS="xref" HREF="ch06_11.htm#ch06-36871" TITLE="popgrep2">Example 6.5</A> contains a version that uses this technique.</P>
<DIV CLASS="example"><H4 CLASS="example"><A CLASS="title" NAME="ch06-36871">Example 6.5: popgrep2</A></H4>
<PRE CLASS="programlisting">#!/usr/bin/perl
# <A CLASS="indexterm" NAME="ch06-idx-1000007797-0"></A>popgrep2 - grep for abbreviations of places that say "pop"
# version 2: eval strings; fast but hard to quote
@popstates = qw(CO ON MI WI MN);
$code = 'while (defined($line = &lt;&gt;)) {';
for $state (@popstates) {
    $code .= "\tif (\$line =~ /\\b$state\\b/) { print \$line; next; }\n";
}
$code .= '}';
print "CODE IS\n----\n$code\n----\n" if 0;  # turn on to debug
eval $code;
die if $@;</PRE></DIV>
<P CLASS="para">The <CODE CLASS="literal">popgrep2</CODE> program builds strings like this:</P>
<PRE CLASS="programlisting">while (defined($line = &lt;&gt;)) {
    if ($line =~ /\bCO\b/) { print $line; next; }
    if ($line =~ /\bON\b/) { print $line; next; }
    if ($line =~ /\bMI\b/) { print $line; next; }
    if ($line =~ /\bWI\b/) { print $line; next; }
    if ($line =~ /\bMN\b/) { print $line; next; }
}</PRE>
<P CLASS="para">As you see, those end up looking like constant strings to <CODE CLASS="literal">eval</CODE>. We put the entire loop and pattern match in the <CODE CLASS="literal">eval</CODE> text, too, which makes it run faster.</P>
<P CLASS="para">The worst thing about this <CODE CLASS="literal">eval</CODE> <CODE CLASS="literal">"STRING"</CODE> approach is that it's difficult to get the quoting and escaping right. The <CODE CLASS="literal">dequote</CODE> function from <A CLASS="xref" HREF="ch01_12.htm" TITLE="Indenting Here Documents">Recipe 1.11</A> can make it easier to read, but escaping variables whose use is delayed will still be an issue. 
Also, none of the strings can contain a slash, since that's what we're using as a delimiter for the <CODE CLASS="literal">m//</CODE> operator.</P>
<P CLASS="para">A solution to these problems is a subtle technique first developed by Jeffrey <A CLASS="indexterm" NAME="ch06-idx-1000007631-0"></A>Friedl. The key here is building an anonymous subroutine that caches the compiled patterns in the closure it creates. To do this, we <CODE CLASS="literal">eval</CODE