By Tom Christiansen and Nathan Torkington. ISBN 1-56592-243-3. First Edition, published August 1998.
<HTML><HEAD><TITLE>Recipe 6.5. Finding the Nth Occurrence of a Match (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:33:45Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching">
<LINK REL="prev" HREF="ch06_05.htm" TITLE="6.4.  Commenting Regular Expressions">
<LINK REL="next" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" />
<map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p>
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_05.htm" TITLE="6.4.  Commenting Regular Expressions"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.4.  Commenting Regular Expressions" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching"></A></FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.6. Matching Multiple Lines" BORDER="0"></A></TD>
</TR>
</TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch06-chap06_finding_0">6.5. 
Finding the N<SUP CLASS="superscript">th</SUP> Occurrence of a Match</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-553">Problem<A CLASS="indexterm" NAME="ch06-idx-1000007557-0"></A></A></H3>
<P CLASS="para">You want to find the N<SUP CLASS="superscript">th</SUP> match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of <CODE CLASS="literal">&quot;fish&quot;</CODE>:</P>
<PRE CLASS="programlisting"><CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>One fish two fish red fish blue fish</I></CODE></B></CODE></PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-561">Solution</A></H3>
<P CLASS="para">Use the <CODE CLASS="literal">/g</CODE><A CLASS="indexterm" NAME="ch06-idx-1000007565-0"></A> modifier in a <CODE CLASS="literal">while</CODE> loop, keeping count of matches:</P>
<PRE CLASS="programlisting">$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print &quot;The third fish is a $1 one.\n&quot;;
        # Warning: don't `last' out of this loop; that would leave
        # pos($_) set, and the next m//g on $_ would resume from there
    }
}
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>The third fish is a red one.</I></CODE></B></CODE></PRE>
<P CLASS="para">Or use a repetition count and repeated pattern like this:</P>
<PRE CLASS="programlisting">/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-589">Discussion</A></H3>
<P CLASS="para">As explained in the chapter introduction, using the <CODE CLASS="literal">/g</CODE> modifier in scalar context creates something of a <A CLASS="indexterm" NAME="ch06-idx-1000008346-0"></A><A CLASS="indexterm" NAME="ch06-idx-1000008346-1"></A><EM CLASS="emphasis">progressive match</EM>, useful in <CODE CLASS="literal">while</CODE> loops. 
This is commonly used to count the number of times a pattern matches in a string:</P>
<PRE CLASS="programlisting"># simple way with while loop
$count = 0;
while ($string =~ /PAT/g) {
    $count++;               # or whatever you'd like to do here
}

# same thing with trailing while
$count = 0;
$count++ while $string =~ /PAT/g;

# or with for loop
for ($count = 0; $string =~ /PAT/g; $count++) { }

# Similar, but this time count overlapping matches
$count = 0;
$count++ while $string =~ /(?=PAT)/g;</PRE>
<P CLASS="para">To find the N<SUP CLASS="superscript">th</SUP> match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every N<SUP CLASS="superscript">th</SUP> match by checking for multiples of N using the modulus operator. For example, <CODE CLASS="literal">(++$count % 3) == 0</CODE> is true on every third match.</P>
<P CLASS="para">If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.</P>
<PRE CLASS="programlisting">$pond  = 'One fish two fish red fish blue fish';

# using a temporary
@colors = ($pond =~ /(\w+)\s+fish\b/gi);      # get all matches
$color  = $colors[2];                         # then the one we want

# or without a temporary array
$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab the third element (index 2)

print &quot;The third fish in the pond is $color.\n&quot;;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>The third fish in the pond is red.</I></CODE></B></CODE></PRE>
<P CLASS="para">Or to find all even-numbered fish:</P>
<PRE CLASS="programlisting">$count = 0;
$_ = 'One fish two fish red fish blue fish';
@evens = grep { $count++ % 2 == 1 } /(\w+)\s+fish\b/gi;
print &quot;Even numbered fish are @evens.\n&quot;;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Even numbered fish are two blue.</I></CODE></B></CODE></PRE>
<P CLASS="para">For 
substitution, the replacement value should be a code expression (evaluated through the <CODE CLASS="literal">/e</CODE> modifier) that returns the proper string. Make sure to return the original as a replacement string for the cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:</P>
<PRE CLASS="programlisting">$count = 0;
s{
   \b               # makes next \w more efficient
   ( \w+ )          # this is what we'll be changing
   (
     \s+ fish \b
   )
}{
    if (++$count == 4) {
        &quot;sushi&quot; . $2;
    } else {
         $1   . $2;
    }
}gex;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>One fish two fish red fish sushi fish</I></CODE></B></CODE></PRE>
<P CLASS="para">Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After <CODE CLASS="literal">/.*\b(\w+)\s+fish\b/</CODE>, for example, the <CODE CLASS="literal">$1</CODE> variable would have the last fish.</P>
<P CLASS="para">Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:</P>
<PRE CLASS="programlisting">$pond = 'One fish two fish red fish blue fish swim here.';
$color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
print &quot;Last fish is $color.\n&quot;;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Last fish is blue.</I></CODE></B></CODE></PRE>
<P CLASS="para">If you need to express this same notion of finding the last match in a single pattern without <CODE CLASS="literal">/g</CODE>, you can do so with the negative lookahead assertion <CODE CLASS="literal">(?!THING)</CODE>. When you want the last match of an arbitrary pattern A, you look for an A that is not followed, anywhere in the rest of the string, by another A. The general construct is <CODE CLASS="literal">A(?!.*A)</CODE>, used with <CODE CLASS="literal">/s</CODE> so that <CODE CLASS="literal">.</CODE> also matches newlines. It can be broken up for legibility:</P>
<PRE CLASS="programlisting">m{
    A               # find some pattern A
    (?!             # mustn't be able to find
        .*          # anything at all
        A           # followed by another A
    )               # no trailing $ here: it would force A to end the string
}xs</PRE>
<P CLASS="para">That leaves us with this approach for selecting the last fish:</P>
<PRE CLASS="programlisting">$pond = 'One fish two fish red fish blue fish swim here.';
if ($pond =~ m{
                    \b  (  \w+) \s+ fish \b
                (?! .* \b fish \b )
            }six )
{
    print &quot;Last fish is $1.\n&quot;;
} else {
    print &quot;Failed!\n&quot;;
}
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>Last fish is blue.</I></CODE></B></CODE></PRE>
<P CLASS="para">This approach has the advantage that it can fit in just one pattern, which makes it suitable for situations like those shown in <A CLASS="xref" HREF="ch06_18.htm" TITLE="Expressing AND, OR, and NOT in a Single Pattern">Recipe 6.17</A>. It has its disadvantages, though. It's obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly: around twice as slowly on the data set tested <A CLASS="indexterm" NAME="ch06-idx-1000009178-0"></A>above.<A CLASS="indexterm" NAME="ch06-idx-1000009179-0"></A></P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1000009180">See Also</A></H3>
<P CLASS="para">The behavior of <CODE CLASS="literal">m//g</CODE> in scalar context is given in the "Regexp Quote-like Operators" section of <I CLASS="filename">perlop</I>(1), and in the <A CLASS="olink" HREF="../prog/ch02_04.htm#PERL2-CH-2-SECT-4.2">"Pattern Matching Operators"</A> section of <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; zero-width positive lookahead assertions are shown in the "Regular Expressions" section of <I CLASS="filename">perlre</I>(1), and in the 
<A CLASS="olink" HREF="../prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.2">"rules of regular expression matching"</A> section of <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>.</P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer">
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_05.htm" TITLE="6.4.  Commenting Regular Expressions"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.4.  Commenting Regular Expressions" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.6. Matching Multiple Lines" BORDER="0"></A></TD>
</TR>
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">6.4.  Commenting Regular Expressions</TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">6.6. Matching Multiple Lines</TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer"></DIV>
<!-- LIBRARY NAV BAR -->
<img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links">
<p><FONT SIZE="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</FONT></p>
<map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map>
</BODY></HTML>