📄 ch05_05.htm
字号:
<html><head><title>Quantifiers (Programming Perl)</title><!-- STYLESHEET --><link rel="stylesheet" type="text/css" href="../style/style1.css"><!-- METADATA --><!--Dublin Core Metadata--><meta name="DC.Creator" content=""><meta name="DC.Date" content=""><meta name="DC.Format" content="text/xml" scheme="MIME"><meta name="DC.Generator" content="XSLT stylesheet, xt by James Clark"><meta name="DC.Identifier" content=""><meta name="DC.Language" content="en-US"><meta name="DC.Publisher" content="O'Reilly & Associates, Inc."><meta name="DC.Source" content="" scheme="ISBN"><meta name="DC.Subject.Keyword" content=""><meta name="DC.Title" content="Quantifiers"><meta name="DC.Type" content="Text.Monograph"></head><body><!-- START OF BODY --><!-- TOP BANNER --><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home"><map name="banner-map"><AREA SHAPE="RECT" COORDS="0,0,466,71" HREF="index.htm" ALT="Programming Perl"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></map><!-- TOP NAV BAR --><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="ch05_01.htm">Chapter 5: Pattern Matching</a></td><td align="right" valign="top" width="172"><a href="ch05_06.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr></table></div><hr width="515" align="left"><!-- SECTION BODY --><h2 class="sect1">5.5. Quantifiers</h2><p><a name="INDEX-1587"></a><a name="INDEX-1588"></a>Unless you say otherwise, each item in a regular expression matchesjust once. With a pattern like <tt class="literal">/nop/</tt>, each ofthose characters must match, each right after the other. Words like"panoply" or "xenophobia" are fine, because <em class="emphasis">where</em>the match occurs doesn't matter.</p><p>If you wanted to match both "xenophobia" and "Snoopy", you couldn'tuse the <tt class="literal">/nop/</tt> pattern, since that requires just one"o" between the "n" and the "p", and Snoopy has two. This is where<em class="emphasis">quantifiers</em> come in handy: they say how manytimes something may match, instead of the default of matching justonce. Quantifiers in a regular expression are like loops in aprogram; in fact, if you think of a regex as a program, then they<em class="emphasis">are</em> loops. Some loops are exact, like "repeatthis match five times only" (<tt class="literal">{5}</tt>). Others giveboth lower and upper bounds on the match count, like "repeat thismatch at least twice but no more than four times"(<tt class="literal">{2,4}</tt>). Others have no closed upper bound at all,like "match this at least twice, but as many times as you'd like"(<tt class="literal">{2,}</tt>).</p><p><a href="ch05_05.htm#perl3-tab-quantifiers">Table 5-12</a> shows the quantifiers that Perlrecognizes in a pattern.</p><a name="perl3-tab-quantifiers"></a><h4 class="objtitle">Table 5.12. Regex Quantifiers Compared</h4><table border="1"><tr><th><b class="emphasis-bold">Maximal</b></th><th><b class="emphasis-bold">Minimal</b></th><th><b class="emphasis-bold">Allowed Range</b></th></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,</tt><em class="replaceable">MAX</em><tt class="literal">}</tt></td><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,</tt><em class="replaceable">MAX</em><tt class="literal">}?</tt></td><td>Must occur at least <em class="replaceable">MIN</em> times but no more than <em class="replaceable">MAX</em> times</td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,}</tt></td><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,}?</tt></td><td>Must occur at least <em class="replaceable">MIN</em> times</td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">COUNT</em><tt class="literal">}</tt></td><td><tt class="literal">{</tt><em class="replaceable">COUNT</em><tt class="literal">}?</tt></td><td>Must match exactly <em class="replaceable">COUNT</em> times</td></tr><tr><td><tt class="literal">*</tt></td><td><tt class="literal">*?</tt></td><td><p>0 or more times (same as <tt class="literal">{0,}</tt>)<a name="INDEX-1589"></a></p></td></tr><tr><td><tt class="literal">+</tt></td><td><tt class="literal">+?</tt></td><td>1or more times (same as <tt class="literal">{1,}</tt>)<a name="INDEX-1590"></a></td></tr><tr><td><tt class="literal">?</tt></td><td><tt class="literal">??</tt></td><td>0or 1 time (same as <tt class="literal">{0,1}</tt>)<a name="INDEX-1591"></a></td></tr></table><p>Somethingwith a <tt class="literal">*</tt> or a <tt class="literal">?</tt> doesn't actuallyhave to match.That's because they can match 0 times and still be considered asuccess. A <tt class="literal">+</tt> may often be a better fit, since it has to be thereat least once.</p><p>Don't be confused by the use of "exactly" in the previous table. Itrefers only to the repeat count, not the overall string. Forexample, <tt class="literal">$n =~ /\d{3}/</tt> doesn't say "is this string exactly threedigits long?" It asks whether there's any point within <tt class="literal">$n</tt> atwhich three digits occur in a row. Strings like "101 MorrisStreet" test true, but so do strings like "95472" or "1-800-555-1212".All <em class="emphasis">contain</em> three digits at one or more points, which is all youasked about. See the section <a href="ch05_06.htm#ch05-sect-posit">Section 5.6, "Positions"</a> for how to usepositional assertions (as in <tt class="literal">/^\d{3}$/</tt>) to nail this down.</p><p><a name="INDEX-1592"></a>Given the opportunity to match something a variable number of times,maximal quantifiers will elect to maximize the repeat count. So whenwe say "as many times as you'd like", the greedy quantifier interpretsthis to mean "as many times as you can possibly get away with",constrained only by the requirement that this not cause specificationslater in the match to fail. If a pattern contains two open-endedquantifiers, then obviously both cannot consume the entire string:characters used by one part of the match are no longer availableto a later part. Each quantifier is greedy at the expense of thosethat follow it, reading the pattern left to right.</p><p><a name="INDEX-1593"></a><a name="INDEX-1594"></a>That's the traditional behavior of quantifiers in regular expressions.However, Perl permits you to reform the behavior of its quantifiers: byplacing a <tt class="literal">?</tt> after that quantifier, you change it from maximal tominimal. That doesn't mean that a minimal quantifier will alwaysmatch the smallest number of repetitions allowed by its range, anymore than a maximal quantifier must always match the greatest numberallowed in its range. The overall match must still succeed, and theminimal match will take as much as it needs to succeed, and no more.(Minimal quantifiers value contentment over greed.)</p><p>For example, in the match:<blockquote><pre class="programlisting">"exasperate" =~ /e(.*)e/ # $1 now "xasperat"</pre></blockquote>the <tt class="literal">.*</tt> matches "<tt class="literal">xasperat</tt>", the longest possible string for it tomatch. (It also stores that value in <tt class="literal">$1</tt>, as described inthe section <a href="ch05_07.htm#ch05-sect-candc">Section 5.7, "Capturing and Clustering"</a> later in the chapter.) Although a shorter match wasavailable, a greedy match doesn't care. Given two choices at thesame starting point, it always returns the <em class="emphasis">longer</em> of the two.<a name="INDEX-1595"></a></p><p>Contrast this with this:<blockquote><pre class="programlisting">"exasperate" =~ /e(.*?)e/ # $1 now "xasp"</pre></blockquote>Here, the minimal matching version, <tt class="literal">.*?</tt>, is used. Adding the<tt class="literal">?</tt> to <tt class="literal">*</tt> makes <tt class="literal">*?</tt> take on the opposite behavior: now giventwo choices at the same starting point, it always returns the<em class="emphasis">shorter</em> of the two.</p><p>Although you could read <tt class="literal">*?</tt> as saying to match zero or more ofsomething but preferring zero, that doesn't mean it will alwaysmatch zero characters. If it did so here, for example, and left<tt class="literal">$1</tt> set to <tt class="literal">""</tt>, then the second "<tt class="literal">e</tt>" wouldn't be found, sinceit doesn't immediately follow the first one.</p><p><a name="INDEX-1596"></a><a name="INDEX-1597"></a>You might also wonder why, in minimally matching<tt class="literal">/e(.*?)e/</tt>, Perl didn't stick"<tt class="literal">rat</tt>" into <tt class="literal">$1</tt>. After all,"<tt class="literal">rat</tt>" also falls between two<tt class="literal">e</tt>'s, and is shorter than "<tt class="literal">xasp</tt>".In Perl, the minimal/maximal choice applies only when selecting theshortest or longest from among several matches that all have the samestarting point. If two possible matches exist, but these start atdifferent offsets in the string, then their lengths don't matter--nordoes it matter whether you've used a minimal quantifier or a maximalone. The earliest of several valid matches always wins out over alllatecomers. It's only when multiple possible matches start at thesame point that you use minimal or maximal matching to break the tie.If the starting points differ, there's no tie to break. Perl'smatching is normally <em class="emphasis">leftmost longest</em>;with minimal matching, it becomes <em class="emphasis">leftmostshortest</em>. But the "leftmost" part never varies and is thedominant criterion.<a href="#FOOTNOTE-7">[7]</a></p><blockquote class="footnote"><a name="FOOTNOTE-7"></a><p>[7] Not all regex engines work thisway. Some believe in overall greed, in which the longest match alwayswins, even if it shows up later. Perl isn't that way. You might saythat eagerness holds priority over greed (or thrift). For a moreformal discussion of this principle and many others, see the section<a href="ch05_09.htm#ch05-sect-engine">Section 5.9.4, "The Little Engine That /Could(n't)?/"</a>.</p></blockquote><p><a name="INDEX-1598"></a><a name="INDEX-1599"></a>There are two ways to defeat the leftward leanings of the patternmatcher. First, you can use an earlier greedy quantifier (typically<tt class="literal">.*</tt>) to try to slurp earlier parts of the string. In searching for amatch for a greedy quantifier, it tries for the longest match first,which effectively searches the rest of the string right-to-left:<blockquote><pre class="programlisting">"exasperate" =~ /.*e(.*?)e/ # $1 now "rat"</pre></blockquote>But be careful with that, since the overall match now includes the entirestring up to that point.</p><p>The second way to defeat leftmostness to use positional assertions,discussed in the next section.</p><a name="INDEX-1600"></a><a name="INDEX-1601"></a><!-- BOTTOM NAV BAR --><hr width="515" align="left"><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_04.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0"></a></td><td align="right" valign="top" width="172"><a href="ch05_06.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr><tr><td align="left" valign="top" width="172">5.4. Character Classes</td><td align="center" valign="top" width="171"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0"></a></td><td align="right" valign="top" width="172">5.6. Positions</td></tr></table></div><hr width="515" align="left"><!-- LIBRARY NAV BAR --><img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p><font size="-1"><a href="copyrght.htm">Copyright © 2001</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"> <area shape="rect" coords="2,-1,79,99" href="../index.htm"><area shape="rect" coords="84,1,157,108" href="../perlnut/index.htm"><area shape="rect" coords="162,2,248,125" href="../prog/index.htm"><area shape="rect" coords="253,2,326,130" href="../advprog/index.htm"><area shape="rect" coords="332,1,407,112" href="../cookbook/index.htm"><area shape="rect" coords="414,2,523,103" href="../sysadmin/index.htm"></map><!-- END OF BODY --></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -