<html><head><title>Alternation (Programming Perl)</title><!-- STYLESHEET --><link rel="stylesheet" type="text/css" href="../style/style1.css"><!-- METADATA --><!--Dublin Core Metadata--><meta name="DC.Creator" content=""><meta name="DC.Date" content=""><meta name="DC.Format" content="text/xml" scheme="MIME"><meta name="DC.Generator" content="XSLT stylesheet, xt by James Clark"><meta name="DC.Identifier" content=""><meta name="DC.Language" content="en-US"><meta name="DC.Publisher" content="O'Reilly & Associates, Inc."><meta name="DC.Source" content="" scheme="ISBN"><meta name="DC.Subject.Keyword" content=""><meta name="DC.Title" content="Alternation"><meta name="DC.Type" content="Text.Monograph"></head><body><!-- START OF BODY --><!-- TOP BANNER --><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home"><map name="banner-map"><AREA SHAPE="RECT" COORDS="0,0,466,71" HREF="index.htm" ALT="Programming Perl"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></map><!-- TOP NAV BAR --><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_07.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="ch05_01.htm">Chapter 5: Pattern Matching</a></td><td align="right" valign="top" width="172"><a href="ch05_09.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr></table></div><hr width="515" align="left"><!-- SECTION BODY --><h2 class="sect1">5.8. Alternation</h2><a name="INDEX-1670"></a><a name="INDEX-1671"></a><p><a name="INDEX-1672"></a>Inside a pattern or subpattern, use the <tt class="literal">|</tt> metacharacter to specify a set of possibilities, any one of which could match. 
For instance:<blockquote><pre class="programlisting">/Gandalf|Saruman|Radagast/</pre></blockquote>matches <tt class="literal">Gandalf</tt> or <tt class="literal">Saruman</tt> or <tt class="literal">Radagast</tt>. The alternation extends only as far as the innermost enclosing parentheses (whether capturing or not):<blockquote><pre class="programlisting">/prob|n|r|l|ate/     # Match prob, n, r, l, or ate
/pro(b|n|r|l)ate/    # Match probate, pronate, prorate, or prolate
/pro(?:b|n|r|l)ate/  # Match probate, pronate, prorate, or prolate</pre></blockquote>The second and third forms match the same strings, but the second form captures the variant character in <tt class="literal">$1</tt> and the third form does not.</p><p>At any given position, the Engine tries to match the first alternative, and then the second, and so on. The relative length of the alternatives does not matter, which means that in this pattern:<blockquote><pre class="programlisting">/(Sam|Samwise)/</pre></blockquote><tt class="literal">$1</tt> will never be set to <tt class="literal">Samwise</tt> no matter what string it's matched against, because <tt class="literal">Sam</tt> will always match first. 
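One way to check this ordering behavior is the minimal sketch below. The helper <tt class="literal">first_capture</tt> and the test string are hypothetical names introduced only for illustration; they are not part of the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return whatever $1 captured, or undef if no match.
sub first_capture {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? $1 : undef;
}

# Same alternatives, different order.
my $short_first = qr/(Sam|Samwise)/;   # Sam is tried first at each position
my $long_first  = qr/(Samwise|Sam)/;   # Samwise is tried first at each position

print first_capture("Samwise Gamgee", $short_first), "\n";   # prints "Sam"
print first_capture("Samwise Gamgee", $long_first),  "\n";   # prints "Samwise"
```

Both patterns start matching at the same position; only the order in which the alternatives are tried changes what ends up in <tt class="literal">$1</tt>. 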
When you have overlapping matches like this, put the longer ones at the beginning.</p><p>But the ordering of the alternatives only matters at a given position. The outer loop of the Engine does left-to-right matching, so the following always matches the first <tt class="literal">Sam</tt>:<blockquote><pre class="programlisting">"'Sam I am,' said Samwise" =~ /(Samwise|Sam)/; # $1 eq "Sam"</pre></blockquote>But you can force right-to-left scanning by making use of greedy quantifiers, as discussed earlier in "Quantifiers":<blockquote><pre class="programlisting">"'Sam I am,' said Samwise" =~ /.*(Samwise|Sam)/; # $1 eq "Samwise"</pre></blockquote><a name="INDEX-1673"></a>You can defeat any left-to-right (or right-to-left) matching by including any of the various positional assertions we saw earlier, such as <tt class="literal">\G</tt>, <tt class="literal">^</tt>, and <tt class="literal">$</tt>. Here we anchor the pattern to the end of the string:<blockquote><pre class="programlisting">"'Sam I am,' said Samwise" =~ /(Samwise|Sam)$/; # $1 eq "Samwise"</pre></blockquote>That example factors the <tt class="literal">$</tt> out of the alternation (since we already had a handy pair of parentheses to put it after), but in the absence of parentheses you can also distribute the assertions to any or all of the individual alternatives, depending on how you want them to match. This little program displays lines that begin with either a <tt class="literal">__DATA__</tt> or <tt class="literal">__END__</tt> token:<blockquote><pre class="programlisting">#!/usr/bin/perl
while (&lt;&gt;) {
    print if /^__DATA__|^__END__/;
}</pre></blockquote>But be careful with that. Remember that the first and last alternatives (before the first <tt class="literal">|</tt> and after the last one) tend to gobble up the other elements of the regular expression on either side, out to the ends of the expression, unless there are enclosing parentheses. 
A common mistake is to ask for:<blockquote><pre class="programlisting">/^cat|dog|cow$/</pre></blockquote>when you really mean:<blockquote><pre class="programlisting">/^(cat|dog|cow)$/</pre></blockquote>The first matches "<tt class="literal">cat</tt>" at the beginning of the string, or "<tt class="literal">dog</tt>" anywhere, or "<tt class="literal">cow</tt>" at the end of the string. The second matches any string consisting solely of "<tt class="literal">cat</tt>" or "<tt class="literal">dog</tt>" or "<tt class="literal">cow</tt>". It also captures <tt class="literal">$1</tt>, which you may not want. You can also say:<blockquote><pre class="programlisting">/^cat$|^dog$|^cow$/</pre></blockquote>We'll show you another solution later.</p><p>An alternative can be empty, in which case it always matches.<blockquote><pre class="programlisting">/com(pound|)/;      # Matches "compound" or "com"
/com(pound(s|)|)/;  # Matches "compounds", "compound", or "com"</pre></blockquote>This is much like using the <tt class="literal">?</tt> quantifier, which matches 0 times or 1 time:<blockquote><pre class="programlisting">/com(pound)?/;      # Matches "compound" or "com"
/com(pound(s?))?/;  # Matches "compounds", "compound", or "com"
/com(pounds?)?/;    # Same, but doesn't use $2</pre></blockquote>There is one difference, though. When you apply the <tt class="literal">?</tt> to a subpattern that captures into a numbered variable, that variable will be undefined if there's no string to go there. 
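The undefined case can be observed with a minimal sketch like the one below, which also runs the empty-alternative form for comparison. The helper <tt class="literal">describe_capture</tt> is a hypothetical name introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match the string "com" against a pattern and
# report the state of $1 afterwards.
sub describe_capture {
    my ($pattern) = @_;
    return "no match" unless "com" =~ $pattern;
    return defined $1 ? "defined, length " . length($1) : "undefined";
}

print describe_capture(qr/com(pound)?/), "\n";   # prints "undefined"
print describe_capture(qr/com(pound|)/), "\n";   # prints "defined, length 0"
```

Under <tt class="literal">use warnings</tt>, this distinction matters as soon as <tt class="literal">$1</tt> is interpolated or compared, since only the undefined value triggers an "uninitialized value" warning. 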
If you used an empty alternative, it would still be false, but would be a defined null string instead.</p><!-- BOTTOM NAV BAR --><hr width="515" align="left"><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_07.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0"></a></td><td align="right" valign="top" width="172"><a href="ch05_09.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr><tr><td align="left" valign="top" width="172">5.7. Capturing and Clustering</td><td align="center" valign="top" width="171"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0"></a></td><td align="right" valign="top" width="172">5.9. Staying in Control</td></tr></table></div><hr width="515" align="left"><!-- LIBRARY NAV BAR --><img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p><font size="-1"><a href="copyrght.htm">Copyright © 2001</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"> <area shape="rect" coords="2,-1,79,99" href="../index.htm"><area shape="rect" coords="84,1,157,108" href="../perlnut/index.htm"><area shape="rect" coords="162,2,248,125" href="../prog/index.htm"><area shape="rect" coords="253,2,326,130" href="../advprog/index.htm"><area shape="rect" coords="332,1,407,112" href="../cookbook/index.htm"><area shape="rect" coords="414,2,523,103" href="../sysadmin/index.htm"></map><!-- END OF BODY --></body></html>