ch07_02.htm

来自「by Randal L. Schwartz and Tom Phoenix I」· HTM 代码 · 共 240 行

HTM
240
字号
<html><head><title>Using Simple Patterns (Learning Perl, 3rd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Randal L. Schwartz and Tom Phoenix" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596001320L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Learning Perl, 3rd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Learning Perl, 3rd Edition" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"></a></td><td align="right" valign="top" width="228"><a href="ch07_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">7.2. Using Simple Patterns</h2><p><a name="INDEX-500" />Tocompare a pattern (regular expression) to the contents of<tt class="literal">$_</tt>, simply put the pattern between a pair of<a name="INDEX-501" /> <a name="INDEX-502" />forward slashes(<tt class="literal">/</tt>), like we do here:</p><blockquote><pre class="code">$_ = "yabba dabba doo";if (/abba/) {  print "It matched!\n";}</pre></blockquote><p>The expression <tt class="literal">/abba/</tt> looks for that four-letterstring in <tt class="literal">$_</tt>; if it finds it, it returns a truevalue. In this case, it's found more than once, but thatdoesn't make any difference. If it's found at all,it's a match; if it's not in there at all, it fails.</p><p>Because the pattern match is generally being used to return a true orfalse value, it is almost always found in the conditional expressionof <tt class="literal">if</tt> or <tt class="literal">while</tt>.</p><p>All of the usual backslash escapes that you can put intodouble-quoted strings are available in patterns, so you could use thepattern <tt class="literal">/coke\tsprite/</tt> to match the elevencharacters of <tt class="literal">coke</tt>, a tab, and<tt class="literal">sprite</tt>.</p><a name="lperl3-CHP-7-SECT-2.1" /><div class="sect2"><h3 class="sect2">7.2.1. About Metacharacters</h3><p>Of course, if patterns matched only simple literal strings, theywouldn't be very useful. That's why there are a number ofspecial characters, called<em class="firstterm">metacharacters</em><a name="INDEX-503" />,that have special meanings in regular expressions.</p><p>For example, the <a name="INDEX-504" /><a name="INDEX-505" /> <a name="INDEX-506" /><a name="INDEX-507" />dot(<tt class="literal">.</tt>) is a wildcard character -- it matches anysingle character except a newline (which is represented by<tt class="literal">"\n"</tt>). So, the pattern <tt class="literal">/bet.y/</tt>would match <tt class="literal">betty</tt>. Or it would match<tt class="literal">betsy</tt>, or <tt class="literal">bet=y</tt>, or<tt class="literal">bet.y</tt>, or any other string that has<tt class="literal">bet</tt>, followed by any one character (except anewline), followed by <tt class="literal">y</tt>. It wouldn't match<tt class="literal">bety</tt> or <tt class="literal">betsey</tt>, though, sincethose don't have exactly one character between the<tt class="literal">t</tt> and the <tt class="literal">y</tt>. The dot alwaysmatches exactly one character.</p><p>So, if you wanted to match a period in the string, you<em class="emphasis">could</em> use the dot. But that would match anypossible character (except a newline), which might be more than youwanted. If you wanted the dot to match <em class="emphasis">just</em> aperiod, you can simply backslash it. In fact, that rule goes for allof Perl's regular expression metacharacters: a backslash infront of any metacharacter makes it nonspecial. So, the pattern<tt class="literal">/3\.14159/</tt> doesn't have a wildcardcharacter.</p><p>So the backslash is our second metacharacter. If you mean a realbackslash, just use a pair of them -- a rule that applies just aswell everywhere else in Perl.</p></div><a name="lperl3-CHP-7-SECT-2.2" /><div class="sect2"><h3 class="sect2">7.2.2. Simple Quantifiers</h3><p>It often happens that you'll need to repeat something in apattern. The <a name="INDEX-508" /><a name="INDEX-509" />star (<tt class="literal">*</tt>) means tomatch the preceding item zero or more times. So,<tt class="literal">/fred\t*barney/</tt> matches any number of tabcharacters between <tt class="literal">fred</tt> and<tt class="literal">barney</tt>. That is, it matches<tt class="literal">"fred\tbarney"</tt> with one tab, or<tt class="literal">"fred\t\tbarney"</tt> with two tabs, or<tt class="literal">"fred\t\t\tbarney"</tt> with three tabs, or even<tt class="literal">"fredbarney"</tt> with nothing in between at all.That's because the star means "zero ormore" -- so you could even have hundreds of tab charactersin between, but nothing other than tabs. You may find it helpful tothink of star as saying, "that previous thing, any number of<em class="emphasis">times</em>, even zero times."</p><p>What if you wanted to allow something besides tab characters? The dotmatches any character<a href="#FOOTNOTE-167">[167]</a>, so <tt class="literal">.*</tt> will match anycharacter, any number of times. That means that the pattern<tt class="literal">/fred.*barney/</tt> matches "any old junk"between <tt class="literal">fred</tt> and <tt class="literal">barney</tt>. Anyline that mentions <tt class="literal">fred</tt> and (somewhere later)<tt class="literal">barney</tt> will match that pattern. We often call<tt class="literal">.*</tt> the "any old junk" pattern, becauseit can match any old junk in your strings.</p><blockquote class="footnote"> <a name="FOOTNOTE-167" /><p>[167]Except newline. But we'regoing to stop reminding you of that so often, because you know it bynow. Most of the time it doesn't matter, anyway, because yourstrings will most-often not have newlines. But don't forgetthis detail, because someday a newline will sneak into your stringand you'll need to remember that the dot doesn't matchnewline.</p> </blockquote><p>The star is formally called a<em class="firstterm">quantifier</em><a name="INDEX-510" />,meaning that it specifies a quantity of the preceding item. Butit's not the only quantifier; the <a name="INDEX-511" /> <a name="INDEX-512" />plus("<tt class="literal">+</tt>") is another. The plus means tomatch the preceding item <em class="emphasis">one</em> or more times:<tt class="literal">/fred +barney/</tt> matches if <tt class="literal">fred</tt>and <tt class="literal">barney</tt> are separated by spaces and onlyspaces. (The space is not a metacharacter.) This won't match<tt class="literal">fredbarney</tt>, since the plus means that there mustbe one or more spaces between the two names, so at least one space isrequired. It may be helpful to think of the plus as saying,"that last thing, <em class="emphasis">plus</em> any number more ofthe same thing."</p><p>There's a third quantifier like the star and plus, but morelimited. It's the <a name="INDEX-513" /> <a name="INDEX-514" />question mark("<tt class="literal">?</tt>"), which means that the precedingitem is optional. That is, the preceding item may occur once or notat all. Like the other two quantifiers, the question mark means thatthe preceding item appears a certain number of times. It's justthat in this case the item may match one time (if it's there)or zero times (if it's not). There aren't any otherpossibilities. So, <tt class="literal">/bamm-?bamm/</tt> matches eitherspelling: <tt class="literal">bamm-bamm</tt> or<tt class="literal">bammbamm</tt>. This is easy to remember, sinceit's saying "that last thing, maybe? Or maybe not?"</p><p>All three of these quantifiers must follow something, since they tellhow many times the <em class="emphasis">previous</em> item may repeat.</p></div><a name="lperl3-CHP-7-SECT-2.3" /><div class="sect2"><h3 class="sect2">7.2.3. Grouping in Patterns</h3><p>As in mathematics,<a name="INDEX-515" />parentheses ("<tt class="literal">()</tt>") may be used for grouping. So, parentheses arealso metacharacters. As an example, the pattern<tt class="literal">/fred+/</tt> matches strings like<tt class="literal">freddddddddd</tt>, but strings like that don'tshow up often in real life. But the pattern<tt class="literal">/(fred)+/</tt> matches strings like<tt class="literal">fredfredfred</tt>, which is more likely to be what youwanted. And what about the pattern <tt class="literal">/(fred)*/</tt>? Thatmatches strings like <tt class="literal">hello, world</tt>.<a href="#FOOTNOTE-168">[168]</a></p><blockquote class="footnote"><a name="FOOTNOTE-168" /><p>[168]The star means to match <em class="emphasis">zero</em> or morerepetitions of <tt class="literal">fred</tt>. When you're willing tosettle for zero, it's hard to be disappointed! That patternwill match any string, even the empty string.</p> </blockquote></div><a name="lperl3-CHP-7-SECT-2.4" /><div class="sect2"><h3 class="sect2">7.2.4. Alternatives</h3><p>The <a name="INDEX-516" /><a name="INDEX-517" />vertical bar (<tt class="literal">|</tt>),often pronounced "or" in this usage, means that eitherthe left side may match, or the right side. That is, if the part ofthe pattern on the left of the bar fails, the part on the right getsa chance to match. So, <tt class="literal">/fred|barney|betty/</tt> willmatch any string that mentions <tt class="literal">fred</tt>, or<tt class="literal">barney</tt>, or <tt class="literal">betty</tt>.</p><p>Now we can make patterns like <tt class="literal">/fred( |\t)+barney/</tt>,which matches if <tt class="literal">fred</tt> and<tt class="literal">barney</tt> are separated by spaces, tabs, or a mixtureof the two. The plus means to repeat one or more times; each time itrepeats, the <tt class="literal">( |\t)</tt> has the chance to match eithera space or a tab.<a href="#FOOTNOTE-169">[169]</a> There must beat least one character between the two names.</p><blockquote class="footnote"> <a name="FOOTNOTE-169" /><p>[169]This particular match wouldnormally be done more efficiently with a character class, aswe'll see in the next chapter.</p> </blockquote><p>If you wanted the characters between <tt class="literal">fred</tt> and<tt class="literal">barney</tt> to all be the same, you could rewrite thatpattern as <tt class="literal">/fred( +|\t+)barney/</tt>. In this case, theseparators must be all spaces, or all tabs.</p><p>The pattern <tt class="literal">/fred (and|or) barney/</tt> matches anystring containing either of the two possible strings: <tt class="literal">fredand barney</tt>, or <tt class="literal">fred or barney</tt>.<a href="#FOOTNOTE-170">[170]</a> Wecould match the same two strings with the pattern <tt class="literal">/fred andbarney|fred or</tt> <tt class="literal">barney/</tt>, but that wouldbe too much typing. It would probably also be less efficient,depending upon what optimizations are built into the regularexpression engine.</p><blockquote class="footnote"><a name="FOOTNOTE-170" /><p>[170]Note that the words <tt class="literal">and</tt> and<tt class="literal">or</tt> are <em class="emphasis">not</em> operators inregular expressions! They are shown here in a fixed-width typefacebecause they're part of the strings.</p> </blockquote></div><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch07_01.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch07_03.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">7. Concepts of Regular Expressions</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">7.3. A Pattern Test Program</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?