<html><head><title>Memory Parentheses (Learning Perl, 3rd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Randal L. Schwartz and Tom Phoenix" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596001320L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Learning Perl, 3rd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Learning Perl, 3rd Edition" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"></a></td><td align="right" valign="top" width="228"><a href="ch08_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">8.4. Memory Parentheses</h2><p><a name="INDEX-556" /> <a name="INDEX-557" />You remember that parentheses ("<tt class="literal">( )</tt>") may be used for grouping together parts of a pattern. They also have a second function: they tell the regular expression engine to remember what was in the substring matched by the pattern in the parentheses. That is to say, it doesn't remember what was in the pattern itself; it remembers what was in the corresponding part of the string. 
Whenever you use parentheses for grouping, they automatically work as memory parentheses as well.</p><p>So, if you use <tt class="literal">/./</tt>, you'll match any single character (except newline); if you use <tt class="literal">/(.)/</tt>, you'll still match any single character, but now it will be kept in a <a name="INDEX-558" />regular expression memory. For each pair of parentheses in the pattern, you'll have one regular expression memory.</p><a name="lperl3-CHP-8-SECT-4.1" /><div class="sect2"><h3 class="sect2">8.4.1. Backreferences</h3><p>A <em class="firstterm">backreference</em><a name="INDEX-559" /> refers back to a memory that was saved earlier in the current pattern's processing. Backreferences are made with a <a name="INDEX-560" /><a name="INDEX-561" />backslash, which is easy to remember. For example, <tt class="literal">\1</tt> contains the first regular expression memory (that is, the part of the string matched by the first pair of parentheses).</p><p>Backreferences are used to go back and match the exact same<a href="#FOOTNOTE-182">[182]</a> string that was matched earlier in the pattern. So, <tt class="literal">/(.)\1/</tt> means to match any one character, remember it as memory one, then match memory one again. In other words, match any character, followed by the <em class="emphasis">same</em> character. So, this pattern will match strings with doubled letters, as in <tt class="literal">bamm-bamm</tt> and <tt class="literal">betty</tt>. 
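</p><p>A minimal sketch of trying that pattern out, assuming the default-variable matching style used in the previous chapter. The sample strings are made up for illustration, and one of them holds two spaces in a row rather than a doubled letter:</p><blockquote><pre class="code">use strict;
use warnings;

# Sample strings are hypothetical, chosen only for this illustration.
my @samples = ('bamm-bamm', 'betty', 'fred', 'a  b');

foreach (@samples) {
    # /(.)\1/ : any one character, then that same character again
    if (/(.)\1/) {
        print "doubled character in: $_\n";
    } else {
        print "no doubled character in: $_\n";
    }
}</pre></blockquote><p>Only <tt class="literal">fred</tt> should land in the <tt class="literal">else</tt> branch here. 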
Of course, the dot will match characters other than letters, so if a string has two spaces in a row, two tabs in a row, or two asterisks in a row, it will match.</p><blockquote class="footnote"><a name="FOOTNOTE-182" /><p>[182] Well, if the pattern is case-insensitive, as we'll learn in the next chapter, the capitalization doesn't have to match. Other than that, though, the string must be the same.</p></blockquote><p>That's not the same as the pattern <tt class="literal">/../</tt>, which will match any character followed by any character -- those two could be the same, or they could be different. <tt class="literal">/(.)\1/</tt> means to match any character followed by the <em class="emphasis">same</em> character.</p><p>A typical usage of these memories might be if you have some HTML-like<a href="#FOOTNOTE-183">[183]</a> text to process. For example, maybe you want to match a tag like these two, which may use either single quotes or double quotes:</p><blockquote class="footnote"> <a name="FOOTNOTE-183" /><p>[183] These examples are intentionally <span class="option">not</span> HTML, because there are too many tricky things that crop up in real HTML, or any similar markup language like XML or SGML. If you need to work with HTML, don't use simple patterns like these. Get a robust module from CPAN, so that you can start with code that's already written and debugged. If you don't, we promise that you'll be sorry. Don't say we didn't warn you.</p> </blockquote><blockquote><pre class="code">&lt;image source='fred.png'&gt;
&lt;image source="fred's-birthday.png"&gt;</pre></blockquote><p>The tag may have either single quotes or double quotes, since the quoted data may include the other kind of mark (as with the apostrophe in the second example tag). So the pattern might look like this: <tt class="literal">/&lt;image source=(['"]).*\1&gt;/</tt>. 
That says that the opening quote mark may be of either type, but there must be a matching mark at the end of the quote.<a href="#FOOTNOTE-184">[184]</a></p><blockquote class="footnote"> <a name="FOOTNOTE-184" /><p>[184] If you realize that there may be problems with using this pattern on a markup language like HTML, that's okay. There are lots of problems with that! This is just an example to illustrate a use of a backreference. You shouldn't use simple patterns to parse anything as complex as HTML anyway.</p> </blockquote><p>If you have more sets of parentheses, you can have more backreferences. As you might guess, <tt class="literal">\17</tt> is the contents of the seventeenth regular expression memory, if you have at least that many sets of parentheses.<a href="#FOOTNOTE-185">[185]</a></p><blockquote class="footnote"> <a name="FOOTNOTE-185" /><p>[185] If you don't have that many sets of parentheses before that point in the pattern, backreferences <tt class="literal">\10</tt> and beyond will be treated as octal character escapes. To keep an octal character escape like <tt class="literal">\12</tt> from accidentally meaning a backreference, just use a leading zero: <tt class="literal">\012</tt> is always a character, never a backreference.</p> </blockquote><p>In numbering backreferences, you can just count the left (opening) parentheses. 
The pattern <tt class="literal">/((fred|wilma) (flintstone)) \1/</tt><a name="INDEX-562" /> says to match strings like <tt class="literal">fred flintstone fred flintstone</tt>, since the first opening parenthesis and its corresponding closing parenthesis hold a pattern that matches <tt class="literal">fred flintstone</tt>.<a href="#FOOTNOTE-186">[186]</a></p><blockquote class="footnote"> <a name="FOOTNOTE-186" /><p>[186] This pattern would also match <tt class="literal">wilma flintstone wilma flintstone</tt>.</p> </blockquote><p>If we wrote <tt class="literal">/((fred|wilma) (flintstone)) \2/</tt> instead, we would match strings like <tt class="literal">fred flintstone fred</tt>; memory two is the choice of <tt class="literal">fred</tt> or <tt class="literal">wilma</tt>. (Notice that it wouldn't match <tt class="literal">fred flintstone wilma</tt>, since the backreference can match only the same name that was matched earlier: either <tt class="literal">fred</tt> or <tt class="literal">wilma</tt>. But it could match <tt class="literal">wilma flintstone wilma</tt>, since that one uses the same name.) And the pattern <tt class="literal">/((fred|wilma) (flintstone)) \3/</tt> would match strings like <tt class="literal">fred flintstone flintstone</tt>. It's uncommon to have a literal string like <tt class="literal">flintstone</tt> in memory parentheses, though; we did that one just to have a third example.</p></div><a name="lperl3-CHP-8-SECT-4.2" /><div class="sect2"><h3 class="sect2">8.4.2. Memory Variables</h3><p><a name="INDEX-563" />When we get to the next chapter and back into the world of Perl, we'll see that the contents of these regular expression memories are available to us in special variables like <tt class="literal">$1</tt> after the pattern match is done. 
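</p><p>As a small preview, here is a sketch that assumes the memories are numbered by counting opening parentheses, as described in the previous section. The test string is made up for illustration, and the variables are read only after a successful match:</p><blockquote><pre class="code">use strict;
use warnings;

# Hypothetical test string, chosen only for this illustration.
$_ = "fred flintstone fred";

if (/((fred|wilma) (flintstone)) \2/) {
    print "memory one:   $1\n";   # fred flintstone
    print "memory two:   $2\n";   # fred
    print "memory three: $3\n";   # flintstone
}</pre></blockquote><p>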
We mention this here just so you'll know that the memories aren't merely used for backreferences; if you see what seem to be unnecessary parentheses in a pattern, they may actually be setting up those memories.<a name="INDEX-564" /> <a name="INDEX-565" /> </p></div><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_03.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch08_05.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">8.3. Anchors</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">8.5. Precedence</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>