By Tom Christiansen and Nathan Torkingto

></TABLE><PCLASS="para"><CODECLASS="literal">/i</CODE><ACLASS="indexterm"NAME="ch06-idx-1000010475-0"></A> and <CODECLASS="literal">/g</CODE> are the most commonly used modifiers. The pattern <CODECLASS="literal">/ram/i</CODE> matches <CODECLASS="literal">&quot;ram&quot;</CODE>, <CODECLASS="literal">&quot;RAM&quot;</CODE>, <CODECLASS="literal">&quot;Ram&quot;</CODE>, and so forth. Backreferences will be checked case-insensitively if this modifier is on; see <ACLASS="xref"HREF="ch06_17.htm"TITLE="Detecting Duplicate Words">Recipe 6.16</A> for an example. This comparison can be made aware of the user's current locale settings if the <CODECLASS="literal">use</CODE> <CODECLASS="literal">locale</CODE> pragma has been invoked. As currently implemented, <CODECLASS="literal">/i</CODE> slows down a pattern match because it disables several performance optimizations.</P><PCLASS="para"><CODECLASS="literal"></CODE><ACLASS="indexterm"NAME="ch06-idx-1000007495-0"></A>The <CODECLASS="literal">/g</CODE> modifier is used with <CODECLASS="literal">s///</CODE> to replace every match, not just the first one. <CODECLASS="literal">/g</CODE> is also used with <CODECLASS="literal">m//</CODE> in loops to find (but not replace) every matching occurrence:</P><PRECLASS="programlisting">while (m/(\d+)/g) {
    print &quot;Found number $1\n&quot;;
}</PRE><PCLASS="para">Used in list context, <CODECLASS="literal">/g</CODE> pulls out all matches:</P><PRECLASS="programlisting">@numbers = m/(\d+)/g;</PRE><PCLASS="para">That finds only non-overlapping matches. You have to be much sneakier to get overlapping ones by making a zero-width look-ahead with the <CODECLASS="literal">(?=...)</CODE> construct. Because it's zero-width, the match engine hasn't advanced at all. Within the look-ahead, capturing parentheses are used to grab the matched text anyway. 
Although we've captured something, Perl notices that we haven't made any forward progress on the <CODECLASS="literal">/g</CODE> and so moves the starting position of the next attempt forward one character.</P><PCLASS="para">This shows the difference:</P><PRECLASS="programlisting">$digits = &quot;123456789&quot;;
@nonlap = $digits =~ /(\d\d\d)/g;
@yeslap = $digits =~ /(?=(\d\d\d))/g;
print &quot;Non-overlapping:  @nonlap\n&quot;;
print &quot;Overlapping:      @yeslap\n&quot;;
<CODECLASS="userinput"><B><CODECLASS="replaceable"><I>Non-overlapping:  123 456 789</I></CODE></B></CODE>
<CODECLASS="userinput"><B><CODECLASS="replaceable"><I>Overlapping:      123 234 345 456 567 678 789</I></CODE></B></CODE></PRE><PCLASS="para"><CODECLASS="literal"></CODE><ACLASS="indexterm"NAME="ch06-idx-1000007500-0"></A><ACLASS="indexterm"NAME="ch06-idx-1000007500-1"></A>The <CODECLASS="literal">/s</CODE> and <CODECLASS="literal">/m</CODE> modifiers are used when matching strings with embedded newlines. <CODECLASS="literal">/s</CODE> makes dot match <CODECLASS="literal">&quot;\n&quot;</CODE>, something it doesn't normally do; it also makes the match ignore the value of the old, deprecated <CODECLASS="literal">$*</CODE> variable. <CODECLASS="literal">/m</CODE> makes <CODECLASS="literal">^</CODE> and <CODECLASS="literal">$</CODE> match after and before <CODECLASS="literal">&quot;\n&quot;</CODE>, respectively. Both are useful with paragraph slurping mode, as explained in the introduction to <ACLASS="xref"HREF="ch08_01.htm"TITLE="File Contents">Chapter 8, <CITECLASS="chapter">File Contents</CITE></A>, and in <ACLASS="xref"HREF="ch06_07.htm"TITLE="Matching Multiple Lines">Recipe 6.6</A>.</P><PCLASS="para"><CODECLASS="literal"></CODE><ACLASS="indexterm"NAME="ch06-idx-1000010974-0"></A>The <CODECLASS="literal">/e</CODE> modifier makes Perl run the right-hand side of the substitution as code and use its return value as the replacement string. 
<CODECLASS="literal">s/(\d+)/sprintf(&quot;%#x&quot;,</CODE> <CODECLASS="literal">$1)/ge</CODE> converts every run of digits into hexadecimal, changing, for example, <CODECLASS="literal">2581</CODE> into <CODECLASS="literal">0xa15</CODE>.</P><PCLASS="para"><CODECLASS="literal"></CODE><ACLASS="indexterm"NAME="ch06-idx-1000011003-0"></A>Because different countries have different ideas of what constitutes an alphabet, the POSIX standard provides systems (and thus programs) with a standard way of representing alphabets, character set ordering, and so on. Perl gives you access to some of these through the <CODECLASS="literal">use</CODE><ACLASS="indexterm"NAME="ch06-idx-1000008502-0"></A> <CODECLASS="literal">locale</CODE> pragma; see the <CODECLASS="literal">perllocale</CODE> manpage for more information. When <CODECLASS="literal">use</CODE> <CODECLASS="literal">locale</CODE> is in effect, the <CODECLASS="literal">\w</CODE> character class includes accented and other letters that the locale defines. The case-changing escapes <CODECLASS="literal">\u</CODE>, <CODECLASS="literal">\U</CODE>, <CODECLASS="literal">\l</CODE>, and <CODECLASS="literal">\L</CODE> (and the corresponding functions <CODECLASS="literal">uc</CODE>, <CODECLASS="literal">ucfirst</CODE>, etc.) also respect <CODECLASS="literal">use</CODE> <CODECLASS="literal">locale</CODE>, so <IMGSRC="../chars/sigma.gif"ALT="[sigma]"> will be turned into <IMGSRC="../chars/ssigma.gif"ALT="[Sigma]"> with <CODECLASS="literal">\u</CODE> if the locale says it should. 
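</P><PCLASS="para">The modifiers described above can be tried together in one short script. This is a minimal sketch, not code from the recipes. It assumes a Perl 5 interpreter with <CODECLASS="literal">strict</CODE> and <CODECLASS="literal">warnings</CODE> enabled, and the sample strings and variable names (<CODECLASS="literal">@ramlike</CODE>, <CODECLASS="literal">$two_lines</CODE>, and so on) are illustrative choices. It exercises <CODECLASS="literal">/i</CODE>, <CODECLASS="literal">/g</CODE> in list context, overlapping matches through <CODECLASS="literal">(?=...)</CODE>, <CODECLASS="literal">/e</CODE> with <CODECLASS="literal">sprintf(&quot;%#x&quot;, $1)</CODE>, and <CODECLASS="literal">/s</CODE> versus <CODECLASS="literal">/m</CODE>.</P><PRECLASS="programlisting">
```perl
use strict;
use warnings;

# /i: case-insensitive match; 'ram', 'RAM', and 'Ram' match, 'rum' does not
my @words   = ('ram', 'RAM', 'Ram', 'rum');
my @ramlike = grep { /ram/i } @words;       # ('ram', 'RAM', 'Ram')

# /g in list context: pull out every run of digits
my $text    = 'pages 12, 345 and 6789';
my @numbers = $text =~ /(\d+)/g;            # (12, 345, 6789)

# /g with a zero-width look-ahead: overlapping matches
my $digits  = '123456789';
my @nonlap  = $digits =~ /(\d\d\d)/g;       # 123 456 789
my @yeslap  = $digits =~ /(?=(\d\d\d))/g;   # 123 234 345 ... 789

# /e: run the replacement side as code; "%#x" adds the 0x prefix
(my $hex = 'size 2581 bytes') =~ s/(\d+)/sprintf("%#x", $1)/ge;

# /s lets dot cross a newline; /m lets ^ match right after each newline
my $two_lines = "first line\nsecond line";
my $dot_plain = ($two_lines =~ /line.second/)  ? 1 : 0;  # 0: dot stops at \n
my $dot_s     = ($two_lines =~ /line.second/s) ? 1 : 0;  # 1
my @starts    = $two_lines =~ /^(\w+)/mg;                # ('first', 'second')

print "ramlike: @ramlike\n";
print "numbers: @numbers\n";
print "non-overlapping: @nonlap\n";
print "overlapping:     @yeslap\n";
print "hex: $hex\n";
print "dot without /s: $dot_plain, with /s: $dot_s\n";
print "line starts with /m: @starts\n";
```
</PRE><PCLASS="para">Run as a standalone script, it prints one line per modifier. For example, the <CODECLASS="literal">/e</CODE> line reads <CODECLASS="literal">hex: size 0xa15 bytes</CODE>. The <CODECLASS="literal">/m</CODE> line lists <CODECLASS="literal">first second</CODE> because <CODECLASS="literal">^</CODE> also matches just after the embedded newline. Without <CODECLASS="literal">/m</CODE>, only <CODECLASS="literal">first</CODE> would be found.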
<ACLASS="indexterm"NAME="ch06-idx-1000008503-0"></A></P></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch06-chap06_special_0">Special Variables</A></H3><PCLASS="para"><ACLASS="indexterm"NAME="ch06-idx-1000007510-0"></A><ACLASS="indexterm"NAME="ch06-idx-1000007510-1"></A>Perl sets special variables as the result of certain kinds of matches. The variables <CODECLASS="literal">$1</CODE>, <CODECLASS="literal">$2</CODE>, <CODECLASS="literal">$3</CODE>, and so on <EMCLASS="emphasis">ad infinitum</EM> (Perl doesn't stop at <CODECLASS="literal">$9</CODE>) are set when a pattern contains backreferences (parentheses around part of the pattern). Each left parenthesis, read from left to right in the pattern, begins filling a new, numbered variable. The variable <CODECLASS="literal">$+</CODE><ACLASS="indexterm"NAME="ch06-idx-1000007511-0"></A> contains the text matched by the highest-numbered backreference that actually took part in the last successful match. This helps you tell which of several alternate matches was found (for example, if <CODECLASS="literal">/(x.*y)|(y.*z)/</CODE> matches, <CODECLASS="literal">$+</CODE> contains whichever of <CODECLASS="literal">$1</CODE> or <CODECLASS="literal">$2</CODE> got filled). <CODECLASS="literal">$&amp;</CODE> contains the complete text matched in the last successful pattern match. <CODECLASS="literal">$'</CODE><ACLASS="indexterm"NAME="ch06-idx-1000007512-0"></A><ACLASS="indexterm"NAME="ch06-idx-1000007512-1"></A> and <CODECLASS="literal">$`</CODE> are the strings after and before the successful match, respectively:</P><PRECLASS="programlisting">$string = &quot;And little lambs eat ivy&quot;;
$string =~ /l[^s]*s/;
print &quot;<CODECLASS="literal">($`)</CODE> ($&amp;) <CODECLASS="literal">($')\n</CODE>&quot;;
<CODECLASS="userinput"><B><CODECLASS="replaceable"><I>(And ) (little lambs) ( eat ivy)</I></CODE></B></CODE></PRE><PCLASS="para"><CODECLASS="literal">$`</CODE>, <CODECLASS="literal">$&amp;</CODE>, and <CODECLASS="literal">$'</CODE> are tempting, but dangerous. 
Their very presence anywhere in a program slows down every pattern match, because the engine must populate these variables for every match. This is true even if you use one of these variables only once, or, for that matter, if you never actually use them at all but merely mention them. As of release 5.005, <CODECLASS="literal">$&amp;</CODE> is no longer as expensive.</P><PCLASS="para">All this power may make patterns seem omnipotent. Surprisingly enough, this is not (quite) the case. Regular expressions are fundamentally incapable of doing some things. For some of those, special modules lend a hand. Regular expressions cannot handle balanced input, that is, anything that's arbitrarily nested, such as matching parentheses or matching HTML tags. For that, you have to build a real parser, like the HTML::Parser recipes in <ACLASS="xref"HREF="ch20_01.htm"TITLE="Web Automation">Chapter 20, <CITECLASS="chapter">Web Automation</CITE></A>. Another thing Perl patterns can't do yet is fuzzy (approximate) matching; <ACLASS="xref"HREF="ch06_14.htm"TITLE="Approximate Matching">Recipe 6.13</A> shows how to use a module to work around that.</P><PCLASS="para">To learn far more about regular expressions than you ever thought existed, check out <EMCLASS="emphasis">Mastering Regular Expressions</EM>, written by Jeffrey Friedl and published by O'Reilly &amp; Associates. This book is dedicated to explaining regular expressions from a practical perspective. It covers general regular expressions and Perl patterns well, and it also compares and contrasts them with those used in other popular languages.</P></DIV></DIV></DIV><DIVCLASS="htmlnav"><P></P><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><TABLEWIDTH="684"BORDER="0"CELLSPACING="0"CELLPADDING="0"><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch05_17.htm"TITLE="5.16. Program: dutree"><IMGSRC="../gifs/txtpreva.gif"ALT="Previous: 5.16. 
Program: dutree"BORDER="0"></A></TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="book"HREF="index.htm"TITLE="Perl Cookbook"><IMGSRC="../gifs/txthome.gif"ALT="Perl Cookbook"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch06_02.htm"TITLE="6.1. Copying and Substituting Simultaneously"><IMGSRC="../gifs/txtnexta.gif"ALT="Next: 6.1. Copying and Substituting Simultaneously"BORDER="0"></A></TD></TR><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228">5.16. Program: dutree</TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="index"HREF="index/index.htm"TITLE="Book Index"><IMGSRC="../gifs/index.gif"ALT="Book Index"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228">6.1. Copying and Substituting Simultaneously</TD></TR></TABLE><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><FONTSIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>