subpattern, <tt class="literal">(?{ $result = $i })</tt>, ensures that the count will live on in <tt class="literal">$result</tt>.</p>
<p>The special variable <tt class="literal">$^R</tt> (described in <a href="ch28_01.htm">Chapter 28, "Special Names"</a>) holds the result of the last <tt class="literal">(?{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> that was executed as part of a successful match.</p>
<p>You can use a <tt class="literal">(?{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> extension as the <em class="replaceable">COND</em> of a <tt class="literal">(?(</tt><em class="replaceable">COND</em><tt class="literal">)</tt><em class="replaceable">IFTRUE</em><tt class="literal">|</tt><em class="replaceable">IFFALSE</em><tt class="literal">)</tt>.  If you do this, <tt class="literal">$^R</tt> will not be set, and you may omit the parentheses around the conditional:
<blockquote><pre class="programlisting">"glyph" =~ /.+(?(?{ $foo{bar} gt "symbol" }).|signet)./;</pre></blockquote>
Here, we test whether <tt class="literal">$foo{bar}</tt> is stringwise greater than <tt class="literal">symbol</tt>.  If so, we include <tt class="literal">.</tt> in the pattern, and if not, we include <tt class="literal">signet</tt> in the pattern.  Stretched out a bit, it might be construed as more readable:
<blockquote><pre class="programlisting">"glyph" =~ m{
    .+                              # some anythings
    (?(?{                           # if
            $foo{bar} gt "symbol"   # this is true
        })
            .                       # match another anything
        |                           # else
            signet                  # match signet
    )
    .                               # and one more anything
}x;</pre></blockquote>
When <tt class="literal">use re 'eval'</tt> is in effect, a regex is allowed to contain <tt class="literal">(?{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> subpatterns even if the regular expression interpolates variables:
<blockquote><pre class="programlisting">/(.*?) (?{length($1) &lt; 3 &amp;&amp; warn}) $suffix/;  # Error without use re 'eval'</pre></blockquote>
This is normally disallowed since it is a potential security risk. Even though the pattern above may be innocuous because <tt class="literal">$suffix</tt> is innocuous, the regex parser can't tell which parts of the string were interpolated and which ones weren't, so it just disallows code subpatterns entirely if there were any interpolations.</p>
<p>If the pattern is obtained from tainted data, even <tt class="literal">use re 'eval'</tt> won't allow the pattern match to proceed.</p>
<p>When <tt class="literal">use re 'taint'</tt> is in effect and a tainted string is the target of a regex, the captured subpatterns (either in the numbered variables or in the list of values returned by <tt class="literal">m//</tt> in list context) are tainted.  This is useful when regex operations on tainted data are meant not to extract safe substrings, but merely to perform other transformations.  See <a href="ch23_01.htm">Chapter 23, "Security"</a>, for more on tainting.  For the purposes of <tt class="literal">use re 'eval'</tt>, precompiled regular expressions (usually obtained from <tt class="literal">qr//</tt>) are not considered to be interpolated:
<blockquote><pre class="programlisting">/foo${pat}bar/</pre></blockquote>
This is allowed if <tt class="literal">$pat</tt> is a precompiled regular expression, even if <tt class="literal">$pat</tt> contains <tt class="literal">(?{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> subpatterns.</p>
<p>Earlier we showed you a bit of what <tt class="literal">use re 'debug'</tt> prints out.  
A more primitive debugging solution is to use <tt class="literal">(?{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> subpatterns to print out what's been matched so far during the match:
<blockquote><pre class="programlisting">"abcdef" =~ / .+ (?{print "Matched so far: $&amp;\n"}) bcdef $/x;</pre></blockquote>
This prints:
<blockquote><pre class="programlisting">Matched so far: abcdef
Matched so far: abcde
Matched so far: abcd
Matched so far: abc
Matched so far: ab
Matched so far: a</pre></blockquote>
showing the <tt class="literal">.+</tt> grabbing all the letters and giving them up one by one as the Engine backtracks.</p>
<h3 class="sect3">5.10.3.4. Match-time pattern interpolation</h3>
<p><a name="INDEX-1766"></a><a name="INDEX-1767"></a><a name="INDEX-1768"></a>You can build parts of your pattern from within the pattern itself. The <tt class="literal">(??{</tt>&nbsp;<em class="replaceable">CODE</em>&nbsp;<tt class="literal">})</tt> extension allows you to insert code that evaluates to a valid pattern. It's like saying <tt class="literal">/$pattern/</tt>, except that you can generate <tt class="literal">$pattern</tt> at run time (more specifically, at match time).  For instance:
<blockquote><pre class="programlisting">/\w (??{ if ($threshold &gt; 1) { "red" } else { "blue" } }) \d/x;</pre></blockquote>
This is equivalent to <tt class="literal">/\wred\d/</tt> if <tt class="literal">$threshold</tt> is greater than 1, and <tt class="literal">/\wblue\d/</tt> otherwise.</p>
<p><a name="INDEX-1769"></a>You can include backreferences inside the evaluated code to derive patterns from just-matched substrings (even if they will later become unmatched through backtracking).  For instance, this matches all strings that read the same backward as forward (known as palindromedaries, phrases with a hump in the middle):
<blockquote><pre class="programlisting">/^ (.+) .? (??{quotemeta reverse $1}) $/xi;</pre></blockquote>
You can balance parentheses like so:
<blockquote><pre class="programlisting">$text =~ /( \(+ ) (.*?) (??{ '\)' x length $1 })/x;</pre></blockquote>
This matches strings of the form <tt class="literal">(shazam!)</tt> and <tt class="literal">(((shazam!)))</tt>, sticking <tt class="literal">shazam!</tt> into <tt class="literal">$2</tt>.  Unfortunately, it doesn't notice whether the parentheses in the middle are balanced.  For that we need recursion.</p>
<p><a name="INDEX-1770"></a><a name="INDEX-1771"></a>Fortunately, you can do recursive patterns too.  You can have a compiled pattern that uses <tt class="literal">(??{</tt><em class="replaceable">CODE</em><tt class="literal">})</tt> to refer to itself.  Recursive matching is pretty irregular, as regular expressions go.  Any text on regular expressions will tell you that a standard regex can't match nested parentheses correctly.  And that's correct.  It's also correct that Perl's regexes aren't standard.  The following pattern<a href="#FOOTNOTE-16">[16]</a> matches a set of nested parentheses, however deep they go:
<blockquote><pre class="programlisting">$np = qr{
           \(
           (?:
              (?&gt; [^()]+ )    # Non-parens without backtracking
            |
              (??{ $np })     # Group with matching parens
           )*
           \)
        }x;</pre></blockquote>
You could use it like this to match a function call:
<blockquote><pre class="programlisting">$funpat = qr/\w+$np/;
'myfunfun(1,(2*(3+4)),5)' =~ /^$funpat$/;   # Matches!</pre></blockquote></p>
<blockquote class="footnote"><a name="FOOTNOTE-16"></a><p>[16] Note that you can't declare the variable in the same statement in which you're going to use it.  You can always declare it earlier, of course.</p></blockquote>
<h3 class="sect3">5.10.3.5. 
Conditional interpolation</h3>
<p><a name="INDEX-1772"></a><a name="INDEX-1773"></a><a name="INDEX-1774"></a>The <tt class="literal">(?(</tt><em class="replaceable">COND</em><tt class="literal">)</tt><em class="replaceable">IFTRUE</em><tt class="literal">|</tt><em class="replaceable">IFFALSE</em><tt class="literal">)</tt> regex extension is similar to Perl's <tt class="literal">?:</tt> operator. If <em class="replaceable">COND</em> is true, the <em class="replaceable">IFTRUE</em> pattern is used; otherwise, the <em class="replaceable">IFFALSE</em> pattern is used.  The <em class="replaceable">COND</em> can be a backreference (expressed as a bare integer, without the <tt class="literal">\</tt> or <tt class="literal">$</tt>), a lookaround assertion, or a code subpattern. (See <a href="ch05_10.htm#ch05-sect-la">Section 5.10.1, "Lookaround Assertions"</a> and <a href="ch05_10.htm#ch05-sect-mt">Section 5.10.3.3, "Match-time code evaluation"</a> earlier in this chapter.)</p>
<p>If the <em class="replaceable">COND</em> is an integer, it is treated as a backreference.  For instance, consider:
<blockquote><pre class="programlisting">#!/usr/bin/perl
$x = 'Perl is free.';
$y = 'ManagerWare costs $99.95.';
foreach ($x, $y) {
    /^(\w+) (?:is|(costs)) (?(2)(\$\d+)|\w+)/;  # Either (\$\d+) or \w+
    if ($3) {
        print "$1 costs money.\n";         # ManagerWare costs money.
    } else {
        print "$1 doesn't cost money.\n";  # Perl doesn't cost money.
    }
}</pre></blockquote>
Here, the <em class="replaceable">COND</em> is <tt class="literal">(2)</tt>, which is true if a second backreference exists.  
If that's the case, <tt class="literal">(\$\d+)</tt> is included in the pattern at that point (creating the <tt class="literal">$3</tt> backreference); otherwise, <tt class="literal">\w+</tt> is used.</p>
<p>If the <em class="replaceable">COND</em> is a lookaround or code subpattern, the truth of the assertion is used to determine whether to include <em class="replaceable">IFTRUE</em> or <em class="replaceable">IFFALSE</em>:
<blockquote><pre class="programlisting">/[ATGC]+(?(?&lt;=AA)G|C)$/;</pre></blockquote>
This uses a lookbehind assertion as the <em class="replaceable">COND</em> to match a DNA sequence that ends in either <tt class="literal">AAG</tt>, or some other base combination and <tt class="literal">C</tt>.</p>
<p>You can omit the <tt class="literal">|</tt><em class="replaceable">IFFALSE</em> alternative. If you do, the <em class="replaceable">IFTRUE</em> pattern will be included in the pattern as usual if the <em class="replaceable">COND</em> is true, but if the condition isn't true, the Engine will move on to the next portion of the pattern.</p>
<h3 class="sect2">5.10.4. Defining Your Own Assertions</h3>
<p><a name="INDEX-1775"></a><a name="INDEX-1776"></a><a name="INDEX-1777"></a>You can't change how Perl's Engine works, but if you're sufficiently warped, you can change how it sees your pattern.  Since Perl interprets your pattern similarly to double-quoted strings, you can use the wonder of overloaded string constants to see to it that text sequences of your choosing are automatically translated into other text sequences.</p>
<p><a name="INDEX-1778"></a>In the example below, we specify two transformations to occur when Perl encounters a pattern.  
First, we define <tt class="literal">\tag</tt> so that when it appears in a pattern, it's automatically translated to <tt class="literal">&lt;.*?&gt;</tt>, which matches most HTML and XML tags. Second, we "redefine" the <tt class="literal">\w</tt> metasymbol so that it handles only English letters.</p>
<p>We'll define a package called <tt class="literal">Tagger</tt> that hides the overloading from our main program.  Once we do that, we'll be able to say:
<blockquote><pre class="programlisting">use Tagger;
$_ = '&lt;I&gt;camel&lt;/I&gt;';
print "Tagged camel found" if /\tag\w+\tag/;</pre></blockquote>
Here's <em class="emphasis">Tagger.pm</em>, couched in the form of a Perl module (see <a href="ch11_01.htm">Chapter 11, "Modules"</a>):
<blockquote><pre class="programlisting">package Tagger;
use overload;
sub import { overload::constant 'qr' =&gt; \&amp;convert }
sub convert {
    my $re = shift;
    $re =~ s/ \\tag  /&lt;.*?&gt;/xg;
    $re =~ s/ \\w    /[A-Za-z]/xg;
    return $re;
}
1;</pre></blockquote>
The <tt class="literal">Tagger</tt> module is handed only the literal text of the pattern, immediately before interpolation, so anything that arrives through an interpolated variable bypasses the overloading, as follows:
<blockquote><pre class="programlisting">$re = '\tag\w+\tag';   # This string begins with \t, a tab
print if /$re/;        # Matches a tab, followed by an "a"...</pre></blockquote>
If you want the interpolated variable to be customized, call the <tt class="literal">convert</tt> function directly:
<blockquote><pre class="programlisting">$re = '\tag\w+\tag';         # This string begins with \t, a tab
$re = Tagger::convert $re;   # expand \tag and \w
print if /$re/;              # $re becomes &lt;.*?&gt;[A-Za-z]+&lt;.*?&gt;</pre></blockquote>
Now if you're still wondering what those <tt class="literal">sub</tt> thingies are doing in the <tt class="literal">Tagger</tt> module, you'll find out soon enough because that's what our next chapter is all about.<a name="INDEX-1779"></a></p><!-- BOTTOM NAV BAR --><hr width="515" 
align="left"><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2001</a> O'Reilly &amp; Associates. All rights reserved.</font></p><!-- END OF BODY --></body></html>
