which would make a rather dull sequel.</p>
<p><a name="INDEX-1423"></a><a name="INDEX-1424"></a><a name="INDEX-1425"></a><a name="INDEX-1426"></a><a name="INDEX-1427"></a>You can't use a <tt class="literal">s///</tt> operator directly on an array. For that, you need a loop.  By a lucky coincidence, the aliasing behavior of <tt class="literal">for</tt>/<tt class="literal">foreach</tt>, combined with its use of <tt class="literal">$_</tt> as the default loop variable, yields the standard Perl idiom to search and replace each element in an array:<blockquote><pre class="programlisting">for (@chapters) { s/Bilbo/Frodo/g }  # Do substitutions chapter by chapter.
s/Bilbo/Frodo/g for @chapters;       # Same thing.</pre></blockquote>
As with a simple scalar variable, you can combine the substitution with an assignment if you'd like to keep the original values around, too:<blockquote><pre class="programlisting">@oldhues = ('bluebird', 'bluegrass',  'bluefish', 'the blues');
for (@newhues = @oldhues) { s/blue/red/ }
print "@newhues\n";           # prints: redbird redgrass redfish the reds</pre></blockquote>
<a name="INDEX-1428"></a><a name="INDEX-1429"></a>The idiomatic way to perform repeated substitutions on the same variable is to use a once-through loop.   For example, here's how to canonicalize whitespace in a variable:<blockquote><pre class="programlisting">for ($string) {
    s/^\s+//;       # discard leading whitespace
    s/\s+$//;       # discard trailing whitespace
    s/\s+/ /g;      # collapse internal whitespace
}</pre></blockquote>
which just happens to produce the same result as:<blockquote><pre class="programlisting">$string = join(" ", split " ", $string);</pre></blockquote>
You can also use such a loop with an assignment, as we did in the array case:<blockquote><pre class="programlisting">for ($newshow = $oldshow) {
    s/Fred/Homer/g;
    s/Wilma/Marge/g;
    s/Pebbles/Lisa/g;
    s/Dino/Bart/g;
}</pre></blockquote></p>
<h3 class="sect3">5.2.3.2. 
When a global substitution just isn't global enough</h3>
<p><a name="INDEX-1430"></a>Occasionally, you can't just use a <tt class="literal">/g</tt> to get all the changes to occur, either because the substitutions have to happen right-to-left or because you need the length of <tt class="literal">$`</tt> to change between matches.  You can usually do what you want by calling <tt class="literal">s///</tt> repeatedly.  However, you want the loop to stop when the <tt class="literal">s///</tt> finally fails, so you have to put it into the conditional, which leaves nothing to do in the main part of the loop.  So we just write a <tt class="literal">1</tt>, which is a rather boring thing to do, but bored is the best you can hope for sometimes.  Here are some examples that use a few more of those odd regex beasties that keep popping up:<blockquote><pre class="programlisting"># put commas in the right places in an integer
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/;

# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&amp;)*8 - length($`)%8)/e;

# remove (nested (even deeply nested (like this))) remarks
1 while s/\([^()]*\)//g;

# remove duplicate words (and triplicate (and quadruplicate...))
1 while s/\b(\w+) \1\b/$1/gi;</pre></blockquote>
That last one needs a loop because otherwise it would turn this:<blockquote><pre class="programlisting">Paris in THE THE THE THE spring.</pre></blockquote>
into this:<blockquote><pre class="programlisting">Paris in THE THE spring.</pre></blockquote>
which might cause someone who knows a little French to picture Paris sitting in an artesian well emitting iced tea, since "th&#233;" is French for "tea".  A Parisian is never fooled, of course.</p>
<a name="INDEX-1431"></a><h3 class="sect2">5.2.4. 
The tr/// Operator (Transliteration)</h3>
<p><a name="INDEX-1432"></a><blockquote><pre class="programlisting"><em class="replaceable">LVALUE</em> =~ tr/<em class="replaceable">SEARCHLIST</em>/<em class="replaceable">REPLACEMENTLIST</em>/cds
tr/<em class="replaceable">SEARCHLIST</em>/<em class="replaceable">REPLACEMENTLIST</em>/cds</pre></blockquote>
<a name="INDEX-1433"></a><a name="INDEX-1434"></a>For <em class="emphasis">sed</em> devotees, <tt class="literal">y///</tt> is provided as a synonym for <tt class="literal">tr///</tt>. This is why you can't call a function named <tt class="literal">y</tt>, any more than you can call a function named <tt class="literal">q</tt> or <tt class="literal">m</tt>.  In all other respects, <tt class="literal">y///</tt> is identical to <tt class="literal">tr///</tt>, and we won't mention it again.</p>
<p><a name="INDEX-1435"></a><a name="INDEX-1436"></a>This operator might not appear to fit into a chapter on pattern matching, since it doesn't use patterns.  This operator scans a string, character by character, and replaces each occurrence of a character found in <em class="replaceable">SEARCHLIST</em> (which is not a regular expression) with the corresponding character from <em class="replaceable">REPLACEMENTLIST</em> (which is not a replacement string).  It looks a bit like <tt class="literal">m//</tt> and <tt class="literal">s///</tt>, though, and you can even use the <tt class="literal">=~</tt> or <tt class="literal">!~</tt> binding operators on it, so we describe it here.  (<tt class="literal">qr//</tt> and <tt class="literal">split</tt> are pattern-matching operators, but you don't use the binding operators on them, so they're elsewhere in the book. Go figure.)</p>
<p>Transliteration returns the number of characters replaced or deleted. If no string is specified via the <tt class="literal">=~</tt> or <tt class="literal">!~</tt> operator, the <tt class="literal">$_</tt> string is altered.  
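As a minimal sketch of both points (the variable names and sample strings here are ours, chosen only for illustration):<blockquote><pre class="programlisting">my $text = "hello world";
my $changed = ($text =~ tr/lo/01/);  # $text is now "he001 w1r0d"; $changed is 5

$_ = "a1b22c333";
my $digits = tr/0-9/#/;              # no binding operator, so $_ becomes "a#b##c###"; $digits is 6</pre></blockquote>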
The <em class="replaceable">SEARCHLIST</em> and <em class="replaceable">REPLACEMENTLIST</em> may define ranges of sequential characters with a dash:</p>
<p><a name="INDEX-1437"></a><blockquote><pre class="programlisting">$message =~ tr/A-Za-z/N-ZA-Mn-za-m/;    # rot13 encryption.</pre></blockquote>
<a name="INDEX-1438"></a><a name="INDEX-1439"></a>Note that a range like <tt class="literal">A-Z</tt> assumes a linear character set like ASCII.  But each character set has its own ideas of how characters are ordered and thus of which characters fall in a particular range.  A sound principle is to use only ranges that begin and end at letters of the same case (<tt class="literal">a-e</tt>, <tt class="literal">A-E</tt>), or at digits (<tt class="literal">0-4</tt>).  Anything else is suspect.  When in doubt, spell out the character sets in full: <tt class="literal">ABCDE</tt>.</p>
<p>The <em class="replaceable">SEARCHLIST</em> and <em class="replaceable">REPLACEMENTLIST</em> are not variable interpolated as double-quoted strings; you may, however, use those backslash sequences that map to a specific character, such as <tt class="literal">\n</tt> or <tt class="literal">\015</tt>.</p>
<p><a name="INDEX-1440"></a><a href="ch05_02.htm#perl3-tab-trmods">Table 5-3</a> lists the modifiers applicable to the <tt class="literal">tr///</tt> operator.  They're completely different from those you apply to <tt class="literal">m//</tt>, <tt class="literal">s///</tt>, or <tt class="literal">qr//</tt>, even if some look the same.</p>
<a name="perl3-tab-trmods"></a><h4 class="objtitle">Table 5.3. 
tr/// Modifiers</h4>
<table border="1">
<tr><th>Modifier</th><th>Meaning</th></tr>
<tr><td><tt class="literal">/c</tt></td><td>Complement <em class="replaceable">SEARCHLIST</em>.<a name="INDEX-1441"></a></td></tr>
<tr><td><tt class="literal">/d</tt></td><td>Delete found but unreplaced characters.<a name="INDEX-1442"></a><a name="INDEX-1443"></a></td></tr>
<tr><td><tt class="literal">/s</tt></td><td>Squash duplicate replaced characters.<a name="INDEX-1444"></a><a name="INDEX-1445"></a></td></tr>
</table>
<p>If the <tt class="literal">/c</tt> modifier is specified, the character set in <em class="replaceable">SEARCHLIST</em> is complemented; that is, the effective search list consists of all the characters <em class="emphasis">not</em> in <em class="replaceable">SEARCHLIST</em>.  In the case of Unicode, this can represent a <em class="emphasis">lot</em> of characters, but since they're stored logically, not physically, you don't need to worry about running out of memory.</p>
<p>The <tt class="literal">/d</tt> modifier turns <tt class="literal">tr///</tt> into what might be called the "transobliteration" operator: any characters specified by <em class="replaceable">SEARCHLIST</em> but not given a replacement in <em class="replaceable">REPLACEMENTLIST</em> are deleted. (This is slightly more flexible than the behavior of some <em class="emphasis">tr</em>(1) programs, which delete anything they find in <em class="replaceable">SEARCHLIST</em>, period.)</p>
<p>If the <tt class="literal">/s</tt> modifier is specified, sequences of characters converted to the same character are squashed down to a single instance of the character.</p>
<p><a name="INDEX-1446"></a>If the <tt class="literal">/d</tt> modifier is used, <em class="replaceable">REPLACEMENTLIST</em> is always interpreted exactly as specified.  Otherwise, if <em class="replaceable">REPLACEMENTLIST</em> is shorter than <em class="replaceable">SEARCHLIST</em>, the final character is replicated until it is long enough.  
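To make the difference concrete, here is a small sketch (the sample strings and variable names are our own) that applies the same two lists with and without <tt class="literal">/d</tt>:<blockquote><pre class="programlisting">my $padded = "abcabc";
$padded =~ tr/abc/xy/;     # final "y" is replicated, as if tr/abc/xyy/: gives "xyyxyy"

my $deleted = "abcabc";
$deleted =~ tr/abc/xy/d;   # list taken as written: a to x, b to y, c deleted: gives "xyxy"</pre></blockquote>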
If <em class="replaceable">REPLACEMENTLIST</em> is null, the <em class="replaceable">SEARCHLIST</em> is replicated, which is surprisingly useful if you just want to count characters, not change them.  It's also useful for squashing characters using <tt class="literal">/s</tt>.<blockquote><pre class="programlisting">tr/aeiou/!/;                 # change any vowel into !
tr{/\\\r\n\b\f. }{_};        # change strange chars into an underscore

tr/A-Z/a-z/ for @ARGV;       # canonicalize to lowercase ASCII

$count = ($para =~ tr/\n//); # count the newlines in $para
$count = tr/0-9//;           # count the digits in $_

$word =~ tr/a-zA-Z//s;       # bookkeeper -&gt; bokeper

tr/@$%*//d;                  # delete any of those
tr#A-Za-z0-9+/##cd;          # remove non-base64 chars

# change en passant
($HOST = $host) =~ tr/a-z/A-Z/;

$pathname =~ tr/a-zA-Z/_/cs; # change non-(ASCII)alphas to single underbar

tr [\200-\377]
   [\000-\177];              # strip 8th bit, bytewise</pre></blockquote>
If the same character occurs more than once in <em class="replaceable">SEARCHLIST</em>, only the first is used.  Therefore, this:<blockquote><pre class="programlisting">tr/AAA/XYZ/</pre></blockquote>
will change any single character A to an X (in <tt class="literal">$_</tt>).</p>
<p><a name="INDEX-1447"></a><a name="INDEX-1448"></a>Although variables aren't interpolated into <tt class="literal">tr///</tt>, you can still get the same effect by using <tt class="literal">eval</tt> <em class="replaceable">EXPR</em>:<blockquote><pre class="programlisting">$count = eval "tr/$oldlist/$newlist/";
die if $@;  # propagates exception from illegal eval contents</pre></blockquote>
<a name="INDEX-1449"></a><a name="INDEX-1450"></a></p>
<p><a name="INDEX-1451"></a><a name="INDEX-1452"></a>One more note: if you want to change your text to uppercase or lowercase, don't use <tt class="literal">tr///</tt>.  
Use the <tt class="literal">\U</tt> or <tt class="literal">\L</tt> sequences in a double-quoted string (or the equivalent <tt class="literal">uc</tt> and <tt class="literal">lc</tt> functions) since they will pay attention to locale or Unicode information and <tt class="literal">tr/a-z/A-Z/</tt> won't.  Additionally, in Unicode strings, the <tt class="literal">\u</tt> sequence and its corresponding <tt class="literal">ucfirst</tt> function understand the notion of titlecase, which for some languages may be distinct from simply converting to uppercase.</p>
<a name="INDEX-1453"></a><a name="INDEX-1454"></a><a name="INDEX-1455"></a>
<!-- BOTTOM NAV BAR --><hr width="515" align="left"><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_01.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="index.htm"><img src="../gifs/txthome.gif" alt="Home" border="0"></a></td><td align="right" valign="top" width="172"><a href="ch05_03.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr><tr><td align="left" valign="top" width="172">5.1. The Regular Expression Bestiary</td><td align="center" valign="top" width="171"><a href="index/index.htm"><img src="../gifs/index.gif" alt="Book Index" border="0"></a></td><td align="right" valign="top" width="172">5.3. Metacharacters and Metasymbols</td></tr></table></div><hr width="515" align="left">
<!-- LIBRARY NAV BAR --><img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2001</a> O'Reilly &amp; Associates. 
All rights reserved.</font></p><map name="library-map"> <area shape="rect" coords="2,-1,79,99" href="../index.htm"><area shape="rect" coords="84,1,157,108" href="../perlnut/index.htm"><area shape="rect" coords="162,2,248,125" href="../prog/index.htm"><area shape="rect" coords="253,2,326,130" href="../advprog/index.htm"><area shape="rect" coords="332,1,407,112" href="../cookbook/index.htm"><area shape="rect" coords="414,2,523,103" href="../sysadmin/index.htm"></map><!-- END OF BODY --></body></html>