perlfaq6.html
than the default, or else we won't actually ever have a multiline
record read in.</p>
<pre>
<span class="variable">$/</span> <span class="operator">=</span> <span class="string">''</span><span class="operator">;</span> <span class="comment"># read in a whole paragraph, not just one line</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="operator"><></span> <span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="regex">/\b([\w'-]+)(\s+\1)+\b/gi</span> <span class="operator">)</span> <span class="operator">{</span> <span class="comment"># word starts alpha</span>
<span class="keyword">print</span> <span class="string">"Duplicate $1 at paragraph $.\n"</span><span class="operator">;</span>
<span class="operator">}</span>
<span class="operator">}</span>
</pre>
<p>Here's code that finds sentences that begin with "From " (which would
be mangled by many mailers):</p>
<pre>
<span class="variable">$/</span> <span class="operator">=</span> <span class="string">''</span><span class="operator">;</span> <span class="comment"># read in a whole paragraph, not just one line</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="operator"><></span> <span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="regex">/^From /gm</span> <span class="operator">)</span> <span class="operator">{</span> <span class="comment"># /m makes ^ match next to \n</span>
<span class="keyword">print</span> <span class="string">"leading from in paragraph $.\n"</span><span class="operator">;</span>
<span class="operator">}</span>
<span class="operator">}</span>
</pre>
<p>Here's code that finds everything between START and END in a file:</p>
<pre>
<span class="keyword">undef</span> <span class="variable">$/</span><span class="operator">;</span> <span class="comment"># read in whole file, not just one line or paragraph</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="operator"><></span> <span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">while</span> <span class="operator">(</span> <span class="regex">/START(.*?)END/sgm</span> <span class="operator">)</span> <span class="operator">{</span> <span class="comment"># /s makes . cross line boundaries</span>
<span class="keyword">print</span> <span class="string">"$1\n"</span><span class="operator">;</span>
<span class="operator">}</span>
<span class="operator">}</span>
</pre>
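<p>The loops above differ mainly in how <code>$/</code> is set. As a minimal
sketch of paragraph mode, the following runs the duplicate-word search
against an in-memory filehandle. The sample text and the <code>@found</code>
array are assumptions made up for this illustration.</p>

```perl
use strict;
use warnings;

# Hypothetical sample input: three paragraphs separated by blank lines.
my $text = "This is is fine.\nStill the first paragraph.\n\n"
         . "Nothing doubled here.\n\n"
         . "From the\nthe top.\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @found;
{
    local $/ = '';    # paragraph mode: each read returns one paragraph
    while ( my $para = <$fh> ) {
        # Same pattern as above; $. counts paragraphs in this mode.
        while ( $para =~ /\b([\w'-]+)(\s+\1)+\b/gi ) {
            push @found, "Duplicate $1 at paragraph $.";
        }
    }
}
print "$_\n" for @found;
```

<p>Using <code>local</code> inside a bare block restores the previous value
of <code>$/</code> when the block exits, so other reads in the program are
not affected.</p>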
<p>
</p>
<h2><a name="how_can_i_pull_out_lines_between_two_patterns_that_are_themselves_on_different_lines">How can I pull out lines between two patterns that are themselves on different lines?</a></h2>
<p>You can use Perl's somewhat exotic <code>..</code> operator (documented in
<a href="../../lib/Pod/perlop.html">the perlop manpage</a>):</p>
<pre>
perl -ne 'print if /START/ .. /END/' file1 file2 ...</pre>
<p>If you wanted text and not lines, you would use</p>
<pre>
perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...</pre>
<p>But if you want nested occurrences of <code>START</code> through <code>END</code>, you'll
run up against the problem described in the question in this section
on matching balanced text.</p>
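<p>As a rough sketch of the difference between the two one-liners, the
script below applies both ideas to a small list of lines. The sample data and
the names <code>@kept</code> and <code>@chunks</code> are assumptions for
illustration only.</p>

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @lines = ( 'preamble', 'START here', 'middle', 'END here', 'epilogue' );

# Lines: in scalar context, .. stays false until the left test matches,
# then stays true through the line on which the right test matches.
my @kept;
for (@lines) {
    push @kept, $_ if /START/ .. /END/;
}
print "$_\n" for @kept;

# Text: join everything into one string and capture what lies between
# the two markers, letting . cross newlines with /s.
my $all    = join "\n", @lines;
my @chunks = $all =~ /START(.*?)END/gs;
print "[$_]\n" for @chunks;
```

<p>The first form keeps whole lines, markers included. The second returns only
the text strictly between the markers.</p>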
<p>Here's another example of using <code>..</code>:</p>
<pre>
<span class="keyword">while</span> <span class="operator">(<>)</span> <span class="operator">{</span>
<span class="variable">$in_header</span> <span class="operator">=</span> <span class="number">1</span> <span class="operator">..</span> <span class="regex">/^$/</span><span class="operator">;</span>
<span class="variable">$in_body</span> <span class="operator">=</span> <span class="regex">/^$/</span> <span class="operator">..</span> <span class="keyword">eof</span><span class="operator">;</span>
<span class="comment"># now choose between them</span>
<span class="operator">}</span> <span class="keyword">continue</span> <span class="operator">{</span>
<span class="variable">$.</span> <span class="operator">=</span> <span class="number">0</span> <span class="keyword">if</span> <span class="keyword">eof</span><span class="operator">;</span> <span class="comment"># reset $. at the end of each file</span>
<span class="operator">}</span>
</pre>
<p>
</p>
<h2><a name="i_put_a_regular_expression_into____but_it_didn_t_work__what_s_wrong">I put a regular expression into $/ but it didn't work. What's wrong?</a></h2>
<p>$/ has to be a string; Perl does not accept a regular expression there.
You can use these examples if you really need to split records on a
pattern.</p>
<p>If you have File::Stream, this is easy.</p>
<pre>
<span class="keyword">use</span> <span class="variable">File::Stream</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="variable">$stream</span> <span class="operator">=</span> <span class="variable">File::Stream</span><span class="operator">-></span><span class="variable">new</span><span class="operator">(</span>
<span class="variable">$filehandle</span><span class="operator">,</span>
<span class="string">separator</span> <span class="operator">=></span> <span class="string">qr/\s*,\s*/</span><span class="operator">,</span>
<span class="operator">);</span>
</pre>
<pre>
<span class="keyword">print</span> <span class="string">"$_\n"</span> <span class="keyword">while</span> <span class="operator"><</span><span class="variable">$stream</span><span class="operator">>;</span>
</pre>
<p>If you don't have File::Stream, you have to do a little more work.</p>
<p>You can use the four argument form of sysread to continually add to
a buffer. After you add to the buffer, you check if you have a
complete line (using your regular expression).</p>
<pre>
<span class="keyword">local</span> <span class="variable">$_</span> <span class="operator">=</span> <span class="string">""</span><span class="operator">;</span>
<span class="keyword">while</span><span class="operator">(</span> <span class="keyword">sysread</span> <span class="variable">FH</span><span class="operator">,</span> <span class="variable">$_</span><span class="operator">,</span> <span class="number">8192</span><span class="operator">,</span> <span class="keyword">length</span> <span class="operator">)</span> <span class="operator">{</span>
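<span class="keyword">while</span><span class="operator">(</span> <span class="regex">s/^((?s).*?)your_pattern//</span> <span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">my</span> <span class="variable">$record</span> <span class="operator">=</span> <span class="variable">$1</span><span class="operator">;</span>
<span class="comment"># do stuff here.</span>
<span class="operator">}</span>
<span class="operator">}</span>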
</pre>
<p>You can do the same thing with foreach and a match using the
<code>c</code> flag and the <code>\G</code> anchor, if you do not mind your
entire file being in memory at the end.</p>
<pre>
<span class="keyword">local</span> <span class="variable">$_</span> <span class="operator">=</span> <span class="string">""</span><span class="operator">;</span>
<span class="keyword">while</span><span class="operator">(</span> <span class="keyword">sysread</span> <span class="variable">FH</span><span class="operator">,</span> <span class="variable">$_</span><span class="operator">,</span> <span class="number">8192</span><span class="operator">,</span> <span class="keyword">length</span> <span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">foreach</span> <span class="keyword">my</span> <span class="variable">$record</span> <span class="operator">(</span> <span class="regex">m/\G((?s).*?)your_pattern/gc</span> <span class="operator">)</span> <span class="operator">{</span>
<span class="comment"># do stuff here.</span>
<span class="operator">}</span>
<span class="keyword">substr</span><span class="operator">(</span> <span class="variable">$_</span><span class="operator">,</span> <span class="number">0</span><span class="operator">,</span> <span class="keyword">pos</span> <span class="operator">)</span> <span class="operator">=</span> <span class="string">""</span> <span class="keyword">if</span> <span class="keyword">pos</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
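<p>Here is a self-contained sketch of the buffering approach. It swaps in
<code>read</code> for <code>sysread</code> so that it can run against an
in-memory filehandle, and it uses a deliberately tiny chunk size so that
separators get split across reads. The sample data, the separator pattern
(a line of three or more dashes), and the names <code>$sep</code>,
<code>$buf</code>, and <code>@records</code> are all assumptions for
illustration.</p>

```perl
use strict;
use warnings;

# Hypothetical input: records separated by a line of three or more dashes.
my $data = "first\n---\nsecond line\n-----\nthird";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $sep = qr/\n-{3,}\n/;    # stands in for your_pattern
my $buf = '';
my @records;

# Append a small chunk to the end of the buffer on every pass, then
# peel off each complete record that the buffer now contains.
while ( read( $fh, $buf, 5, length $buf ) ) {
    while ( $buf =~ s/^(.*?)$sep//s ) {
        push @records, $1;
    }
}
push @records, $buf if length $buf;    # text after the last separator

print "[$_]\n" for @records;
```

<p>This pattern ends in a mandatory newline, so a separator that straddles two
chunks simply fails to match until the rest of it arrives. A pattern that can
match a prefix of the real separator (one ending in <code>\s*</code>, for
example) may match too early at a chunk boundary, so it is worth checking
your own pattern for that case.</p>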
<p>
</p>
<h2><a name="how_do_i_substitute_case_insensitively_on_the_lhs_while_preserving_case_on_the_rhs">How do I substitute case insensitively on the LHS while preserving case on the RHS?</a></h2>
<p>Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.</p>
<pre>
<span class="variable">$_</span><span class="operator">=</span> <span class="string">"this is a TEsT case"</span><span class="operator">;</span>
</pre>
<pre>
<span class="variable">$old</span> <span class="operator">=</span> <span class="string">'test'</span><span class="operator">;</span>
<span class="variable">$new</span> <span class="operator">=</span> <span class="string">'success'</span><span class="operator">;</span>
</pre>
<pre>
<span class="regex">s{(\Q$old\E)}
{ uc $new | (uc $1 ^ $1) .
(uc(substr $1, -1) ^ substr $1, -1) x
(length($new) - length $1)
}egi</span><span class="operator">;</span>
</pre>
<pre>
<span class="keyword">print</span><span class="operator">;</span>
</pre>
<p>And here it is as a subroutine, modeled after the above:</p>
<pre>
<span class="keyword">sub</span><span class="variable"> preserve_case</span><span class="operator">(</span>$$<span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$old</span><span class="operator">,</span> <span class="variable">$new</span><span class="operator">)</span> <span class="operator">=</span> <span class="variable">@_</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="variable">$mask</span> <span class="operator">=</span> <span class="keyword">uc</span> <span class="variable">$old</span> <span class="operator">^</span> <span class="variable">$old</span><span class="operator">;</span>
</pre>
<pre>
uc $new | $mask .
substr($mask, -1) x (length($new) - length($old))
}</pre>
<pre>
<span class="variable">$a</span> <span class="operator">=</span> <span class="string">"this is a TEsT case"</span><span class="operator">;</span>
<span class="variable">$a</span> <span class="operator">=~</span> <span class="regex">s/(test)/preserve_case($1, "success")/egi</span><span class="operator">;</span>
<span class="keyword">print</span> <span class="string">"$a\n"</span><span class="operator">;</span>
</pre>
<p>This prints:</p>
<pre>
this is a SUcCESS case</pre>
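<p>To see why this works, it can help to look at the mask on its own. In
ASCII, an uppercase letter and its lowercase form differ only in bit 0x20, so
xor-ing a string with its uppercased copy leaves 0x20 wherever the original
was lowercase and 0x00 elsewhere. The sketch below assumes ASCII input and
assumes the <code>bitwise</code> feature is not enabled, so that
<code>^</code> and <code>|</code> act on strings. The names <code>$hex</code>
and <code>$result</code> are made up for the example.</p>

```perl
use strict;
use warnings;

my $old  = 'TEsT';
my $mask = uc($old) ^ $old;    # chr(0x20) wherever $old was lowercase

# Show the mask bytes in hex.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $mask;
print "$hex\n";                # 00 00 20 00

# OR-ing the mask onto uppercase text sets bit 0x20 at those positions,
# which lowercases exactly the letters that were lowercase in $old.
my $result = uc('succ') | $mask;
print "$result\n";             # SUcC
```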
<p>As an alternative, to keep the case of the replacement word if it is
longer than the original, you can use this code, by Jeff Pinyan:</p>
<pre>
<span class="keyword">sub</span><span class="variable"> preserve_case </span><span class="operator">{</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$from</span><span class="operator">,</span> <span class="variable">$to</span><span class="operator">)</span> <span class="operator">=</span> <span class="variable">@_</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$lf</span><span class="operator">,</span> <span class="variable">$lt</span><span class="operator">)</span> <span class="operator">=</span> <span class="keyword">map</span> <span class="keyword">length</span><span class="operator">,</span> <span class="variable">@_</span><span class="operator">;</span>
</pre>
<pre>
<span class="keyword">if</span> <span class="operator">(</span><span class="variable">$lt</span> <span class="operator"><</span> <span class="variable">$lf</span><span class="operator">)</span> <span class="operator">{</span> <span class="variable">$from</span> <span class="operator">=</span> <span class="keyword">substr</span> <span class="variable">$from</span><span class="operator">,</span> <span class="number">0</span><span class="operator">,</span> <span class="variable">$lt</span> <span class="operator">}</span>
<span class="keyword">else</span> <span class="operator">{</span> <span class="variable">$from</span> <span class="operator">.=</span> <span class="keyword">substr</span> <span class="variable">$to</span><span class="operator">,</span> <span class="variable">$lf</span> <span class="operator">}</span>
</pre>
<pre>
<span class="keyword">return</span> <span class="keyword">uc</span> <span class="variable">$to</span> <span class="operator">|</span> <span class="operator">(</span><span class="variable">$from</span> <span class="operator">^</span> <span class="keyword">uc</span> <span class="variable">$from</span><span class="operator">);</span>
<span class="operator">}</span>
</pre>
<p>This changes the sentence to "this is a SUcCess case."</p>
<p>Just to show that C programmers can write C in any programming language,
if you prefer a more C-like solution, the following script makes the
substitution have the same case, letter by letter, as the original.
(It also happens to run about 240% slower than the Perlish solution.)
If the substitution has more characters than the string being substituted,
the case of the last character is used for the rest of the substitution.</p>
<pre>
<span class="comment"># Original by Nathan Torkington, massaged by Jeffrey Friedl</span>
<span class="comment">#</span>
<span class="keyword">sub</span><span class="variable"> preserve_case</span><span class="operator">(</span>$$<span class="operator">)</span>
<span class="operator">{</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$old</span><span class="operator">,</span> <span class="variable">$new</span><span class="operator">)</span> <span class="operator">=</span> <span class="variable">@_</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$state</span><span class="operator">)</span> <span class="operator">=</span> <span class="number">0</span><span class="operator">;</span> <span class="comment"># 0 = no change; 1 = lc; 2 = uc</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$i</span><span class="operator">,</span> <span class="variable">$oldlen</span><span class="operator">,</span> <span class="variable">$newlen</span><span class="operator">,</span> <span class="variable">$c</span><span class="operator">)</span> <span class="operator">=</span> <span class="operator">(</span><span class="number">0</span><span class="operator">,</span> <span class="keyword">length</span><span class="operator">(</span><span class="variable">$old</span><span class="operator">),</span> <span class="keyword">length</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">));</span>
<span class="keyword">my</span> <span class="operator">(</span><span class="variable">$len</span><span class="operator">)</span> <span class="operator">=</span> <span class="variable">$oldlen</span> <span class="operator"><</span> <span class="variable">$newlen</span> <span class="operator">?</span> <span class="variable">$oldlen</span> <span class="operator">:</span> <span class="variable">$newlen</span><span class="operator">;</span>
</pre>
<pre>
<span class="keyword">for</span> <span class="operator">(</span><span class="variable">$i</span> <span class="operator">=</span> <span class="number">0</span><span class="operator">;</span> <span class="variable">$i</span> <span class="operator"><</span> <span class="variable">$len</span><span class="operator">;</span> <span class="variable">$i</span><span class="operator">++)</span> <span class="operator">{</span>