<span class="keyword">if</span> <span class="operator">(</span><span class="variable">$c</span> <span class="operator">=</span> <span class="keyword">substr</span><span class="operator">(</span><span class="variable">$old</span><span class="operator">,</span> <span class="variable">$i</span><span class="operator">,</span> <span class="number">1</span><span class="operator">),</span> <span class="variable">$c</span> <span class="operator">=~</span> <span class="regex">/[\W\d_]/</span><span class="operator">)</span> <span class="operator">{</span>
<span class="variable">$state</span> <span class="operator">=</span> <span class="number">0</span><span class="operator">;</span>
<span class="operator">}</span> <span class="keyword">elsif</span> <span class="operator">(</span><span class="keyword">lc</span> <span class="variable">$c</span> <span class="keyword">eq</span> <span class="variable">$c</span><span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$i</span><span class="operator">,</span> <span class="number">1</span><span class="operator">)</span> <span class="operator">=</span> <span class="keyword">lc</span><span class="operator">(</span><span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$i</span><span class="operator">,</span> <span class="number">1</span><span class="operator">));</span>
<span class="variable">$state</span> <span class="operator">=</span> <span class="number">1</span><span class="operator">;</span>
<span class="operator">}</span> <span class="keyword">else</span> <span class="operator">{</span>
<span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$i</span><span class="operator">,</span> <span class="number">1</span><span class="operator">)</span> <span class="operator">=</span> <span class="keyword">uc</span><span class="operator">(</span><span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$i</span><span class="operator">,</span> <span class="number">1</span><span class="operator">));</span>
<span class="variable">$state</span> <span class="operator">=</span> <span class="number">2</span><span class="operator">;</span>
<span class="operator">}</span>
<span class="operator">}</span>
<span class="comment"># finish up with any remaining new (for when new is longer than old)</span>
<span class="keyword">if</span> <span class="operator">(</span><span class="variable">$newlen</span> <span class="operator">></span> <span class="variable">$oldlen</span><span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">if</span> <span class="operator">(</span><span class="variable">$state</span> <span class="operator">==</span> <span class="number">1</span><span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$oldlen</span><span class="operator">)</span> <span class="operator">=</span> <span class="keyword">lc</span><span class="operator">(</span><span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$oldlen</span><span class="operator">));</span>
<span class="operator">}</span> <span class="keyword">elsif</span> <span class="operator">(</span><span class="variable">$state</span> <span class="operator">==</span> <span class="number">2</span><span class="operator">)</span> <span class="operator">{</span>
<span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$oldlen</span><span class="operator">)</span> <span class="operator">=</span> <span class="keyword">uc</span><span class="operator">(</span><span class="keyword">substr</span><span class="operator">(</span><span class="variable">$new</span><span class="operator">,</span> <span class="variable">$oldlen</span><span class="operator">));</span>
<span class="operator">}</span>
<span class="operator">}</span>
<span class="keyword">return</span> <span class="variable">$new</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
<p>
</p>
<h2><a name="how_can_i_make__w_match_national_character_sets">How can I make <code>\w</code> match national character sets?</a></h2>
<p>Put <code>use locale;</code> in your script. The <code>\w</code> character class is taken
from the current locale.</p>
<p>See <a href="../../lib/Pod/perllocale.html">the perllocale manpage</a> for details.</p>
<p>
</p>
<h2><a name="how_can_i_match_a_localesmart_version_of___azaz__">How can I match a locale-smart version of <code>/[a-zA-Z]/</code>?</a></h2>
<p>You can use the POSIX character class syntax <code>/[[:alpha:]]/</code>
documented in <a href="../../lib/Pod/perlre.html">the perlre manpage</a>.</p>
<p>No matter which locale you are in, the alphabetic characters are
the characters in <code>\w</code> without the digits and the underscore.
As a regex, that looks like <code>/[^\W\d_]/</code>. Its complement,
the non-alphabetics, is then everything in <code>\W</code> along with
the digits and the underscore, or <code>/[\W\d_]/</code>.</p>
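<p>As a rough illustration of the two character classes above, the following
sketch splits a sample string into alphabetic and non-alphabetic characters.
The sample text and variable names are made up for this example, and it assumes
plain ASCII input with no <code>use locale;</code> in effect.</p>

```perl
use strict;
use warnings;

# Hypothetical sample input, ASCII only.
my $text = "abc_123 Def!";

# Alphabetics: word characters minus digits and underscore.
my $alpha = join '', $text =~ /[^\W\d_]/g;

# The POSIX class should select the same characters for this input.
my $posix = join '', $text =~ /[[:alpha:]]/g;

# Complement: non-word characters plus digits and underscore.
my $other = join '', $text =~ /[\W\d_]/g;

print "alpha: $alpha\n";
print "posix: $posix\n";
print "other: $other\n";
```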
<p>
</p>
<h2><a name="how_can_i_quote_a_variable_to_use_in_a_regex">How can I quote a variable to use in a regex?</a></h2>
<p>The Perl parser will expand <code>$variable</code> and <code>@variable</code> references in
regular expressions unless the delimiter is a single quote. Remember,
too, that the right-hand side of a <a href="../../lib/Pod/perlfunc.html#item_s_"><code>s///</code></a> substitution is considered
a double-quoted string (see <a href="../../lib/Pod/perlop.html">the perlop manpage</a> for more details). Remember
also that any regex special characters in the variable's value will be acted
on unless you precede the interpolated variable with <code>\Q</code>. Here's an example:</p>
<pre>
<span class="variable">$string</span> <span class="operator">=</span> <span class="string">"Placido P. Octopus"</span><span class="operator">;</span>
<span class="variable">$regex</span> <span class="operator">=</span> <span class="string">"P."</span><span class="operator">;</span>
</pre>
<pre>
<span class="variable">$string</span> <span class="operator">=~</span> <span class="regex">s/$regex/Polyp/</span><span class="operator">;</span>
<span class="comment"># $string is now "Polypacido P. Octopus"</span>
</pre>
<p>Because <code>.</code> is special in regular expressions, and can match any
single character, the regex <code>P.</code> here has matched the <code>Pl</code> in the
original string.</p>
<p>To escape the special meaning of <code>.</code>, we use <code>\Q</code>:</p>
<pre>
<span class="variable">$string</span> <span class="operator">=</span> <span class="string">"Placido P. Octopus"</span><span class="operator">;</span>
<span class="variable">$regex</span> <span class="operator">=</span> <span class="string">"P."</span><span class="operator">;</span>
</pre>
<pre>
<span class="variable">$string</span> <span class="operator">=~</span> <span class="regex">s/\Q$regex/Polyp/</span><span class="operator">;</span>
<span class="comment"># $string is now "Placido Polyp Octopus"</span>
</pre>
<p>The use of <code>\Q</code> causes the <code>.</code> in the regex to be treated as a
regular character, so that <code>P.</code> matches a <code>P</code> followed by a dot.</p>
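<p>The two substitutions can be compared side by side in one small script. This
is only a sketch: the variable names <code>$unquoted</code>, <code>$quoted</code> and
<code>$escaped</code> are introduced here for illustration. It also shows the built-in
<code>quotemeta</code> function, which performs the same escaping as <code>\Q</code>.</p>

```perl
use strict;
use warnings;

my $string = "Placido P. Octopus";
my $regex  = "P.";

# Without \Q the dot is a metacharacter and matches the "l" in "Pl".
(my $unquoted = $string) =~ s/$regex/Polyp/;

# With \Q ... \E the dot is matched literally.
(my $quoted = $string) =~ s/\Q$regex\E/Polyp/;

# quotemeta() returns the escaped form as a string.
my $escaped = quotemeta($regex);

print "$unquoted\n";   # Polypacido P. Octopus
print "$quoted\n";     # Placido Polyp Octopus
print "$escaped\n";    # P\.
```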
<p>
</p>
<h2><a name="what_is__o_really_for">What is <code>/o</code> really for?</a></h2>
<p>Using a variable in a regular expression match forces a re-evaluation
(and perhaps recompilation) each time the regular expression is
encountered. The <code>/o</code> modifier locks in the regex the first time
it's used. A constant regular expression always behaves this way; in
fact, its pattern is compiled into the internal format at the same
time as the rest of your program.</p>
<p>Use of <code>/o</code> is irrelevant unless variable interpolation is used in
the pattern, and if so, the regex engine will neither know nor care
whether the variables change after the pattern is evaluated the <em>very
first</em> time.</p>
<p><code>/o</code> is often used to gain an extra measure of efficiency by not
performing subsequent evaluations when you know it won't matter
(because you know the variables won't change), or more rarely, when
you don't want the regex to notice if they do.</p>
<p>For example, here's a "paragrep" program:</p>
<pre>
<span class="variable">$/</span> <span class="operator">=</span> <span class="string">''</span><span class="operator">;</span> <span class="comment"># paragraph mode</span>
<span class="variable">$pat</span> <span class="operator">=</span> <span class="keyword">shift</span><span class="operator">;</span>
<span class="keyword">while</span> <span class="operator">(<>)</span> <span class="operator">{</span>
<span class="keyword">print</span> <span class="keyword">if</span> <span class="regex">/$pat/o</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
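<p>The "locking in" effect can be observed with a small experiment. The sketch
below is not part of the paragrep program; the sample line and the array names
are invented for this illustration. The same variable takes two different
values, but the match written with <code>/o</code> keeps using whichever pattern it saw
first.</p>

```perl
use strict;
use warnings;

my $line = 'hot dog';
my (@with_o, @without_o);

for my $pat ('cat', 'dog') {
    # Interpolated once: this match keeps using 'cat' on the second pass.
    push @with_o, ($line =~ /$pat/o ? 1 : 0);

    # Re-evaluated on every pass: sees 'cat', then 'dog'.
    push @without_o, ($line =~ /$pat/ ? 1 : 0);
}

print "with /o:    @with_o\n";      # 0 0
print "without /o: @without_o\n";   # 0 1
```

<p>This is why <code>/o</code> is only safe when you know the interpolated variables
will not change, as with <code>$pat</code> in the paragrep program.</p>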
<p>
</p>
<h2><a name="how_do_i_use_a_regular_expression_to_strip_c_style_comments_from_a_file">How do I use a regular expression to strip C style comments from a file?</a></h2>
<p>While this actually can be done, it's much harder than you'd think.
For example, this one-liner</p>
<pre>
<span class="variable">perl</span> <span class="operator">-</span><span class="number">0777</span> <span class="operator">-</span><span class="variable">pe</span> <span class="string">'s{/\*.*?\*/}{}gs'</span> <span class="variable">foo</span><span class="operator">.</span><span class="variable">c</span>
</pre>
<p>will work in many but not all cases. You see, it's too simple-minded for
certain kinds of C programs, in particular, those with what appear to be
comments in quoted strings. For that, you'd need something like this,
created by Jeffrey Friedl and later modified by Fred Curtis.</p>
<pre>
<span class="variable">$/</span> <span class="operator">=</span> <span class="keyword">undef</span><span class="operator">;</span>
<span class="variable">$_</span> <span class="operator">=</span> <span class="operator"><>;</span>
<span class="regex">s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse</span><span class="operator">;</span>
<span class="keyword">print</span><span class="operator">;</span>
</pre>
<p>This could, of course, be more legibly written with the <code>/x</code> modifier, adding
whitespace and comments. Here it is expanded, courtesy of Fred Curtis.</p>
<pre>
    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment

     |         ##     OR  various things which aren't comments:

       (
         "           ##  Start of " ... " string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^"\\]        ##  Non "\
         )*
         "           ##  End of " ... " string

       |         ##     OR

         '           ##  Start of ' ... ' string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^'\\]        ##  Non '\
         )*
         '           ##  End of ' ... ' string

       |         ##     OR

         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;</pre>
<p>A slight modification also removes C++ comments:</p>
<pre>
<span class="regex">s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse</span><span class="operator">;</span>
</pre>
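<p>To see the difference between the simple one-liner and the longer pattern,
both can be applied to a string that contains a comment-like sequence inside a
quoted string. This is a sketch only: the subroutine name
<code>strip_c_comments</code> and the sample C source are made up for the example,
and the pattern is the C-only version shown earlier, wrapped in
<code>s{}{}</code> delimiters.</p>

```perl
use strict;
use warnings;

# Hypothetical wrapper around the C-only pattern from the text.
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s{/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)}{defined $2 ? $2 : ""}gse;
    return $src;
}

# Sample input: one real comment, one comment look-alike inside a string.
my $code = qq{int x = 1; /* set x */\nchar *s = "not /* a comment */";\n};

# The simple-minded approach also eats the text inside the string.
(my $naive = $code) =~ s{/\*.*?\*/}{}gs;

# The longer pattern leaves the quoted string alone.
my $careful = strip_c_comments($code);

print "naive:\n$naive\n";
print "careful:\n$careful\n";
```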
<p>
</p>
<h2><a name="can_i_use_perl_regular_expressions_to_match_balanced_text">Can I use Perl regular expressions to match balanced text?</a></h2>
<p>Historically, Perl regular expressions were not capable of matching
balanced text. More recent versions of perl, including 5.6.1, have added
experimental features that make it possible to do this.
Look at the documentation for the <code>(??{ })</code> construct in recent perlre manual
pages to see an example of matching balanced parentheses. Be sure to take
special notice of the warnings present in the manual before making use
of this feature.</p>
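<p>A minimal sketch of the <code>(??{ })</code> approach, modelled on the kind of
example found in perlre, might look like the following. The sample string is
invented, and the behaviour of this construct has varied between perl
releases, so treat it as an illustration rather than a recommended recipe.</p>

```perl
use strict;
use warnings;

# The pattern refers to itself through the (??{ }) construct,
# so the variable must be declared before it is assigned.
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # run of non-parentheses, no backtracking
      |
        (??{ $re })     # a nested, balanced group
    )*
    \)
}x;

# Hypothetical input with one balanced group and one stray paren.
my $input = "f(a (b c) d) g(";
my ($group) = $input =~ /($re)/;

print "$group\n";   # (a (b c) d)
```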
<p>CPAN contains many modules that can be useful for matching text
depending on the context. Damian Conway provides some useful
patterns in Regexp::Common. The module Text::Balanced provides a
general solution to this problem.</p>
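<p>For instance, Text::Balanced exports an <code>extract_bracketed</code> function
that pulls a balanced, possibly nested, bracketed group off the front of a
string. The sketch below assumes the module is installed (it ships with recent
perl distributions); the sample text is made up.</p>

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input starting with a nested parenthesised group.
my $text = "(a (b c) d) rest";

# In list context the first two return values are the extracted
# group and whatever follows it.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";   # (a (b c) d)
print "remainder:$remainder\n";    #  rest
```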
<p>One of the common applications of balanced text matching is working
with XML and HTML. There are many modules available that support