"qep'a'" # the ' char is surrounded by "p" and "a"</pre>
<p>These strings do not match /\b'\b/.</p>
<pre>
"foo'" # there is no word char after non-word '</pre>
<p>You can also use the complement of \b, \B, to specify that there
should not be a word boundary.</p>
<p>In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These strings match /\Bam\B/:</p>
<pre>
"llama" # "am" surrounded by word chars
"Samuel" # same</pre>
<p>These strings do not match /\Bam\B/:</p>
<pre>
"Sam" # no word boundary before "a", but one after "m"
"I am Sam" # "am" surrounded by non-word chars</pre>
<p>
</p>
<h2><a name="why_does_using_________or____slow_my_program_down">Why does using $&, $`, or $' slow my program down?</a></h2>
<p>(contributed by Anno Siegel)</p>
<p>Once Perl sees that you need one of these variables anywhere in the
program, it provides them on each and every pattern match. That means
that on every pattern match the entire string will be copied, part of it
to $`, part to $&, and part to $'. Thus the penalty is most severe with
long strings and patterns that match often. Avoid $&, $', and $` if you
can, but if you can't, once you've used them at all, use them at will
because you've already paid the price. Remember that some algorithms
really appreciate them. As of the 5.005 release, the $& variable is no
longer "expensive" the way the other two are.</p>
<p>Since Perl 5.6.1 the special variables @- and @+ can functionally replace
$`, $& and $'. These arrays contain the offsets of the beginning and end
of each match (see perlvar for the full story), so they give you
essentially the same information, but without the risk of excessive
string copying.</p>
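<p>As a minimal sketch of that replacement, the three match variables can be
rebuilt with <code>substr()</code> and the offsets in <code>$-[0]</code> and <code>$+[0]</code>.
The sample string and the names <code>$pre</code>, <code>$match</code>, and <code>$post</code>
are illustrative assumptions, not part of the FAQ.</p>

```perl
use strict;
use warnings;

my $text = "hello world";
my ($pre, $match, $post) = ("", "", "");

if ($text =~ /o w/) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    $pre   = substr($text, 0, $-[0]);               # text before the match
    $match = substr($text, $-[0], $+[0] - $-[0]);   # the matched text
    $post  = substr($text, $+[0]);                  # text after the match
}

print "[$pre][$match][$post]\n";   # prints "[hell][o w][orld]"
```

<p>Only the pieces you actually ask for are copied, and only for this one match.</p>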
<p>
</p>
<h2><a name="what_good_is__g_in_a_regular_expression">What good is <code>\G</code> in a regular expression?</a></h2>
<p>You use the <code>\G</code> anchor to start the next match on the same
string where the last match left off. The regular
expression engine cannot skip over any characters to find
the next match with this anchor, so <code>\G</code> is similar to the
beginning of string anchor, <code>^</code>. The <code>\G</code> anchor is typically
used with the <code>g</code> flag. It uses the value of <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a>
as the position to start the next match. As the match
operator makes successive matches, it updates <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a> with the
position of the next character past the last match (or the
first character of the next match, depending on how you like
to look at it). Each string has its own <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a> value.</p>
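<p>The bookkeeping described above can be watched directly. This is a small
sketch with an assumed sample string; it records <code>pos()</code> after each
successful match and again after the final failed one.</p>

```perl
use strict;
use warnings;

my $str = "aXbXc";
my @offsets;

# In scalar context, each /g match resumes at pos($str).
while ($str =~ /X/g) {
    push @offsets, pos($str);   # offset just past the current match
}
print "offsets: @offsets\n";    # prints "offsets: 2 4"

# Without the c flag, the failed match that ended the loop reset pos().
my $after_failure = defined pos($str) ? pos($str) : "undef";
print "after failed match: $after_failure\n";   # prints "after failed match: undef"
```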
<p>Suppose you want to match all of the consecutive pairs of digits
in a string like "1122a44" and stop matching when you
encounter non-digits. You want to match <code>11</code> and <code>22</code>, but
the letter <code>a</code> shows up between <code>22</code> and <code>44</code> and you want
to stop at <code>a</code>. Simply matching pairs of digits skips over
the <code>a</code> and still matches <code>44</code>.</p>
<pre>
<span class="variable">$_</span> <span class="operator">=</span> <span class="string">"1122a44"</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="variable">@pairs</span> <span class="operator">=</span> <span class="regex">m/(\d\d)/g</span><span class="operator">;</span> <span class="comment"># qw( 11 22 44 )</span>
</pre>
<p>If you use the \G anchor, you force the match after <code>22</code> to
start with the <code>a</code>. The regular expression cannot match
there since it does not find a digit, so the next match
fails and the match operator returns the pairs it already
found.</p>
<pre>
<span class="variable">$_</span> <span class="operator">=</span> <span class="string">"1122a44"</span><span class="operator">;</span>
<span class="keyword">my</span> <span class="variable">@pairs</span> <span class="operator">=</span> <span class="regex">m/\G(\d\d)/g</span><span class="operator">;</span> <span class="comment"># qw( 11 22 )</span>
</pre>
<p>You can also use the <code>\G</code> anchor in scalar context. You
still need the <code>g</code> flag.</p>
<pre>
<span class="variable">$_</span> <span class="operator">=</span> <span class="string">"1122a44"</span><span class="operator">;</span>
<span class="keyword">while</span><span class="operator">(</span> <span class="regex">m/\G(\d\d)/g</span> <span class="operator">)</span>
<span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Found $1\n"</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
<p>After the match fails at the letter <code>a</code>, perl resets <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a>
and the next match on the same string starts at the beginning.</p>
<pre>
<span class="variable">$_</span> <span class="operator">=</span> <span class="string">"1122a44"</span><span class="operator">;</span>
<span class="keyword">while</span><span class="operator">(</span> <span class="regex">m/\G(\d\d)/g</span> <span class="operator">)</span>
<span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Found $1\n"</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
<pre>
<span class="keyword">print</span> <span class="string">"Found $1 after while"</span> <span class="keyword">if</span> <span class="regex">m/(\d\d)/g</span><span class="operator">;</span> <span class="comment"># finds "11"</span>
</pre>
<p>You can disable <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a> resets on failure with the <code>c</code> flag.
Subsequent matches start where the last successful match
ended (the value of <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a>) even if a match on the same
string has failed in the meantime. In this case, the match
after the <code>while()</code> loop starts at the <code>a</code> (where the last
match stopped), and since it does not use any anchor it can
skip over the <code>a</code> to find "44".</p>
<pre>
<span class="variable">$_</span> <span class="operator">=</span> <span class="string">"1122a44"</span><span class="operator">;</span>
<span class="keyword">while</span><span class="operator">(</span> <span class="regex">m/\G(\d\d)/gc</span> <span class="operator">)</span>
<span class="operator">{</span>
    <span class="keyword">print</span> <span class="string">"Found $1\n"</span><span class="operator">;</span>
<span class="operator">}</span>
</pre>
<pre>
<span class="keyword">print</span> <span class="string">"Found $1 after while"</span> <span class="keyword">if</span> <span class="regex">m/(\d\d)/g</span><span class="operator">;</span> <span class="comment"># finds "44"</span>
</pre>
<p>Typically you use the <code>\G</code> anchor with the <code>c</code> flag
when you want to try a different match if one fails,
such as in a tokenizer. Jeffrey Friedl offers this example
which works in 5.004 or later.</p>
<pre>
<span class="keyword">while</span> <span class="operator">(<>)</span> <span class="operator">{</span>
    <span class="keyword">chomp</span><span class="operator">;</span>
    <span class="variable">PARSER</span><span class="operator">:</span> <span class="operator">{</span>
        <span class="regex">m/ \G( \d+\b )/gcx</span> <span class="operator">&&</span> <span class="keyword">do</span> <span class="operator">{</span> <span class="keyword">print</span> <span class="string">"number: $1\n"</span><span class="operator">;</span> <span class="keyword">redo</span><span class="operator">;</span> <span class="operator">};</span>
        <span class="regex">m/ \G( \w+ )/gcx</span> <span class="operator">&&</span> <span class="keyword">do</span> <span class="operator">{</span> <span class="keyword">print</span> <span class="string">"word: $1\n"</span><span class="operator">;</span> <span class="keyword">redo</span><span class="operator">;</span> <span class="operator">};</span>
        <span class="regex">m/ \G( \s+ )/gcx</span> <span class="operator">&&</span> <span class="keyword">do</span> <span class="operator">{</span> <span class="keyword">print</span> <span class="string">"space: $1\n"</span><span class="operator">;</span> <span class="keyword">redo</span><span class="operator">;</span> <span class="operator">};</span>
        <span class="regex">m/ \G( [^\w\d]+ )/gcx</span> <span class="operator">&&</span> <span class="keyword">do</span> <span class="operator">{</span> <span class="keyword">print</span> <span class="string">"other: $1\n"</span><span class="operator">;</span> <span class="keyword">redo</span><span class="operator">;</span> <span class="operator">};</span>
    <span class="operator">}</span>
<span class="operator">}</span>
</pre>
<p>For each line, the PARSER loop first tries to match a series
of digits followed by a word boundary. This match has to
start at the place the last match left off (or the beginning
of the string on the first match). Since <code>m/ \G( \d+\b
)/gcx</code> uses the <code>c</code> flag, if the string does not match that
regular expression, perl does not reset <a href="../../lib/Pod/perlfunc.html#item_pos"><code>pos()</code></a> and the next
match starts at the same position to try a different
pattern.</p>
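<p>The same technique can be wrapped in a function that collects tokens instead
of printing them. This is only a sketch under stated assumptions: the
<code>tokenize</code> helper and its return format are hypothetical, and the last
pattern uses <code>[^\w\s]+</code>, a small variation on the example above.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the FAQ): returns [type, text] pairs.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    PARSER: {
        # Each attempt is anchored at pos($line) by \G. The c flag keeps
        # pos() in place on failure, so the next pattern tries the same spot.
        if ($line =~ /\G(\d+\b)/gc)    { push @tokens, ["number", $1]; redo PARSER; }
        if ($line =~ /\G(\w+)/gc)      { push @tokens, ["word",   $1]; redo PARSER; }
        if ($line =~ /\G(\s+)/gc)      { push @tokens, ["space",  $1]; redo PARSER; }
        if ($line =~ /\G([^\w\s]+)/gc) { push @tokens, ["other",  $1]; redo PARSER; }
    }
    return @tokens;
}

my @tokens = tokenize("ab 12 c3!");
print join(" ", map { sprintf("%s(%s)", @$_) } @tokens), "\n";
# prints "word(ab) space( ) number(12) space( ) word(c3) other(!)"
```

<p>The block exits once every pattern fails at the current position, which in
this sketch happens only at the end of the string.</p>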
<p>
</p>
<h2><a name="are_perl_regexes_dfas_or_nfas_are_they_posix_compliant">Are Perl regexes DFAs or NFAs? Are they POSIX compliant?</a></h2>
<p>While it's true that Perl's regular expressions resemble the DFAs
(deterministic finite automata) of the <code>egrep(1)</code> program, they are in
fact implemented as NFAs (non-deterministic finite automata) to allow
backtracking and backreferencing. And they aren't POSIX-style either,
because those guarantee worst-case behavior for all cases. (It seems
that some people prefer guarantees of consistency, even when what's
guaranteed is slowness.) See the book "Mastering Regular Expressions"
(from O'Reilly) by Jeffrey Friedl for all the details you could ever
hope to know on these matters (a full citation appears in
<a href="../../lib/Pod/perlfaq2.html">the perlfaq2 manpage</a>).</p>
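<p>For a concrete taste of the backreferencing mentioned above, here is a small
sketch that finds doubled letters. The sample word is an arbitrary choice;
<code>\1</code> must match whatever text the first group just captured.</p>

```perl
use strict;
use warnings;

# In list context with /g, the match returns every captured group.
my @doubled = "bookkeeper" =~ /(\w)\1/g;
print "@doubled\n";   # prints "o k e"
```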
<p>
</p>
<h2><a name="what_s_wrong_with_using_grep_in_a_void_context">What's wrong with using grep in a void context?</a></h2>
<p>The problem is that grep builds a return list, regardless of the context.
This means you're making Perl go to the trouble of building a list that
you then just throw away. If the list is large, you waste both time and space.
If your intent is to iterate over the list, use a for loop instead.</p>
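<p>A short sketch of the rewrite, using made-up file names: the commented-out
line is the void-context form, and the loop below it produces the same side
effect without building a result list.</p>

```perl
use strict;
use warnings;

my @files = ("a.txt", "b.log", "c.txt");
my @seen;

# Void-context form that builds a list and then discards it:
#     grep { push @seen, $_ if /\.txt\z/ } @files;

# Equivalent for loop: same side effect, no throwaway result list.
for my $file (@files) {
    push @seen, $file if $file =~ /\.txt\z/;
}

print "@seen\n";   # prints "a.txt c.txt"
```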
<p>In perls older than 5.8.1, map suffers from this problem as well.
But since 5.8.1, this has been fixed, and map is context aware - in void
context, no lists are constructed.</p>
<p>
</p>
<h2><a name="how_can_i_match_strings_with_multibyte_characters">How can I match strings with multibyte characters?</a></h2>
<p>Since version 5.6, Perl has had some level of multibyte character
support. Perl 5.8 or later is recommended. Supported multibyte
character repertoires include Unicode, and legacy encodings
through the Encode module. See <a href="../../lib/Pod/perluniintro.html">the perluniintro manpage</a>, <a href="../../lib/Pod/perlunicode.html">the perlunicode manpage</a>,
and <a href="../../lib/Encode.html">the Encode manpage</a>.</p>
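<p>As a minimal sketch of the Encode route, assuming the input arrives as UTF-8
octets: decode the bytes into a character string first, then match against
characters rather than bytes. The sample data is illustrative only.</p>

```perl
use strict;
use warnings;
use Encode qw(decode);

# Five UTF-8 octets encoding three characters: e-acute, "t", e-acute.
my $bytes    = "\xC3\xA9t\xC3\xA9";
my $byte_len = length $bytes;

# After decoding, length() and the regex engine work on characters.
my $chars    = decode("UTF-8", $bytes);
my $char_len = length $chars;
my $is_word  = $chars =~ /\A\w{3}\z/ ? 1 : 0;

print "$byte_len bytes, $char_len characters, all word chars: $is_word\n";
# prints "5 bytes, 3 characters, all word chars: 1"
```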
<p>If you are stuck with older Perls, you can do Unicode with the
<code>Unicode::String</code> module, and character conversions using the
<code>Unicode::Map8</code> and <code>Unicode::Map</code> modules. If you are using
Japanese encodings, you might try jperl 5.005_03.</p>
<p>Finally, the following set of approaches was offered by Jeffrey
Friedl, whose article in issue #5 of The Perl Journal talks about
this very matter.</p>
<p>Let's suppose you have some weird Martian encoding where pairs of
ASCII uppercase letters encode single Martian letters (i.e. the two
bytes &q