<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../../displayToc.js"></script>
<script language="JavaScript" src="../../tocParas.js"></script>
<script language="JavaScript" src="../../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../../scineplex.css">
<title>perlfaq6 - Regular Expressions</title>
<link rel="stylesheet" href="../../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>

<body>

<script>writelinks('__top__',2);</script>
<h1><a>perlfaq6 - Regular Expressions</a></h1>
<p><a name="__index__"></a></p>

<!-- INDEX BEGIN -->

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#description">DESCRIPTION</a></li>
	<ul>

		<li><a href="#how_can_i_hope_to_use_regular_expressions_without_creating_illegible_and_unmaintainable_code">How can I hope to use regular expressions without creating illegible and unmaintainable code?</a></li>
		<li><a href="#i_m_having_trouble_matching_over_more_than_one_line__what_s_wrong">I'm having trouble matching over more than one line.  What's wrong?</a></li>
		<li><a href="#how_can_i_pull_out_lines_between_two_patterns_that_are_themselves_on_different_lines">How can I pull out lines between two patterns that are themselves on different lines?</a></li>
		<li><a href="#i_put_a_regular_expression_into____but_it_didn_t_work__what_s_wrong">I put a regular expression into $/ but it didn't work. What's wrong?</a></li>
		<li><a href="#how_do_i_substitute_case_insensitively_on_the_lhs_while_preserving_case_on_the_rhs">How do I substitute case insensitively on the LHS while preserving case on the RHS?</a></li>
		<li><a href="#how_can_i_make__w_match_national_character_sets">How can I make <code>\w</code> match national character sets?</a></li>
		<li><a href="#how_can_i_match_a_localesmart_version_of___azaz__">How can I match a locale-smart version of <code>/[a-zA-Z]/</code>?</a></li>
		<li><a href="#how_can_i_quote_a_variable_to_use_in_a_regex">How can I quote a variable to use in a regex?</a></li>
		<li><a href="#what_is__o_really_for">What is <code>/o</code> really for?</a></li>
		<li><a href="#how_do_i_use_a_regular_expression_to_strip_c_style_comments_from_a_file">How do I use a regular expression to strip C style comments from a file?</a></li>
		<li><a href="#can_i_use_perl_regular_expressions_to_match_balanced_text">Can I use Perl regular expressions to match balanced text?</a></li>
		<li><a href="#what_does_it_mean_that_regexes_are_greedy_how_can_i_get_around_it">What does it mean that regexes are greedy?  How can I get around it?</a></li>
		<li><a href="#how_do_i_process_each_word_on_each_line">How do I process each word on each line?</a></li>
		<li><a href="#how_can_i_print_out_a_wordfrequency_or_linefrequency_summary">How can I print out a word-frequency or line-frequency summary?</a></li>
		<li><a href="#how_can_i_do_approximate_matching">How can I do approximate matching?</a></li>
		<li><a href="#how_do_i_efficiently_match_many_regular_expressions_at_once">How do I efficiently match many regular expressions at once?</a></li>
		<li><a href="#why_don_t_wordboundary_searches_with__b_work_for_me">Why don't word-boundary searches with <code>\b</code> work for me?</a></li>
		<li><a href="#why_does_using_________or____slow_my_program_down">Why does using $&amp;, $`, or $' slow my program down?</a></li>
		<li><a href="#what_good_is__g_in_a_regular_expression">What good is <code>\G</code> in a regular expression?</a></li>
		<li><a href="#are_perl_regexes_dfas_or_nfas_are_they_posix_compliant">Are Perl regexes DFAs or NFAs?  Are they POSIX compliant?</a></li>
		<li><a href="#what_s_wrong_with_using_grep_in_a_void_context">What's wrong with using grep in a void context?</a></li>
		<li><a href="#how_can_i_match_strings_with_multibyte_characters">How can I match strings with multibyte characters?</a></li>
		<li><a href="#how_do_i_match_a_pattern_that_is_supplied_by_the_user">How do I match a pattern that is supplied by the user?</a></li>
	</ul>

	<li><a href="#author_and_copyright">AUTHOR AND COPYRIGHT</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>perlfaq6 - Regular Expressions ($Revision: 1.38 $, $Date: 2005/12/31 00:54:37 $)</p>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>This section is surprisingly small because the rest of the FAQ is
littered with answers involving regular expressions.  For example,
decoding a URL and checking whether something is a number are handled
with regular expressions, but those answers are found elsewhere in
this document (in <a href="../../lib/Pod/perlfaq9.html">the perlfaq9 manpage</a>: &quot;How do I decode or create those %-encodings
on the web&quot; and <a href="../../lib/Pod/perlfaq4.html">the perlfaq4 manpage</a>: &quot;How do I determine whether a scalar is
a number/whole/integer/float&quot;, to be precise).</p>
<p>
</p>
<h2><a name="how_can_i_hope_to_use_regular_expressions_without_creating_illegible_and_unmaintainable_code">How can I hope to use regular expressions without creating illegible and unmaintainable code?</a></h2>
<p>Three techniques can make regular expressions maintainable and
understandable.</p>
<dl>
<dt><strong><a name="item_comments_outside_the_regex">Comments Outside the Regex</a></strong>

<dd>
<p>Describe what you're doing and how you're doing it, using normal Perl
comments.</p>
</dd>
<dd>
<pre>
    <span class="comment"># turn each line into its first word (lowercased), a colon, and the</span>
    <span class="comment"># number of characters in the rest of the line</span>
    <span class="regex">s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg</span><span class="operator">;</span>
</pre>
</dd>
<dt><strong><a name="item_comments_inside_the_regex">Comments Inside the Regex</a></strong>

<dd>
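<p>The <code>/x</code> modifier causes whitespace to be ignored in a regex pattern
(except in a character class), and also allows you to use normal
comments there.  To match a literal space or <code>#</code> under <code>/x</code>,
escape it with a backslash or put it in a character class.  As you can
imagine, whitespace and comments help a lot.</p>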
</dd>
<dd>
<p><code>/x</code> lets you turn this:</p>
</dd>
<dd>
<pre>
    <span class="regex">s{&lt;(?:[^&gt;'"]*|".*?"|'.*?')+&gt;}{}gs</span><span class="operator">;</span>
</pre>
</dd>
<dd>
<p>into this:</p>
</dd>
<dd>
<pre>
    <span class="regex">s{ &lt;                    # opening angle bracket
        (?:                 # non-capturing grouping paren
             [^&gt;'"] *       # 0 or more things that are neither &gt; nor ' nor "
                |           #    or else
             ".*?"          # a section between double quotes (stingy match)
                |           #    or else
             '.*?'          # a section between single quotes (stingy match)
        ) +                 #   all occurring one or more times
       &gt;                    # closing angle bracket
    }{}gsx</span><span class="operator">;</span>                 <span class="comment"># replace with nothing, i.e. delete</span>
</pre>
</dd>
<dd>
<p>It's still not quite as clear as prose, but it is very useful for
describing the meaning of each part of the pattern.</p>
</dd>
<dt><strong><a name="item_different_delimiters">Different Delimiters</a></strong>

<dd>
<p>While we normally think of patterns as being delimited with <code>/</code>
characters, they can be delimited by almost any character; see <a href="../../lib/Pod/perlre.html">the perlre manpage</a>
for details.  For example, the <a href="../../lib/Pod/perlfunc.html#item_s_"><code>s///</code></a> above uses braces as
delimiters.  Selecting another delimiter can save you from having to
escape the delimiter within the pattern:</p>
</dd>
<dd>
<pre>
    <span class="regex">s/\/usr\/local/\/usr\/share/g</span><span class="operator">;</span>      <span class="comment"># bad delimiter choice</span>
    <span class="regex">s#/usr/local#/usr/share#g</span><span class="operator">;</span>          <span class="comment"># better</span>
</pre>
</dd>
</dl>
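<p>As a rough check of the commented tag-stripping pattern above, the
following sketch stores it with <code>qr//</code> and applies it to a sample
string.  The sample markup and the variable names <code>$html</code>,
<code>$tag</code>, and <code>$text</code> are illustrative assumptions, not part
of the original answer.</p>
<pre>
    use strict;
    use warnings;

    # Hypothetical sample input; one attribute value contains a "&gt;" inside quotes.
    my $html = q{&lt;a href="x&gt;y" title='t'&gt;link&lt;/a&gt; and &lt;b&gt;bold&lt;/b&gt;};

    # Same pattern as above, compiled once with qr// so it can be named and reused.
    my $tag = qr{
        &lt;                   # opening angle bracket
        (?:                 # non-capturing group
            [^&gt;'"] *        # anything that is not &gt;, ' or "
          |
            ".*?"           # a double-quoted section
          |
            '.*?'           # a single-quoted section
        ) +
        &gt;                   # closing angle bracket
    }sx;

    # Copy the input, then delete every tag from the copy.
    (my $text = $html) =~ s/$tag//g;
    print "$text\n";        # prints "link and bold"
</pre>
<p>Compiling the pattern with <code>qr//</code> keeps the <code>/s</code> and
<code>/x</code> flags attached to it, so the later <code>s///</code> only needs
<code>/g</code>.</p>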
<p>
</p>
<h2><a name="i_m_having_trouble_matching_over_more_than_one_line__what_s_wrong">I'm having trouble matching over more than one line.  What's wrong?</a></h2>
<p>Either you don't have more than one line in the string you're looking
at (probably), or else you aren't using the correct modifier(s) on
your pattern (possibly).</p>
<p>There are many ways to get multiline data into a string.  If you want
it to happen automatically while reading input, you'll want to set $/
(probably to '' for paragraphs or <a href="../../lib/Pod/perlfunc.html#item_undef"><code>undef</code></a> for the whole file) to
allow you to read more than one line at a time.</p>
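<p>The two settings of $/ mentioned above can be sketched as follows.
This is only an illustration: the sample text, the variable names, and the
use of an in-memory filehandle (which assumes Perl 5.8 or later) are
assumptions made to keep the example self-contained.</p>
<pre>
    use strict;
    use warnings;

    # Hypothetical sample data: two paragraphs separated by a blank line.
    my $data = "first line\nsecond line\n\nthird line\nfourth line\n";

    # Paragraph mode: each read returns one blank-line-separated paragraph.
    my @paragraphs;
    {
        local $/ = '';
        open(my $fh, '&lt;', \$data) or die "cannot open in-memory file: $!";
        @paragraphs = readline($fh);
        close($fh);
    }

    # Slurp mode: a single read returns the whole file.
    my $whole;
    {
        local $/ = undef;
        open(my $fh, '&lt;', \$data) or die "cannot open in-memory file: $!";
        $whole = readline($fh);
        close($fh);
    }

    printf "%d paragraphs, %d characters slurped\n",
        scalar(@paragraphs), length($whole);
</pre>
<p>Using <code>local</code> inside a block restores the previous value of $/
when the block ends, so other reads in the program keep their normal
line-at-a-time behavior.</p>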
<p>Read <a href="../../lib/Pod/perlre.html">the perlre manpage</a> to help you decide which of <code>/s</code> and <code>/m</code> (or both)
you might want to use: <code>/s</code> allows dot to match a newline, and <code>/m</code>
allows caret and dollar to match next to an embedded newline, not just at the
start and end of the string.  Either way, you do need to make sure that you've
actually got a multiline string in there.</p>
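<p>A small sketch of the difference between the two modifiers, using a
made-up two-line string (the string and variable names are assumptions for
illustration only):</p>
<pre>
    use strict;
    use warnings;

    # Hypothetical two-line string.
    my $str = "alpha\nbeta\n";

    # Without /s, dot stops at the newline; with /s it can cross it.
    my $dot_plain = ($str =~ /alpha.beta/)  ? 1 : 0;   # 0
    my $dot_s     = ($str =~ /alpha.beta/s) ? 1 : 0;   # 1

    # Without /m, ^ matches only at the start of the string;
    # with /m it also matches just after each embedded newline.
    my $caret_plain = ($str =~ /^beta/)  ? 1 : 0;      # 0
    my $caret_m     = ($str =~ /^beta/m) ? 1 : 0;      # 1

    # With /m, $ also matches just before each newline.
    my @line_ends = $str =~ /(\w)$/mg;                 # ("a", "a")

    print "dot: $dot_plain $dot_s; caret: $caret_plain $caret_m; ends: @line_ends\n";
</pre>
<p>The two modifiers are independent, so a pattern that needs both behaviors
can simply use <code>/ms</code>.</p>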
<p>For example, this program detects duplicate words, even when they span
line breaks (but not paragraph breaks).  For this example, we don't need
<code>/s</code> because we aren't using dot in a regular expression that we want
to cross line boundaries.  Neither do we need <code>/m</code> because we don't
want caret or dollar to match at any point inside the record next
to newlines.  But it's imperative that $/ be set to something other
