<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../../displayToc.js"></script>
<script language="JavaScript" src="../../tocParas.js"></script>
<script language="JavaScript" src="../../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../../scineplex.css">
<title>perlrequick - Perl regular expressions quick start</title>
<link rel="stylesheet" href="../../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>

<body>

<script>writelinks('__top__',2);</script>
<h1><a>perlrequick - Perl regular expressions quick start</a></h1>
<p><a name="__index__"></a></p>

<!-- INDEX BEGIN -->

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#description">DESCRIPTION</a></li>
	<li><a href="#the_guide">The Guide</a></li>
	<ul>

		<li><a href="#simple_word_matching">Simple word matching</a></li>
		<li><a href="#using_character_classes">Using character classes</a></li>
		<li><a href="#matching_this_or_that">Matching this or that</a></li>
		<li><a href="#grouping_things_and_hierarchical_matching">Grouping things and hierarchical matching</a></li>
		<li><a href="#extracting_matches">Extracting matches</a></li>
		<li><a href="#matching_repetitions">Matching repetitions</a></li>
		<li><a href="#more_matching">More matching</a></li>
		<li><a href="#search_and_replace">Search and replace</a></li>
		<li><a href="#the_split_operator">The split operator</a></li>
	</ul>

	<li><a href="#bugs">BUGS</a></li>
	<li><a href="#see_also">SEE ALSO</a></li>
	<li><a href="#author_and_copyright">AUTHOR AND COPYRIGHT</a></li>
	<ul>

		<li><a href="#acknowledgments">Acknowledgments</a></li>
	</ul>

</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>perlrequick - Perl regular expressions quick start</p>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>This page covers the very basics of understanding, creating and
using regular expressions ('regexes') in Perl.</p>
<p>
</p>
<hr />
<h1><a name="the_guide">The Guide</a></h1>
<p>
</p>
<h2><a name="simple_word_matching">Simple word matching</a></h2>
<p>The simplest regex is simply a word, or more generally, a string of
characters.  A regex consisting of a word matches any string that
contains that word:</p>
<pre>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/World/</span><span class="operator">;</span>  <span class="comment"># matches</span>
</pre>
<p>In this statement, <code>World</code> is a regex and the <code>//</code> enclosing
<code>/World/</code> tells perl to search a string for a match.  The operator
<code>=~</code> associates the string with the regex match and produces a true
value if the regex matched, or false if the regex did not match.  In
our case, <code>World</code> matches the second word in <code>&quot;Hello World&quot;</code>, so the
expression is true.  This idea has several variations.</p>
<p>Expressions like this are useful in conditionals:</p>
<pre>
    <span class="keyword">print</span> <span class="string">"It matches\n"</span> <span class="keyword">if</span> <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/World/</span><span class="operator">;</span>
</pre>
<p>The sense of the match can be reversed by using the <code>!~</code> operator:</p>
<pre>
    <span class="keyword">print</span> <span class="string">"It doesn't match\n"</span> <span class="keyword">if</span> <span class="string">"Hello World"</span> <span class="operator">!~</span> <span class="regex">/World/</span><span class="operator">;</span>
</pre>
<p>The literal string in the regex can be replaced by a variable:</p>
<pre>
    <span class="variable">$greeting</span> <span class="operator">=</span> <span class="string">"World"</span><span class="operator">;</span>
    <span class="keyword">print</span> <span class="string">"It matches\n"</span> <span class="keyword">if</span> <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/$greeting/</span><span class="operator">;</span>
</pre>
<p>If you're matching against <a href="../../lib/Pod/perlvar.html#item___"><code>$_</code></a>, the <a href="../../lib/Pod/perlvar.html#item___"><code>$_ =~</code></a> part can be omitted:</p>
<pre>
    <span class="variable">$_</span> <span class="operator">=</span> <span class="string">"Hello World"</span><span class="operator">;</span>
    <span class="keyword">print</span> <span class="string">"It matches\n"</span> <span class="keyword">if</span> <span class="regex">/World/</span><span class="operator">;</span>
</pre>
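<p>The implicit <code>$_</code> form is most convenient inside loops, where
<code>for</code> sets <code>$_</code> to each element in turn.  The following is a
minimal sketch; the list <code>@lines</code> and the two counters are names made up
for this illustration:</p>
<pre>
    use strict;
    use warnings;

    # Hypothetical input; any list of strings would do.
    my @lines = ("Hello World", "Goodbye", "World Cup");

    my $hits   = 0;   # lines that contain 'World'
    my $misses = 0;   # lines that do not

    for (@lines) {            # each element is aliased to $_
        if (/World/) {        # same as: $_ =~ /World/
            $hits++;
        }
        else {
            $misses++;
        }
    }

    print "matched: $hits, not matched: $misses\n";   # matched: 2, not matched: 1
</pre>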
<p>Finally, the <code>//</code> default delimiters for a match can be changed to
arbitrary delimiters by putting an <code>'m'</code> out front:</p>
<pre>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">m!World!</span><span class="operator">;</span>   <span class="comment"># matches, delimited by '!'</span>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">m{World}</span><span class="operator">;</span>   <span class="comment"># matches, note the matching '{}'</span>
    <span class="string">"/usr/bin/perl"</span> <span class="operator">=~</span> <span class="regex">m"/perl"</span><span class="operator">;</span> <span class="comment"># matches after '/usr/bin',</span>
                                 <span class="comment"># '/' becomes an ordinary char</span>
</pre>
<p>Regexes must match a part of the string <em>exactly</em> in order for the
statement to be true:</p>
<pre>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/world/</span><span class="operator">;</span>  <span class="comment"># doesn't match, case sensitive</span>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/o W/</span><span class="operator">;</span>    <span class="comment"># matches, ' ' is an ordinary char</span>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/World /</span><span class="operator">;</span> <span class="comment"># doesn't match, no ' ' at end</span>
</pre>
<p>Perl will always match at the earliest possible point in the string:</p>
<pre>
    <span class="string">"Hello World"</span> <span class="operator">=~</span> <span class="regex">/o/</span><span class="operator">;</span>       <span class="comment"># matches 'o' in 'Hello'</span>
    <span class="string">"That hat is red"</span> <span class="operator">=~</span> <span class="regex">/hat/</span><span class="operator">;</span> <span class="comment"># matches 'hat' in 'That'</span>
</pre>
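<p>One way to confirm where a match landed is the built-in array <code>@-</code>,
whose first element <code>$-[0]</code> holds the offset of the start of the last
successful match.  That variable is not covered in this guide (it is documented
in perlvar); the sketch below uses it only to check the leftmost-match rule, and
the variable names are illustrative:</p>
<pre>
    use strict;
    use warnings;

    my $text  = "That hat is red";
    my $start = -1;               # stays -1 if nothing matches

    if ($text =~ /hat/) {
        $start = $-[0];           # offset of the first character of the match
    }

    print "match starts at offset $start\n";   # 1, the 'hat' inside 'That'
</pre>
<p>The standalone word <code>hat</code> begins at offset 5, so an offset of 1 shows
that the earlier occurrence won.</p>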
<p>Not all characters can be used 'as is' in a match.  Some characters,
called <strong>metacharacters</strong>, are reserved for use in regex notation.
The metacharacters are</p>
<pre>
    {}[]()^$.|*+?\
</pre>
<p>A metacharacter can be matched by putting a backslash before it:</p>
<pre>
    <span class="string">"2+2=4"</span> <span class="operator">=~</span> <span class="regex">/2+2/</span><span class="operator">;</span>    <span class="comment"># doesn't match, + is a metacharacter</span>
    <span class="string">"2+2=4"</span> <span class="operator">=~</span> <span class="regex">/2\+2/</span><span class="operator">;</span>   <span class="comment"># matches, \+ is treated like an ordinary +</span>
    <span class="string">'C:\WIN32'</span> <span class="operator">=~</span> <span class="regex">/C:\\WIN/</span><span class="operator">;</span>                       <span class="comment"># matches</span>
    <span class="string">"/usr/bin/perl"</span> <span class="operator">=~</span> <span class="regex">/\/usr\/bin\/perl/</span><span class="operator">;</span>  <span class="comment"># matches</span>
</pre>
<p>In the last regex, the forward slash <code>'/'</code> is also backslashed,
because it is used to delimit the regex.</p>
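<p>When the text to be matched comes from a variable, backslashing each
metacharacter by hand is easy to get wrong.  Perl's <code>\Q...\E</code> escape
(the in-regex form of the <code>quotemeta</code> function) does it automatically.
It is not introduced in this guide, so treat the following as a supplementary
sketch; the variable names are made up:</p>
<pre>
    use strict;
    use warnings;

    my $sum = "2+2";     # contains the metacharacter '+'

    # Interpolated as-is, the regex is /2+2/: one or more '2' followed by '2'.
    my $raw = ("2+2=4" =~ /$sum/) ? 1 : 0;

    # \Q...\E backslashes the metacharacters, giving /2\+2/.
    my $quoted = ("2+2=4" =~ /\Q$sum\E/) ? 1 : 0;

    print "raw: $raw, quoted: $quoted\n";   # raw: 0, quoted: 1
</pre>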
<p>Non-printable ASCII characters are represented by <strong>escape sequences</strong>.
Common examples are <code>\t</code> for a tab, <code>\n</code> for a newline, and <code>\r</code>
for a carriage return.  Arbitrary bytes are represented by octal
escape sequences, e.g., <code>\033</code>, or hexadecimal escape sequences,
e.g., <code>\x1B</code>:</p>
<pre>
    &quot;1000\t2000&quot; =~ m(0\t2);        # matches
    &quot;cat&quot;        =~ /\143\x61\x74/; # matches, but a weird way to spell cat</pre>
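<p>The octal and hexadecimal notations are two spellings of the same character
code.  As a rough check, the sketch below builds the ESC character (code 27)
with <code>chr</code> and matches it both ways; the variable names are
illustrative only:</p>
<pre>
    use strict;
    use warnings;

    # Build the ESC character from its numeric code.
    my $esc = chr(27);

    my $octal_ok = ($esc =~ /\033/) ? 1 : 0;            # 033 octal is 27
    my $hex_ok   = ($esc =~ /\x1B/) ? 1 : 0;            # 1B hex is 27
    my $tab_ok   = ("1000\t2000" =~ /0\t2/) ? 1 : 0;    # \t in the regex is a tab

    print "octal: $octal_ok, hex: $hex_ok, tab: $tab_ok\n";   # octal: 1, hex: 1, tab: 1
</pre>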
<p>Regexes are treated mostly as double-quoted strings, so variable
substitution works:</p>
<pre>
    <span class="variable">$foo</span> <span class="operator">=</span> <span class="string">'house'</span><span class="operator">;</span>
    <span class="string">'cathouse'</span> <span class="operator">=~</span> <span class="regex">/cat$foo/</span><span class="operator">;</span>   <span class="comment"># matches</span>
    <span class="string">'housecat'</span> <span class="operator">=~</span> <span class="regex">/${foo}cat/</span><span class="operator">;</span> <span class="comment"># matches</span>
</pre>
<p>With all of the regexes above, if the regex matched anywhere in the
string, it was considered a match.  To specify <em>where</em> it should
match, we would use the <strong>anchor</strong> metacharacters <code>^</code> and <code>$</code>.  The
anchor <code>^</code> means match at the beginning of the string and the anchor
<code>$</code> means match at the end of the string, or before a newline at the
end of the string.  Some examples:</p>
<pre>
    <span class="string">"housekeeper"</span> <span class="operator">=~</span> <span class="regex">/keeper/</span><span class="operator">;</span>         <span class="comment"># matches</span>
    <span class="string">"housekeeper"</span> <span class="operator">=~</span> <span class="regex">/^keeper/</span><span class="operator">;</span>        <span class="comment"># doesn't match</span>
    <span class="string">"housekeeper"</span> <span class="operator">=~</span> <span class="regex">/keeper$/</span><span class="operator">;</span>        <span class="comment"># matches</span>
    <span class="string">"housekeeper\n"</span> <span class="operator">=~</span> <span class="regex">/keeper$/</span><span class="operator">;</span>      <span class="comment"># matches</span>
    <span class="string">"housekeeper"</span> <span class="operator">=~</span> <span class="regex">/^housekeeper$/</span><span class="operator">;</span>  <span class="comment"># matches</span>
</pre>
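<p>Using both anchors together is a common way to test a whole string rather
than a substring.  The helper <code>is_keeper</code> below is a hypothetical name
for this sketch; note that, because of how <code>$</code> works, a single
trailing newline is still accepted:</p>
<pre>
    use strict;
    use warnings;

    # Hypothetical helper: true only if the whole string is 'keeper'
    # (optionally followed by one newline at the very end).
    sub is_keeper {
        my ($s) = @_;
        return ($s =~ /^keeper$/) ? 1 : 0;
    }

    my @results = map { is_keeper($_) }
                  ("keeper", "keeper\n", "housekeeper", "keepers");

    print "@results\n";   # 1 1 0 0
</pre>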
<p>
</p>
<h2><a name="using_character_classes">Using character classes</a></h2>
<p>A <strong>character class</strong> allows a set of possible characters, rather than
just a single character, to match at a particular point in a regex.
Character classes are denoted by brackets <code>[...]</code>, with the set of
characters to be possibly matched inside.  Here are some examples:</p>
<pre>
    <span class="regex">/cat/</span><span class="operator">;</span>            <span class="comment"># matches 'cat'</span>
    <span class="regex">/[bcr]at/</span><span class="operator">;</span>        <span class="comment"># matches 'bat', 'cat', or 'rat'</span>
    <span class="string">"abc"</span> <span class="operator">=~</span> <span class="regex">/[cab]/</span><span class="operator">;</span> <span class="comment"># matches 'a'</span>
</pre>
<p>In the last statement, even though <code>'c'</code> is the first character in
the class, the earliest point at which the regex can match is <code>'a'</code>.</p>
<pre>
    <span class="regex">/[yY][eE][sS]/</span><span class="operator">;</span> <span class="comment"># match 'yes' in a case-insensitive way</span>
                    <span class="comment"># 'yes', 'Yes', 'YES', etc.</span>
    <span class="regex">/yes/i</span><span class="operator">;</span>         <span class="comment"># also match 'yes' in a case-insensitive way</span>
</pre>
<p>The last example shows a match with an <code>'i'</code> <strong>modifier</strong>, which makes
the match case-insensitive.</p>
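<p>Character classes and the <code>'i'</code> modifier combine naturally with
<code>grep</code> to filter a list.  A small sketch, with word lists invented for
the example:</p>
<pre>
    use strict;
    use warnings;

    my @words = ("bat", "cat", "hat", "rat", "brat");

    # Keep words that contain 'b', 'c', or 'r' immediately followed by 'at'.
    my @found = grep { /[bcr]at/ } @words;

    # Count answers that contain 'yes' in any mix of upper and lower case.
    my @answers   = ("yes", "Yes", "YES", "no", "oh yes!");
    my $yes_count = grep { /yes/i } @answers;

    print "@found\n";        # bat cat rat brat
    print "$yes_count\n";    # 4
</pre>
<p>Note that <code>'brat'</code> is kept: the regex is unanchored, so the
<code>'rat'</code> inside it is enough for a match.</p>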
<p>Character classes also have ordinary and special characters, but the
