</table><p><p><a name="INDEX-619" /><a name="INDEX-620" />The <tt class="literal">.</tt> (single dot) is a wildcard character. When used in a regular expression, it can match any single character. It does not match the newline character (<tt class="literal">\n</tt>) unless you use the <tt class="literal">/s</tt> modifier on the pattern match operator. This modifier treats the string to be matched against as a single "long" string with embedded newlines.</p><p><a name="INDEX-621" /><a name="INDEX-622" /><a name="INDEX-623" /><a name="INDEX-624" /><a name="INDEX-625" />The <tt class="literal">^</tt> and <tt class="literal">$</tt> metacharacters are used as anchors in a regular expression. The <tt class="literal">^</tt> matches the beginning of a line. This character should appear only at the beginning of an expression to match the beginning of the line. The exception to this is when the <tt class="literal">/m</tt> (multiline) modifier is used, in which case it will match at the beginning of the string and after every newline (except the last, if there is one). Otherwise, <tt class="literal">^</tt> will match itself, unescaped, anywhere in a pattern, except if it is the first character in a bracketed character class, in which case it negates the class.</p><p>Similarly, <tt class="literal">$</tt> will match the end of a line (just before a newline character) only if it is at the end of a pattern, unless <tt class="literal">/m</tt> is used, in which case it matches just before every newline and at the end of a string. You need to escape <tt class="literal">$</tt> to match a literal dollar sign in all cases, because if <tt class="literal">$</tt> isn't at the end of a pattern (or placed right before a <tt class="literal">)</tt> or <tt class="literal">]</tt>), Perl will attempt to do variable interpolation. 
The same holds true for the <tt class="literal">@</tt> sign, which Perl will interpret as the start of an array variable unless it is backslashed.</p><p><a name="INDEX-626" /><a name="INDEX-627" /><a name="INDEX-628" /><a name="INDEX-629" /><a name="INDEX-630" /><a name="INDEX-631" />The <tt class="literal">*</tt>, <tt class="literal">+</tt>, and <tt class="literal">?</tt> metacharacters are called <em class="emphasis">quantifiers</em>. They specify the number of times to match something. They act on the element immediately preceding them, which could be a single character (including the <tt class="literal">.</tt>), a grouped expression in parentheses, or a character class. The <tt class="literal">{...}</tt> construct is a generalized quantifier. You can put two numbers separated by a comma within the braces to specify the minimum and maximum number of times that the preceding element can match.</p><p><a name="INDEX-632" /><a name="INDEX-633" />Parentheses are used to group characters or expressions. They also have the side effect of remembering what they matched so you can recall and reuse patterns with a special group of variables.</p><p><a name="INDEX-634" /><a name="INDEX-635" />The <tt class="literal">|</tt> is the alternation operator in regular expressions. It matches either what's on its left side or what's on its right side. It is not limited to single characters. For example:</p><blockquote><pre class="code">/you|me|him|her/</pre></blockquote><p>looks for any of the four words. You should use parentheses to provide boundaries for alternation:</p><blockquote><pre class="code">/And(y|rew)/</pre></blockquote><p>This will match either "Andy" or "Andrew".</p></div><a name="perlnut2-CHP-4-SECT-6.3" /><div class="sect2"><h3 class="sect2">4.6.3. 
Escaped Sequences</h3><p><a name="INDEX-636" /><a name="INDEX-637" /><a name="INDEX-638" /><a name="INDEX-639" /><a name="INDEX-640" /><a name="INDEX-641" /><a name="INDEX-642" /><a name="INDEX-643" /><a name="INDEX-644" /><a name="INDEX-645" /><a name="INDEX-646" /><a name="INDEX-647" />The following table lists the backslashed representations of characters that you can use in regular expressions: <a name="INDEX-648" /><a name="INDEX-649" /></p><a name="ch04-10-fm2xml" /><table border="1" cellpadding="3"><tr><th><p>Code</p></th><th><p>Matches</p></th></tr><tr><td><p><tt class="literal">\a</tt></p></td><td><p>Alarm (beep)</p></td></tr><tr><td><p><tt class="literal">\n</tt></p></td><td><p>Newline</p></td></tr><tr><td><p><tt class="literal">\r</tt></p></td><td><p>Carriage return</p></td></tr><tr><td><p><tt class="literal">\t</tt></p></td><td><p>Tab</p></td></tr><tr><td><p><tt class="literal">\f</tt></p></td><td><p>Formfeed</p></td></tr><tr><td><p><tt class="literal">\e</tt></p></td><td><p>Escape</p></td></tr><tr><td><p><tt class="literal">\033</tt></p></td><td><p>Any octal ASCII value</p></td></tr><tr><td><p><tt class="literal">\x7f</tt></p></td><td><p>Any hexadecimal ASCII value</p></td></tr><tr><td><p><tt class="literal">\x{263a}</tt></p></td><td><p>A wide hexadecimal value</p></td></tr><tr><td><p><tt class="literal">\c</tt><em class="replaceable"><tt>x</tt></em></p></td><td><p>Control-<em class="replaceable"><tt>x</tt></em></p></td></tr><tr><td><p><tt class="literal">\N{</tt><em class="replaceable"><tt>name</tt></em><tt class="literal">}</tt></p></td><td><p>A named character</p></td></tr></table><p></div><a name="perlnut2-CHP-4-SECT-6.4" /><div class="sect2"><h3 class="sect2">4.6.4. 
Character Classes</h3><p><a name="INDEX-650" /><a name="INDEX-651" /><a name="INDEX-652" /><a name="INDEX-653" /><a name="INDEX-654" /><a name="INDEX-655" /><a name="INDEX-656" />The <tt class="literal">[...]</tt> construct is used to list a set of characters (a <em class="emphasis">character class</em>) of which <em class="emphasis">one</em> will match. Brackets are often used when capitalization is uncertain in a match:</p><blockquote><pre class="code">/[tT]here/</pre></blockquote><p><a name="INDEX-657" /><a name="INDEX-658" /><a name="INDEX-659" />A dash (<tt class="literal">-</tt>) may be used to indicate a range of characters in a character class:</p><blockquote><pre class="code">/[a-zA-Z]/; # Match any single letter
/[0-9]/;    # Match any single digit</pre></blockquote><p>To put a literal dash in the list you must use a backslash before it (<tt class="literal">\-</tt>).</p><p><a name="INDEX-660" /><a name="INDEX-661" />By placing a <tt class="literal">^</tt> as the first element in the brackets, you create a negated character class, i.e., it matches any character not in the list. 
For example:</p><blockquote><pre class="code">/[^A-Z]/; # Matches any character other than an uppercase letter</pre></blockquote><p>Some common character classes have their own predefined escape sequences for your programming convenience<a name="INDEX-662" /><a name="INDEX-663" /><a name="INDEX-664" /><a name="INDEX-665" /><a name="INDEX-666" /><a name="INDEX-667" /><a name="INDEX-668" /><a name="INDEX-669" /><a name="INDEX-670" /><a name="INDEX-671" />:</p><a name="ch04-11-fm2xml" /><table border="1" cellpadding="3"><tr><th><p>Code</p></th><th><p>Matches</p></th></tr><tr><td><p><tt class="literal">\d</tt></p></td><td><p>A digit, same as <tt class="literal">[0-9]</tt></p></td></tr><tr><td><p><tt class="literal">\D</tt></p></td><td><p>A nondigit, same as <tt class="literal">[^0-9]</tt></p></td></tr><tr><td><p><tt class="literal">\w</tt></p></td><td><p>A word character (alphanumeric), same as <tt class="literal">[a-zA-Z_0-9]</tt> </p></td></tr><tr><td><p><tt class="literal">\W</tt></p></td><td><p>A non-word character, same as <tt class="literal">[^a-zA-Z_0-9]</tt></p></td></tr><tr><td><p><tt class="literal">\s</tt></p></td><td><p>A whitespace character, same as <tt class="literal">[ \t\n\r\f]</tt></p></td></tr><tr><td><p><tt class="literal">\S</tt></p></td><td><p>A non-whitespace character, same as <tt class="literal">[^ \t\n\r\f]</tt></p></td></tr><tr><td><p><tt class="literal">\C</tt></p></td><td><p>Match a character (byte)</p></td></tr><tr><td><p><tt class="literal">\pP</tt></p></td><td><p>Match P-named (Unicode) property</p></td></tr><tr><td><p><tt class="literal">\PP</tt></p></td><td><p>Match non-P</p></td></tr><tr><td><p><tt class="literal">\X</tt></p></td><td><p>Match extended Unicode sequence</p></td></tr></table><p><p>While Perl implements <tt class="literal">lc()</tt> and <tt class="literal">uc()</tt>, which you can use for testing the proper case of words or characters, you can do the same with escape sequences<a name="INDEX-672" /><a name="INDEX-673" /><a name="INDEX-674" 
/><a name="INDEX-675" /><a name="INDEX-676" /><a name="INDEX-677" />:</p><a name="ch04-12-fm2xml" /><table border="1" cellpadding="3"><tr><th><p>Code</p></th><th><p>Matches</p></th></tr><tr><td><p><tt class="literal">\l</tt></p></td><td><p>Lowercase the next character</p></td></tr><tr><td><p><tt class="literal">\u</tt></p></td><td><p>Uppercase the next character</p></td></tr><tr><td><p><tt class="literal">\L</tt></p></td><td><p>Lowercase until <tt class="literal">\E</tt></p></td></tr><tr><td><p><tt class="literal">\U</tt></p></td><td><p>Uppercase until <tt class="literal">\E</tt></p></td></tr><tr><td><p><tt class="literal">\Q</tt></p></td><td><p>Disable pattern metacharacters until <tt class="literal">\E</tt></p></td></tr><tr><td><p><tt class="literal">\E</tt></p></td><td><p>End case modification</p></td></tr></table><p><p>These elements match any single element in (or not in) their class. A <tt class="literal">\w</tt> matches only one character of a word. Using a quantifier, you can match a whole word, for example, with <tt class="literal">\w+</tt>. The abbreviated classes may also be used within brackets as elements of other character classes.</p></div><a name="perlnut2-CHP-4-SECT-6.5" /><div class="sect2"><h3 class="sect2">4.6.5. Anchors</h3><p><a name="INDEX-678" /><a name="INDEX-679" /><a name="INDEX-680" /><a name="INDEX-681" />Anchors don't match any characters; they match places within a string. The two most common anchors are <tt class="literal">^</tt> and <tt class="literal">$</tt>, which match the beginning and end of a line, respectively. The following table lists the anchoring patterns used to match certain boundaries in regular expressions:</p><a name="ch04-13-fm2xml" /><table border="1" cellpadding="3"><tr><th><p>Assertion</p></th><th><p>Meaning</p></th></tr><tr><td><p><tt class="literal">^</tt></p></td><td><p>Matches at the beginning of the string (or line, if <tt class="literal">/m</tt> is used)</p></td></tr><tr><td><p><tt class="literal">$</tt></p>