<html><head><title>Metacharacters and Metasymbols (Programming Perl)</title><!-- STYLESHEET --><link rel="stylesheet" type="text/css" href="../style/style1.css"><!-- METADATA --><!--Dublin Core Metadata--><meta name="DC.Creator" content=""><meta name="DC.Date" content=""><meta name="DC.Format" content="text/xml" scheme="MIME"><meta name="DC.Generator" content="XSLT stylesheet, xt by James Clark"><meta name="DC.Identifier" content=""><meta name="DC.Language" content="en-US"><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc."><meta name="DC.Source" content="" scheme="ISBN"><meta name="DC.Subject.Keyword" content=""><meta name="DC.Title" content="Metacharacters and Metasymbols"><meta name="DC.Type" content="Text.Monograph"></head><body><!-- START OF BODY --><!-- TOP BANNER --><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home"><map name="banner-map"><AREA SHAPE="RECT" COORDS="0,0,466,71" HREF="index.htm" ALT="Programming Perl"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></map><!-- TOP NAV BAR --><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_02.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="ch05_01.htm">Chapter 5: Pattern Matching</a></td><td align="right" valign="top" width="172"><a href="ch05_04.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr></table></div><hr width="515" align="left"><!-- SECTION BODY --><h2 class="sect1">5.3. Metacharacters and Metasymbols</h2><p><a name="INDEX-1456"></a><a name="INDEX-1457"></a><a name="INDEX-1458"></a>Now that we've admired all the fancy cages, we can go back to looking at the critters in the cages, those funny-looking symbols you put inside the patterns.  By now you'll have cottoned to the fact that these symbols aren't regular Perl code like function calls or arithmetic operators.
Regular expressions are their own little language nestled inside of Perl.  (There's a bit of the jungle in all of us.)</p><p>For all their power and expressivity, patterns in Perl recognize the same 12 traditional metacharacters (the Dirty Dozen, as it were) found in many other regular expression packages:<blockquote><pre class="programlisting">\ | ( ) [ { ^ $ * + ? .</pre></blockquote>Some of those bend the rules, making otherwise normal characters that follow them special.  We don't like to call the longer sequences "characters", so when they make longer sequences, we call them <em class="emphasis">metasymbols</em> (or sometimes just "symbols").  But at the top level, those twelve metacharacters are all you (and Perl) need to think about.  Everything else proceeds from there.</p><p><a name="INDEX-1459"></a><a name="INDEX-1460"></a><a name="INDEX-1461"></a><a name="INDEX-1462"></a><a name="INDEX-1463"></a><a name="INDEX-1464"></a><a name="INDEX-1465"></a><a name="INDEX-1466"></a><a name="INDEX-1467"></a><a name="INDEX-1468"></a><a name="INDEX-1469"></a>Some simple metacharacters stand by themselves, like <tt class="literal">.</tt> and <tt class="literal">^</tt> and <tt class="literal">$</tt>.  They don't directly affect anything around them.  Some metacharacters work like prefix operators, governing what follows them, like <tt class="literal">\</tt>.  Others work like postfix operators, governing what immediately precedes them, like <tt class="literal">*</tt>, <tt class="literal">+</tt>, and <tt class="literal">?</tt>. One metacharacter, <tt class="literal">|</tt>, acts like an infix operator, standing between the operands it governs.  There are even bracketing metacharacters that work like circumfix operators, governing something contained inside them, like <tt class="literal">(...)</tt> and <tt class="literal">[...]</tt>.
Parentheses are particularly important, because they specify the bounds of <tt class="literal">|</tt> on the inside, and of <tt class="literal">*</tt>, <tt class="literal">+</tt>, and <tt class="literal">?</tt> on the outside.</p><p><a name="INDEX-1470"></a><a name="INDEX-1471"></a>If you learn only one of the twelve metacharacters, choose the backslash.  (Er&nbsp;.&nbsp;.&nbsp;.&nbsp;and the parentheses.) That's because backslash disables the others.  When a backslash precedes a nonalphanumeric character in a Perl pattern, it always makes that next character a literal.  If you need to match one of the twelve metacharacters in a pattern literally, you write it with a backslash in front.  Thus, <tt class="literal">\.</tt> matches a real dot, <tt class="literal">\$</tt> a real dollar sign, <tt class="literal">\\</tt> a real backslash, and so on.  This is known as "escaping" the metacharacter, or "quoting" it, or sometimes just "backslashing" it.  (Of course, you already know that backslash is used to suppress variable interpolation in double-quoted strings.)</p><p>Although a backslash turns a metacharacter into a literal character, its effect upon a following alphanumeric character goes the other direction.  It takes something that was regular and makes it special. That is, together they make a metasymbol.  An alphabetical list of these metasymbols can be found below in <a href="ch05_03.htm#perl3-tab-regex-meta-alpha">Table 5-7</a>.</p><h3 class="sect2">5.3.1. Metasymbol Tables</h3><p><a name="INDEX-1472"></a><a name="INDEX-1473"></a>In the following tables, the Atomic column says "yes" if the given metasymbol is quantifiable (if it can match something with width, more or less).
Also, we've used "<tt class="literal">...</tt>" to represent "something else". (Please see the later discussion to find out what "<tt class="literal">...</tt>" means, if it is not clear from the one-line gloss in the table.)</p><p><a name="INDEX-1474"></a><a name="INDEX-1475"></a><a href="ch05_03.htm#perl3-tab-general-metacharacters">Table 5-4</a> shows the basic traditional metasymbols.  The first four of these are the structural metasymbols we mentioned earlier, while the last three are simple metacharacters.  The <tt class="literal">.</tt> metacharacter is an example of an atom because it matches something with width (the width of a character, in this case); <tt class="literal">^</tt> and <tt class="literal">$</tt> are examples of assertions, because they match something of zero width, and because they are only evaluated to see if they're true or not.</p><a name="perl3-tab-general-metacharacters"></a><h4 class="objtitle">Table 5.4. General Regex Metacharacters</h4><table border="1"><tr><th>Symbol</th><th>Atomic</th><th>Meaning</th></tr><tr><td><tt class="literal">\...</tt></td><td>Varies</td><td><p>De-meta next nonalphanumeric character, meta next alphanumeric character (maybe).<a name="INDEX-1476"></a></p></td></tr><tr><td><tt class="literal">...|...</tt><a name="INDEX-1477"></a></td><td>No</td><td><p>Alternation (match one or the other).<a name="INDEX-1478"></a></p></td></tr><tr><td><tt class="literal">(...)</tt></td><td>Yes</td><td><p>Grouping (treat as a unit).<a name="INDEX-1479"></a><a name="INDEX-1480"></a></p></td></tr><tr><td><tt class="literal">[...]</tt></td><td>Yes</td><td><p>Character class (match one character from a set).<a name="INDEX-1481"></a></p></td></tr><tr><td><tt class="literal">^</tt></td><td>No</td><td>True at beginning of string (or after any newline, maybe).</td></tr><tr><td><tt class="literal">.</tt></td><td>Yes</td><td><p>Match one character (except newline, normally).</p></td></tr><tr><td><tt class="literal">$</tt></td><td>No</td><td><p>True at
end of string (or before any newline, maybe).</p></td></tr></table><p><a name="INDEX-1482"></a><a name="INDEX-1483"></a>The quantifiers, which are further described in their own section, indicate how many times the preceding atom (that is, single character or grouping) should match.  These are listed in <a href="ch05_03.htm#perl3-tab-regex-quantifiers">Table 5-5</a>.</p><a name="perl3-tab-regex-quantifiers"></a><h4 class="objtitle">Table 5.5. Regex Quantifiers</h4><table border="1"><tr><th>Quantifier</th><th>Atomic</th><th><p>Meaning</p></th></tr><tr><td><p><tt class="literal">*</tt></p></td><td>No</td><td><p>Match 0 or more times (maximal).<a name="INDEX-1484"></a></p></td></tr><tr><td><tt class="literal">+</tt></td><td>No</td><td><p>Match 1 or more times (maximal).<a name="INDEX-1485"></a></p></td></tr><tr><td><tt class="literal">?</tt></td><td>No</td><td><p>Match 0 or 1 time (maximal).<a name="INDEX-1486"></a></p></td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">COUNT</em><tt class="literal">}</tt></td><td>No</td><td><p>Match exactly <em class="replaceable">COUNT</em> times.<a name="INDEX-1487"></a></p></td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,}</tt></td><td>No</td><td><p>Match at least <em class="replaceable">MIN</em> times (maximal).<a name="INDEX-1488"></a></p></td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,</tt><em class="replaceable">MAX</em><tt class="literal">}</tt></td><td>No</td><td><p>Match at least <em class="replaceable">MIN</em> but not more than <em class="replaceable">MAX</em> times (maximal).<a name="INDEX-1489"></a></p></td></tr><tr><td><tt class="literal">*?</tt></td><td>No</td><td><p>Match 0 or more times (minimal).<a name="INDEX-1490"></a></p></td></tr><tr><td><tt class="literal">+?</tt></td><td>No</td><td><p>Match 1 or more times (minimal).<a name="INDEX-1491"></a></p></td></tr><tr><td><tt
class="literal">??</tt></td><td>No</td><td><p>Match 0 or 1 time (minimal).<a name="INDEX-1492"></a></p></td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,}?</tt></td><td>No</td><td><p>Match at least <em class="replaceable">MIN</em> times (minimal).</p></td></tr><tr><td><tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,</tt><em class="replaceable">MAX</em><tt class="literal">}?</tt></td><td>No</td><td><p>Match at least <em class="replaceable">MIN</em> but not more than <em class="replaceable">MAX</em> times (minimal).</p></td></tr></table><p><a name="INDEX-1493"></a>A minimal quantifier tries to match as <em class="emphasis">few</em> characters as possible within its allowed range.  A maximal quantifier tries to match as <em class="emphasis">many</em> characters as possible within its allowed range.  For instance, <tt class="literal">.+</tt> is guaranteed to match at least one character of the string, but will match all of them given the opportunity.  The opportunities are discussed later in "The Little Engine That /Could(n't)?/".</p><p>You'll note that quantifiers may never be quantified.</p><p><a name="INDEX-1494"></a>We wanted to provide an extensible syntax for new kinds of metasymbols.  Given that we only had a dozen metacharacters to work with, we chose a formerly illegal regex sequence to use for arbitrary
