📄 ch08_03.htm
字号:
<html><head><title>Anchors (Learning Perl, 3rd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Randal L. Schwartz and Tom Phoenix" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596001320L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Learning Perl, 3rd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Learning Perl, 3rd Edition" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"></a></td><td align="right" valign="top" width="228"><a href="ch08_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">8.3. Anchors</h2><p>By default, if a pattern doesn't match at the start of thestring, it can "float" on down the string, trying tomatch somewhere else. But there are a number of<a name="INDEX-547" />anchors that may beused to hold the pattern at a particular point in a string.</p><p>The <a name="INDEX-548" /><a name="INDEX-549" />caret<a href="#FOOTNOTE-178">[178]</a> anchor (<tt class="literal">^</tt>) marks thebeginning of the string, while the <a name="INDEX-550" />dollar sign (<tt class="literal">$</tt>)marks the end.<a href="#FOOTNOTE-179">[179]</a> So the pattern <tt class="literal">/^fred/</tt>will match <tt class="literal">fred</tt> only at the start of the string;it wouldn't match <tt class="literal">manfred mann</tt>. And<tt class="literal">/rock$/</tt> will match <tt class="literal">rock</tt> only atthe end of the string; it wouldn't match <tt class="literal">knuterockne</tt>.</p><blockquote class="footnote"> <a name="FOOTNOTE-178" /><p>[178]Yes, you'veseen that caret is already used in another way in patterns. As thefirst character of a character class, it negates the class. But<em class="emphasis">outside</em> of a character class, it's ametacharacter in a different way, being the start-of-string anchor.There are only so many characters, so we have to use some of themtwice.</p> </blockquote><blockquote class="footnote"> <a name="FOOTNOTE-179" /><p>[179]Actually, it matches either the end ofthe string, or at a newline at the end of the string. That's sothat you can match the end of the string whether it has a trailingnewline or not. Most folks don't worry about this distinctionmuch, but once in a long while it's important to remember that<tt class="literal">/^fred$/</tt>will match either<tt class="literal">"fred"</tt> or <tt class="literal">"fred\n"</tt> with equalease.</p> </blockquote><p>Sometimes, you'll want to use both of these anchors, to ensurethat the pattern matches an entire string. A common example is<tt class="literal">/^\s*$/</tt>, which matches a blank line. But this"blank" line may include some whitespace characters, liketabs and spaces, which are invisible to you and me. Any line thatmatches that pattern looks just like any other one on paper, so thispattern treats all blank lines as equivalent. Without the anchors, itwould match nonblank lines as well.</p><a name="lperl3-CHP-8-SECT-3.1" /><div class="sect2"><h3 class="sect2">8.3.1. Word Anchors</h3><p>Anchors aren't just at the ends of the string. Theword-boundary anchor, <tt class="literal">\b</tt><a name="INDEX-551" />, matches at either end ofa <a name="INDEX-552" />word.<a href="#FOOTNOTE-180">[180]</a> So we can use <tt class="literal">/\bfred\b/</tt>to match the word <tt class="literal">fred</tt> but not<tt class="literal">frederick</tt> or <tt class="literal">alfred</tt> or<tt class="literal">manfred mann</tt>. This is similar to the feature oftencalled something like "match whole words only" in a wordprocessor's search command.</p><blockquote class="footnote"> <a name="FOOTNOTE-180" /><p>[180]Some regularexpression implementations have one anchor for start-of-word andanother for end-of-word, but Perl uses <tt class="literal">\b</tt> forboth.</p> </blockquote><p>Alas, these aren't words as you and I are likely to think ofthem; they're those <tt class="literal">\w</tt>-type words made up ofordinary letters, digits, and underscores. The <tt class="literal">\b</tt>anchor matches at the start or end of a group of<tt class="literal">\w</tt><a name="INDEX-553" /> characters.</p><p>In <a href="ch08_03.htm#lperl3-CHP-8-FIG-1">Figure 8-1</a>, there's a grey underline undereach "word," and the arrows show the corresponding placeswhere <tt class="literal">\b</tt> could match. There are always an evennumber of word boundaries in a given string, since there's anend-of-word for every start-of-word.</p><p>The "words" are sequences of letters, digits, andunderscores; that is, a word in this sense is what's matched by<tt class="literal">/\w+/</tt>. There are five words in that sentence:<tt class="literal">That</tt>, <tt class="literal">s</tt>, <tt class="literal">a</tt>,<tt class="literal">word</tt>, and <tt class="literal">boundary</tt>.<a href="#FOOTNOTE-181">[181]</a> Noticethat the quote marks around <tt class="literal">word</tt> don'tchange the word boundaries; these words are made of<tt class="literal">\w</tt> characters.</p><blockquote class="footnote"><a name="FOOTNOTE-181" /><p>[181]You can see why we wish that we could change the definition of"word"; <tt class="literal">That's</tt> should be one word, nottwo words with an apostrophe in-between. And even in text that may bemostly ordinary English, it's normal to find a soupçonof other characters spicing things up. </p> </blockquote><p>Each arrow points to the beginning or the end of one of the greyunderlines, since the word boundary anchor <tt class="literal">\b</tt>matches only at the beginning or the end of a group of wordcharacters.</p><a name="lperl3-CHP-8-FIG-1" /><div class="figure"><img height="57" alt="Figure 8-1" src="figs/lrnp_0801.gif" width="281" /></div><h4 class="objtitle">Figure 8-1. Word-boundary matches with \b</h4><p>The word-boundary anchor is useful to ensure that we don'taccidentally find <tt class="literal">cat</tt> in<tt class="literal">delicatessen</tt>, <tt class="literal">dog</tt> in<tt class="literal">boondoggle</tt>, or <tt class="literal">fish</tt> in<tt class="literal">selfishness</tt>. Sometimes you'll want just oneword-boundary anchor, as when using <tt class="literal">/\bhunt/</tt> tomatch words like <tt class="literal">hunt</tt> or<tt class="literal">hunting</tt> or <tt class="literal">hunter</tt>, but not<tt class="literal">shunt</tt>, or when using <tt class="literal">/stone\b/</tt>to match words like <tt class="literal">sandstone</tt> or<tt class="literal">flintstone</tt><a name="INDEX-554" /> but not <tt class="literal">capstones</tt>.</p><p>The nonword-boundary anchor is<tt class="literal">\B</tt><a name="INDEX-555" />; it matches at any point where<tt class="literal">\b</tt> would not match. So the pattern<tt class="literal">/\bsearch\B/</tt> will match<tt class="literal">searches</tt>, <tt class="literal">searching</tt>, and<tt class="literal">searched</tt>, but not <tt class="literal">search</tt> or<tt class="literal">researching</tt>.</p></div><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch08_02.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch08_04.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">8.2. General Quantifiers</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">8.4. Memory Parentheses</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -