appa_07.htm
来自「by Randal L. Schwartz and Tom Phoenix I」· HTM 代码 · 共 198 行
HTM
198 行
<html><head><title>Answers to Chapter 8 Exercises (Learning Perl, 3rd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" /><meta name="DC.Creator" content="Randal L. Schwartz and Tom Phoenix" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly & Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596001320L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Learning Perl, 3rd Edition" /><meta name="DC.Type" content="Text.Monograph" /></head><body bgcolor="#ffffff"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Learning Perl, 3rd Edition" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="appa_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"></a></td><td align="right" valign="top" width="228"><a href="appa_08.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div><h2 class="sect1">A.7. Answers to Chapter 8 Exercises</h2><ol><li><p>Here's one way to do it:</p><blockquote><pre class="code">/\b(fred|wilma)\s+flintstone\b/</pre></blockquote><p>If you forgot to use the <tt class="literal">\b</tt> word-boundary anchors,take off half a point; without those, this would mistakenly matchstrings like <tt class="literal">alfred</tt><tt class="literal">flintstones</tt>. The exercise description said tomatch <em class="emphasis">words</em>.</p></li><li><p>The point of this exercise may not be obvious, but in the real world,you'll often have to do something similar. Someday,you'll be unlucky enough to have a confusing program tomaintain, and you'll wonder what the author was trying toaccomplish.<a href="#FOOTNOTE-394">[394]</a></p><blockquote class="footnote"> <a name="FOOTNOTE-394" /><p>[394]If you're especially unlucky, thishappens when you look at your own code ten minutes after writingit.</p> </blockquote><p><tt class="literal">/"([^"]*)"/</tt> matches a simple string in doublequotes. By a "simple" string, we don't mean onelike Perl's double-quoted strings, which could contain abackslashed double-quote mark or other backslash magic. This matchesjust a double-quote mark, the contents of the string (whichcan't contain a double quote), and a closing quote mark. Thecontents may be empty. The parentheses aren't needed forgrouping, so they seem to be memory parentheses; as we'll seein the next chapter, this regular expression memory, which holds thequoted substring, is probably being saved for some later use. Perhapsthis pattern would be used in reading a configuration file withquoted strings, although in that case it should probably use anchors.</p><p><tt class="literal">/^0?[0-3]?[0-7]{1,2}$/</tt> matches if the string hasnothing but an octal number (perhaps with a leading zero) in therange from <tt class="literal">0</tt> through <tt class="literal">0377</tt>. Notethat this one is anchored at both ends, so it doesn't allowanything else in the string before or after the number. (The previouspattern wasn't anchored; it could match anywhere in thestring.)</p><p><tt class="literal">/^\b[\w.]{1,12}\b$/</tt> matches strings made up ofnothing but letters, digits, underscores, and dots, but neverstarting or ending with a dot. Also, the strings are limited to amaximum of 12 characters.</p><p>You may have noticed that the dot inside the character class is notspecial, so it doesn't need to be backslashed. That makes thecharacter class match ordinary letters, digits, and underscores, andalso dots.</p><p>The way we can be sure that this one won't allow a string tostart or end with a dot is that it has both a word-boundary anchorand a start-of-string or end-of-string anchor at each end of thestring. The word-boundary anchor can match only if there's a"word" starting or ending there, and a dot can't bepart of a word.</p><p>So, this would match strings like <tt class="literal">perl.tar.gz</tt>, butnot <tt class="literal">some_excessively_long_filename</tt> or<tt class="literal">perl.tar.</tt> or <tt class="literal">.profile</tt> or<tt class="literal">..</tt>.<a href="#FOOTNOTE-395">[395]</a> This pattern could be useful forvalidating user-chosen filenames.</p><blockquote class="footnote"> <a name="FOOTNOTE-395" /><p>[395]You may know that file anddirectory names beginning with a dot are not displayed by default onUnix systems, and that the special directory name <tt class="literal">..</tt>always means the directory one level higher in thehierarchy.</p> </blockquote></li><li><p>Here's one way to do it:</p><blockquote><pre class="code">/^\$[A-Za-z_]\w*$/</pre></blockquote><p>The dollar sign at the start has to be backslashed to mean a realdollar sign. What follows must be a letter or underscore, then zeroor more letters, digits, or underscores.</p></li><li><p>This pattern is surprisingly tricky to get right. Here's how weconstruct it, step by step.</p><p>We start out by needing to match a word, so that's<tt class="literal">/\w+/</tt>. Of course, we want to remember that wordfor later, so we add parentheses: <tt class="literal">/(\w+)/</tt>. And wewant to match it when it occurs two or more times, so that's<tt class="literal">/(\w+)\1+/</tt>. (The plus sign at the end means<em class="emphasis">one</em> or more times -- but that's inaddition to the one time that the word occurred originally.)</p><p>But we're not done yet. Now we need to allow for the whitespacewhich may come between the words. We don't want to memorize thewhitespace (since it may vary), so we'll put it outside theparentheses: <tt class="literal">/(\w+)\s\1+/</tt>. Oh, but there could beany number of whitespace characters, so long as there's atleast one, so we'll add a plus sign. So now we have<tt class="literal">/(\w+)\s+\1+/</tt>.</p><p>But that's not right; the final plus sign is modifying thebackreference alone. We need it to apply to both the backreference(that is, our repeated word) and the whitespace in front of it:<tt class="literal">/(\w+)(\s+\1)+/</tt>. So, now we can match a tripleword. First, the part in the first parenthesis pair matches the firstoccurrence, then the part in the second parenthesis pair can twicematch some whitespace followed by that same word. When we try it out,it matches all of our sentences with doubled words, so we happily putit into our program and move on to the next project.</p><p>Then, the next week, we get a bug report! The pattern reports a matchon the sentence <tt class="literal">This is a test</tt>, even thoughthere's clearly no doubled word there. In moments, we'vefired up the pattern test program<a href="#FOOTNOTE-396">[396]</a>to see what part of the string is matching: <tt class="literal">|Th<is is>a test|</tt>. There it is, a doubled word <tt class="literal">is</tt>,hidden in an ordinary string.</p><blockquote class="footnote"> <a name="FOOTNOTE-396" /><p>[396]We told you that itwould come in handy, and we weren't kidding.</p> </blockquote><p>Clearly, this is a job for a word boundary anchor; we can'thave our word start in the middle of another word. So we fix theprogram to use <tt class="literal">/\b(\w+)(\s+\1)+/</tt>, and sit back,confident that we've got it right this time.</p><p>And then, just when you got started on another project,<em class="emphasis">another</em> bug report comes in. This time,we've matched the doubled word <tt class="literal">the</tt> in thephrase <tt class="literal">the</tt> <tt class="literal">theory</tt>. Yes, we needa word boundary at the <em class="emphasis">end</em> of the pattern tokeep from matching a partial word there:<tt class="literal">/\b(\w+)(\s+\1)+\b/</tt>. Now we've finallygotten it right.</p><p>What you've just read is a true story. The regular expressionhas been changed, but the bug reports are real. It does happen, moreoften than we'd like to admit, that even after you'vebeen writing these patterns for years, you can make a pattern whichhas a bug, you can test it with a number of test cases, you can putit into a long-running program, the Perl documentation, or even abest-selling Perl book, and not realize that the bug is there untilmuch later.</p><p>The moral of the story is that regular expressions can bechallenging. If you're serious about learning about regularexpressions, though (and all Perl programmers should be), we highlyrecommend the book <em class="emphasis">Mastering RegularExpressions</em>, by Jeffry Friedl (O'Reilly &Associates, Inc.).</p></li></ol><hr width="684" align="left" /><div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="appa_06.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="appa_08.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">A.6. Answers to Chapter 7 Exercises</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">A.8. Answers to Chapter 9 Exercises</td></tr></table></div><hr width="684" align="left" /><img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" /><p><p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p><map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map></body></html>
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?