<HTML><HEAD><TITLE>Chapter 10 -- Regular Expressions</TITLE><META></HEAD><BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT SIZE=6 COLOR=#FF0000>Chapter 10</FONT></H1>
<H1><FONT SIZE=6 COLOR=#FF0000>Regular Expressions</FONT></H1>
<HR>
<P><CENTER><B><FONT SIZE=5>CONTENTS</FONT></B></CENTER>
<UL>
<LI><A HREF="#PatternDelimiters">Pattern Delimiters</A>
<LI><A HREF="#TheMatchingOperatorm">The Matching Operator (m//)</A>
<UL><LI><A HREF="#TheMatchingOptions">The Matching Options</A></UL>
<LI><A HREF="#TheSubstitutionOperators">The Substitution Operator (s///)</A>
<UL><LI><A HREF="#TheSubstitutionOptions">The Substitution Options</A></UL>
<LI><A HREF="#TheTranslationOperatortr">The Translation Operator (tr///)</A>
<UL><LI><A HREF="#TheTranslationOptions">The Translation Options</A></UL>
<LI><A HREF="#TheBindingOperatorsand">The Binding Operators (=~ and !~)</A>
<LI><A HREF="#HowtoCreatePatterns">How to Create Patterns</A>
<UL><LI><A HREF="#ExampleCharacterClasses">Example: Character Classes</A>
<LI><A HREF="#ExampleQuantifiers">Example: Quantifiers</A>
<LI><A HREF="#ExamplePatternMemory">Example: Pattern Memory</A>
<LI><A HREF="#ExamplePatternPrecedeNCe">Example: Pattern Precedence</A>
<LI><A HREF="#ExampleExtensionSyntax">Example: Extension Syntax</A></UL>
<LI><A HREF="#PatternExamples">Pattern Examples</A>
<UL><LI><A HREF="#ExampleUsingtheMatchOperator">Example: Using the Match Operator</A>
<LI><A HREF="#ExampleUsingtheSubstitutionOperator">Example: Using the Substitution Operator</A>
<LI><A HREF="#ExampleUsingtheTranslationOperator">Example: Using the Translation Operator</A>
<LI><A HREF="#ExampleUsingtheISplitIFuNCtion">Example: Using the <I>split()</I> Function</A></UL>
<LI><A HREF="#Summary">Summary</A>
<LI><A HREF="#ReviewQuestions">Review Questions</A>
<LI><A HREF="#ReviewExercises">Review Exercises</A>
</UL>
<HR>
<P>You can use a <I>regular expression</I> to find patterns in strings: for example, to look for a specific name in a phone list or
all of the names that start with the letter <I>a</I>. Pattern matching is one of Perl's most powerful and probably least understood features. But after you read this chapter, you'll be able to handle regular expressions almost as well as a Perl guru. With a little practice, you'll be able to do some incredibly handy things.
<P>There are three main uses for regular expressions in Perl: matching, substitution, and translation. The matching operation uses the <TT>m//</TT> operator, which evaluates to a true or false value. The substitution operation substitutes one expression for another; it uses the <TT>s///</TT> operator. The translation operation translates one set of characters to another and uses the <TT>tr///</TT> operator. These operators are summarized in Table 10.1.<BR>
<P><CENTER><B>Table 10.1 Perl's Regular Expression Operators</B></CENTER>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=151><I>Operator</I></TD><TD WIDTH=438><I>Description</I></TD></TR>
<TR><TD WIDTH=151>m/PATTERN/</TD><TD WIDTH=438>This operator returns true if PATTERN is found in <TT>$_</TT>.</TD></TR>
<TR><TD WIDTH=151>s/PATTERN/REPLACEMENT/</TD><TD WIDTH=438>This operator replaces the sub-string matched by PATTERN with REPLACEMENT.</TD></TR>
<TR><TD WIDTH=151>tr/CHARACTERS/REPLACEMENTS/</TD><TD WIDTH=438>This operator replaces characters specified by CHARACTERS with the characters in REPLACEMENTS.</TD></TR>
</TABLE></CENTER><P>
<P>All three regular expression operators work with <TT>$_</TT> as the string to search. You can use the binding operators (see the section "The Binding Operators" later in this chapter) to search a variable other than <TT>$_</TT>.
<P>Both the matching (<TT>m//</TT>) and the substitution (<TT>s///</TT>) operators perform variable interpolation on the PATTERN and REPLACEMENT strings.
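<P>As a minimal sketch of that interpolation (the variable names <TT>$pattern</TT> and <TT>$replacement</TT> and the sample text are assumptions made up for illustration, not taken from the chapter's own listings):
<BLOCKQUOTE><PRE># Hypothetical values; in practice they might come from the keyboard or a file.
my $pattern     = "bbb";
my $replacement = "ccc";

$_ = "AAA bbb AAA";

# $pattern is interpolated into the match pattern.
print "Found $pattern\n" if m/$pattern/;

# Both $pattern and $replacement are interpolated into the substitution.
s/$pattern/$replacement/;
print "$_\n";</PRE></BLOCKQUOTE>
<P>Run as written, the sketch should print <TT>Found bbb</TT> followed by <TT>AAA ccc AAA</TT>.
<P>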
This comes in handy if you need to read the pattern from the keyboard or a file.
<P>If the match pattern evaluates to the empty string, the last successfully matched pattern is used. So, if you see a statement like <TT>print if //;</TT> in a Perl program, look for the previous regular expression operator to see what the pattern really is. The substitution operator also uses this interpretation of the empty pattern.
<P>In this chapter, you learn about pattern delimiters and then about each type of regular expression operator. After that, you learn how to create patterns in the section "How to Create Patterns." Then, the "Pattern Examples" section shows you some situations and how regular expressions can be used to resolve them.
<H2><A NAME="PatternDelimiters"><FONT SIZE=5 COLOR=#FF0000>Pattern Delimiters</FONT></A></H2>
<P>Every regular expression operator allows the use of alternative <I>pattern delimiters</I>. A <I>delimiter</I> marks the beginning and end of a given pattern. In the following statement,
<BLOCKQUOTE><PRE>m//;</PRE></BLOCKQUOTE>
<P>you see the standard delimiters: the slashes (<TT>//</TT>). However, you can use almost any non-alphanumeric, non-whitespace character as the delimiter. This feature is useful if you want to use the slash character inside your pattern. For instance, to match a file name you would normally use:
<BLOCKQUOTE><PRE>m/\/root\/home\/random.dat/</PRE></BLOCKQUOTE>
<P>This match statement is hard to read because all of the slashes seem to run together (some programmers say they look like teepees). If you use an alternate delimiter, it might look like this:
<BLOCKQUOTE><PRE>m!/root/home/random.dat!</PRE></BLOCKQUOTE>
<P>or
<BLOCKQUOTE><PRE>m{/root/home/random.dat}</PRE></BLOCKQUOTE>
<P>You can see that these examples are a little clearer. The last example also shows that if an opening bracket or brace is used as the starting delimiter, then the ending delimiter must be the matching closing bracket or brace.
<P>Both the match and substitution operators let you use variable interpolation.
You can take advantage of this to use a single-quoted string that does not require the slash to be escaped. For instance:
<BLOCKQUOTE><PRE>$file = '/root/home/random.dat';
m/$file/;</PRE></BLOCKQUOTE>
<P>You might find that this technique yields clearer code than simply changing the delimiters.
<P>If you choose the single quote as your delimiter character, then no variable interpolation is performed on the pattern. However, you still need to use the backslash character to escape any of the meta-characters discussed in the "How to Create Patterns" section later in this chapter.<BR>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Tip</B></TD></TR><TR><TD><BLOCKQUOTE>I tend to avoid delimiters that might be confused with characters in the pattern. For example, using the plus sign as a delimiter (<TT>m+abc+</TT>) does not help program readability. A casual reader might think that you intend to add two expressions instead of matching them.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Caution</B></TD></TR><TR><TD><BLOCKQUOTE>The <TT>?</TT> has a special meaning when used as a match pattern delimiter. It works like the <TT>/</TT> delimiter except that it matches only once between calls to the <TT>reset()</TT> function. This feature may be removed in future versions of Perl, so avoid using it.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>
<P>The next few sections look at the matching, substitution, and translation operators in more detail.
<H2><A NAME="TheMatchingOperatorm"><FONT SIZE=5 COLOR=#FF0000>The Matching Operator (m//)</FONT></A></H2>
<P>The matching operator (<TT>m//</TT>) is used to find patterns in strings. One of its more common uses is to look for a specific string inside a data file. For instance, you might look for all customers whose last name is "Johnson," or you might need a list of all names starting with the letter <I>s</I>.
<P>By default, the matching operator searches the <TT>$_</TT> variable.
This makes the match statement shorter because you don't need to specify where to search. Here is a quick example:
<BLOCKQUOTE><PRE>$_ = "AAA bbb AAA";
print "Found bbb\n" if m/bbb/;</PRE></BLOCKQUOTE>
<P>The print statement is executed only if the <TT>bbb</TT> character sequence is found in the <TT>$_</TT> variable. In this particular case, <TT>bbb</TT> will be found, so the program will display the following:
<BLOCKQUOTE><PRE>Found bbb</PRE></BLOCKQUOTE>
<P>The matching operator allows you to use variable interpolation in order to create the pattern. For example:
<BLOCKQUOTE><PRE>$needToFind = "bbb";
$_ = "AAA bbb AAA";
print "Found bbb\n" if m/$needToFind/;</PRE></BLOCKQUOTE>
<P>Using the matching operator is so commonplace that Perl allows you to leave off the <TT>m</TT> from the matching operator as long as slashes are used as delimiters:
<BLOCKQUOTE><PRE>$_ = "AAA bbb AAA";
print "Found bbb\n" if /bbb/;</PRE></BLOCKQUOTE>
<P>Using the matching operator to find a string inside a file is very easy because the defaults are designed to facilitate this activity. For example:
<BLOCKQUOTE><PRE>$target = "M";
open(INPUT, "&lt;findstr.dat");
while (&lt;INPUT&gt;) {
    if (/$target/) {
        print "Found $target on line $.\n";
    }
}
close(INPUT);</PRE></BLOCKQUOTE>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Note</B></TD></TR><TR><TD><BLOCKQUOTE>The <TT>$.</TT> special variable keeps track of the record number. Every time the diamond operator reads a line, this variable is incremented.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>
<P>This example reads every line of the input file, searching for the letter <TT>M</TT>. When an <TT>M</TT> is found, the print statement is executed.
The print statement prints the letter that is found and the line number it was found on.
<H3><A NAME="TheMatchingOptions">The Matching Options</A></H3>
<P>The matching operator has several options that enhance its utility. The most useful options are probably the ones that ignore case and that return a list of all matches in a string. Table 10.2 shows the options you can use with the matching operator.<BR>
<P><CENTER><B>Table 10.2 Options for the Matching Operator</B></CENTER>
<p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=73><CENTER><I>Option</I></CENTER></TD><TD WIDTH=517><I>Description</I></TD></TR>
<TR><TD WIDTH=73><CENTER>g</CENTER></TD><TD WIDTH=517>This option finds all occurrences of the pattern in the string. A list of matches is returned or you can iterate over the matches using a loop statement.</TD></TR>
<TR><TD WIDTH=73><CENTER>i</CENTER></TD><TD WIDTH=517>This option ignores the case of characters in the string.</TD></TR>
<TR><TD WIDTH=73><CENTER>m</CENTER></TD><TD WIDTH=517>This option treats the string as multiple lines. With it, the <TT>^</TT> and <TT>$</TT> anchors match at the start and end of each line within the string, not just at the start and end of the whole string. Use it when you know that the string contains embedded newline characters.</TD></TR>
<TR><TD WIDTH=73><CENTER>o</CENTER></TD><TD WIDTH=517>This option compiles the pattern only once. You can achieve some small performance gains with this option. It should be used with variable interpolation only when the value of the variable will not change during the lifetime of the program.</TD></TR>
<TR><TD WIDTH=73><CENTER>s</CENTER></TD><TD WIDTH=517>This option treats the string as a single line, so that the <TT>.</TT> meta-character also matches a newline character.</TD></TR>
<TR><TD WIDTH=73><CENTER>x</CENTER></TD><TD WIDTH=517>This option lets you use extended regular expressions. Basically, this means that Perl will ignore white space that's not escaped with a backslash or within a character class.
I highly recommend this option so you can use spaces to make your regular expressions more readable. See the section, "Example: Extension Syntax," later in this chapter for more information.</TD></TR>