<HTML>
<HEAD>
<TITLE>Chapter 10 -- Regular Expressions</TITLE>
<META>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT SIZE=6 COLOR=#FF0000>Chapter 10</FONT></H1>
<H1><FONT SIZE=6 COLOR=#FF0000>Regular Expressions</FONT></H1>
<HR>
<P>
<CENTER><B><FONT SIZE=5>CONTENTS</FONT></B></CENTER>
<UL>
<LI><A HREF="#PatternDelimiters">
Pattern Delimiters</A>
<LI><A HREF="#TheMatchingOperatorm">
The Matching Operator (m//)</A>
<UL>
<LI><A HREF="#TheMatchingOptions">
The Matching Options</A>
</UL>
<LI><A HREF="#TheSubstitutionOperators">
The Substitution Operator (s///)</A>
<UL>
<LI><A HREF="#TheSubstitutionOptions">
The Substitution Options</A>
</UL>
<LI><A HREF="#TheTranslationOperatortr">
The Translation Operator (tr///)</A>
<UL>
<LI><A HREF="#TheTranslationOptions">
The Translation Options</A>
</UL>
<LI><A HREF="#TheBindingOperatorsand">
The Binding Operators (=~ and !~)</A>
<LI><A HREF="#HowtoCreatePatterns">
How to Create Patterns</A>
<UL>
<LI><A HREF="#ExampleCharacterClasses">
Example: Character Classes</A>
<LI><A HREF="#ExampleQuantifiers">
Example: Quantifiers</A>
<LI><A HREF="#ExamplePatternMemory">
Example: Pattern Memory</A>
<LI><A HREF="#ExamplePatternPrecedeNCe">
Example: Pattern Precedence</A>
<LI><A HREF="#ExampleExtensionSyntax">
Example: Extension Syntax</A>
</UL>
<LI><A HREF="#PatternExamples">
Pattern Examples</A>
<UL>
<LI><A HREF="#ExampleUsingtheMatchOperator">
Example: Using the Match Operator</A>
<LI><A HREF="#ExampleUsingtheSubstitutionOperator">
Example: Using the Substitution Operator</A>
<LI><A HREF="#ExampleUsingtheTranslationOperator">
Example: Using the Translation Operator</A>
<LI><A HREF="#ExampleUsingtheISplitIFuNCtion">
Example: Using the <I>Split()</I> Function</A>
</UL>
<LI><A HREF="#Summary">
Summary</A>
<LI><A HREF="#ReviewQuestions">
Review Questions</A>
<LI><A HREF="#ReviewExercises">
Review Exercises</A>
</UL>
<HR>
<P>
You can use a <I>regular expression</I> to find patterns in strings:
for example, to look for a specific name in a phone list or all
of the names that start with the letter <I>a</I>. Pattern matching
is one of Perl's most powerful and probably least understood features.
But after you read this chapter, you'll be able to handle regular
expressions almost as well as a Perl guru. With a little practice,
you'll be able to do some incredibly handy things.
<P>
There are three main uses for regular expressions in Perl: matching,
substitution, and translation. The matching operation uses the
<TT>m//</TT> operator, which evaluates
to a true or false value. The substitution operation replaces the
text matched by a pattern with a replacement string; it uses the <TT>s///</TT>
operator. The translation operation translates one set of characters
to another and uses the <TT>tr///</TT>
operator. These operators are summarized in Table 10.1.<BR>
<P>
<CENTER><B>Table 10.1 Perl's Regular Expression Operators</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=151><I>Operator</I></TD><TD WIDTH=438><I>Description</I>
</TD></TR>
<TR><TD WIDTH=151>m/PATTERN/</TD><TD WIDTH=438>This operator returns true if PATTERN is found in <TT>$_</TT>.
</TD></TR>
<TR><TD WIDTH=151>s/PATTERN/REPLACEMENT/</TD><TD WIDTH=438>This operator replaces the substring matched by PATTERN with REPLACEMENT.
</TD></TR>
<TR><TD WIDTH=151>tr/CHARACTERS/REPLACEMENTS/</TD><TD WIDTH=438>This operator replaces the characters specified by CHARACTERS with the characters in REPLACEMENTS.
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
By default, all three regular expression operators use <TT>$_</TT>
as the string to search. You can use the binding operators (see
the section "The Binding Operators" later in this chapter)
to search a variable other than <TT>$_</TT>.
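<P>
As a rough illustration, the following sketch applies each of the
three operators to <TT>$_</TT> in turn. The sample string and the
variable names (<TT>$found</TT>, <TT>$after_s</TT>, <TT>$after_tr</TT>)
are made up for this example.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Sample data chosen for this sketch.
$_ = "AAA bbb AAA";

my $found = m/bbb/ ? 1 : 0;   # match: true because $_ contains "bbb"

s/bbb/ccc/;                   # substitute: $_ becomes "AAA ccc AAA"
my $after_s = $_;

tr/A/a/;                      # translate: every "A" becomes "a"
my $after_tr = $_;

print "$found\n$after_s\n$after_tr\n";
</PRE>
</BLOCKQUOTE>
<P>
This should display <TT>1</TT>, <TT>AAA ccc AAA</TT>, and
<TT>aaa ccc aaa</TT> on separate lines.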
<P>
Both the matching (<TT>m//</TT>) and
the substitution (<TT>s///</TT>) operators
perform variable interpolation on the PATTERN and REPLACEMENT
strings. This comes in handy if you need to read the pattern from
the keyboard or a file.
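<P>
Here is a minimal sketch of interpolation in both parts of a
substitution. The variables <TT>$pattern</TT> and <TT>$replacement</TT>
are hypothetical stand-ins for values that a real program might read
from the keyboard or a file. Note that the interpolated text is still
treated as a pattern, so the <TT>+</TT> keeps its special meaning.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Stand-ins for values read from the keyboard or a file.
my $pattern     = "b+";
my $replacement = "ccc";

$_ = "AAA bbb AAA";
s/$pattern/$replacement/;     # same effect as s/b+/ccc/
my $result = $_;

print "$result\n";            # displays: AAA ccc AAA
</PRE>
</BLOCKQUOTE>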
<P>
If the match pattern evaluates to the empty string, the last successfully
matched pattern is used. So, if you see a statement like <TT>print if //;</TT>
in a Perl program, look for the previous regular expression operator
that matched to see what the pattern really is. The substitution operator also
uses this interpretation of the empty pattern.
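<P>
The following sketch, using made-up sample strings, shows the empty
pattern picking up an earlier pattern. If the empty pattern simply
matched everything, the third test would succeed; it fails because
the remembered pattern is still <TT>pie</TT>.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

$_ = "apple pie";
my $first = m/pie/ ? 1 : 0;   # succeeds, so "pie" is remembered

$_ = "cherry pie";
my $second = m// ? 1 : 0;     # empty pattern reuses "pie": match

$_ = "cherry tart";
my $third = m// ? 1 : 0;      # still "pie": no match

print "$first $second $third\n";   # displays: 1 1 0
</PRE>
</BLOCKQUOTE>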
<P>
In this chapter, you learn about pattern delimiters and then about
each type of regular expression operator. After that, you learn
how to create patterns in the section "How to Create Patterns."
Then, the "Pattern Examples" section shows you some
common situations and how regular expressions can be used to handle
them.
<H2><A NAME="PatternDelimiters"><FONT SIZE=5 COLOR=#FF0000>
Pattern Delimiters</FONT></A></H2>
<P>
Every regular expression operator allows the use of alternative
<I>pattern delimiters</I>. A <I>delimiter </I>marks the beginning
and end of a given pattern. In the following statement,
<BLOCKQUOTE>
<PRE>
m//;
</PRE>
</BLOCKQUOTE>
<P>
you see the standard delimiters, the slashes (<TT>//</TT>).
However, you can use almost any other non-alphanumeric, non-whitespace
character as the delimiter. This feature
is useful if you want to use the slash character inside your pattern.
For instance, to match a file name you would normally use:
<BLOCKQUOTE>
<PRE>
m/\/root\/home\/random.dat/
</PRE>
</BLOCKQUOTE>
<P>
This match statement is hard to read because all of the slashes
seem to run together (some programmers say they look like teepees).
If you use an alternate delimiter, it might look like this:
<BLOCKQUOTE>
<PRE>
m!/root/home/random.dat!
</PRE>
</BLOCKQUOTE>
<P>
or
<BLOCKQUOTE>
<PRE>
m{/root/home/random.dat}
</PRE>
</BLOCKQUOTE>
<P>
You can see that these examples are a little clearer. The last
example also shows that if an opening bracket (here, a left brace) is
used as the starting delimiter, then the ending delimiter must be the
matching closing bracket.
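<P>
To check that the choice of delimiter does not change what is matched,
here is a small sketch that runs all three forms against the same
string. The path is the illustrative one used above, and the variable
names are invented for this example.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

$_ = "/root/home/random.dat";

my $slashes = m/\/root\/home\/random.dat/ ? 1 : 0;   # escaped slashes
my $bangs   = m!/root/home/random.dat!     ? 1 : 0;   # ! as delimiter
my $braces  = m{/root/home/random.dat}     ? 1 : 0;   # paired braces

print "$slashes $bangs $braces\n";   # displays: 1 1 1
</PRE>
</BLOCKQUOTE>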
<P>
Both the match and substitution operators let you use variable
interpolation. You can take advantage of this to use a single-quoted
string that does not require the slash to be escaped. For instance:
<BLOCKQUOTE>
<PRE>
$file = '/root/home/random.dat';
m/$file/;
</PRE>
</BLOCKQUOTE>
<P>
You might find that this technique yields clearer code than simply
changing the delimiters.
<P>
If you choose the single quote as your delimiter character, then
no variable interpolation is performed on the pattern. However,
you still need to use the backslash character to escape any of
the meta-characters discussed in the "How to Create Patterns"
section later in this chapter.<BR>
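<P>
A short sketch may make the single-quote behavior clearer. The
variable <TT>$name</TT> and the sample strings are invented for this
example. With single quotes, <TT>$name</TT> is not replaced by its
value; the <TT>$</TT> is instead read as the end-of-string
meta-character, so it must be escaped to match a literal dollar sign.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $name = "Perl";

$_ = "I like Perl";
my $interpolated = m/$name/ ? 1 : 0;   # pattern is "Perl": match
my $literal      = m'$name' ? 1 : 0;   # no interpolation: no match

$_ = 'The variable is called $name';
my $escaped      = m'\$name' ? 1 : 0;  # matches the text "$name"

print "$interpolated $literal $escaped\n";   # displays: 1 0 1
</PRE>
</BLOCKQUOTE>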
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Tip</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
I tend to avoid delimiters that might be confused with characters in the pattern. For example, using the plus sign as a delimiter (<TT>m+abc+</TT>) does not help program readability. A casual reader might think that you intend to add two expressions
instead of matching them.
</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
<P>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Caution</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
The <TT>?</TT> has a special meaning when used as a match pattern delimiter. It works like the <TT>/</TT> delimiter except that it matches only once between calls to the <TT>reset()</TT> function. This feature may be removed in future versions of Perl (the bare <TT>?PATTERN?</TT> form, written without the leading <TT>m</TT>, already has been), so
avoid using it.
</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
The next few sections look at the matching, substitution, and
translation operators in more detail.
<H2><A NAME="TheMatchingOperatorm"><FONT SIZE=5 COLOR=#FF0000>
The Matching Operator (m//)</FONT></A></H2>
<P>
The matching operator (<TT>m//</TT>)
is used to find patterns in strings. One of its more common uses
is to look for a specific string inside a data file. For instance,
you might look for all customers whose last name is "Johnson,"
or you might need a list of all names starting with the letter
<I>s</I>.
<P>
Unless you use a binding operator, the matching operator searches the <TT>$_</TT>
variable. This makes the match statement shorter because you don't
need to specify where to search. Here is a quick example:
<BLOCKQUOTE>
<PRE>
$_ = "AAA bbb AAA";
print "Found bbb\n" if m/bbb/;
</PRE>
</BLOCKQUOTE>
<P>
The print statement is executed only if the <TT>bbb</TT>
character sequence is found in the <TT>$_</TT>
variable. In this particular case, <TT>bbb</TT>
will be found, so the program will display the following:
<BLOCKQUOTE>
<PRE>
Found bbb
</PRE>
</BLOCKQUOTE>
<P>
The matching operator allows you to use variable interpolation
in order to create the pattern. For example:
<BLOCKQUOTE>
<PRE>
$needToFind = "bbb";
$_ = "AAA bbb AAA";
print "Found bbb\n" if m/$needToFind/;
</PRE>
</BLOCKQUOTE>
<P>
Using the matching operator is so commonplace that Perl allows
you to leave off the <TT>m</TT> from
the matching operator as long as slashes are used as delimiters:
<BLOCKQUOTE>
<PRE>
$_ = "AAA bbb AAA";
print "Found bbb\n" if /bbb/;
</PRE>
</BLOCKQUOTE>
<P>
Using the matching operator to find a string inside a file is
very easy because the defaults are designed to facilitate this
activity. For example:
<BLOCKQUOTE>
<PRE>
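$target = "M";
# Open the data file for reading; stop if it cannot be opened.
open(INPUT, "&lt;findstr.dat") or die "Cannot open findstr.dat: $!";
while (&lt;INPUT&gt;) {                 # each line is read into $_
    if (/$target/) {
        print "Found $target on line $.\n";
    }
}
close(INPUT);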
</PRE>
</BLOCKQUOTE>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Note</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
The <TT>$.</TT> special variable keeps track of the record number. Every time the diamond operator (<TT>&lt;&gt;</TT>) reads a line, this variable is incremented.
</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
<P>
<P>
This example reads every line of the input file, searching for the letter
<TT>M</TT>. When an <TT>M</TT>
is found, the print statement is executed. The print statement
prints the letter that was found and the number of the line it was found
on.
<H3><A NAME="TheMatchingOptions">
The Matching Options</A></H3>
<P>
The matching operator has several options that enhance its utility.
The most useful options are probably the ones that ignore case
and that return a list of all matches in a string. Table 10.2
shows the options you can use with the matching operator.<BR>
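<P>
As a quick sketch of those two options, the following code uses
<TT>i</TT> to ignore case and <TT>g</TT> in a list context to collect
every match. The sample string and variable names are made up for
this example.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

$_ = "AAA bbb aaa";

my $ignore_case = /BBB/i ? 1 : 0;   # i option: "BBB" matches "bbb"
my @all         = /a+/gi;           # g option: list of all matches

print "$ignore_case\n";             # displays: 1
print join(",", @all), "\n";        # displays: AAA,aaa
</PRE>
</BLOCKQUOTE>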
<P>