by zero or more white space characters. The <TT>{5}</TT>
quantifier ensures that this combination of components
is present five times.
<P>
The <TT>*</TT> and <TT>+</TT>
quantifiers are greedy. They match as many characters as possible.
This may not always be the behavior that you need. You can create
non-greedy components by following the quantifier with a <TT>?</TT>.
<P>
Let's use the following file specification to look at the <TT>*</TT>
and <TT>+</TT> quantifiers more closely:
<BLOCKQUOTE>
<PRE>
$_ = '/user/Jackie/temp/names.dat';
</PRE>
</BLOCKQUOTE>
<P>
The regular expression <TT>.*</TT>
will match the entire file specification. This can be seen in
the following small program:
<BLOCKQUOTE>
<PRE>
$_ = '/user/Jackie/temp/names.dat';
m/.*/;
print $&;
</PRE>
</BLOCKQUOTE>
<P>
This program displays
<BLOCKQUOTE>
<PRE>
/user/Jackie/temp/names.dat
</PRE>
</BLOCKQUOTE>
<P>
You can see that the <TT>*</TT> quantifier
is greedy: it matched the whole string. If you add the <TT>?</TT> modifier
to make the <TT>.*</TT> component
non-greedy, what do you think the following program displays?
<BLOCKQUOTE>
<PRE>
$_ = '/user/Jackie/temp/names.dat';
m/.*?/;
print $&;
</PRE>
</BLOCKQUOTE>
<P>
This program displays nothing. A non-greedy <TT>*?</TT> matches as few
characters as possible, and zero characters are enough for the match to
succeed, so <TT>$&</TT> holds an empty string.
If you change the pattern to <TT>m/.+?/</TT>,
then the program will display
<BLOCKQUOTE>
<PRE>
/
</PRE>
</BLOCKQUOTE>
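<P>
Non-greedy quantifiers are most useful when something must follow them. The following minimal sketch captures the first directory name in the same file specification. It assumes the <TT>=~</TT> binding operator and uses braces as the match delimiter so the slashes need no escaping; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $path = '/user/Jackie/temp/names.dat';

# Greedy: .+ grabs as much as it can, then backs off only as far
# as needed to find the LAST slash.
my ($greedy) = $path =~ m{^/(.+)/};

# Non-greedy: .+? stops at the FIRST slash that lets the match succeed.
my ($lazy) = $path =~ m{^/(.+?)/};

print "greedy: $greedy\n";   # greedy: user/Jackie/temp
print "lazy:   $lazy\n";     # lazy:   user
```

Both patterns require a trailing slash; the only difference is how many characters the quantifier is willing to consume before that slash.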
<P>
Next, let's look at the concept of pattern memory, which lets
you keep parts of the matched string around after the match is complete.
<H3><A NAME="ExamplePatternMemory">
Example: Pattern Memory</A></H3>
<P>
Matching arbitrary numbers of characters is fine, but without
the capability to find out what was matched, patterns would not
be very useful. Perl lets you enclose pattern components inside
parentheses in order to store the text those components matched
in pattern memory. You might also hear <I>pattern memory</I> referred
to as <I>pattern buffers</I>. This memory persists after the match
statement has finished executing, so you can assign the matched
values to other variables.
<P>
You saw a simple example of this earlier right after the component
descriptions. That example looked for the first word in a string
and stored it into the first buffer, <TT>$1</TT>.
The following small program
<BLOCKQUOTE>
<PRE>
$_ = "AAA BBB ccC";
m/(\w+)/;
print("$1\n");
</PRE>
</BLOCKQUOTE>
<P>
will display
<BLOCKQUOTE>
<PRE>
AAA
</PRE>
</BLOCKQUOTE>
<P>
You can use as many buffers as you need; each set of parentheses
you add uses another buffer. If you want to find all
the words in the string, you need to use the <TT>/g</TT> match option.
To find all the words, you can use a loop statement that
repeats until the match operator returns false.
<BLOCKQUOTE>
<PRE>
$_ = "AAA BBB ccC";
while (m/(\w+)/g) {
print("$1\n");
}
</PRE>
</BLOCKQUOTE>
<P>
The program will display
<BLOCKQUOTE>
<PRE>
AAA
BBB
ccC
</PRE>
</BLOCKQUOTE>
<P>
If looping through the matches is not the right approach for your
needs, perhaps you need to create an array consisting of the matches.
<BLOCKQUOTE>
<PRE>
$_ = "AAA BBB ccC";
@matches = m/(\w+)/g;
print("@matches\n");
</PRE>
</BLOCKQUOTE>
<P>
The program will display
<BLOCKQUOTE>
<PRE>
AAA BBB ccC
</PRE>
</BLOCKQUOTE>
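<P>
When a pattern contains more than one set of parentheses, <TT>m//g</TT> in list context returns the contents of every buffer for every match, one group after another. This sketch relies on that to turn a line of <TT>key=value</TT> pairs into a hash; the configuration string is a made-up sample, not a real file format.

```perl
use strict;
use warnings;

my $config = "host=example.org port=8080 user=jackie";

# Two sets of parentheses use two buffers per match. In list context,
# m//g returns ($1, $2) for each match, so the result pairs up neatly
# as keys and values.
my %settings = $config =~ m/(\w+)=(\S+)/g;

for my $key (sort keys %settings) {
    print "$key: $settings{$key}\n";
}
```

The program prints one line per pair: <TT>host: example.org</TT>, <TT>port: 8080</TT>, and <TT>user: jackie</TT>.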
<P>
Perl also has a few special variables to help you know what matched
and what did not. These variables occasionally will save you from
having to add parentheses to find information.<BR>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=61><BLOCKQUOTE>
<CENTER><TT><B><FONT FACE="Courier">$+</FONT></B></TT></CENTER>
</BLOCKQUOTE>
</TD><TD WIDTH=529><BLOCKQUOTE>
This variable is assigned the text matched by the highest-numbered set of parentheses that took part in the last successful match.</BLOCKQUOTE>
</TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE>
<CENTER><TT><B><FONT FACE="Courier">$&</FONT></B></TT></CENTER>
</BLOCKQUOTE>
</TD><TD WIDTH=529><BLOCKQUOTE>
This variable is assigned the value of the entire matched string. If the match is not successful, then <TT>$&</TT> retains its value from the last successful match.
</BLOCKQUOTE>
</TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE>
<CENTER><TT><B><FONT FACE="Courier">$`</FONT></B></TT></CENTER>
</BLOCKQUOTE>
</TD><TD WIDTH=529><BLOCKQUOTE>
This variable is assigned everything in the searched string that is before the matched string.</BLOCKQUOTE>
</TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE>
<CENTER><TT><B><FONT FACE="Courier">$'</FONT></B></TT></CENTER>
</BLOCKQUOTE>
</TD><TD WIDTH=529><BLOCKQUOTE>
This variable is assigned everything in the searched string that is after the matched string.</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
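<P>
To see all four variables at once, the following sketch matches against the same <TT>"AAA BBB ccC"</TT> string used earlier and copies the special variables into ordinary ones right away. The pattern itself is only an illustration.

```perl
use strict;
use warnings;

my $string = "AAA BBB ccC";
my ($before, $matched, $after, $last);

if ($string =~ m/(B+) (c+)/) {
    # Copy the special variables immediately; the next successful
    # match would overwrite them.
    ($before, $matched, $after, $last) = ($`, $&, $', $+);
}

print "before:  [$before]\n";    # before:  [AAA ]
print "matched: [$matched]\n";   # matched: [BBB cc]
print "after:   [$after]\n";     # after:   [C]
print "last:    [$last]\n";      # last:    [cc]
```

Because <TT>c+</TT> matches only lowercase letters, the final uppercase <TT>C</TT> is left over in <TT>$'</TT>, and <TT>$+</TT> holds the contents of the second (last) set of parentheses.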
<P>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Tip</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
If you need to save the value of the matched strings stored in the pattern memory, make sure to assign them to other variables. Pattern memory is local to the enclosing block and lasts only until another successful match is done.</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
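<P>
A failed match does not clear pattern memory, which can leave a stale value in <TT>$1</TT>. This sketch shows the problem and a common safer idiom: assigning the match in list context, which yields <TT>undef</TT> when nothing matches. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $text   = "AAA BBB";
my $digits = "123";

$text =~ m/(\w+)/;       # succeeds: $1 is "AAA"
my $first = $1;          # save it before doing another match

$digits =~ m/([a-z]+)/;  # fails: $1 is NOT reset, it is still "AAA"
my $stale = $1;

# Safer: a list assignment gets nothing (undef) when the match fails.
my ($safe) = $digits =~ m/([a-z]+)/;

print "first: $first\n";                                  # first: AAA
print "stale: $stale\n";                                  # stale: AAA
print "safe:  ", (defined $safe ? $safe : 'undef'), "\n"; # safe:  undef
```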
<P>
<H3><A NAME="ExamplePatternPrecedeNCe">
Example: Pattern Precedence</A></H3>
<P>
Pattern components have an order of precedence just as operators
do. If you see the following pattern:
<BLOCKQUOTE>
<PRE>
m/a|b+/
</PRE>
</BLOCKQUOTE>
<P>
it's hard to tell if the pattern should be
<BLOCKQUOTE>
<PRE>
m/(a|b)+/ # match one or more characters, each of
          # which is either the "a" character or the
          # "b" character.
</PRE>
</BLOCKQUOTE>
<P>
or
<BLOCKQUOTE>
<PRE>
m/a|(b+)/ # match either the "a" character or the "b" character
# repeated one or more times.
</PRE>
</BLOCKQUOTE>
<P>
The order of precedence shown in Table 10.7 is designed to solve
problems like this. By looking at the table, you can see that
quantifiers have a higher precedence than alternation. Therefore,
the second interpretation is correct.<BR>
<P>
<CENTER><B>Table 10.7 The Pattern Component Order of
Precedence</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=145><CENTER><I>Precedence Level</I></CENTER></TD>
<TD WIDTH=204><I>Component</I></TD></TR>
<TR><TD WIDTH=145><CENTER>1</CENTER></TD><TD WIDTH=204>Parentheses
</TD></TR>
<TR><TD WIDTH=145><CENTER>2</CENTER></TD><TD WIDTH=204>Quantifiers
</TD></TR>
<TR><TD WIDTH=145><CENTER>3</CENTER></TD><TD WIDTH=204>Sequences and Anchors
</TD></TR>
<TR><TD WIDTH=145><CENTER>4</CENTER></TD><TD WIDTH=204>Alternation
</TD></TR>
</TABLE>
</CENTER>
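<P>
You can confirm the precedence rules by running both readings of the pattern against a couple of test strings. In this sketch the strings <TT>"bbb"</TT> and <TT>"abab"</TT> are arbitrary samples chosen so that the two patterns disagree on one of them.

```perl
use strict;
use warnings;

my %results;

for my $string ("bbb", "abab") {
    # Quantifiers bind tighter than |, so this means a|(b+).
    $string =~ m/a|b+/;
    my $alternation = $&;

    # Parentheses force the other reading: one or more of "a" or "b".
    $string =~ m/(a|b)+/;
    my $grouped = $&;

    $results{$string} = [$alternation, $grouped];
    print "$string: m/a|b+/ matched '$alternation', m/(a|b)+/ matched '$grouped'\n";
}
```

For <TT>"bbb"</TT> both patterns match all three characters, but for <TT>"abab"</TT> the ungrouped pattern stops after the single <TT>"a"</TT>, because the <TT>+</TT> applies only to <TT>b</TT>.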
<P>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Tip</B></TD></TR>
<TR><TD>
<BLOCKQUOTE>
You can use parentheses to affect the order in which components are evaluated because they have the highest precedence. However, unless you use the <TT>(?:...)</TT> extension syntax, each set of parentheses will also use a pattern memory buffer.</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
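<P>
Here is a minimal sketch of that difference, using the <TT>(?:...)</TT> extension from Table 10.8 below. The greeting string and the list of titles are made-up examples.

```perl
use strict;
use warnings;

my $greeting = "Dear Ms. Jackie,";

# Plain parentheses: the title uses buffer $1, so the name lands in $2.
my ($title, $name1) = $greeting =~ m/(Mr|Ms|Dr)\. (\w+)/;

# (?:...) groups the alternation without using a buffer,
# so the name is the first and only captured value.
my ($name2) = $greeting =~ m/(?:Mr|Ms|Dr)\. (\w+)/;

print "$title $name1\n";   # Ms Jackie
print "$name2\n";          # Jackie
```

In both patterns the parentheses around the titles are needed so that the <TT>\.</TT> applies to every alternative; only the second pattern avoids spending a buffer on them.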
<P>
<H3><A NAME="ExampleExtensionSyntax">
Example: Extension Syntax</A></H3>
<P>
The regular expression extensions significantly add to the power of
patterns without adding to the already large number of meta-characters.
By using the basic <TT>(?...)</TT>
notation, the regular expression capabilities can be greatly extended.
<P>
At this time, Perl recognizes five extensions. These vary widely
in functionality, from adding comments to setting options. Table
10.8 lists the extensions and gives a short description of each.
<BR>
<P>
<CENTER><B>Table 10.8 Five Extension Components</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=127><CENTER><I>Extension</I></CENTER></TD><TD WIDTH=463><I>Description</I>
</TD></TR>
<TR><TD WIDTH=127><CENTER>(?# TEXT)</CENTER></TD><TD WIDTH=463>This extension lets you add comments to your regular expression. The TEXT value is ignored.
</TD></TR>
<TR><TD WIDTH=127><CENTER>(?:...)</CENTER></TD><TD WIDTH=463>This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.
</TD></TR>
<TR><TD WIDTH=127><CENTER>(?=...)</CENTER></TD><TD WIDTH=463>This extension lets you specify what must follow your pattern without including that text in the <TT>$&</TT> variable.
</TD></TR>
<TR><TD WIDTH=127><CENTER>(?!...)</CENTER></TD><TD WIDTH=463>This extension lets you specify what should not follow your pattern. For instance, <TT>/blue(?!bird)/</TT> means that <TT>"bluebox"</TT> and <TT>"bluesy"</TT> will be matched
but not <TT>"bluebird"</TT>.
</TD></TR>
<TR><TD WIDTH=127><CENTER>(?sxi)</CENTER></TD><TD WIDTH=463>This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable
interpolation to do the matching.
</TD></TR>
</TABLE>
</CENTER>
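<P>
The following sketch exercises three of these extensions: the negative lookahead from the table, a positive lookahead, and an embedded <TT>(?i)</TT> option stored in a variable. The word list and the search string are illustrative samples only.

```perl
use strict;
use warnings;

my @words = qw(bluebox bluesy bluebird);

# (?!...): keep words where "blue" is NOT followed by "bird".
my @not_birds = grep { m/blue(?!bird)/ } @words;

# (?=...): "blue" must be followed by "bird", but "bird" is not part of $&.
my $lookahead_match;
if ("bluebird" =~ m/blue(?=bird)/) {
    $lookahead_match = $&;
}

# (?i): the case-insensitive option travels inside the pattern itself,
# so it still applies when the pattern is interpolated from a variable.
my $pattern = '(?i)jackie';
my $found = "/user/JACKIE/temp" =~ m/$pattern/ ? 1 : 0;

print "@not_birds\n";         # bluebox bluesy
print "$lookahead_match\n";   # blue
print "$found\n";             # 1
```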
<P>
<P>
By far the most useful extension, in my opinion,
is the ability to add comments directly inside your patterns.
For example, would you rather see a pattern that looks like
this:
<BLOCKQUOTE>
<PRE>
# Match a string with two words. $1 will be the
# first word. $2 will be the second word.
m/^\s*(\w+)\W+(\w+)\s*$/;
</PRE>
</BLOCKQUOTE>
<P>
or one that looks like this:
<BLOCKQUOTE>
<PRE>
m/
(?# This pattern will match any string with two)
(?# and only two words in it. The matched words)
(?# will be available in $1 and $2 if the match)
(?# is successful.)
^ (?# Anchor this match to the beginning)
(?# of the string)
\s* (?# skip over any whitespace characters)
(?# use the * because there may be none)
(\w+) (?# Match the first word, we know it's)
(?# the first word because of the anchor)
(?# above. Place the matched word into)
(?# pattern memory.)
\W+ (?# Match at least one non-word)
(?# cha