<P>
where the regular expression looks for the same sequence of the
characters: c r y p t.
<P>
<B>The Multipliers Grouping Pattern </B>We already
met one of these with the asterisk. The asterisk designates a
"zero or more" match with the previous character. The
"+" symbol designates a match of one or more of the
previous character. To indicate a match of "zero or one"
of the previous character, you would use the question mark, "?".
Each of these multipliers is greedy: when several lengths would
satisfy the pattern, it matches the longest string it can.
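<P>
A small sketch can make the three multipliers concrete. The sample
string and the variable names here are our own, chosen only for
illustration, and the parentheses (explained later in this section)
are there just to capture what each multiplier matched.
<BLOCKQUOTE>
<PRE>
# Hypothetical sample text, chosen only for illustration.
my $text = "caaat";

# Each pattern captures what the multiplier matched.
my ($star)  = $text =~ /c(a*)/;    # zero or more
my ($plus)  = $text =~ /c(a+)/;    # one or more
my ($quest) = $text =~ /c(a?)/;    # zero or one

print "a* matched '$star'\n";      # takes all three a's
print "a+ matched '$plus'\n";      # takes all three a's
print "a? matched '$quest'\n";     # takes at most one a
</PRE>
</BLOCKQUOTE>
<P>
Both the asterisk and the plus sign take every a available, while
the question mark stops after one.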
<P>
If you want to stipulate how many characters these grouping patterns
are to match, you can use a general multiplier, whose format is
<BLOCKQUOTE>
<PRE>
/a{2,4}/
</PRE>
</BLOCKQUOTE>
<P>
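where a is the regular expression we are trying to match, and
2 and 4 bound the number of a's which will satisfy our string match,
meaning that a match will be found for strings "aa,"
"aaa," and "aaaa," but not for the string "a."
Because the pattern is not anchored, it will also find a match
inside "aaaaa," but that match covers only the first four a's.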
<P>
When the general multiplier has the second number absent, as with
<BLOCKQUOTE>
<PRE>
/a{3,}/
</PRE>
</BLOCKQUOTE>
<P>
it tells the match to look for three or more of the letter
a. If the comma is absent, as with
<BLOCKQUOTE>
<PRE>
/a{3}/
</PRE>
</BLOCKQUOTE>
<P>
it tells the match to find exactly three a's. To look for three
or fewer a's, a zero is used in the range field, like this:
<BLOCKQUOTE>
<PRE>
/a{0,3}/
</PRE>
</BLOCKQUOTE>
<P>
If you want to match two characters separated by a fixed number
of other characters, you might try
<BLOCKQUOTE>
<PRE>
/a.{3}x/
</PRE>
</BLOCKQUOTE>
<P>
which will make the regular expression look for the letter a, followed
by exactly three non-newline characters, followed by the letter x.
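<P>
To check the counts yourself, you might run something like the
following sketch. The test strings and variable names are our own
assumptions. The ^ and $ anchors (described under The Anchoring
Pattern) are added so that the whole string must consist of the
run being counted.
<BLOCKQUOTE>
<PRE>
my @results;
foreach my $s ("a", "aa", "aaaa", "aaaaa") {
    my $verdict = ($s =~ /^a{2,4}$/) ? "match" : "no match";
    push @results, "$s: $verdict";
}
print "$_\n" foreach @results;

# Without anchors, the greedy multiplier takes four of the five a's.
my ($run) = "aaaaa" =~ /(a{2,4})/;
print "unanchored run: $run\n";

# The letter a, any three non-newline characters, then x.
my $gap = ("a123x" =~ /a.{3}x/) ? "match" : "no match";
print "a123x: $gap\n";
</PRE>
</BLOCKQUOTE>
<P>
Only "aa" and "aaaa" pass the anchored test, and the unanchored
pattern captures "aaaa" out of the five a's.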
<P>
<B>The Parentheses Grouping Pattern </B>You can use
a pair of open and close parentheses to enclose any part of an
expression match you need to have remembered. Whatever text the
enclosed part of the expression matches is the text that will
be kept in memory.
<P>
To use this remembered expression match, you use a backslash
followed by an integer, like this:
<BLOCKQUOTE>
<PRE>
/moose(.)kiss\1/;
</PRE>
</BLOCKQUOTE>
<P>
This regular expression will match any occurrence of the string
"moose," followed by any one non-newline character,
followed by the string "kiss," followed by that same
character again. The regular expression will remember which single non-newline
character it matched after "moose" and look for the
same after "kiss." For example,
<BLOCKQUOTE>
<PRE>
mooseqkissq
</PRE>
</BLOCKQUOTE>
<P>
is a match, but
<BLOCKQUOTE>
<PRE>
mooseqkissw
</PRE>
</BLOCKQUOTE>
<P>
is not. This differs from the regular expression
<BLOCKQUOTE>
<PRE>
/moose.kiss./;
</PRE>
</BLOCKQUOTE>
<P>
which will match any two non-newline characters, whether they
are the same or not. The "1" after the backslash refers
to what was matched by the parentheses. If there is more than one set of
parentheses, the number after the backslash indicates which set
you want recalled, counting from left to right. An example
might look like this:
<BLOCKQUOTE>
<PRE>
/a(.)p(.)e\1s/;
</PRE>
</BLOCKQUOTE>
<P>
The first character is "a," followed by the #1 non-newline
character, followed by "p," followed by the #2 non-newline
character, followed by "e," followed by whatever the
#1 non-newline character is, followed by "s." This will
match
<BLOCKQUOTE>
<PRE>
aqpdeqs
</PRE>
</BLOCKQUOTE>
<P>
where the different non-newline characters only have to match
their designation, and not each other. To add the ability to match
more than a single character with the referenced part, just add
an asterisk to the expression, as
<BLOCKQUOTE>
<PRE>
/a(.*)p\1e/;
</PRE>
</BLOCKQUOTE>
<P>
This expression would match "a," followed by any number
of non-newline characters, followed by "p," followed
by that same series of non-newline characters and then "e."
A match might be
<BLOCKQUOTE>
<PRE>
aplanetpplanete
</PRE>
</BLOCKQUOTE>
<P>
but not
<BLOCKQUOTE>
<PRE>
aqqpqqqe
</PRE>
</BLOCKQUOTE>
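<P>
The matches and non-matches above can be checked with a short
script such as this sketch. It assumes a Perl recent enough to
provide the qr// operator for storing a pattern in a variable, and
the variable names are our own.
<BLOCKQUOTE>
<PRE>
# Pairs of pattern and sample string taken from the examples above.
my @cases = (
    [ qr/moose(.)kiss\1/,  "mooseqkissq"     ],
    [ qr/moose(.)kiss\1/,  "mooseqkissw"     ],
    [ qr/a(.)p(.)e\1s/,    "aqpdeqs"         ],
    [ qr/a(.*)p\1e/,       "aplanetpplanete" ],
    [ qr/a(.*)p\1e/,       "aqqpqqqe"        ],
);

my @outcomes;
foreach my $case (@cases) {
    my ($pattern, $string) = @$case;
    push @outcomes, ($string =~ $pattern) ? "match" : "no match";
    print "$string: $outcomes[-1]\n";
}
</PRE>
</BLOCKQUOTE>
<P>
The second and fifth strings fail because the text following
"kiss" or "p" is not the same text the parentheses remembered.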
<P>
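You can also use the memory grouping pattern to replace portions
of a string. In the replacement half of a substitution, the remembered
text is written as $1 rather than \1. Code like
<BLOCKQUOTE>
<PRE>
$_ = "a peas p corn e squash";
s/ p (.*) e / b $1 c /;
</PRE>
</BLOCKQUOTE>
<P>
creates the new string value of
<BLOCKQUOTE>
<PRE>
a peas b corn c squash
</PRE>
</BLOCKQUOTE>
<P>
where the standalone "p" and "e" were replaced with "b"
and "c," but what was in between remains unchanged. The spaces in
the pattern matter: without them, the match would begin at the
"p" in "peas."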
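<P>
Because the asterisk is greedy and the search starts at the first
"p" it can find, it is worth running a substitution on sample text
before relying on it. This sketch applies two patterns to the same
string. The variable names $greedy and $spaced are our own.
<BLOCKQUOTE>
<PRE>
# Same sample string, two patterns.
my $greedy = "a peas p corn e squash";
$greedy =~ s/p(.*)e/b$1c/;            # starts at the p in "peas"

my $spaced = "a peas p corn e squash";
$spaced =~ s/ p (.*) e / b $1 c /;    # only the standalone p and e

print "$greedy\n";
print "$spaced\n";
</PRE>
</BLOCKQUOTE>
<P>
The first substitution yields "a beas p corn c squash" and the
second yields "a peas b corn c squash."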
<P>
<B>The Alternation Grouping Pattern </B>The general
format for alternation is
<BLOCKQUOTE>
<PRE>
a|p|e
</PRE>
</BLOCKQUOTE>
<P>
where the regular expression is asked to match only one of the
designated alternatives, "a," "p," or "e."
You can apply alternation to more than one character, so
<BLOCKQUOTE>
<PRE>
ape|gorilla|monkey
</PRE>
</BLOCKQUOTE>
<P>
would be equally valid.
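<P>
As a brief sketch of alternation at work, the following picks
out the words from a list that contain any one of the alternatives.
The word list is hypothetical.
<BLOCKQUOTE>
<PRE>
# Hypothetical word list, for illustration only.
my @found;
foreach my $animal ("ape", "lemur", "gorilla", "monkey") {
    push @found, $animal if $animal =~ /ape|gorilla|monkey/;
}
print "matched: @found\n";
</PRE>
</BLOCKQUOTE>
<P>
Only "lemur" is left out, since it contains none of the three
alternatives.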
<H4>The Anchoring Pattern</H4>
<P>
To anchor a pattern there are four special notations available.
You would want to anchor your regular expression search if you
don't want to turn up every instance of a string. For example,
when searching for the string "the," you don't want
to also get "then," "there," "their,"
or "them." To do this you might use the word boundary
anchor \b:
<BLOCKQUOTE>
<PRE>
/the\b/;
</PRE>
</BLOCKQUOTE>
<P>
so that "the" is matched only where a word ends.
But this doesn't stop a word like "absinthe" from
being matched, so you can add a word boundary anchor to the front
of the regular expression
<BLOCKQUOTE>
<PRE>
/\bthe\b/;
</PRE>
</BLOCKQUOTE>
<P>
so that only the exact matches of "the" are returned.
<P>
If, on the other hand, you wanted to match only those instances
in which the string is followed by further word characters, and not the
string standing alone, you would use the \B anchor
<BLOCKQUOTE>
<PRE>
/the\B/;
</PRE>
</BLOCKQUOTE>
<P>
to return the matches "thee," "these,"
"there," and "then," but not "the" or "absinthe,"
where a word boundary follows the final "e."
<P>
The next anchor, ^, matches the start of a string, but only
when it appears where that makes sense, at the beginning of the
pattern, as with
<BLOCKQUOTE>
<PRE>
/^the/;
</PRE>
</BLOCKQUOTE>
<P>
which matches only those strings which start with "the."
Written with a backslash, \^ matches a literal caret character instead.
<P>
The final anchor, $, works in a similar way but on the end of
a string, so
<BLOCKQUOTE>
<PRE>
/the$/;
</PRE>
</BLOCKQUOTE>
<P>
will match any occurrence of "the" which appears at
the end of a string, or just before a trailing newline.
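<P>
The four anchors can be compared side by side with a sketch like
this one. The sample words and the %hits hash are our own
assumptions. For each word it lists the patterns that matched.
<BLOCKQUOTE>
<PRE>
# Hypothetical sample words; each line shows which patterns match.
my %hits;
foreach my $word ("the", "then", "absinthe") {
    my @matched;
    push @matched, '\bthe\b' if $word =~ /\bthe\b/;
    push @matched, 'the\b'   if $word =~ /the\b/;
    push @matched, 'the\B'   if $word =~ /the\B/;
    push @matched, '^the'    if $word =~ /^the/;
    push @matched, 'the$'    if $word =~ /the$/;
    $hits{$word} = join(" ", @matched);
    print "$word: $hits{$word}\n";
}
</PRE>
</BLOCKQUOTE>
<P>
Only the bare word "the" satisfies the pattern with a word boundary
on both sides, and "then" is the only one matched by the \B version.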
<H4>Pattern Precedence</H4>
<P>
As with operators, both grouping and anchoring patterns have an
order of precedence to follow. Table 3.4 gives you a quick rundown.
<BR>
<P>
<CENTER><B>Table 3.4 Pattern Precedence from Highest to Lowest</B></CENTER>
<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=40% CELLPADDING=3>
<TR VALIGN=TOP><TD><B>Name</B></TD><TD><B>Representation</B>
</TD></TR>
<TR VALIGN=TOP><TD>Parentheses</TD><TD>()</TD></TR>
<TR VALIGN=TOP><TD>Multipliers</TD><TD>+ * ? {a,b}</TD></TR>
<TR VALIGN=TOP><TD>Sequence and Anchoring</TD><TD>ape \b \B ^ $
</TD></TR>
<TR VALIGN=TOP><TD>Alternation</TD><TD>|</TD></TR>
</TABLE></CENTER>
<P>
<P>
Remember, if you use parentheses to clarify a regular expression
because they have the highest precedence, you will also be employing
their memory of the matched string. These examples should explain the differences
in matches caused by the use of parentheses.
<BLOCKQUOTE>
<PRE>
ape*
</PRE>
</BLOCKQUOTE>
<P>
will match ap, ape, apee, apeee, etc.
<P>
whereas
<BLOCKQUOTE>
<PRE>
(ape)*
</PRE>
</BLOCKQUOTE>
<P>
will match "", ape, apeape, apeapeape, etc.
<P>
and
<BLOCKQUOTE>
<PRE>
^a|b
</PRE>
</BLOCKQUOTE>
<P>
will match "a" at the start of the line, or "b"
anywhere in the line. Yet
<BLOCKQUOTE>
<PRE>
^(a|b)
</PRE>
</BLOCKQUOTE>
<P>
will match either "a" or "b" at the start
of the line, and
<BLOCKQUOTE>
<PRE>
a|pe|s
</PRE>
</BLOCKQUOTE>
<P>
matches "a" or "pe" or "s." If you
apply parentheses
<BLOCKQUOTE>
<PRE>
(a|pe)(pe|s)
</PRE>
</BLOCKQUOTE>
<P>
you'll match ape, as, pepe, and pes. These parentheses can be
used to find related words like
<BLOCKQUOTE>
<PRE>
(soft|hard)wood
</PRE>
</BLOCKQUOTE>
<P>
where instances of either softwood or hardwood are returned as
matches.
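<P>
The effect of precedence is easiest to see by running the grouped
and ungrouped forms against the same input. In this sketch the
sample strings and variable names are hypothetical.
<BLOCKQUOTE>
<PRE>
# Hypothetical sample lines, for illustration only.
my @report;
foreach my $line ("cab", "abc", "xyz") {
    my $loose = ($line =~ /^a|b/)   ? "yes" : "no";   # ^a, or b anywhere
    my $tight = ($line =~ /^(a|b)/) ? "yes" : "no";   # a or b at the start
    push @report, "$line: ^a|b=$loose ^(a|b)=$tight";
}
print "$_\n" foreach @report;

# (ape)* repeats the whole group; ape* repeats only the final e.
my ($group) = "apeapeape" =~ /^((ape)*)/;
my ($tail)  = "apeee"     =~ /^(ape*)/;
print "group: $group, tail: $tail\n";
</PRE>
</BLOCKQUOTE>
<P>
The string "cab" separates the two forms: it contains a "b," so
the ungrouped pattern matches, but it starts with neither "a" nor
"b," so the grouped pattern does not.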
<P>
A possible use for the matching operators might be a script that
checks a user's reply for a common answer and responds accordingly. You can use
the "=~" operator to do this. If you remember, this
operator applies the match to the value on its left instead of to $_.
Say you have already filled $_ with a value you need later in
the script. Then you could use =~ to match against something else
and leave $_ untouched. The =~ operator acts like this:
<BLOCKQUOTE>
<PRE>
print "Will you be needing anything else? ";
if (<STDIN> =~ /^[Yy]/) {
    # The condition is true if the input begins with
    # 'Y' or 'y', so we proceed into the block.
    print "And what would that be? ";
    <STDIN>;    # read the answer and discard it
    print "I'm sorry, that's just not possible.\n";
}
</PRE>
</BLOCKQUOTE>
<P>
where, once the user has answered yes, the final response will be
the same no matter what the user types next.
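<P>
To see that =~ leaves $_ alone, you might try a sketch like the
following, in which the variable $reply stands in for a line read
from the user. Both values are hypothetical.
<BLOCKQUOTE>
<PRE>
# $reply stands in for a line read from the user (hypothetical value).
$_ = "a value needed later in the script";
my $reply = "Yes, please";

# The match is applied to $reply, not to $_.
my $wants_more = ($reply =~ /^[Yy]/) ? 1 : 0;

print "User wants more\n" if $wants_more;
print "\$_ is still: $_\n";
</PRE>
</BLOCKQUOTE>
<P>
The last line prints the original contents of $_, showing that the
match against $reply did not disturb it.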
<H3><A NAME="OtherMatchingOperatorTidbits">
Other Matching Operator Tidbits</A></H3>
<P>
There are some other ways to modify your regular expressions.
Perl uses the "i" modifier, placed after the closing slash, to tell
a regular expression to ignore case in matching. Using the format
<BLOCKQUOTE>
<PRE>
/string_characters/i
</PRE>
</BLOCKQUOTE>
<P>
you could amend a line from our last example script from this:
<BLOCKQUOTE>
<PRE>
if (<STDIN> =~ /^[Yy]/)
</PRE>
</BLOCKQUOTE>
<P>
to this:
<BLOCKQUOTE>
<PRE>
if (<STDIN> =~ /^y/i)
</PRE>
</BLOCKQUOTE>
<P>
so that the case of the user's answer no longer affects the outcome.
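<P>
A quick sketch, using replies of our own invention, shows the
modifier accepting any capitalization of a leading "y."
<BLOCKQUOTE>
<PRE>
# Hypothetical replies, for illustration only.
my %accepted;
foreach my $reply ("yes", "Yes", "YES", "no") {
    $accepted{$reply} = ($reply =~ /^y/i) ? 1 : 0;
    print "$reply: $accepted{$reply}\n";
}
</PRE>
</BLOCKQUOTE>
<P>
The three spellings of "yes" are accepted and "no" is rejected.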
<P>
If you need to use a regular expression to search through file paths,
you will need to include slashes in the expression. Because the slash
also marks the ends of the pattern, each slash inside it has to be
preceded by a backslash so that it is treated as an ordinary character:
<BLOCKQUOTE>
<PRE>
/^\/usr\/bin\/perl/
</PRE>
</BLOCKQUOTE>
<P>
(and the regular expression starts to look like a divoted golf
course!)
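<P>
One way around the divots, offered here as an aside rather than
as part of the original example, is Perl's m operator, which lets
you pick a different pair of delimiters so the slashes need no
backslashes. The path in this sketch is hypothetical.
<BLOCKQUOTE>
<PRE>
# Hypothetical path, for illustration only.
my $path = "/usr/bin/perl5";

my $escaped = ($path =~ /^\/usr\/bin\/perl/) ? 1 : 0;   # escaped slashes
my $braces  = ($path =~ m{^/usr/bin/perl})   ? 1 : 0;   # braces as delimiters
print "escaped slashes: $escaped, braces: $braces\n";
</PRE>
</BLOCKQUOTE>
<P>
Both forms describe the same pattern, so both report a match.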
<H2><A NAME="ChapterinReview"><FONT SIZE=5 COLOR=#FF0000>
Chapter in Review</FONT></A></H2>
<P>
In this chapter we started out discussing various Perl control
structures like the statement block used to define a specific
script action, conditionals like if/unless, and the different kinds
of loops, like the for/foreach loop. Loops can be used to have
Perl repeat an action as many times as necessary for the script's
operation.
<P>
We also covered associative arrays, demonstrating how they differ
from arrays by having not just a single value in each element,
but a key/value pair. Associative arrays are examined and modified with
their own operators, like the keys, values, each, and delete operators.
<P>
The chapter finished with defining regular expressions as a pattern
matching tool used by Perl. Now that you have a general understanding
of what regular expressions are, defining them between two slashes,
and how they match these definition patterns to script specified
data, you can start solving some more interesting tasks with Perl.
In the next chapter, we'll marry this guestbook script to a CGI
output for the user and look at how Perl interacts with HTML.
</BODY>
</HTML>