For a list of special characters in regular expressions, see Appendix
A.
<P>
You are also not limited to using / as the delimiter for a regular
expression. If you put an m in front of the regular expression, the
character that follows the m becomes the delimiter, so this
<BLOCKQUOTE>
<PRE>
if ($line =~/john/) {
</PRE>
</BLOCKQUOTE>
<P>
is equivalent to
<BLOCKQUOTE>
<PRE>
if ($line =~m#john#) {
</PRE>
</BLOCKQUOTE>
<P>
and
<BLOCKQUOTE>
<PRE>
if ($line =~m[john]) {
</PRE>
</BLOCKQUOTE>
<P>
The delimiter can be just about any character, but notice that
with paired characters, like the square brackets ([]), the closing
delimiter is the matching partner of the opening one.
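<P>
A common reason to choose a different delimiter is a pattern that itself
contains slashes, such as a file path. In this small sketch (the path is
just a sample string), the same test is written three ways; only the /
version needs each slash escaped with a backslash:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $path = "/usr/local/bin/perl";    # sample string full of slashes

# Default delimiter: every / inside the pattern must be escaped.
my $slash   = ($path =~ /\/usr\/local\//) ? 1 : 0;
# With m, the next character is the delimiter, so / needs no escaping.
my $hash    = ($path =~ m#/usr/local/#)   ? 1 : 0;
# Paired delimiters: the closing ] matches the opening [.
my $bracket = ($path =~ m[/usr/local/])   ? 1 : 0;

print "slash=$slash hash=$hash bracket=$bracket\n";   # slash=1 hash=1 bracket=1
</PRE>
</BLOCKQUOTE>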
<P>
Okay, now we have a database full of names, we can check them
against the input data, and we can ignore case. What next? The next
logical step seems to be to glean some more useful information
from the user. Let's ask for their last name and favorite color
as well, as shown in Listing 3.2.
<HR>
<BLOCKQUOTE>
<B>Listing 3.2 Asking for Additional User Information<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
# Asking for details
print "What is your first name? ";
$name=<STDIN>;                  # get the response from the user
chomp($name);                   # remove the trailing newline
print "What is your last name? ";
$lastname=<STDIN>;
chomp($lastname);
print "What is your favorite color? ";
$color=<STDIN>;
chomp($color);
$newline=$name.':'.$lastname.':'.$color."\n";   # make a line delimited with colons
open(GUESTBOOK, ">>guest.pl")                   # open file for append
    or die "Can't open guest.pl: $!";
print GUESTBOOK $newline;       # append the field line to the guestbook file
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
</PRE>
</BLOCKQUOTE>
<HR>
<P>
A few things are different now. We are asking for three separate
pieces of data and assigning each to a variable. Notice that we
remove the newline character as soon as we get the data. Next we
build a string with the three fields separated by colons and a
newline character tacked on the end. We'll end up with something
like this:
<BLOCKQUOTE>
<PRE>
John:Smith:magenta
</PRE>
</BLOCKQUOTE>
<P>
This will make it easy to add or retrieve the data we want later.
The "." operator in Perl is the concatenation (string append)
operator. To form the string we want, we append a colon to the
end of $name, add $lastname to the end of the resulting string,
append another colon, add $color, and finally append a newline.
Confused yet? Don't worry: as so often in Perl, there is an easier
way. That line is equivalent to
<BLOCKQUOTE>
<PRE>
$newline=join(':',$name, $lastname, $color);
$newline.="\n";
</PRE>
</BLOCKQUOTE>
<P>
The join() function joins the variables or strings listed into
one string, separating the fields with the specified delimiter
(a : in this case). The .= operator on the following line appends
the newline character to the end of the $newline variable. This
is equivalent to: $newline=$newline."\n";
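<P>
You can check that the two versions really build the same string. In
this sketch the three values are hard-coded sample answers rather than
input read from STDIN:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Sample answers standing in for what the user would type.
my ($name, $lastname, $color) = ("John", "Smith", "magenta");

# Version 1: build the line piece by piece with the . operator.
my $concat = $name . ':' . $lastname . ':' . $color . "\n";

# Version 2: join the fields with ':' and then append the newline with .=
my $joined = join(':', $name, $lastname, $color);
$joined .= "\n";

print $joined;                          # John:Smith:magenta
print "same\n" if $concat eq $joined;   # the two strings are identical
</PRE>
</BLOCKQUOTE>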
<P>
Now that we are storing this information, we'll want to check new
input against the entries already in the guestbook. We do this
using the split() function (Listing 3.3), the opposite of the join()
function. Surprise, surprise.
<HR>
<BLOCKQUOTE>
<B>Listing 3.3 The Split Command<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
print "What is your first name? ";
$name=<STDIN>;                  # get the response from the user
chomp($name);                   # remove the trailing newline
print "What is your last name? ";
$lastname=<STDIN>;
chomp($lastname);
print "What is your favorite color? ";
$color=<STDIN>;
chomp($color);
$newline=$name.':'.$lastname.':'.$color."\n";   # make a line delimited with colons
open(GUESTBOOK, "guest.pl");    # read existing entries; if the file doesn't
                                # exist yet, the loop below simply reads nothing
while ($line=<GUESTBOOK>) {
    ($gbname, $gblastname, $gbcolor)=split(':', $line);
    if ($gbname=~/^$name/i) {
        print "You are already in the guestbook, $name!\n";
        close(GUESTBOOK);
        exit;
    }
}
close(GUESTBOOK);
open(GUESTBOOK, ">>guest.pl")   # open file for appending
    or die "Can't open guest.pl: $!";
print GUESTBOOK $newline;       # append the field line to the guestbook file
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
</PRE>
</BLOCKQUOTE>
<HR>
<P>
Here we assign the first three items returned by split to $gbname,
$gblastname, and $gbcolor. We do this by putting parentheses around
the variable names to form a list. We could just as easily have
assigned all the fields to an array, like this:
<BLOCKQUOTE>
<PRE>
@data=split(':', $line);
</PRE>
</BLOCKQUOTE>
<P>
and referenced the first three elements in the array as
<BLOCKQUOTE>
<PRE>
$data[0], $data[1], and $data[2]
</PRE>
</BLOCKQUOTE>
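<P>
One detail is easy to miss when you split a line read from a file: the
newline is still attached to the last field. The sketch below uses a
sample record shaped like the guestbook line shown earlier:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# One record as it would come back from the guestbook file,
# trailing newline included.
my $line = "John:Smith:magenta\n";

# Split into a list of scalars...
my ($gbname, $gblastname, $gbcolor) = split(':', $line);

# ...or into an array.
my @data = split(':', $line);

# The newline stays on the last field; chomp a copy to get the bare value.
my $bare_color = $data[2];
chomp($bare_color);

print "$data[0] / $data[1] / $bare_color\n";   # John / Smith / magenta
</PRE>
</BLOCKQUOTE>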
<P>
So now that we have some data to play with, let's do some more
tests just for practice. Our program is getting a little long,
so in Listing 3.4 we'll only deal with the part that has changed.
<HR>
<BLOCKQUOTE>
<B>Listing 3.4 Testing the Data<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
while ($line=<GUESTBOOK>) {
    ($gbname, $gblastname, $gbcolor)=split(':', $line);
    if (($gbname=~/^$name/i) && ($gblastname=~/^$lastname/i)) {
        print "You are already in the guestbook, $name!\n";
        close(GUESTBOOK);
        if ($gbcolor!~/$color/i) {
            chomp($gbcolor);            # drop the newline split left on the last field
            print "You have a different favorite color!\n";
            print "Your old favorite color is: $gbcolor\n";
            print "Your new favorite color is: $color\n";
            print "Would you like to change it? ";
            $input=<STDIN>;
            if ($input=~/^y/i) {
                open(GUESTBOOK, "guest.pl") or die "Can't read guest.pl: $!";
                undef $/;               # read the whole file in one go
                $body=<GUESTBOOK>;
                $/="\n";                # restore the normal line separator
                close(GUESTBOOK);
                $body=~s/$line/$newline/;   # swap the old record for the new one
                open(GUESTBOOK, ">guest.pl") or die "Can't write guest.pl: $!";
                print GUESTBOOK $body;
                close(GUESTBOOK);
            }
        }
        exit;                           # in every case, we're done once the name is found
    }
}
</PRE>
</BLOCKQUOTE>
<HR>
<P>
What's happening here? The first thing you may notice is that
we are doing an extra test in the first if statement. Since many
people share the same first name, we now test that both the first
<I>and</I> last names match.
<P>
The && means that the first <I>and</I> second expressions
must be true in order for the if statement to be true. Alternatively,
|| means the first<I> or</I> the second expression must be true.
We next check to see if the color is the same. If it is, we just
exit. If it isn't, we alert the user, and ask them if they want
to change their color choice. We get a line from STDIN as usual,
and check to see if it starts with y or Y. If it doesn't, we exit.
If it does, that's when the fun starts.
<P>
We open guest.pl for reading as usual, but then we undef (undefine)
the special variable $/. This variable, the input record separator,
tells Perl where one line ends when you read from a file. It is
normally set to "\n", so each read returns one line of the file.
With it undefined, a single read pulls in the entire file (newlines
and all), which we store in $body. Once we have the whole thing, we
can replace the line with the old color ($line) with the line
with the new color ($newline). This is done with the =~ operator
again, but notice the s in front of the first /. This means we
are doing a substitution: whatever the pattern between the first
two slashes matches is replaced by the text between the second and
third slashes. If the pattern doesn't match, the string is left
unchanged. As with all regular expressions, you can use any
delimiter you like, so
<BLOCKQUOTE>
<PRE>
$body =~ s#$line#$newline#;   # no space between s and #, or # starts a comment
</PRE>
</BLOCKQUOTE>
<P>
would have been equivalent. Also, the i modifier to ignore case,
placed after the last delimiter, works here as well. Once we
have replaced the line we want, we open the guestbook again for
writing, write the whole file out with a single print, and exit.
Remember to set $/ back to "\n" before you do any more file reads;
otherwise every later read will swallow a whole file at once. But
back to our regular expressions.
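<P>
The sketch below repeats the read-everything-and-substitute step on a
string opened as an in-memory filehandle (standing in for guest.pl, so
nothing on disk is touched; the sample records are made up). It swaps
in local $/, which restores $/ automatically when the enclosing block
ends, as an alternative to the manual undef-and-reset in Listing 3.4.
It also wraps the old record in \Q...\E so that any regular expression
metacharacters it happens to contain (a period or parenthesis, say) are
matched literally:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Stand-in for the guestbook file.
my $file_contents = "Jane:Doe:blue\nJohn:Smith:magenta\n";
my $line    = "John:Smith:magenta\n";   # old record
my $newline = "John:Smith:green\n";     # replacement record

open(my $fh, '<', \$file_contents) or die "Cannot open string: $!";
my $body;
{
    local $/;          # $/ is undef inside this block only...
    $body = <$fh>;     # ...so one read returns the whole "file"
}                      # $/ goes back to "\n" here automatically
close($fh);

# \Q...\E quotes metacharacters so $line is matched as plain text.
$body =~ s/\Q$line\E/$newline/;

print $body;
</PRE>
</BLOCKQUOTE>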
<P>
It's probably a good idea to go over each of the separate elements
used in this script, so there is no doubt about what the regular
expression operators are and how they work.
<P>
Unlike the grep command, which looks at every line in the files
you give it, this script looks at only one line: the one in $_.
To check every line of a file, we need to do this:
<BLOCKQUOTE>
<PRE>
while (<>) {
    if (/crypt/) {
        print "$_";
    }
}
</PRE>
</BLOCKQUOTE>
<P>
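The <> operator reads one line at a time (from the files named on
the command line, or from STDIN if there are none) into $_, and the
loop continues until every line has been checked.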
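<P>
To try the loop without a real file, this sketch reads from a string
opened as an in-memory filehandle; the sample text and the names $text,
$fh, and @found are made up for illustration. while (&lt;$fh&gt;) behaves
like while (&lt;&gt;) except that it reads only from $fh:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $text = "use crypt here\nnothing to see\ncrypt again\n";
open(my $fh, '<', \$text) or die "Cannot open string: $!";

my @found;
while (<$fh>) {          # each pass puts the next line in $_
    if (/crypt/) {       # a bare /pattern/ tests $_
        push @found, $_;
        print;           # with no argument, print prints $_
    }
}
close($fh);
</PRE>
</BLOCKQUOTE>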
<P>
Now say you are checking your own scripts for crypt, and you realize
that your typing was a little sloppy in places. Sometimes you
slipped and spelled crypt with two p's, as cryppt. You can amend
your searching script to
<BLOCKQUOTE>
<PRE>
while (<>) {
    if (/cryp*t/) {
        print "$_";
    }
}
</PRE>
</BLOCKQUOTE>
<P>
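Because the asterisk means "zero or more of the preceding character,"
this pattern matches crypt and cryppt (and spellings with even more
p's), but it also matches cryt, with no p at all. If you need at
least one p, use the plus sign, which means "one or more": /cryp+t/.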
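<P>
Here is a quick way to see exactly what each quantifier accepts. The
sketch runs a few made-up spellings through /cryp*t/ and, for
comparison, through /cryp+t/:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my @words = qw(cryt crypt cryppt crypppt cript);

# * means "zero or more of the preceding item"; + means "one or more".
my @star = grep { /cryp*t/ } @words;
my @plus = grep { /cryp+t/ } @words;

print "cryp*t matches: @star\n";   # cryp*t matches: cryt crypt cryppt crypppt
print "cryp+t matches: @plus\n";   # cryp+t matches: crypt cryppt crypppt
</PRE>
</BLOCKQUOTE>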
<P>
Once you have matched what you are looking for you might want
to replace it with something. To do this, we can use the substitute
operator.
<P>
You might use this operator when you want to replace one string
with another. The substitute operator is written as s, and looks
like this in a statement:
<BLOCKQUOTE>
<PRE>
s/crypt/tomb/;
</PRE>
</BLOCKQUOTE>
<P>
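Used on its own like this, the substitute operator works on $_ and
replaces the first occurrence of crypt with the replacement string
tomb. To replace every occurrence, add the g modifier after the last
slash: s/crypt/tomb/g.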
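<P>
The difference between a plain substitution and one with the g modifier
shows up clearly on a sample string containing the word more than once.
This sketch also captures the return value of s///g, which is the number
of replacements made:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

$_ = "crypt, crypt, and more crypt";
my $original = $_;

s/crypt/tomb/;               # no =~ binding, so this edits $_ (first match only)
my $once = $_;

$_ = $original;
my $count = s/crypt/tomb/g;  # g replaces every match and returns how many
my $all = $_;

print "$once\n";                        # tomb, crypt, and more crypt
print "$all ($count replacements)\n";   # tomb, tomb, and more tomb (3 replacements)
</PRE>
</BLOCKQUOTE>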
<P>
Regular expressions, as you can now see, are patterns. These patterns
can be as big or as small as you need, each with its own peculiarities.
Let's look at some more.
<H3><A NAME="RegularExpressionsasPatterns">
Regular Expressions as Patterns</A></H3>
<P>
There are various patterns that regular expressions work with:
single-character, grouping, and anchoring. Each of these has its
own little characteristics that make it work.
<H4>The Single-Character Pattern</H4>
<P>
The most common single-character pattern is a character that
matches itself: a letter used as a regular expression matches that
same letter. In other words, the regular expression "a" looks for
the character "a" in a string.
<P>
The second most common pattern-matching character is the period,
or dot ("."). It matches any single character except the newline
character, \n.
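<P>
A two-line sample string makes the newline exception visible. The
sketch also shows the s modifier, which tells Perl to let the dot match
a newline too:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $two_lines = "a\nb";

# . matches any single character except a newline...
my $plain  = ($two_lines =~ /a.b/)  ? "match" : "no match";
# ...unless the s modifier is added, which lets . match "\n" as well.
my $with_s = ($two_lines =~ /a.b/s) ? "match" : "no match";
# Any ordinary character in the middle is fine.
my $other  = ("axb" =~ /a.b/)       ? "match" : "no match";

print "$plain / $with_s / $other\n";   # no match / match / match
</PRE>
</BLOCKQUOTE>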
<P>
Moving on to larger patterns, a character class is a set of
characters enclosed in square brackets:
<BLOCKQUOTE>
<PRE>
/[crypt]/
</PRE>
</BLOCKQUOTE>
<P>
When a character class is used, a match occurs if any one of the
characters inside the brackets is found in the string being tested.
Note that matching is case-sensitive: /[crypt]/ matches a lowercase
c but not an uppercase C.
<P>
On the other hand, a character class stands for exactly one
character position in the string, so only one of the bracketed
characters has to be present for a match to occur; the string does
not have to contain the whole word crypt.
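<P>
A short sketch (the test words are arbitrary) makes both points
concrete: each word needs only one character from the set, and
uppercase letters don't count unless you add the i modifier:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Each word needs only ONE character from the set c, r, y, p, t.
my %result;
for my $word (qw(try pit CRYPT bad)) {
    $result{$word} = ($word =~ /[crypt]/) ? 1 : 0;
    print "$word: ", ($result{$word} ? "matches" : "no match"), "\n";
}

# Matching is case-sensitive by default; the i modifier relaxes that.
my $ci = ("CRYPT" =~ /[crypt]/i) ? 1 : 0;
print "CRYPT with /i: ", ($ci ? "matches" : "no match"), "\n";
</PRE>
</BLOCKQUOTE>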
<P>
You can specify a range inside a character class by putting a dash
between the first and last values. For example,
<BLOCKQUOTE>
<PRE>
/[0-5]/
</PRE>
</BLOCKQUOTE>
<P>
is the same as
<BLOCKQUOTE>
<PRE>
/[012345]/
</PRE>
</BLOCKQUOTE>
<P>
which can be very powerful if you consider that
<BLOCKQUOTE>
<PRE>
/[a-zA-Z0-9]/
</PRE>
</BLOCKQUOTE>
<P>
can search for all letters of the alphabet, both uppercase and
lowercase, as well as all digits. Not bad for a pattern only 13
characters long.
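<P>
If you want to convince yourself that a range and the spelled-out list
really are the same, the sketch below tests every character of a made-up
sample string both ways, then counts how many characters the range and
the letters-and-digits class match. The my $n = () = ... lines use a
common Perl idiom: a list assignment in scalar context yields the number
of matches.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $sample = "a0b5c6Z9_";

# $agree stays 1 only if both forms give the same answer for every character.
my $agree = 1;
for my $ch (split //, $sample) {      # split // breaks a string into characters
    my $range  = ($ch =~ /[0-5]/)    ? 1 : 0;
    my $listed = ($ch =~ /[012345]/) ? 1 : 0;
    $agree = 0 if $range != $listed;
}

my $low   = () = $sample =~ /[0-5]/g;         # 0 and 5
my $alnum = () = $sample =~ /[a-zA-Z0-9]/g;   # everything except _

print "agree=$agree low=$low alnum=$alnum\n";  # agree=1 low=2 alnum=8
</PRE>
</BLOCKQUOTE>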
<P>
If you want to use the character class the opposite way, matching
any character that is <I>not</I> listed in the brackets, place a
caret (^) right after the left bracket, like so:
<BLOCKQUOTE>
<PRE>
/[^0-5]/
</PRE>
</BLOCKQUOTE>
<P>
This expression matches any single character that is not a digit
from 0 to 5. Perl also provides shorthand for some common character
classes; these are listed in Table 3.3.<BR>
<P>
<CENTER><B>Table 3.3 Character Class Contractions</B></CENTER>
<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80% CELLPADDING=3>
<TR VALIGN=TOP><TD><B>Construct</B></TD><TD><B>Equivalent Class</B>
</TD><TD><B>Negated Construct</B></TD><TD><B>Equivalent Negated Class</B>
</TD></TR>
<TR VALIGN=TOP><TD>\d (digits)</TD><TD>[0-9]</TD><TD>\D (anything but digits)
</TD><TD>[^0-9]</TD></TR>
<TR VALIGN=TOP><TD>\w (word characters)</TD><TD>[a-zA-Z0-9_]</TD>
<TD>\W (anything but word characters)</TD><TD>[^a-zA-Z0-9_]
</TD></TR>
<TR VALIGN=TOP><TD>\s (whitespace)</TD><TD>[ \r\t\n\f]</TD>
<TD>\S (anything but whitespace)</TD><TD>[^ \r\t\n\f]
</TD></TR>
</TABLE></CENTER>
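<P>
The sketch below (with a made-up log line as input) tries the shorthand
classes from Table 3.3 and checks that \D picks out exactly the same
characters as [^0-9]. Note the underscore in user_42: \w treats it as a
word character. The bracketed equivalents in the table are the
traditional ASCII definitions; newer versions of Perl may match a few
additional characters, especially in Unicode strings.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $text = "user_42 logged in\tat 09:15";

my @digits = $text =~ /(\d+)/g;     # runs of digits
my @words  = $text =~ /(\w+)/g;     # runs of word characters; _ counts
my $spaces = () = $text =~ /\s/g;   # number of whitespace characters

# \D and the negated class [^0-9] should select the same characters.
my $via_D     = join '', $text =~ /(\D)/g;
my $via_class = join '', $text =~ /([^0-9])/g;

print "digits: @digits\n";          # digits: 42 09 15
print "words: @words\n";            # words: user_42 logged in at 09 15
print "whitespace: $spaces\n";      # whitespace: 4
print "same: ", ($via_D eq $via_class ? "yes" : "no"), "\n";   # same: yes
</PRE>
</BLOCKQUOTE>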
<P>
<P>
Before we get into any more of the guts of Perl, let's apply what
we've already exposed ourselves to. We should also start to make
note of the little differences between UNIX Perl and Perl for
Windows NT, or WinPerl.
<P>
One big difference is that while most Perl scripts you will find
contain the first line
<BLOCKQUOTE>
<PRE>
#!/usr/local/bin/perl
</PRE>
</BLOCKQUOTE>
<P>
or something similar, this line is unnecessary with WinPerl. On
UNIX, it tells the operating system where to find the Perl
interpreter. With Windows NT, you instead need to associate the .pl
file extension with perl.exe for your script to run. Associating
the .cgi extension is probably a good idea too, since most .cgi
files you'll encounter are also written in Perl.
<H4>The Grouping Pattern</H4>
<P>
There are several grouping patterns to understand: sequence, multipliers,
parentheses, and alternation. Grouping patterns let your script put
conditions on a match, such as "find exactly six of this" or "find
two or more of these."
<H4>The Sequence Grouping Pattern</H4>
<P>
We're already familiar with this: It's where a regular expression
matches a string exactly, like
<BLOCKQUOTE>
<PRE>
/crypt/
</PRE>
</BLOCKQUOTE>