
Perl tutorial published by Macmillan (USA): Perl CGI Web Pages for WINNT
For a list of special characters in regular expressions, see Appendix

A.

<P>

You are also not limited to using the / as a delimiter for a regular
expression. If you precede the regular expression with m, the next
character becomes the delimiter, so this

<BLOCKQUOTE>

<PRE>

if ($line =~/john/) {

</PRE>

</BLOCKQUOTE>

<P>

is equivalent to

<BLOCKQUOTE>

<PRE>

if ($line =~m#john#) {

</PRE>

</BLOCKQUOTE>

<P>

and

<BLOCKQUOTE>

<PRE>

if ($line =~m[john]) {

</PRE>

</BLOCKQUOTE>

<P>

The delimiter can be just about any character, but notice that

in the case of character pairs, like the square brackets ([]),

the end delimiter is the opposite mate to the beginning delimiter.
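<P>
A different delimiter is most useful when the pattern itself contains
slashes. The following is a minimal sketch of that idea; the sample path
in $path and the variable names are made up for illustration and are not
part of the guestbook program.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical sample data: a path that contains slashes.
my $path = "/usr/local/bin/perl";

# With / as the delimiter, every slash in the pattern needs a backslash.
my $with_slashes = ($path =~ /\/usr\/local\//) ? 1 : 0;

# With m and another delimiter, the slashes can stay as they are.
my $with_hash   = ($path =~ m#/usr/local/#) ? 1 : 0;
my $with_braces = ($path =~ m{/usr/local/}) ? 1 : 0;

# Prints: slashes: 1, hash: 1, braces: 1
print "slashes: $with_slashes, hash: $with_hash, braces: $with_braces\n";
</PRE>
</BLOCKQUOTE>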

<P>

Okay, now we have a database full of names, and we can check them
against the data the user enters while ignoring case. What next? The next
logical step seems to be to glean some more useful information
from this program. Let's ask the user for their last name and
favorite color as well, as shown in Listing 3.2.

<HR>

<BLOCKQUOTE>

<B>Listing 3.2 Asking for Additional User&nbsp;Information<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<PRE>

# Asking for details

print &quot;What is your first name? &quot;;

$name=&lt;STDIN&gt;; # get the response 

# from the user

chop($name); # remove the newline

print &quot;What is your last name? &quot;;

$lastname=&lt;STDIN&gt;;

chop($lastname);

print &quot;What is your favorite color? &quot;;

$color=&lt;STDIN&gt;;

chop($color);

$newline=$name.':'.$lastname.':'.$color.&quot;\n&quot;; # make line

# delimited with colons

open (GUESTBOOK, &quot;&gt;&gt;guest.pl&quot;); # Open file for append

print GUESTBOOK &quot;$newline&quot;; 

# Append the field line to

# the guestbook file

print &quot;Thank you, $name!  Your name has been added to the Guestbook.\n&quot;;

close(GUESTBOOK);

</PRE>

</BLOCKQUOTE>

<HR>

<P>

There are a few things different now. We are asking for three
separate pieces of data, and assigning each to a variable. Notice
that we are removing the newline character as soon as we get the
data. The next thing we do is format a string of data with the
three fields separated by colons, and with a newline character
tacked on the end. We'll end up with something like this:

<BLOCKQUOTE>

<PRE>

John:Smith:magenta

</PRE>

</BLOCKQUOTE>

<P>

This will make it easy to add or retrieve the data we want at

a later time. The &quot;.&quot; operator in Perl is the concatenation
operator: it appends one string to another. To form the string we want,
we are appending a colon to the end of $name, adding $lastname to the end
of the resulting string, appending another colon, adding $color, then
finally appending a newline. Confused yet? Don't worry: as with all things
in Perl, there is an easier way. That line is equivalent to

<BLOCKQUOTE>

<PRE>

$newline=join(':',$name, $lastname, $color);

$newline.=&quot;\n&quot;;

</PRE>

</BLOCKQUOTE>

<P>

The join() function joins the variables or strings listed into

one string, separating the fields with the specified delimiter

(a : in this case). The .= operator on the following line appends

the newline character to the end of the $newline variable. This

is equivalent to: $newline=$newline.&quot;\n&quot;;
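<P>
To convince yourself that the two approaches really build the same
string, you can compare them directly. This is only a sketch; the sample
values and the names $by_dot and $by_join are assumptions made for the
example.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Sample values standing in for the user's answers.
my ($name, $lastname, $color) = ("John", "Smith", "magenta");

# Build the record with the . operator...
my $by_dot = $name . ':' . $lastname . ':' . $color . "\n";

# ...and again with join(), adding the newline with .=
my $by_join = join(':', $name, $lastname, $color);
$by_join .= "\n";

# Prints: Both give: John:Smith:magenta
print "Both give: $by_join" if $by_dot eq $by_join;
</PRE>
</BLOCKQUOTE>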

<P>

Now that we have this information, we'll want to check it. We
do this using the split() function (Listing 3.3), the opposite
of the join() function. Surprise, surprise.

<HR>

<BLOCKQUOTE>

<B>Listing 3.3 The Split Command<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<PRE>

print &quot;What is your first name? &quot;;

$name=&lt;STDIN&gt;; # get the response

# from the user

chop($name); # remove the newline

print &quot;What is your last name? &quot;;

$lastname=&lt;STDIN&gt;;

chop($lastname);

print &quot;What is your favorite color? &quot;;

$color=&lt;STDIN&gt;;

chop($color);

$newline=$name.':'.$lastname.':'.$color.&quot;\n&quot;; # make

# line delimited with colons

open (GUESTBOOK, &quot;guest.pl&quot;);

while ($line=&lt;GUESTBOOK&gt;) {

($gbname, $gblastname, $gbcolor)=split(':', $line);

if ($gbname=~/^$name/i) {

print &quot;You are already in the guestbook, $name!\n&quot;;

close (GUESTBOOK);

exit;

}

}

close (GUESTBOOK);

open (GUESTBOOK, &quot;&gt;&gt;guest.pl&quot;);

# open file for appending

print GUESTBOOK &quot;$newline&quot;;

# append the field line

# to the guestbook file

print &quot;Thank you, $name! Your name has been added to the Guestbook.\n&quot;;

close(GUESTBOOK);

</PRE>

</BLOCKQUOTE>

<HR>

<P>

Here we assign the first three items returned by split to $gbname,
$gblastname, and $gbcolor. We do this by putting parentheses around
the variable names, which turns them into a list that is filled in
item by item. We could just as easily have assigned all the items
to an array like this:

<BLOCKQUOTE>

<PRE>

@data=split(':', $line);

</PRE>

</BLOCKQUOTE>

<P>

and referenced the first three elements in the array as

<BLOCKQUOTE>

<PRE>

$data[0], $data[1], and $data[2]

</PRE>

</BLOCKQUOTE>
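<P>
The two forms of assignment can be tried side by side. In this sketch
the guestbook line is a made-up sample, and chomp (which removes only a
trailing newline) is used in place of chop; that substitution is our
assumption, made so the newline does not end up in the last field.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# A sample guestbook line, including the newline read from the file.
my $line = "John:Smith:magenta\n";
chomp($line);    # drop the newline so it does not end up in the last field

# List assignment: one variable per field.
my ($gbname, $gblastname, $gbcolor) = split(':', $line);

# Array assignment: all fields, however many there are.
my @data = split(':', $line);

print "$gbname $gblastname likes $gbcolor\n";    # John Smith likes magenta
print "Field count: ", scalar(@data), "\n";      # Field count: 3
print "Second field: $data[1]\n";                # Second field: Smith
</PRE>
</BLOCKQUOTE>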

<P>

So now that we have some data to play with, let's do some more

tests just for practice. Our program is getting a little long,

so in Listing 3.4 we'll only deal with the part that has changed.

<HR>

<BLOCKQUOTE>

<B>Listing 3.4 Testing the Data<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<PRE>

while ($line=&lt;GUESTBOOK&gt;) {
    ($gbname, $gblastname, $gbcolor)=split(':', $line);
    if (($gbname=~/^$name/i) &amp;&amp; ($gblastname=~/^$lastname/i)) {
        print &quot;You are already in the guestbook, $name!\n&quot;;
        close (GUESTBOOK);
        if ($gbcolor!~/$color/i) {
            print &quot;You have a different favorite color!\n&quot;;
            print &quot;Your old favorite color is: $gbcolor\n&quot;;
            print &quot;Your new favorite color is: $color\n&quot;;
            print &quot;Would you like to change it? &quot;;
            $input=&lt;STDIN&gt;;
            if ($input=~/^y/i) {
                open(GUESTBOOK, &quot;guest.pl&quot;);
                undef $/;
                $body=&lt;GUESTBOOK&gt;;
                $/=&quot;\n&quot;;
                close(GUESTBOOK);
                $body=~s/$line/$newline/;
                open(GUESTBOOK, &quot;&gt;guest.pl&quot;);
                print GUESTBOOK $body;
                close(GUESTBOOK);
                exit;
            }
            else {
                exit;
            }
        }
        exit;
    }
}

</PRE>

</BLOCKQUOTE>

<HR>

<P>

What's happening here? The first thing you may notice is that

we are doing an extra test at the first if statement. Since people

may tend to have similar first names, we are now testing that

the first <I>and</I> last name match.

<P>

The &amp;&amp; means that the first <I>and</I> second expressions

must be true in order for the if statement to be true. Alternatively,

|| means the first <I>or</I> the second expression must be true.

We next check to see if the color is the same. If it is, we just

exit. If it isn't, we alert the user, and ask them if they want

to change their color choice. We get a line from STDIN as usual,

and check to see if it starts with y or Y. If it doesn't, we exit.

If it does, that's when the fun starts.
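<P>
One thing to keep in mind about the name tests in Listing 3.4: the
caret only anchors the start of the string, so a stored name that merely
begins with the typed name also counts as a match. The sketch below
compares that test with one anchored at both ends using a dollar sign.
The sample names are made up, and whether you want an exact match is a
design decision we are assuming here, not something the listing requires.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical values: a stored name and two names typed by users.
my $gbname = "Johnny";

foreach my $name ("John", "JOHNNY") {
    my $prefix = ($gbname =~ /^$name/i)  ? "yes" : "no";   # start only
    my $exact  = ($gbname =~ /^$name$/i) ? "yes" : "no";   # start and end
    print $name, ": prefix match ", $prefix, ", exact match ", $exact, "\n";
}
# Prints:
# John: prefix match yes, exact match no
# JOHNNY: prefix match yes, exact match yes
</PRE>
</BLOCKQUOTE>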

<P>

We open guest.pl to read, as normal, but then we undef (undefine)
the special variable $/. This variable is the input record separator:
it determines where lines end when you read them in from a file. It is
normally set to &quot;\n&quot;, so each read returns one line of the file.
With $/ undefined, a single read returns the entire file (newlines and
all), which we store in the variable $body. Once we have the whole thing, we
can replace the line with the old color ($line) with the line
with the new color ($newline). This is done by using the =~ operator
again, but notice that there is an s in front of the first /.
This means we are doing a substitution. The text matched by the expression
between the first two /'s is replaced by the text between the second
and third /'s, if a match is found. As with all regular expressions,
you can use any delimiter you like, so

<BLOCKQUOTE>

<PRE>

$body =~ s#$line#$newline#;

</PRE>

</BLOCKQUOTE>

<P>

would have been equivalent. Also, the i directive to ignore case

that comes after the last slash can apply here as well. Once we

have replaced the line we want, we open the guestbook again for

writing, and just write the whole file out with a print, and exit.

Remember to redefine $/ before you do any more file operations

to make sure that you don't mess up your future operations. But

back to our regular expressions.
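<P>
A word of caution about substituting with a variable as the pattern:
any characters in $line that are special in regular expressions keep
their special meaning. The sketch below uses made-up file contents in
$body and a color of (none) to show the effect, and then uses the \Q and
\E escapes, which make the enclosed text match literally. Treat it as an
illustration under those assumptions rather than a change to Listing 3.4.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical file contents, as if read in one piece with $/ undefined.
my $body    = "Jane:Doe:blue\nJohn:Smith:(none)\n";
my $line    = "John:Smith:(none)\n";     # the record to replace
my $newline = "John:Smith:magenta\n";    # its replacement

# Unquoted, the parentheses in $line act as grouping, so nothing matches.
my $plain = $body;
my $plain_count = ($plain =~ s/$line/$newline/) ? 1 : 0;

# \Q...\E makes every character in $line match literally.
my $quoted = $body;
my $quoted_count = ($quoted =~ s/\Q$line\E/$newline/) ? 1 : 0;

# Prints: unquoted replaced: 0, quoted replaced: 1
print "unquoted replaced: $plain_count, quoted replaced: $quoted_count\n";
print $quoted;
</PRE>
</BLOCKQUOTE>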

<P>

It's probably a good idea to go over each of the separate elements
covered in this script so there is no doubt as to what the regular
expression operators are, and how they work.

<P>

Unlike the grep command, which looks at all the lines in the designated

file, this script only looks at one, the line which is in $_.

To include all the lines of a file we need to do this:

<BLOCKQUOTE>

<PRE>

while (&lt;&gt;)	{

if (/crypt/)	{

print &quot;$_&quot;;

}

}

</PRE>

</BLOCKQUOTE>

<P>

This loop continues until all lines are checked.
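<P>
The same use of $_ can be seen without reading a file at all. In this
sketch the lines come from a made-up list, @lines, so that the example
runs on its own; a foreach loop with no loop variable places each item
in $_, just as the while loop above does with each line it reads.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical lines standing in for the contents of a file.
my @lines = ("use crypt here\n", "nothing to see\n", "encrypted data\n");

my $found = 0;
foreach (@lines) {          # each element is placed in $_ in turn
    if (/crypt/) {          # with no =~, the match is tested against $_
        print $_;
        $found++;
    }
}
print "$found matching lines\n";    # 2 matching lines
</PRE>
</BLOCKQUOTE>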

<P>

Now say you are checking your own scripts for crypt, and you realize

that your typing was a little sloppy in places. Sometimes you

slipped and spelled crypt with two p's, as cryppt. You can amend

your searching script to

<BLOCKQUOTE>

<PRE>

while (&lt;&gt;)	{

if (/cryp*t/)	{

print &quot;$_&quot;;

}

}

</PRE>

</BLOCKQUOTE>

<P>

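The asterisk matches zero or more of the character before it, so this
finds crypt as well as any spelling of crypt with two or more p's. Note
that it also finds cryt, with no p at all; to require at least one p,
use a plus sign instead: /cryp+t/.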
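<P>
A quick way to see exactly what a pattern like this accepts is to run
it over a few test words. The word list below is made up for the
example, and the plus sign (one or more of the preceding character) is
shown alongside the asterisk for comparison.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical spellings to test.
my @words = ("cryt", "crypt", "cryppt");

foreach my $word (@words) {
    my $star = ($word =~ /cryp*t/) ? "yes" : "no";   # zero or more p's
    my $plus = ($word =~ /cryp+t/) ? "yes" : "no";   # one or more p's
    print $word, ": * ", $star, ", + ", $plus, "\n";
}
# Prints:
# cryt: * yes, + no
# crypt: * yes, + yes
# cryppt: * yes, + yes
</PRE>
</BLOCKQUOTE>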

<P>

Once you have matched what you are looking for you might want

to replace it with something. To do this, we can use the substitute

operator.

<P>

You might use this operator if you want to replace one string
with another string. The substitute operator has a short form,
s, which looks like this in a statement:

<BLOCKQUOTE>

<PRE>

s/crypt/tomb/;

</PRE>

</BLOCKQUOTE>

<P>

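The substitute operator replaces the first occurrence of crypt in $_
with the replacement string tomb. To replace every occurrence, add the
g directive after the last slash: s/crypt/tomb/g;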
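<P>
The sketch below shows the substitute operator at work on a made-up
sentence. It also shows the g directive, which replaces every occurrence
rather than just the first; the operator returns the number of
replacements it made, which we keep in $count. The variable names are
ours, chosen for the example.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# A hypothetical line of text, placed in $_ as a while loop would do.
$_ = "the crypt keeper locked the crypt\n";

# Without g, only the first crypt is replaced.
my $first_only = $_;
$first_only =~ s/crypt/tomb/;

# With g, every crypt is replaced; with no =~, s works on $_.
my $count = s/crypt/tomb/g;

print $first_only;              # the tomb keeper locked the crypt
print $_;                       # the tomb keeper locked the tomb
print "$count replacements\n";  # 2 replacements
</PRE>
</BLOCKQUOTE>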

<P>

Regular expressions, as you can now see, are patterns. These patterns

can be as big or as small as you need, each with its own peculiarities.

Let's look at some more.

<H3><A NAME="RegularExpressionsasPatterns">

Regular Expressions as Patterns</A></H3>

<P>

There are various patterns that regular expressions work with:

single-character, grouping, and anchoring. Each of these has its

own little characteristics that make it work.

<H4>The Single-Character Pattern</H4>

<P>

The most common pattern-matching character is a single character

used to match itself. This would be using a letter as a regular

expression to match itself; in other words, regular expression

&quot;a&quot; looking for character &quot;a&quot; in a string.

<P>

The second most common pattern-matching character is a period
or dot, &quot;.&quot;. This character will match any single character
with the exception of the newline character, \n.
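<P>
As a small sketch of the dot in action, the pattern /cr.pt/ below is
tried against four made-up strings. The dot must match exactly one
character, and that character cannot be a newline.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $count = 0;

# crypt and crapt match; crpt has nothing for the dot to match,
# and in the last string the dot would have to match a newline.
foreach my $text ("crypt", "crapt", "crpt", "cr\npt") {
    $count++ if $text =~ /cr.pt/;
}

print "$count of 4 strings matched\n";    # 2 of 4 strings matched
</PRE>
</BLOCKQUOTE>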

<P>

Moving into larger areas, a character class is formed when a set of
square brackets is used to enclose the characters in&nbsp;question:

<BLOCKQUOTE>

<PRE>

/[crypt]/

</PRE>

</BLOCKQUOTE>

<P>

When a character class is used, it matches a single character: the
match succeeds if any one of the characters listed between the brackets
appears anywhere in the string being tested. Character classes, like
regular expressions in general, are case-sensitive unless you add the
i directive, so /[crypt]/ matches a lowercase c but not an uppercase C.
<P>
On the other hand, only one of the listed characters has to be present
in the string for a match to occur; the string does not need to contain
all of them, and their order does not matter.
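<P>
The following sketch tries /[crypt]/ against a few made-up words. A
word matches as soon as it contains any one of the lowercase letters c,
r, y, p, or t.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

foreach my $text ("apple", "zoo", "CRYPT", "Cat") {
    my $result = ($text =~ /[crypt]/) ? "match" : "no match";
    print $text, ": ", $result, "\n";
}
# Prints:
# apple: match       (contains p)
# zoo: no match      (none of the listed letters)
# CRYPT: no match    (uppercase letters are different characters)
# Cat: match         (contains t)
</PRE>
</BLOCKQUOTE>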

<P>

You can designate a range with this operator by inserting a dash

between the values. For example,

<BLOCKQUOTE>

<PRE>

/[0-5]/

</PRE>

</BLOCKQUOTE>

<P>

is the same as

<BLOCKQUOTE>

<PRE>

/[012345]/

</PRE>

</BLOCKQUOTE>

<P>

which can be very powerful if you consider that

<BLOCKQUOTE>

<PRE>

/[a-zA-Z0-9]/

</PRE>

</BLOCKQUOTE>

<P>

can search for all letters of the alphabet (both upper- and lowercase) as
well as all digits. Not bad for 15 little keystrokes.
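<P>
One common use of such a class is checking that a string contains
nothing but letters and digits. The sketch below is one way to do it,
under a few assumptions: the sample strings are made up, and the pattern
adds a caret and a dollar sign to anchor both ends of the string, plus a
plus sign to allow one or more characters from the class.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

foreach my $text ("guest42", "guest 42", "John_Smith") {
    my $result = ($text =~ /^[a-zA-Z0-9]+$/)
        ? "only letters and digits"
        : "contains something else";
    print $text, ": ", $result, "\n";
}
# Prints:
# guest42: only letters and digits
# guest 42: contains something else
# John_Smith: contains something else
</PRE>
</BLOCKQUOTE>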

<P>

If you want to use the character class in the opposite way (for
example, to match any character that is not listed in the class),
place a caret (^) after the left bracket, like so:

<BLOCKQUOTE>

<PRE>

/[^0-5]/

</PRE>

</BLOCKQUOTE>

<P>

This expression matches every single character which is not in

the range from 0 to 5. There are some common character classes

in Perl which are listed in Table 3.3.<BR>

<P>

<CENTER><B>Table 3.3 Character Class Contractions</B></CENTER>

<P>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80% CELLPADDING=3>

<TR VALIGN=TOP><TD><B>Construct</B></TD><TD><B>Equivalent Class</B>

</TD><TD><B>Negated Construct</B></TD><TD><B>Equivalent Negated Class</B>

</TD></TR>

<TR VALIGN=TOP><TD>\d (digits)</TD><TD>[0-9]</TD><TD>\D (anything but digits)

</TD><TD>[^0-9]</TD></TR>

<TR VALIGN=TOP><TD>\w (words)</TD><TD>[a-zA-Z0-9_]</TD>
<TD>\W (anything but words)</TD><TD>[^a-zA-Z0-9_]

</TD></TR>

<TR VALIGN=TOP><TD>\s (space)</TD><TD>[ \r\t\n\f]</TD>

<TD>\S (anything but space)</TD><TD>[^ \r\t\n\f]

</TD></TR>

</TABLE></CENTER>
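<P>
These contractions can be dropped into any pattern. The sketch below
uses the negated forms with the substitute operator and the g directive
(replace every occurrence) to strip unwanted characters from a made-up
string; the variable names are ours. Note that \w also matches the
underscore.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Hypothetical sample text.
my $text = "Room 101, Floor 3";

my $digits_only = $text;
$digits_only =~ s/\D//g;      # delete everything that is not a digit

my $no_spaces = $text;
$no_spaces =~ s/\s//g;        # delete all whitespace

my $word_chars = $text;
$word_chars =~ s/\W/_/g;      # replace each non-word character with _

print "$digits_only\n";       # 1013
print "$no_spaces\n";         # Room101,Floor3
print "$word_chars\n";        # Room_101__Floor_3

print "underscore is a word character\n" if "_" =~ /\w/;
</PRE>
</BLOCKQUOTE>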

<P>

<P>

Before we get into any more of the guts of Perl, let's apply what

we've already exposed ourselves to. We should also start to make

note of the little differences between UNIX Perl and Perl for

Windows NT, or WinPerl.

<P>

One big difference is that while most Perl scripts you will find

contain the first line

<BLOCKQUOTE>

<PRE>

#!/usr/local/bin/perl

</PRE>

</BLOCKQUOTE>

<P>

or something similar, this is unnecessary with WinPerl. This line

in UNIX lets the operating system know where to find the Perl

interpreter. With Windows NT, you need to associate the .pl file

extension with perl.exe for your script to function. It is probably
a good idea to associate the .cgi extension as well, since most
of these files are also written in Perl.

<H4>The Grouping Pattern</H4>

<P>

There are several grouping patterns to understand: sequence, multipliers,

parentheses, and alternation. By using grouping patterns you can

give your script the ability to put conditions on your regular

expression matching. For example, look for six of this, or look

for two or more of these.

<H4>The Sequence Grouping Pattern</H4>

<P>

We're already familiar with this: It's where a regular expression

matches a string exactly, like

<BLOCKQUOTE>

<PRE>

/crypt/

</PRE>

</BLOCKQUOTE>