Perl tutorial by the US publisher Macmillan: Perl CGI Web Pages for WINNT
<P>

where the regular expression looks for the same sequence of characters:
c, r, y, p, t.

<P>

<B>The Multipliers Grouping Pattern&nbsp;&nbsp;</B>We have already
met one of these: the asterisk, which matches &quot;zero or more&quot;
of the previous character. The &quot;+&quot; symbol matches one or
more of the previous character, and the question mark, &quot;?&quot;,
matches zero or one of it. All of these multipliers are greedy: they
take as many characters as they can while still letting the rest of
the pattern match.
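<P>
A minimal Perl 5 sketch of the three multipliers side by side. The sample string &quot;abbbc&quot; is made up for illustration, and the parentheses are used here only to capture the matched text so it can be printed.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

# Hypothetical sample text: an "a" followed by a run of three b's.
my $text = "abbbc";

# The parentheses only capture the matched text so it can be printed;
# remembering is described under the parentheses grouping pattern.
my ($star) = $text =~ /(ab*)/;   # * : zero or more b's, greedy
my ($plus) = $text =~ /(ab+)/;   # + : one or more b's, greedy
my ($opt)  = $text =~ /(ab?)/;   # ? : zero or one b

# "ac" has no b at all: * accepts that, + does not.
my $star_ac = ("ac" =~ /ab*c/) ? 1 : 0;
my $plus_ac = ("ac" =~ /ab+c/) ? 1 : 0;

print "$star $plus $opt $star_ac $plus_ac\n";   # abbb abbb ab 1 0
```
</PRE>
</BLOCKQUOTE>
<P>
Both * and + grab all three b's because they are greedy; ? stops after one b because that is its maximum.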

<P>

To specify how many repetitions a pattern should match, use the
general multiplier, whose format is

<BLOCKQUOTE>

<PRE>

/a{2,4}/

</PRE>

</BLOCKQUOTE>

<P>

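where a is the character (more generally, the item) being repeated,
and 2 and 4 are the minimum and maximum number of repetitions. The
pattern matches a run of two, three, or four a's, such as &quot;aa,&quot;
&quot;aaa,&quot; and &quot;aaaa,&quot; but not &quot;a.&quot; Note
that a string such as &quot;aaaaa&quot; still contains a match, because
the pattern only needs to find four a's somewhere inside it; to reject
it, the pattern must be anchored to the whole string, as in /^a{2,4}$/
(anchors are covered later in this chapter).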
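<P>
A minimal sketch, assuming Perl 5, that compares the unanchored pattern with an anchored one on a few made-up strings. The ^ and $ anchors used here are the ones described under the anchoring patterns.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

# Compare the unanchored /a{2,4}/ with /^a{2,4}$/, where ^ and $
# make the run of a's fill the whole string.
my (%inside, %whole);
for my $s ("a", "aa", "aaa", "aaaa", "aaaaa") {
    $inside{$s} = ($s =~ /a{2,4}/)   ? 1 : 0;
    $whole{$s}  = ($s =~ /^a{2,4}$/) ? 1 : 0;
    print "$s\tunanchored=$inside{$s}\tanchored=$whole{$s}\n";
}
# "aaaaa" is matched by /a{2,4}/ (it contains "aaaa") but not by /^a{2,4}$/.
```
</PRE>
</BLOCKQUOTE>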

<P>

When the second number of the general multiplier is omitted, as with

<BLOCKQUOTE>

<PRE>

/a{3,}/

</PRE>

</BLOCKQUOTE>

<P>

it tells the match to look for three or more of the letter
a. If the comma is also absent, as with

<BLOCKQUOTE>

<PRE>

/a{3}/

</PRE>

</BLOCKQUOTE>

<P>

it tells the match to find exactly three a's. To look for three
or fewer a's, use zero as the lower bound, like this:

<BLOCKQUOTE>

<PRE>

/a{0,3}/

</PRE>

</BLOCKQUOTE>
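<P>
The three forms above can be checked with a short Perl 5 sketch. The test strings are hypothetical, and each pattern is anchored with ^ and $ so that it has to account for the whole string; the last line shows why a lower bound of zero needs care when the pattern is not anchored.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

# Each pattern is anchored with ^ and $ so it must account for the whole string.
my (%at_least_3, %exactly_3, %at_most_3);
for my $s ("", "a", "aaa", "aaaaa") {
    $at_least_3{$s} = ($s =~ /^a{3,}$/)  ? 1 : 0;
    $exactly_3{$s}  = ($s =~ /^a{3}$/)   ? 1 : 0;
    $at_most_3{$s}  = ($s =~ /^a{0,3}$/) ? 1 : 0;
    print "'$s'\t{3,}=$at_least_3{$s}\t{3}=$exactly_3{$s}\t{0,3}=$at_most_3{$s}\n";
}

# Without anchors, a lower bound of zero is always satisfied,
# so /a{0,3}/ finds a (possibly empty) match in any string at all.
my $any = ("xyz" =~ /a{0,3}/) ? 1 : 0;
print "xyz matches /a{0,3}/: $any\n";
```
</PRE>
</BLOCKQUOTE>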

<P>

A multiplier can also follow the dot, which lets you require a fixed
gap between two characters. Try

<BLOCKQUOTE>

<PRE>

/a.{3}x/

</PRE>

</BLOCKQUOTE>

<P>

which matches the letter a, followed by exactly three non-newline
characters, followed by the letter x.
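<P>
A small Perl 5 sketch of /a.{3}x/ against a few made-up strings. It shows that the gap must be exactly three characters, that the dot refuses a newline, and that the match may sit anywhere inside a longer string.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

# /a.{3}x/ : an "a", exactly three non-newline characters, then an "x".
my @samples = (
    "abcdx",     # three characters between a and x
    "a12x",      # only two characters between them
    "a   x",     # three spaces count as three characters
    "abc\nx",    # the dot does not match the newline
    "xabcdxy",   # a match in the middle of a longer string
);
my @results;
for my $s (@samples) {
    push @results, ($s =~ /a.{3}x/) ? 1 : 0;
}
print "@results\n";   # 1 0 1 0 1
```
</PRE>
</BLOCKQUOTE>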

<P>

<B>The Parentheses Grouping Pattern&nbsp;&nbsp;</B>You can enclose
any part of a pattern in a pair of parentheses to have Perl remember
the text that part matched. Only the part of the expression inside
the parentheses is kept in memory.
<P>
To refer back to that remembered text later in the same pattern,
use a backslash followed by an integer, like this:

<BLOCKQUOTE>

<PRE>

/moose(.)kiss\1/;

</PRE>

</BLOCKQUOTE>

<P>

This regular expression matches the string &quot;moose,&quot;
followed by any one non-newline character, followed by the string
&quot;kiss,&quot; followed by that same character again. The
parentheses remember which character came after &quot;moose,&quot;
and \1 requires the same character after &quot;kiss.&quot; For example,

<BLOCKQUOTE>

<PRE>

mooseqkissq

</PRE>

</BLOCKQUOTE>

<P>

is a match, but

<BLOCKQUOTE>

<PRE>

mooseqkissw

</PRE>

</BLOCKQUOTE>

<P>

is not. This differs from the regular expression

<BLOCKQUOTE>

<PRE>

/moose.kiss./;

</PRE>

</BLOCKQUOTE>

<P>

which matches any character after &quot;moose&quot; and any character
after &quot;kiss,&quot; whether they are the same or not. The
&quot;1&quot; in \1 refers to the first pair of parentheses. If a
pattern contains more than one pair, the number selects which
remembered part you want, counting the pairs from left to right by
their opening parentheses. An example might look like this:

<BLOCKQUOTE>

<PRE>

/a(.)p(.)e\1s/;

</PRE>

</BLOCKQUOTE>

<P>

The first character is &quot;a,&quot; followed by non-newline character
#1, followed by &quot;p,&quot; followed by non-newline character #2,
followed by &quot;e,&quot; followed by whatever non-newline character
#1 was, followed by &quot;s.&quot; This will match

<BLOCKQUOTE>

<PRE>

aqpdeqs

</PRE>

</BLOCKQUOTE>

<P>

where characters #1 and #2 do not have to match each other; only
\1 has to repeat character #1. To remember more than a single
character, put a multiplier such as the asterisk inside the
parentheses, as in

<BLOCKQUOTE>

<PRE>

/a(.*)p\1e/;

</PRE>

</BLOCKQUOTE>

<P>

This expression would match &quot;a,&quot; followed by any number

of non-newline characters, followed by &quot;p,&quot; followed

by that same series of non-newline characters and then &quot;e.&quot;

A match might be

<BLOCKQUOTE>

<PRE>

aplanetpplanete

</PRE>

</BLOCKQUOTE>

<P>

but not

<BLOCKQUOTE>

<PRE>

aqqpqqqe

</PRE>

</BLOCKQUOTE>
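<P>
The backreference examples above can be run as one Perl 5 sketch. Each pair holds a pattern (compiled with qr//) and a test string taken from the text; the script prints 1 for a match and 0 otherwise, then shows that $1 holds what the first parentheses remembered.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

# Each pair is a pattern and a string to test it against.
my @checks = (
    [ qr/moose(.)kiss\1/, "mooseqkissq"     ],  # same character both times
    [ qr/moose(.)kiss\1/, "mooseqkissw"     ],  # q, then w: \1 fails
    [ qr/moose.kiss./,    "mooseqkissw"     ],  # no memory, any two characters
    [ qr/a(.)p(.)e\1s/,   "aqpdeqs"         ],  # \1 repeats the q, not the d
    [ qr/a(.*)p\1e/,      "aplanetpplanete" ],  # \1 repeats "planet"
    [ qr/a(.*)p\1e/,      "aqqpqqqe"        ],  # "qq" then "qqq": no match
);

my @results;
for my $check (@checks) {
    my ($re, $s) = @$check;
    push @results, ($s =~ $re) ? 1 : 0;
}
print "@results\n";   # 1 0 1 1 1 0

# After a successful match, $1 holds what the first parentheses remembered.
my ($remembered) = "aplanetpplanete" =~ /a(.*)p\1e/;
print "$remembered\n";   # planet
```
</PRE>
</BLOCKQUOTE>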

<P>

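You can also use the memory grouping pattern to replace portions
of a string. The code
<BLOCKQUOTE>
<PRE>
$_ = &quot;a peas p corn e squash&quot;;
s/p (.*) e/b $1 c/;
</PRE>
</BLOCKQUOTE>
<P>
creates the new string value of
<BLOCKQUOTE>
<PRE>
a peas b corn c squash 
</PRE>
</BLOCKQUOTE>
<P>
where the standalone &quot;p&quot; and &quot;e&quot; were replaced with
&quot;b&quot; and &quot;c,&quot; while the remembered text between them
(&quot;corn&quot;) carried over unchanged. The spaces in the pattern
keep it from starting at the &quot;p&quot; in &quot;peas.&quot; In the
replacement part, the first remembered text is written $1; \1 also
works there, but with warnings turned on Perl suggests writing it as $1.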
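<P>
Because (.*) is greedy, the exact shape of the pattern matters in a substitution. This Perl 5 sketch applies a pattern without spaces around the p and e, and one with them, to the same sample string and prints both results.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my $original = "a peas p corn e squash";

# Without spaces, matching starts at the "p" of "peas", and the greedy (.*)
# runs on to the last "e" in the string.
my $loose = $original;
$loose =~ s/p(.*)e/b${1}c/;
print "$loose\n";   # a beas p corn c squash

# With spaces around them, only the standalone "p" and "e" qualify.
my $tight = $original;
$tight =~ s/p (.*) e/b $1 c/;
print "$tight\n";   # a peas b corn c squash
```
</PRE>
</BLOCKQUOTE>
<P>
The braces in ${1} simply mark where the variable name ends; $1 followed by a space needs no braces.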

<P>

<B>The Alternation Grouping Pattern&nbsp;&nbsp;</B>The general

format for alternation is

<BLOCKQUOTE>

<PRE>

a|p|e

</PRE>

</BLOCKQUOTE>

<P>

where the regular expression matches if any one of the alternatives,
&quot;a,&quot; &quot;p,&quot; or &quot;e,&quot; matches.
Alternatives can be longer than a single character, so

<BLOCKQUOTE>

<PRE>

ape|gorilla|monkey

</PRE>

</BLOCKQUOTE>

<P>

would be equally valid.
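<P>
A short Perl 5 sketch of alternation with made-up words. Note that the pattern is not anchored, so &quot;grape&quot; matches because it contains &quot;ape,&quot; and that when several alternatives could match, Perl reports the one found leftmost in the string.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my @words = ("grape", "ape", "gorillas", "monk", "marmoset");
my @hits = grep { /ape|gorilla|monkey/ } @words;
print "@hits\n";   # grape ape gorillas

# Perl takes the leftmost place in the string where any alternative matches.
my ($first) = "a monkey and an ape" =~ /(ape|gorilla|monkey)/;
print "$first\n";   # monkey
```
</PRE>
</BLOCKQUOTE>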

<H4>The Anchoring Pattern</H4>

<P>

Perl provides four special notations for anchoring a pattern.
You anchor a regular expression when you don't want it to turn
up every instance of a string. For example, when searching for
the word &quot;the,&quot; you don't also want to get &quot;then,&quot;
&quot;there,&quot; &quot;their,&quot; or &quot;them.&quot; To avoid
this you might use the word boundary anchor \b:

<BLOCKQUOTE>

<PRE>

/the\b/;

</PRE>

</BLOCKQUOTE>

<P>

so that &quot;the&quot; matches only where it ends a word. This still
lets a word like &quot;absinthe&quot; match, so you can add a word
boundary anchor to the front of the regular expression as well:

<BLOCKQUOTE>

<PRE>

/\bthe\b/;

</PRE>

</BLOCKQUOTE>

<P>

so that &quot;the&quot; matches only as a whole word.

<P>

If, on the other hand, you want &quot;the&quot; only where it is
immediately followed by another letter (more precisely, another
word character), use the \B anchor, which matches wherever \b does not:

<BLOCKQUOTE>

<PRE>

/the\B/;

</PRE>

</BLOCKQUOTE>

<P>

which matches &quot;thee,&quot; &quot;these,&quot; &quot;there,&quot;
and &quot;then,&quot; but not &quot;the&quot; or &quot;absinthe,&quot;
where nothing follows the &quot;the.&quot;
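<P>
A minimal Perl 5 sketch comparing the three word-boundary patterns on a hypothetical word list, so you can see which words each one accepts.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my @words = ("the", "then", "absinthe", "these", "thee", "breathe");
my (@ends_word, @whole_word, @inside_word);
for my $w (@words) {
    push @ends_word,   $w if $w =~ /the\b/;     # "the" ends a word
    push @whole_word,  $w if $w =~ /\bthe\b/;   # "the" is the whole word
    push @inside_word, $w if $w =~ /the\B/;     # another letter follows "the"
}
print "@ends_word\n";     # the absinthe breathe
print "@whole_word\n";    # the
print "@inside_word\n";   # then these thee
```
</PRE>
</BLOCKQUOTE>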

<P>

The next anchor, ^ (the caret), matches the start of the string,
as with
<BLOCKQUOTE>
<PRE>
/^the/;
</PRE>
</BLOCKQUOTE>
<P>
which matches only those strings that start with &quot;the.&quot;
(Written with a backslash, \^ matches a literal caret character instead.)

<P>

The final anchor, $, works the same way at the end of the
string, so
<BLOCKQUOTE>
<PRE>
/the$/;
</PRE>
</BLOCKQUOTE>
<P>
will match &quot;the&quot; only when it appears at the end of the
string, or just before a newline that ends the string, which is
handy for lines read from &lt;STDIN&gt;.
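<P>
A small Perl 5 sketch of the start and end anchors, using made-up lines. It also shows that $ accepts a trailing newline and that a backslashed caret is just an ordinary character.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my @lines = ("the cat", "breathe", "soothe the", "bathe the\n", '^the');
my (@starts, @ends, @caret);
for my $line (@lines) {
    push @starts, $line if $line =~ /^the/;    # "the" at the start
    push @ends,   $line if $line =~ /the$/;    # "the" at the end (or before a final newline)
    push @caret,  $line if $line =~ /\^the/;   # a literal caret, then "the"
}
print scalar(@starts), " ", scalar(@ends), " ", scalar(@caret), "\n";   # 1 4 1
```
</PRE>
</BLOCKQUOTE>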

<H4>Pattern Precedence</H4>

<P>

As with operators, both grouping and anchoring patterns have an

order of precedence to follow. Table 3.4 gives you a quick rundown.

<BR>

<P>

<CENTER><B>Table 3.4 Pattern Precedence from Highest to Lowest</B></CENTER>

<P>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=40% CELLPADDING=3>

<TR VALIGN=TOP><TD><B>Name</B></TD><TD><B>Representation</B>

</TD></TR>

<TR VALIGN=TOP><TD>Parentheses</TD><TD>()</TD></TR>

<TR VALIGN=TOP><TD>Multipliers</TD><TD>+ * ? {m,n}</TD></TR>
<TR VALIGN=TOP><TD>Sequence and Anchoring</TD><TD>ape \b \B ^ $</TD></TR>

<TR VALIGN=TOP><TD>Alternation</TD><TD>|</TD></TR>

</TABLE></CENTER>

<P>

<P>

Because parentheses have the highest precedence, you can use them
to change how a pattern is grouped; remember, though, that they also
make Perl remember the text they enclose. These examples show how
parentheses change what a pattern matches.

<BLOCKQUOTE>

<PRE>

ape*

</PRE>

</BLOCKQUOTE>

<P>

will match ap, ape, apee, apeee, and so on, because the asterisk applies only to the e.

<P>

whereas

<BLOCKQUOTE>

<PRE>

(ape)*
</PRE>
</BLOCKQUOTE>
<P>
will match the empty string, ape, apeape, apeapeape, and so on.

<P>

and

<BLOCKQUOTE>

<PRE>

^a|b

</PRE>

</BLOCKQUOTE>

<P>

will match &quot;a&quot; at the start of the line, or &quot;b&quot;

anywhere in the line. Yet

<BLOCKQUOTE>

<PRE>

^(a|b)

</PRE>

</BLOCKQUOTE>

<P>

will match either &quot;a&quot; or &quot;b&quot; at the start

of the line, and

<BLOCKQUOTE>

<PRE>

a|pe|s

</PRE>

</BLOCKQUOTE>

<P>

matches &quot;a&quot; or &quot;pe&quot; or &quot;s.&quot; If you

apply parentheses

<BLOCKQUOTE>

<PRE>

(a|pe)(pe|s)

</PRE>

</BLOCKQUOTE>

<P>

you'll match ape, as, pepe, and pes. These parentheses can be

used to find related words like

<BLOCKQUOTE>

<PRE>

(soft|hard)wood

</PRE>

</BLOCKQUOTE>

<P>

which matches either &quot;softwood&quot; or &quot;hardwood.&quot;
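<P>
The precedence examples above can be checked with a Perl 5 sketch. The test strings are hypothetical, and ^ and $ are added where needed so that each result reflects the grouping rather than a partial match somewhere inside the string.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my @checks = (
    [ qr/^ape*$/,         "apeee"  ],  # * applies to the e only: match
    [ qr/^ape*$/,         "apeape" ],  # no match
    [ qr/^(ape)*$/,       "apeape" ],  # * applies to the whole group: match
    [ qr/^(ape)*$/,       "apee"   ],  # no match
    [ qr/^a|b/,           "cab"    ],  # "b" anywhere: match
    [ qr/^(a|b)/,         "cab"    ],  # must start with a or b: no match
    [ qr/^(a|pe)(pe|s)$/, "pepe"   ],  # match
    [ qr/^(a|pe)(pe|s)$/, "apes"   ],  # no match
);

my @results;
for my $check (@checks) {
    my ($re, $s) = @$check;
    push @results, ($s =~ $re) ? 1 : 0;
}
print "@results\n";   # 1 0 1 0 1 0 1 0

# The parentheses in (soft|hard)wood also remember which word matched.
my ($kind) = "a hardwood floor" =~ /(soft|hard)wood/;
print "$kind\n";   # hard
```
</PRE>
</BLOCKQUOTE>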

<P>

One use for the matching operators is a script that checks a
user's reply and responds accordingly. You can use the &quot;=~&quot;
operator to do this. If you remember, this operator tells Perl to
apply the match to the expression on its left instead of to $_.
Say you have already filled $_ with a value you need later in
the script. With =~ you can test some other value, such as a line
just read from STDIN, without disturbing $_. The =~ operator works
like this:

<BLOCKQUOTE>

<PRE>

print &quot;Will you be needing anything else? &quot;;
if (&lt;STDIN&gt; =~ /^[Yy]/) {   # true if the reply begins with 'Y' or 'y'
    print &quot;And what would that be? &quot;;
    &lt;STDIN&gt;;                  # read the request, then ignore it
    print &quot;I'm sorry, that's just not possible.\n&quot;;
}

</PRE>

</BLOCKQUOTE>

<P>

where no matter what the user inputs, the response will be the

same.
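<P>
A minimal Perl 5 sketch of the point about $_. To keep it runnable without typing input, the hypothetical variable $reply stands in for a line read from &lt;STDIN&gt;.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

$_ = "value saved for later";      # $_ already holds something we need
my $reply = "yes, a sandwich\n";   # hypothetical stand-in for a line from STDIN

my $on_reply   = ($reply =~ /^[Yy]/) ? 1 : 0;   # =~ applies the match to $reply
my $on_default = /^[Yy]/ ? 1 : 0;               # no =~, so the match tests $_

print "reply: $on_reply, \$_: $on_default\n";   # reply: 1, $_: 0
print "$_\n";                                   # value saved for later
```
</PRE>
</BLOCKQUOTE>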

<H3><A NAME="OtherMatchingOperatorTidbits">

Other Matching Operator Tidbits</A></H3>

<P>

There are some other ways to modify your regular expressions.
Perl uses the &quot;i&quot; modifier, written after the closing
slash, to tell a regular expression to ignore case when matching.
Using the format

<BLOCKQUOTE>

<PRE>

/string_characters/i

</PRE>

</BLOCKQUOTE>

<P>

you could amend a line from our last example script from this:

<BLOCKQUOTE>

<PRE>

if (&lt;STDIN&gt; =~ /^[Yy]/)

</PRE>

</BLOCKQUOTE>

<P>

to this:

<BLOCKQUOTE>

<PRE>

if (&lt;STDIN&gt; =~ /^y/i)

</PRE>

</BLOCKQUOTE>

<P>

so that the reply matches whether it begins with an uppercase or a lowercase y.
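<P>
A character class such as [Yy] only relaxes the case of one letter, while /i relaxes it for the whole pattern. This Perl 5 sketch, with made-up replies, shows the difference on a longer word.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my @replies = ("yes", "Yes", "YES", "yEs", "no");
my @with_class = grep { /^[Yy]es/ } @replies;   # only the first letter may vary
my @with_i     = grep { /^yes/i } @replies;     # /i: every letter may vary
print "@with_class\n";   # yes Yes
print "@with_i\n";       # yes Yes YES yEs
```
</PRE>
</BLOCKQUOTE>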

<P>

If you use a regular expression to search through file paths, you
need to include slashes in the expression. Because a slash normally
ends the pattern, each slash must be preceded by a backslash so that
Perl treats it as an ordinary character:

<BLOCKQUOTE>

<PRE>

/^\/usr\/bin\/perl/

</PRE>

</BLOCKQUOTE>

<P>

(and the regular expression starts to look like a divoted golf

course!)
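<P>
Perl 5 also lets the match operator use a different delimiter when it is written with a leading m, for example m{...}, so slashes inside the pattern need no backslashes. This sketch compares that with the escaped form; the sample paths are hypothetical, and the Windows NT path shows that a literal backslash is written \\ in a pattern.
<BLOCKQUOTE>
<PRE>
```perl
use strict;
use warnings;

my $unix_path = "/usr/bin/perl";            # hypothetical sample paths
my $nt_path   = "C:\\Perl\\bin\\perl.exe";  # holds C:\Perl\bin\perl.exe

my $escaped = ($unix_path =~ /^\/usr\/bin\/perl/) ? 1 : 0;  # each / escaped
my $braces  = ($unix_path =~ m{^/usr/bin/perl})   ? 1 : 0;  # m{}: / is ordinary
my $nt      = ($nt_path   =~ m{^C:\\Perl\\bin})   ? 1 : 0;  # \\ is a literal \

print "$escaped $braces $nt\n";   # 1 1 1
```
</PRE>
</BLOCKQUOTE>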

<H2><A NAME="ChapterinReview"><FONT SIZE=5 COLOR=#FF0000>

Chapter in Review</FONT></A></H2>

<P>

In this chapter we started out discussing Perl control structures:
the statement block, which groups the statements for a specific
script action; conditional statements such as if/unless; and loops
such as for/foreach, which let Perl repeat an action as many times
as the script's operation requires.

<P>

We also covered associative arrays, showing how they differ from
arrays by storing a key/value pair in each element instead of a
single value. Associative arrays are handled with their own operators,
such as keys, values, each, and delete.

<P>

The chapter finished by introducing regular expressions, Perl's
pattern-matching tool. Now that you have a general understanding
of what regular expressions are, how to write them between two
slashes, and how Perl matches them against data in your script,
you can start solving some more interesting tasks with Perl.
In the next chapter, we'll connect this guestbook script to CGI
output for the user and look at how Perl interacts with HTML.

<HR>



<CENTER><P><A HREF="ch2.htm"><IMG SRC="PC.GIF" BORDER=0 HEIGHT=88 WIDTH=140></A>
<A HREF="#CONTENTS"><IMG SRC="CC.GIF" BORDER=0 HEIGHT=88 WIDTH=140></A>
<A HREF="contents.htm"><IMG SRC="HB.GIF" BORDER=0 HEIGHT=88 WIDTH=140></A>
<A HREF="ch4.htm"><IMG SRC="NC.GIF" BORDER=0 HEIGHT=88 WIDTH=140></A>

<HR WIDTH="100%"></P></CENTER>

</BODY>

</HTML>