by zero or more white space characters. The <TT>{5}</TT>

quantifier is used to ensure that that combination of components

is present five times.

<P>

The <TT>*</TT> and <TT>+</TT>

quantifiers are greedy. They match as many characters as possible.

This may not always be the behavior that you need. You can create

non-greedy components by following the quantifier with a <TT>?</TT>.

<P>

Use the following file specification in order to look at the <TT>*</TT>

and <TT>+</TT> quantifiers more closely:

<BLOCKQUOTE>

<PRE>

$_ = '/user/Jackie/temp/names.dat';

</PRE>

</BLOCKQUOTE>

<P>

The regular expression <TT>.*</TT>

will match the entire file specification. This can be seen in

the following small program:

<BLOCKQUOTE>

<PRE>

$_ = '/user/Jackie/temp/names.dat';

m/.*/;

print $&amp;;

</PRE>

</BLOCKQUOTE>

<P>

This program displays

<BLOCKQUOTE>

<PRE>

/user/Jackie/temp/names.dat

</PRE>

</BLOCKQUOTE>

<P>

You can see that the <TT>*</TT> quantifier

is greedy. It matched the whole string. If you add a <TT>?</TT>

after the quantifier to make the <TT>.*</TT> component

non-greedy, what do you think the program would display?

<BLOCKQUOTE>

<PRE>

$_ = '/user/Jackie/temp/names.dat';

m/.*?/;

print $&amp;;

</PRE>

</BLOCKQUOTE>

<P>

This program displays nothing because the smallest number of characters

that <TT>*</TT> can match is zero.

If we change the <TT>*</TT> to a <TT>+</TT>,

so that the pattern is <TT>.+?</TT>, then the program will display

<BLOCKQUOTE>

<PRE>

/

</PRE>

</BLOCKQUOTE>
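<P>

The difference becomes more visible when something else has to match after the quantified component. The following is a minimal sketch, reusing the same file specification; the variable names and the <TT>m{...}</TT> delimiters (chosen so the slashes need no escaping) are choices made for this illustration.

```perl
use strict;
use warnings;

my $path = '/user/Jackie/temp/names.dat';

# Greedy: .* grabs as much as it can and gives back only enough
# for the final slash to match, so $1 spans every directory.
my ($greedy) = $path =~ m{^/(.*)/};

# Non-greedy: .*? stops at the first point where the slash can
# match, so $1 holds only the first directory.
my ($lazy) = $path =~ m{^/(.*?)/};

print "greedy: $greedy\n";   # greedy: user/Jackie/temp
print "lazy:   $lazy\n";     # lazy:   user
```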

<P>

Next, let's look at the concept of pattern memory, which lets

you keep bits of matched string around after the match is complete.

<H3><A NAME="ExamplePatternMemory">

Example: Pattern Memory</A></H3>

<P>

Matching arbitrary numbers of characters is fine, but without

the capability to find out what was matched, patterns would not

be very useful. Perl lets you enclose pattern components inside

parentheses in order to store the string that matched the components

into pattern memory. You also might hear <I>pattern memory </I>referred

to as <I>pattern buffers</I>. This memory persists after the match

statement is finished executing so that you can assign the matched

values to other variables.

<P>

You saw a simple example of this earlier right after the component

descriptions. That example looked for the first word in a string

and stored it into the first buffer, <TT>$1</TT>.

The following small program

<BLOCKQUOTE>

<PRE>

$_ =  &quot;AAA BBB ccC&quot;;

m/(\w+)/;

print(&quot;$1\n&quot;);

</PRE>

</BLOCKQUOTE>

<P>

will display

<BLOCKQUOTE>

<PRE>

AAA

</PRE>

</BLOCKQUOTE>

<P>

You can use as many buffers as you need. Each time you add a set

of parentheses, another buffer is used. If you want to find all

the words in the string, you need to use the <TT>/g</TT> match option.

In order to find all the words, you can use a loop statement that

loops until the match operator returns false.

<BLOCKQUOTE>

<PRE>

$_ =  &quot;AAA BBB ccC&quot;;



while (m/(\w+)/g) {

    print(&quot;$1\n&quot;);

}

</PRE>

</BLOCKQUOTE>

<P>

The program will display

<BLOCKQUOTE>

<PRE>

AAA

BBB

ccC

</PRE>

</BLOCKQUOTE>

<P>

If looping through the matches is not the right approach for your

needs, perhaps you need to create an array consisting of the matches.

<BLOCKQUOTE>

<PRE>

$_ =  &quot;AAA BBB ccC&quot;;

@matches = m/(\w+)/g;

print(&quot;@matches\n&quot;);

</PRE>

</BLOCKQUOTE>

<P>

The program will display

<BLOCKQUOTE>

<PRE>

AAA BBB ccC

</PRE>

</BLOCKQUOTE>
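<P>

When a pattern contains more than one set of parentheses, a match in list context without <TT>/g</TT> returns the buffers in order, which makes it easy to name them. A short sketch, assuming the same sample string; <TT>$first</TT> and <TT>$second</TT> are names invented for this example.

```perl
use strict;
use warnings;

my $text = "AAA BBB ccC";

# Without /g, a successful match in list context returns the
# contents of $1, $2, and so on; a failed match returns an empty list.
my ($first, $second) = $text =~ m/(\w+)\s+(\w+)/;

print "first:  $first\n";    # first:  AAA
print "second: $second\n";   # second: BBB
```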

<P>

Perl also has a few special variables to help you know what matched

and what did not. These variables occasionally will save you from

having to add parentheses to find information.<BR>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=61><BLOCKQUOTE>

<CENTER><TT><B><FONT FACE="Courier">$+</FONT></B></TT></CENTER>

</BLOCKQUOTE>



</TD><TD WIDTH=529><BLOCKQUOTE>

This variable is assigned the text matched by the highest-numbered set of parentheses that actually took part in the match.</BLOCKQUOTE>



</TD></TR>

<TR><TD WIDTH=61><BLOCKQUOTE>

<CENTER><TT><B><FONT FACE="Courier">$&amp;</FONT></B></TT></CENTER>

</BLOCKQUOTE>



</TD><TD WIDTH=529><BLOCKQUOTE>

This variable is assigned the value of the entire matched string. If the match is not successful, then <TT>$&amp;</TT> retains its value from the last successful match.

</BLOCKQUOTE>



</TD></TR>

<TR><TD WIDTH=61><BLOCKQUOTE>

<CENTER><TT><B><FONT FACE="Courier">$`</FONT></B></TT></CENTER>

</BLOCKQUOTE>



</TD><TD WIDTH=529><BLOCKQUOTE>

This variable is assigned everything in the searched string that is before the matched string.</BLOCKQUOTE>



</TD></TR>

<TR><TD WIDTH=61><BLOCKQUOTE>

<CENTER><TT><B><FONT FACE="Courier">$'</FONT></B></TT></CENTER>

</BLOCKQUOTE>



</TD><TD WIDTH=529><BLOCKQUOTE>

This variable is assigned everything in the searched string that is after the matched string.</BLOCKQUOTE>



</TD></TR>

</TABLE>

</CENTER>
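<P>

The four variables in the table can be seen side by side in a small program. This is only a sketch: the sample string, the pattern, and the variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text = "AAA BBB ccC";
my ($before, $matched, $after, $last) = ('', '', '', '');

# The pattern matches "BBB cc" in the middle of the string.
if ($text =~ m/(B+)\s+(c+)/) {
    ($before, $matched, $after, $last) = ($`, $&, $', $+);
}

print "prematch:   [$before]\n";    # prematch:   [AAA ]
print "match:      [$matched]\n";   # match:      [BBB cc]
print "postmatch:  [$after]\n";     # postmatch:  [C]
print "last paren: [$last]\n";      # last paren: [cc]
```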

<P>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD><B>Tip</B></TD></TR>

<TR><TD>

<BLOCKQUOTE>

If you need to save the value of the matched strings stored in the pattern memory, make sure to assign them to other variables. Pattern memory is local to the enclosing block and lasts only until the next successful match.</BLOCKQUOTE>



</TD></TR>

</TABLE>

</CENTER>
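<P>

A brief sketch of why the copy matters: a second successful match replaces the contents of <TT>$1</TT>, while the saved copies stay as they were. The <TT>name=Jackie</TT> input and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = "name=Jackie";
my ($key, $value) = ('', '');

if ($line =~ m/^(\w+)=(\w+)$/) {
    ($key, $value) = ($1, $2);    # save the buffers before matching again
}

# A later successful match overwrites $1.
my $current = '';
if ("temp" =~ m/(\w+)/) {
    $current = $1;
}

print "key=$key value=$value\n";   # key=name value=Jackie
print "\$1 is now $current\n";     # $1 is now temp
```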

<P>

<H3><A NAME="ExamplePatternPrecedeNCe">

Example: Pattern Precedence</A></H3>

<P>

Pattern components have an order of precedence just as operators

do. If you see the following pattern:

<BLOCKQUOTE>

<PRE>

m/a|b+/

</PRE>

</BLOCKQUOTE>

<P>

it's hard to tell if the pattern should be

<BLOCKQUOTE>

<PRE>

m/(a|b)+/   # match one or more characters, each of which is

            # either &quot;a&quot; or &quot;b&quot;.

</PRE>

</BLOCKQUOTE>

<P>

or

<BLOCKQUOTE>

<PRE>

m/a|(b+)/   # match either the &quot;a&quot; character or the &quot;b&quot; character

            # repeated one or more times.

</PRE>

</BLOCKQUOTE>

<P>

The order of precedence shown in Table 10.7 is designed to solve

problems like this. By looking at the table, you can see that

quantifiers have a higher precedence than alternation. Therefore,

the second interpretation is correct.<BR>

<P>

<CENTER><B>Table 10.7&nbsp;&nbsp;The Pattern Component Order of

Precedence</B></CENTER>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=145><CENTER><I>Precedence Level</I></CENTER></TD>

<TD WIDTH=204><I>Component</I></TD></TR>

<TR><TD WIDTH=145><CENTER>1</CENTER></TD><TD WIDTH=204>Parentheses

</TD></TR>

<TR><TD WIDTH=145><CENTER>2</CENTER></TD><TD WIDTH=204>Quantifiers

</TD></TR>

<TR><TD WIDTH=145><CENTER>3</CENTER></TD><TD WIDTH=204>Sequences and Anchors

</TD></TR>

<TR><TD WIDTH=145><CENTER>4</CENTER></TD><TD WIDTH=204>Alternation

</TD></TR>

</TABLE>

</CENTER>
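<P>

One way to check the grouping for yourself is to anchor both readings to the whole string and try a few inputs. This is a rough sketch; the test strings and the <TT>%result</TT> hash are made up for the illustration, and the outer parentheses are there only so that <TT>^</TT> and <TT>$</TT> apply to both alternatives.

```perl
use strict;
use warnings;

my %result;
for my $str ("a", "bbb", "abab") {
    # Same grouping Perl gives a|b+ : a single "a", or a run of "b".
    my $as_written = ($str =~ m/^(a|b+)$/) ? "yes" : "no";

    # The other reading: one or more characters, each "a" or "b".
    my $regrouped  = ($str =~ m/^(a|b)+$/) ? "yes" : "no";

    $result{$str} = [$as_written, $regrouped];
    printf "%-5s a|b+: %-3s  (a|b)+: %s\n", $str, $as_written, $regrouped;
}
# "abab" fits only the second reading, so the two groupings differ.
```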

<P>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD><B>Tip</B></TD></TR>

<TR><TD>

<BLOCKQUOTE>

You can use parentheses to affect the order in which components are evaluated because they have the highest precedence. However, unless you use the <TT>(?:...)</TT> extended syntax, you will be affecting the pattern memory.</BLOCKQUOTE>



</TD></TR>

</TABLE>

</CENTER>

<P>

<H3><A NAME="ExampleExtensionSyntax">

Example: Extension Syntax</A></H3>

<P>

The regular expression extensions are a way to significantly add

to the power of patterns without adding a lot of meta-characters

to the proliferation that already exists. By using the basic <TT>(?...)</TT>

notation, the regular expression capabilities can be greatly extended.

<P>

At this time, Perl recognizes five extensions. These vary widely

in functionality, from adding comments to setting options. Table

10.8 lists the extensions and gives a short description of each.

<BR>

<P>

<CENTER><B>Table 10.8&nbsp;&nbsp;Five Extension Components</B></CENTER>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=127><CENTER><I>Extension</I></CENTER></TD><TD WIDTH=463><I>Description</I>

</TD></TR>

<TR><TD WIDTH=127><CENTER>(?# TEXT)</CENTER></TD><TD WIDTH=463>This extension lets you add comments to your regular expression. The TEXT value is ignored.

</TD></TR>

<TR><TD WIDTH=127><CENTER>(?:...)</CENTER></TD><TD WIDTH=463>This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.

</TD></TR>

<TR><TD WIDTH=127><CENTER>(?=...)</CENTER></TD><TD WIDTH=463>This extension lets you match values without including them in the <TT>$&amp;</TT> variable.

</TD></TR>

<TR><TD WIDTH=127><CENTER>(?!...)</CENTER></TD><TD WIDTH=463>This extension lets you specify what should not follow your pattern. For instance, <TT>/blue(?!bird)/</TT> means that <TT>&quot;bluebox&quot;</TT> and <TT>&quot;bluesy&quot;</TT> will be matched 
but not <TT>&quot;bluebird&quot;</TT>.

</TD></TR>

<TR><TD WIDTH=127><CENTER>(?sxi)</CENTER></TD><TD WIDTH=463>This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable 
interpolation to do the matching.

</TD></TR>

</TABLE>

</CENTER>
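<P>

The extensions in Table 10.8 can be tried out one at a time. The following sketch exercises four of them; apart from the <TT>blue</TT> strings taken from the table, the inputs and variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# (?:...) groups without using a pattern memory position, so the
# only buffer, $1, holds the word after the repeated group.
my ($word) = "abab done" =~ m/(?:ab)+\s+(\w+)/;
print "$word\n";                   # done

# (?=...) requires "bird" to follow but leaves it out of $&.
my $seen = '';
if ("bluebird" =~ m/blue(?=bird)/) {
    $seen = $&;
}
print "$seen\n";                   # blue

# (?!...) rejects a "blue" that is followed by "bird".
my @hits = grep { m/blue(?!bird)/ } ("bluebox", "bluesy", "bluebird");
print "@hits\n";                   # bluebox bluesy

# (?i) carries the ignore-case option inside a stored pattern.
my $pattern = '(?i)perl';
my $found = ("I like PERL" =~ m/$pattern/) ? "yes" : "no";
print "$found\n";                  # yes
```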

<P>

<P>

By far the most useful feature of extended mode, in my opinion,

is the ability to add comments directly inside your patterns.

For example, would you rather see a pattern that looks like

this:

<BLOCKQUOTE>

<PRE>

# Match a string with two words. $1 will be the

# first word. $2 will be the second word.

m/^\s*(\w+)\W+(\w+)\s*$/;

</PRE>

</BLOCKQUOTE>

<P>

or one that looks like this:

<BLOCKQUOTE>

<PRE>

m/

    (?# This pattern will match any string with two)

    (?# and only two words in it. The matched words)

    (?# will be available in $1 and $2 if the match)

    (?# is successful.)



    ^      (?# Anchor this match to the beginning)

           (?# of the string)



    \s*    (?# skip over any whitespace characters)

           (?# use the * because there may be none)



    (\w+)  (?# Match the first word, we know it's)

           (?# the first word because of the anchor)

           (?# above. Place the matched word into)

           (?# pattern memory.)



    \W+    (?# Match at least one non-word)

           (?# cha