</TABLE></CENTER>

<P>

<P>

Here are some examples and how they are interpreted given a string

with the word <I>hello</I> in it somewhere:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/\Ahel/&nbsp;&nbsp;&nbsp;&nbsp; # match

only if the first three characters are &quot;hel&quot;<BR>

/llo\Z/&nbsp;&nbsp;&nbsp;&nbsp; # match only if the last three

characters are &quot;llo&quot;<BR>

/llo$/&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# matches only if the

last three characters are &quot;llo&quot;<BR>

/\Ahello\Z/ # same as /^hello$/ unless doing multiple line matching

<BR>

/\bhello/&nbsp;&nbsp;&nbsp;# matches &quot;hello&quot;, not &quot;Othello&quot;,

but also matches &quot;hello.&quot;<BR>

/\bhello/&nbsp;&nbsp;&nbsp;# matches &quot;$hello&quot; because

$ is not part of a word.<BR>

/hello\b/&nbsp;&nbsp;&nbsp;# matches &quot;hello&quot;, &quot;Othello&quot;,

and &quot;hello.&quot;, but not &quot;hellos&quot;<BR>

/\bhello\b/ # matches &quot;hello&quot; and &quot;hello.&quot;, but not

&quot;Othello&quot; nor &quot;hellos&quot;</FONT></TT>

</BLOCKQUOTE>

<P>

A &quot;word&quot; for use with these anchors is assumed to contain

letters, digits, and underscore characters. No other characters,

such as the tilde (<TT><FONT FACE="Courier">~</FONT></TT>), hash

(<TT><FONT FACE="Courier">#</FONT></TT>), or exclamation point

(<TT><FONT FACE="Courier">!</FONT></TT>) are part of the word.

Therefore, the pattern <TT><FONT FACE="Courier">/\bhello/</FONT></TT>

will match the string <TT><FONT FACE="Courier">&quot;$hello&quot;</FONT></TT>,

because <TT><FONT FACE="Courier">$</FONT></TT> is not part of

a word.
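
<P>

The word-boundary rules above can be checked with a short script. This is
only a sketch: the helper name <TT><FONT FACE="Courier">matching</FONT></TT>
and the sample strings are made up for illustration, and the
<TT><FONT FACE="Courier">qr//</FONT></TT> operator assumes Perl 5.005 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strings that match the given pattern.
sub matching {
    my ($pattern, @strings) = @_;
    return grep { $_ =~ $pattern } @strings;
}

# Illustrative sample strings (not taken from the listings in this chapter).
my @samples = ('hello', 'Othello', 'hello.', '$hello', 'hellos');

# Try \b in front, behind, and on both sides of the word.
for my $pattern (qr/\bhello/, qr/hello\b/, qr/\bhello\b/) {
    my @hits = matching($pattern, @samples);
    print "$pattern matches: @hits\n";
}
```

Running the script against your own strings is a quick way to confirm where
Perl sees a boundary before relying on an anchor in a larger pattern.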

<P>

The <TT><FONT FACE="Courier">\B</FONT></TT> pattern anchor does

the opposite of <TT><FONT FACE="Courier">\b</FONT></TT>.

It matches only at a position that is not a word boundary, that is,

between two word characters or between two nonword characters. For example,

the pattern below:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/\Bhello/&nbsp;&nbsp;&nbsp;&nbsp;</FONT></TT>

</BLOCKQUOTE>

<P>

matches <TT><FONT FACE="Courier">&quot;Othello&quot;</FONT></TT> but

not <TT><FONT FACE="Courier">&quot;hello&quot;</FONT></TT>,

<TT><FONT FACE="Courier">&quot;$hello&quot;</FONT></TT>, nor

<TT><FONT FACE="Courier">&quot;hello.&quot;</FONT></TT>, because in each of

those strings a word boundary immediately precedes the <TT><FONT FACE="Courier">h</FONT></TT>.

In contrast, the pattern here:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/hello\B/&nbsp;&nbsp;&nbsp;</FONT></TT>

</BLOCKQUOTE>

<P>

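will match <TT><FONT FACE="Courier">&quot;hellos&quot;</FONT></TT>

but not <TT><FONT FACE="Courier">&quot;hello&quot;</FONT></TT>,

<TT><FONT FACE="Courier">&quot;Othello&quot;</FONT></TT>,

<TT><FONT FACE="Courier">&quot;$hello&quot;</FONT></TT>, nor

<TT><FONT FACE="Courier">&quot;hello.&quot;</FONT></TT>, because it requires

a word character immediately after the final <TT><FONT FACE="Courier">o</FONT></TT>.

Finally, this pattern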

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/\Bhello\B/ </FONT></TT>

</BLOCKQUOTE>

<P>

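will match <TT><FONT FACE="Courier">&quot;Othellos&quot;</FONT></TT>

but not <TT><FONT FACE="Courier">&quot;hello&quot;</FONT></TT>,

<TT><FONT FACE="Courier">&quot;Othello&quot;</FONT></TT>,

<TT><FONT FACE="Courier">&quot;$hello&quot;</FONT></TT>, nor <TT><FONT FACE="Courier">&quot;hello.&quot;</FONT></TT>,

because it requires a word character on both sides of <TT><FONT FACE="Courier">hello</FONT></TT>.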

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/\Bhello/&nbsp;&nbsp;&nbsp;&nbsp;# matches

&quot;Othello&quot; but not &quot;hello&quot;, &quot;$hello&quot;,

nor &quot;hello.&quot;<BR>

/hello\B/&nbsp;&nbsp;&nbsp;&nbsp;# matches &quot;hellos&quot; but

not &quot;hello&quot;, &quot;Othello&quot;, &quot;$hello&quot;, nor &quot;hello.&quot;<BR>

/\Bhello\B/&nbsp;&nbsp;# matches &quot;Othellos&quot; but not &quot;hello&quot;,

&quot;Othello&quot;, &quot;$hello&quot;, nor &quot;hello.&quot;</FONT></TT>

</BLOCKQUOTE>
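
<P>

The behavior of <TT><FONT FACE="Courier">\B</FONT></TT> is easy to get wrong,
so it is worth checking against real strings. The following is a minimal
sketch; the helper name <TT><FONT FACE="Courier">non_boundary_hits</FONT></TT>
and the sample strings are hypothetical, and
<TT><FONT FACE="Courier">qr//</FONT></TT> assumes Perl 5.005 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strings that match the given pattern.
sub non_boundary_hits {
    my ($pattern, @strings) = @_;
    return grep { $_ =~ $pattern } @strings;
}

# Illustrative samples; "hellos" and "Othellos" put a word character after "hello".
my @samples = ('hello', 'Othello', 'hello.', '$hello', 'hellos', 'Othellos');

# \B succeeds only where both neighbors are word characters
# or both are nonword characters (string edges count as nonword).
for my $pattern (qr/\Bhello/, qr/hello\B/, qr/\Bhello\B/) {
    my @hits = non_boundary_hits($pattern, @samples);
    print "$pattern matches: @hits\n";
}
```

Note that <TT><FONT FACE="Courier">&quot;$hello&quot;</FONT></TT> never
matches <TT><FONT FACE="Courier">/\Bhello/</FONT></TT>: the position between
<TT><FONT FACE="Courier">$</FONT></TT> and <TT><FONT FACE="Courier">h</FONT></TT>
is a word boundary, which is exactly what <TT><FONT FACE="Courier">\B</FONT></TT> rejects.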

<P>

Listing 7.9 contains the code from Listing 7.8 with the addition

of the word boundary anchors.

<HR>

<BLOCKQUOTE>

<B>Listing 7.9. Using the boundary characters.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$scalars = 0;<BR>
$hashes = 0;<BR>
$arrays = 0;<BR>
$handles = 0;<BR>
<BR>
while (&lt;STDIN&gt;) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;# Split each input line into whitespace-separated tokens.<BR>
&nbsp;&nbsp;&nbsp;&nbsp;@words = split (/[\t ]+/);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;foreach $token (@words) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ($token =~ /\$\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal scalar variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$scalars++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /@\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal array variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$arrays++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /%\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal hash variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$hashes++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /\&lt;[A-Z][_0-9A-Z]*\&gt;/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is probably a file handle\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$handles++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
}<BR>
<BR>
print &quot; This file used scalars $scalars times\n&quot;;<BR>
print &quot; This file used arrays&nbsp;&nbsp;$arrays&nbsp;&nbsp;times\n&quot;;<BR>
print &quot; This file used hashes $hashes times\n&quot;;<BR>
print &quot; This file used handles $handles times\n&quot;;</FONT></TT>

</BLOCKQUOTE>

<HR>

<P>

Here is sample input and output for this program that takes an

existing script file in <TT><FONT FACE="Courier">test.txt</FONT></TT>

and uses it as the input to the <TT><FONT FACE="Courier">test.pl</FONT></TT>

program. 

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$ <B>cat test.txt<BR>

</B>#!/usr/bin/perl<BR>

<BR>

$input = &lt;STDIN&gt;;<BR>

chop ($input);<BR>

<BR>

@words = split (/ +/, $input);<BR>

foreach $i (@words) {<BR>

</FONT></TT>&nbsp;&nbsp;&nbsp;&nbsp;<TT><FONT FACE="Courier">print

&quot; [$i] \n&quot;;<BR>

&nbsp;&nbsp;&nbsp;&nbsp;}<BR>

<BR>

$ <B>test.pl&nbsp;&nbsp;&lt; test.txt<BR>

</B>&nbsp;This file used

scalars 5 times<BR>

&nbsp;This file used arrays&nbsp;&nbsp;2&nbsp;&nbsp;times<BR>

&nbsp;This file used hashes

0 times<BR>

&nbsp;This file used handles 1 times</FONT></TT>

</BLOCKQUOTE>

<P>

Patterns do not have to be typed literally between the <TT><I><FONT FACE="Courier">/

/</FONT></I></TT> delimiters of the match operator. You can also store

them in variables. Listing 7.10 is a modification of Listing

7.9 that uses three variables to hold the patterns instead of

writing them out in the <TT><FONT FACE="Courier">if</FONT></TT>

statement.

<HR>

<BLOCKQUOTE>

<B>Listing 7.10. Using pattern matches in variables.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$scalars = 0;<BR>
$hashes = 0;<BR>
$arrays = 0;<BR>
$handles = 0;<BR>
<BR>
# Patterns held in double-quoted strings, so each backslash is doubled.<BR>
$sType = &quot;\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b&quot;;<BR>
$aType = &quot;@\\b[a-zA-Z][_0-9a-zA-Z]*\\b&quot;;<BR>
$hType = &quot;%\\b[a-zA-Z][_0-9a-zA-Z]*\\b&quot;;<BR>
<BR>
while (&lt;STDIN&gt;) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;@words = split (/[\t ]+/);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;foreach $token (@words) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ($token =~ /$sType/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal scalar variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$scalars++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /$aType/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal array variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$arrays++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /$hType/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is a legal hash variable\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$hashes++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} elsif ($token =~ /\&lt;[A-Z][_0-9A-Z]*\&gt;/) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print (&quot;$token is probably a file handle\n&quot;);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$handles++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
}<BR>
<BR>
print &quot; This file used scalars $scalars times\n&quot;;<BR>
print &quot; This file used arrays&nbsp;&nbsp;$arrays&nbsp;&nbsp;times\n&quot;;<BR>
print &quot; This file used hashes $hashes times\n&quot;;<BR>
print &quot; This file used handles $handles times\n&quot;;</FONT></TT>

</BLOCKQUOTE>

<HR>

<P>

In this code, the variables <TT><FONT FACE="Courier">$aType</FONT></TT>,

<TT><FONT FACE="Courier">$hType</FONT></TT>, and <TT><FONT FACE="Courier">$sType</FONT></TT>

can be reused verbatim elsewhere in the program. When you build a

pattern in a double-quoted string, though, you have to double each

backslash: the Perl string parser consumes one level of escaping

before the pattern matcher sees the result. A single-quoted string

leaves a backslash alone unless it precedes another backslash or a

single quote, so with single quotes you can use the following line:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$sType = '\$\b[a-zA-Z][_0-9a-zA-Z]*\b';</FONT></TT>

</BLOCKQUOTE>

<P>

instead of this line:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$sType = &quot;\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b&quot;;</FONT></TT>

</BLOCKQUOTE>

<P>

Make sure that you remember to include the enclosing <TT><FONT FACE="Courier">/</FONT></TT>

characters when using a <TT><FONT FACE="Courier">$variable</FONT></TT>

for a pattern. Forgetting to do this will give erroneous results.

Also, be sure you see how each backslash is placed to escape characters

correctly.
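
<P>

To see that the two quoting styles really produce the same pattern, you can
compare the strings directly. The sketch below is illustrative only: the
variable names <TT><FONT FACE="Courier">$double</FONT></TT> and
<TT><FONT FACE="Courier">$single</FONT></TT> and the helper
<TT><FONT FACE="Courier">is_scalar_token</FONT></TT> are hypothetical and do
not appear in Listing 7.10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same scalar-variable pattern written with both quoting styles.
# Double quotes: every backslash meant for the pattern is doubled.
my $double = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
# Single quotes: backslashes are passed through as typed.
my $single = '\$\b[a-zA-Z][_0-9a-zA-Z]*\b';

# Hypothetical helper: 1 if the token matches the pattern held in a string.
sub is_scalar_token {
    my ($token, $pattern) = @_;
    return $token =~ /$pattern/ ? 1 : 0;
}

print "Both strings hold the same pattern text\n" if $double eq $single;

# Illustrative tokens, similar to those produced by the split in Listing 7.10.
for my $token ('$input', '($input);', '@words', 'print') {
    my $verdict = is_scalar_token($token, $single) ? 'scalar' : 'no match';
    printf "%-10s %s\n", $token, $verdict;
}
```

Printing the pattern string before using it is a simple way to confirm that
the backslashes survived quoting as intended.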

<H3><A NAME="ShortcutsforWordsinPerl">Shortcuts for Words in Perl</A>

</H3>

<P>

The <TT><FONT FACE="Courier">[]</FONT></TT> classes for patterns

simplify searches quite a bit. In Perl, there are several shortcut

patterns that describe words or numbers. You have seen them already

in the previous examples and chapters.

<P>

Here are the shortcuts:<P>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR VALIGN=TOP><TD WIDTH=75><CENTER><I>Shortcut</I></CENTER></TD><TD WIDTH=229><I>Description</I>
</TD><TD WIDTH=148><I>Equivalent Character Class</I></TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\d</FONT></TT></CENTER></TD>
<TD WIDTH=229>Any digit</TD><TD WIDTH=148><TT><FONT FACE="Courier">[0-9]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\D</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything other than a digit</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^0-9]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\w</FONT></TT></CENTER></TD>
<TD WIDTH=229>Any word character</TD><TD WIDTH=148><TT><FONT FACE="Courier">[_0-9a-zA-Z]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\W</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything other than a word character</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^_0-9a-zA-Z]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\s</FONT></TT></CENTER></TD>
<TD WIDTH=229>White space</TD><TD WIDTH=148><TT><FONT FACE="Courier">[ \r\t\n\f]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\S</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything other than white space</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^ \r\t\n\f]</FONT></TT>
</TD></TR>

</TABLE></CENTER>

<P>

<P>

These escape sequences can be used anywhere ordinary characters

are used. For example, the pattern <TT><FONT FACE="Courier">/[\da-z]/</FONT></TT>

matches any digit or lowercase letter.
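
<P>

As a rough illustration of the shortcuts, the sketch below sorts a few
strings into categories. The function names
<TT><FONT FACE="Courier">classify</FONT></TT> and
<TT><FONT FACE="Courier">has_digit_or_lower</FONT></TT> are hypothetical, and
the example assumes plain ASCII input, where the shortcuts behave like the
character classes in the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: label a string using the \d, \w, and \s shortcuts.
sub classify {
    my ($text) = @_;
    return 'integer'    if $text =~ /^\d+$/;            # digits only
    return 'identifier' if $text =~ /^[a-zA-Z_]\w*$/;   # letter or _, then word chars
    return 'blank'      if $text =~ /^\s*$/;            # empty or white space only
    return 'other';
}

# Hypothetical helper: a shortcut used inside a [] class, as in /[\da-z]/.
sub has_digit_or_lower {
    my ($text) = @_;
    return $text =~ /[\da-z]/ ? 1 : 0;
}

# Illustrative inputs, assumed to be ASCII.
for my $text ('2024', 'file_name2', " \t", 'a-b') {
    printf "%-12s %s\n", "[$text]", classify($text);
}
print "ABC has a digit or lowercase letter\n" if has_digit_or_lower('ABC');
print "A1 has a digit or lowercase letter\n"  if has_digit_or_lower('A1');
```

The second helper shows a shortcut combined with a range inside one
bracketed class, which is often shorter than spelling out every range.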

<P>

Word boundaries, as used by the <TT><FONT FACE="Courier">\b</FONT></TT>

and <TT><FONT FACE="Courier">\B</FONT></TT> anchors,

are defined in terms of <TT><FONT FACE="Courier">\w</FONT></TT>

and <TT><FONT FACE="Courier">\W</FONT></TT>. The patterns <TT><FONT FACE="Courier">/\w\W/</FONT></TT>

and <TT><FONT FACE="Courier">/\W\w/</FONT></TT> can be used to

detect word boundaries. If the pattern <TT><FONT FACE="Courier">/\w\W/</FONT></TT>

matches a pair of characters, the first character

is part of a word and the second is not. A word boundary therefore

lies between the two characters, and the first character is the

end of a word.

<P>

Conversely, if <TT><FONT FACE="Courier">/\W\w/</FONT></TT> matches