</TABLE></CENTER>
<P>
<P>
Here are some examples and how they are interpreted given a string
with the word <I>hello</I> in it somewhere:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/\Ahel/ # matches only if the first three characters are "hel"<BR>
/llo\Z/ # matches only if the last three characters (ignoring a trailing newline) are "llo"<BR>
/llo$/ # matches only if the last three characters (ignoring a trailing newline) are "llo"<BR>
/\Ahello\Z/ # same as /^hello$/ unless doing multiple-line matching<BR>
/\bhello/ # matches "hello" and "hello.", but not "Othello"<BR>
/\bhello/ # matches "$hello" because $ is not part of a word<BR>
/hello\b/ # matches "hello", "Othello", and "hello.", but not "hellos"<BR>
/\bhello\b/ # matches "hello" and "hello.", but not "Othello" or "hellos"</FONT></TT>
</BLOCKQUOTE>
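<P>
As a quick check of the string anchors above, the following sketch reports which of them match a few sample strings. The helper name <TT><FONT FACE="Courier">anchor_flags</FONT></TT>, the flag letters, and the sample strings are illustrative assumptions and do not come from the numbered listings.

```perl
use strict;
use warnings;

# Hypothetical helper: build a string of flags showing which anchors match.
sub anchor_flags {
    my ($text) = @_;
    my $flags = '';
    $flags .= 'A' if $text =~ /\Ahel/;      # string begins with "hel"
    $flags .= 'Z' if $text =~ /llo\Z/;      # string ends with "llo" (a final newline is allowed)
    $flags .= 'W' if $text =~ /\Ahello\Z/;  # the whole string is "hello"
    $flags .= 'M' if $text =~ /^hello$/m;   # with /m, some line is exactly "hello"
    return $flags;
}

foreach my $sample ('hello', "hello\n", 'say hello', "hello\nworld", 'hello world') {
    (my $shown = $sample) =~ s/\n/\\n/g;    # make newlines visible in the report
    printf "%-14s => %s\n", $shown, anchor_flags($sample);
}
```

<P>
The string <TT><FONT FACE="Courier">"hello\nworld"</FONT></TT> shows the difference mentioned above: with the <TT><FONT FACE="Courier">m</FONT></TT> modifier, <TT><FONT FACE="Courier">^</FONT></TT> and <TT><FONT FACE="Courier">$</FONT></TT> can match at a line break inside the string, whereas <TT><FONT FACE="Courier">\A</FONT></TT> and <TT><FONT FACE="Courier">\Z</FONT></TT> still refer to the ends of the whole string.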
<P>
A "word" for use with these anchors is assumed to contain
letters, digits, and underscore characters. No other characters,
such as the tilde (<TT><FONT FACE="Courier">~</FONT></TT>), hash
(<TT><FONT FACE="Courier">#</FONT></TT>), or exclamation point
(<TT><FONT FACE="Courier">!</FONT></TT>) are part of the word.
Therefore, the pattern <TT><FONT FACE="Courier">/\bhello/</FONT></TT>
will match the string <TT><FONT FACE="Courier">"$hello"</FONT></TT>,
because <TT><FONT FACE="Courier">$</FONT></TT> is not part of
a word.
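<P>
The effect of <TT><FONT FACE="Courier">\b</FONT></TT> on strings such as <TT><FONT FACE="Courier">"$hello"</FONT></TT> can be checked with a short sketch like the one below. The helper name <TT><FONT FACE="Courier">boundary_report</FONT></TT> and the extra sample <TT><FONT FACE="Courier">"hellos"</FONT></TT> are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list which of the three \b patterns match $text.
sub boundary_report {
    my ($text) = @_;
    my @hits;
    push @hits, 'start' if $text =~ /\bhello/;    # word boundary just before "hello"
    push @hits, 'end'   if $text =~ /hello\b/;    # word boundary just after "hello"
    push @hits, 'both'  if $text =~ /\bhello\b/;  # word boundaries on both sides
    return join(',', @hits);
}

# Single quotes keep the $ in '$hello' from being treated as a variable.
foreach my $sample ('hello', 'Othello', 'hello.', '$hello', 'hellos') {
    printf "%-8s => %s\n", $sample, boundary_report($sample);
}
```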
<P>
The <TT><FONT FACE="Courier">\B</FONT></TT> pattern anchor is
the opposite of <TT><FONT FACE="Courier">\b</FONT></TT>.
It matches only at a position that is not a word boundary, that is,
between two word characters or between two nonword characters. For example,
the pattern below:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/\Bhello/ </FONT></TT>
</BLOCKQUOTE>
<P>
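matches <TT><FONT FACE="Courier">"Othello"</FONT></TT>, where the
<TT><FONT FACE="Courier">h</FONT></TT> follows another word character, but
not <TT><FONT FACE="Courier">"hello"</FONT></TT>,
<TT><FONT FACE="Courier">"hello."</FONT></TT> nor
<TT><FONT FACE="Courier">"$hello"</FONT></TT>. (In <TT><FONT FACE="Courier">"$hello"</FONT></TT>
the position between <TT><FONT FACE="Courier">$</FONT></TT> and
<TT><FONT FACE="Courier">h</FONT></TT> is a word boundary.) Whereas,
the pattern here: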
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/hello\B/ </FONT></TT>
</BLOCKQUOTE>
<P>
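will match <TT><FONT FACE="Courier">"hellos"</FONT></TT>, where the
<TT><FONT FACE="Courier">o</FONT></TT> is followed by another word character,
but not <TT><FONT FACE="Courier">"hello"</FONT></TT>,
<TT><FONT FACE="Courier">"hello."</FONT></TT>,
<TT><FONT FACE="Courier">"Othello"</FONT></TT> nor
<TT><FONT FACE="Courier">"$hello"</FONT></TT>. Finally this pattern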
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/\Bhello\B/ </FONT></TT>
</BLOCKQUOTE>
<P>
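will match <TT><FONT FACE="Courier">"Othellos"</FONT></TT>, where
<TT><FONT FACE="Courier">hello</FONT></TT> has word characters on both sides,
but not <TT><FONT FACE="Courier">"Othello"</FONT></TT>,
<TT><FONT FACE="Courier">"hello"</FONT></TT>,
<TT><FONT FACE="Courier">"$hello"</FONT></TT> nor <TT><FONT FACE="Courier">"hello."</FONT></TT>.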
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/\Bhello/ # matches "Othello" but not "hello", "hello." nor "$hello"<BR>
/hello\B/ # matches "hellos" but not "hello", "hello.", "Othello" nor "$hello"<BR>
/\Bhello\B/ # matches "Othellos" but not "Othello", "hello", "$hello" nor "hello."</FONT></TT>
</BLOCKQUOTE>
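<P>
The behavior of <TT><FONT FACE="Courier">\B</FONT></TT> is easy to misjudge, so it is worth testing against sample strings. The sketch below is one way to do that; the helper name <TT><FONT FACE="Courier">nonboundary_report</FONT></TT> and the sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list which of the three \B patterns match $text.
sub nonboundary_report {
    my ($text) = @_;
    my @hits;
    push @hits, 'before' if $text =~ /\Bhello/;    # no word boundary just before "hello"
    push @hits, 'after'  if $text =~ /hello\B/;    # no word boundary just after "hello"
    push @hits, 'both'   if $text =~ /\Bhello\B/;  # no word boundary on either side
    return join(',', @hits);
}

foreach my $sample ('hello', 'hello.', '$hello', 'Othello', 'hellos', 'Othellos') {
    my $report = nonboundary_report($sample);
    printf "%-9s => %s\n", $sample, ($report eq '' ? '(no match)' : $report);
}
```

<P>
Because the start and end of a string count as word boundaries when the adjacent character is a word character, <TT><FONT FACE="Courier">\B</FONT></TT> only succeeds next to <TT><FONT FACE="Courier">hello</FONT></TT> when another word character sits on that side.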
<P>
Listing 7.9 contains the code from Listing 7.8 with the addition
of the word boundary anchors.
<HR>
<BLOCKQUOTE>
<B>Listing 7.9. Using the boundary characters.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$scalars = 0;<BR>
$hashes = 0;<BR>
$arrays = 0;<BR>
$handles = 0;<BR>
<BR>
while (<STDIN>) {<BR>
    # Split each input line into tokens on runs of tabs and spaces.<BR>
    @words = split (/[\t ]+/);<BR>
    foreach $token (@words) {<BR>
        if ($token =~ /\$\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
            # print ("$token is a legal scalar variable\n");<BR>
            $scalars++;<BR>
        } elsif ($token =~ /\@\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
            # print ("$token is a legal array variable\n");<BR>
            $arrays++;<BR>
        } elsif ($token =~ /%\b[a-zA-Z][_0-9a-zA-Z]*\b/) {<BR>
            # print ("$token is a legal hash variable\n");<BR>
            $hashes++;<BR>
        } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/) {<BR>
            # print ("$token is probably a file handle\n");<BR>
            $handles++;<BR>
        }<BR>
    }<BR>
}<BR>
<BR>
print " This file used scalars $scalars times\n";<BR>
print " This file used arrays $arrays times\n";<BR>
print " This file used hashes $hashes times\n";<BR>
print " This file used handles $handles times\n";</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
Here is sample input and output for this program that takes an
existing script file in <TT><FONT FACE="Courier">test.txt</FONT></TT>
and uses it as the input to the <TT><FONT FACE="Courier">test.pl</FONT></TT>
program.
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$ <B>cat test.txt</B><BR>
#!/usr/bin/perl<BR>
<BR>
$input = <STDIN>;<BR>
chop ($input);<BR>
<BR>
@words = split (/ +/, $input);<BR>
foreach $i (@words) {<BR>
    print " [$i] \n";<BR>
}<BR>
<BR>
$ <B>test.pl < test.txt</B><BR>
 This file used scalars 5 times<BR>
 This file used arrays 2 times<BR>
 This file used hashes 0 times<BR>
 This file used handles 1 times</FONT></TT>
</BLOCKQUOTE>
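<P>
To see why a particular token is counted the way it is, it can help to pull the tests out into a function and try tokens one at a time. The following is a minimal sketch under that assumption; the name <TT><FONT FACE="Courier">classify_token</FONT></TT> and the returned labels are hypothetical. The patterns follow the listing, with <TT><FONT FACE="Courier">@</FONT></TT> escaped and the hash class written to accept lowercase letters.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one token with the listing's tests, in the same order.
sub classify_token {
    my ($token) = @_;
    return 'scalar' if $token =~ /\$\b[a-zA-Z][_0-9a-zA-Z]*\b/;
    return 'array'  if $token =~ /\@\b[a-zA-Z][_0-9a-zA-Z]*\b/;
    return 'hash'   if $token =~ /%\b[a-zA-Z][_0-9a-zA-Z]*\b/;
    return 'handle' if $token =~ /<[A-Z][_0-9A-Z]*>/;
    return 'other';
}

# Count the token types on one line of the sample input.
my %count;
$count{ classify_token($_) }++ foreach split /[\t ]+/, 'foreach $i (@words) {';
print "$_: $count{$_}\n" foreach sort keys %count;
```

<P>
Because the patterns are not anchored to the ends of the token, surrounding punctuation is ignored, which is why <TT><FONT FACE="Courier">(@words)</FONT></TT> is counted as an array and <TT><FONT FACE="Courier">[$i]</FONT></TT> as a scalar.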
<P>
Patterns do not have to be typed literally between the <TT><I><FONT FACE="Courier">/
/</FONT></I></TT> delimiters of a match. You can also store them
in variables and interpolate those variables into the match. Listing 7.10
is a modification of Listing 7.9 that uses three variables to hold the
patterns instead of writing them out in the <TT><FONT FACE="Courier">if</FONT></TT>
and <TT><FONT FACE="Courier">elsif</FONT></TT> conditions.
<HR>
<BLOCKQUOTE>
<B>Listing 7.10. Using pattern matches in variables.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$scalars = 0;<BR>
$hashes = 0;<BR>
$arrays = 0;<BR>
$handles = 0;<BR>
<BR>
# Double-quoted strings: each backslash meant for the pattern is doubled.<BR>
$sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";<BR>
$aType = "\@\\b[a-zA-Z][_0-9a-zA-Z]*\\b";<BR>
$hType = "%\\b[a-zA-Z][_0-9a-zA-Z]*\\b";<BR>
<BR>
while (<STDIN>) {<BR>
    @words = split (/[\t ]+/);<BR>
    foreach $token (@words) {<BR>
        if ($token =~ /$sType/ ) {<BR>
            # print ("$token is a legal scalar variable\n");<BR>
            $scalars++;<BR>
        } elsif ($token =~ /$aType/ ) {<BR>
            # print ("$token is a legal array variable\n");<BR>
            $arrays++;<BR>
        } elsif ($token =~ /$hType/ ) {<BR>
            # print ("$token is a legal hash variable\n");<BR>
            $hashes++;<BR>
        } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/) {<BR>
            # print ("$token is probably a file handle\n");<BR>
            $handles++;<BR>
        }<BR>
    }<BR>
}<BR>
<BR>
print " This file used scalars $scalars times\n";<BR>
print " This file used arrays $arrays times\n";<BR>
print " This file used hashes $hashes times\n";<BR>
print " This file used handles $handles times\n";</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
In this code, the variables <TT><FONT FACE="Courier">$aType</FONT></TT>,
<TT><FONT FACE="Courier">$hType</FONT></TT>, and <TT><FONT FACE="Courier">$sType</FONT></TT>
can be reused verbatim anywhere else in the program. With double
quotes, though, each backslash intended for the pattern has to be
written twice: Perl's string parser consumes one level of escaping,
and the pattern matcher then sees the single backslash that remains.
Single quotes leave the backslashes as typed, so you
can use the following line:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$sType = '\$\b[a-zA-Z][_0-9a-zA-Z]*\b';</FONT></TT>
</BLOCKQUOTE>
<P>
instead of this line:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";</FONT></TT>
</BLOCKQUOTE>
<P>
Remember to include the enclosing <TT><FONT FACE="Courier">/</FONT></TT>
characters when using a <TT><FONT FACE="Courier">$variable</FONT></TT>
for a pattern, and keep them out of the string itself: a
<TT><FONT FACE="Courier">/</FONT></TT> stored in the variable is matched
as a literal character, which gives erroneous results.
Also, check that each backslash is placed so that characters are escaped
correctly.
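<P>
The quoting rules can be verified directly by comparing the strings. The sketch below writes the scalar pattern three ways and checks that all three behave alike. The third form uses Perl's <TT><FONT FACE="Courier">qr//</FONT></TT> operator, which is not used in the listings and is shown here as an alternative; the helper name <TT><FONT FACE="Courier">token_matches</FONT></TT> is hypothetical.

```perl
use strict;
use warnings;

# The same pattern text written three ways.
my $double   = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";  # double quotes: backslashes doubled
my $single   = '\$\b[a-zA-Z][_0-9a-zA-Z]*\b';      # single quotes: backslashes as typed
my $compiled = qr/\$\b[a-zA-Z][_0-9a-zA-Z]*\b/;    # qr//: a precompiled pattern object

# Hypothetical helper: 1 if $token matches the pattern held in $pattern, else 0.
sub token_matches {
    my ($token, $pattern) = @_;
    return $token =~ /$pattern/ ? 1 : 0;
}

print "double and single quotes hold the same text\n" if $double eq $single;

foreach my $token ('$input', '($input);', 'input') {
    printf "%-10s %d %d %d\n", $token,
        map { token_matches($token, $_) } $double, $single, $compiled;
}
```

<P>
With <TT><FONT FACE="Courier">qr//</FONT></TT> the pattern is written exactly as it would appear between the slashes, so the double-escaping question does not arise.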
<H3><A NAME="ShortcutsforWordsinPerl">Shortcuts for Words in Perl</A>
</H3>
<P>
The <TT><FONT FACE="Courier">[]</FONT></TT> classes for patterns
simplify searches quite a bit. In Perl, there are several shortcut
patterns that describe words or numbers. You have seen them already
in the previous examples and chapters.
<P>
Here are the shortcuts:<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><I>Shortcut</I></CENTER></TD><TD WIDTH=229><I>Description</I>
</TD><TD WIDTH=148><I>Equivalent Pattern</I></TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\d</FONT></TT></CENTER></TD>
<TD WIDTH=229>Any digit</TD><TD WIDTH=148><TT><FONT FACE="Courier">[0-9]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\D</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything other than a digit</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^0-9]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\w</FONT></TT></CENTER></TD>
<TD WIDTH=229>Any word character</TD><TD WIDTH=148><TT><FONT FACE="Courier">[_0-9a-zA-Z]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\W</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything not a word character</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^_0-9a-zA-Z]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\s</FONT></TT></CENTER></TD>
<TD WIDTH=229>White space</TD><TD WIDTH=148><TT><FONT FACE="Courier">[ \r\t\n\f]</FONT></TT>
</TD></TR>
<TR VALIGN=TOP><TD WIDTH=75><CENTER><TT><FONT FACE="Courier">\S</FONT></TT></CENTER></TD>
<TD WIDTH=229>Anything other than white space</TD><TD WIDTH=148><TT><FONT FACE="Courier">[^ \r\t\n\f]</FONT></TT>
</TD></TR>
</TABLE></CENTER>
<P>
<P>
These escape sequences can be used anywhere ordinary characters
are used. For example, the pattern <TT><FONT FACE="Courier">/[\da-z]/</FONT></TT>
matches any digit or lowercase letter.
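<P>
As a small illustration of the shortcuts, the sketch below counts how many characters of a string fall into each class. The helper name <TT><FONT FACE="Courier">describe_chars</FONT></TT> and the sample string are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters of $text matched by each shortcut.
# The "= () =" idiom forces list context so that the match count is assigned.
sub describe_chars {
    my ($text) = @_;
    my $digits         = () = $text =~ /\d/g;        # digits
    my $word_chars     = () = $text =~ /\w/g;        # letters, digits, underscore
    my $spaces         = () = $text =~ /\s/g;        # white space
    my $digit_or_lower = () = $text =~ /[\da-z]/g;   # a shortcut inside a [] class
    return "$digits $word_chars $spaces $digit_or_lower";
}

# Sample: 2 digits, 10 word characters, 3 white-space characters,
# and 8 characters that are a digit or a lowercase letter.
print describe_chars("Perl 5, rev_2\t!"), "\n";
```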
<P>
The word boundary used by the <TT><FONT FACE="Courier">\b</FONT></TT>
and <TT><FONT FACE="Courier">\B</FONT></TT> special characters
is defined in terms of <TT><FONT FACE="Courier">\w</FONT></TT>
and <TT><FONT FACE="Courier">\W</FONT></TT>. The patterns <TT><FONT FACE="Courier">/\w\W/</FONT></TT>
and <TT><FONT FACE="Courier">/\W\w/</FONT></TT> can be used to
detect word boundaries. If the pattern <TT><FONT FACE="Courier">/\w\W/</FONT></TT>
matches a pair of characters, the first character
is part of a word and the second is not. A word boundary therefore
lies between the two matched characters, and it marks the end of a word.
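<P>
The two-character patterns can be used to locate boundaries explicitly. The sketch below records the offsets at which a word ends or begins; the helper name <TT><FONT FACE="Courier">boundary_offsets</FONT></TT> and the sample string are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets of word ends and word starts in $text.
# $-[0] is the offset where the most recent match began; the boundary lies one
# character later, between the two matched characters.
sub boundary_offsets {
    my ($text) = @_;
    my (@ends, @starts);
    push @ends,   $-[0] + 1 while $text =~ /\w\W/g;   # word char, then nonword char
    push @starts, $-[0] + 1 while $text =~ /\W\w/g;   # nonword char, then word char
    return (\@ends, \@starts);
}

my ($ends, $starts) = boundary_offsets('hi, you!');
print "words end at offsets:   @$ends\n";
print "words start at offsets: @$starts\n";
```

<P>
Unlike <TT><FONT FACE="Courier">\b</FONT></TT>, these patterns need a character on each side of the boundary, so they do not report a word that begins at the very start of the string or ends at its very end.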
<P>
Conversely, if <TT><FONT FACE="Courier">/\W\w/</FONT></TT> matches