lists all the items collected in the <TT><FONT FACE="Courier">%finds</FONT></TT>
associative array.
<P>
Listing 7.3 finds only the first occurrence of a pattern in a
line. To find later occurrences, pass the <TT><FONT FACE="Courier">offset</FONT></TT>
argument, which starts the search at a position other than the start of the string. The
<TT><FONT FACE="Courier">offset</FONT></TT> argument counts
from 0. Listing 7.4 presents another search program that
finds more than one occurrence on a line.
<HR>
<BLOCKQUOTE>
<B>Listing 7.4. Searching more than once.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
%finds = ();<BR>
$count = 0;<BR>
$fname = "news.txt";<BR>
$word = "the";<BR>
open (IFILE, $fname) || die "Cannot open $fname $!\n";<BR>
<BR>
print "Search for :$word: \n";<BR>
while (&lt;IFILE&gt;) {<BR>
    $thispos = 0;<BR>
    $nextpos = 0;<BR>
    while (1) {<BR>
        # Resume the search just past the previous match.<BR>
        $nextpos = index($_,$word,$thispos);<BR>
        last if ($nextpos == -1);<BR>
        $count++;<BR>
        $finds{"$count"} = $nextpos;<BR>
        $thispos = $nextpos + 1;<BR>
    }<BR>
}<BR>
close IFILE;<BR>
print "\nLn : Column\n";<BR>
while(($key,$value) = each(%finds)) {<BR>
    print " $key : $value \n";<BR>
}</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
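The output of Listing 7.4 on a sample file would be something
like this (the first column is the running count of matches, and the
second is the offset within the line where that match was found):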
<BLOCKQUOTE>
<TT><FONT FACE="Courier">Ln : Column<BR>
1 : 31<BR>
2 : 54<BR>
3 : 38<BR>
4 : 53</FONT></TT>
</BLOCKQUOTE>
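<P>
The same offset technique can be tried on a string held in memory
rather than on a file. The following is only a minimal sketch: the
sample text in <TT><FONT FACE="Courier">$line</FONT></TT> and the
<TT><FONT FACE="Courier">@columns</FONT></TT> array are made up for
illustration and are not part of Listing 7.4.

```perl
#!/usr/bin/perl
# Sketch: collect every position of $word in one string by restarting
# index() just past the previous hit.
# $line and @columns are illustrative names, not from Listing 7.4.
$line = "the cat and the hat";
$word = "the";
@columns = ();
$thispos = 0;
while (($nextpos = index($line, $word, $thispos)) != -1) {
    push(@columns, $nextpos);
    $thispos = $nextpos + 1;    # move past this match
}
print "Found at: @columns\n";   # prints "Found at: 0 12"
```

Advancing by one character, rather than by the length of the word,
means overlapping matches would be reported as well.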
<H3><A NAME="ThesubstrFunction">The <TT><FONT SIZE=4 FACE="Courier">substr</FONT></TT><FONT SIZE=4>
Function</FONT></A></H3>
<P>
The <TT><FONT FACE="Courier">substr</FONT></TT> function
extracts part of a string. Here's the syntax
for this function:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">substr ($<I>master</I>, $<I>offset</I>,
$<I>length</I>);</FONT></TT>
</BLOCKQUOTE>
<P>
<TT><FONT FACE="Courier">$<I>master</I></FONT></TT> is the string
from which a substring is copied, starting at the index
specified by <TT><FONT FACE="Courier">$<I>offset</I></FONT></TT>
and running for up to <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>
characters. Listing 7.5 illustrates the use of this function.
<HR>
<BLOCKQUOTE>
<B>Listing 7.5. Using the </B><TT><B><FONT FACE="Courier">substr</FONT></B></TT><B>
function.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
# Check out the substr function.<BR>
#<BR>
$quote = "No man but a blockhead ever wrote except for money";<BR>
# quote by Samuel Johnson<BR>
<BR>
$sub[0] = substr ($quote, 9, 6);<BR>
<BR>
$name = "blockhead";<BR>
$pos = index($quote,$name);<BR>
$len = length($name);<BR>
$sub[1] = substr ($quote, $pos, $len);<BR>
$pos = index($quote,"wrote");<BR>
$sub[2] = substr ($quote, $pos, 6);<BR>
<BR>
for ($i = 0; $i &lt; 3; $i++) {<BR>
    print "\$sub[$i] is \"" . $sub[$i] . "\" \n";<BR>
}<BR>
<BR>
#<BR>
# To replace a string, let's try substr on the left-hand side.<BR>
#<BR>
# Replace the words 'a blockhead', with the words 'an altruist'.<BR>
# (Sorry Sam.)<BR>
$name = "a blockhead" ;<BR>
$pos = index($quote,$name);<BR>
$len = length($name);<BR>
<BR>
substr ($quote, $pos, $len) = "an altruist";<BR>
print "After substr = $quote \n";</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
The output from the code in Listing 7.5 is as follows:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$sub[0] is "t a bl"<BR>
$sub[1] is "blockhead"<BR>
$sub[2] is "wrote "<BR>
After substr = No man but an altruist ever wrote except for money</FONT></TT>
</BLOCKQUOTE>
<P>
You can see how the <TT><FONT FACE="Courier">substr</FONT></TT>
function can be used to extract values from another string. Basically,
you tell the <TT><FONT FACE="Courier">substr</FONT></TT> function
how many characters you need and from where, and it returns a copy
of that portion. The original string is left unchanged.
<P>
The <TT><FONT FACE="Courier">substr</FONT></TT> function can also
be used to make substitutions within a string. In this listing,
the words <TT><FONT FACE="Courier">"a blockhead"</FONT></TT>
are replaced by <TT><FONT FACE="Courier">"an altruist"</FONT></TT>.
The part of the string specified by <TT><FONT FACE="Courier">substr</FONT></TT>
is replaced by the value appearing to the right of the assignment
operator. Here's the syntax for these calls to <TT><FONT FACE="Courier">substr</FONT></TT>:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">substr ($<I>master</I>, $<I>offset</I>,
$<I>length</I>) = $<I>newStr</I>;</FONT></TT>
</BLOCKQUOTE>
<P>
<TT><FONT FACE="Courier">$<I>master</I></FONT></TT> must be a
string that can be written to (that is, not a <I>tied</I> variable; <A HREF="ch6.htm" tppabs="http://www.mcp.com/815097600/0-672/0-672-30891-6/ch6.htm" >see Chapter 6</A>,
"Binding Variables to Objects," for information on using
<TT><FONT FACE="Courier">tie()</FONT></TT> on variables). <TT><FONT FACE="Courier">$<I>offset</I></FONT></TT>
is where the substitution begins, and it runs for up to <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>
characters. The offset must fall within the existing string. If
<TT><FONT FACE="Courier">$<I>offset</I>
+ $<I>length</I></FONT></TT> runs past the end of the string, only
the characters that actually exist are replaced. The replacement does
not have to be the same length as the text it replaces; the string
grows or shrinks to fit. The <TT><FONT FACE="Courier">$<I>newStr</I></FONT></TT>
variable can be the empty string if you want to remove the substring
at the offset. To substitute the tail end of the string starting
from the offset, do not specify the <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>
argument.
<P>
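For example, changing <TT><FONT FACE="Courier">$len</FONT></TT> to 22
in Listing 7.5 extends the replaced span through the word
<TT><FONT FACE="Courier">wrote</FONT></TT>:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$len = 22; substr ($quote, $pos, $len)
= "an altruist";</FONT></TT>
</BLOCKQUOTE>
<P>
The final print statement in the listing then produces the following line:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">After substr = No man but an altruist except for money</FONT></TT>
</BLOCKQUOTE>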
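<P>
Deleting with an empty string and replacing the tail of a string can
be sketched in a few lines. The string in
<TT><FONT FACE="Courier">$text</FONT></TT> is an illustrative value
chosen for this sketch, not part of Listing 7.5.

```perl
#!/usr/bin/perl
# Sketch: substr on the left-hand side of an assignment.
# $text is an illustrative string, not taken from Listing 7.5.
$text = "No man but a blockhead";
$pos  = index($text, "a blockhead");    # 11

# An empty replacement deletes the selected characters ("a ").
substr($text, $pos, 2) = "";
$after_delete = $text;
print "$after_delete\n";                # No man but blockhead

# With no length given, everything from the offset to the end
# of the string is replaced.
substr($text, $pos) = "an altruist";
print "$text\n";                        # No man but an altruist
```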
<P>
The offset can be a negative number to specify counting from the
right side of the string. For example, the following line replaces
three characters, starting at the fifth character from the right end of <TT><FONT FACE="Courier">$quote</FONT></TT>,
with the word <TT><FONT FACE="Courier">"cash"</FONT></TT>:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">substr($quote, -5, 3) = "cash";</FONT></TT>
</BLOCKQUOTE>
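<P>
To see exactly which characters a negative offset selects, it can
help to print the substring before replacing it. This sketch reuses
the quote from Listing 7.5; the variables
<TT><FONT FACE="Courier">$tail</FONT></TT> and
<TT><FONT FACE="Courier">$part</FONT></TT> are names introduced here
for illustration.

```perl
#!/usr/bin/perl
# Sketch: negative offsets count back from the end of the string.
# $tail and $part are illustrative names.
$quote = "No man but a blockhead ever wrote except for money";

$tail = substr($quote, -5);       # last five characters
$part = substr($quote, -5, 3);    # three characters, starting five from the end
print "$tail\n";                  # money
print "$part\n";                  # mon

# Replace those three characters; the string grows by one character.
substr($quote, -5, 3) = "cash";
print "$quote\n";   # No man but a blockhead ever wrote except for cashey
```

Note that the trailing <TT><FONT FACE="Courier">ey</FONT></TT> of
<TT><FONT FACE="Courier">money</FONT></TT> survives, because only three
of the last five characters are replaced.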
<P>
The <TT><FONT FACE="Courier">substr</FONT></TT> function is great
for cut-and-paste operations on strings whose layout you already know.
For more general strings, you have to work with patterns that
can be described using regular expressions. If you are familiar
with the <TT><FONT FACE="Courier">grep</FONT></TT> command in
UNIX, you already know about regular expressions. Basically, a
<I>regular expression</I> is a way of specifying strings like
"all words beginning with the letter a" or "all
strings with an xy in the middle somewhere." The next section
illustrates how Perl can help make these types of search and replace
patterns easier.
<H3><A NAME="StringSearchingwithPatterns">String Searching with
Patterns</A></H3>
<P>
Perl enables you to match patterns within strings with the <TT><FONT FACE="Courier">=~</FONT></TT>
operator. To see whether a string has a certain pattern in it,
you use the following syntax:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$result = $variable =~ /pattern/</FONT></TT>
</BLOCKQUOTE>
<P>
The value <TT><FONT FACE="Courier">$result</FONT></TT> is <TT><FONT FACE="Courier">true</FONT></TT>
if the pattern is found in <TT><FONT FACE="Courier">$variable</FONT></TT>.
To check whether a string does not have a pattern, you have to
use the <TT><FONT FACE="Courier">!~</FONT></TT> operator, like
this:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">$result = $variable !~ /pattern/</FONT></TT>
</BLOCKQUOTE>
<P>
Listing 7.6 shows how to match strings with simple patterns. It prints a
message if the string <TT><FONT FACE="Courier">Apple</FONT></TT>,
<TT><FONT FACE="Courier">apple</FONT></TT>, or <TT><FONT FACE="Courier">Orange</FONT></TT>
is found, and another if neither <TT><FONT FACE="Courier">Grape</FONT></TT>
nor <TT><FONT FACE="Courier">grape</FONT></TT> is found.
<HR>
<BLOCKQUOTE>
<B>Listing 7.6. Matching with patterns.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$input = &lt;STDIN&gt;;<BR>
# chomp removes only a trailing newline; chop would remove any last character.<BR>
chomp($input);<BR>
print "Orange found! \n" if ( $input =~ /Orange/ );<BR>
print "Apple found! \n" if ( $input =~ /[Aa]pple/ );<BR>
print "Grape not found! \n" if ( $input !~ /[Gg]rape/ );</FONT></TT>
</BLOCKQUOTE>
<HR>
<P>
So, how did you search for <TT><FONT FACE="Courier">apple</FONT></TT>
and <TT><FONT FACE="Courier">Apple</FONT></TT> in one statement?
By specifying a pattern, rather than a fixed string, to search for. The general syntax
for a match with the <TT><FONT FACE="Courier">=~</FONT></TT> operator is this:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">[$variable =~] [m]/PATTERN/[i][o][g]</FONT></TT>
</BLOCKQUOTE>
<P>
<TT><FONT FACE="Courier">$variable</FONT></TT> is searched for
the pattern in <TT><FONT FACE="Courier">PATTERN</FONT></TT>. If the
<TT><FONT FACE="Courier">$variable =~</FONT></TT> part is omitted,
<TT><FONT FACE="Courier">$_</FONT></TT> is searched instead. The whole
string is searched, white space included. The <TT><FONT FACE="Courier">i</FONT></TT> option specifies
a case-insensitive search. The <TT><FONT FACE="Courier">g</FONT></TT> option
is used as an iterator to search more than once on the same string.
The <TT><FONT FACE="Courier">o</FONT></TT> option tells Perl to interpolate
any variables in the pattern only once, the first time the match is evaluated.
I cover all these options shortly.
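<P>
As a quick preview of the three options, here is a minimal sketch.
The sample text and all variable names are invented for illustration.

```perl
#!/usr/bin/perl
# Sketch of the i, g, and o match options.
# $text, $pat, and the counters are illustrative names.
$text = "Apple pie, apple tart, APPLE juice";

# i: ignore case. Without it, /apple/ would not match "APPLE".
$nocase = ("APPLE" =~ /apple/i) ? 1 : 0;

# g in scalar context: each evaluation resumes after the previous match.
$count = 0;
$count++ while ($text =~ /apple/ig);

# g in list context: returns every match at once.
@all = ($text =~ /apple/ig);

# o: $pat is interpolated into the pattern only the first time through.
$pat  = "apple";
$hits = 0;
foreach $w ("apple", "grape", "pineapple") {
    $hits++ if ($w =~ /$pat/o);
}

print "nocase=$nocase count=$count hits=$hits\n";
print "all: @all\n";
```

Because of the <TT><FONT FACE="Courier">o</FONT></TT> option, changing
<TT><FONT FACE="Courier">$pat</FONT></TT> inside the loop would have no
effect on later matches at that point in the program.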
<P>
Let's look at how the patterns in <TT><FONT FACE="Courier">PATTERN</FONT></TT>
are defined. If you are already familiar with the <TT><FONT FACE="Courier">grep</FONT></TT>
utility in UNIX, you are familiar with patterns.
<P>
An ordinary character placed in
<TT><FONT FACE="Courier">PATTERN</FONT></TT> matches itself. For example, <TT><FONT FACE="Courier">/Orange/</FONT></TT>
matches only strings that contain <TT><FONT FACE="Courier">Orange</FONT></TT>,
spelled exactly that way. To match any single character other than a newline, you can use the
dot (<TT><FONT FACE="Courier">.</FONT></TT>) operator. For example,
to match <TT><FONT FACE="Courier">Hat</FONT></TT> or <TT><FONT FACE="Courier">Cat</FONT></TT>,
you would use the pattern:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/.at/</FONT></TT>
</BLOCKQUOTE>
<P>
This also matches <TT><FONT FACE="Courier">Bat</FONT></TT>, <TT><FONT FACE="Courier">hat</FONT></TT>,
<TT><FONT FACE="Courier">Mat</FONT></TT>, and so on. If you just
want to get <TT><FONT FACE="Courier">Cat</FONT></TT> and <TT><FONT FACE="Courier">Hat</FONT></TT>,
you can use a character class using the square brackets (<TT><FONT FACE="Courier">[]</FONT></TT>).
For example, the pattern
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/[CH]at/</FONT></TT>
</BLOCKQUOTE>
<P>
will match <TT><FONT FACE="Courier">Cat</FONT></TT> or <TT><FONT FACE="Courier">Hat</FONT></TT>,
but not <TT><FONT FACE="Courier">cat</FONT></TT>, <TT><FONT FACE="Courier">hat</FONT></TT>,
<TT><FONT FACE="Courier">bat</FONT></TT>, and so on. The characters
in a class are case sensitive. So to allow the lowercase versions,
you would use the pattern:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/[cChH]at/</FONT></TT>
</BLOCKQUOTE>
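<P>
The difference between the dot and a character class is easy to check
by running a few words through each pattern. In this sketch the
patterns are written as <TT><FONT FACE="Courier">/.at/</FONT></TT>,
<TT><FONT FACE="Courier">/[CH]at/</FONT></TT>, and
<TT><FONT FACE="Courier">/[cChH]at/</FONT></TT>; the word list and
array names are invented for illustration.

```perl
#!/usr/bin/perl
# Sketch: compare the dot with two character classes.
# @words, @dot, @upper, and @both are illustrative names.
@words = ("Cat", "Hat", "cat", "hat", "Bat", "at");

@dot   = grep(/.at/, @words);        # any one character, then "at"
@upper = grep(/[CH]at/, @words);     # only C or H, then "at"
@both  = grep(/[cChH]at/, @words);   # either case of c or h, then "at"

print ".at       : @dot\n";      # Cat Hat cat hat Bat
print "[CH]at    : @upper\n";    # Cat Hat
print "[cChH]at  : @both\n";     # Cat Hat cat hat
```

The bare word <TT><FONT FACE="Courier">at</FONT></TT> matches none of
the three patterns, because each one requires a character in front of
<TT><FONT FACE="Courier">at</FONT></TT>.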
<P>
It's cumbersome to list a lot of characters in the <TT><FONT FACE="Courier">[]</FONT></TT>
class, so the dash (<TT><FONT FACE="Courier">-</FONT></TT>) operator
can define a range of characters to use. These two statements
look for a digit:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/[0-9]/<BR>