lists all the items collected in the <TT><FONT FACE="Courier">%finds</FONT></TT>

associative array.

<P>

Listing 7.3 finds only the first occurrence of a pattern in a

line. You can use the <TT><FONT FACE="Courier">offset</FONT></TT>

argument to start the search somewhere other than the beginning

of the string. The <TT><FONT FACE="Courier">offset</FONT></TT> argument

counts from 0. Listing 7.4 presents another search program that

finds more than one occurrence on a line.

<HR>

<BLOCKQUOTE>

<B>Listing 7.4. Searching more than once.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
%finds = ();<BR>
$fname = &quot;news.txt&quot;;<BR>
$word = &quot;the&quot;;<BR>
open (IFILE, $fname) || die &quot;Cannot open $fname $!\n&quot;;<BR>
<BR>
print &quot;Search for :$word: \n&quot;;<BR>
while (&lt;IFILE&gt;) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;$thispos = 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;$nextpos = 0;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;while (1) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# search again, starting just past the previous match<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$nextpos = index($_,$word,$thispos);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;last if ($nextpos == -1);<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$count++;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$finds{&quot;$count&quot;} = $nextpos;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$thispos = $nextpos + 1;<BR>
&nbsp;&nbsp;&nbsp;&nbsp;}<BR>
}<BR>
close IFILE;<BR>
print &quot;\nLn : Column\n&quot;;<BR>
while (($key,$value) = each(%finds)) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;print &quot; $key : $value \n&quot;;<BR>
}</FONT></TT>

</BLOCKQUOTE>

<HR>

<P>

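The output of Listing 7.4 on a sample file would be something

like the following. The first column is the running match count used as the key in

<TT><FONT FACE="Courier">%finds</FONT></TT>, not the line number in the file, and the second is

the 0-based column of the match. Because <TT><FONT FACE="Courier">each()</FONT></TT> returns

entries in no particular order, the rows may appear in a different order: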

<BLOCKQUOTE>

<TT><FONT FACE="Courier">Ln : Column<BR>

&nbsp;1 : 31<BR>

&nbsp;2 : 54<BR>

&nbsp;3 : 38<BR>

&nbsp;4 : 53</FONT></TT>

</BLOCKQUOTE>
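<P>
The offset loop used in Listing 7.4 can also be packaged as a small subroutine. The following is only a sketch: <TT><FONT FACE="Courier">find_all</FONT></TT> and the sample sentence are hypothetical names chosen for illustration, not part of the listing. Note that <TT><FONT FACE="Courier">index</FONT></TT> matches substrings, so the word inside "other" is reported too.

```perl
use strict;
use warnings;

# Return every 0-based position at which $word occurs in $text.
# find_all is a hypothetical helper, not part of Listing 7.4.
sub find_all {
    my ($text, $word) = @_;
    return () if $word eq "";    # an empty search string would never advance
    my @positions;
    my $from = 0;
    while ((my $pos = index($text, $word, $from)) != -1) {
        push @positions, $pos;
        $from = $pos + 1;        # resume one character past this match
    }
    return @positions;
}

my @hits = find_all("the cat and the other cat", "the");
print "@hits\n";    # prints "0 12 17"
```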

<H3><A NAME="ThesubstrFunction">The <TT><FONT SIZE=4 FACE="Courier">substr</FONT></TT><FONT SIZE=4>

Function</FONT></A></H3>

<P>

The <TT><FONT FACE="Courier">substr</FONT></TT> function extracts

part of a string. Here's the syntax

for this function:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">substr ($<I>master</I>, $<I>offset</I>,

$<I>length</I>);</FONT></TT>

</BLOCKQUOTE>

<P>

<TT><FONT FACE="Courier">$<I>master</I></FONT></TT> is the string

from which a substring is to be copied, starting at the 0-based index

given by <TT><FONT FACE="Courier">$<I>offset</I></FONT></TT>

and running for up to <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>

characters. Listing 7.5 illustrates the use of this function.

<HR>

<BLOCKQUOTE>

<B>Listing 7.5. Using the </B><TT><B><FONT FACE="Courier">substr</FONT></B></TT><B>

function.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
#&nbsp;&nbsp;Check out the substr function.<BR>
#<BR>
$quote = &quot;No man but a blockhead ever wrote except for money&quot;;<BR>
#&nbsp;&nbsp;quote by Samuel Johnson<BR>
<BR>
$sub[0] = substr ($quote, 9, 6);<BR>
<BR>
$name = &quot;blockhead&quot;;<BR>
$pos = index($quote,$name);<BR>
$len = length($name);<BR>
$sub[1] = substr ($quote, $pos, $len);<BR>
$pos = index($quote,&quot;wrote&quot;);<BR>
$sub[2] = substr ($quote, $pos, 6);<BR>
<BR>
for ($i = 0; $i &lt; 3; $i++) {<BR>
&nbsp;&nbsp;&nbsp;&nbsp;print &quot;\$sub[$i] is \&quot;&quot; . $sub[$i] . &quot;\&quot; \n&quot;;<BR>
}<BR>
<BR>
#<BR>
# To replace a string, let's try substr on the left-hand side.<BR>
#<BR>
# Replace the words 'a blockhead', with the words 'an altruist'.<BR>
# (Sorry Sam.)<BR>
$name = &quot;a blockhead&quot;;<BR>
$pos = index($quote,$name);<BR>
$len = length($name);<BR>
<BR>
substr ($quote, $pos, $len) = &quot;an altruist&quot;;<BR>
print &quot;After substr = $quote \n&quot;;</FONT></TT>

</BLOCKQUOTE>

<HR>

<P>

The output from the code in Listing 7.5 is as follows:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$sub[0] is &quot;t a bl&quot;<BR>

$sub[1] is &quot;blockhead&quot;<BR>

$sub[2] is &quot;wrote &quot;<BR>

<BR>

After substr = No man but an altruist ever wrote except for money</FONT></TT>

</BLOCKQUOTE>

<P>

You can see how the <TT><FONT FACE="Courier">substr</FONT></TT>

function can be used to extract values from another string. Basically,

you tell the <TT><FONT FACE="Courier">substr</FONT></TT> function

how many characters you need and from where, and a copy of that

portion is returned; the original string is left unchanged.

<P>

The <TT><FONT FACE="Courier">substr</FONT></TT> function can also

be used to make substitutions within a string. In this listing,

the words <TT><FONT FACE="Courier">&quot;a blockhead&quot;</FONT></TT>

are replaced by <TT><FONT FACE="Courier">&quot;an altruist&quot;</FONT></TT>.

The part of the string specified by <TT><FONT FACE="Courier">substr</FONT></TT>

is replaced by the value appearing to the right of the assignment

operator. Here's the syntax for these calls to <TT><FONT FACE="Courier">substr</FONT></TT>:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">substr ($<I>master</I>, $<I>offset</I>,

$<I>length</I>) = $<I>newStr</I>;</FONT></TT>

</BLOCKQUOTE>

<P>

<TT><FONT FACE="Courier">$<I>master</I></FONT></TT> must be a

string that can be written to (that is, not a <I>tied</I> variable; <A HREF="ch6.htm" tppabs="http://www.mcp.com/815097600/0-672/0-672-30891-6/ch6.htm" >see Chapter 6</A>,

&quot;Binding Variables to Objects,&quot; for information on using

<TT><FONT FACE="Courier">tie()</FONT></TT> on variables). <TT><FONT FACE="Courier">$<I>offset</I></FONT></TT>

is where the substitution begins, and it covers up to <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>

characters. <TT><FONT FACE="Courier">$<I>offset</I></FONT></TT> must lie within the existing

string; if <TT><FONT FACE="Courier">$<I>offset</I>

+ $<I>length</I></FONT></TT> runs past the end of the string, only the part up to the end

is replaced. The <TT><FONT FACE="Courier">$<I>newStr</I></FONT></TT>

variable can be the empty string if you want to remove the substring

at the offset. To substitute the tail end of the string starting

from the offset, do not specify the <TT><FONT FACE="Courier">$<I>length</I></FONT></TT>

argument.
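<P>
As a minimal sketch of the two cases just described (deleting a substring and replacing the tail of a string), the following reuses the quote from Listing 7.5. The variable names <TT><FONT FACE="Courier">$shorter</FONT></TT> and <TT><FONT FACE="Courier">$tail</FONT></TT> and the replacement text are assumptions made for illustration.

```perl
use strict;
use warnings;

my $quote = "No man but a blockhead ever wrote except for money";

# Assign the empty string to delete a substring.
my $shorter = $quote;
my $name    = "a blockhead ";
substr($shorter, index($shorter, $name), length($name)) = "";
print "$shorter\n";   # No man but ever wrote except for money

# Omit the length to replace everything from the offset to the end.
my $tail = $quote;
substr($tail, index($tail, "except")) = "for love";
print "$tail\n";      # No man but a blockhead ever wrote for love
```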

<P>

For example, this line:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$len = 22; substr ($quote, $pos, $len)

= &quot;an altruist&quot;;</FONT></TT>

</BLOCKQUOTE>

<P>

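replaces the 22 characters <TT><FONT FACE="Courier">a blockhead ever wrote</FONT></TT>, so the last line printed by the previous example becomes:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">After substr = No man but an altruist except for money</FONT></TT>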

</BLOCKQUOTE>

<P>

The offset can be a negative number to count from the

right end of the string. For example, the following line replaces

three characters, starting at the fifth character from the right end of <TT><FONT FACE="Courier">$quote</FONT></TT>,

with the word <TT><FONT FACE="Courier">&quot;cash&quot;</FONT></TT>:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">substr($quote, -5, 3) = &quot;cash&quot;;</FONT></TT>

</BLOCKQUOTE>
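<P>
To make the effect of a negative offset concrete, here is a short sketch applied to the original quote from Listing 7.5. The variable names <TT><FONT FACE="Courier">$last_word</FONT></TT> and <TT><FONT FACE="Courier">$edited</FONT></TT> are illustrative only.

```perl
use strict;
use warnings;

my $quote = "No man but a blockhead ever wrote except for money";

# A negative offset counts back from the end of the string.
my $last_word = substr($quote, -5);

# Replace "mon" (three characters, starting five from the end) with "cash".
my $edited = $quote;
substr($edited, -5, 3) = "cash";

print "$last_word\n";   # money
print "$edited\n";      # No man but a blockhead ever wrote except for cashey
```

Because only three of the last five characters are replaced, the trailing <TT><FONT FACE="Courier">ey</FONT></TT> of "money" survives.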

<P>

The <TT><FONT FACE="Courier">substr</FONT></TT> function is well suited

to cut-and-paste operations on strings whose layout you know in advance.

For more general strings, you have to work with patterns that

can be described using regular expressions. If you are familiar

with the <TT><FONT FACE="Courier">grep</FONT></TT> command in

UNIX, you already know about regular expressions. Basically, a

<I>regular expression</I> is a way of specifying strings like

&quot;all words beginning with the letter a&quot; or &quot;all

strings with an xy in the middle somewhere.&quot; The next section

illustrates how Perl can help make these types of search and replace

patterns easier.

<H3><A NAME="StringSearchingwithPatterns">String Searching with

Patterns</A></H3>

<P>

Perl enables you to match patterns within strings with the <TT><FONT FACE="Courier">=~</FONT></TT>

operator. To see whether a string has a certain pattern in it,

you use the following syntax:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$result = $variable =~ /pattern/</FONT></TT>

</BLOCKQUOTE>

<P>

The value <TT><FONT FACE="Courier">$result</FONT></TT> is <TT><FONT FACE="Courier">true</FONT></TT>

if the pattern is found in <TT><FONT FACE="Courier">$variable</FONT></TT>.

To check whether a string does not have a pattern, you have to

use the <TT><FONT FACE="Courier">!~</FONT></TT> operator, like

this:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">$result = $variable !~ /pattern/</FONT></TT>

</BLOCKQUOTE>
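<P>
A minimal sketch of both operators follows, assuming a sample string and the patterns <TT><FONT FACE="Courier">apple</FONT></TT> and <TT><FONT FACE="Courier">grape</FONT></TT> chosen only for illustration. The result of each match is used as a true or false value.

```perl
use strict;
use warnings;

my $variable = "An apple a day";

my $found   = ($variable =~ /apple/) ? "yes" : "no";   # pattern is present
my $missing = ($variable !~ /grape/) ? "yes" : "no";   # pattern is absent

print "apple found: $found\n";       # apple found: yes
print "grape missing: $missing\n";   # grape missing: yes
```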

<P>

Listing 7.6 shows how to match strings literally. It prints a

message if the string <TT><FONT FACE="Courier">Apple</FONT></TT>,

<TT><FONT FACE="Courier">apple</FONT></TT>, or <TT><FONT FACE="Courier">Orange</FONT></TT>

is found, or if the strings <TT><FONT FACE="Courier">Grape</FONT></TT>

and <TT><FONT FACE="Courier">grape</FONT></TT> are not found.

<HR>

<BLOCKQUOTE>

<B>Listing 7.6. Substitution with patterns.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">#!/usr/bin/perl<BR>
<BR>
$input = &lt;STDIN&gt;;<BR>
chomp($input);&nbsp;&nbsp;# remove the trailing newline, if there is one<BR>
print &quot;Orange found! \n&quot; if ( $input =~ /Orange/ );<BR>
print &quot;Apple found! \n&quot; if ( $input =~ /[Aa]pple/ );<BR>
print &quot;Grape not found! \n&quot; if ( $input !~ /[Gg]rape/ );</FONT></TT>

</BLOCKQUOTE>

<HR>

<P>

So, how did you search for <TT><FONT FACE="Courier">apple</FONT></TT>

and <TT><FONT FACE="Courier">Apple</FONT></TT> in one statement?

This involves specifying a pattern to the search string. The syntax

for the <TT><FONT FACE="Courier">=~</FONT></TT> operator is this:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">[$variable =~] [m]/PATTERN/[i][o][g]</FONT></TT>

</BLOCKQUOTE>

<P>

<TT><FONT FACE="Courier">$variable</FONT></TT> is searched for

the pattern in <TT><FONT FACE="Courier">PATTERN</FONT></TT>. If

<TT><FONT FACE="Courier">$variable =~</FONT></TT> is omitted, <TT><FONT FACE="Courier">$_</FONT></TT>

is searched; the <TT><FONT FACE="Courier">m</FONT></TT> is optional when slashes delimit

the pattern. The <TT><FONT FACE="Courier">i</FONT></TT> specifies

a case-insensitive search. The <TT><FONT FACE="Courier">g</FONT></TT>

is used as an iterator to search more than once on the same string.

The <TT><FONT FACE="Courier">o</FONT></TT> tells Perl to interpolate any variables in

<TT><FONT FACE="Courier">PATTERN</FONT></TT> and compile the pattern only once.

I cover all these options shortly.
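<P>
As a quick sketch of the <TT><FONT FACE="Courier">i</FONT></TT> and <TT><FONT FACE="Courier">g</FONT></TT> options, the following counts matches in a sample sentence. The sentence and variable names are assumptions for illustration; the <TT><FONT FACE="Courier">o</FONT></TT> option is not shown.

```perl
use strict;
use warnings;

my $text = "The cat saw the other Cat";

# i: ignore case, so /THE/i matches "The".
my $any = ($text =~ /THE/i) ? 1 : 0;

# g in scalar context: each evaluation resumes after the previous match.
my $cats = 0;
$cats++ while $text =~ /cat/gi;

# g in list context: return every match at once.
my @thes = ($text =~ /the/gi);

print "$any $cats ", scalar(@thes), "\n";   # prints "1 2 3"
```

The third count is 3 rather than 2 because the pattern also matches the letters inside "other".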

<P>

Let's look at how the patterns in <TT><FONT FACE="Courier">PATTERN</FONT></TT>

are defined. If you are already familiar with the <TT><FONT FACE="Courier">grep</FONT></TT>

utility in UNIX, you are familiar with patterns.

<P>

A character placed in

<TT><FONT FACE="Courier">PATTERN</FONT></TT> matches itself verbatim. For example, <TT><FONT FACE="Courier">/Orange/</FONT></TT>

matches only the string <TT><FONT FACE="Courier">Orange</FONT></TT>.

To match any single character other than a newline, you can use the

dot (<TT><FONT FACE="Courier">.</FONT></TT>) operator. For example,

to match <TT><FONT FACE="Courier">Hat</FONT></TT> or <TT><FONT FACE="Courier">Cat</FONT></TT>,

you would use the pattern:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/.at/</FONT></TT>

</BLOCKQUOTE>

<P>

This also matches <TT><FONT FACE="Courier">Bat</FONT></TT>, <TT><FONT FACE="Courier">hat</FONT></TT>,

<TT><FONT FACE="Courier">Mat</FONT></TT>, and so on. If you just

want to get <TT><FONT FACE="Courier">Cat</FONT></TT> and <TT><FONT FACE="Courier">Hat</FONT></TT>,

you can use a character class using the square brackets (<TT><FONT FACE="Courier">[]</FONT></TT>).

For example, the pattern

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/[CH]at/</FONT></TT>

</BLOCKQUOTE>

<P>

will match <TT><FONT FACE="Courier">Cat</FONT></TT> or <TT><FONT FACE="Courier">Hat</FONT></TT>,

but not <TT><FONT FACE="Courier">cat</FONT></TT>, <TT><FONT FACE="Courier">hat</FONT></TT>,

<TT><FONT FACE="Courier">bat</FONT></TT>, and so on. The characters

in a class are case sensitive. So to allow the lowercase versions,

you would use the pattern:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/[cChH]at/</FONT></TT>

</BLOCKQUOTE>
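<P>
To compare the dot with a character class, the sketch below filters a small word list through three patterns: <TT><FONT FACE="Courier">/.at/</FONT></TT>, an uppercase-only class <TT><FONT FACE="Courier">/[CH]at/</FONT></TT>, and a mixed-case class <TT><FONT FACE="Courier">/[cChH]at/</FONT></TT>. The word list and array names are assumptions for illustration.

```perl
use strict;
use warnings;

my @words = qw(Cat Hat cat hat Bat);

my @dot   = grep { /.at/ }      @words;   # any character before "at"
my @upper = grep { /[CH]at/ }   @words;   # only an uppercase C or H
my @both  = grep { /[cChH]at/ } @words;   # either case of c or h

print "@dot\n";     # Cat Hat cat hat Bat
print "@upper\n";   # Cat Hat
print "@both\n";    # Cat Hat cat hat
```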

<P>

It's cumbersome to list a lot of characters in the <TT><FONT FACE="Courier">[]</FONT></TT>

class, so the dash (<TT><FONT FACE="Courier">-</FONT></TT>) operator

can define a range of characters to use. These two statements

look for a digit:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/[0-9]/<BR>