<HTML>
<HEAD>
<TITLE>Chapter 2 -- Principles of General Text Processing - The Backbone of Perl
</TITLE>
<META>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT SIZE=6 COLOR=#FF0000>Chapter 2</FONT></H1>
<H1><FONT SIZE=6 COLOR=#FF0000>Principles of General Text Processing-<br>The
Backbone of Perl</FONT></H1>
<HR>
<P>
<CENTER><B><FONT SIZE=5><A NAME="CONTENTS">CONTENTS</A></FONT></B></CENTER>
<UL>
<LI><A HREF="#ScalarData">
Scalar Data</A>
<UL>
<LI><A HREF="#FloatsIntegersandLiterals">
Floats, Integers, and Literals</A>
<LI><A HREF="#Integers">
Integers</A>
<LI><A HREF="#CharacterStrings">
Character Strings</A>
<LI><A HREF="#Operators">
Operators</A>
</UL>
<LI><A HREF="#ScalarVariables">
Scalar Variables</A>
<UL>
<LI><A HREF="#TheChopOperator">
The Chop Operator</A>
<LI><A HREF="#InterpolationofScalarsintoStrings">
Interpolation of Scalars into Strings</A>
<LI><A HREF="#StandardInputltSTDINgt">
Standard Input &lt;STDIN&gt;</A>
<LI><A HREF="#ThePrintOperator">
The Print Operator</A>
<LI><A HREF="#TheValueundef">
The Value undef</A>
</UL>
<LI><A HREF="#ArraysDefined">
Arrays Defined</A>
<UL>
<LI><A HREF="#ArrayVariables">
Array Variables</A>
<LI><A HREF="#ArrayOperators">
Array Operators</A>
<LI><A HREF="#ArraysandltSTDINgt">
Arrays and &lt;STDIN&gt;</A>
<LI><A HREF="#ArrayInterpolation">
Array Interpolation</A>
</UL>
</UL>
<HR>
<P>
In the first chapter of this book there was a brief mention of
what text is, how it is the primary building block of CGI communication,
and how good Perl is at dealing with it. Text, to reiterate,
is data in the form of letters, digits, and non-alphanumeric
characters, the same material you use to create text files, HTML files, and
Perl scripts.
<P>
Getting into Perl means getting into text manipulation, which
is what you're going to do in this chapter. You are also going
to explore basic programming concepts as they apply to Perl and
its building blocks, also called data structures.
<P>
The areas of text processing and programming you should understand
from this chapter are scalar data, arrays and list data, control
structures, associative arrays, regular expressions, functions,
filehandles and file tests, and formats. All of these will be
covered in this chapter as they apply to Perl.
<H2><A NAME="ScalarData"><FONT SIZE=5 COLOR=#FF0000>
Scalar Data</FONT></A></H2>
<P>
The term <I>scalar</I> in Perl applies to a single value: either a number,
like 12 or 4.3213e32, or a string of characters, like the words
"Hey, now!" or the entire play <I>Hamlet</I>. Perl makes little
distinction between numbers and character strings and converts
between them as an operation requires. Such numbers and strings are collectively
called<I> scalar data.</I>
<P>
Scalar values are manipulated with operators, and this
manipulation may produce another scalar value. Scalar values are
stored in scalar variables. You can also read scalar values from
files or write them to files.
<P>
While numbers and strings are treated the same by Perl, there
are some fine details that you should be aware of, if only for
the fact that knowing fine details is something that separates
the programmers from the hackers.
<H3><A NAME="FloatsIntegersandLiterals">
Floats, Integers, and Literals</A></H3>
<P>
Some numbers are whole numbers, like 4. Others have a fractional
part, like 2.5 (two and a half), or are written in exponent notation,
like -3.453e32 (negative three point four five three times ten to
the power of thirty-two), which keeps very large and very small
numbers short. In Perl, literals written as whole numbers are called
integers. Those written with a decimal point or an exponent are called
floats.
<P>
Both integers and floats can be written as literals. A <I>literal</I>
is the way a value is written directly in the source code of a program.
This is the data that is fed to the Perl compiler. Perl will accept
the following kinds of number literals (fractions, negatives,
and exponents) as floats:
<UL>
<LI>2.5: two and a half
<LI>5.321e7: 5.321 times 10 to the power of 7
<LI>-8.34e8: negative 8.34 times 10 to the power of 8
<LI>-4.76e-13: negative 4.76 times 10 to the power of negative
13
</UL>
<P>
When using this notation you can substitute an uppercase "E"
for the lowercase "e" without changing the value of
the number.
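<P>
As a minimal sketch, the float literals above can be typed straight
into a script. The variable names here are illustrative only and do
not come from the chapter's listings.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Float literals written with a decimal point or an exponent.
my $half     = 2.5;
my $large    = 5.321e7;
my $negative = -8.34e8;
my $tiny     = -4.76e-13;

print "$half\n";
print "$large\n";
print "$negative\n";
print "$tiny\n";

# An uppercase E is read exactly like a lowercase e.
my $upper = 5.321E7;
print "same value\n" if $large == $upper;
</PRE>
</BLOCKQUOTE>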
<H3><A NAME="Integers">
Integers</A></H3>
<P>
Integers use the familiar notation:
<BLOCKQUOTE>
<PRE>
18
-32
1000032458
</PRE>
</BLOCKQUOTE>
<P>
but don't put the number 0 at the beginning of a decimal integer
literal, because Perl uses a leading zero to mark other number bases:
a literal that starts with 0 is read as octal (0377 is 255), and one
that starts with 0x is read as hexadecimal (0xff is also 255).
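<P>
A short sketch of how a leading zero changes the meaning of an integer
literal. The variable names are made up for this illustration.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $decimal = 255;    # plain decimal integer
my $octal   = 0377;   # leading 0 means octal
my $hex     = 0xff;   # leading 0x means hexadecimal

print "all three are 255\n" if $decimal == $octal and $octal == $hex;

# A stray leading zero silently changes the value.
my $surprise = 010;
print "$surprise\n";  # prints 8, not 10
</PRE>
</BLOCKQUOTE>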
<H3><A NAME="CharacterStrings">
Character Strings</A></H3>
<P>
The characters used to make up character strings, or just strings,
each have an 8-bit value, so Perl recognizes a set of 256 different
characters.
<P>
A string can range in size from having no characters at all to being
as long as your computer's memory allows. This reflects
one of the premises of Perl, which is to have "no built-in
limits" in its various abilities whenever possible.
<P>
This ability to process strings, regardless of the characters
that make them up, is what makes Perl adept at CGI programming.
<P>
Strings can also be written as literals. There are two
kinds of literal strings: single- and double-quoted (see Figure
2.1).
<P>
<A HREF="f2-1.gif"><B>Figure 2.1:</B> <I>Examples of single- and double-quoted strings</I>.</A>
<H4>Single-Quoted Strings</H4>
<P>
If a string is enclosed in a pair of single quotes, like 'Hey,
now!', it is called a single-quoted string. These quote marks
are not part of the string; they merely indicate to Perl where
the string starts and ends. If you want to put a single quote
inside a string (and not have it treated as the delimiter for
your string), precede it with a backslash, since the backslash
is used to denote special characters. If you want to put a backslash
into your string, precede the backslash with another backslash.
These are the only two cases in which a backslash has a special
meaning inside a single-quoted string.
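<P>
The two escapes can be sketched as follows. The sample strings and
variable names are invented for this illustration; the last one shows
that \n is not an escape inside single quotes.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $with_quote     = 'Don\'t panic';   # \' gives a single quote
my $with_backslash = 'C:\\perl';       # \\ gives one backslash
my $no_newline     = 'Hey, now!\n';    # \n stays as two characters

print "$with_quote\n";
print "$with_backslash\n";
print length($no_newline), "\n";       # prints 11
</PRE>
</BLOCKQUOTE>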
<H4>Double-Quoted Strings</H4>
<P>
When a string is enclosed in a pair of double quotes, like "Hey,
now!", it is a double-quoted string. With double-quoted strings
the backslash has much more "oomph." Inside the double
quotes, a backslash can be used to indicate control characters
or the octal and even hexadecimal codes of special characters.
Examples of such might be:
<UL>
<LI>"Hey, now!\n": the string Hey, now! followed
by a newline character
<LI>"No flipping! \177": the string No flipping!
followed by octal 177, the delete character
<LI>"Live \tto tape": the string Live to tape with a tab
inserted after Live, so that the output shows Live and to tape
separated by a tab
</UL>
<P>
The backslash can introduce many special sequences inside a double-quoted string.
These are called backslash escapes. Table 2.1 outlines them.<BR>
<P>
<CENTER><B>Table 2.1 Double-Quoted String Backslash Escapes</B></CENTER>
<P>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=60% CELLPADDING=3>
<TR VALIGN=TOP><TD><CENTER><B>Backslash Escape</B></CENTER></TD>
<TD><B>Command Function</B></TD></TR>
<TR VALIGN=TOP><TD><CENTER>\n</CENTER></TD><TD>Newline
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\r</CENTER></TD><TD>Return
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\t</CENTER></TD><TD>Tab</TD>
</TR>
<TR VALIGN=TOP><TD><CENTER>\f</CENTER></TD><TD>Formfeed
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\b</CENTER></TD><TD>Backspace
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\013</CENTER></TD><TD>Vertical tab (written as octal 013; Perl strings have no \v escape)
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\a</CENTER></TD><TD>Bell</TD>
</TR>
<TR VALIGN=TOP><TD><CENTER>\e</CENTER></TD><TD>Escape
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\177</CENTER></TD><TD>Any octal ASCII value (here 177, the delete character)
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\x7f </CENTER></TD><TD>Any hex ASCII value (here 7f, the delete character)
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\cC</CENTER></TD><TD>Any control character (here Control-C)
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\\ </CENTER></TD><TD>Backslash
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\"</CENTER></TD><TD>Double-quote
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\l</CENTER></TD><TD>Make the next letter lowercase
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\L</CENTER></TD><TD>Make all the next letters lowercase until \E
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\u</CENTER></TD><TD>Make the next letter uppercase
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\U</CENTER></TD><TD>Make all the next letters uppercase until \E
</TD></TR>
<TR VALIGN=TOP><TD><CENTER>\E </CENTER></TD><TD>Terminate \L or \U
</TD></TR>
</TABLE></CENTER>
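<P>
A brief sketch of a few of these escapes at work. The strings and
variable names are chosen only for illustration.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# The delete character written in octal and in hexadecimal.
my $delete_octal = "\177";
my $delete_hex   = "\x7f";
print "same character\n" if $delete_octal eq $delete_hex;

# A tab and a newline inside one string.
my $tabbed = "Live\tto tape\n";
print $tabbed;

# Case-changing escapes applied to an interpolated variable.
my $name  = "perl";
my $first = "\u$name";          # Perl
my $shout = "\U$name\E rocks";  # PERL rocks
print "$first\n";
print "$shout\n";
</PRE>
</BLOCKQUOTE>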
<P>
<P>
One other facet of double-quoted strings is that they are variable
interpolated. This means that a variable named inside the string is
replaced by its current value when the string is evaluated.
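<P>
A minimal sketch of the difference, using a hypothetical variable
$show that does not appear in the chapter's listings:
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

my $show = 'Hamlet';

my $double = "Now playing: $show";   # $show is replaced by Hamlet
my $single = 'Now playing: $show';   # $show stays as literal text

print "$double\n";
print "$single\n";
</PRE>
</BLOCKQUOTE>
<P>
The second print shows the text $show itself, because interpolation
happens only once: the value of $single is not scanned again for
variables.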
<P>
Relating this back to what we know about CGI, we can write a script
that demonstrates the difference between single- and double-quoted
strings and sends the result to our browser as an HTML page. In Listing
2.1 we will also encounter some Perl commands, which are explained
in comments that start with the "#" symbol.
<HR>
<BLOCKQUOTE>
<B>Listing 2.1 Perl Command Script<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<PRE>
#!/usr/bin/perl
# quotes_examples.pl
print "Content-type: text/html\n\n";   # print is a command that outputs
                                       # data. The string being printed is a
                                       # common header used in CGI for
                                       # returning HTML documents
$date = `date`;    # date is a system command, run here with backquotes,
                   # and $date is a scalar variable
chop($date);       # chop is an operator; it removes the trailing newline
print &lt;&lt;"eop";    # the eop tag in double quotes: variables are interpolated
&lt;HTML&gt;
&lt;HEAD&gt;
&lt;TITLE&gt;Examples of single and double quoted strings&lt;/TITLE&gt;
&lt;/HEAD&gt;
&lt;BODY&gt;
&lt;H2&gt;Examples of single and double quoted strings&lt;/H2&gt;
&lt;P&gt;
Hey, now!
&lt;BR&gt;
Today the date is $date.
&lt;HR NOSHADE&gt;
eop
print &lt;&lt;'eop';    # the eop tag in single quotes: no interpolation
&lt;H2&gt;Examples of single and double quoted strings&lt;/H2&gt;
&lt;P&gt;
Hey, now!
&lt;BR&gt;
Today the date is $date.
&lt;HR NOSHADE&gt;
&lt;/BODY&gt;
&lt;/HTML&gt;
eop
</PRE>
</BLOCKQUOTE>
<HR>
<P>
Right away you will see that using double quotes on the eop tag
has a much different effect than using single quotes. The scalar
variable $date is set from the system command date, which must be
enclosed in backquotes (`date`) rather than single quotes.
Backquotes direct Perl to execute the system command they enclose
and return its output; single quotes would simply store the four
letters date. The "=" symbol is the assignment operator, which
tells Perl to assign the output of the system command to the scalar
variable $date.
<P>
The Perl operator chop removes the last character from
the argument within its parentheses. In quotes_examples.pl chop
takes off the last character of the scalar variable $date, which is
the newline that the date command adds to the end of its output.
There are more handy uses for chop listed in the
next section.
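<P>
A small sketch of chop on its own. The date string below is a made-up
sample standing in for the output of the system command, so the
example does not depend on any particular system.
<BLOCKQUOTE>
<PRE>
use strict;
use warnings;

# Sample text ending in a newline, as command output usually does.
my $date = "Tue Jan  2 10:15:00 1996\n";
chop($date);                          # removes the trailing newline
print "Today the date is $date.\n";

# chop removes the last character whatever it is, newline or not.
my $word = "Perl";
chop($word);
print "$word\n";                      # prints Per
</PRE>
</BLOCKQUOTE>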
<P>
The Perl operator print sends its arguments to standard output; when
you run the script from a browser, the Web server returns that output
as the page. In this case the argument is all of the text between the
print line and the closing eop tag. Note that eop is not a variable;
it is only the label that marks where the text ends. The arguments to
print can be wrapped in a set of parentheses, so that our example
becomes:
<BLOCKQUOTE>
<PRE>
print(&lt;&lt;"eop");
</PRE>
</BLOCKQUOTE>
<P>
which is less ambiguous syntax than:
<BLOCKQUOTE>
<PRE>
print &lt;&lt;"eop";
</PRE>
</BLOCKQUOTE>
<P>
but in almost all cases leaving off the parentheses will not affect
your script. The parentheses help get rid of any ambiguity that
may exist in a larger Perl program. Keep this in mind if you are
having trouble with your larger scripts.
<P>
In the first print statement double quotes are used. This tells
Perl to interpolate any variables that occur in the print string
between the eop tags. Perl therefore replaces $date with its value,
the current date returned by the system command date.
<P>
When single quotes are used around the eop tag, they tell
Perl to ignore all variables inside the print string. This makes
the text $date part of the HTML document, and so it is
presented on the page with the other text. Amazing what one little
pair of quotes can do to you if you're not careful, eh?
<P>
Both chop and print are Perl operators. There are more commands
like this in Perl that will help you get things done in your scripts.
<H3><A NAME="Operators">
Operators</A></H3>
<P>
An operator in Perl makes a new value, called a <I>result</I>,
from one or more other values, called <I>operands</I>. An example of this
might be the plus sign used in simple addition. The operator "+"
can take two operands, like 1 and 2, and
make the result 3, as in 1 + 2 = 3. Operators work
on both numbers and character strings in conjunction with
suitable operands.
<P>
If you accidentally use a numeric operator with a string, Perl will
convert the string based on the operator, not on the kind of value.
Perl uses any number at the beginning of the string and ignores the
rest, and a string that does not begin with a number counts as 0.
If you put a "+" operator between "90210 Beverly Hills"
and "11 Oceans" you'll end up with the numeric
result 90221, whereas "Beverly Hills 90210" plus "Oceans 11"
gives 0, because neither string starts with a number.
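<P>
A short sketch of how the "+" operator converts strings: Perl uses
only a number at the start of each string, so the order of the words
and digits matters. The variable names are illustrative, and
<I>use warnings</I> is left out on purpose because it would flag each
of these conversions.
<BLOCKQUOTE>
<PRE>
use strict;

# Each string starts with digits, so those digits are used.
my $sum = "90210 Beverly Hills" + "11 Oceans";
print "$sum\n";    # prints 90221

# Neither string starts with a number, so each counts as 0.
my $zero = "Beverly Hills 90210" + "Oceans 11";
print "$zero\n";   # prints 0
</PRE>
</BLOCKQUOTE>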
<P>
If you, with equal abandon, put a string operator between two numbers,
each number is converted into its string
equivalent. An example would be putting the string concatenation operator
"." (concatenation is a pretty fancy word that means putting two strings together)
between "The Dirty" and (2*6) like this: