<HTML><HEAD>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:28:25Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="index.htm" TITLE="Perl Cookbook">
<LINK REL="prev" HREF="prf2_06.htm" TITLE="Acknowledgments">
<LINK REL="next" HREF="ch01_02.htm" TITLE="1.1. Accessing Substrings">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" />
<map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p>
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="prf2_06.htm" TITLE="Acknowledgments"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: Acknowledgments" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"></FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_02.htm" TITLE="1.1. Accessing Substrings"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.1. Accessing Substrings" BORDER="0"></A></TD>
</TR>
</TABLE>
</DIV>
<DIV CLASS="chapter">
<H1 CLASS="chapter"><A CLASS="title" NAME="ch01-19248">1. Strings</A></H1>
<DIV CLASS="htmltoc">
<P><B>Contents:</B><BR>
<A CLASS="sect1" HREF="#ch01-26961" TITLE="1.0. Introduction">Introduction</A><BR>
<A CLASS="sect1" HREF="ch01_02.htm" TITLE="1.1. Accessing Substrings">Accessing Substrings</A><BR>
<A CLASS="sect1" HREF="ch01_03.htm" TITLE="1.2. Establishing a Default Value">Establishing a Default Value</A><BR>
<A CLASS="sect1" HREF="ch01_04.htm" TITLE="1.3. Exchanging Values Without Using Temporary Variables">Exchanging Values Without Using Temporary Variables</A><BR>
<A CLASS="sect1" HREF="ch01_05.htm" TITLE="1.4. Converting Between ASCII Characters and Values">Converting Between ASCII Characters and Values</A><BR>
<A CLASS="sect1" HREF="ch01_06.htm" TITLE="1.5. Processing a String One Character at a Time">Processing a String One Character at a Time</A><BR>
<A CLASS="sect1" HREF="ch01_07.htm" TITLE="1.6. Reversing a String by Word or Character">Reversing a String by Word or Character</A><BR>
<A CLASS="sect1" HREF="ch01_08.htm" TITLE="1.7. Expanding and Compressing Tabs">Expanding and Compressing Tabs</A><BR>
<A CLASS="sect1" HREF="ch01_09.htm" TITLE="1.8. Expanding Variables in User Input">Expanding Variables in User Input</A><BR>
<A CLASS="sect1" HREF="ch01_10.htm" TITLE="1.9. Controlling Case">Controlling Case</A><BR>
<A CLASS="sect1" HREF="ch01_11.htm" TITLE="1.10. Interpolating Functions and Expressions Within Strings">Interpolating Functions and Expressions Within Strings</A><BR>
<A CLASS="sect1" HREF="ch01_12.htm" TITLE="1.11. Indenting Here Documents">Indenting Here Documents</A><BR>
<A CLASS="sect1" HREF="ch01_13.htm" TITLE="1.12. Reformatting Paragraphs">Reformatting Paragraphs</A><BR>
<A CLASS="sect1" HREF="ch01_14.htm" TITLE="1.13. Escaping Characters">Escaping Characters</A><BR>
<A CLASS="sect1" HREF="ch01_15.htm" TITLE="1.14. Trimming Blanks from the Ends of a String">Trimming Blanks from the Ends of a String</A><BR>
<A CLASS="sect1" HREF="ch01_16.htm" TITLE="1.15. Parsing Comma-Separated Data">Parsing Comma-Separated Data</A><BR>
<A CLASS="sect1" HREF="ch01_17.htm" TITLE="1.16. Soundex Matching">Soundex Matching</A><BR>
<A CLASS="sect1" HREF="ch01_18.htm" TITLE="1.17. Program: fixstyle">Program: fixstyle</A><BR>
<A CLASS="sect1" HREF="ch01_19.htm" TITLE="1.18. Program: psgrep">Program: psgrep</A></P>
<P></P>
</DIV>
<DIV CLASS="epigraph" ALIGN="right">
<P CLASS="para" ALIGN="right"><I>He multiplieth words without knowledge.</I></P>
<P CLASS="attribution" ALIGN="right">- Job 35:16</P>
</DIV>
<DIV CLASS="sect1">
<H2 CLASS="sect1"><A CLASS="title" NAME="ch01-26961">1.0. Introduction</A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="ch01-idx-1000010110-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010110-1"></A>Many programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don't despair, though - Perl isn't a low-level language; lines and strings are easy to handle.</P>
<P CLASS="para">Perl was <EM CLASS="emphasis">designed</EM> for text manipulation. In fact, Perl can manipulate text in so many ways that they can't all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see <A CLASS="xref" HREF="ch06_01.htm" TITLE="Pattern Matching">Chapter 6, <CITE CLASS="chapter">Pattern Matching</CITE></A>, and <A CLASS="xref" HREF="ch08_01.htm" TITLE="File Contents">Chapter 8, <CITE CLASS="chapter">File Contents</CITE></A>, which discuss interesting techniques not covered here.</P>
<P CLASS="para">Perl's fundamental unit for working with data is the <A CLASS="indexterm" NAME="ch01-idx-1000010112-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010112-1"></A>scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to other values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine's double-precision floating-point notation.
Strings in Perl may be of any length (within the limits of your machine's virtual memory) and contain any data you care to put there - even binary data containing null bytes.</P>
<P CLASS="para">A string is not an array of bytes: You cannot use array subscripting on a string to address one of its characters; use <CODE CLASS="literal">substr</CODE> for that. Like all data types in Perl, strings grow and shrink on demand. They get reclaimed by Perl's garbage collection system when they're no longer used, typically when the variables holding them go out of scope or when the expression they were used in has been evaluated. In other words, memory management is already taken care of for you, so you don't have to worry about it.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="ch01-idx-1000010113-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010113-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010113-2"></A>A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value is <CODE CLASS="literal">undef</CODE>. All other values are defined, even 0 and the empty string. Definedness is not the same as Boolean <A CLASS="indexterm" NAME="ch01-idx-1000010679-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010679-1"></A>truth, though; to check whether a value is defined, use the <CODE CLASS="literal">defined</CODE> function. Boolean truth has a specialized meaning, tested with operators like <CODE CLASS="literal">&amp;&amp;</CODE> and <CODE CLASS="literal">||</CODE> or in an <CODE CLASS="literal">if</CODE> or <CODE CLASS="literal">while</CODE> block's test condition.</P>
<P CLASS="para">Two defined strings are <A CLASS="indexterm" NAME="ch01-idx-1000010114-0"></A>false: the <A CLASS="indexterm" NAME="ch01-idx-1000010115-0"></A>empty string ("") and a string of length one containing the digit zero ("<CODE CLASS="literal">0</CODE>"). This second one may surprise you, but Perl does this because of its on-demand conversion between strings and numbers.
The numbers <CODE CLASS="literal">0.</CODE>, <CODE CLASS="literal">0.00</CODE>, and <CODE CLASS="literal">0.0000000</CODE> are all false when unquoted but are not false in strings (the string "<CODE CLASS="literal">0.00</CODE>" is true, not false). All other defined values (e.g., "<CODE CLASS="literal">false</CODE>", <CODE CLASS="literal">15</CODE>, and <CODE CLASS="literal">\$x</CODE>) are true.</P>
<P CLASS="para">The <CODE CLASS="literal">undef</CODE> value behaves like the empty string ("") when used as a string, <CODE CLASS="literal">0</CODE> when used as a number, and the null reference when used as a reference. But in all these cases, it's false. Using an undefined value where Perl expects a defined value will trigger a run-time warning message on STDERR if you've used the <B CLASS="emphasis.bold">-w</B> flag. Merely asking whether something is true or false does not demand a particular value, so this is exempt from a warning. Some operations do not trigger warnings when used on variables holding undefined values. These include the autoincrement and autodecrement operators, <CODE CLASS="literal">++</CODE> and <CODE CLASS="literal">--</CODE>, and the addition and catenation assignment operators, <CODE CLASS="literal">+=</CODE> and <CODE CLASS="literal">.=</CODE>.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="ch01-idx-1000010116-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010116-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010116-2"></A><A CLASS="indexterm" NAME="ch01-idx-1000010116-3"></A>Specify strings in your program either with single quotes, double quotes, the quote-like operators <CODE CLASS="literal">q//</CODE> and <CODE CLASS="literal">qq//</CODE>, or "here documents."
Single quotes are the simplest form of quoting - the only special characters are <CODE CLASS="literal">'</CODE> to terminate the string, <CODE CLASS="literal">\'</CODE> to quote a single quote in the string, and <CODE CLASS="literal">\\</CODE> to quote a backslash in the string:</P>
<PRE CLASS="programlisting">$string = '\n';                     # two characters, \ and an n
$string = 'Jon \'Maddog\' Orwant';  # literal single quotes</PRE>
<P CLASS="para">Double quotes interpolate variables (but not function calls - see <A CLASS="xref" HREF="ch01_11.htm" TITLE="Interpolating Functions and Expressions Within Strings">Recipe 1.10</A> to find how to do this) and expand a lot of backslashed shortcuts: "<CODE CLASS="literal">\n</CODE>" becomes a newline, "<CODE CLASS="literal">\033</CODE>" becomes the character with octal value 33, "<CODE CLASS="literal">\cJ</CODE>" becomes a Ctrl-J, and so on. The full list of these is given in the <EM CLASS="emphasis">perlop</EM>(1) manpage.</P>
<PRE CLASS="programlisting">$string = "\n";                     # a "newline" character
$string = "Jon \"Maddog\" Orwant";  # literal double quotes</PRE>
<P CLASS="para"><A CLASS="indexterm" NAME="ch01-idx-1000010118-0"></A>The <CODE CLASS="literal">q//</CODE> and <CODE CLASS="literal">qq//</CODE> regexp-like quoting operators let you use alternate delimiters for single- and double-quoted strings.
For instance, if you want a literal string that contains single quotes, it's easier to write this than to escape the single quotes with backslashes:</P>
<PRE CLASS="programlisting">$string = q/Jon 'Maddog' Orwant/;   # literal single quotes</PRE>
<P CLASS="para">You can use the same character as delimiter, as we do with / here, or you can balance the delimiters if you use parentheses or paren-like characters:</P>
<PRE CLASS="programlisting">$string = q[Jon 'Maddog' Orwant];   # literal single quotes
$string = q{Jon 'Maddog' Orwant};   # literal single quotes
$string = q(Jon 'Maddog' Orwant);   # literal single quotes
$string = q&lt;Jon 'Maddog' Orwant&gt;;   # literal single quotes</PRE>
<P CLASS="para"><A CLASS="indexterm" NAME="ch01-idx-1000010130-0"></A>"Here documents" are borrowed from the shell. They are a way to quote a large chunk of text. The text can be interpreted as single-quoted, double-quoted, or even as commands to be executed, depending on how you quote the terminating identifier. Here we double-quote two lines with a here document:</P>
<PRE CLASS="programlisting">$a = &lt;&lt;"EOF";
This is a multiline here document
terminated by EOF on a line by itself
EOF</PRE>
<P CLASS="para">Note there's no semicolon after the terminating <CODE CLASS="literal">EOF</CODE>.
Here documents are covered in more detail in <A CLASS="xref" HREF="ch01_12.htm" TITLE="Indenting Here Documents">Recipe 1.11</A>.</P>
<P CLASS="para">A warning for non-Western programmers: Perl doesn't currently directly support multibyte characters (expect <A CLASS="indexterm" NAME="ch01-idx-1000010687-0"></A>Unicode support in 5.006), so we'll be using the terms <EM CLASS="emphasis">byte</EM> and <EM CLASS="emphasis">character</EM> interchangeably.</P>
</DIV>
</DIV>
<DIV CLASS="htmlnav">
<P></P>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer">
<TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="prf2_06.htm" TITLE="Acknowledgments"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: Acknowledgments" BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_02.htm" TITLE="1.1. Accessing Substrings"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.1. Accessing Substrings" BORDER="0"></A></TD>
</TR>
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">Acknowledgments</TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">1.1. Accessing Substrings</TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer">
</DIV>
<!-- LIBRARY NAV BAR -->
<img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links">
<p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p>
<map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map>
</BODY></HTML>