By Tom Christiansen and Nathan Torkington. ISBN 1-56592-243-3. First Edition, published August 1998.
<HTML><HEAD><TITLE>Recipe 1.1. Accessing Substrings (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:28:42Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch01_01.htm" TITLE="1. Strings">
<LINK REL="prev" HREF="ch01_01.htm" TITLE="1.0. Introduction">
<LINK REL="next" HREF="ch01_03.htm" TITLE="1.2. Establishing a Default Value">
</HEAD>
<BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_01.htm" TITLE="1.0. Introduction"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.0. Introduction" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch01_01.htm" TITLE="1. Strings"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_03.htm" TITLE="1.2. Establishing a Default Value"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.2. Establishing a Default Value" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch01-11736">1.1. 
Accessing Substrings</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-99">Problem<A CLASS="indexterm" NAME="ch01-idx-1000010134-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010134-1"></A></A></H3>
<P CLASS="para">You want to access or modify just a portion of a string, not the whole thing. For instance, you've read a fixed-width record and want to extract the individual fields.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-105">Solution</A></H3>
<P CLASS="para">The <CODE CLASS="literal">substr</CODE><A CLASS="indexterm" NAME="ch01-idx-1000010135-0"></A> function lets you read from and write to bits of the string.</P>
<PRE CLASS="programlisting">$value = substr($string, $offset, $count);
$value = substr($string, $offset);

substr($string, $offset, $count) = $newstring;
substr($string, $offset)         = $newtail;</PRE>
<P CLASS="para">The <CODE CLASS="literal">unpack</CODE><A CLASS="indexterm" NAME="ch01-idx-1000010136-0"></A> function gives only read access, but is faster when you have many substrings to extract.</P>
<PRE CLASS="programlisting"># get a 5-byte string, skip 3, then grab 2 8-byte strings, then the rest
($leading, $s1, $s2, $trailing) =
    unpack(&quot;A5 x3 A8 A8 A*&quot;, $data);

# split at five byte boundaries
@fivers = unpack(&quot;A5&quot; x (length($string)/5), $string);

# chop string into individual characters
@chars  = unpack(&quot;A1&quot; x length($string), $string);</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-141">Discussion</A></H3>
<P CLASS="para">Unlike many other languages that represent strings as arrays of bytes (or characters), in Perl, strings are a basic data type. 
This means that you must use functions like <CODE CLASS="literal">unpack</CODE> or <CODE CLASS="literal">substr</CODE> to access individual characters or a portion of the string.</P>
<P CLASS="para">The offset argument to <CODE CLASS="literal">substr</CODE> indicates the start of the substring you're interested in, counting from the front if positive and from the end if negative. If offset is 0, the substring starts at the beginning. The count argument is the length of the substring.</P>
<PRE CLASS="programlisting">$string = &quot;This is what you have&quot;;
#         +012345678901234567890  Indexing forwards  (left to right)
#          109876543210987654321- Indexing backwards (right to left)
#           note that 0 means 10 or 20, etc. above
$first  = substr($string, 0, 1);  # &quot;T&quot;
$start  = substr($string, 5, 2);  # &quot;is&quot;
$rest   = substr($string, 13);    # &quot;you have&quot;
$last   = substr($string, -1);    # &quot;e&quot;
$end    = substr($string, -4);    # &quot;have&quot;
$piece  = substr($string, -8, 3); # &quot;you&quot;</PRE>
<P CLASS="para">You can do more than just look at parts of the string with <CODE CLASS="literal">substr</CODE>; you can actually change them. That's because <CODE CLASS="literal">substr</CODE> is a particularly odd kind of function &nbsp;- an <EM CLASS="emphasis">lvaluable</EM><A CLASS="indexterm" NAME="ch01-idx-1000010151-0"></A> one, that is, a function that may itself be assigned a value. (For the record, the others are <CODE CLASS="literal">vec</CODE>, <CODE CLASS="literal">pos</CODE>, and as of the 5.004 release, <CODE CLASS="literal">keys</CODE>. 
If you squint, <CODE CLASS="literal">local</CODE> and <CODE CLASS="literal">my</CODE> can also be viewed as lvaluable functions.)</P>
<PRE CLASS="programlisting">$string = &quot;This is what you have&quot;;
print $string;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>This is what you have</I></CODE></B></CODE>
substr($string, 5, 2) = &quot;wasn't&quot;; # change &quot;is&quot; to &quot;wasn't&quot;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>This wasn't what you have</I></CODE></B></CODE>
substr($string, -12)  = &quot;ondrous&quot;;# &quot;This wasn't wondrous&quot;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>This wasn't wondrous</I></CODE></B></CODE>
substr($string, 0, 1) = &quot;&quot;;       # delete first character
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>his wasn't wondrous</I></CODE></B></CODE>
substr($string, -10)  = &quot;&quot;;       # delete last 10 characters
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>his wasn'</I></CODE></B></CODE></PRE>
<P CLASS="para">You can use the <CODE CLASS="literal">=~</CODE><A CLASS="indexterm" NAME="ch01-idx-1000010150-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010150-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010150-2"></A><A CLASS="indexterm" NAME="ch01-idx-1000010150-3"></A> operator and the <CODE CLASS="literal">s///</CODE>, <CODE CLASS="literal">m//</CODE>, or <CODE CLASS="literal">tr///</CODE> operators in conjunction with <CODE CLASS="literal">substr</CODE> to make them affect only that portion of the string.</P>
<PRE CLASS="programlisting"># you can test substrings with =~
if (substr($string, -10) =~ /pattern/) {
    print &quot;Pattern matches in last 10 characters\n&quot;;
}

# substitute &quot;at&quot; for &quot;is&quot;, restricted to first five characters
substr($string, 0, 5) =~ s/is/at/g;</PRE>
<P CLASS="para">You can even swap values by using several <CODE CLASS="literal">substr</CODE>s on each side of an assignment:</P>
<PRE CLASS="programlisting"># exchange the first and last letters in a string
$a = &quot;make a hat&quot;;
(substr($a,0,1), substr($a,-1)) = (substr($a,-1), substr($a,0,1));
print $a;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>take a ham</I></CODE></B></CODE></PRE>
<P CLASS="para">Although <CODE CLASS="literal">unpack</CODE> is not lvaluable, it is considerably faster than <CODE CLASS="literal">substr</CODE> when you extract numerous values at once. It doesn't directly support offsets as <CODE CLASS="literal">substr</CODE> does. Instead, it uses lowercase &quot;<CODE CLASS="literal">x</CODE>&quot; with a count to skip forward some number of bytes and an uppercase &quot;<CODE CLASS="literal">X</CODE>&quot; with a count to skip backward some number of bytes.</P>
<PRE CLASS="programlisting"># extract column with unpack
$a = &quot;To be or not to be&quot;;
$b = unpack(&quot;x6 A6&quot;, $a);  # skip 6, grab 6
print $b;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>or not</I></CODE></B></CODE>

($b, $c) = unpack(&quot;x6 A2 X5 A2&quot;, $a); # forward 6, grab 2; backward 5, grab 2
print &quot;$b\n$c\n&quot;;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>or</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>be</I></CODE></B></CODE></PRE>
<P CLASS="para">Sometimes you prefer to think of your data as being cut up at specific <A CLASS="indexterm" NAME="ch01-idx-1000010152-0"></A>columns. For example, you might want to place cuts right before positions 8, 14, 20, 26, and 30. Those are the column numbers where each field begins. Although you could calculate that the proper <CODE CLASS="literal">unpack</CODE> format is &quot;<CODE CLASS="literal">A7</CODE> <CODE CLASS="literal">A6</CODE> <CODE CLASS="literal">A6</CODE> <CODE CLASS="literal">A6</CODE> <CODE CLASS="literal">A4</CODE> <CODE CLASS="literal">A*</CODE>&quot;, this is too much mental strain for the virtuously lazy Perl programmer. Let Perl figure it out for you. 
Use the <CODE CLASS="literal">cut2fmt</CODE> function below:</P>
<PRE CLASS="programlisting">sub cut2fmt {
    my(@positions) = @_;
    my $template   = '';
    my $lastpos    = 1;
    foreach my $place (@positions) {
        $template .= &quot;A&quot; . ($place - $lastpos) . &quot; &quot;;
        $lastpos   = $place;
    }
    $template .= &quot;A*&quot;;
    return $template;
}

$fmt = cut2fmt(8, 14, 20, 26, 30);
print &quot;$fmt\n&quot;;
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>A7 A6 A6 A6 A4 A*</I></CODE></B></CODE></PRE>
<P CLASS="para">The powerful <CODE CLASS="literal">unpack</CODE> function goes far beyond mere text processing. It's the gateway between text and binary data.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-279">See Also</A></H3>
<P CLASS="para">The <CODE CLASS="literal">unpack</CODE> and <CODE CLASS="literal">substr</CODE> functions in <EM CLASS="emphasis">perlfunc</EM>(1) and <A CLASS="olink" HREF="../prog/ch03_01.htm">Chapter 3</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; the <EM CLASS="emphasis">cut2fmt</EM> subroutine of <A CLASS="xref" HREF="ch01_19.htm" TITLE="Program: psgrep">Recipe 1.18</A>; the binary use of <CODE CLASS="literal">unpack</CODE> in <A CLASS="xref" HREF="ch08_19.htm" TITLE="Program: tailwtmp">Recipe 8.18</A> <A CLASS="indexterm" NAME="ch01-idx-1000010145-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010145-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010145-2"></A></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_01.htm" TITLE="1.0. Introduction"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.0. 
Introduction" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_03.htm" TITLE="1.2. Establishing a Default Value"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.2. Establishing a Default Value" BORDER="0"></A></TD></TR>
<TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">1.0. Introduction</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">1.2. Establishing a Default Value</TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>