<HTML><HEAD><TITLE>Recipe 1.5. Processing a String One Character at a Time (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:28:50Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch01_01.htm" TITLE="1. Strings">
<LINK REL="prev" HREF="ch01_05.htm" TITLE="1.4. Converting Between ASCII Characters and Values">
<LINK REL="next" HREF="ch01_07.htm" TITLE="1.6. Reversing a String by Word or Character"></HEAD>
<BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_05.htm" TITLE="1.4. Converting Between ASCII Characters and Values"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.4. Converting Between ASCII Characters and Values" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch01_01.htm" TITLE="1. Strings"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_07.htm" TITLE="1.6. Reversing a String by Word or Character"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.6. 
Reversing a String by Word or Character" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch01-16077">1.5. Processing a String One Character at a Time</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-539">Problem <A CLASS="indexterm" NAME="ch01-idx-1000010209-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010209-1"></A></A></H3>
<P CLASS="para">You want to process a string one character at a time.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-545">Solution</A></H3>
<P CLASS="para">Use <KBD CLASS="command">split</KBD><A CLASS="indexterm" NAME="ch01-idx-1000010210-0"></A> with a null pattern to break up the string into individual characters, or use <KBD CLASS="command">unpack</KBD> if you just want their ASCII values:</P>
<PRE CLASS="programlisting">@array = split(//, $string);

@array = unpack(&quot;C*&quot;, $string);</PRE>
<P CLASS="para">Or extract each character in turn with a loop:</P>
<PRE CLASS="programlisting">    while (/(.)/g) { # . is never a newline here
        # do something with $1
    }</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-549">Discussion</A></H3>
<P CLASS="para">As we said before, Perl's fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more easily. See, for example, <A CLASS="xref" HREF="ch07_08.htm" TITLE="Writing a Filter">Recipe 7.7</A>, where a set of substitutions is used to find command-line arguments.</P>
<P CLASS="para">Splitting on a pattern that matches the empty string returns a list of the individual characters in the string. This is a convenient feature when done intentionally, but it's easy to do unintentionally. For instance, <CODE CLASS="literal">/X*/</CODE> matches the empty string. 
Odds are you will find others when you don't mean to.</P>
<P CLASS="para">Here's an example that prints the characters used in the string &quot;<CODE CLASS="literal">an</CODE> <CODE CLASS="literal">apple</CODE> <CODE CLASS="literal">a</CODE> <CODE CLASS="literal">day</CODE>&quot;, sorted in ascending ASCII order:</P>
<PRE CLASS="programlisting">%seen = ();
$string = &quot;an apple a day&quot;;
foreach $byte (split //, $string) {
    $seen{$byte}++;
}
print &quot;unique chars are: &quot;, sort(keys %seen), &quot;\n&quot;;

<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>unique chars are:  adelnpy</I></CODE></B></CODE></PRE>
<P CLASS="para">These <CODE CLASS="literal">split</CODE> and <CODE CLASS="literal">unpack</CODE> solutions give you an array of characters to work with. If you don't want an array, you can use a pattern match with the <CODE CLASS="literal">/g</CODE> flag in a <CODE CLASS="literal">while</CODE> loop, extracting one character at a time:</P>
<PRE CLASS="programlisting">%seen = ();
$string = &quot;an apple a day&quot;;
while ($string =~ /(.)/g) {
    $seen{$1}++;
}
print &quot;unique chars are: &quot;, sort(keys %seen), &quot;\n&quot;;

<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>unique chars are:  adelnpy</I></CODE></B></CODE></PRE>
<P CLASS="para">In general, if you find yourself doing character-by-character processing, there's probably a better way to go about it. Instead of using <CODE CLASS="literal">index</CODE> and <CODE CLASS="literal">substr</CODE> or <CODE CLASS="literal">split</CODE> and <CODE CLASS="literal">unpack</CODE>, it might be easier to use a pattern. Instead of computing a 32-bit checksum by hand, as in the next example, the <CODE CLASS="literal">unpack</CODE> function can compute it far more efficiently.</P>
<P CLASS="para">The following example calculates the checksum of <CODE CLASS="literal">$string</CODE> with a <CODE CLASS="literal">foreach</CODE> loop. 
There are better checksums; this just happens to be the basis of a traditional and computationally easy checksum. See the MD5 module from CPAN if you want a sounder checksum.</P>
<PRE CLASS="programlisting">$sum = 0;
foreach $ascval (unpack(&quot;C*&quot;, $string)) {
    $sum += $ascval;
}
print &quot;sum is $sum\n&quot;;
# prints &quot;sum is 1248&quot; if $string was &quot;an apple a day&quot;</PRE>
<P CLASS="para">This does the same thing, but much faster:</P>
<PRE CLASS="programlisting">$sum = unpack(&quot;%32C*&quot;, $string);</PRE>
<P CLASS="para">This lets us emulate the SysV checksum program:</P>
<PRE CLASS="programlisting">#!/usr/bin/perl
# sum - compute 16-bit checksum of all input files
$checksum = 0;
while (&lt;&gt;) { $checksum += unpack(&quot;%16C*&quot;, $_) }
$checksum %= (2 ** 16) - 1;
print &quot;$checksum\n&quot;;</PRE>
<P CLASS="para">Here's an example of its use:</P>
<PRE CLASS="programlisting">% perl sum /etc/termcap
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>1510</I></CODE></B></CODE></PRE>
<P CLASS="para">If you have the GNU version of <EM CLASS="emphasis">sum</EM>, you'll need to call it with the <B CLASS="emphasis.bold">--sysv</B> option to get the same answer on the same file.</P>
<PRE CLASS="programlisting">% sum --sysv /etc/termcap
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>1510 851 /etc/termcap</I></CODE></B></CODE></PRE>
<P CLASS="para">Another tiny program that processes its input one character at a time is <EM CLASS="emphasis">slowcat</EM>, shown in <A CLASS="xref" HREF="ch01_06.htm#ch01-23073" TITLE="slowcat">Example 1.1</A>. 
The idea here is to pause after each character is printed so you can scroll text before an audience slowly enough that they can read it.</P>
<DIV CLASS="example"><H4 CLASS="example"><A CLASS="title" NAME="ch01-23073">Example 1.1: slowcat</A></H4>
<PRE CLASS="programlisting">#!/usr/bin/perl
# <A CLASS="indexterm" NAME="ch01-idx-1000011066-0"></A>slowcat - emulate a   s l o w   line printer
# usage: slowcat [-DELAY] [files ...]
$DELAY = ($ARGV[0] =~ /^-([.\d]+)/) ? (shift, $1) : 1;
$| = 1;
while (&lt;&gt;) {
    for (split(//)) {
        print;
        select(undef,undef,undef, 0.005 * $DELAY);
    }
}</PRE></DIV></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-1000011087">See Also</A></H3>
<P CLASS="para">The <CODE CLASS="literal">split</CODE> and <CODE CLASS="literal">unpack</CODE> functions in <EM CLASS="emphasis">perlfunc</EM>(1) and <A CLASS="olink" HREF="../prog/ch03_01.htm">Chapter 3</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; the use of <CODE CLASS="literal">select</CODE> for timing is explained in <A CLASS="xref" HREF="ch03_11.htm" TITLE="Short Sleeps">Recipe 3.10</A><A CLASS="indexterm" NAME="ch01-idx-1000010212-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010212-1"></A></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_05.htm" TITLE="1.4. Converting Between ASCII Characters and Values"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.4. Converting Between ASCII Characters and Values" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_07.htm" TITLE="1.6. 
Reversing a String by Word or Character"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.6. Reversing a String by Word or Character" BORDER="0"></A></TD></TR>
<TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">1.4. Converting Between ASCII Characters and Values</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">1.6. Reversing a String by Word or Character</TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV>
<!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
Source: ch01_06.htm from the Perl Cookbook by Tom Christiansen and Nathan Torkington (ISBN 1-56592-243-3, First Edition, published August 1998).