<HTML><HEAD><TITLE>Recipe 1.15. Parsing Comma-Separated Data (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen &amp; Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:29:20Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch01_01.htm" TITLE="1. Strings">
<LINK REL="prev" HREF="ch01_15.htm" TITLE="1.14. Trimming Blanks from the Ends of a String">
<LINK REL="next" HREF="ch01_17.htm" TITLE="1.16. Soundex Matching">
</HEAD><BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_15.htm" TITLE="1.14. Trimming Blanks from the Ends of a String"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.14. Trimming Blanks from the Ends of a String" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch01_01.htm" TITLE="1. Strings"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_17.htm" TITLE="1.16. Soundex Matching"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.16. Soundex Matching" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch01-chap01_parsing_0">1.15. 
Parsing Comma-Separated Data</A></H2><DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-1657">Problem <A CLASS="indexterm" NAME="ch01-idx-1000010335-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010335-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010335-2"></A><A CLASS="indexterm" NAME="ch01-idx-1000010335-3"></A><A CLASS="indexterm" NAME="ch01-idx-1000010335-4"></A></A></H3><P CLASS="para">You have a data file containing comma-separated values that you need to read in, but the fields may contain quoted commas or escaped quotes. Most spreadsheets and database programs use comma-separated values as a common interchange format.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-1663">Solution</A></H3><P CLASS="para">Use the procedure from <EM CLASS="emphasis">Mastering Regular Expressions</EM>:</P><PRE CLASS="programlisting">sub parse_csv {
    my $text = shift;      # record containing comma-separated values
    my @new  = ();
    # $+ is the text of the highest-numbered capture group that matched,
    # or undef when only the bare comma matched (an empty field)
    push(@new, $+) while $text =~ m{
        # the first part groups the phrase inside the quotes.
        # see explanation of this pattern in MRE
        &quot;([^\&quot;\\]*(?:\\.[^\&quot;\\]*)*)&quot;,?   # quoted field
           |  ([^,]+),?                    # unquoted field
           | ,                             # empty field
       }gx;
    # a trailing comma means one more, empty, field
    push(@new, undef) if substr($text, -1, 1) eq ',';
    return @new;      # list of values that were comma-separated
}</PRE><P CLASS="para">Or use the standard Text::ParseWords module:</P><PRE CLASS="programlisting">use <A CLASS="indexterm" NAME="ch01-idx-1000011467-0"></A>Text::ParseWords;

sub parse_csv {
    return quotewords(&quot;,&quot;, 0, $_[0]);
}</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-1669">Discussion</A></H3><P CLASS="para">Comma-separated input is a deceptively complex format. It sounds simple, but because the fields themselves can contain commas, it needs a fairly involved quoting and escaping system. 
This makes the pattern-matching solution complex and rules out a simple <CODE CLASS="literal">split</CODE> <CODE CLASS="literal">/,/</CODE>.</P><P CLASS="para">Fortunately, Text::ParseWords hides the complexity from you. Pass its <CODE CLASS="literal">quotewords</CODE><A CLASS="indexterm" NAME="ch01-idx-1000010342-0"></A> function three arguments: two control arguments followed by the CSV string. The first is the separator (a comma, in this case), and the second is a true or false value controlling whether the returned fields keep their surrounding quotes; the Solution passes 0 to strip them.</P><P CLASS="para">If you want to represent quotation marks inside a field delimited by quotation marks, escape them with backslashes, &quot;<CODE CLASS="literal">like</CODE> <CODE CLASS="literal">\&quot;this\&quot;</CODE>&quot;. Quotation marks and backslashes are the only characters that have special meaning when backslashed; any other backslash is left in the output string.</P><P CLASS="para">Here's how you'd use either of the <CODE CLASS="literal">parse_csv</CODE><A CLASS="indexterm" NAME="ch01-idx-1000010343-0"></A> subroutines. 
The <CODE CLASS="literal">q&lt;&gt;</CODE> is just a fancy quote, so we didn't have to backslash the quotation marks and apostrophe in the test string.</P><PRE CLASS="programlisting">$line = q&lt;XYZZY,&quot;&quot;,&quot;O'Reilly, Inc&quot;,&quot;Wall, Larry&quot;,&quot;a \&quot;glug\&quot; bit,&quot;,5,&quot;Error, Core Dumped&quot;&gt;;
@fields = parse_csv($line);
for ($i = 0; $i &lt; @fields; $i++) {
    print &quot;$i : $fields[$i]\n&quot;;
}

<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>0 : XYZZY</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>1 :</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>2 : O'Reilly, Inc</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>3 : Wall, Larry</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>4 : a \&quot;glug\&quot; bit,</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>5 : 5</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>6 : Error, Core Dumped</I></CODE></B></CODE></PRE><P CLASS="para">This output comes from the regular-expression version, which leaves backslash escapes in place. The Text::ParseWords version strips them, so field 4 comes back as <CODE CLASS="literal">a &quot;glug&quot; bit,</CODE> instead.</P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch01-pgfId-1705">See Also</A></H3><P CLASS="para">The explanation of regular expression syntax in <EM CLASS="emphasis">perlre</EM>(1) and <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; the documentation for the standard Text::ParseWords module (also in <A CLASS="olink" HREF="../prog/ch07_01.htm">Chapter 7</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>); the section "An Introductory Example: Parsing CSV Text" in Chapter 7 of <EM CLASS="emphasis">Mastering Regular Expressions</EM> 
<A CLASS="indexterm" NAME="ch01-idx-1000010338-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010338-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010338-2"></A><A CLASS="indexterm" NAME="ch01-idx-1000010338-3"></A><A CLASS="indexterm" NAME="ch01-idx-1000010338-4"></A></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_15.htm" TITLE="1.14. Trimming Blanks from the Ends of a String"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 1.14. Trimming Blanks from the Ends of a String" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch01_17.htm" TITLE="1.16. Soundex Matching"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 1.16. Soundex Matching" BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">1.14. Trimming Blanks from the Ends of a String</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">1.16. Soundex Matching</TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"></DIV>
<!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
This page shows the source file ch01_16.htm from "By Tom Christiansen and Nathan Torkington ISBN 1-56592-243-3 First Edition, published August 1998", an HTM file of 466 lines.