<HTML><HEAD><TITLE>Recipe 6.7. Reading Records with a Pattern Separator (Perl Cookbook)</TITLE>
<META NAME="DC.title" CONTENT="Perl Cookbook">
<META NAME="DC.creator" CONTENT="Tom Christiansen & Nathan Torkington">
<META NAME="DC.publisher" CONTENT="O'Reilly & Associates, Inc.">
<META NAME="DC.date" CONTENT="1999-07-02T01:34:06Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching">
<LINK REL="prev" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines">
<LINK REL="next" HREF="ch06_09.htm" TITLE="6.8. Extracting a Range of Lines">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" />
<map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>
<div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.6. Matching Multiple Lines" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch06_01.htm" TITLE="6. Pattern Matching"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_09.htm" TITLE="6.8. Extracting a Range of Lines"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.8. Extracting a Range of Lines" BORDER="0"></A></TD></TR></TABLE></DIV>
<DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch06-chap06_reading_0">6.7. 
Reading Records with a Pattern Separator</A></H2>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-865">Problem <A CLASS="indexterm" NAME="ch06-idx-1000007592-0"></A><A CLASS="indexterm" NAME="ch06-idx-1000007592-1"></A><A CLASS="indexterm" NAME="ch06-idx-1000007592-2"></A><A CLASS="indexterm" NAME="ch06-idx-1000007592-3"></A><A CLASS="indexterm" NAME="ch06-idx-1000007592-4"></A></A></H3>
<P CLASS="para">You want to read in records separated by a pattern, but Perl doesn't allow its input record separator variable to be a regular expression.</P>
<P CLASS="para">Many problems, most obviously those involving the parsing of complex file formats, become a lot simpler when you are easily able to extract records that might be separated by a number of different strings.<A CLASS="indexterm" NAME="ch06-idx-1000007766-0"></A></P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-873">Solution</A></H3>
<P CLASS="para">Read the whole file and use <CODE CLASS="literal">split</CODE>: <A CLASS="indexterm" NAME="ch06-idx-1000007598-0"></A></P>
<PRE CLASS="programlisting">undef $/;
@chunks = split(/pattern/, &lt;FILEHANDLE&gt;);</PRE></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-883">Discussion</A></H3>
<P CLASS="para">Perl's record separator must be a fixed string, not a pattern. (After all, <EM CLASS="emphasis">awk</EM> has to be better at <EM CLASS="emphasis">something</EM>.) To sidestep this limitation, undefine the input record separator entirely so that the next line-read operation gets the rest of the file. This is sometimes called <I CLASS="firstterm">slurp</I> mode, because it slurps in the whole file as one big string. 
Then <CODE CLASS="literal">split</CODE> that huge string using the record-separating pattern as the first argument.</P>
<P CLASS="para">Here's an example, where the input stream is a text file that includes lines consisting of <CODE CLASS="literal">".Se"</CODE>, <CODE CLASS="literal">".Ch"</CODE>, and <CODE CLASS="literal">".Ss"</CODE>, which are special codes in the <EM CLASS="emphasis">troff</EM> macro set that this book was developed under. These lines are the separators, and we want to find text that falls between them.</P>
<PRE CLASS="programlisting"># .Ch, .Se and .Ss divide chunks of STDIN
{
    local $/ = undef;
    @chunks = split(/^\.(Ch|Se|Ss)$/m, &lt;&gt;);
}
print "I read ", scalar(@chunks), " chunks.\n";</PRE>
<P CLASS="para">We create a localized version of <CODE CLASS="literal">$/</CODE> so its previous value gets restored after the block finishes. By using <CODE CLASS="literal">split</CODE> with parentheses in the pattern, captured separators are also returned. This way the data elements in the return list alternate with elements containing <CODE CLASS="literal">"Se"</CODE>, <CODE CLASS="literal">"Ch"</CODE>, or <CODE CLASS="literal">"Ss"</CODE>.</P>
<P CLASS="para">If you didn't want delimiters returned but still needed parentheses, you could use non-capturing parentheses in the pattern: <CODE CLASS="literal">/^\.(?:Ch|Se|Ss)$/m</CODE>.</P>
<P CLASS="para">If you just want to split <EM CLASS="emphasis">before</EM> a pattern but include the pattern in the return, use a look-ahead assertion: <CODE CLASS="literal">/^(?=\.(?:Ch|Se|Ss))/m</CODE>. That way each chunk starts with the pattern.</P>
<P CLASS="para">Be aware that this uses a lot of memory if the file is large. However, with today's machines and your typical text files, this is less often an issue now than it once was. Just don't try it on a 200-MB logfile unless you have plenty of virtual memory to swap out to disk with! 
Even if you do have enough swap space, you'll likely end up thrashing.<A CLASS="indexterm" NAME="ch06-idx-1000007594-0"></A><A CLASS="indexterm" NAME="ch06-idx-1000007594-1"></A><A CLASS="indexterm" NAME="ch06-idx-1000007594-2"></A><A CLASS="indexterm" NAME="ch06-idx-1000007594-3"></A><A CLASS="indexterm" NAME="ch06-idx-1000007594-4"></A></P></DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-911">See Also</A></H3>
<P CLASS="para">The <CODE CLASS="literal">$/</CODE> variable in <I CLASS="filename">perlvar</I>(1) and in the <A CLASS="olink" HREF="../prog/ch02_09.htm">"Special Variables"</A> section of <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; the <CODE CLASS="literal">split</CODE> function in <I CLASS="filename">perlfunc</I>(1) and <A CLASS="olink" HREF="../prog/ch03_01.htm">Chapter 3</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; we talk more about the special variable <CODE CLASS="literal">$/</CODE> in <A CLASS="xref" HREF="ch08_01.htm" TITLE="File Contents">Chapter 8</A>.</P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_07.htm" TITLE="6.6. Matching Multiple Lines"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 6.6. Matching Multiple Lines" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="book" HREF="index.htm" TITLE="Perl Cookbook"><IMG SRC="../gifs/txthome.gif" ALT="Perl Cookbook" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch06_09.htm" TITLE="6.8. Extracting a Range of Lines"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 6.8. Extracting a Range of Lines" BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228">6.6. 
Matching Multiple Lines</TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><A CLASS="index" HREF="index/index.htm" TITLE="Book Index"><IMG SRC="../gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228">6.8. Extracting a Range of Lines</TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="684" TITLE="footer"><FONT SIZE="-1"></DIV><!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font> </p>
<map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map>
</BODY></HTML>