<HTML><HEAD><TITLE>Recipe 7.7. Writing a Filter (Perl Cookbook)</TITLE><META NAME="DC.title" CONTENT="Perl Cookbook"><META NAME="DC.creator" CONTENT="Tom Christiansen & Nathan Torkington"><META NAME="DC.publisher" CONTENT="O'Reilly & Associates, Inc."><META NAME="DC.date" CONTENT="1999-07-02T01:36:12Z"><META NAME="DC.type" CONTENT="Text.Monograph"><META NAME="DC.format" CONTENT="text/html" SCHEME="MIME"><META NAME="DC.source" CONTENT="1-56592-243-3" SCHEME="ISBN"><META NAME="DC.language" CONTENT="en-US"><META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0"><LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments"><LINK REL="up" HREF="ch07_01.htm" TITLE="7. File Access"><LINK REL="prev" HREF="ch07_07.htm" TITLE="7.6. Storing Files Inside Your Program Text"><LINK REL="next" HREF="ch07_09.htm" TITLE="7.8. Modifying a File in Place with Temporary File"></HEAD><BODY BGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><p><TABLE WIDTH="684" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch07_07.htm" TITLE="7.6. Storing Files Inside Your Program Text"><IMG SRC="../gifs/txtpreva.gif" ALT="Previous: 7.6. Storing Files Inside Your Program Text" BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="228"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1"><A CLASS="chapter" REL="up" HREF="ch07_01.htm" TITLE="7. File Access"></A></FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="228"><A CLASS="sect1" HREF="ch07_09.htm" TITLE="7.8. Modifying a File in Place with Temporary File"><IMG SRC="../gifs/txtnexta.gif" ALT="Next: 7.8. 
Modifying a File in Place with Temporary File" BORDER="0"></A></TD></TR></TABLE></DIV><DIV CLASS="sect1"><H2 CLASS="sect1"><A CLASS="title" NAME="ch07-39704">7.7. Writing a Filter</A></H2><DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch07-pgfId-726">Problem</A></H3><P CLASS="para"><A CLASS="indexterm" NAME="ch07-idx-1000009627-0"></A><A CLASS="indexterm" NAME="ch07-idx-1000009627-1"></A><A CLASS="indexterm" NAME="ch07-idx-1000009627-2"></A>You want to write a program that takes a list of filenames on the command line and reads from STDIN if no filenames were given. You'd like the user to be able to give the file <CODE CLASS="literal">"-"</CODE> to indicate STDIN or <CODE CLASS="literal">"someprogram |"</CODE> to indicate the output of another program. You might want your program to modify the files in place or to produce output based on its input.</P></DIV><DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch07-pgfId-732">Solution</A></H3><P CLASS="para">Read lines with <CODE CLASS="literal">&lt;&gt;</CODE>: <A CLASS="indexterm" NAME="ch07-idx-1000009648-0"></A><A CLASS="indexterm" NAME="ch07-idx-1000009648-1"></A></P><PRE CLASS="programlisting">while (&lt;&gt;) {
    # do something with the line
}</PRE></DIV><DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch07-pgfId-744"><CODE CLASS="literal"></CODE><A CLASS="indexterm" NAME="ch07-idx-1000012042-0"></A>Discussion</A></H3><P CLASS="para">When you say:</P><PRE CLASS="programlisting">while (&lt;&gt;) {
    # ...
}</PRE><P CLASS="para">Perl translates this into:[<A CLASS="footnote" HREF="#ch07-pgfId-1000001149">4</A>]</P><BLOCKQUOTE CLASS="footnote"><DIV CLASS="footnote"><P CLASS="para"><A CLASS="footnote" NAME="ch07-pgfId-1000001149">[4]</A> Except that the code written here won't work, because <CODE CLASS="literal">ARGV</CODE> has internal magic.</P></DIV></BLOCKQUOTE><PRE CLASS="programlisting">unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift @ARGV) {
    unless (open(ARGV, $ARGV)) {
        warn "Can't open $ARGV: $!\n";
        next;
    }
    while (defined($_ = &lt;ARGV&gt;)) {
        # ...
    }
}</PRE><P CLASS="para">You can access the <CODE CLASS="literal">ARGV</CODE> filehandle and the <CODE CLASS="literal">$ARGV</CODE> variable inside the loop to read more from the filehandle or to find the name of the file currently being processed. Let's look at how this works.</P><DIV CLASS="sect3"><H4 CLASS="sect3"><A CLASS="title" NAME="ch07-pgfId-1000007512">Behavior</A></H4><P CLASS="para">If the user supplies no arguments, Perl sets <CODE CLASS="literal">@ARGV</CODE> to a single string, <CODE CLASS="literal">"-"</CODE>. This is shorthand for STDIN when opened for reading and STDOUT when opened for writing. It's also what lets the user of your program specify <CODE CLASS="literal">"-"</CODE> as a filename on the command line to read from STDIN.</P><P CLASS="para">Next, the file-processing loop removes one argument at a time from <CODE CLASS="literal">@ARGV</CODE> and copies the filename into the global variable <CODE CLASS="literal">$ARGV</CODE>. If the file cannot be opened, Perl issues a warning and goes on to the next one. Otherwise, it processes the file a line at a time. When the file runs out, the loop goes back and opens the next one, repeating the process until <CODE CLASS="literal">@ARGV</CODE> is exhausted.</P><P CLASS="para">The <CODE CLASS="literal">open</CODE> statement didn't say <CODE CLASS="literal">open(ARGV, "&lt; $ARGV")</CODE>; there's no leading less-than sign to force read mode. This allows for interesting effects, like passing the string <CODE CLASS="literal">"gzip -dc file.gz |"</CODE> as an argument to make your program read the output of the command <CODE CLASS="literal">"gzip -dc file.gz"</CODE>. See Recipe 16.6 for more about this use of magic open.</P><P CLASS="para">You can change <CODE CLASS="literal">@ARGV</CODE> before or inside the loop. 
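</P><P CLASS="para">As a minimal sketch of changing <CODE CLASS="literal">@ARGV</CODE> before the loop, the script below creates two scratch files, assigns their names to <CODE CLASS="literal">@ARGV</CODE>, and records <CODE CLASS="literal">$ARGV</CODE> next to every line that <CODE CLASS="literal">&lt;&gt;</CODE> returns. The file names <CODE CLASS="literal">first.txt</CODE> and <CODE CLASS="literal">second.txt</CODE> and the <CODE CLASS="literal">@seen</CODE> array are hypothetical, made up only for this demonstration; <CODE CLASS="literal">File::Temp</CODE> is a standard module.</P><PRE CLASS="programlisting">use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: two small input files in a scratch directory
# that is removed when the program exits.
my $dir   = tempdir(CLEANUP =&gt; 1);
my @names = ("$dir/first.txt", "$dir/second.txt");
my @text  = ("alpha\nbeta\n", "gamma\n");
for my $i (0 .. $#names) {
    open(my $out, '&gt;', $names[$i]) or die "Can't write $names[$i]: $!";
    print {$out} $text[$i];
    close($out) or die "Can't close $names[$i]: $!";
}

# Assign @ARGV before the loop; &lt;&gt; then reads each file in turn.
@ARGV = @names;
my @seen;                             # [filename, line] pairs, in read order
while (my $line = &lt;&gt;) {
    chomp $line;
    push @seen, [ $ARGV, $line ];     # $ARGV holds the name of the file being read
}

printf "%s: %s\n", @$_ for @seen;</PRE><P CLASS="para">Each printed line pairs a line of input with the name of the file it came from, in the order the names were placed in <CODE CLASS="literal">@ARGV</CODE>.</P><P CLASS="para">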
Let's say you don't want the default behavior of reading from STDIN if there aren't any arguments; instead, you want it to default to all the C or C++ source and header files. Insert this line before you start processing <CODE CLASS="literal">&lt;ARGV&gt;</CODE>:</P><PRE CLASS="programlisting">@ARGV = glob("*.[Cch]") unless @ARGV;</PRE><P CLASS="para">Process options before the loop, either with one of the Getopt libraries described in <A CLASS="xref" HREF="ch15_01.htm" TITLE="User Interfaces">Chapter 15, <CITE CLASS="chapter">User Interfaces</CITE></A>, or manually:</P><PRE CLASS="programlisting"># arg demo 1: Process optional -c flag
if (@ARGV &amp;&amp; $ARGV[0] eq '-c') {
    $chop_first++;
    shift;
}

# arg demo 2: Process optional -NUMBER flag
if (@ARGV &amp;&amp; $ARGV[0] =~ /^-(\d+)$/) {
    $columns = $1;
    shift;
}

# arg demo 3: Process clustering -a, -i, -n, or -u flags
while (@ARGV &amp;&amp; $ARGV[0] =~ /^-(.+)/ &amp;&amp; (shift, ($_ = $1), 1)) {
    next if /^$/;
    s/a// &amp;&amp; (++$append,      redo);
    s/i// &amp;&amp; (++$ignore_ints, redo);
    s/n// &amp;&amp; (++$nostdout,    redo);
    s/u// &amp;&amp; (++$unbuffer,    redo);
    die "usage: $0 [-ainu] [filenames] ...\n";
}</PRE><P CLASS="para">Other than its implicit looping over command-line arguments, <CODE CLASS="literal">&lt;&gt;</CODE> is not special. The special variables controlling I/O still apply; see <A CLASS="xref" HREF="ch08_01.htm" TITLE="File Contents">Chapter 8</A> for more on them. You can set <CODE CLASS="literal">$/</CODE> to change the record separator (the line terminator), and <CODE CLASS="literal">$.</CODE> contains the current line (record) number. 
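</P><P CLASS="para">For instance, setting <CODE CLASS="literal">$/</CODE> to the empty string puts Perl in paragraph mode, where one or more empty lines end a record, and <CODE CLASS="literal">$.</CODE> then counts paragraphs rather than lines. The sketch below assumes a hypothetical scratch file holding three paragraphs and a made-up <CODE CLASS="literal">@records</CODE> array; it illustrates the two variables and is not code from the recipe itself.</P><PRE CLASS="programlisting">use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: three paragraphs separated by empty lines.
my ($fh, $name) = tempfile(UNLINK =&gt; 1);
print {$fh} "one\ntwo\n\nthree\n\n\nfour\nfive\n";
close($fh) or die "Can't close $name: $!";

@ARGV = ($name);
my @records;                         # [record number, first line] pairs
{
    local $/ = "";                   # paragraph mode: empty lines end a record
    while (my $para = &lt;&gt;) {
        my ($first) = split /\n/, $para;
        push @records, [ $., $first ];   # $. counts records, not lines
    }
}                                    # $/ restored here

printf "record %d starts with %s\n", @$_ for @records;</PRE><P CLASS="para">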
If you undefine <CODE CLASS="literal">$/</CODE>, you don't get the concatenated contents of all files at once; you get one complete file each time:</P><PRE CLASS="programlisting">undef $/;
while (&lt;&gt;) {
    # $_ now has the complete contents of
    # the file whose name is in $ARGV
}</PRE><P CLASS="para">If you localize <CODE CLASS="literal">$/</CODE>, the old value is automatically restored when the enclosing block exits:</P><PRE CLASS="programlisting">{                       # create block for local
    local $/;           # record separator now undef
    while (&lt;&gt;) {
        # do something; called functions still have
        # undeffed version of $/
    }
}                       # $/ restored here</PRE><P CLASS="para">Because processing <CODE CLASS="literal">&lt;ARGV&gt;</CODE> never explicitly closes filehandles, the record number in <CODE CLASS="literal">$.</CODE> is not reset between files. If you don't like that, you can explicitly close the file yourself to reset <CODE CLASS="literal">$.</CODE>:</P><PRE CLASS="programlisting">while (&lt;&gt;) {
    print "$ARGV:$.:$_";
    close ARGV if eof;
}</PRE><P CLASS="para">The <CODE CLASS="literal">eof</CODE> function, called without an argument, checks the end-of-file status of the last filehandle read. Since the last handle read was <CODE CLASS="literal">ARGV</CODE>, <CODE CLASS="literal">eof</CODE> reports whether we're at the end of the current file. If so, we close it and reset the <CODE CLASS="literal">$.</CODE> variable. 
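</P><P CLASS="para">To see the difference that <CODE CLASS="literal">close ARGV if eof</CODE> makes, the sketch below reads the same two hypothetical scratch files twice and tags each line with <CODE CLASS="literal">$.</CODE>: once closing <CODE CLASS="literal">ARGV</CODE> at the end of each file and once without. With the per-file close, numbering should restart at 1 in the second file; without it, <CODE CLASS="literal">$.</CODE> keeps counting across both files. The helper <CODE CLASS="literal">numbered_lines</CODE> and the file names are made up for this illustration.</P><PRE CLASS="programlisting">use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical input: two scratch files with two lines each.
my $dir   = tempdir(CLEANUP =&gt; 1);
my @files = ("$dir/a.txt", "$dir/b.txt");
for my $file (@files) {
    open(my $out, '&gt;', $file) or die "Can't write $file: $!";
    print {$out} "x\ny\n";
    close($out) or die "Can't close $file: $!";
}

# Read @files with &lt;&gt; and return "$.:line" labels. If $reset_per_file
# is true, close ARGV at the end of each file, which resets $.
sub numbered_lines {
    my ($reset_per_file) = @_;
    my @labels;
    local @ARGV = @files;
    while (my $line = &lt;&gt;) {
        chomp $line;
        push @labels, "$.:$line";
        close ARGV if $reset_per_file &amp;&amp; eof;   # eof, no parens: current file only
    }
    return @labels;
}

# Run the resetting pass first: its final explicit close leaves $. at 0,
# so the second pass also starts counting from 1.
my @per_file = numbered_lines(1);    # numbering restarts in each file
my @running  = numbered_lines(0);    # numbering continues across files

print "per file: @per_file\n";
print "running:  @running\n";</PRE><P CLASS="para">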
On the other hand, the special notation <CODE CLASS="literal">eof()</CODE>, with parentheses but no argument, checks whether we've reached the end of all files in the <CODE CLASS="literal">&lt;ARGV&gt;</CODE> processing.</P></DIV><DIV CLASS="sect3"><H4 CLASS="sect3"><A CLASS="title" NAME="ch07-pgfId-1000008252">Command-line options</A></H4><P CLASS="para"><A CLASS="indexterm" NAME="ch07-idx-1000009633-0"></A><A CLASS="indexterm" NAME="ch07-idx-1000009633-1"></A><A CLASS="indexterm" NAME="ch07-idx-1000009633-2"></A>Perl has three command-line options, <B CLASS="emphasis.bold">-n</B>, <B CLASS="emphasis.bold">-p</B>, and <B CLASS="emphasis.bold">-i</B>, that make writing filters and one-liners easier.</P><P CLASS="para">The <B CLASS="emphasis.bold">-n</B> option adds the <CODE CLASS="literal"