ch01_19.htm
From "By Tom Christiansen and Nathan Torkingto" · HTM code · 854 lines · page 1 of 2
><B><CODE CLASS="replaceable"><I>100100 0 10116 9564 0 0 1412 928 setup_frame T p3 0:00 ssh -C www</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>100100 0 26560 26554 0 0 1076 572 setup_frame T p2 0:00 less</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>100000 101 19058 9562 0 0 1396 900 setup_frame T p1 0:02 nvi /tmp/a</I></CODE></B></CODE></PRE></DIV>
<P CLASS="para">The <EM CLASS="emphasis">psgrep</EM> program integrates many techniques presented throughout this book. Stripping strings of leading and trailing whitespace is covered in <A CLASS="xref" HREF="ch01_15.htm" TITLE="Trimming Blanks from the Ends of a String">Recipe 1.14</A>. Converting cut marks into an <CODE CLASS="literal">unpack</CODE> format to extract fixed fields is in <A CLASS="xref" HREF="ch01_02.htm" TITLE="Accessing Substrings">Recipe 1.1</A>. Matching strings with regular expressions is the entire topic of <A CLASS="xref" HREF="ch06_01.htm" TITLE="Pattern Matching">Chapter 6</A>.</P>
<P CLASS="para">The multiline string in the here document passed to <CODE CLASS="literal">die</CODE> is discussed in <A CLASS="xref" HREF="ch01_11.htm" TITLE="Interpolating Functions and Expressions Within Strings">Recipe 1.10</A> and <A CLASS="xref" HREF="ch01_12.htm" TITLE="Indenting Here Documents">Recipe 1.11</A>. The assignment to <CODE CLASS="literal">@fields{@fieldnames}</CODE> sets many values at once in the hash named <CODE CLASS="literal">%fields</CODE>. Hash slices are discussed in <A CLASS="xref" HREF="ch04_08.htm" TITLE="Finding Elements in One Array but Not Another">Recipe 4.7</A> and <A CLASS="xref" HREF="ch05_11.htm" TITLE="Merging Hashes">Recipe 5.10</A>.</P>
<P CLASS="para">The sample program input contained beneath <CODE CLASS="literal">__END__</CODE> is described in <A CLASS="xref" HREF="ch07_07.htm" TITLE="Storing Files Inside Your Program Text">Recipe 7.6</A>.
During development, we used canned input from the <CODE CLASS="literal">DATA</CODE> filehandle for testing purposes. Once the program worked properly, we changed it to read from a piped-in <EM CLASS="emphasis">ps</EM> command but left a remnant of the original filter input to aid in future porting and maintenance. Launching other programs over a pipe is covered in <A CLASS="xref" HREF="ch16_01.htm" TITLE="Process Management and Communication">Chapter 16, <CITE CLASS="chapter">Process Management and Communication</CITE></A>, including <A CLASS="xref" HREF="ch16_11.htm" TITLE="Communicating Between Related Processes">Recipe 16.10</A> and <A CLASS="xref" HREF="ch16_14.htm" TITLE="Listing Available Signals">Recipe 16.13</A>.</P>
<P CLASS="para">The real power and expressiveness in <EM CLASS="emphasis">psgrep</EM> derive from Perl's use of string arguments not as mere strings but directly as Perl code. This is similar to the technique in <A CLASS="xref" HREF="ch09_10.htm" TITLE="Renaming Files">Recipe 9.9</A>, except that in <EM CLASS="emphasis">psgrep</EM>, the user's arguments are wrapped in a routine called <CODE CLASS="literal">is_desirable</CODE>. That way, the cost of compiling strings into Perl code is paid only once, before the program whose output we'll process has even begun. For example, asking for UIDs under 10 creates this string to <CODE CLASS="literal">eval</CODE>:</P>
<PRE CLASS="programlisting">eval "sub is_desirable { uid &lt; 10 } " . 1;</PRE>
<P CLASS="para">The mysterious "<CODE CLASS="literal">. 1</CODE>" at the end is so that if the user code compiles, the whole <CODE CLASS="literal">eval</CODE> returns true. That way we don't even have to check <CODE CLASS="literal">$@</CODE> for compilation errors as we do in <A CLASS="xref" HREF="ch10_13.htm" TITLE="Handling Exceptions">Recipe 10.12</A>.</P>
<P CLASS="para">Specifying arbitrary Perl code in a filter to select records is a breathtakingly powerful approach, but it's not entirely original.
Perl owes much to the <EM CLASS="emphasis">awk</EM> programming language, which is often used for such filtering. One problem with <EM CLASS="emphasis">awk</EM> is that it can't easily treat input as fixed-size fields instead of fields separated by a delimiter. Another is that the fields are not mnemonically named: <EM CLASS="emphasis">awk</EM> uses <CODE CLASS="literal">$1</CODE>, <CODE CLASS="literal">$2</CODE>, etc. Plus, Perl can do much that <EM CLASS="emphasis">awk</EM> cannot.</P>
<P CLASS="para">The user criteria don't even have to be simple expressions. For example, this call initializes a variable <CODE CLASS="literal">$id</CODE> to user <EM CLASS="emphasis">nobody</EM>'s number to use later in its expression:</P>
<PRE CLASS="programlisting">% psgrep 'no strict "vars"; BEGIN { $id = getpwnam("nobody") } uid == $id '</PRE>
<P CLASS="para">How can we use unquoted words without even a dollar sign, like <CODE CLASS="literal">uid</CODE>, <CODE CLASS="literal">command</CODE>, and <CODE CLASS="literal">size</CODE>, to represent those respective fields in each input record? We directly manipulate the symbol table by assigning closures to indirect <A CLASS="indexterm" NAME="ch01-idx-1000011522-0"></A>typeglobs, which creates functions with those names. The functions are created under both uppercase and lowercase names, allowing both "<CODE CLASS="literal">UID</CODE> <CODE CLASS="literal">&lt;</CODE> <CODE CLASS="literal">10</CODE>" and "<CODE CLASS="literal">uid</CODE> <CODE CLASS="literal">&lt;</CODE> <CODE CLASS="literal">10</CODE>". Closures are described in <A CLASS="xref" HREF="ch11_05.htm" TITLE="Taking References to Functions">Recipe 11.4</A>, and assigning them to typeglobs to create function aliases is shown in <A CLASS="xref" HREF="ch10_15.htm" TITLE="Redefining a Function">Recipe 10.14</A>.</P>
<P CLASS="para">One twist here not seen in those recipes is empty parentheses on the closure.
These let us use the function in an expression anywhere we'd use a single term, like a string or a numeric constant. They create a void prototype so the field-accessing function named <CODE CLASS="literal">uid</CODE> accepts no arguments, just like the built-in function <CODE CLASS="literal">time</CODE>. If these functions weren't prototyped void, expressions like "<CODE CLASS="literal">uid</CODE> <CODE CLASS="literal">&lt;</CODE> <CODE CLASS="literal">10</CODE>" or "<CODE CLASS="literal">size</CODE> <CODE CLASS="literal">/</CODE> <CODE CLASS="literal">2</CODE> <CODE CLASS="literal">&gt;</CODE> <CODE CLASS="literal">rss</CODE>" would confuse the parser because it would see the unterminated start of a wildcard glob and of a pattern match, respectively. Prototypes are discussed in <A CLASS="xref" HREF="ch10_12.htm" TITLE="Prototyping Functions">Recipe 10.11</A>.</P>
<P CLASS="para">The version of <EM CLASS="emphasis">psgrep</EM> demonstrated here expects the output from Red Hat Linux's <EM CLASS="emphasis">ps</EM>. To port to other systems, check the columns at which the headers begin. This approach isn't relevant only to <EM CLASS="emphasis">ps</EM> or only to Unix systems. It's a generic technique for filtering input records using Perl expressions, easily adapted to other record layouts. The input format could be in columns, space separated, comma separated, or the result of a pattern match with capturing parentheses.</P>
<P CLASS="para">The program could even be modified to handle a user-defined database with a small change to the selection functions.
If you had an array of records as described in <A CLASS="xref" HREF="ch11_10.htm" TITLE="Constructing Records">Recipe 11.9</A>, you could let users specify arbitrary selection criteria, such as:</P>
<PRE CLASS="programlisting">sub id() { $_-&gt;{ID} }
sub title() { $_-&gt;{TITLE} }
sub executive() { title =~ /(?:vice-)?president/i }

# user search criteria go in the grep clause
@slowburners = grep { id &lt; 10 &amp;&amp; !executive } @employees;</PRE>
<P CLASS="para">For reasons of security and performance, this kind of power is seldom found in database engines like those described in <A CLASS="xref" HREF="ch14_01.htm" TITLE="Database Access">Chapter 14, <CITE CLASS="chapter">Database Access</CITE></A>. SQL doesn't support this, but given Perl and a small bit of ingenuity, it's easy to roll your own. The search engine at <A CLASS="systemitem.url" HREF="http://mox.perl.com/cgi-bin/MxScreen">http://mox.perl.com/cgi-bin/MxScreen</A> uses such a technique, but instead of output from <EM CLASS="emphasis">ps</EM>, its records are Perl hashes loaded from a database. <A CLASS="indexterm" NAME="ch01-idx-1000010111-0"></A><A CLASS="indexterm" NAME="ch01-idx-1000010111-1"></A><A CLASS="indexterm" NAME="ch01-idx-1000010111-2"></A></P></DIV>
<P><FONT SIZE="-1"><A HREF="copyrght.htm">Copyright © 2002</A> O'Reilly &amp; Associates. All rights reserved.</FONT></P>
</BODY></HTML>