ch06_18.htm

From "By Tom Christiansen and Nathan Torkington" · HTM source · 784 lines · page 1 of 2
CLASS="para">To handle the non-overlapping case, you need two parts separated by an OR. The first branch is THIS followed by THAT; the second is the other way around.</P>
<PRE CLASS="programlisting">&quot;labelled&quot; =~ /(?:^.*bell.*lab)|(?:^.*lab.*bell)/</PRE>
<P CLASS="para">or in long form:</P>
<PRE CLASS="programlisting">$brand = &quot;labelled&quot;;
if ($brand =~ m{
        (?:                 # non-capturing grouper
            ^ .*?           # any amount of stuff at the front
              bell          # look for a bell
              .*?           # followed by any amount of anything
              lab           # look for a lab
          )                 # end grouper
    |                       # otherwise, try the other direction
        (?:                 # non-capturing grouper
            ^ .*?           # any amount of stuff at the front
              lab           # look for a lab
              .*?           # followed by any amount of anything
              bell          # followed by a bell
          )                 # end grouper
    }sx )                   # /s means . can match newline
{
    print &quot;Our brand has bell and lab separate.\n&quot;;
}</PRE>
<P CLASS="para">These patterns aren't necessarily faster. <CODE CLASS="literal">$murray_hill</CODE> <CODE CLASS="literal">=~</CODE> <CODE CLASS="literal">/bell/</CODE> <CODE CLASS="literal">&amp;&amp;</CODE> <CODE CLASS="literal">$murray_hill</CODE> <CODE CLASS="literal">=~</CODE> <CODE CLASS="literal">/lab/</CODE><SPAN CLASS="acronym"> </SPAN>will scan the string at most twice, but the pattern matching engine's only option is to try to find a <CODE CLASS="literal">&quot;lab&quot;</CODE> for each occurrence of <CODE CLASS="literal">&quot;bell&quot;</CODE> in <CODE CLASS="literal">/(?=^.*?bell)(?=^.*?lab)/</CODE>, leading to quadratic worst-case running times.</P>
<P CLASS="para">If you followed those two, then the NOT case should be a breeze.
The general form looks like this:</P>
<PRE CLASS="programlisting">$map =~ /^(?:(?!waldo).)*$/s</PRE>
<P CLASS="para">Spelled out in long form, this yields:</P>
<PRE CLASS="programlisting">if ($map =~ m{
        ^                   # start of string
        (?:                 # non-capturing grouper
            (?!             # look ahead negation
                waldo       # is he ahead of us now?
            )               # if so, the negation failed
            .               # any character (cuzza /s)
        ) *                 # repeat that grouping 0 or more
        $                   # through the end of the string
    }sx )                   # /s means . can match newline
{
    print &quot;There's no waldo here!\n&quot;;
}</PRE>
<P CLASS="para">How would you combine AND, OR, and NOT? It's not a pretty picture, and in a regular program, you'd almost never do this, but from a config file or command line where you only get to specify one pattern, you have no choice. You just have to combine what we've learned so far.
Carefully.</P>
<P CLASS="para">Let's say you wanted to run the Unix <EM CLASS="emphasis">w</EM> program and find out whether user <CODE CLASS="literal">tchrist</CODE> were logged on anywhere but a terminal whose name began with <CODE CLASS="literal">ttyp</CODE>; that is, <CODE CLASS="literal">tchrist</CODE> must match, but <CODE CLASS="literal">ttyp</CODE> must not.</P>
<P CLASS="para">Here's sample input from <EM CLASS="emphasis">w</EM> on my Linux system:</P>
<PRE CLASS="programlisting"><CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I> 7:15am  up 206 days, 13:30,  4 users,  load average: 1.04, 1.07, 1.04</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>USER     TTY      FROM              LOGIN@  IDLE   JCPU   PCPU  WHAT</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  tty1                       5:16pm 36days 24:43   0.03s  xinit</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  tty2                       5:19pm  6days  0.43s  0.43s  -tcsh</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  ttyp0    chthon            7:58am  3days 23.44s  0.44s  -tcsh</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>gnat     ttyS4    coprolith         2:01pm 13:36m  0.30s  0.30s  -tcsh</I></CODE></B></CODE></PRE>
<P CLASS="para">Here's how to do that using the <EM CLASS="emphasis">minigrep</EM> program outlined previously or with the <EM CLASS="emphasis">tcgrep</EM> program at the end of this chapter:</P>
<PRE CLASS="programlisting">% w | minigrep '^(?!.*ttyp).*tchrist'</PRE>
<P CLASS="para">Decoding that pattern:</P>
<PRE CLASS="programlisting">m{
    ^                       # anchored to the start
    (?!                     # zero-width look-ahead assertion
        .*                  # any amount of anything (faster than .*?)
        ttyp                # the string you don't want to find
    )                       # end look-ahead negation; rewind to start
    .*                      # any amount of anything (faster than .*?)
    tchrist                 # now try to find Tom
}x</PRE>
<P CLASS="para">Never mind that any sane person would just call <EM CLASS="emphasis">grep</EM> twice, once with a <B CLASS="emphasis.bold">-v</B> option to select only non-matches.</P>
<PRE CLASS="programlisting">% w | grep tchrist | grep -v ttyp</PRE>
<P CLASS="para">The point is that Boolean conjunctions and negations <EM CLASS="emphasis">can</EM> be coded up in one single pattern. You should comment this kind of thing, though, having pity on those who come after you, before they do.</P>
<P CLASS="para">How would you embed that <CODE CLASS="literal">/s</CODE> in a pattern passed to a program from the command line? The same way as you would a <CODE CLASS="literal">/i</CODE> switch: by using <CODE CLASS="literal">(?i)</CODE> in the pattern. The <CODE CLASS="literal">/s</CODE> and <CODE CLASS="literal">/m</CODE> modifiers can be painlessly included in a pattern as well, using <CODE CLASS="literal">/(?s)</CODE> or <CODE CLASS="literal">/(?m)</CODE>. These can even cluster, as in <CODE CLASS="literal">/(?smi)</CODE>.
That would make these two reasonably interchangeable:</P>
<PRE CLASS="programlisting">% grep -i 'pattern' files
% minigrep '(?i)pattern' files</PRE>
</DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1000008827">See Also</A></H3>
<P CLASS="para">Lookahead assertions are shown in the "Regular Expressions" section of <I CLASS="filename">perlre</I>(1), and in the <A CLASS="olink" HREF="../prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.2">"rules of regular expression matching"</A> section of <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; your system's <I CLASS="filename">grep</I>(1) and <I CLASS="filename">w</I>(1) manpages; we talk about configuration files in <A CLASS="xref" HREF="ch08_17.htm" TITLE="Reading Configuration Files">Recipe 8.16</A>.</P>
</DIV></DIV>
<P><FONT SIZE="-1"><A HREF="copyrght.htm">Copyright &copy; 2002</A> O'Reilly &amp; Associates. All rights reserved.</FONT></P>
</BODY></HTML>
This page shows the source file ch06_18.htm from "By Tom Christiansen and Nathan Torkington ISBN 1-56592-243-3 First Edition, published August 1998", an HTM file of 784 lines.