CLASS="para">To handle the non-overlapping case, you need two parts separated by an OR. The first branch is THIS followed by THAT; the second is the other way around.</P>
<PRE CLASS="programlisting">"labelled" =~ /(?:^.*bell.*lab)|(?:^.*lab.*bell)/</PRE>
<P CLASS="para">or in long form:</P>
<PRE CLASS="programlisting">$brand = "labelled";
if ($brand =~ m{
        (?:                 # non-capturing grouper
            ^ .*?           # any amount of stuff at the front
              bell          # look for a bell
              .*?           # followed by any amount of anything
              lab           # look for a lab
          )                 # end grouper
    |                       # otherwise, try the other direction
        (?:                 # non-capturing grouper
            ^ .*?           # any amount of stuff at the front
              lab           # look for a lab
              .*?           # followed by any amount of anything
              bell          # followed by a bell
          )                 # end grouper
    }sx )                   # /s means . can match newline
{
    print "Our brand has bell and lab separate.\n";
}</PRE>
<P CLASS="para">These patterns aren't necessarily faster. <CODE CLASS="literal">$murray_hill</CODE> <CODE CLASS="literal">=~</CODE> <CODE CLASS="literal">/bell/</CODE> <CODE CLASS="literal">&&</CODE> <CODE CLASS="literal">$murray_hill</CODE> <CODE CLASS="literal">=~</CODE> <CODE CLASS="literal">/lab/</CODE><SPAN CLASS="acronym"> </SPAN>will scan the string at most twice, but the pattern matching engine's only option is to try to find a <CODE CLASS="literal">"lab"</CODE> for each occurrence of <CODE CLASS="literal">"bell"</CODE> in <CODE CLASS="literal">/(?=^.*?bell)(?=^.*?lab)/</CODE>, leading to quadratic worst-case running times.</P>
<P CLASS="para">If you followed those two, then the NOT case should be a breeze. The general form looks like this:</P>
<PRE CLASS="programlisting">$map =~ /^(?:(?!waldo).)*$/s</PRE>
<P CLASS="para">Spelled out in long form, this yields:</P>
<PRE CLASS="programlisting">if ($map =~ m{
        ^                   # start of string
        (?:                 # non-capturing grouper
            (?!             # look ahead negation
                waldo       # is he ahead of us now?
            )               # if so, the negation failed
            .               # any character (cuzza /s)
        ) *                 # repeat that grouping 0 or more
        $                   # through the end of the string
    }sx )                   # /s means . can match newline
{
    print "There's no waldo here!\n";
}</PRE>
<P CLASS="para">How would you combine AND, OR, and NOT? It's not a pretty picture, and in a regular program, you'd almost never do this, but from a config file or command line where you only get to specify one pattern, you have no choice. You just have to combine what we've learned so far. Carefully.</P>
<P CLASS="para">Let's say you wanted to run the Unix <EM CLASS="emphasis">w</EM> program and find out whether user <CODE CLASS="literal">tchrist</CODE> is logged on anywhere but a terminal whose name begins with <CODE CLASS="literal">ttyp</CODE>; that is, <CODE CLASS="literal">tchrist</CODE> must match, but <CODE CLASS="literal">ttyp</CODE> must not.</P>
<P CLASS="para">Here's sample input from <EM CLASS="emphasis">w</EM> on my Linux system:</P>
<PRE CLASS="programlisting"><CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I> 7:15am  up 206 days, 13:30,  4 users,  load average: 1.04, 1.07, 1.04</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>USER     TTY      FROM         LOGIN@   IDLE    JCPU    PCPU   WHAT</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  tty1                  5:16pm  36days  24:43   0.03s   xinit</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  tty2                  5:19pm   6days   0.43s  0.43s   -tcsh</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>tchrist  ttyp0    chthon       7:58am   3days  23.44s  0.44s   -tcsh</I></CODE></B></CODE>
<CODE CLASS="userinput"><B><CODE CLASS="replaceable"><I>gnat     ttyS4    coprolith    2:01pm  13:36m   0.30s  0.30s   -tcsh</I></CODE></B></CODE></PRE>
<P CLASS="para">Here's how to do that using the <EM CLASS="emphasis">minigrep</EM> program outlined previously or with the <EM CLASS="emphasis">tcgrep</EM> program at the end of this chapter:</P>
<PRE CLASS="programlisting">% w | minigrep '^(?!.*ttyp).*tchrist'</PRE>
<P CLASS="para">Decoding that pattern:</P>
<PRE CLASS="programlisting">m{
    ^                       # anchored to the start
    (?!                     # zero-width look-ahead assertion
        .*                  # any amount of anything (faster than .*?)
        ttyp                # the string you don't want to find
    )                       # end look-ahead negation; rewind to start
    .*                      # any amount of anything (faster than .*?)
    tchrist                 # now try to find Tom
}x</PRE>
<P CLASS="para">Never mind that any sane person would just call <EM CLASS="emphasis">grep</EM> twice, once with a <B CLASS="emphasis.bold">-v</B> option to select only non-matches.</P>
<PRE CLASS="programlisting">% w | grep tchrist | grep -v ttyp</PRE>
<P CLASS="para">The point is that Boolean conjunctions and negations <EM CLASS="emphasis">can</EM> be coded up in one single pattern. You should comment this kind of thing, though, having pity on those who come after you - before they do.</P>
<P CLASS="para">How would you embed that <CODE CLASS="literal">/s</CODE> in a pattern passed to a program from the command line? The same way as you would a <CODE CLASS="literal">/i</CODE> switch: by using <CODE CLASS="literal">(?i)</CODE> in the pattern. The <CODE CLASS="literal">/s</CODE> and <CODE CLASS="literal">/m</CODE> modifiers can be painlessly included in a pattern as well, using <CODE CLASS="literal">/(?s)</CODE> or <CODE CLASS="literal">/(?m)</CODE>. These can even cluster, as in <CODE CLASS="literal">/(?smi)</CODE>.
That would make these two reasonably interchangeable:</P>
<PRE CLASS="programlisting">% grep -i 'pattern' files
% minigrep '(?i)pattern' files</PRE>
</DIV>
<DIV CLASS="sect2"><H3 CLASS="sect2"><A CLASS="title" NAME="ch06-pgfId-1000008827">See Also</A></H3>
<P CLASS="para">Lookahead assertions are shown in the "Regular Expressions" section of <I CLASS="filename">perlre</I>(1), and in the <A CLASS="olink" HREF="../prog/ch02_04.htm#PERL2-CH-2-SECT-4.1.2">"rules of regular expression matching"</A> section of <A CLASS="olink" HREF="../prog/ch02_01.htm">Chapter 2</A> of <A CLASS="citetitle" HREF="../prog/index.htm" TITLE="Programming Perl"><CITE CLASS="citetitle">Programming Perl</CITE></A>; your system's <I CLASS="filename">grep</I>(1) and <I CLASS="filename">w</I>(1) manpages; we talk about configuration files in <A CLASS="xref" HREF="ch08_17.htm" TITLE="Reading Configuration Files">Recipe 8.16</A>.</P>
</DIV>
</DIV>
<p><font size="-1"><a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font></p>
</BODY></HTML>