<HTML><HEAD><TITLE>Recipe 6.17. Expressing AND, OR, and NOT in a Single Pattern (Perl Cookbook)</TITLE><METANAME="DC.title"CONTENT="Perl Cookbook"><METANAME="DC.creator"CONTENT="Tom Christiansen & Nathan Torkington"><METANAME="DC.publisher"CONTENT="O'Reilly & Associates, Inc."><METANAME="DC.date"CONTENT="1999-07-02T01:34:54Z"><METANAME="DC.type"CONTENT="Text.Monograph"><METANAME="DC.format"CONTENT="text/html"SCHEME="MIME"><METANAME="DC.source"CONTENT="1-56592-243-3"SCHEME="ISBN"><METANAME="DC.language"CONTENT="en-US"><METANAME="generator"CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0"><LINKREV="made"HREF="mailto:online-books@oreilly.com"TITLE="Online Books Comments"><LINKREL="up"HREF="ch06_01.htm"TITLE="6. Pattern Matching"><LINKREL="prev"HREF="ch06_17.htm"TITLE="6.16. Detecting Duplicate Words"><LINKREL="next"HREF="ch06_19.htm"TITLE="6.18. Matching Multiple-Byte Characters"></HEAD><BODYBGCOLOR="#FFFFFF"><DIVCLASS="sect1"><H2CLASS="sect1"><ACLASS="title"NAME="ch06-15940">6.17. 
Expressing AND, OR, and NOT in a Single Pattern</A></H2><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch06-pgfId-1935">Problem</A></H3><PCLASS="para"><ACLASS="indexterm"NAME="ch06-idx-1000007701-0"></A><ACLASS="indexterm"NAME="ch06-idx-1000007701-1"></A><ACLASS="indexterm"NAME="ch06-idx-1000007701-2"></A><ACLASS="indexterm"NAME="ch06-idx-1000007701-3"></A><ACLASS="indexterm"NAME="ch06-idx-1000007701-4"></A><ACLASS="indexterm"NAME="ch06-idx-1000007701-5"></A>You have an existing program that accepts a pattern as an argument or input. It doesn't allow you to add extra logic, like case-insensitive options, ANDs, or NOTs. So you need to write a single pattern that matches either of two different patterns (the "or" case), both of two patterns (the "and" case), or that reverses the sense of the match (the "not" case).</P><PCLASS="para">This situation arises often with configuration files, web forms, or command-line arguments. Imagine there's a program that does this:</P><PRECLASS="programlisting">chomp($pattern = &lt;CONFIG_FH&gt;);
if ( $data =~ /$pattern/ ) {
    .....
}</PRE><PCLASS="para">As the one maintaining the contents of CONFIG_FH, you need to convey Boolean logic to the matching program through that one measly pattern, without explicit connectives.</P></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch06-pgfId-1949">Solution</A></H3><PCLASS="para">True if either <CODECLASS="literal">/ALPHA/</CODE> or <CODECLASS="literal">/BETA/</CODE> matches, like <CODECLASS="literal">/ALPHA/</CODE> <CODECLASS="literal">||</CODE> <CODECLASS="literal">/BETA/</CODE>:</P><PRECLASS="programlisting">/ALPHA|BETA/</PRE><PCLASS="para">True if both <CODECLASS="literal">/ALPHA/</CODE> and <CODECLASS="literal">/BETA/</CODE> match and the matches may overlap, meaning <CODECLASS="literal">"BETALPHA"</CODE> should be okay, like <CODECLASS="literal">/ALPHA/</CODE> <CODECLASS="literal">&&</CODE> <CODECLASS="literal">/BETA/</CODE>:</P><PRECLASS="programlisting">/^(?=.*ALPHA)(?=.*BETA)/s</PRE><PCLASS="para">True if both <CODECLASS="literal">/ALPHA/</CODE> and <CODECLASS="literal">/BETA/</CODE> match but the matches may not overlap, meaning that <CODECLASS="literal">"BETALPHA"</CODE> should fail:</P><PRECLASS="programlisting">/ALPHA.*BETA|BETA.*ALPHA/s</PRE><PCLASS="para">True if pattern <CODECLASS="literal">/PAT/</CODE> does not match, like <CODECLASS="literal">$var</CODE> <CODECLASS="literal">!~</CODE> <CODECLASS="literal">/PAT/</CODE>:</P><PRECLASS="programlisting">/^(?:(?!PAT).)*$/s</PRE><PCLASS="para">True if pattern <CODECLASS="literal">BAD</CODE> does not match, but pattern <CODECLASS="literal">GOOD</CODE> does:</P><PRECLASS="programlisting">/(?=^(?:(?!BAD).)*$)GOOD/s</PRE></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch06-pgfId-1975">Discussion</A></H3><PCLASS="para">When you're writing a regular program and want to know if something doesn't match, say one of:</P><PRECLASS="programlisting">if (!($string =~ /pattern/)) { something() }    # ugly
if (  $string !~ /pattern/)  { something() }    # preferred</PRE><PCLASS="para">If you want to see if 
two patterns both match, use:</P><PRECLASS="programlisting">if ($string =~ /pat1/ && $string =~ /pat2/ ) { <CODECLASS="literal">something</CODE>() }</PRE><PCLASS="para">If you want to see if either of two patterns matches:</P><PRECLASS="programlisting">if ($string =~ /pat1/ || $string =~ /pat2/ ) { <CODECLASS="literal">something</CODE>() }</PRE><PCLASS="para">In short, use Perl's normal Boolean connectives to combine regular expressions, rather than doing it all within a single pattern. However, imagine a <EMCLASS="emphasis">minigrep</EM> program, one that reads its single pattern as an argument, as shown in <ACLASS="xref"HREF="ch06_18.htm#ch06-19434"TITLE="minigrep">Example 6.12</A>.</P><DIVCLASS="example"><H4CLASS="example"><ACLASS="title"NAME="ch06-19434">Example 6.12: minigrep</A></H4><PRECLASS="programlisting">#!/usr/bin/perl
# <ACLASS="indexterm"NAME="ch06-idx-1000009242-0"></A>minigrep - trivial grep
$pat = shift;
while (&lt;&gt;) {
    print if /$pat/o;
}</PRE></DIV><PCLASS="para">If you want to tell <EMCLASS="emphasis">minigrep</EM> that some pattern must not match, or that it has to match both of two subpatterns in any order, then you're at an impasse. The program isn't built to accept those constructs. How can you do it using one pattern? That is, you'd like to execute the <EMCLASS="emphasis">minigrep PAT</EM> program where PAT must not match, or where PAT combines more than one connected pattern. This need comes up often in programs that read patterns from configuration files.</P><PCLASS="para">The OR case is pretty easy, since the <CODECLASS="literal">|</CODE> symbol provides for alternation. The AND and NOT cases, however, require special encoding.</P><PCLASS="para">For AND, you have to distinguish between overlapping and non-overlapping cases. Suppose you want to see if a string matches both <CODECLASS="literal">"bell"</CODE> and <CODECLASS="literal">"lab"</CODE>. If you allow overlapping, then the word <CODECLASS="literal">"labelled"</CODE> qualifies. 
But if you don't want to count overlaps, then it shouldn't qualify. The overlapping case uses two look-ahead assertions:</P><PRECLASS="programlisting"> "labelled" =~ /^(?=.*bell)(?=.*lab)/s</PRE><PCLASS="para">Remember: in a normal program, you don't have to go through these contortions. You can simply say:</P><PRECLASS="programlisting">$string =~ /bell/ && $string =~ /lab/</PRE><PCLASS="para">To unravel the look-ahead pattern, we'll spell it out using <CODECLASS="literal">/x</CODE> and comments. Here's the long version:</P><PRECLASS="programlisting">if ($murray_hill =~ m{
        ^           # start of string
        (?=         # zero-width lookahead
            .*      # any amount of intervening stuff
            bell    # the desired bell string
        )           # rewind, since we were only looking
        (?=         # and do the same thing
            .*      # any amount of intervening stuff
            lab     # and the lab part
        )
    }sx )           # /s means . can match newline
{
    print "Looks like Bell Labs might be in Murray Hill!\n";
}</PRE><PCLASS="para">We didn't use <CODECLASS="literal">.*?</CODE> to end it early because minimal matching is more expensive than maximal matching. So it's more efficient to use <CODECLASS="literal">.*</CODE> over <CODECLASS="literal">.*?</CODE>, given random input where the occurrence of matches at the front or the end of the string is completely unpredictable. Of course, sometimes choosing between <CODECLASS="literal">.*</CODE> and <CODECLASS="literal">.*?</CODE> may depend on correctness rather than efficiency, but not here.</P><P