<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 26] 26.3 Understanding Expressions </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly & Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly & Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:43:59Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch26_01.htm" TITLE="26. Regular Expressions (Pattern Matching)">
<LINK REL="prev" HREF="ch26_02.htm" TITLE="26.2 Don't Confuse Regular Expressions with Wildcards ">
<LINK REL="next" HREF="ch26_04.htm" TITLE="26.4 Using Metacharacters in Regular Expressions ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_02.htm" TITLE="26.2 Don't Confuse Regular Expressions with Wildcards "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 26.2 Don't Confuse Regular Expressions with Wildcards " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 26<BR>Regular Expressions (Pattern Matching)</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_04.htm" TITLE="26.4 Using Metacharacters in Regular Expressions "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 26.4 Using
Metacharacters in Regular Expressions " BORDER="0"></A></TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-7972">26.3 Understanding Expressions </A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="UPT-ART-7972-IX-REGULAR-EXPRESSIONS-DESCRIBED"></A>You are probably familiar with the kinds of expressions that a calculator interprets. Look at the following arithmetic expression:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">2 + 4</PRE></BLOCKQUOTE></P>
<P CLASS="para">"Two plus four" consists of constants, or literal values, and an operator. A calculator program must recognize, for instance, that 2 is a numeric constant and that the plus sign represents an operator, not to be interpreted as the <CODE CLASS="literal">+</CODE> character.</P>
<P CLASS="para">An expression tells the computer how to produce a result. Although it is the sum of "two plus four" that we really want, we don't simply tell the computer to return a six. We instruct the computer to evaluate the expression and return a value.</P>
<P CLASS="para">An expression can be more complicated than 2+4; in fact, it might consist of multiple simple expressions, such as the following:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">2 + 3 * 4</PRE></BLOCKQUOTE></P>
<P CLASS="para">A calculator normally evaluates an expression from left to right. However, certain operators have precedence over others: that is, they will be performed first. Thus, the above expression will evaluate to 14 and not 20 because multiplication takes precedence over addition. Precedence can be overridden by placing the simple expression in parentheses. Thus, (2+3)*4 or "the sum of two plus three, times four" will evaluate to 20. The parentheses are symbols that instruct the calculator to change the order in which the expression is evaluated.</P>
<P CLASS="para">A regular expression, by contrast, describes a pattern or sequence of characters. Concatenation is
the basic operation implied in every regular expression. That is, a pattern matches adjacent characters. Look at the following example of a regular expression:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">ABE</PRE></BLOCKQUOTE></P>
<P CLASS="para">Each literal character is a regular expression that matches only that single character. This expression describes an "<CODE CLASS="literal">A</CODE> followed by a <CODE CLASS="literal">B</CODE> then followed by an <CODE CLASS="literal">E</CODE>" or simply the string <CODE CLASS="literal">ABE</CODE>. The term "string" means each character concatenated to the one preceding it. That a regular expression describes a <EM CLASS="emphasis">sequence</EM> of characters can't be emphasized enough. (Novice users are inclined to think in higher-level units such as words, and not individual characters.) Regular expressions are case-sensitive; <CODE CLASS="literal">A</CODE> does not match <CODE CLASS="literal">a</CODE>.</P>
<P CLASS="para">Programs such as <SPAN CLASS="link"><EM CLASS="emphasis">grep</EM> (<A CLASS="linkend" HREF="ch27_02.htm" TITLE="Searching for Text with grep ">27.2</A>)</SPAN> that accept regular expressions must first evaluate the syntax of the regular expression to produce a pattern. They then read the input line by line trying to match the pattern. An input line is a string, and to see if a string matches the pattern, a program compares the first character in the string to the first character of the pattern. If there is a match, it compares the second character in the string to the second character of the pattern. Whenever it fails to make a match, it compares the next character in the string to the first character of the pattern. <A CLASS="xref" HREF="ch26_03.htm#UPT-ART-7972-FIG-0" TITLE="Interpreting a Regular Expression">Figure 26.1</A> illustrates this process, trying to match the pattern <CODE CLASS="literal">abe</CODE> on an input line.</P>
<H4 CLASS="figure"><A CLASS="title" NAME="UPT-ART-7972-FIG-0">Figure 26.1: Interpreting a Regular
Expression</A></H4>
<IMG CLASS="graphic" SRC="figs/7972.gif" ALT="Figure 26.1">
<P CLASS="para">A regular expression is not limited to literal characters. There is, for <A CLASS="indexterm" NAME="AUTOID-28377"></A><A CLASS="indexterm" NAME="AUTOID-28380"></A>instance, a metacharacter - the dot (<CODE CLASS="literal">.</CODE>) - that can be used as a "wildcard" to match any single character. You can think of this wildcard as analogous to a blank tile in Scrabble(TM) where it means any letter. Thus, we can specify the regular expression <CODE CLASS="literal">A.E</CODE> and it will match <CODE CLASS="literal">ACE</CODE>, <CODE CLASS="literal">ABE</CODE>, and <CODE CLASS="literal">ALE</CODE>. It will match any character in the position following <CODE CLASS="literal">A</CODE>.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-28390"></A><A CLASS="indexterm" NAME="AUTOID-28393"></A><A CLASS="indexterm" NAME="AUTOID-28396"></A>The metacharacter <CODE CLASS="literal">*</CODE> (the asterisk) is used to match zero or more occurrences of the <EM CLASS="emphasis">preceding</EM> regular expression, which typically is a single character. You may be familiar with <CODE CLASS="literal">*</CODE> as a <EM CLASS="emphasis">shell</EM> metacharacter, where it means "zero or more characters." But that meaning is very different from <CODE CLASS="literal">*</CODE> in a regular expression. By itself, the metacharacter <CODE CLASS="literal">*</CODE> does not match anything in a regular expression; it modifies what goes before it. The regular expression <CODE CLASS="literal">.*</CODE> matches any number of characters. The regular expression <CODE CLASS="literal">A.*E</CODE> matches any string that matches <CODE CLASS="literal">A.E</CODE> but it will also match any number of characters between <CODE CLASS="literal">A</CODE> and <CODE CLASS="literal">E</CODE>: <CODE CLASS="literal">AIRPLANE</CODE>, <CODE CLASS="literal">A</CODE> <CODE CLASS="literal">FINE</CODE>, <CODE CLASS="literal">AE</CODE>, <CODE CLASS="literal">A</CODE>
<CODE CLASS="literal">32-cent</CODE> <CODE CLASS="literal">S.A.S.E</CODE>, or <CODE CLASS="literal">A</CODE> <CODE CLASS="literal">LONG</CODE> <CODE CLASS="literal">WAY</CODE> <CODE CLASS="literal">HOME</CODE>, for example.</P>
<P CLASS="para">If you understand the difference between <CODE CLASS="literal">.</CODE> and <CODE CLASS="literal">*</CODE> in regular expressions, you already know about the two basic types of metacharacters: those that can be evaluated to a single character, and those that modify how the characters that precede them are evaluated.</P>
<P CLASS="para">It should also be apparent that by using metacharacters you can expand or limit the possible matches. You have more control over what is matched and what is not. In article <A CLASS="xref" HREF="ch26_04.htm" TITLE="Using Metacharacters in Regular Expressions ">26.4</A>, Bruce Barnett explains in detail how to use regular expression metacharacters.<A CLASS="indexterm" NAME="AUTOID-28426"></A></P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">DD</SPAN> <SPAN CLASS="bibliomisc">from O'Reilly & Associates' <CITE CLASS="citetitle">sed & awk</CITE></SPAN></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_02.htm" TITLE="26.2 Don't Confuse Regular Expressions with Wildcards "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 26.2 Don't Confuse Regular Expressions with Wildcards " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_04.htm" TITLE="26.4 Using Metacharacters in Regular Expressions "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 26.4 Using Metacharacters in Regular Expressions " BORDER="0"></A></TD></TR>
<TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">26.2 Don't Confuse Regular Expressions with Wildcards </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">26.4 Using Metacharacters in Regular Expressions </TD></TR></TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<IMG SRC="gifs/smnavbar.gif" SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map"><AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf"><AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell"><AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor"><AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed & awk"><AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell"><AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System"></MAP></DIV></BODY></HTML>