<HTML><!--Distributed by F -->
<HEAD>
<TITLE>[Chapter 26] 26.6 Just What Does a Regular Expression Match? </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:44:08Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch26_01.htm" TITLE="26. Regular Expressions (Pattern Matching)">
<LINK REL="prev" HREF="ch26_05.htm" TITLE="26.5 Getting Regular Expressions Right ">
<LINK REL="next" HREF="ch26_07.htm" TITLE="26.7 Limiting the Extent of a Match ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav">
<H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap">
<AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book">
</MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_05.htm" TITLE="26.5 Getting Regular Expressions Right "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 26.5 Getting Regular Expressions Right " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 26<BR>Regular Expressions (Pattern Matching)</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_07.htm" TITLE="26.7 Limiting the Extent of a Match "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 26.7 Limiting the Extent of a Match 
"BORDER="0"></A></TD></TR></TABLE>&nbsp;<HRALIGN="LEFT"WIDTH="515"TITLE="footer"></DIV><DIVCLASS="SECT1"><H2CLASS="sect1"><ACLASS="title"NAME="UPT-ART-7835">26.6 Just What Does a Regular Expression Match? </A></H2><PCLASS="para">One of the toughest things to learn about regular expressionsis just what they do match.The problem is that a regular expression tends to find the longestpossible match&nbsp;- which can be more than you want.</P><TABLECLASS="para.programreference"BORDER="1"><TR><THVALIGN="TOP"><ACLASS="programreference"HREF="examples/index.htm"TITLE="showmatch">showmatch</A><BR></TH><TDVALIGN="TOP">Here's a simple script called <EMCLASS="emphasis">showmatch</EM> that is<ACLASS="indexterm"NAME="AUTOID-29042"></A><ACLASS="indexterm"NAME="AUTOID-29045"></A>useful for testing regular expressions, when writing<EMCLASS="emphasis">sed</EM> scripts, etc. Given a regular expression and a filename, itfinds lines in the file matching that expression, just like <EMCLASS="emphasis">grep</EM>, butit uses a row of carets (<CODECLASS="literal">^^^^</CODE>) to highlight the portion of the linethat was actually matched.</TD></TR></TABLE><PCLASS="para"><BLOCKQUOTECLASS="screen"><PRECLASS="screen">#! /bin/sh# showmatch - mark string that matches patternpattern=$1; shiftnawk 'match($0,pattern) &gt; 0 {    s = substr($0,1,RSTART-1)    m = substr($0,1,RLENGTH)    gsub (/[^\b- ]/, &quot; &quot;, s)    gsub (/./,       &quot;^&quot;, m)    printf &quot;%s\n%s%s\n&quot;, $0, s, m}' pattern=&quot;$pattern&quot; $*</PRE></BLOCKQUOTE></P><PCLASS="para">For example:</P><PCLASS="para"><BLOCKQUOTECLASS="screen"><PRECLASS="screen">% <CODECLASS="userinput"><B>showmatch 'CD-...' mbox</B></CODE>and CD-ROM publishing. 
We have recognized     ^^^^^^that documentation will be shipped on CD-ROM; however,                                      ^^^^^^</PRE></BLOCKQUOTE></P><TABLECLASS="para.programreference"BORDER="1"><TR><THVALIGN="TOP"><ACLASS="programreference"HREF="examples/index.htm"TITLE="xgrep">xgrep</A><BR></TH><TDVALIGN="TOP"><ACLASS="indexterm"NAME="AUTOID-29057"></A><EMCLASS="emphasis">xgrep</EM> is a related script that simply retrieves only the matched text.This allows you to extract patterned data from a file.For example, you could extract only the numbers from a tablecontaining both text and numbers.It's also great for counting the number of occurrences of some patternin your file, as shown below.Just be sure that your expression only matches what you want.If you aren't sure, leave off the <EMCLASS="emphasis">wc</EM> command and glance at theoutput.For example, the regular expression <CODECLASS="literal">[0-9]*</CODE> will match numberslike <CODECLASS="literal">3.2</CODE> <EMCLASS="emphasis">twice</EM>: once for the <CODECLASS="literal">3</CODE> and again for the<CODECLASS="literal">2</CODE>!You want to include a dot (<CODECLASS="literal">.</CODE>) and/or comma (<CODECLASS="literal">,</CODE>),depending on how your numbers are written.For example: <CODECLASS="literal">[0-9][.0-9]*</CODE> matches a leading digit, possiblyfollowed by more dots and digits.</TD></TR></TABLE><BLOCKQUOTECLASS="note"><PCLASS="para"><STRONG>NOTE:</STRONG> Remember that an expression like <CODECLASS="literal">[0-9]*</CODE> will match <EMCLASS="emphasis">zero</EM> numbers(because <CODECLASS="literal">*</CODE> means &quot;zero or more of the preceding character&quot;).That expression can make <EMCLASS="emphasis">xgrep</EM> run for a very long time!The following expression, which matches <EMCLASS="emphasis">one</EM> or more digits,is probably what you want instead:</P><PCLASS="para"><BLOCKQUOTECLASS="screen"><PRECLASS="screen"><CODECLASS="userinput"><B>xgrep &quot;[0-9][0-9]*&quot; 
</B></CODE><CODECLASS="replaceable"><I>files</I></CODE><CODECLASS="userinput"><B> | wc -l</B></CODE></PRE></BLOCKQUOTE></P></BLOCKQUOTE><PCLASS="para">The <EMCLASS="emphasis">xgrep</EM> shell script runs the <EMCLASS="emphasis">sed</EM> commands below,replacing <CODECLASS="literal">$re</CODE> with the regular expression from the command lineand <CODECLASS="literal">$x</CODE> with a CTRL-b character (which is used as a delimiter).We've shown the <EMCLASS="emphasis">sed</EM> commands numbered, like <CODECLASS="replaceable"><I>5&gt;</I></CODE>;these are only for reference and aren't part of the script:</P><PCLASS="para"><BLOCKQUOTECLASS="screen"><PRECLASS="screen"><CODECLASS="replaceable"><I>1&gt;</I></CODE> \$x$re$x!d<CODECLASS="replaceable"><I>2&gt;</I></CODE> s//$x&amp;$x/g<CODECLASS="replaceable"><I>3&gt;</I></CODE> s/[^$x]*$x//<CODECLASS="replaceable"><I>4&gt;</I></CODE> s/$x[^$x]*$x/\<CODECLASS="replaceable"><I>  </I></CODE> /g<CODECLASS="replaceable"><I>5&gt;</I></CODE> s/$x.*//</PRE></BLOCKQUOTE></P><PCLASS="para"><BCLASS="emphasis.bold">Command 1</B> deletes all input lines that don't contain a match.On the remaining lines (which do match), <BCLASS="emphasis.bold">command 2</B> surrounds thematching text with CTRL-b delimiter characters.<BCLASS="emphasis.bold">Command 3</B> removes all characters (including the first delimiter)before the first match on a line.When there's more than one match on a line, <BCLASS="emphasis.bold">command 4</B> breaks themultiple matches onto separate lines.<BCLASS="emphasis.bold">Command 5</B> removes the last delimiter, and any text after it, fromevery output line.</P><PCLASS="para">Greg Ubben revised <EMCLASS="emphasis">showmatch</EM> and wrote <EMCLASS="emphasis">xgrep</EM>.</P><DIVCLASS="sect1info"><PCLASS="SECT1INFO">- <SPANCLASS="authorinitials">JP</SPAN>, 
<SPAN CLASS="authorinitials">DD,&nbsp;TOR</SPAN></P></DIV>
</DIV>
<DIV CLASS="htmlnav">
<P></P>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_05.htm" TITLE="26.5 Getting Regular Expressions Right "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 26.5 Getting Regular Expressions Right " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch26_07.htm" TITLE="26.7 Limiting the Extent of a Match "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 26.7 Limiting the Extent of a Match " BORDER="0"></A></TD>
</TR>
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">26.5 Getting Regular Expressions Right </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">26.7 Limiting the Extent of a Match </TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<IMG SRC="gifs/smnavbar.gif" SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map">
<AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf">
<AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell">
<AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor">
<AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk">
<AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell">
<AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System">
</MAP>
</DIV>
</BODY>
</HTML>