📄 regexp.html

></LI><LI><P><A NAME="QUEXREGEX"></A></P><P>The question mark -- <SPAN CLASS="TOKEN">?</SPAN> -- matches zero or one of the previous RE. It is generally used for matching single characters.</P></LI>
<LI><P><A NAME="PLUSREF"></A></P><P>The plus -- <SPAN CLASS="TOKEN">+</SPAN> -- matches one or more of the previous RE. It serves a role similar to the <SPAN CLASS="TOKEN">*</SPAN>, but does <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">not</I></SPAN> match zero occurrences.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING"># GNU versions of sed and grep can use "+" in basic REs,
# but it needs to be escaped there.
# gawk uses extended REs, so "+" is not escaped.

echo a111b | sed -ne '/a1\+b/p'
echo a111b | grep 'a1\+b'
echo a111b | gawk '/a1+b/'
# All of the above are equivalent.

# Thanks, S.C.</PRE></TD></TR></TABLE></P>
<P><A NAME="ESCPCB"></A></P></LI>
<LI><P><A HREF="escapingsection.html#ESCP">Escaped</A> <SPAN CLASS="QUOTE">"curly brackets"</SPAN> -- <SPAN CLASS="TOKEN">\{ \}</SPAN> -- indicate the number of occurrences of a preceding RE to match.</P>
<P>In basic REs it is necessary to escape the curly brackets, since they otherwise have only their literal character meaning. This usage was historically not part of the basic RE set, although POSIX basic REs now include it.</P>
<P><SPAN CLASS="QUOTE">"[0-9]\{5\}"</SPAN> matches exactly five digits (characters in the range of 0 to 9).</P>
<DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Curly brackets are not available as an RE in the <SPAN CLASS="QUOTE">"classic"</SPAN> (non-POSIX compliant) version of <A HREF="awk.html#AWKREF">awk</A>.
However, <B CLASS="COMMAND">gawk</B> has the <TT CLASS="OPTION">--re-interval</TT> option that permits them (without being escaped).</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>echo 2222 | gawk --re-interval '/2{3}/'</B></TT>
<TT CLASS="COMPUTEROUTPUT">2222</TT>
</PRE></TD></TR></TABLE></P>
<P><B CLASS="COMMAND">Perl</B> and some <B CLASS="COMMAND">egrep</B> versions do not require escaping the curly brackets.</P></TD></TR></TABLE></DIV></LI>
<LI><P><A NAME="PARENGRPS"></A></P><P>Parentheses -- <B CLASS="COMMAND">( )</B> -- enclose a group of REs. They are useful with the following <SPAN CLASS="QUOTE">"<SPAN CLASS="TOKEN">|</SPAN>"</SPAN> operator and in <A HREF="string-manipulation.html#EXPRPAREN">substring extraction</A> using <A HREF="moreadv.html#EXPRREF">expr</A>.</P></LI>
<LI><P>The <SPAN CLASS="QUOTE">"or"</SPAN> RE operator -- <B CLASS="COMMAND">|</B> -- matches any one of a set of alternative expressions.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>egrep 're(a|e)d' misc.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">People who read seem to be better informed than those who do not.
The clarinet produces sound by the vibration of its reed.</TT>
</PRE></TD></TR></TABLE></P></LI></UL>
<DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="100%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Some versions of <B CLASS="COMMAND">sed</B>, <B CLASS="COMMAND">ed</B>, and <B CLASS="COMMAND">ex</B> support escaped versions of the extended Regular Expressions described above, as do the GNU utilities.</P></TD></TR></TABLE></DIV>
<UL><LI STYLE="list-style-type: square"><DIV CLASS="FORMALPARA"><P><B><A NAME="POSIXREF"></A>POSIX Character Classes.
</B><TT CLASS="USERINPUT"><B>[:class:]</B></TT></P></DIV><P>This is an alternate method of specifying a range of characters to match.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:alnum:]</B></TT> matches alphabetic or numeric characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Za-z0-9</B></TT>.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:alpha:]</B></TT> matches alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Za-z</B></TT>.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:blank:]</B></TT> matches a space or a tab.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:cntrl:]</B></TT> matches control characters.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:digit:]</B></TT> matches (decimal) digits. This is equivalent to <TT CLASS="USERINPUT"><B>0-9</B></TT>.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:graph:]</B></TT> (graphic printable characters). Matches characters in the range of ASCII 33 - 126. This is the same as <TT CLASS="USERINPUT"><B>[:print:]</B></TT>, below, but excluding the space character.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:lower:]</B></TT> matches lowercase alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>a-z</B></TT>.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:print:]</B></TT> (printable characters). Matches characters in the range of ASCII 32 - 126. This is the same as <TT CLASS="USERINPUT"><B>[:graph:]</B></TT>, above, but adding the space character.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:space:]</B></TT> matches whitespace characters (space, horizontal tab, newline, vertical tab, form feed, and carriage return).</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:upper:]</B></TT> matches uppercase alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Z</B></TT>.</P></LI>
<LI><P><TT CLASS="USERINPUT"><B>[:xdigit:]</B></TT> matches hexadecimal digits.
This is equivalent to <TT CLASS="USERINPUT"><B>0-9A-Fa-f</B></TT>.</P>
<DIV CLASS="IMPORTANT"><TABLE CLASS="IMPORTANT" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/important.png" HSPACE="5" ALT="Important"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>POSIX character classes generally require quoting or <A HREF="tests.html#DBLBRACKETS">double brackets</A> ([[ ]]).</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep [[:digit:]] test.file</B></TT>
<TT CLASS="COMPUTEROUTPUT">abc=723</TT>
</PRE></TD></TR></TABLE></P>
<P>These character classes may even be used with <A HREF="globbingref.html">globbing</A>, to a limited extent.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>ls -l ?[[:digit:]][[:digit:]]?</B></TT>
<TT CLASS="COMPUTEROUTPUT">-rw-rw-r--    1 bozo  bozo         0 Aug 21 14:47 a33b</TT>
</PRE></TD></TR></TABLE></P>
<P>To see POSIX character classes used in scripts, refer to <A HREF="textproc.html#EX49">Example 15-20</A> and <A HREF="textproc.html#LOWERCASE">Example 15-21</A>.</P></TD></TR></TABLE></DIV></LI></UL>
<P><A HREF="sedawk.html#SEDREF">Sed</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="wrapper.html#PERLREF">Perl</A>, used as filters in scripts, take REs as arguments when "sifting" or transforming files or I/O streams. See <A HREF="contributed-scripts.html#BEHEAD">Example A-12</A> and <A HREF="contributed-scripts.html#TREE">Example A-17</A> for illustrations of this.</P>
<P>The standard reference on this complex topic is Friedl's <I CLASS="CITETITLE">Mastering Regular Expressions</I>. <I CLASS="CITETITLE">Sed &#38; Awk</I>, by Dougherty and Robbins, also gives a very lucid treatment of REs.
See the <A HREF="biblio.html"><I>Bibliography</I></A> for more information on these books.</P></DIV></DIV>
<H3 CLASS="FOOTNOTES">Notes</H3>
<TABLE BORDER="0" CLASS="FOOTNOTES" WIDTH="100%"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%"><A NAME="FTN.AEN15785" HREF="regexp.html#AEN15785">[1]</A></TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%"><P><A NAME="METAMEANINGREF"></A>A <I CLASS="FIRSTTERM">meta-meaning</I> is the meaning of a term or expression on a higher level of abstraction. For example, the <I CLASS="FIRSTTERM">literal</I> meaning of <I CLASS="FIRSTTERM">regular expression</I> is an ordinary expression that conforms to accepted usage. The <I CLASS="FIRSTTERM">meta-meaning</I> is drastically different, as discussed at length in this chapter.</P></TD></TR>
<TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%"><A NAME="FTN.AEN15838" HREF="regexp.html#AEN15838">[2]</A></TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%"><P>Since <A HREF="sedawk.html#SEDREF">sed</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="textproc.html#GREPREF">grep</A> process single lines, there will usually not be a newline to match. In those cases where there is a newline in a multiple line expression, the dot will match the newline.
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash

sed -e 'N;s/.*/[&#38;]/' &#60;&#60; EOF   # Here Document
line1
line2
EOF
# OUTPUT:
# [line1
# line2]



echo

awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' &#60;&#60; EOF
line 1
line 2
EOF
# OUTPUT:
# line
# 1


# Thanks, S.C.

exit 0</PRE></TD></TR></TABLE></P></TD></TR></TABLE>
<DIV CLASS="NAVFOOTER"><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0"><TR><TD WIDTH="33%" ALIGN="left" VALIGN="top"><A HREF="part5.html" ACCESSKEY="P">Prev</A></TD><TD WIDTH="34%" ALIGN="center" VALIGN="top"><A HREF="index.html" ACCESSKEY="H">Home</A></TD><TD WIDTH="33%" ALIGN="right" VALIGN="top"><A HREF="globbingref.html" ACCESSKEY="N">Next</A></TD></TR><TR><TD WIDTH="33%" ALIGN="left" VALIGN="top">Advanced Topics</TD><TD WIDTH="34%" ALIGN="center" VALIGN="top"><A HREF="part5.html" ACCESSKEY="U">Up</A></TD><TD WIDTH="33%" ALIGN="right" VALIGN="top">Globbing</TD></TR></TABLE></DIV></BODY></HTML>