></LI><LI><P><A NAME="QUEXREGEX"></A></P><P>The question mark -- <SPAN CLASS="TOKEN">?</SPAN> -- matches zero or one of the previous RE. It is most often applied to a single character, making that character optional.</P></LI><LI><P><A NAME="PLUSREF"></A></P><P>The plus -- <SPAN CLASS="TOKEN">+</SPAN> -- matches one or more of the previous RE. It serves a role similar to the <SPAN CLASS="TOKEN">*</SPAN>, but does <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">not</I></SPAN> match zero occurrences.</P><P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING"># GNU sed and GNU grep accept "+" in basic REs,
# but only in the escaped form "\+".
# gawk uses extended REs, where "+" needs no escape.

echo a111b | sed -ne '/a1\+b/p'
echo a111b | grep 'a1\+b'
echo a111b | gawk '/a1+b/'
# All of the above are equivalent.

# Thanks, S.C.</PRE></TD></TR></TABLE></P><P><A NAME="ESCPCB"></A></P></LI><LI><P><A HREF="escapingsection.html#ESCP">Escaped</A> <SPAN CLASS="QUOTE">"curly brackets"</SPAN> -- <SPAN CLASS="TOKEN">\{ \}</SPAN> -- indicate the number of occurrences of a preceding RE to match.</P><P>The curly brackets must be escaped because, unescaped, they have only their literal character meaning. This escaped form is the one used in POSIX basic REs (as in <B CLASS="COMMAND">grep</B> and <B CLASS="COMMAND">sed</B>); in extended REs, as used by <B CLASS="COMMAND">grep -E</B>, the unescaped form is used instead.</P><P><SPAN CLASS="QUOTE">"[0-9]\{5\}"</SPAN> matches exactly five consecutive digits (characters in the range 0 to 9).</P><DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Curly brackets are not available as an RE in the <SPAN CLASS="QUOTE">"classic"</SPAN> (non-POSIX compliant) version of <A HREF="awk.html#AWKREF">awk</A>. 
However, <B CLASS="COMMAND">gawk</B> has the <TT CLASS="OPTION">--re-interval</TT> option that permits them (without being escaped). Starting with version 4.0, <B CLASS="COMMAND">gawk</B> recognizes them by default.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>echo 2222 | gawk --re-interval '/2{3}/'</B></TT>
<TT CLASS="COMPUTEROUTPUT">2222</TT>
</PRE></TD></TR></TABLE> </P><P><B CLASS="COMMAND">Perl</B> and some <B CLASS="COMMAND">egrep</B> versions do not require escaping the curly brackets.</P></TD></TR></TABLE></DIV></LI><LI><P><A NAME="PARENGRPS"></A></P><P>Parentheses -- <B CLASS="COMMAND">( )</B> -- enclose a group of REs. They are useful with the following <SPAN CLASS="QUOTE">"<SPAN CLASS="TOKEN">|</SPAN>"</SPAN> operator and in <A HREF="string-manipulation.html#EXPRPAREN">substring extraction</A> using <A HREF="moreadv.html#EXPRREF">expr</A>.</P></LI><LI><P>The <SPAN CLASS="QUOTE">"or"</SPAN> RE operator -- <B CLASS="COMMAND">|</B> -- matches any one of a set of alternative REs.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>egrep 're(a|e)d' misc.txt</B></TT>
<TT CLASS="COMPUTEROUTPUT">People who read seem to be better informed than those who do not.
The clarinet produces sound by the vibration of its reed.</TT>
</PRE></TD></TR></TABLE> </P></LI></UL><DIV CLASS="NOTE"><TABLE CLASS="NOTE" WIDTH="100%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/note.png" HSPACE="5" ALT="Note"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>Some versions of <B CLASS="COMMAND">sed</B>, <B CLASS="COMMAND">ed</B>, and <B CLASS="COMMAND">ex</B> support escaped versions of the extended Regular Expressions described above, as do the GNU utilities (for example, <SPAN CLASS="TOKEN">\+</SPAN> and <SPAN CLASS="TOKEN">\|</SPAN> in GNU <B CLASS="COMMAND">sed</B> and <B CLASS="COMMAND">grep</B>).</P></TD></TR></TABLE></DIV><UL><LI STYLE="list-style-type: square"><DIV CLASS="FORMALPARA"><P><B><A NAME="POSIXREF"></A>POSIX Character Classes. 
</B><TT CLASS="USERINPUT"><B>[:class:]</B></TT></P></DIV><P>This is an alternate method of specifying a range of characters to match. The equivalent ranges listed below hold in the C (POSIX) locale; in other locales a class may match additional characters.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:alnum:]</B></TT> matches alphabetic or numeric characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Za-z0-9</B></TT>.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:alpha:]</B></TT> matches alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Za-z</B></TT>.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:blank:]</B></TT> matches a space or a tab.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:cntrl:]</B></TT> matches control characters (ASCII 0 through 31, plus 127).</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:digit:]</B></TT> matches (decimal) digits. This is equivalent to <TT CLASS="USERINPUT"><B>0-9</B></TT>.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:graph:]</B></TT> matches graphic (visible printable) characters, ASCII 33 through 126. This is the same as <TT CLASS="USERINPUT"><B>[:print:]</B></TT>, below, but excluding the space character.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:lower:]</B></TT> matches lowercase alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>a-z</B></TT>.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:print:]</B></TT> matches printable characters, ASCII 32 through 126. This is the same as <TT CLASS="USERINPUT"><B>[:graph:]</B></TT>, above, but adding the space character.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:space:]</B></TT> matches whitespace characters: space, horizontal tab, newline, vertical tab, form feed, and carriage return.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:upper:]</B></TT> matches uppercase alphabetic characters. This is equivalent to <TT CLASS="USERINPUT"><B>A-Z</B></TT>.</P></LI><LI><P><TT CLASS="USERINPUT"><B>[:xdigit:]</B></TT> matches hexadecimal digits. 
This is equivalent to <TT CLASS="USERINPUT"><B>0-9A-Fa-f</B></TT>.</P><DIV CLASS="IMPORTANT"><TABLE CLASS="IMPORTANT" WIDTH="90%" BORDER="0"><TR><TD WIDTH="25" ALIGN="CENTER" VALIGN="TOP"><IMG SRC="common/important.png" HSPACE="5" ALT="Important"></TD><TD ALIGN="LEFT" VALIGN="TOP"><P>A POSIX character class is recognized only inside a bracket expression, so it appears within an outer pair of brackets, as in <TT CLASS="USERINPUT"><B>[[:digit:]]</B></TT>. On a command line, quote such a pattern so that the shell does not expand it as a filename glob. Within the <A HREF="tests.html#DBLBRACKETS">double brackets</A> ([[ ]]) test construct, no quoting is needed, since filename expansion does not take place there.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '[[:digit:]]' test.file</B></TT>
<TT CLASS="COMPUTEROUTPUT">abc=723</TT>
</PRE></TD></TR></TABLE> </P><P>These character classes may even be used with <A HREF="globbingref.html">globbing</A>, to a limited extent.</P><P> <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>ls -l ?[[:digit:]][[:digit:]]?</B></TT>
<TT CLASS="COMPUTEROUTPUT">-rw-rw-r-- 1 bozo bozo 0 Aug 21 14:47 a33b</TT>
</PRE></TD></TR></TABLE> </P><P>To see POSIX character classes used in scripts, refer to <A HREF="textproc.html#EX49">Example 15-20</A> and <A HREF="textproc.html#LOWERCASE">Example 15-21</A>.</P></TD></TR></TABLE></DIV></LI></UL><P><A HREF="sedawk.html#SEDREF">Sed</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="wrapper.html#PERLREF">Perl</A>, used as filters in scripts, take REs as arguments when "sifting" or transforming files or I/O streams. See <A HREF="contributed-scripts.html#BEHEAD">Example A-12</A> and <A HREF="contributed-scripts.html#TREE">Example A-17</A> for illustrations of this.</P><P>The standard reference on this complex topic is Friedl's <I CLASS="CITETITLE">Mastering Regular Expressions</I>. <I CLASS="CITETITLE">Sed &amp; Awk</I>, by Dougherty and Robbins, also gives a very lucid treatment of REs. 
See the <A HREF="biblio.html"><I>Bibliography</I></A> for more information on these books.</P></DIV></DIV><H3 CLASS="FOOTNOTES">Notes</H3><TABLE BORDER="0" CLASS="FOOTNOTES" WIDTH="100%"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%"><A NAME="FTN.AEN15785" HREF="regexp.html#AEN15785">[1]</A></TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%"><P><A NAME="METAMEANINGREF"></A>A <I CLASS="FIRSTTERM">meta-meaning</I> is the meaning of a term or expression on a higher level of abstraction. For example, the <I CLASS="FIRSTTERM">literal</I> meaning of <I CLASS="FIRSTTERM">regular expression</I> is an ordinary expression that conforms to accepted usage. The <I CLASS="FIRSTTERM">meta-meaning</I> is drastically different, as discussed at length in this chapter.</P></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="5%"><A NAME="FTN.AEN15838" HREF="regexp.html#AEN15838">[2]</A></TD><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="95%"><P>Since <A HREF="sedawk.html#SEDREF">sed</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="textproc.html#GREPREF">grep</A> process input one line at a time, there is usually no newline to match. Where a newline does occur within the text being matched (in the examples below, after sed's N command joins two lines, or after awk builds a string containing "\n"), the dot matches it. <TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="PROGRAMLISTING">#!/bin/bash

sed -e 'N;s/.*/[&amp;]/' &lt;&lt; EOF   # Here Document
line1
line2
EOF
# OUTPUT:
# [line1
# line2]

echo

awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' &lt;&lt; EOF
line 1
line 2
EOF
# OUTPUT:
# line
# 1

# Thanks, S.C.

exit 0</PRE></TD></TR></TABLE></P></TD></TR></TABLE><DIV CLASS="NAVFOOTER"><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0"><TR><TD WIDTH="33%" ALIGN="left" VALIGN="top"><A HREF="part5.html" ACCESSKEY="P">Prev</A></TD><TD WIDTH="34%" ALIGN="center" VALIGN="top"><A HREF="index.html" ACCESSKEY="H">Home</A></TD><TD WIDTH="33%" ALIGN="right" VALIGN="top"><A HREF="globbingref.html" ACCESSKEY="N">Next</A></TD></TR><TR><TD WIDTH="33%" ALIGN="left" VALIGN="top">Advanced Topics</TD><TD WIDTH="34%" ALIGN="center" VALIGN="top"><A HREF="part5.html" ACCESSKEY="U">Up</A></TD><TD WIDTH="33%" ALIGN="right" VALIGN="top">Globbing</TD></TR></TABLE></DIV></BODY></HTML>