<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Regular Expressions</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+">
<LINK REL="HOME" TITLE="Advanced Bash-Scripting Guide" HREF="index.html">
<LINK REL="UP" TITLE="Advanced Topics" HREF="part5.html">
<LINK REL="PREVIOUS" TITLE="Advanced Topics" HREF="part5.html">
<LINK REL="NEXT" TITLE="Globbing" HREF="globbingref.html">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
<LINK REL="stylesheet" HREF="common/kde-common.css" TYPE="text/css">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="Content-Language" CONTENT="en">
<LINK REL="stylesheet" HREF="common/kde-localised.css" TYPE="text/css" TITLE="KDE-English">
<LINK REL="stylesheet" HREF="common/kde-default.css" TYPE="text/css" TITLE="KDE-Default">
</HEAD>
<BODY CLASS="CHAPTER" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#AA0000" VLINK="#AA0055" ALINK="#AA0000" STYLE="font-family: sans-serif;">
<DIV CLASS="NAVHEADER">
<TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR><TH COLSPAN="3" ALIGN="center">Advanced Bash-Scripting Guide: An in-depth exploration of the art of shell scripting</TH></TR>
<TR><TD WIDTH="10%" ALIGN="left" VALIGN="bottom"><A HREF="part5.html" ACCESSKEY="P">Prev</A></TD><TD WIDTH="80%" ALIGN="center" VALIGN="bottom"></TD><TD WIDTH="10%" ALIGN="right" VALIGN="bottom"><A HREF="globbingref.html" ACCESSKEY="N">Next</A></TD></TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="100%">
</DIV>
<DIV CLASS="CHAPTER">
<H1><A NAME="REGEXP"></A>Chapter 17. Regular Expressions</H1>
<TABLE BORDER="0" WIDTH="100%" CELLSPACING="0" CELLPADDING="0" CLASS="EPIGRAPH">
<TR><TD WIDTH="45%">&nbsp;</TD><TD WIDTH="45%" ALIGN="LEFT" VALIGN="TOP"><I><P><I>. . . the intellectual activity associated with software development is largely one of gaining insight.</I></P><P><I>--Stowe Boyd</I></P></I></TD></TR>
</TABLE>
<P><A NAME="REGEXREF"></A></P>
<P>To fully utilize the power of shell scripting, you need to master Regular Expressions.
Certain commands and utilities commonly used in scripts, such as <A HREF="textproc.html#GREPREF">grep</A>, <A HREF="moreadv.html#EXPRREF">expr</A>, <A HREF="sedawk.html#SEDREF">sed</A> and <A HREF="awk.html#AWKREF">awk</A>, interpret and use REs. As of <A HREF="bashver3.html#BASH3REF">version 3</A>, Bash has acquired its own <A HREF="bashver3.html#REGEXMATCHREF">RE-match operator</A>: <B CLASS="COMMAND">=~</B>.</P>
<DIV CLASS="SECT1">
<H1 CLASS="SECT1"><A NAME="AEN15780"></A>17.1. A Brief Introduction to Regular Expressions</H1>
<P>An expression is a string of characters. Characters that have an interpretation above and beyond their literal meaning are called <I CLASS="FIRSTTERM">metacharacters</I>. A quote symbol, for example, may denote speech by a person, <I CLASS="FIRSTTERM">ditto</I>, or a meta-meaning <A NAME="AEN15785" HREF="#FTN.AEN15785">[1]</A> for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that match (or specify) patterns.</P>
<P>A Regular Expression contains one or more of the following:</P>
<UL>
<LI><P><I CLASS="FIRSTTERM">A character set</I>. These are the characters retaining their literal meaning. The simplest type of Regular Expression consists <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">only</I></SPAN> of a character set, with no metacharacters.</P></LI>
<LI><P><A NAME="ANCHORREF"></A></P><P><I CLASS="FIRSTTERM">An anchor</I>. These designate (<I CLASS="FIRSTTERM">anchor</I>) the position in the line of text that the RE is to match. For example, <SPAN CLASS="TOKEN">^</SPAN> and <SPAN CLASS="TOKEN">$</SPAN> are anchors.</P></LI>
<LI><P><I CLASS="FIRSTTERM">Modifiers</I>. These expand or narrow (<I CLASS="FIRSTTERM">modify</I>) the range of text the RE is to match. Modifiers include the asterisk, brackets, and the backslash.</P></LI>
</UL>
<P>The main uses for Regular Expressions (<I CLASS="FIRSTTERM">RE</I>s) are text searches and string manipulation.
An RE <I CLASS="FIRSTTERM">matches</I> a single character or a set of characters -- a string or a part of a string.</P>
<UL>
<LI><P>The asterisk -- <SPAN CLASS="TOKEN">*</SPAN> -- matches any number of repeats of the character or RE preceding it, including <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">zero</I></SPAN> instances.</P><P><SPAN CLASS="QUOTE">"1133*"</SPAN> matches <TT CLASS="REPLACEABLE"><I>11 + one or more 3's</I></TT>: <TT CLASS="REPLACEABLE"><I>113</I></TT>, <TT CLASS="REPLACEABLE"><I>1133</I></TT>, <TT CLASS="REPLACEABLE"><I>1133333</I></TT>, and so forth.</P></LI>
<LI><P><A NAME="REGEXDOT"></A>The dot -- <SPAN CLASS="TOKEN">.</SPAN> -- matches any one character, except a newline. <A NAME="AEN15838" HREF="#FTN.AEN15838">[2]</A> </P><P><SPAN CLASS="QUOTE">"13."</SPAN> matches <TT CLASS="REPLACEABLE"><I>13 + any one character (including a space)</I></TT>: <TT CLASS="REPLACEABLE"><I>1133</I></TT>, <TT CLASS="REPLACEABLE"><I>11333</I></TT>, but not <TT CLASS="REPLACEABLE"><I>13</I></TT> (additional character missing).</P></LI>
<LI><P>The caret -- <SPAN CLASS="TOKEN">^</SPAN> -- matches the beginning of a line, but sometimes, depending on context, negates the meaning of a set of characters in an RE.
</P></LI>
<LI><P><A NAME="DOLLARSIGNREF"></A></P>
<P>The dollar sign -- <SPAN CLASS="TOKEN">$</SPAN> -- at the end of an RE matches the end of a line.</P>
<P><SPAN CLASS="QUOTE">"XXX$"</SPAN> matches <SPAN CLASS="TOKEN">XXX</SPAN> at the end of a line.</P>
<P><SPAN CLASS="QUOTE">"^$"</SPAN> matches blank lines.</P></LI>
<LI><P><A NAME="BRACKETSREF"></A></P>
<P>Brackets -- <SPAN CLASS="TOKEN">[...]</SPAN> -- enclose a set of characters to match in a single RE.</P>
<P><SPAN CLASS="QUOTE">"[xyz]"</SPAN> matches the characters <TT CLASS="REPLACEABLE"><I>x</I></TT>, <TT CLASS="REPLACEABLE"><I>y</I></TT>, or <TT CLASS="REPLACEABLE"><I>z</I></TT>.</P>
<P><SPAN CLASS="QUOTE">"[c-n]"</SPAN> matches any of the characters in the range <TT CLASS="REPLACEABLE"><I>c</I></TT> to <TT CLASS="REPLACEABLE"><I>n</I></TT>.</P>
<P><SPAN CLASS="QUOTE">"[B-Pk-y]"</SPAN> matches any of the characters in the ranges <TT CLASS="REPLACEABLE"><I>B</I></TT> to <TT CLASS="REPLACEABLE"><I>P</I></TT> and <TT CLASS="REPLACEABLE"><I>k</I></TT> to <TT CLASS="REPLACEABLE"><I>y</I></TT>.</P>
<P><SPAN CLASS="QUOTE">"[a-z0-9]"</SPAN> matches any lowercase letter or any digit.</P>
<P><SPAN CLASS="QUOTE">"[^b-d]"</SPAN> matches all characters <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">except</I></SPAN> those in the range <TT CLASS="REPLACEABLE"><I>b</I></TT> to <TT CLASS="REPLACEABLE"><I>d</I></TT>. This is an instance of <SPAN CLASS="TOKEN">^</SPAN> negating or inverting the meaning of the following RE (taking on a role similar to <SPAN CLASS="TOKEN">!</SPAN> in a different context).</P>
<P>Combined sequences of bracketed characters match common word patterns. <SPAN CLASS="QUOTE">"[Yy][Ee][Ss]"</SPAN> matches <TT CLASS="REPLACEABLE"><I>yes</I></TT>, <TT CLASS="REPLACEABLE"><I>Yes</I></TT>, <TT CLASS="REPLACEABLE"><I>YES</I></TT>, <TT CLASS="REPLACEABLE"><I>yEs</I></TT>, and so forth.
<SPAN CLASS="QUOTE">"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"</SPAN> matches any Social Security number.</P></LI>
<LI><P><A NAME="REGEXBS"></A></P>
<P>The backslash -- <SPAN CLASS="TOKEN">\</SPAN> -- <A HREF="escapingsection.html#ESCP">escapes</A> a special character, which means that character gets interpreted literally.</P>
<P>A <SPAN CLASS="QUOTE">"\$"</SPAN> reverts to its literal meaning of <SPAN CLASS="QUOTE">"$"</SPAN>, rather than its RE meaning of end-of-line. Likewise a <SPAN CLASS="QUOTE">"\\"</SPAN> has the literal meaning of <SPAN CLASS="QUOTE">"\"</SPAN>.</P></LI>
<LI><P><A NAME="ANGLEBRAC"></A></P>
<P><A HREF="escapingsection.html#ESCP">Escaped</A> <SPAN CLASS="QUOTE">"angle brackets"</SPAN> -- <SPAN CLASS="TOKEN">\&lt;...\&gt;</SPAN> -- mark word boundaries.</P>
<P>The angle brackets must be escaped, since otherwise they have only their literal character meaning.</P>
<P><SPAN CLASS="QUOTE">"\&lt;the\&gt;"</SPAN> matches the word <SPAN CLASS="QUOTE">"the,"</SPAN> but not the words <SPAN CLASS="QUOTE">"them,"</SPAN> <SPAN CLASS="QUOTE">"there,"</SPAN> <SPAN CLASS="QUOTE">"other,"</SPAN> etc.</P>
<P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>cat textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.
This is line 4.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep 'the' textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '\&lt;the\&gt;' textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is the only instance of line 2.</TT>
</PRE></TD></TR></TABLE>
</P></LI>
</UL>
<TABLE CLASS="SIDEBAR" BORDER="1" CELLPADDING="5"><TR><TD><DIV CLASS="SIDEBAR"><A NAME="AEN15960"></A>
<P>The only way to be certain that a particular RE works is to test it.</P>
<P><TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="100%"><TR><TD><PRE CLASS="PROGRAMLISTING">TEST FILE: tstfile                          # No match.
                                            # No match.
Run grep "1133*" on this file.              # Match.
                                            # No match.
                                            # No match.
This line contains the number 113.          # Match.
This line contains the number 13.           # No match.
This line contains the number 133.          # No match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 1112.         # No match.
This line contains the number 113312312.    # Match.
This line contains no numbers at all.       # No match.</PRE></TD></TR></TABLE></P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="100%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep "1133*" tstfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">Run grep "1133*" on this file.              # Match.
This line contains the number 113.          # Match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 113312312.    # Match.</TT>
</PRE></TD></TR></TABLE>
</DIV></TD></TR></TABLE>
<UL>
<LI STYLE="list-style-type: square"><DIV CLASS="FORMALPARA"><P><B><A NAME="EXTREGEX"></A>Extended REs. </B>Additional metacharacters added to the basic set. Used in <A HREF="textproc.html#EGREPREF">egrep</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="wrapper.html#PERLREF">Perl</A>.</P></DIV