<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Regular Expressions</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+">
<LINK REL="HOME" TITLE="Advanced Bash-Scripting Guide" HREF="index.html">
<LINK REL="UP" TITLE="Advanced Topics" HREF="part5.html">
<LINK REL="PREVIOUS" TITLE="Advanced Topics" HREF="part5.html">
<LINK REL="NEXT" TITLE="Globbing" HREF="globbingref.html">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
<LINK REL="stylesheet" HREF="common/kde-common.css" TYPE="text/css">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="Content-Language" CONTENT="en">
<LINK REL="stylesheet" HREF="common/kde-localised.css" TYPE="text/css" TITLE="KDE-English">
<LINK REL="stylesheet" HREF="common/kde-default.css" TYPE="text/css" TITLE="KDE-Default">
</HEAD>
<BODY CLASS="CHAPTER" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#AA0000" VLINK="#AA0055" ALINK="#AA0000" STYLE="font-family: sans-serif;">
<DIV CLASS="NAVHEADER">
<TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR><TH COLSPAN="3" ALIGN="center">Advanced Bash-Scripting Guide: An in-depth exploration of the art of shell scripting</TH></TR>
<TR><TD WIDTH="10%" ALIGN="left" VALIGN="bottom"><A HREF="part5.html" ACCESSKEY="P">Prev</A></TD><TD WIDTH="80%" ALIGN="center" VALIGN="bottom"></TD><TD WIDTH="10%" ALIGN="right" VALIGN="bottom"><A HREF="globbingref.html" ACCESSKEY="N">Next</A></TD></TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="100%">
</DIV>
<DIV CLASS="CHAPTER">
<H1><A NAME="REGEXP"></A>Chapter 17. Regular Expressions</H1>
<TABLE BORDER="0" WIDTH="100%" CELLSPACING="0" CELLPADDING="0" CLASS="EPIGRAPH">
<TR><TD WIDTH="45%">&nbsp;</TD><TD WIDTH="45%" ALIGN="LEFT" VALIGN="TOP"><I><P><I>. . .
the intellectual activity associated with software development is largely one of gaining insight.</I></P><P><I>--Stowe Boyd</I></P></I></TD></TR>
</TABLE>
<P><A NAME="REGEXREF"></A></P>
<P>To fully utilize the power of shell scripting, you need to master Regular Expressions. Certain commands and utilities commonly used in scripts, such as <A HREF="textproc.html#GREPREF">grep</A>, <A HREF="moreadv.html#EXPRREF">expr</A>, <A HREF="sedawk.html#SEDREF">sed</A>, and <A HREF="awk.html#AWKREF">awk</A>, interpret and use REs. As of <A HREF="bashver3.html#BASH3REF">version 3</A>, Bash has acquired its own <A HREF="bashver3.html#REGEXMATCHREF">RE-match operator</A>: <B CLASS="COMMAND">=~</B>.</P>
<DIV CLASS="SECT1">
<H1 CLASS="SECT1"><A NAME="AEN15780"></A>17.1. A Brief Introduction to Regular Expressions</H1>
<P>An expression is a string of characters. Those characters having an interpretation above and beyond their literal meaning are called <I CLASS="FIRSTTERM">metacharacters</I>. A quote symbol, for example, may denote speech by a person, <I CLASS="FIRSTTERM">ditto</I>, or a meta-meaning <A NAME="AEN15785" HREF="#FTN.AEN15785">[1]</A> for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that match (or specify) patterns.</P>
<P>A Regular Expression contains one or more of the following:</P>
<UL>
<LI><P><I CLASS="FIRSTTERM">A character set</I>. These are the characters retaining their literal meaning. The simplest type of Regular Expression consists <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">only</I></SPAN> of a character set, with no metacharacters.</P></LI>
<LI><P><A NAME="ANCHORREF"></A></P><P><I CLASS="FIRSTTERM">An anchor</I>. These designate (<I CLASS="FIRSTTERM">anchor</I>) the position in the line of text that the RE is to match. For example, <SPAN CLASS="TOKEN">^</SPAN> and <SPAN CLASS="TOKEN">$</SPAN> are anchors.</P></LI>
<LI><P><I CLASS="FIRSTTERM">Modifiers</I>.
These expand or narrow (<I CLASS="FIRSTTERM">modify</I>) the range of text the RE is to match. Modifiers include the asterisk, brackets, and the backslash.</P></LI>
</UL>
<P>The main uses for Regular Expressions (<I CLASS="FIRSTTERM">RE</I>s) are text searches and string manipulation. An RE <I CLASS="FIRSTTERM">matches</I> a single character or a set of characters -- a string or a part of a string.</P>
<UL>
<LI><P>The asterisk -- <SPAN CLASS="TOKEN">*</SPAN> -- matches any number of repeats of the single character or RE preceding it, including <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">zero</I></SPAN> instances.</P><P><SPAN CLASS="QUOTE">"1133*"</SPAN> matches <TT CLASS="REPLACEABLE"><I>11 + one or more 3's</I></TT>: <TT CLASS="REPLACEABLE"><I>113</I></TT>, <TT CLASS="REPLACEABLE"><I>1133</I></TT>, <TT CLASS="REPLACEABLE"><I>1133333</I></TT>, and so forth.</P></LI>
<LI><P><A NAME="REGEXDOT"></A>The dot -- <SPAN CLASS="TOKEN">.</SPAN> -- matches any one character, except a newline. <A NAME="AEN15838" HREF="#FTN.AEN15838">[2]</A></P><P><SPAN CLASS="QUOTE">"13."</SPAN> matches <TT CLASS="REPLACEABLE"><I>13 + any one character (including a space)</I></TT>: <TT CLASS="REPLACEABLE"><I>1133</I></TT>, <TT CLASS="REPLACEABLE"><I>11333</I></TT>, but not <TT CLASS="REPLACEABLE"><I>13</I></TT> (additional character missing).</P></LI>
<LI><P>The caret -- <SPAN CLASS="TOKEN">^</SPAN> -- matches the beginning of a line, but as the first character inside brackets it instead negates the set of characters that follows.
</P></LI>
<LI><P><A NAME="DOLLARSIGNREF"></A></P><P>The dollar sign -- <SPAN CLASS="TOKEN">$</SPAN> -- at the end of an RE matches the end of a line.</P><P><SPAN CLASS="QUOTE">"XXX$"</SPAN> matches <SPAN CLASS="TOKEN">XXX</SPAN> at the end of a line.</P><P><SPAN CLASS="QUOTE">"^$"</SPAN> matches empty lines (lines containing no characters at all).</P></LI>
<LI><P><A NAME="BRACKETSREF"></A></P><P>Brackets -- <SPAN CLASS="TOKEN">[...]</SPAN> -- enclose a set of characters to match in a single RE.</P><P><SPAN CLASS="QUOTE">"[xyz]"</SPAN> matches any one of the characters <TT CLASS="REPLACEABLE"><I>x</I></TT>, <TT CLASS="REPLACEABLE"><I>y</I></TT>, or <TT CLASS="REPLACEABLE"><I>z</I></TT>.</P><P><SPAN CLASS="QUOTE">"[c-n]"</SPAN> matches any of the characters in the range <TT CLASS="REPLACEABLE"><I>c</I></TT> to <TT CLASS="REPLACEABLE"><I>n</I></TT>.</P><P><SPAN CLASS="QUOTE">"[B-Pk-y]"</SPAN> matches any of the characters in the ranges <TT CLASS="REPLACEABLE"><I>B</I></TT> to <TT CLASS="REPLACEABLE"><I>P</I></TT> and <TT CLASS="REPLACEABLE"><I>k</I></TT> to <TT CLASS="REPLACEABLE"><I>y</I></TT>.</P><P><SPAN CLASS="QUOTE">"[a-z0-9]"</SPAN> matches any lowercase letter or any digit.</P><P><SPAN CLASS="QUOTE">"[^b-d]"</SPAN> matches all characters <SPAN CLASS="emphasis"><I CLASS="EMPHASIS">except</I></SPAN> those in the range <TT CLASS="REPLACEABLE"><I>b</I></TT> to <TT CLASS="REPLACEABLE"><I>d</I></TT>. This is an instance of <SPAN CLASS="TOKEN">^</SPAN> negating or inverting the meaning of the following RE (taking on a role similar to <SPAN CLASS="TOKEN">!</SPAN> in a different context).</P><P>Combined sequences of bracketed characters match common word patterns. <SPAN CLASS="QUOTE">"[Yy][Ee][Ss]"</SPAN> matches <TT CLASS="REPLACEABLE"><I>yes</I></TT>, <TT CLASS="REPLACEABLE"><I>Yes</I></TT>, <TT CLASS="REPLACEABLE"><I>YES</I></TT>, <TT CLASS="REPLACEABLE"><I>yEs</I></TT>, and so forth.
<SPAN CLASS="QUOTE">"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"</SPAN> matches any string in Social Security number format (three digits, two digits, four digits, separated by hyphens).</P></LI>
<LI><P><A NAME="REGEXBS"></A></P><P>The backslash -- <SPAN CLASS="TOKEN">\</SPAN> -- <A HREF="escapingsection.html#ESCP">escapes</A> a special character, which means that character gets interpreted literally.</P><P>A <SPAN CLASS="QUOTE">"\$"</SPAN> reverts to its literal meaning of <SPAN CLASS="QUOTE">"$"</SPAN>, rather than its RE meaning of end-of-line. Likewise a <SPAN CLASS="QUOTE">"\\"</SPAN> has the literal meaning of <SPAN CLASS="QUOTE">"\"</SPAN>.</P></LI>
<LI><P><A NAME="ANGLEBRAC"></A></P><P><A HREF="escapingsection.html#ESCP">Escaped</A> <SPAN CLASS="QUOTE">"angle brackets"</SPAN> -- <SPAN CLASS="TOKEN">\&#60;...\&#62;</SPAN> -- mark word boundaries.</P><P>The angle brackets must be escaped, since otherwise they have only their literal character meaning.</P><P><SPAN CLASS="QUOTE">"\&#60;the\&#62;"</SPAN> matches the word <SPAN CLASS="QUOTE">"the,"</SPAN> but not the words <SPAN CLASS="QUOTE">"them,"</SPAN> <SPAN CLASS="QUOTE">"there,"</SPAN> <SPAN CLASS="QUOTE">"other,"</SPAN> etc.</P><P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="90%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>cat textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.
This is line 4.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep 'the' textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.</TT>

<TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep '\&#60;the\&#62;' textfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">This is the only instance of line 2.</TT>
</PRE></TD></TR></TABLE>
</P></LI>
</UL>
<TABLE CLASS="SIDEBAR" BORDER="1" CELLPADDING="5"><TR><TD><DIV CLASS="SIDEBAR"><A NAME="AEN15960"></A><P>The only way to be certain that a particular RE works is to test it.</P><P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="100%"><TR><TD><PRE CLASS="PROGRAMLISTING">TEST FILE: tstfile                          # No match.
                                            # No match.
Run   grep "1133*"  on this file.           # Match.
                                            # No match.
                                            # No match.
This line contains the number 113.          # Match.
This line contains the number 13.           # No match.
This line contains the number 133.          # No match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 1112.         # No match.
This line contains the number 113312312.    # Match.
This line contains no numbers at all.       # No match.</PRE></TD></TR></TABLE></P>
<TABLE BORDER="0" BGCOLOR="#E0E0E0" WIDTH="100%"><TR><TD><PRE CLASS="SCREEN"><TT CLASS="PROMPT">bash$ </TT><TT CLASS="USERINPUT"><B>grep "1133*" tstfile</B></TT>
<TT CLASS="COMPUTEROUTPUT">Run   grep "1133*"  on this file.           # Match.
This line contains the number 113.          # Match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 113312312.    # Match.</TT>
</PRE></TD></TR></TABLE></DIV></TD></TR></TABLE>
<UL>
<LI STYLE="list-style-type: square"><DIV CLASS="FORMALPARA"><P><B><A NAME="EXTREGEX"></A>Extended REs.
</B>Additional metacharacters added to the basic set. Used in <A HREF="textproc.html#EGREPREF">egrep</A>, <A HREF="awk.html#AWKREF">awk</A>, and <A HREF="wrapper.html#PERLREF">Perl</A>.</P></DIV