<BR></P>

<PRE>$ cat REfile

A regular expression is a sequence of characters taken

from the set of uppercase and lowercase letters, digits,

punctuation marks, etc., plus a set of special regular

expression operators. Some of these operators may remind

you of file name matching, but be forewarned: in general,

regular expression operators are different from the

shell metacharacters we discussed in Chapter 1.

The simplest form of a regular expression is one that

includes only letters. For example, the would match only

the three-letter sequence t, h, e. This pattern is found

in the following words: the, therefore, bother. In other

words, wherever the regular expression pattern is found

&#151; even if it is surrounded by other characters &#151; it will

be matched.</PRE>

<H5 ALIGN="CENTER">

<CENTER><A ID="I8" NAME="I8">

<FONT SIZE=3><B>Regular Expression Characters</B>

<BR></FONT></A></CENTER></H5>

<P>Regular expressions describe patterns built from ordinary characters, such as letters and digits, combined with various other characters used as operators. You will meet examples of these below. A character's use often determines its meaning in a regular expression. All programs that use regular expressions have a search pattern. The editor family of programs (vi, ex, ed, and sed; see Chapter 7, &quot;Editing Text Files&quot;) also has a replacement pattern. In some cases, the meaning of a special character differs depending on whether it's used as part of the search pattern or in the replacement pattern.

<BR></P>

<H5 ALIGN="CENTER">

<CENTER><A ID="I9" NAME="I9">

<FONT SIZE=3><B>A Regular Expression with No Special Characters</B>

<BR></FONT></A></CENTER></H5>

<P>Here's an example of a simple search for a regular expression. This regular expression is a character string with no special characters in it.

<BR></P>

<PRE>$ grep only REfile

includes only letters. For example, the would match only</PRE>

<P>Only one line in REfile contains the string only (it appears there twice), so grep printed that line once.

<BR></P>
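<P>A pattern made only of letters matches wherever those letters occur, even inside a longer word. The following is a minimal sketch of that behavior; the word list is a hypothetical input fed through printf rather than REfile itself.
<BR></P>

```shell
# Hypothetical word list; the first three words come from the REfile text.
out=$(printf '%s\n' the therefore bother Then | grep 'the')

# Prints the, therefore, and bother. Then is skipped because the match is
# case sensitive and the line has no lowercase t.
printf '%s\n' "$out"
```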

<H5 ALIGN="CENTER">

<CENTER><A ID="I10" NAME="I10">

<FONT SIZE=3><B>Special Characters</B>

<BR></FONT></A></CENTER></H5>

<P>Certain characters have special meanings when used in regular expressions, and for some of them the meaning depends on their position in the regular expression. Some of these characters are used as placeholders, some as operators, and some as either, depending on where they appear.

<BR></P>

<UL>

<LI>The dot (.), asterisk (*), left square bracket ([) and backslash (\) are special except when they appear between a left and right pair of square brackets ([]).

<BR>

<BR></LI>

<LI>A circumflex or caret (^) is special when it's the first character of a regular expression, and also when it's the first character after the opening left square bracket in a left and right pair of square brackets.

<BR>

<BR></LI>

<LI>A dollar sign ($) is special when it's the last character of a regular expression.

<BR>

<BR></LI>

<LI>A pair of delimiters, usually a pair of slash characters (//), is special because it delimits the regular expression.

<BR>

<BR></LI></UL>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> Any character not used in the current regular expression can be used as the delimiter, but the slash is traditional.

<BR></NOTE>

<HR ALIGN=CENTER>

<UL>

<LI>A special character preceded by a backslash is matched by the character itself. This is called escaping. When a special character is escaped, the command recognizes it as a literal&#151;the actual character with no special meaning. In other words, as 
in file-name matching, the backslash cancels the special meaning of the character that follows it.

<BR>

<BR></LI></UL>
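<P>The effect of escaping can be seen by running the same search with and without the backslash. This is a small sketch using two hypothetical sample lines; the variable names are illustrative only.
<BR></P>

```shell
# Hypothetical sample lines, not taken from REfile.
sample=$(printf '%s\n' 'version 1.5' 'version 125')

# Unescaped, the dot stands for any one character, so both lines match.
unescaped=$(printf '%s\n' "$sample" | grep '1.5')

# Escaped, the dot is a literal period, so only the first line matches.
escaped=$(printf '%s\n' "$sample" | grep '1\.5')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```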

<P>Now let's look at each character in detail.

<BR></P>

<H6 ALIGN="CENTER">

<CENTER>

<FONT SIZE=3><B>Matching Any One Character</B>

<BR></FONT></CENTER></H6>

<P>The dot matches any one character except a newline. For example, consider the following:

<BR></P>

<PRE>$ grep 'w.r' REfile

from the set of uppercase and lowercase letters, digits,

you of file name matching, but be forewarned: in general,

in the following words: the, therefore, bother. In other

words, wherever the regular expression pattern is found</PRE>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> The regular expression w.r appears within a set of apostrophes (referred to by UNIXees as &quot;single quotes&quot;). Their use is mandatory if grep is to function properly. If they are omitted, the shell (see Chapters 11, 12, and 13) may interpret certain special characters in the regular expression as if they were &quot;shell special characters&quot; rather than &quot;grep special characters&quot; and the result will be unexpected.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>The pattern w.r is matched by wer in lowercase on the first displayed line, by war in forewarned on the second, by wor in words on the third, and by wor in words on the fourth. Expressed in English, the sample command says &quot;Find and display all lines that match the following pattern: w followed by any character except a newline followed by r.&quot;

<BR></P>
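<P>The dot always consumes exactly one character, so it cannot match nothing. A brief sketch with hypothetical one-word input lines shows which candidates qualify.
<BR></P>

```shell
# Hypothetical input: four short lines, one per argument.
out=$(printf '%s\n' 'war' 'wr' 'w r' 'wear' | grep 'w.r')

# war and "w r" match (a space counts as a character).
# wr has nothing between w and r; wear has two characters between them.
printf '%s\n' "$out"
```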

<P>You can form a somewhat different one-character regular expression by enclosing a list of characters in a left and right pair of square brackets. The matching is limited to those characters listed between the brackets. For example, the pattern

<BR></P>

<PRE>[aei135XYZ]</PRE>

<P>matches any one of the characters a, e, i, 1, 3, 5, X, Y, or Z.

<BR></P>

<P>Consider the following example:

<BR></P>

<PRE>$ grep 'w[fhmkz]' REfile

words, wherever the regular expression pattern is found</PRE>

<P>This time, the match was satisfied only by the wh in wherever, matching the pattern &quot;w followed by either f, h, m, k, or z.&quot;

<BR></P>
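<P>The earlier note about quoting matters especially for bracket lists, because the shell also uses square brackets for file-name matching. The sketch below assumes a scratch directory created with mktemp -d and a hypothetical empty file named war; both are illustrative choices, not part of the original example.
<BR></P>

```shell
# Work in a scratch directory (mktemp -d is assumed to be available).
start_dir=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

printf '%s\n' 'lowercase' 'forewarned' > sample.txt
: > war    # hypothetical empty file whose name fits the shell pattern w[ae]r

# Quoted: grep receives w[ae]r and finds wer and war.
quoted=$(grep 'w[ae]r' sample.txt)

# Unquoted: the shell replaces w[ae]r with the file name war before grep runs,
# so grep searches for the literal string war.
unquoted=$(grep w[ae]r sample.txt)

printf 'quoted:\n%s\n' "$quoted"
printf 'unquoted:\n%s\n' "$unquoted"

cd "$start_dir" || exit 1
rm -rf "$dir"
```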

<P>If the first character in the list is a right square bracket (]), it does not terminate the list&#151;that would make the list empty, which is not permitted. Instead, ] itself becomes one of the possible characters in the search pattern. For example, 
the pattern

<BR></P>

<PRE>[]a]</PRE>

<P>matches either ] or a.

<BR></P>
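<P>A quick check of this rule, using four hypothetical single-character lines as input:
<BR></P>

```shell
# Each argument becomes one input line: a, ], b, and [.
out=$(printf '%s\n' 'a' ']' 'b' '[' | grep '[]a]')

# Only the lines a and ] are selected; the leading ] is a list member.
printf '%s\n' "$out"
```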

<P>If the first character in the list is a circumflex (also called a caret), the match occurs on any character that is not in the list:

<BR></P>

<PRE>$ grep 'w[^fhmkz]' REfile

from the set of uppercase and lowercase letters, digits,

you of file name matching, but be forewarned: in general,

shell metacharacters we discussed in Chapter 1.

includes only letters. For example, the would match only

in the following words: the, therefore, bother. In other

words, wherever the regular expression pattern is found

&#151; even if it is surrounded by other characters &#151; it will</PRE>

<P>The pattern &quot;w followed by anything except f, h, m, k, or z&quot; has many matches. On line 1, we in lowercase is a &quot;w followed by anything except an f, an h, an m, a k, or a z.&quot; On line 2, wa in forewarned is a match, as is the word we on line 3. Line 4 contains wo in would, and line 5 contains wi in following (wo in words, later on the same line, also fits). Line 6 has wo in words as its match; the wh in wherever does not match, because h is in the excluded list. Finally, at the end of line 7, wi in will matches.

<BR></P>
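<P>Note that a negated list still has to match a character; it does not match the absence of one. The following sketch uses three hypothetical input lines to show the difference.
<BR></P>

```shell
# Hypothetical input lines: two words from REfile and a lone w.
out=$(printf '%s\n' 'wherever' 'words' 'w' | grep 'w[^fhmkz]')

# Only words is selected: wherever has h after its w, and the lone w
# has no following character for the negated list to match.
printf '%s\n' "$out"
```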

<P>You can use a minus sign (-) inside the left and right pair of square brackets to indicate a range of letters or digits. For example, the pattern

<BR></P>

<PRE>[a-z]</PRE>

<P>matches any lowercase letter.

<BR></P>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> You cannot write the range &quot;backward&quot;; that is, [z-a] is illegal.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>Consider the following example:

<BR></P>

<PRE>$ grep 'w[a-f]' REfile

from the set of uppercase and lowercase letters, digits,

you of file name matching, but be forewarned: in general,

shell metacharacters we discussed in Chapter 1.</PRE>

<P>The matches are we on line 1, wa on line 2, and we on line 3. Look at REfile again and note how many potential matches are omitted because the character following the w is not one of the group a through f.

<BR></P>

<P>Furthermore, you can include several ranges in one set of brackets. For example, the pattern

<BR></P>

<PRE>[a-zA-Z]</PRE>

<P>matches any letter, lower- or uppercase.

<BR></P>
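<P>Ranges can be tried out on a few hypothetical lines. The sketch sets LC_ALL=C as a precaution, on the assumption that other locales may order letters differently and so change what a range such as a-z covers.
<BR></P>

```shell
# Use the C locale so that ranges follow plain ASCII order (an assumption
# made for predictability; it is not required by the examples in the text).
LC_ALL=C
export LC_ALL

# Hypothetical sample lines.
sample=$(printf '%s\n' 'a1' 'B2' '42' '--')

# Any line containing a letter of either case: a1 and B2.
letters=$(printf '%s\n' "$sample" | grep '[a-zA-Z]')

# Any line containing two digits in a row: 42.
two_digits=$(printf '%s\n' "$sample" | grep '[0-9][0-9]')

printf 'letters:\n%s\n' "$letters"
printf 'two digits:\n%s\n' "$two_digits"
```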

<H6 ALIGN="CENTER">

<CENTER>

<FONT SIZE=3><B>Matching Multiples of a Single Character</B>

<BR></FONT></CENTER></H6>

<P>If you want to specify precisely how many of a given character you want the regular expression to match, you can use the escaped left and right curly brace pair (\{____\}). For example, the pattern

<BR></P>

<PRE>X\{2,5\}</PRE>

<P>matches at least two but not more than five Xs. That is, it matches XX, XXX, XXXX, or XXXXX. The minimum number of matches is written immediately after the escaped left curly brace, followed by a comma (,) and then the maximum value.

<BR></P>

<P>If you omit the maximum value (but not the comma), as in

<BR></P>

<PRE>X\{2,\}</PRE>

<P>you specify that the match should occur for at least two Xs.

<BR></P>

<P>If you write just a single value, omitting the comma, you specify the exact number of matches, no more and no less. For example, the pattern

<BR></P>

<PRE>X\{4\}</PRE>

<P>matches only XXXX. Here are some examples of this kind of regular expression:

<BR></P>

<PRE>$ grep 'p\{2\}' REfile

from the set of uppercase and lowercase letters, digits,</PRE>

<P>This is the only line that contains &quot;pp.&quot;

<BR></P>

<PRE>$ grep 'p\{1\}' REfile

A regular expression is a sequence of characters taken

from the set of uppercase and lowercase letters, digits,

punctuation marks, etc., plus a set of special regular

expression operators. Some of these operators may remind

regular expression operators are different from the

shell metacharacters we discussed in Chapter 1.

The simplest form of a regular expression is one that

includes only letters. For example, the would match only

the three-letter sequence t, h, e. This pattern is found

words, wherever the regular expression pattern is found</PRE>

<P>Notice that on the second line, the first &quot;p&quot; in &quot;uppercase&quot; satisfies the search. The grep program doesn't even see the second &quot;p&quot; in the word because it stops searching as soon as it finds one &quot;p.&quot;

<BR></P>
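<P>Because grep selects a line as soon as any part of it matches, a repetition count limits the match, not the line. The sketch below contrasts an ordinary search with grep -x, which requires the pattern to match the whole line; the input lines are hypothetical.
<BR></P>

```shell
# Hypothetical input: runs of one, two, five, and six Xs.
sample=$(printf '%s\n' 'X' 'XX' 'XXXXX' 'XXXXXX')

# -x demands that the whole line match, so only runs of two to five qualify.
whole=$(printf '%s\n' "$sample" | grep -x 'X\{2,5\}')

# Without -x, the six-X line is also selected, because part of it matches.
anywhere=$(printf '%s\n' "$sample" | grep 'X\{2,5\}')

printf 'whole line:\n%s\n' "$whole"
printf 'anywhere:\n%s\n' "$anywhere"
```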

<H6 ALIGN="CENTER">

<CENTER>

<FONT SIZE=3><B>Matching Multiples of a Regular Expression</B>

<BR></FONT></CENTER></H6>

<P>The asterisk (*) matches zero or more of the preceding regular expression. Therefore, the pattern

<BR></P>

<PRE>X*</PRE>

<P>matches zero or more Xs: nothing, X, XX, XXX, and so on. To ensure that you get at least one character in the match, use

<BR></P>

<PRE>XX*</PRE>

<P>For example, the command

<BR></P>

<PRE>$ grep 'p*' REfile</PRE>

<P>displays the entire file, because every line can match &quot;zero or more instances of the letter p.&quot; However, note the output of the following commands:

<BR></P>

<PRE>$ grep 'pp*' REfile

A regular expression is a sequence of characters taken

from the set of uppercase and lowercase letters, digits,

punctuation marks, etc., plus a set of special regular

expression operators. Some of these operators may remind

regular expression operators are different from the

shell metacharacters we discussed in Chapter 1.

The simplest form of a regular expression is one that

includes only letters. For example, the would match only

the three-letter sequence t, h, e. This pattern is found

words, wherever the regular expression pattern is found

$ grep 'ppp*' REfile

from the set of uppercase and lowercase letters, digits,</PRE>

<P>The regular expression ppp* matches &quot;pp followed by zero or more instances of the letter p,&quot; or, in other words, &quot;two or more instances of the letter p.&quot;

<BR></P>
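<P>The same three patterns can be compared on a smaller input. This sketch uses three hypothetical words and grep -c, which prints a count of matching lines instead of the lines themselves.
<BR></P>

```shell
# Hypothetical sample lines.
sample=$(printf '%s\n' 'apple' 'pear' 'kiwi')

# p* can match zero p characters, so all three lines are counted.
star_count=$(printf '%s\n' "$sample" | grep -c 'p*')

# pp* needs at least one p: apple and pear.
one_or_more=$(printf '%s\n' "$sample" | grep 'pp*')

# ppp* needs at least two p characters in a row: apple.
two_or_more=$(printf '%s\n' "$sample" | grep 'ppp*')

printf 'p* count: %s\n' "$star_count"
printf 'pp*:\n%s\n' "$one_or_more"
printf 'ppp*:\n%s\n' "$two_or_more"
```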

<P>The extended set of regular expressions includes two additional operators that are similar to the asterisk: the plus sign (+) and the question mark (?). The plus sign is used to match one or more occurrences of the preceding character, and the question 

mark is used to match zero or one occurrences. For example, the command

<BR></P>

<PRE>$ egrep 'p?' REfile</PRE>

<P>outputs the entire file because every line can match zero occurrences of p. However, note the output of the following command:

<BR></P>

<PRE>$ egrep 'p+' REfile

A regular expression is a sequence of characters taken

from the set of uppercase and lowercase letters, digits,

punctuation marks, etc., plus a set of special regular

expression operators. Some of these operators may remind

regular expression operators are different from the

shell metacharacters we discussed in Chapter 1.

The simplest form of a regular expression is one that

includes only letters. For example, the would match only

the three-letter sequence t, h, e. This pattern is found

words, wherever the regular expression pattern is found</PRE>

<P>Another possibility is [a-z]+. This pattern matches one or more occurrences of any lowercase letter.

<BR></P>
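<P>The two extended operators are easiest to tell apart when the character before them is optional in one word and repeated in another. The sketch below uses grep -E, which POSIX defines for extended regular expressions and which is assumed here to behave like the egrep command shown above; the sample words are hypothetical.
<BR></P>

```shell
# Hypothetical sample lines.
sample=$(printf '%s\n' 'color' 'colour' 'colouur')

# u? allows zero or one u: color and colour.
optional=$(printf '%s\n' "$sample" | grep -E 'colou?r')

# u+ requires one or more u: colour and colouur.
repeated=$(printf '%s\n' "$sample" | grep -E 'colou+r')

printf 'colou?r:\n%s\n' "$optional"
printf 'colou+r:\n%s\n' "$repeated"
```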

<H6 ALIGN="CENTER">

<CENTER>

<FONT SIZE=3><B>Anchoring the Match</B>

<BR></FONT></CENTER></H6>

<P>A circumflex (^) used as the first character of the pattern anchors the regular expression to the beginning of the line. Therefore, the pattern

<BR></P>

<PRE>^[Tt]he</PRE>

<P>matches a line that begins with either The or the, but does not match a line that has a The or the at any other position on the line. Note, for example, the output of the following two commands:

<BR></P>

<PRE>$ grep '[Tt]he' REfile

from the set of uppercase and lowercase letters, digits,

expression operators. Some of these operators may remind

regular expression operators are different from the

The simplest form of a regular expression is one that

includes only letters. For example, the would match only

the three-letter sequence t, h, e. This pattern is found

in the following words: the, therefore, bother. In other

words, wherever the regular expression pattern is found

&#151; even if it is surrounded by other characters &#151; it will

$ grep '^[Tt]he' REfile

The simplest form of a regular expression is one that

the three-letter sequence t, h, e. This pattern is found</PRE>

<P>A dollar sign as the last character of the pattern anchors the regular expression to the end of the line, as in the following example:

<BR></P>

<PRE>$ grep '1\.$' REfile

shell metacharacters we discussed in Chapter 1.</PRE>

<P>The line is selected because it ends in a match of the regular expression. The period in the regular expression is preceded by a backslash, so the program knows that it's looking for a period and not just any character.

<BR></P>

<P>Here's another example that uses REfile:

<BR></P>

<PRE>$ grep '[Tt]he$' REfile

regular expression operators are different from the</PRE>
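<P>Both anchors can be checked side by side on a few short lines. This is a minimal sketch with hypothetical input; the variable names are illustrative only.
<BR></P>

```shell
# Hypothetical sample lines, each containing the letters the somewhere.
sample=$(printf '%s\n' 'the end' 'at the' 'other')

# Unanchored, the pattern is found in all three lines.
anywhere=$(printf '%s\n' "$sample" | grep -c 'the')

# Anchored to the beginning of the line: only "the end".
at_start=$(printf '%s\n' "$sample" | grep '^the')

# Anchored to the end of the line: only "at the".
at_end=$(printf '%s\n' "$sample" | grep 'the$')

printf 'anywhere: %s lines\n' "$anywhere"
printf 'at start: %s\n' "$at_start"
printf 'at end: %s\n' "$at_end"
```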

<P>The regular expression .* is an idiom that matches any sequence of zero or more characters. A multicharacter regular expression always matches the longest string of characters that fits the regular expression description. Consequently, .* used as the entire regular expression always matches an entire line of text. Therefore, the command

<BR></P>

<PRE>$ grep '^.*$' REfile</PRE>

<P>prints the entire file. Note that in this case the anchoring characters are redundant.