
<TR>

<TD>

<P>!~</P>

<TD>

<P>not matched by</P></TABLE>

<P>In awk, as in C, the equality operator is == rather than =. The single = is the assignment operator, whereas == compares two values. When the pattern is a comparison, the pattern matches if the comparison is true (non-null or non-zero). Here's an 
example: what if you wanted to print only the lines where the first field has a numeric value of less than twenty? No problem in awk:

<BR></P>

<PRE>$1 &lt; 20 {print $0}</PRE>
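<P>A minimal sketch of that pattern run against a few made-up records (the item names are purely illustrative). Note that 100 is not printed: both operands look like numbers, so awk compares them numerically; compared as strings, &quot;100&quot; would sort before &quot;20&quot;.

<BR></P>

```shell
# Made-up input: a quantity in the first field, an item name in the second.
out=$(awk '$1 < 20 { print $0 }' <<'EOF'
5 apples
25 pears
19 plums
100 figs
EOF
)
printf '%s\n' "$out"   # prints "5 apples" and "19 plums"
```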

<P>If the expression is arithmetic, it is matched when it evaluates to a nonzero number. For example, here's a small program that will print the first ten lines that have exactly seven words:

<BR></P>

<PRE>BEGIN  {i=0}

NF==7 { print $0 ; i++ }

i==10 { exit }</PRE>
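<P>A pattern does not have to be a comparison. The sketch below, using made-up input, shows two arithmetic patterns: $1 % 2 is nonzero only for odd numbers, and NF is zero only on empty lines. When a pattern has no action, awk prints the matching record.

<BR></P>

```shell
# $1 % 2 evaluates to 1 (match) for odd values and 0 (no match) for even ones.
odd=$(printf '%s\n' 1 2 3 4 5 | awk '$1 % 2')
printf '%s\n' "$odd"        # prints 1, 3 and 5 on separate lines

# NF is the number of fields; it is 0 on an empty line, so blank lines are skipped.
nonblank=$(printf 'a b\n\nc\n' | awk 'NF')
printf '%s\n' "$nonblank"   # prints "a b" and "c"
```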

<P>There's another way that you could use these comparisons too, since awk understands string ordering (that is, whether one string sorts before or after another, much as words do in a dictionary). Consider the situation where you have a phone 
directory (a sorted list of names) in a file and want to print all the names that would appear in the corporate phonebook before a certain person, say D. Hughes. You could do this quite succinctly:

<BR></P>

<PRE>$1 &gt;= &quot;Hughes,D&quot; { exit }

{ print }</PRE>
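<P>Here is that idea as a complete, runnable sketch. The directory entries and extensions are made up, and each entry is assumed to be written as Last,Initial so that the whole name is the first field. The first rule stops at the first name that sorts at or after Hughes,D; the second prints every name before it.

<BR></P>

```shell
out=$(awk '
$1 >= "Hughes,D" { exit }    # first name at or after Hughes,D: stop reading
{ print }                    # any earlier name: print the entry
' <<'EOF'
Adams,J 4411
Baker,T 4412
Hughes,A 4415
Hughes,D 4416
Jones,M 4420
EOF
)
printf '%s\n' "$out"   # prints the Adams, Baker and Hughes,A entries
```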

<P>When the pattern is a string, a match occurs if the expression is non-null. In the earlier example, the pattern /Ann/ was treated as a string match because it was enclosed in slashes. In a comparison expression, if both operands have a numeric value, the comparison is based on the numeric value. Otherwise, the comparison is made using string ordering, which is why this simple example works.

<BR></P>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> You can combine more than two comparisons in a single awk pattern by joining them with the logical operators &amp;&amp; and ||. 

<BR></NOTE>

<HR ALIGN=CENTER>

<P>The pattern $2 &lt;= $1 could involve either a numeric comparison or a string comparison. Which one awk uses depends on the contents of the two fields, so it can vary from file to file or even from record to record within the same file.

<BR></P>
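<P>A minimal sketch with two made-up records shows how the same pattern can switch between the two kinds of comparison. In the first record both fields look like numbers, so 9 and 10 are compared numerically; in the second, 9x is not a number, so the fields are compared as strings, and &quot;9x&quot; sorts after &quot;10&quot; because the character 9 comes after 1.

<BR></P>

```shell
# The same pattern, $2 <= $1, is applied to both records.
out=$(awk '$2 <= $1' <<'EOF'
10 9
10 9x
EOF
)
printf '%s\n' "$out"   # prints only "10 9"
```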

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> Know your input file well when using such patterns, particularly since awk will often silently assume a type for the variable and work with it, without error messages or other warnings.

<BR></NOTE>

<HR ALIGN=CENTER>

<H4 ALIGN="CENTER">

<CENTER><A ID="I25" NAME="I25">

<FONT SIZE=3><B>String Matching</B>

<BR></FONT></A></CENTER></H4>

<P>There are three forms of string matching. The simplest is to surround a string with slashes (/). No quotation marks are used, so /&quot;Ann&quot;/ matches the string &quot;Ann&quot; including its quotation marks, not the string Ann, and it matches no records unless the input actually contains those quotation marks. 
The entire input record is returned if the expression within the slashes appears anywhere in the record. The other two matching operators have a more specific scope. The operator ~ means &quot;is matched by,&quot; and the pattern matches when the input field being tested contains a match for the expression on the right-hand side.

<BR></P>

<PRE>$2 ~ /mm/</PRE>

<P>This example matches every input record containing mm somewhere in the second field. It could also be written as $2 ~ &quot;mm&quot;.

<BR></P>

<P>The other operator !~ means &quot;is not matched by.&quot;

<BR></P>

<PRE>$2 !~ /mm/</PRE>

<P>This example matches every input record not containing mm anywhere in the second field.

<BR></P>
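<P>Both operators can be tried on a few made-up records. Only the second field is tested, so the record for tommy counts as having no mm even though its first field contains one; the last command also shows that the string &quot;mm&quot; works in place of /mm/.

<BR></P>

```shell
input='ann Hammond
tommy Boston
bob Summit'
matched=$(printf '%s\n' "$input" | awk '$2 ~ /mm/')       # Hammond and Summit records
unmatched=$(printf '%s\n' "$input" | awk '$2 !~ /mm/')    # Boston record
as_string=$(printf '%s\n' "$input" | awk '$2 ~ "mm"')     # same records as $2 ~ /mm/
printf '%s\n' "$matched" "$unmatched" "$as_string"
```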

<P>Armed with that explanation, you can now see that /Ann/ is really just shorthand for the more complex statement $0 ~ /Ann/.

<BR></P>
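<P>The sketch below, with made-up names, checks that shorthand: /Ann/ and $0 ~ /Ann/ select the same records, including JoAnn Lee, because the match can occur anywhere in the record. Joanne Park is not selected because matching is case-sensitive, and /&quot;Ann&quot;/ matches only the record that contains the quotation marks.

<BR></P>

```shell
input='Ann Smith
JoAnn Lee
"Ann" Jones
Joanne Park'
short=$(printf '%s\n' "$input" | awk '/Ann/')          # shorthand form
full=$(printf '%s\n' "$input" | awk '$0 ~ /Ann/')      # explicit form
quoted=$(printf '%s\n' "$input" | awk '/"Ann"/')       # quotation marks are part of the pattern
printf '%s\n' "$short" "$quoted"
```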

<P>Regular expressions are common to UNIX, and they come in two main flavors. You have probably used something like them on the command line as wildcards, where * matches zero or more characters and ? matches any single character. For instance, entering the line below makes the command interpreter match all files whose names end in abc, and the rm command then deletes them.

<BR></P>

<PRE>rm *abc</PRE>

<P>Awk works with regular expressions that are similar to those used with grep, sed, and other editors but subtly different from the wildcards used with the command shell. In particular, . matches any single character and * matches zero or more of the previous character in the pattern (so a pattern of x*y matches any number of x's, including none, followed by a y, which means any record containing a y matches. To require at least one x, you'd need to use the regular expression xx*y instead). By default, patterns can appear anywhere in the string being matched, so to tie them to an edge you need to use ^ to indicate the beginning of the field or line, and $ for the end. If you wanted to match all lines where the first field ends in abc, for example, you could use $1 ~ /abc$/. The following line 
matches all records where the fourth field begins with the letter a:

<BR></P>

<PRE>$4 ~ /^a.*/</PRE>
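<P>A small sketch with made-up input lines makes the difference between x*y and xx*y, and the effect of anchoring, visible. (The pattern /^a.*/ could equally be written /^a/, because .* also matches the empty string.)

<BR></P>

```shell
words='y
xy
xxy
ay'
any_y=$(printf '%s\n' "$words" | awk '/x*y/')     # zero or more x, then y: all four lines
some_x=$(printf '%s\n' "$words" | awk '/xx*y/')   # at least one x before the y
whole=$(printf '%s\n' "$words" | awk '/^x*y$/')   # whole line is only x characters, then a final y
fourth=$(printf '%s\n' 'w x y apple' 'w x y banana' | awk '$4 ~ /^a/')   # fourth field starts with a
printf '%s\n' "$any_y" "$some_x" "$whole" "$fourth"
```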

<H4 ALIGN="CENTER">

<CENTER><A ID="I26" NAME="I26">

<FONT SIZE=3><B>Range Patterns</B>

<BR></FONT></A></CENTER></H4>

<P>The pattern portion of a pattern/action pair may also consist of two patterns separated by a comma (,); the action is performed for all lines between the first occurrence of the first pattern and the next occurrence of the second.

<BR></P>

<P>At most companies, employees receive different benefits according to their respective hire dates. It so happens that I have a file listing all employees in my company, including hire date. Suppose I want an awk program that lists just the employees hired between 1980 and 1987, where the first field is the employee's name and the third field is the year hired. Here's how that data file might look (notice that I use : to separate fields so that we don't have to 
worry about the spaces in the employee names):

<BR></P>

<PRE>$ cat emp.data

John Anderson:sales:1980

Joe Turner:marketing:1982

Susan Greco:sales:1985

Ike Turner:pr:1988

Bob Burmeister:accounting:1991</PRE>

<P>In this file, the hires in that period run from the 1980 record through the 1985 record, so a range pattern can start on the first of those years and stop on the last. The program could then be invoked:

<BR></P>

<PRE>$ awk -F: '$3 == 1980, $3 == 1985 {print $1, $3}' emp.data</PRE>

<P>With the output:

<BR></P>

<PRE>John Anderson 1980

Joe Turner 1982

Susan Greco 1985</PRE>
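<P>For comparison, here is a sketch that selects hire years 1980 through 1987 with a compound pattern (see Compound Patterns) instead of a range. Each record is tested on its own, so the file does not have to be sorted; to show this, the sketch feeds the same five records from emp.data in a shuffled order.

<BR></P>

```shell
out=$(awk -F: '$3 >= 1980 && $3 <= 1987 { print $1, $3 }' <<'EOF'
Ike Turner:pr:1988
Susan Greco:sales:1985
John Anderson:sales:1980
Bob Burmeister:accounting:1991
Joe Turner:marketing:1982
EOF
)
printf '%s\n' "$out"   # the 1985, 1980 and 1982 hires, in input order
```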

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> The above example works because the input is already in order according to hire year. Range patterns often work best with pre-sorted input. Because the fields in this data file are separated by colons, sorting it takes a little more than a plain sort: the command sort -t: -k3,3n emp.data &gt; new.emp.data sorts it numerically by the third field, the hire year. (See Chapter 6 for more details on using the powerful sort command.)

<BR></NOTE>

<HR ALIGN=CENTER>

<P>Notice that range patterns are inclusive: they include both the record that matches the first pattern and the record that matches the second. The range pattern matches all records from the first occurrence of the first pattern to the next occurrence of the second. This is a subtle point, but it has a major effect on how range patterns work. First, if the second pattern is never found, all remaining records match. So given the input file below:

<BR></P>

<PRE>$ cat sample.data

1

3

5

7

9

11</PRE>

<P>The following output appears on the monitor. Because no record has 8 in its first field, the range never ends, and 9 and 11 are printed even though they are out of range.

<BR></P>

<PRE>$ awk '$1==3, $1==8' sample.data

3

5

7

9

11</PRE>

<P>The end pattern of a range is therefore not equivalent to a &lt;= comparison: the range stops at the first record that matches the end pattern, whatever values follow it, and that record is itself included.

<BR></P>
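<P>A minimal sketch using the same values as sample.data shows the difference. With an end pattern of $1 &gt;= 8, the range does end, but only at 9, and that record is printed because ranges are inclusive; the compound pattern $1 &gt;= 3 &amp;&amp; $1 &lt;= 8 stops at 7.

<BR></P>

```shell
range=$(printf '%s\n' 1 3 5 7 9 11 | awk '$1 == 3, $1 >= 8')
bounded=$(printf '%s\n' 1 3 5 7 9 11 | awk '$1 >= 3 && $1 <= 8')
printf '%s\n' "$range"     # 3 5 7 9, one per line
printf '%s\n' "$bounded"   # 3 5 7, one per line
```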

<P>Second, when a range ends, awk starts looking for the first pattern again, and a later record that matches it begins a new range. That's why you have to make sure that the data is sorted as you expect.

<BR></P>
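<P>The following sketch, with made-up marker lines, shows a range closing and then reopening: the line between the two blocks is skipped, and the second start line begins a new range.

<BR></P>

```shell
out=$(awk '/^start/, /^end/' <<'EOF'
start A
a1
end A
between blocks
start B
b1
end B
EOF
)
printf '%s\n' "$out"   # both blocks, without the line between them
```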

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="caution.gif" WIDTH = 37 HEIGHT = 35><B>CAUTION:</B> Range patterns cannot be parts of a larger pattern.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>A more useful example of the range pattern comes from awk's ability to handle multiple input files. I have a function finder program that finds code segments I know exist and tells me where they are. The code segments for a particular function X, for 
example, are bracketed by the phrase &quot;function X&quot; at the beginning and by &quot;} /* end of X&quot; at the end. This can be expressed as the awk range pattern:

<BR></P>

<PRE>'/function <I>functionname</I>/,/} \/\* end of <I>functionname</I>/'</PRE>
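<P>A minimal sketch of such a finder, assuming two hypothetical source files a.c and b.c and a hypothetical function named emit. With several files on the command line, awk reads them in order, and the built-in variables FILENAME and FNR report which file, and which line within it, each matching record came from. Note that the * in the end marker must be escaped as \*; otherwise it would mean &quot;zero or more slashes&quot;.

<BR></P>

```shell
tmp=$(mktemp -d)
cat > "$tmp/a.c" <<'EOF'
int unrelated;
function parse
body of parse
} /* end of parse */
EOF
cat > "$tmp/b.c" <<'EOF'
function emit
body of emit
} /* end of emit */
EOF
# Run in a subshell inside the temporary directory so FILENAME is just a.c or b.c.
out=$(cd "$tmp" && awk '/function emit/, /} \/\* end of emit/ { print FILENAME ":" FNR ": " $0 }' a.c b.c)
rm -rf "$tmp"
printf '%s\n' "$out"
```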

<H4 ALIGN="CENTER">

<CENTER><A ID="I27" NAME="I27">

<FONT SIZE=3><B>Compound Patterns</B>

<BR></FONT></A></CENTER></H4>

<P>Patterns can be combined using the following logical operators and parentheses as needed.

<BR></P>

<UL>

<LH><B>Table 15.2. The Logical Operators in </B><B>awk</B><B>.</B>

<BR></LH></UL>

<TABLE BORDER>

<TR>

<TD>

<PRE><I>Operator</I>

<BR></PRE>

<TD>

<PRE><I>Meaning</I>

<BR></PRE>

<TR>

<TD>

<P>!</P>

<TD>

<P>not</P>

<TR>

<TD>

<P>||</P>

<TD>

<P>or (you can also use | in regular expressions)</P>

<TR>

<TD>

<P>&amp;&amp;</P>

<TD>

<P>and</P></TABLE>

<P>The pattern may be simple or quite complicated: (NF&lt;3) || (NF &gt;4). This matches all input records that have fewer than three or more than four fields. As is usual in awk, there is a wide variety of ways to do the same thing (specify a pattern). Regular expressions 
are allowed in string matching, but their use is not forced. To form a pattern that matches records beginning with a or b or c or d, there are several pattern options:

<BR></P>

<PRE>/^[a-d].*/ 

/^a.*/ || /^b.*/ || /^c.*/ || /^d.*/ </PRE>
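<P>Both ideas can be checked against made-up input. The sketch below shows that (NF&lt;3) || (NF&gt;4) keeps the lines with one, two, or five fields, and that the bracket expression and the four alternatives joined by || select the same words.

<BR></P>

```shell
lines='a
b c
d e f
g h i j
k l m n o'
by_count=$(printf '%s\n' "$lines" | awk '(NF < 3) || (NF > 4)')   # 1, 2 or 5 fields
words='apple
berry
cherry
date
fig'
bracket=$(printf '%s\n' "$words" | awk '/^[a-d]/')
alternatives=$(printf '%s\n' "$words" | awk '/^a/ || /^b/ || /^c/ || /^d/')
printf '%s\n' "$by_count" "$bracket"
```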

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> When using range patterns, $1==2, $1==4 and $1&gt;= 2 &amp;&amp; $1 &lt;=4 are not the same ranges at all. First, the range pattern depends on the occurrence of the second pattern as a stop marker, not on the value indicated in the range. Second, once the range has stopped, it starts again only when another record matches the first pattern, so in-range values that appear after the range has closed are skipped.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>For instance, consider the following simple input file:

<BR></P>

<PRE>$ cat mydata

1     0

3     1

4     1

5     1

7     0

4     2

5     2

1     0

4     3</PRE>

<P>The first range I try, '$1==3,$1==5', produces:

<BR></P>

<PRE>$ awk '$1==3,$1==5' mydata

3     1

4     1

5     1</PRE>

<P>Compare this to the following pattern and output.

<BR></P>

<PRE>$ awk '$1&gt;=3 &amp;&amp; $1&lt;=5' mydata

3     1

4     1

5     1

4     2

5     2

4     3</PRE>

<P>Range patterns cannot be parts of a combined pattern.

<BR></P>

<H3 ALIGN="CENTER">

<CENTER><A ID="I28" NAME="I28">

<FONT SIZE=4><B>Actions</B>

<BR></FONT></A></CENTER></H3>

<P>The remainder of this chapter explores the action part of a pattern-action statement. As the name suggests, the action part tells awk what to do when a pattern is found. Patterns are optional. An awk program built solely of actions looks like a program in other 
iterative programming languages. But looks are deceptive: even without patterns, awk reads one input record, applies the first pattern-action statement to it, then the second, and so on, before moving on to the next record.

<BR></P>
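<P>A short sketch makes the order visible. It runs two action-only statements on two made-up input lines; the output alternates between them, showing that both statements run on the first record before the second record is read.

<BR></P>

```shell
out=$(printf '%s\n' 1 2 | awk '{ print "first: " $0 } { print "second: " $0 }')
printf '%s\n' "$out"   # first: 1, second: 1, first: 2, second: 2
```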

<P>Actions must be enclosed in curly braces ({}) whether accompanied by a pattern or alone. An action part may consist of multiple statements. When the statements have no pattern and are single statements (no compound loops or conditions), braces around each individual action are optional, provided the whole group of actions begins with a left curly brace and ends with a right curly brace. Consider the following two action pieces:

<BR></P>