<BR></FONT></A></CENTER></H3>

<P>The UNIX utility awk is a pattern matching and processing language with considerably more power than you may realize. It searches one or more specified files, checking for records that match a specified pattern. If awk finds a match, the corresponding 
action is performed. A simple concept, but it results in a powerful tool. Often an awk program is only a few lines long, and because of this, an awk program is often written, used, and discarded. A traditional programming language, such as Pascal or C, 
would take more thought, more lines of code, and hence, more time. Short awk programs arise from two of its built-in features: the amount of predefined flexibility and the number of details that are handled by the language automatically. Together, these 
features allow the manipulation of large data files in short (often single-line) programs, and make awk stand apart from other programming languages. Certainly any time you spend learning awk will pay dividends in improved productivity and efficiency.

<BR></P>

<H4 ALIGN="CENTER">

<CENTER><A ID="I6" NAME="I6">

<FONT SIZE=3><B>Uses</B>

<BR></FONT></A></CENTER></H4>

<P>The uses for awk vary from the simple to the complex. Originally awk was intended for various kinds of data manipulation. Intentionally omitting parts of a file, counting occurrences in a file, and writing reports are naturals for awk.

<BR></P>

<P>Awk uses the syntax of the C programming language, so if you know C, you have an idea of awk syntax. If you are new to programming or don't know C, learning awk will familiarize you with many of the C constructs.

<BR></P>

<P>Examples of where awk can be helpful abound. Computer-aided manufacturing, for example, is plagued with nonstandardization, so the output of a computer that's running a particular tool is quite likely to be incompatible with the input required for a 
different tool. Rather than write any complex C program, this type of simple data transformation is a perfect awk task.

<BR></P>

<P>One real problem of computer-aided manufacturing today is that no standard format yet exists for the program running the machine. Therefore, the output from Computer A running Machine A probably is not the input needed for Computer B running Machine B. 

Although Machine A is finished with the material, Machine B is not ready to accept it. Production halts while someone edits the file so it meets Computer B's needed format. This is a perfect and simple awk task.

<BR></P>

<P>Due to the amount of built-in automation within awk, it is also useful for rapid prototyping or trying out an idea that could later be implemented in another language.

<BR></P>

<H4 ALIGN="CENTER">

<CENTER><A ID="I7" NAME="I7">

<FONT SIZE=3><B>Features</B>

<BR></FONT></A></CENTER></H4>

<P>Reflecting the UNIX environment, awk's features resemble the structures of both C and shell scripts. Highlights include its flexibility, predefined variables, automation, standard program constructs, conventional variable types, powerful 
output formatting borrowed from C, and ease of use.

<BR></P>

<P>The flexibility means that most tasks may be done more than one way in awk. With the application in mind, the programmer chooses which method to use. The built-in variables already provide many of the tools to do what is needed. Awk is highly 
automated. For instance, awk automatically retrieves each record, separates it into fields, and does type conversion when needed without programmer request. Furthermore, there are no variable declarations. Awk includes the &quot;usual&quot; programming 
constructs for the control of program flow: an if statement for two-way decisions and do, for, and while statements for looping. Awk also includes its own notational shorthand to ease typing. (This is UNIX after all!) Awk borrows the printf() statement from 
C to allow &quot;pretty&quot; and versatile formats for output. These features combine to make awk user friendly.

<BR></P>
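<P>As a minimal sketch of that automation, the following shell snippet sums a column and formats the result with printf. The sample data and the function name sum_sales are invented for illustration; NR is awk's built-in count of records read so far.
<BR></P>

```shell
# Sum the second column of a small, made-up data set.
# "total" is never declared; it starts out empty and is treated as 0.
# The text in $2 is converted to a number automatically when added.
sum_sales() {
    printf 'apples 3\npears 4.5\nplums 10\n' |
    awk '{ total += $2 } END { printf("%d records, total %.2f\n", NR, total) }'
}
sum_sales
```

<P>Awk reads each line, splits it into fields, and converts the second field to a number without being asked; the program itself only states what to add up and how to print it.
<BR></P>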

<H4 ALIGN="CENTER">

<CENTER><A ID="I8" NAME="I8">

<FONT SIZE=3><B>Brief History</B>

<BR></FONT></A></CENTER></H4>

<P>Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan created awk in 1977. (The name comes from the creators' last initials.) In 1985, more features were added, creating nawk (new awk). For quite a while, nawk remained exclusively the property of 
AT&amp;T Bell Labs. Although it became part of System V for Release 3.1, some versions of UNIX, like SunOS, keep both awk and nawk due to a syntax incompatibility. Others, like System V, run nawk under the name awk (although System V has nawk too). The 
Free Software Foundation's GNU project introduced its own version of awk, gawk, based on the IEEE POSIX awk standard (Institute of Electrical and Electronics Engineers, Inc., IEEE Standard for Information Technology, Portable Operating System Interface, Part 2: Shell and 
Utilities Volume 2, ANSI approved 4/5/93), which differs from both awk and nawk. Linux, a freely distributed UNIX for PCs, uses gawk rather than awk or nawk. Throughout this chapter I use the word awk when the concept applies to any of the three. The 
versions are mostly upwardly compatible. Awk is the oldest, then nawk, then POSIX awk, then gawk, as shown below. I use the notation version++ to denote a concept that began in that version and continues through any later versions.

<BR></P>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> Because of syntax differences, awk code cannot always be moved to nawk unchanged. However, except as noted, all the concepts of awk are implemented in nawk (and gawk). Where it matters, I have specified the version.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>

<BR><B><A HREF="15unx01.gif">Figure 15.1. The evolution of awk.</A></B>

<BR></P>

<P>Refer to the end of the chapter for more information and further resources on awk and its derivatives.

<BR></P>

<H3 ALIGN="CENTER">

<CENTER><A ID="I9" NAME="I9">

<FONT SIZE=4><B>Fundamentals</B>

<BR></FONT></A></CENTER></H3>

<P>This section introduces the basics of the awk programming language. Although my discussion first skims the surface of each topic to familiarize you with how awk functions, later sections of the chapter go into greater detail. One feature of awk that 
almost continually holds true is this: you can do most tasks more than one way. The command line exemplifies this. First, I explain the variety of ways awk may be called from the command line&#151;using files for input, the program file, and possibly an 
output file. Next, I introduce the main construct of awk, which is the pattern action statement. Then, I explain the fundamental ways awk can read and transform input. I conclude the section with a look at the format of an awk program.

<BR></P>

<H4 ALIGN="CENTER">

<CENTER><A ID="I10" NAME="I10">

<FONT SIZE=3><B>Entering Awk from the Command Line</B>

<BR></FONT></A></CENTER></H4>

<P>In its simplest form, awk takes the material you want to process from standard input and displays the results to standard output (the monitor). You write the awk program on the command line. The following table shows the various ways you can enter awk 
and input material for processing.

<BR></P>

<P>You can either specify explicit awk statements on the command line or, with the -f flag, specify an awk program file that contains a series of awk commands. Because awk follows the standard UNIX design for standard input and output, you can, of 
course, use file redirection in your shell, too, so awk 'program' &lt; inputfile produces the same output as awk 'program' inputfile. To save the output in a file, again use file redirection: awk 'program' inputfile &gt; outputfile does the trick. Helpfully, awk can work with multiple input 
files at once if they are specified on the command line.

<BR></P>

<P>Most often you'll see awk used as part of a command pipe, where it filters the output of another command. An example is ls -l | awk '{print $3}', which prints just the third column of each line of the ls output. Awk scripts can become 
quite complex, so if you have a standard set of filter rules that you'd like to apply to a file, with the output sent directly to the printer, you could use something like awk -f myawkscript inputfile | lp.

<BR></P>
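<P>The pipe usage described above can be sketched as follows. To keep the result predictable, the snippet feeds awk two invented lines shaped like ls -l output instead of running ls itself; the function names listing and owners are hypothetical.
<BR></P>

```shell
# Two sample lines shaped like "ls -l" output (invented for illustration).
listing() {
    printf '%s\n' \
        '-rw-r--r-- 1 alice staff 120 Jan  3 10:00 notes.txt' \
        '-rw-r--r-- 1 bob   staff 300 Jan  4 11:30 todo.txt'
}

# Filter the "command output" through awk, keeping only the third column.
owners() { listing | awk '{ print $3 }'; }
owners
```

<P>Replacing listing with a real ls -l gives the same filter over your own directory.
<BR></P>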

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> If you opt to specify your awk script on the command line, you'll find it best to use single quotes to let you use spaces and to ensure that the command shell doesn't falsely interpret any portion of 
the command.

<BR></NOTE>

<HR ALIGN=CENTER>

<H4 ALIGN="CENTER">

<CENTER><A ID="I11" NAME="I11">

<FONT SIZE=3><B>Files for Input</B>

<BR></FONT></A></CENTER></H4>

<P>These input and output places can be changed if desired. You can specify an input file by typing the name of the file after the program with a blank space between the two. If you name no input file, the input enters the awk environment from your workstation keyboard 
(standard input). To signal the end of the input, type Ctrl+d. The program on the command line executes on the input you entered, and the results are displayed on the monitor (the standard output).

<BR></P>

<P>Here's a simple little awk command that echoes all lines I type, prefacing each with the number of words (or fields, in awk parlance, hence the NF variable for number of fields) in the line. (Note that Ctrl+d means that while holding down the Control 
key you should press the d key).

<BR></P>

<PRE>$ awk '{print NF ": " $0}'
I am testing my typing.
A quick brown fox jumps when vexed by lazy ducks.
Ctrl+d
5: I am testing my typing.
10: A quick brown fox jumps when vexed by lazy ducks.
$ _</PRE>

<P>You can also name more than one input file on the command line, causing the combined files to act as one input. Naming the same file more than once is one way of making multiple passes through one input file.

<BR></P>
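<P>Here is a small sketch of two files acting as one input. The file names file1 and file2 and their contents are made up, and the files live in a temporary directory; NR is awk's built-in running record count.
<BR></P>

```shell
# Create two small input files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/file1"
printf 'gamma\n' > "$tmpdir/file2"

# NR keeps counting across both files, so they behave as one input.
combined=$(awk '{ print NR ": " $0 }' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$combined"

rm -r "$tmpdir"
```

<P>The numbering runs from 1 to 3 without restarting at the second file, and swapping the two file names on the command line swaps the order of the output.
<BR></P>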

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> Keep in mind that the correct ordering on the command line is crucial for your program to work correctly: files are read from left to right, so if you want to have file1 and file2 read in that order, 
you'll need to specify them as such on the command line.

<BR></NOTE>

<HR ALIGN=CENTER>

<H5 ALIGN="CENTER">

<CENTER><A ID="I12" NAME="I12">

<FONT SIZE=3><B>The Program File</B>

<BR></FONT></A></CENTER></H5>

<P>With awk's automatic type conversion, a file of names and a file of numbers entered in the reverse order at the command line generate strange-looking output rather than an error message. For longer programs, it is simpler to put the program 
in a file and specify the name of the file on the command line. The -f option does this. As with most UNIX commands, the option and its argument (the program file) come before the input files, which are the last 
parameters on the command line.

<BR></P>
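<P>A minimal sketch of the -f option follows. The program file name count.awk, the input file name, and their contents are all hypothetical; the snippet writes both into a temporary directory so it can run anywhere.
<BR></P>

```shell
tmpdir=$(mktemp -d)

# Store a two-line awk program in a file.
printf '%s\n' '{ words += NF }' 'END { print words " words" }' > "$tmpdir/count.awk"

# Some input to run it on.
printf 'I am testing\nmy typing.\n' > "$tmpdir/input.txt"

# -f names the program file; the input file comes last.
word_count=$(awk -f "$tmpdir/count.awk" "$tmpdir/input.txt")
printf '%s\n' "$word_count"

rm -r "$tmpdir"
```

<P>Because the program sits in its own file, it needs no shell quoting and can be reused on any input.
<BR></P>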

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> Versions of awk that meet the POSIX awk specification accept multiple -f options; the named program files are combined, in order, into a single program. You can use this to apply several sets of rules to the same input.

<BR></NOTE>

<HR ALIGN=CENTER>

<H5 ALIGN="CENTER">

<CENTER><A ID="I13" NAME="I13">

<FONT SIZE=3><B>Specifying Output on the Command Line</B>

<BR></FONT></A></CENTER></H5>

<P>Output from awk may be redirected to a file or piped to another program (see Chapter 4). The command awk '/^5/ {print $0}' | grep 3, for example, will result in just those lines that start with the digit five (that's what the awk part does) and also 
contain the digit three (the grep command). If you wanted to save that output to a file, by contrast, you could use awk '/^5/ {print $0}' &gt; results and the file results would contain all lines that begin with the digit 5. If you opt for neither of these 
courses, the output of awk will be displayed on your screen directly, which can be quite useful in many instances, particularly when you're developing or fine-tuning your awk script.

<BR></P>
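<P>Both forms can be tried on a handful of made-up numbers. In this sketch the function name numbers and the sample values are invented, and the results file is created in a temporary directory rather than the current one.
<BR></P>

```shell
# Made-up input: one number per line.
numbers() { printf '512\n53\n35\n5\n'; }

# Pipe: lines starting with 5 (awk) that also contain a 3 (grep).
piped=$(numbers | awk '/^5/ { print $0 }' | grep 3)
printf 'piped: %s\n' "$piped"

# Redirect: save every line starting with 5 in a file named results.
tmpdir=$(mktemp -d)
numbers | awk '/^5/ { print $0 }' > "$tmpdir/results"
saved=$(cat "$tmpdir/results")
printf 'saved:\n%s\n' "$saved"

rm -r "$tmpdir"
```

<P>Only 53 survives both filters, while the results file holds all three lines that begin with 5.
<BR></P>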

<H4 ALIGN="CENTER">

<CENTER><A ID="I14" NAME="I14">

<FONT SIZE=3><B>Patterns and Actions</B>

<BR></FONT></A></CENTER></H4>

<P>Awk programs are divided into three main blocks: the BEGIN block, the per-record processing block, and the END block. Unless explicitly stated otherwise, all statements to awk appear in the per-record block (you'll see later where the other blocks can come 
in particularly handy for programming, though).

<BR></P>
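<P>The three blocks can be seen side by side in a short sketch. The input lines and the function name report are invented for illustration.
<BR></P>

```shell
report() {
    printf 'a\nb\nc\n' |
    awk 'BEGIN { print "start" }
               { print "line " NR ": " $0 }
         END   { print "done after " NR " lines" }'
}
report
```

<P>The BEGIN action runs once before any input is read, the middle action runs once for every input line, and the END action runs once after the last line.
<BR></P>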

<P>Statements within awk are divided into two parts: a pattern, telling awk what to match, and a corresponding action, telling awk what to do when a line matching the pattern is found. The action part of a pattern action statement is enclosed in curly 
braces ({}) and may contain multiple statements. Either part of a pattern action statement may be omitted. An action with no specified pattern applies to every record of the input file (that's how the earlier example of {print $0} worked). A 
pattern without an action indicates that you want each matching input record to be copied to the output as it is (i.e., printed).

<BR></P>

<P>The example of /^5/ {print $0} is an example of a two-part statement: the pattern here is all lines that begin with the digit five (the ^ indicates that it should appear at the beginning of the line: without it the pattern would say any line that 
includes the digit five) and the action is print the entire line verbatim. ($0 is shorthand for the entire line.)

<BR></P>
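<P>A quick way to see the effect of omitting either half is to run all three variants over the same input. The sample lines and the function name data below are made up.
<BR></P>

```shell
# Made-up input: a count followed by a name.
data() { printf '5 apples\n15 pears\n52 plums\n'; }

pattern_only=$(data | awk '/^5/')               # matching lines are printed as is
action_only=$(data | awk '{ print $2 }')        # the action runs on every line
both=$(data | awk '/^5/ { print $2 }')          # the action runs on matching lines only

printf '%s\n--\n%s\n--\n%s\n' "$pattern_only" "$action_only" "$both"
```

<P>The line 15 pears contains a 5 but does not begin with one, so only the action-only variant prints anything for it.
<BR></P>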

<H4 ALIGN="CENTER">

<CENTER><A ID="I15" NAME="I15">

<FONT SIZE=3><B>Input</B>

<BR></FONT></A></CENTER></H4>

<P>Awk automatically scans, in order, each record of the input file, testing it against each pattern action statement in the awk program. Unless otherwise set, awk assumes each record is a single line. (See the sections &quot;Advanced 
Concepts&quot; and &quot;Multi-line Records&quot; for how to change this.) If the input file has blank lines in it, each blank line counts as a record too. Awk automatically retrieves each record for analysis; there is no <I>read</I> statement in awk.

<BR></P>

<P>A programmer may also disrupt the automatic input order in one of two ways: the next and exit statements. The next statement tells awk to retrieve the next record from the input file and continue without running the current input record through the 
remaining pattern action statements in the program. For example, if you are doing a crossword puzzle and all the letters of a word are formed by previous words, most likely you wouldn't even bother to read that clue but simply skip to the clue 
below; this is how the <I>next</I> statement would work, if your list of clues were the input. The other method of disrupting the usual flow of input is through the exit statement. The exit statement transfers control to the END block, if one is 
specified, or otherwise quits the program, as if all the input had been read; suppose the arrival of a friend ends your interest in the crossword puzzle, but you still put the paper away. Within the END block, an exit statement causes the program to quit.

<BR></P>
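<P>The crossword analogy can be sketched with a small made-up clue list, where the first word of each line says what to do with it. The function names and the keywords skip, keep, and stop are purely illustrative.
<BR></P>

```shell
# Made-up input: a keyword followed by a clue name.
clues() { printf 'skip one\nkeep two\nkeep three\nstop\nkeep four\n'; }

solve() {
    clues | awk '
        $1 == "skip" { next }    # drop this record and fetch the next one
        $1 == "stop" { exit }    # go to END as if the input were exhausted
                     { print $2 }
        END          { print "put the paper away" }'
}
solve
```

<P>The clue one is skipped by next, four is never read because exit has already fired, and the END action still runs.
<BR></P>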

<P>An input record refers to the entire line of a file, including any characters, spaces, or tabs. The spaces and tabs are called whitespace.

<BR></P>

<HR ALIGN=CENTER>

<NOTE>

<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> If you think that your input file may include both spaces and tabs, you can save yourself a lot of confusion by converting all tabs to spaces with the expand program. It works like this: expand 
filename | awk '{ stuff }'.

<BR></NOTE>

<HR ALIGN=CENTER>

<P>The whitespace in the input file and the whitespace in the output file are not related and any whitespace you want in the output file, you must explicitly put there.

<BR></P>

<H5 ALIGN="CENTER">

<CENTER><A ID="I16" NAME="I16">

<FONT SIZE=3><B>Fields</B>

<BR></FONT></A></CENTER></H5>

<P>A group of characters in the input record or output file is called a field. Fields are predefined in awk: $1 is the first field, $2 is the second, $3 is the third, and so on. $0 indicates the entire line. Fields are separated by a field separator (any 
single character, including Tab), held in the variable FS. Unless you change it, FS has a space as its value, which awk treats specially: any run of spaces or tabs separates fields. FS may be changed either by starting the programfile with the following statement:

<BR></P>

<PRE>BEGIN {FS = &quot;char&quot; }</PRE>

<P>or by setting the -Fchar command line option where char is the selected field separator character you want to use.

<BR></P>

<P>One file you might have viewed that shows where changing the field separator could be helpful is the /etc/passwd file that defines all user accounts. Rather than having the different fields separated by spaces or tabs, the password file is 
structured with lines like this:

<BR></P>

<PRE>news:?:6:11:USENET News:/usr/spool/news:/bin/ksh</PRE>

<P>Each field is separated by a colon! You could change each colon to a space (with sed, for example), but that wouldn't work too well: notice that the fifth field, USENET News, contains a space already. Better to change the field separator. If you wanted 
to just have a list of the fifth fields in each line, therefore, you could use the simple awk command awk -F: '{print $5}' /etc/passwd.

<BR></P>
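<P>To try this without touching the real password file, the sketch below runs the same command on the single sample line shown above. The shell variable names are invented for the example.
<BR></P>

```shell
passwd_line='news:?:6:11:USENET News:/usr/spool/news:/bin/ksh'

# With -F: the colon is the separator, so the space inside field 5 is kept.
fifth=$(printf '%s\n' "$passwd_line" | awk -F: '{ print $5 }')
printf '%s\n' "$fifth"

# With the default separator the same line splits at its only space.
default_count=$(printf '%s\n' "$passwd_line" | awk '{ print NF }')
printf '%s\n' "$default_count"
```

<P>The default separator finds only two fields in that line, which is why the colon has to be named explicitly.
<BR></P>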

<P>Likewise, the built-in variable OFS holds the value of the output field separator. OFS also has a default value of a space. It, too, may be changed by placing the following line at the start of a program.

<BR></P>

<PRE>BEGIN {OFS = &quot;char&quot; }</PRE>

<P>If you want to automatically translate the passwd file so that it lists only the first and fifth fields, separated by a tab, you can therefore use the awk script:

<BR></P>

<PRE>BEGIN { FS=&quot;:&quot; ; OFS=&quot;\t&quot; }
{ print $1, $5 }</PRE>

<P>Notice here that the script contains two blocks: the BEGIN block and the main per-input line block. Also notice that most of the work is done automatically.

<BR></P>

<H4 ALIGN="CENTER">

<CENTER><A ID="I17" NAME="I17">

<FONT SIZE=3><B>Program Format</B>

<BR></FONT></A></CENTER></H4>

<P>With a few noted exceptions, awk programs are free format. The interpreter ignores any blank lines in a programfile. Add them to improve the readability of your program whenever you wish. The same is true for Tabs and spaces between operators and the 
parts of a program. Therefore, these two lines are treated identically by the awk interpreter.

<BR></P>

<PRE>$4 == 2               {print &quot;Two&quot;}

$4     ==     2     {     print     &quot;Two&quot;     }</PRE>

<P>If more than one pattern action statement appears on a line, you'll need to separate them with a semicolon; the same goes for multiple statements inside one action, as shown above in the BEGIN block for the passwd file translator. If you stick with one-command-per-line then you won't need to worry too much about 
the semicolons. There are a couple of spots, however, where the semicolon must always be used: before an else statement or when included in the syntax of a statement. (See the &quot;Loops&quot; or &quot;The Conditional Statement&quot; sections.) However, 
you may always put a semicolon at the end of a statement.

<BR></P>
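<P>Both semicolon cases can be tried from the command line. This is only a sketch on invented input, and the function names classify and two_rules are hypothetical.
<BR></P>

```shell
# A semicolon before else, with the whole if statement on one line.
classify() {
    printf '1\n2\n3\n' |
    awk '{ if ($1 == 2) print "Two"; else print "Not two" }'
}
classify

# Two pattern action statements on one line, separated by a semicolon.
two_rules() {
    printf '1\n2\n3\n' |
    awk '$1 == 1 { print "One" }; $1 == 2 { print "Two" }'
}
two_rules
```

<P>Writing each statement on its own line would make both semicolons unnecessary.
<BR></P>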

<P>The other format restriction for awk programs is that at least the opening curly bracket of the action half of a pattern action statement must be on the same line as the accompanying pattern, if both pattern and action exist. Thus, the following examples 
all do the same thing.

<BR></P>

<P>The first shows all statements on one line:

<BR></P>

<PRE>$2==0     {print &quot;&quot;; print &quot;&quot;; print &quot;&quot;;}</PRE>

<P>The second with the first statement on the same line as the pattern to match:

<BR></P>

<PRE>$2==0     {     print &quot;&quot;

          print &quot;&quot;

          print &quot;&quot;}</PRE>

<P>and finally as spread out as possible: