<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<SCRIPT>
<!--
function displayWindow(url, width, height) {
var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>
-->
<UL>
<LI><A HREF="#Heading1">- 26 -</A>
<UL>
<LI><A HREF="#Heading2">gawk</A>
<UL>
<LI><A HREF="#Heading3">What Is the awk Language?</A>
<LI><A HREF="#Heading4">Files, Records, and Fields</A>
<LI><A HREF="#Heading5">NOTE</A>
<LI><A HREF="#Heading6">Pattern-Action Pairs</A>
<LI><A HREF="#Heading7">NOTE</A>
<UL>
<LI><A HREF="#Heading8">Simple Patterns</A>
</UL>
<LI><A HREF="#Heading9">NOTE</A>
<UL>
<LI><A HREF="#Heading10">Comparisons and Arithmetic</A>
<LI><A HREF="#Heading11">Strings and Numbers</A>
<LI><A HREF="#Heading12">Formatting Output</A>
<LI><A HREF="#Heading13">Changing Field Separators</A>
<LI><A HREF="#Heading14">Metacharacters</A>
</UL>
<LI><A HREF="#Heading15">Calling gawk Programs</A>
<UL>
<LI><A HREF="#Heading16">BEGIN and END</A>
<LI><A HREF="#Heading17">Variables</A>
</UL>
<LI><A HREF="#Heading18">NOTE</A>
<LI><A HREF="#Heading19">NOTE</A>
<UL>
<LI><A HREF="#Heading20">Built-In Variables</A>
</UL>
<LI><A HREF="#Heading21">Control Structures</A>
<UL>
<LI><A HREF="#Heading22">The if Statement</A>
<LI><A HREF="#Heading23">The while Loop</A>
<LI><A HREF="#Heading24">The for Loop</A>
<LI><A HREF="#Heading25">next and exit</A>
<LI><A HREF="#Heading26">Arrays</A>
</UL>
<LI><A HREF="#Heading27">Summary</A>
</UL>
</UL>
</UL>
<P>
<HR SIZE="4">
<H2 ALIGN="CENTER"><A NAME="Heading1"></A><FONT COLOR="#000077">- 26 -</FONT></H2>
<H2 ALIGN="CENTER"><A NAME="Heading2"></A><FONT COLOR="#000077">gawk</FONT></H2>
<P><I>by Tim Parker</I></P>
<P>IN THIS CHAPTER</P>
<UL>
<LI>What Is the awk Language?
<P>
<LI>Files, Records, and Fields
<P>
<LI>Pattern-Action Pairs
<P>
<LI>Calling gawk Programs
<P>
<LI>Control Structures
</UL>
<P><BR>
The <TT>awk</TT> programming language was created by the three people who gave their
last-name initials to the language: Alfred Aho, Peter Weinberger, and Brian Kernighan.
The <TT>gawk</TT> program included with Linux is the GNU implementation of that programming
language.</P>
<P>The <TT>awk</TT> language is more than just a programming language; it is an almost
indispensable tool for many system administrators and UNIX programmers. The language
itself is easy to learn, easy to master, and amazingly flexible. Once you get the
hang of using <TT>awk</TT>, you'll be surprised how often you can use it for routine
tasks on your system.</P>
<P>To help you understand <TT>gawk</TT>, I will introduce the elements of the programming
language in a simple order and show examples along the way. You are encouraged, of
course, to experiment as the chapter progresses.</P>
<P>I can't cover every aspect and feature of <TT>gawk</TT> in this chapter, but we
will look at the basics of the language, which should be enough to get your curiosity
working.</P>
<H3 ALIGN="CENTER"><A NAME="Heading3"></A><FONT COLOR="#000077">What Is the awk Language?</FONT></H3>
<P><TT>awk</TT> is designed to be an easy-to-use programming language that lets you
work with information either stored in files or piped to it. The main strengths of
<TT>awk</TT> are its capabilities to do the following:
<UL>
<LI>Display some or all the contents of a file, selecting rows, columns, or fields
as necessary.
<P>
<LI>Analyze text for frequency of words, occurrences, and so on.
<P>
<LI>Prepare formatted output reports based on information in a file.
<P>
<LI>Filter text in a very powerful manner.
<P>
<LI>Perform calculations with numeric information from a file.
</UL>
<P><TT>awk</TT> isn't difficult to learn. In many ways, <TT>awk</TT> is the ideal
first programming language because of its simple rules, basic formatting, and standard
usage. Experienced programmers will find <TT>awk</TT> refreshingly easy to use.
<H3 ALIGN="CENTER"><A NAME="Heading4"></A><FONT COLOR="#000077">Files, Records, and
Fields</FONT></H3>
<P>Usually, <TT>gawk</TT> works with data stored in files. Often this is numeric
data, but <TT>gawk</TT> can work with character information, too. If data is not
stored in a file, it is supplied to <TT>gawk</TT> through a pipe or other form of
redirection. Only ASCII files (text files) can be properly handled with <TT>gawk</TT>.
Although it does have the ability to work with binary files, the results are often
unpredictable. Since most information on a Linux system is stored in ASCII, this
isn't a problem.</P>
<P>As a simple example of a file that <TT>gawk</TT> works with, consider a telephone
directory. It is composed of many entries, all with the same format: last name, first
name, address, telephone number. The entire telephone directory is a database of
sorts, although without a sophisticated search routine. Indeed, the telephone directory
relies on a pure alphabetical order to enable users to search for the data they need.</P>
<P>Each line in the telephone directory is a complete set of data on its own and
is called a record. For example, the entry in the telephone directory for "Smith,
John," which includes his address and telephone number, is a record.</P>
<P>Each piece of information in the record--the last name, the first name, the address,
and the telephone number--is called a field. For the <TT>gawk</TT> language, the
field is a single piece of information. A record, then, is a number of fields that
pertain to a single item. A set of records makes up a file.</P>
<P>In most cases, fields are separated by a character that is used only to separate
fields, such as a space, a tab, a colon, or some other special symbol. This character
is called a field separator. A good example is the file <TT>/etc/passwd</TT>, which
looks like this:<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash
</FONT></PRE>
<P>If you look carefully at the file, you will see that it uses a colon as the field
separator. Each line in the <TT>/etc/passwd</TT> file has seven fields: the user
name, the password, the user ID, the group ID, a comment field, the home directory,
and the startup shell. Each field is separated by a colon. Colons exist only to separate
fields. A program looking for the sixth field in any line needs only count five colons
across (because the first field doesn't have a colon before it).</P>
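<P>The colon-counting idea can be sketched with a short shell snippet. It is only
an illustration: the <TT>-F</TT> option (which sets the field separator) and the
<TT>NF</TT> variable (the number of fields in the current record) are used here ahead
of their introduction, the variable names are made up for this example, and the first
line is a fallback assumption for systems where only a plain <TT>awk</TT> is installed.</P>

```shell
# Assumption: fall back to the system awk if gawk is missing (only POSIX features are used).
command -v gawk >/dev/null || gawk() { awk "$@"; }

# The three sample /etc/passwd lines from the text.
passwd_sample='tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash'

# -F: makes the colon the field separator.
# NF is the field count for the line; $6 is the sixth field (the home directory).
home_dirs=$(printf '%s\n' "$passwd_sample" | gawk -F: '{print NF, $6}')
echo "$home_dirs"
```

<P>Each output line should start with <TT>7</TT>, the number of fields, followed by
that user's home directory.</P>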
<P>That's where we find a problem with the <TT>gawk</TT> definition of fields as
they pertain to the telephone directory example. Consider the following lines from
a telephone directory:<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">Smith, John 13 Wilson St. 555-1283
Smith, John 2736 Artside Dr, Apt 123 555-2736
Smith, John 125 Westmount Cr 555-1726
</FONT></PRE>
<P>We "know" there are four fields here: the last name, the first name,
the address, and the telephone number. But <TT>gawk</TT> doesn't see it that way.
The telephone book uses the space character as a field separator, so on the first
line it sees "Smith," (comma included) as the first field, "John" as the second,
"13" as the third, "Wilson" as the fourth, and so on. As far
as <TT>gawk</TT> is concerned, the first line when using a space character as a field
separator has six fields. The second line has eight fields.</P>
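<P>A quick way to check these counts is to ask <TT>gawk</TT> how many fields it finds
on each line. The sketch below relies on the built-in variable <TT>NF</TT>, which
is covered later in the chapter; the shell variable names are invented for the example,
and the first line is a fallback assumption for systems without <TT>gawk</TT>.</P>

```shell
# Assumption: fall back to the system awk if gawk is missing (only POSIX features are used).
command -v gawk >/dev/null || gawk() { awk "$@"; }

# The three telephone directory lines from the text.
phone_sample='Smith, John 13 Wilson St. 555-1283
Smith, John 2736 Artside Dr, Apt 123 555-2736
Smith, John 125 Westmount Cr 555-1726'

# With the default separator, print the field count, the first field, and the fourth field.
field_counts=$(printf '%s\n' "$phone_sample" | gawk '{print NF, $1, $4}')
echo "$field_counts"
```

<P>The first field comes back with its comma attached, and the fourth field is a fragment
of the street name rather than the telephone number.</P>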
<DL>
<DT></DT>
</DL>
<DL>
<DD>
<HR>
<A NAME="Heading5"></A><FONT COLOR="#000077"><B>NOTE:</B> </FONT>When working with
a programming language, you must consider data the way the language will see it.
Remember that programming languages take things literally.
<HR>
</DL>
<P>To make sense of the telephone directory the way we want to handle it, we have
to find another way of structuring the data so that there is a field separator between
the sections. For example, the following uses the slash character as the field separator:<FONT
COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1726
</FONT></PRE>
<P>By default, <TT>gawk</TT> uses blank characters (spaces or tabs) as field separators
unless instructed to use another character. If <TT>gawk</TT> is using spaces, it
doesn't matter how many are in a row; they are treated as a single block for purposes
of finding fields. Naturally, there is a way to override this behavior, too.
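<P>Both behaviors can be tried from the shell. This is a minimal sketch, assuming
the <TT>-F</TT> option (described later in the chapter) to select the slash as the
separator; the variable names are hypothetical, and the first line is a fallback for
systems without <TT>gawk</TT>.</P>

```shell
# Assumption: fall back to the system awk if gawk is missing (only POSIX features are used).
command -v gawk >/dev/null || gawk() { awk "$@"; }

# The slash-separated directory lines from the text.
slash_sample='Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1726'

# With -F/ every line has four fields, and $3 is the whole address.
addresses=$(printf '%s\n' "$slash_sample" | gawk -F/ '{print NF, $3}')
echo "$addresses"

# With the default separator, a run of spaces or a tab counts as one separator.
blank_run=$(printf 'one    two\tthree\n' | gawk '{print NF, $2}')
echo "$blank_run"
```

<P>The first command should report four fields per line with the complete address
as the third; the second should report three fields despite the extra spaces.</P>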
<H3 ALIGN="CENTER"><A NAME="Heading6"></A><FONT COLOR="#000077">Pattern-Action Pairs</FONT></H3>
<P>The <TT>gawk</TT> language has a particular format for almost all instructions.
Each command is composed of two parts: a pattern and a corresponding action. Whenever
the pattern is matched, <TT>gawk</TT> executes the action paired with that pattern.</P>
<P>Pattern-action pairs can be thought of in more common terms to show how they work.
Consider instructing someone how to get to the post office. You might say, "Go
to the end of the street and turn right. At the stop sign, turn left. At the end
of the street, go right." You have created three pattern-action pairs with these
instructions:<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">end of street: turn right
stop sign: turn left
end of street: turn right
</FONT></PRE>
<P>When these patterns are met, the corresponding action is taken. You wouldn't turn
right before you reached the end of the street, and you wouldn't turn left until you
got to the stop sign, so the pattern must be matched precisely for the action
to be performed. This is a bit simplistic, but it gives you the basic idea.</P>
<P>With <TT>gawk</TT>, the patterns to be matched are enclosed in a pair of slashes,
and the actions are in a pair of curly braces:<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">/pattern1/{action1}
/pattern2/{action2}
/pattern3/{action3}
</FONT></PRE>
<P>This format makes it quite easy to tell where the pattern starts and ends, and
when the action starts and ends. All <TT>gawk</TT> programs are sets of these pattern-action
pairs, one after the other. Remember these pattern-action pairs are working on text
files, so a typical set of patterns might be matching a set of strings, and the actions
might be to print out parts of the line that matched.</P>
<P>Suppose there isn't a pattern? In that case, the pattern matches every time and
the action is executed every time. If there is no action, <TT>gawk</TT> copies the
entire line that matched without change.</P>
<P>Here are some simple examples. The <TT>gawk</TT> command<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">gawk '/tparker/' /etc/passwd
</FONT></PRE>
<P>will look for each line in the <TT>/etc/passwd</TT> file that contains the pattern
<TT>tparker</TT> and display it (there is no action, only a pattern). The output
from the command will be the one line in the <TT>/etc/passwd</TT> file that contains
the string <TT>tparker</TT>. If there is more than one line in the file with that
pattern, they all will be displayed. In this case, <TT>gawk</TT> is acting exactly
like the <TT>grep</TT> utility!</P>
<P>This example shows you two important things about <TT>gawk</TT>: It can be invoked
from the command line by giving it the pattern-action pair to work with and a filename,
and the pattern-action pair should be enclosed in single quotes so that the shell
passes it to <TT>gawk</TT> unchanged and it stays distinct from the filename.</P>
<P>The <TT>gawk</TT> language is literal in its matching. The string <TT>cat</TT>
will match any lines with <TT>cat</TT> in them, whether the word "cat"
by itself or part of another word such as "concatenate." To be exact, put
spaces on either side of the word. Also, case is important. We'll see how to expand
the matching in the section "Metacharacters" a little later in the chapter.</P>
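<P>The literal matching described above can be seen with a few made-up input lines.
This is only a sketch: the word list and variable names are invented for the illustration,
and the first line is a fallback assumption for systems without <TT>gawk</TT>.</P>

```shell
# Assumption: fall back to the system awk if gawk is missing (only POSIX features are used).
command -v gawk >/dev/null || gawk() { awk "$@"; }

# Hypothetical input: one word or phrase per line.
words='cat
concatenate
Cat
the cat sat'

# /cat/ matches any line containing the lowercase string "cat".
loose=$(printf '%s\n' "$words" | gawk '/cat/')
echo "$loose"

# / cat / requires a space on each side of the word.
strict=$(printf '%s\n' "$words" | gawk '/ cat /')
echo "$strict"
```

<P>The first pattern should pick up <TT>cat</TT>, <TT>concatenate</TT>, and <TT>the
cat sat</TT>, but not <TT>Cat</TT>. The second should keep only <TT>the cat sat</TT>;
note that it also skips the line holding <TT>cat</TT> by itself, because that line
has no surrounding spaces.</P>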
<P>Jumping ahead slightly, we can introduce a <TT>gawk</TT> command. The command<FONT
COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">gawk '{print $3}' file2.data
</FONT></PRE>
<P>has an action but no pattern, so it performs that action on every line in the file
<TT>file2.data</TT>. The action is <TT>print $3</TT>, which tells <TT>gawk</TT> to
print the third field of every line. The default field separator, whitespace, is used
to tell where fields begin and end. If we had tried the same command on the
<TT>/etc/passwd</TT> file, nothing useful would have been displayed, because the field
separator used in that file is the colon; the sample lines shown earlier contain only
one space each, so <TT>gawk</TT> would find no third field and print a blank line
for each.</P>
<P>We can combine the two commands to show a complete pattern-action pair:<FONT COLOR="#0066FF"></FONT>
<PRE><FONT COLOR="#0066FF">gawk '/UNIX/{print $2}' file2.data
</FONT></PRE>
<P>This command will search <TT>file2.data</TT> line by line, looking for the string
<TT>UNIX</TT>. If it finds <TT>UNIX</TT>, it prints the second field of that line
(record).
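<P>To try the action-only and pattern-action commands side by side, the sketch below
builds a small stand-in for <TT>file2.data</TT>. The file contents are invented for
the illustration (the chapter does not show the real file), the variable names are
hypothetical, and the first line is a fallback assumption for systems without
<TT>gawk</TT>.</P>

```shell
# Assumption: fall back to the system awk if gawk is missing (only POSIX features are used).
command -v gawk >/dev/null || gawk() { awk "$@"; }

# Hypothetical stand-in for file2.data: three whitespace-separated fields per line.
data_file=$(mktemp)
printf '%s\n' 'UNIX alpha 10' 'DOS beta 20' 'UNIX gamma 30' > "$data_file"

# Action only: print the third field of every line.
third=$(gawk '{print $3}' "$data_file")
echo "$third"

# Pattern and action: print the second field only for lines containing UNIX.
matched=$(gawk '/UNIX/{print $2}' "$data_file")
echo "$matched"

rm -f "$data_file"
```

<P>With this sample data, the first command should print all three numbers, and the
second should print only <TT>alpha</TT> and <TT>gamma</TT>.</P>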
<DL>
<DT></DT>
</DL>
<DL>
<DD>
<HR>