<HTML>
<HEAD>
<TITLE>Linux Unleashed, Third Edition:gawk</TITLE>
<SCRIPT>
<!--
function displayWindow(url, width, height) {
var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>
<!--ISBN=0672313723//-->
<!--TITLE=Linux Unleashed, Third Edition//-->
<!--AUTHOR=Tim Parker//-->
<!--PUBLISHER=Macmillan Computer Publishing//-->
<!--IMPRINT=Sams//-->
<!--CHAPTER=25//-->
<!--PAGES=438-442//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="../ch24/435-436.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="442-444.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<H2 ALIGN="CENTER"><FONT COLOR="#000077"><I>Part V<BR>Linux for Programmers
</I></FONT></H2>
<DL>
<DT><B>In This Part</B>
<DT>• gawk
<DT>• Programming in C
<DT>• Programming in C++
<DT>• Perl
<DT>• Introduction to Tcl and Tk
<DT>• Other Compilers
<DT>• Smalltalk/X
</DL>
<H2><A NAME="Heading1"></A><FONT COLOR="#000077">Chapter 25<BR>gawk
</FONT></H2>
<P><I>by Tim Parker</I></P>
<DL>
<DT><B>In This Chapter</B>
<DT>• What is the <TT>gawk</TT> language?
<DT>• Files, records, and fields
<DT>• Pattern-action pairs
<DT>• Calling <TT>gawk</TT> programs
<DT>• Control structures
</DL>
<P>The <TT>awk</TT> programming language was created by the three people who gave their last-name initials to the language: Alfred Aho, Peter Weinberger, and Brian Kernighan. The <TT>gawk</TT> program included with Linux is the GNU implementation of that programming language.</P>
<P>The <TT>gawk</TT> language is more than just a programming language; it is an almost indispensable tool for many system administrators and UNIX programmers. The language itself is easy to learn, easy to master, and amazingly flexible. After you get the hang of using <TT>gawk</TT>, you’ll be surprised how often you can use it for routine tasks on your system.</P>
<P>To help you understand <TT>gawk</TT>, we will introduce the elements of the programming language in a simple order and illustrate them with examples. You are encouraged, of course, to experiment as the chapter progresses. It’s not possible to cover all the different aspects and features of <TT>gawk</TT> in this chapter, but we will look at the basics of the language and show you enough to get your curiosity working.</P>
<H3><A NAME="Heading2"></A><FONT COLOR="#000077">What Is the gawk Language?</FONT></H3>
<P><TT>gawk</TT> is designed to be an easy-to-use programming language that lets you work with information either stored in files or piped to it. The main strengths of <TT>gawk</TT> are its capabilities to do the following:</P>
<DL>
<DD><B>•</B> Display some or all of the contents of a file, selecting rows, columns, or fields as necessary.
<DD><B>•</B> Analyze text for frequency of words, occurrences, and so on.
<DD><B>•</B> Prepare formatted output reports based on information in a file.
<DD><B>•</B> Filter text in a very powerful manner.
<DD><B>•</B> Perform calculations with numeric information from a file.
</DL>
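<P>As a minimal sketch of two of these capabilities (displaying selected columns and performing a calculation), consider the following shell session. The sample data and the function names <TT>show_columns</TT> and <TT>total_cost</TT> are hypothetical and exist only for illustration. The script prefers <TT>gawk</TT> and falls back to the system <TT>awk</TT> if <TT>gawk</TT> is not installed.</P>

```shell
# Prefer gawk; otherwise fall back to whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: item name, quantity, unit price.
sample='pens 12 0.50
paper 3 4.25
ink 2 9.00'

# Display selected columns: print only the item name and the quantity.
show_columns() {
    printf '%s\n' "$sample" | "$AWK" '{ print $1, $2 }'
}

# Perform a calculation: add up quantity times unit price over all lines.
total_cost() {
    printf '%s\n' "$sample" | "$AWK" '{ total += $2 * $3 } END { printf "%.2f\n", total }'
}

show_columns
total_cost
```

<P>The program text between the single quotes is the <TT>gawk</TT> program itself; the surrounding shell code only supplies the input.</P>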
<P><TT>gawk</TT> isn’t difficult to learn. In many ways, <TT>gawk</TT> is the ideal first programming language because of its simple rules, basic formatting, and standard usage. Experienced programmers will find <TT>gawk</TT> refreshingly easy to use.</P>
<H3><A NAME="Heading3"></A><FONT COLOR="#000077">Files, Records, and Fields</FONT></H3>
<P>Usually, <TT>gawk</TT> works with data stored in files. Often this is numeric data, but <TT>gawk</TT> can work with character information, too. If data is not stored in a file, it is supplied to <TT>gawk</TT> through a pipe or other form of redirection. Only ASCII files (text files) can be properly handled with <TT>gawk</TT>. Although it does have the capability to work with binary files, the results are often unpredictable. Because most information on a Linux system is stored in ASCII, this isn’t a problem.</P>
<P>As a simple example of a file that <TT>gawk</TT> works with, consider a telephone directory. It is composed of many entries, all with the same format: last name, first name, address, telephone number. The entire telephone directory is a database of sorts, although without a sophisticated search routine. Indeed, the telephone directory relies on a pure alphabetical order to enable users to search for the data they need.</P>
<P>Each line in the telephone directory is a complete set of data on its own and is called a <I>record</I>. For example, the entry in the telephone directory for “Smith, John,” which includes his address and telephone number, is a record.</P>
<P>Each piece of information in the record—the last name, the first name, the address, and the telephone number—is called a <I>field</I>. For the <TT>gawk</TT> language, the field is a single piece of information. A record, then, is a number of fields that pertain to a single item. A set of records makes up a <I>file</I>.</P>
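<P>The relationship between records and fields can be sketched with the built-in variables <TT>NR</TT> (the number of the current record) and <TT>NF</TT> (the number of fields in the current record). The three-line sample and the function name <TT>describe_records</TT> are assumptions made for this illustration, not data from a real directory.</P>

```shell
# Prefer gawk; otherwise fall back to whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical file contents: one record per line, three fields per record.
records='Smith John 555-1283
Jones Mary 555-2736
Chow Yvonne 555-1728'

# For each record, print its record number (NR), its field count (NF),
# and the contents of its third field ($3).
describe_records() {
    printf '%s\n' "$records" | "$AWK" '{ print NR, NF, $3 }'
}

describe_records
```

<P>Each output line corresponds to one input record, so the first column counts upward from 1 while the second column stays at 3.</P>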
<P>In most cases, fields are separated (delimited) by a character that is used only to separate fields, such as a space, a tab, a colon, or some other special symbol. This character is called a <I>field separator</I>. A good example is the file <TT>/etc/passwd</TT>, which looks like this:</P>
<!-- CODE SNIP //-->
<PRE>
tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash
</PRE>
<!-- END CODE SNIP //-->
<P>If you look carefully at the file, you can see that it uses a colon as the field separator. Each line in the <TT>/etc/passwd</TT> file has seven fields: the username, the password, the user ID, the group ID, a comment field, the home directory, and the startup shell. Colons exist only to separate fields, so a program looking for the sixth field in any line need only count five colons across (because the first field doesn’t have a colon before it).</P>
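<P>A short sketch of how this looks in practice: the <TT>-F</TT> option sets the field separator, after which <TT>$6</TT> refers to the sixth field. The three sample lines are the ones shown above; the variable <TT>passwd_sample</TT> and the function names are hypothetical, and the data is piped in rather than read from the real <TT>/etc/passwd</TT>.</P>

```shell
# Prefer gawk; otherwise fall back to whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# The three sample lines from the text.
passwd_sample='tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash'

# -F: makes the colon the field separator, so $1 is the username
# and $6 is the home directory.
home_dirs() {
    printf '%s\n' "$passwd_sample" | "$AWK" -F: '{ print $1, $6 }'
}

# NF reports how many fields each record was split into.
field_counts() {
    printf '%s\n' "$passwd_sample" | "$AWK" -F: '{ print NF }'
}

home_dirs
field_counts
```

<P>Because the comment field (“Tim Parker”) contains a space but no colon, it stays in one piece as the fifth field.</P>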
<P>That’s where we find a problem with the <TT>gawk</TT> definition of fields as they pertain to the telephone directory example. Consider the following lines from a telephone directory:</P>
<!-- CODE SNIP //-->
<PRE>
Smith, John 13 Wilson St. 555-1283
Smith, John 2736 Artside Dr, Apt 123 555-2736
Smith, John 125 Westmount Cr 555-1728
</PRE>
<!-- END CODE SNIP //-->
<P>We “know” there are four fields here: the last name, the first name, the address, and the telephone number. But <TT>gawk</TT> doesn’t see it that way. By default, <TT>gawk</TT> treats any run of whitespace (spaces and tabs) as the field separator, so on the first line it sees “Smith,” (comma included) as the first field, “John” as the second, “13” as the third, “Wilson” as the fourth, and so on. As far as <TT>gawk</TT> is concerned, the first line has six fields and the second line has eight. The spaces inside an address are not treated as part of the data; they split the address into several fields. For <TT>gawk</TT> to see four fields in each record, the file would have to use a field separator that never appears inside a field, such as a tab or a colon.</P>
<BLOCKQUOTE>
<P><FONT SIZE="-1"><HR><B>Tip: </B><BR>When working with a programming language, you must consider data the way the language will see it. Remember that programming languages take things literally.<HR></FONT>
</BLOCKQUOTE>
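<P>To see the data the way <TT>gawk</TT> sees it, you can ask for the field count directly. The sketch below counts the fields in the first directory entry using the default separator, and then counts them again in a hypothetical tab-separated version of the same entry. The tab-separated layout and the function names are assumptions for illustration only.</P>

```shell
# Prefer gawk; otherwise fall back to whatever awk is on the PATH.
AWK=$(command -v gawk || command -v awk)

# Default separator: every run of spaces or tabs starts a new field.
count_default() {
    printf 'Smith, John 13 Wilson St. 555-1283\n' | "$AWK" '{ print NF }'
}

# Hypothetical tab-separated version of the same entry. With -F'\t'
# only tabs separate fields, so the address stays in one piece as $3.
count_tabs() {
    printf 'Smith\tJohn\t13 Wilson St.\t555-1283\n' | "$AWK" -F'\t' '{ print NF ": " $3 }'
}

count_default
count_tabs
```

<P>The first command reports six fields, and the second reports four with the whole address as the third field.</P>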
<P><BR></P>
</td>
</tr>
</table>
<!-- begin footer information -->
</body></html>