<HTML>

<HEAD>

<TITLE>Linux Unleashed, Third Edition:gawk</TITLE>

<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!--ISBN=0672313723//-->

<!--TITLE=Linux Unleashed, Third Edition//-->

<!--AUTHOR=Tim Parker//-->

<!--PUBLISHER=Macmillan Computer Publishing//-->

<!--IMPRINT=Sams//-->

<!--CHAPTER=25//-->

<!--PAGES=438-442//-->

<!--UNASSIGNED1//-->

<!--UNASSIGNED2//-->



<CENTER>

<TABLE BORDER>

<TR>

<TD><A HREF="../ch24/435-436.html">Previous</A></TD>

<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>

<TD><A HREF="442-444.html">Next</A></TD>

</TR>

</TABLE>

</CENTER>

<P><BR></P>

<H2 ALIGN="CENTER"><FONT COLOR="#000077"><I>Part V<BR>Linux for Programmers

</I></FONT></H2>

<DL>

<DT><B>In This Part</B>

<DT>&#149;&nbsp;&nbsp; gawk

<DT>&#149;&nbsp;&nbsp; Programming in C

<DT>&#149;&nbsp;&nbsp; Programming in C&#43;&#43;

<DT>&#149;&nbsp;&nbsp; Perl

<DT>&#149;&nbsp;&nbsp; Introduction to Tcl and Tk

<DT>&#149;&nbsp;&nbsp; Other Compilers

<DT>&#149;&nbsp;&nbsp; Smalltalk/X

</DL>

<H2><A NAME="Heading1"></A><FONT COLOR="#000077">Chapter 25<BR>gawk

</FONT></H2>

<P><I>by Tim Parker</I></P>

<DL>

<DT><B>In This Chapter</B>

<DT>&#149;&nbsp;&nbsp;What is the <TT>gawk</TT> language?

<DT>&#149;&nbsp;&nbsp;Files, records, and fields

<DT>&#149;&nbsp;&nbsp;Pattern-action pairs

<DT>&#149;&nbsp;&nbsp;Calling <TT>gawk</TT> programs

<DT>&#149;&nbsp;&nbsp;Control structures

</DL>

<P>The <TT>awk</TT> programming language was created by the three people who gave their last-name initials to the language: Alfred Aho, Peter Weinberger, and Brian Kernighan. The <TT>gawk</TT> program included with Linux is the GNU implementation of that programming language.</P>

<P>The <TT>gawk</TT> language is more than just a programming language; it is an almost indispensable tool for many system administrators and UNIX programmers. The language itself is easy to learn, easy to master, and amazingly flexible. After you get the hang of using <TT>gawk</TT>, you&#146;ll be surprised how often you can use it for routine tasks on your system.</P>

<P>To help you understand <TT>gawk</TT>, we introduce the elements of the language in a simple order and illustrate each with examples. You are encouraged, of course, to experiment as the chapter progresses. It&#146;s not possible to cover every aspect and feature of <TT>gawk</TT> in this chapter, but we will look at the basics of the language and show you enough, we hope, to spark your curiosity.</P>

<H3><A NAME="Heading2"></A><FONT COLOR="#000077">What Is the gawk Language?</FONT></H3>

<P><TT>gawk</TT> is designed to be an easy-to-use programming language that lets you work with information either stored in files or piped to it. The main strengths of <TT>gawk</TT> are its capabilities to do the following:</P>

<DL>

<DD><B>&#149;</B>&nbsp;&nbsp;Display some or all of the contents of a file, selecting rows, columns, or fields as necessary.

<DD><B>&#149;</B>&nbsp;&nbsp;Analyze text for frequency of words, occurrences, and so on.

<DD><B>&#149;</B>&nbsp;&nbsp;Prepare formatted output reports based on information in a file.

<DD><B>&#149;</B>&nbsp;&nbsp;Filter text in a very powerful manner.

<DD><B>&#149;</B>&nbsp;&nbsp;Perform calculations with numeric information from a file.

</DL>
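
<P>As a rough illustration of the capabilities listed above, the following sketch selects columns, filters records, and totals a numeric column. The inventory data and the variable names are invented for this example, and the pattern-action syntax it uses is explained later in the chapter. The commands call <TT>awk</TT> and rely only on features that <TT>gawk</TT> shares, so either name should work on a Linux system.</P>

<!-- CODE SNIP //-->

<PRE>

```shell
# Hypothetical sample data (not from the chapter): item, quantity, unit price.
inventory='widgets 3 4
bolts 10 2
gears 2 5'

# Display selected columns: the item name and quantity of every record.
printf '%s\n' "$inventory" | awk '{ print $1, $2 }'

# Filter text: print the names of records that contain the string "ts".
printf '%s\n' "$inventory" | awk '/ts/ { print $1 }'

# Calculate: multiply quantity by price on each line and report the total.
total=$(printf '%s\n' "$inventory" | awk '{ sum += $2 * $3 } END { print sum }')
echo "Inventory value: $total"
```

</PRE>

<!-- END CODE SNIP //-->

<P>Each command reads the same three lines from a pipe, which is the same route data would take if it came from another program instead of a file.</P>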

<P><TT>gawk</TT> isn&#146;t difficult to learn. In many ways, <TT>gawk</TT> is the ideal first programming language because of its simple rules, basic formatting, and standard usage. Experienced programmers will find <TT>gawk</TT> refreshingly easy to use.</P>

<H3><A NAME="Heading3"></A><FONT COLOR="#000077">Files, Records, and Fields</FONT></H3>

<P>Usually, <TT>gawk</TT> works with data stored in files. Often this is numeric data, but <TT>gawk</TT> can work with character information, too. If the data is not stored in a file, it can be supplied to <TT>gawk</TT> through a pipe or another form of redirection. <TT>gawk</TT> handles only text (ASCII) files reliably. Although it can read binary files, the results are often unpredictable. Because most information on a Linux system is stored as ASCII text, this is rarely a problem.</P>

<P>As a simple example of a file that <TT>gawk</TT> works with, consider a telephone directory. It is composed of many entries, all with the same format: last name, first name, address, telephone number. The entire telephone directory is a database of sorts, although without a sophisticated search routine. Indeed, the telephone directory relies on a pure alphabetical order to enable users to search for the data they need.</P>

<P>Each line in the telephone directory is a complete set of data on its own and is called a <I>record</I>. For example, the entry in the telephone directory for &#147;Smith, John,&#148; which includes his address and telephone number, is a record.</P>

<P>Each piece of information in the record&#151;the last name, the first name, the address, and the telephone number&#151;is called a <I>field</I>. For the <TT>gawk</TT> language, the field is a single piece of information. A record, then, is a number of fields that pertain to a single item. A set of records makes up a <I>file</I>.</P>
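
<P>To make these terms concrete, here is a small sketch that uses a made-up three-line directory in which every field is a single word. <TT>NR</TT> and <TT>NF</TT> are built-in <TT>awk</TT> variables (not introduced in the text above) that hold the current record number and the number of fields in that record. The data and the shell variable names are assumptions for illustration only.</P>

<!-- CODE SNIP //-->

<PRE>

```shell
# Hypothetical file contents: three records, each with three one-word fields.
directory='Smith John 555-1283
Jones Mary 555-2736
Chow Yvonne 555-1728'

# For each record (line), report its number, its field count, and two fields.
report=$(printf '%s\n' "$directory" |
  awk '{ print "record " NR " has " NF " fields: last name " $1 ", phone " $3 }')
printf '%s\n' "$report"
```

</PRE>

<!-- END CODE SNIP //-->

<P>The three lines together play the role of the file, each line is one record, and <TT>$1</TT> and <TT>$3</TT> pick out individual fields within it.</P>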

<P>In most cases, fields are separated (delineated) by a character that is used only to separate fields, such as a space, a tab, a colon, or some other special symbol. This character is called a <I>field separator</I>. A good example is the file <TT>/etc/passwd</TT>, which looks like this:</P>

<!-- CODE SNIP //-->

<PRE>

tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash

etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh

ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash

</PRE>

<!-- END CODE SNIP //-->

<P>If you look carefully at the file, you can see that it uses a colon as the field separator. Each line in the <TT>/etc/passwd</TT> file has seven fields: the username, the password, the user ID, the group ID, a comment field, the home directory, and the startup shell. Each field is separated from the next by a colon, and colons appear only as separators. A program looking for the sixth field in any line need only count five colons across (because the first field doesn&#146;t have a colon before it).</P>
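
<P>A minimal sketch of that counting, assuming the three sample lines above are fed to <TT>awk</TT> on standard input rather than read from the real <TT>/etc/passwd</TT>. The <TT>-F:</TT> option sets the field separator to a colon; the shell variable names are ours.</P>

<!-- CODE SNIP //-->

<PRE>

```shell
# The three sample lines from the text, held in a shell variable.
passwd_sample='tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash'

# -F: makes the colon the field separator. Print the field count, the
# username (field 1), and the home directory (field 6) of each record.
homes=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print NF, $1, $6 }')
printf '%s\n' "$homes"
```

</PRE>

<!-- END CODE SNIP //-->

<P>Every line reports seven fields. The comment field stays in one piece because, with the colon as the separator, the space inside it is just another character.</P>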

<P>That&#146;s where we find a problem with the <TT>gawk</TT> definition of fields as they pertain to the telephone directory example. Consider the following lines from a telephone directory:</P>

<!-- CODE SNIP //-->

<PRE>

Smith, John      13 Wilson St.                 555-1283

Smith, John      2736 Artside Dr, Apt 123      555-2736

Smith, John      125 Westmount Cr              555-1728

</PRE>

<!-- END CODE SNIP //-->

<P>We &#147;know&#148; there are four fields here: the last name, the first name, the address, and the telephone number. But <TT>gawk</TT> doesn&#146;t see it that way. The telephone book uses whitespace as its field separator, so on the first line <TT>gawk</TT> sees &#147;Smith,&#148; (comma included) as the first field, &#147;John&#148; as the second, &#147;13&#148; as the third, &#147;Wilson&#148; as the fourth, and so on. As far as <TT>gawk</TT> is concerned, the first line has six fields and the second line has eight. By default, <TT>gawk</TT> treats any run of spaces or tabs as a single field separator, so the extra spaces that align the columns do not create empty fields, but every space inside a name or an address does start a new field. Unless you change the field separator to some other character, whitespace is the only thing <TT>gawk</TT> uses to divide a record into fields.</P>
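
<P>The field counts can be checked with a short sketch. It pipes the three directory lines to <TT>awk</TT> with the default field separator, which splits on runs of spaces and tabs, and prints the built-in field count <TT>NF</TT> followed by the first and third fields. The shell variable names are ours.</P>

<!-- CODE SNIP //-->

<PRE>

```shell
# The three directory lines from the text, with their column-aligning spaces.
phonebook='Smith, John      13 Wilson St.                 555-1283
Smith, John      2736 Artside Dr, Apt 123      555-2736
Smith, John      125 Westmount Cr              555-1728'

# No -F option, so the default separator applies.
# Print the number of fields, then field 1 and field 3 of each record.
counts=$(printf '%s\n' "$phonebook" | awk '{ print NF, $1, $3 }')
printf '%s\n' "$counts"
```

</PRE>

<!-- END CODE SNIP //-->

<P>The three records report six, eight, and six fields. The first field comes out as <TT>Smith,</TT> with the comma attached, and the third field is only the street number, which shows how the address is scattered across several fields.</P>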

<BLOCKQUOTE>

<P><FONT SIZE="-1"><HR><B>Tip:&nbsp;&nbsp;</B><BR>When working with a programming language, you must consider data the way the language will see it. Remember that programming languages take things literally.<HR></FONT>

</BLOCKQUOTE>

<P><BR></P>







<!-- begin footer information -->





</body></html>
