</HEAD>

 -->




 



<UL>



	<LI><A HREF="#Heading1">- 26 -</A>



	<UL>



		<LI><A HREF="#Heading2">gawk</A>



		<UL>



			<LI><A HREF="#Heading3">What Is the awk Language?</A>



			<LI><A HREF="#Heading4">Files, Records, and Fields</A>



			<LI><A HREF="#Heading5">NOTE</A>



			<LI><A HREF="#Heading6">Pattern-Action Pairs</A>



			<LI><A HREF="#Heading7">NOTE</A>



			<UL>



				<LI><A HREF="#Heading8">Simple Patterns</A>



			</UL>



			<LI><A HREF="#Heading9">NOTE</A>



			<UL>



				<LI><A HREF="#Heading10">Comparisons and Arithmetic</A>



				<LI><A HREF="#Heading11">Strings and Numbers</A>



				<LI><A HREF="#Heading12">Formatting Output</A>



				<LI><A HREF="#Heading13">Changing Field Separators</A>



				<LI><A HREF="#Heading14">Metacharacters</A>



			</UL>



			<LI><A HREF="#Heading15">Calling gawk Programs</A>



			<UL>



				<LI><A HREF="#Heading16">BEGIN and END</A>



				<LI><A HREF="#Heading17">Variables</A>



			</UL>



			<LI><A HREF="#Heading18">NOTE</A>



			<LI><A HREF="#Heading19">NOTE</A>



			<UL>



				<LI><A HREF="#Heading20">Built-In Variables</A>



			</UL>



			<LI><A HREF="#Heading21">Control Structures</A>



			<UL>



				<LI><A HREF="#Heading22">The if Statement</A>



				<LI><A HREF="#Heading23">The while Loop</A>



				<LI><A HREF="#Heading24">The for Loop</A>



				<LI><A HREF="#Heading25">next and exit</A>



				<LI><A HREF="#Heading26">Arrays</A>



			</UL>



			<LI><A HREF="#Heading27">Summary</A>



		</UL>



	</UL>



</UL>







<P>



<HR SIZE="4">







<H2 ALIGN="CENTER"><A NAME="Heading1"></A><FONT COLOR="#000077">- 26 -</FONT></H2>



<H2 ALIGN="CENTER"><A NAME="Heading2"></A><FONT COLOR="#000077">gawk</FONT></H2>



<P><I>by Tim Parker</I></P>







<P>IN THIS CHAPTER</P>







<UL>



	<LI>What Is the awk Language? 



	<P>



	<LI>Files, Records, and Fields 



	<P>



	<LI>Pattern-Action Pairs 



	<P>



	<LI>Calling gawk Programs 



	<P>



	<LI>Control Structures  



</UL>







<P><BR>



The <TT>awk</TT> programming language was created by the three people who gave their



last-name initials to the language: Alfred Aho, Peter Weinberger, and Brian Kernighan.



The <TT>gawk</TT> program included with Linux is the GNU implementation of that programming



language.</P>







<P>The <TT>awk</TT> language is more than just a programming language; it is an almost



indispensable tool for many system administrators and UNIX programmers. The language



itself is easy to learn, easy to master, and amazingly flexible. Once you get the



hang of using <TT>awk</TT>, you'll be surprised how often you can use it for routine



tasks on your system.</P>



<P>To help you understand <TT>gawk</TT>, I will introduce the elements of the
programming language one at a time, with examples along the way. You are
encouraged, of course, to experiment as the chapter progresses.</P>



<P>I can't cover every aspect and feature of <TT>gawk</TT> in this
chapter, but I will cover the basics of the language and, I hope, show you enough
to get your curiosity working.</P>



<H3 ALIGN="CENTER"><A NAME="Heading3"></A><FONT COLOR="#000077">What Is the awk Language?</FONT></H3>



<P><TT>awk</TT> is designed to be an easy-to-use programming language that lets you



work with information either stored in files or piped to it. The main strengths of



<TT>awk</TT> are its capabilities to do the following:







<UL>



	<LI>Display some or all the contents of a file, selecting rows, columns, or fields



	as necessary.



	<P>



	<LI>Analyze text for frequency of words, occurrences, and so on.



	<P>



	<LI>Prepare formatted output reports based on information in a file.



	<P>



	<LI>Filter text in a very powerful manner.



	<P>



	<LI>Perform calculations with numeric information from a file.



</UL>







<P><TT>awk</TT> isn't difficult to learn. In many ways, <TT>awk</TT> is the ideal



first programming language because of its simple rules, basic formatting, and standard



usage. Experienced programmers will find <TT>awk</TT> refreshingly easy to use.



<H3 ALIGN="CENTER"><A NAME="Heading4"></A><FONT COLOR="#000077">Files, Records, and Fields</FONT></H3>



<P>Usually, <TT>gawk</TT> works with data stored in files. Often this is numeric



data, but <TT>gawk</TT> can work with character information, too. If data is not



stored in a file, it is supplied to <TT>gawk</TT> through a pipe or other form of



redirection. Only ASCII files (text files) can be properly handled with <TT>gawk</TT>.



Although it does have the ability to work with binary files, the results are often



unpredictable. Since most information on a Linux system is stored in ASCII, this



isn't a problem.</P>



<P>As a simple example of a file that <TT>gawk</TT> works with, consider a telephone



directory. It is composed of many entries, all with the same format: last name, first



name, address, telephone number. The entire telephone directory is a database of



sorts, although without a sophisticated search routine. Indeed, the telephone directory



relies on a pure alphabetical order to enable users to search for the data they need.</P>



<P>Each line in the telephone directory is a complete set of data on its own and



is called a record. For example, the entry in the telephone directory for &quot;Smith,



John,&quot; which includes his address and telephone number, is a record.</P>



<P>Each piece of information in the record--the last name, the first name, the address,



and the telephone number--is called a field. For the <TT>gawk</TT> language, the



field is a single piece of information. A record, then, is a number of fields that



pertain to a single item. A set of records makes up a file.</P>



<P>In most cases, fields are separated by a character that is used only to separate



fields, such as a space, a tab, a colon, or some other special symbol. This character



is called a field separator. A good example is the file <TT>/etc/passwd</TT>, which



looks like this:<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash



etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh



ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash



</FONT></PRE>



<P>If you look carefully at the file, you will see that it uses a colon as the field



separator. Each line in the <TT>/etc/passwd</TT> file has seven fields: the user



name, the password, the user ID, the group ID, a comment field, the home directory,



and the startup shell. Each field is separated by a colon. Colons exist only to separate



fields. A program looking for the sixth field in any line needs only count five colons



across (because the first field doesn't have a colon before it).</P>
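<P>As a rough illustration of picking out the sixth field, the following sketch copies the three sample lines above into a hypothetical file named <TT>passwd.sample</TT> and asks <TT>gawk</TT> for field six. It jumps ahead slightly by using the <TT>-F</TT> option to name the colon as the field separator. The first line is only a fallback to the system <TT>awk</TT>, on the assumption that it behaves the same way for this simple case.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample file holding the three /etc/passwd lines shown above.
cat > passwd.sample <<'EOF'
tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash
EOF

# -F: makes the colon the field separator; $6 is then the home directory.
gawk -F: '{ print $6 }' passwd.sample
# prints:
#   /home/tparker
#   /home/etreijs
#   /home/ychow
```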



<P>That's where we find a problem with the <TT>gawk</TT> definition of fields as



they pertain to the telephone directory example. Consider the following lines from



a telephone directory:<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">Smith, John    13 Wilson St.              555-1283



Smith, John    2736 Artside Dr, Apt 123   555-2736



Smith, John    125 Westmount Cr           555-1726



</FONT></PRE>



<P>We &quot;know&quot; there are four fields here: the last name, the first name,
the address, and the telephone number. But <TT>gawk</TT> doesn't see it that way.
The telephone book uses the space character as a field separator, so on the first
line <TT>gawk</TT> sees &quot;Smith,&quot; (comma included) as the first field, &quot;John&quot; as the second,
&quot;13&quot; as the third, &quot;Wilson&quot; as the fourth, and so on. As far
as <TT>gawk</TT> is concerned, the first line has six fields when a space character is the field
separator. The second line has eight fields.</P>
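<P>You can check these counts yourself. The sketch below stores the three directory lines in a hypothetical file named <TT>phone.sample</TT> and prints <TT>NF</TT>, the built-in <TT>awk</TT> variable that holds the number of fields in the current record. The fallback on the first line assumes the system <TT>awk</TT> counts fields the same way.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample file holding the telephone directory lines shown above.
cat > phone.sample <<'EOF'
Smith, John    13 Wilson St.              555-1283
Smith, John    2736 Artside Dr, Apt 123   555-2736
Smith, John    125 Westmount Cr           555-1726
EOF

# With the default separator, every run of spaces ends a field.
gawk '{ print NF }' phone.sample
# prints:
#   6
#   8
#   6
```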







<DL>



	<DT></DT>



</DL>











<DL>



	<DD>



<HR>



<A NAME="Heading5"></A><FONT COLOR="#000077"><B>NOTE:</B> </FONT>When working with



	a programming language, you must consider data the way the language will see it.



	Remember that programming languages take things literally.



<HR>







</DL>







<P>To make sense of the telephone directory the way we want to handle it, we have



to find another way of structuring the data so that there is a field separator between



the sections. For example, the following uses the slash character as the field separator:<FONT



COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">Smith/John/13 Wilson St./555-1283



Smith/John/2736 Artside Dr, Apt 123/555-2736



Smith/John/125 Westmount Cr/555-1726



</FONT></PRE>



<P>By default, <TT>gawk</TT> uses blank characters (spaces or tabs) as field separators



unless instructed to use another character. If <TT>gawk</TT> is using spaces, it



doesn't matter how many are in a row; they are treated as a single block for purposes



of finding fields. Naturally, there is a way to override this behavior, too.
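<P>A small sketch of both behaviors follows. It assumes a hypothetical file named <TT>phone-slash.sample</TT> containing the slash-separated lines above, and it jumps ahead by using the <TT>-F</TT> option to name the slash as the separator. The second command shows that a run of spaces or tabs counts as one separator by default.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample file using the slash as the field separator.
cat > phone-slash.sample <<'EOF'
Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1726
EOF

# With -F/ the whole address is field 3, spaces and commas included.
gawk -F/ '{ print $3 }' phone-slash.sample
# prints:
#   13 Wilson St.
#   2736 Artside Dr, Apt 123
#   125 Westmount Cr

# Default separator: four spaces and two tabs each act as a single break.
printf 'one    two\t\tthree\n' | gawk '{ print NF }'
# prints: 3
```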



<H3 ALIGN="CENTER"><A NAME="Heading6"></A><FONT COLOR="#000077">Pattern-Action Pairs</FONT></H3>



<P>The <TT>gawk</TT> language has a particular format for almost all instructions.



Each command is composed of two parts: a pattern and a corresponding action. Whenever



the pattern is matched, <TT>gawk</TT> executes the action that matches that pattern.</P>



<P>Pattern-action pairs can be thought of in more common terms to show how they work.



Consider instructing someone how to get to the post office. You might say, &quot;Go



to the end of the street and turn right. At the stop sign, turn left. At the end



of the street, go right.&quot; You have created three pattern-action pairs with these



instructions:<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">end of street: turn right



stop sign: turn left



end of street: turn right



</FONT></PRE>



<P>When a pattern is met, the corresponding action is taken. You don't turn
right until you reach the end of the street, so the pattern must be matched
precisely for the action to be performed. This is a bit simplistic, but it gives you the basic idea.</P>



<P>With <TT>gawk</TT>, the patterns to be matched are enclosed in a pair of slashes,



and the actions are in a pair of curly braces:<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">/pattern1/{action1}



/pattern2/{action2}



/pattern3/{action3}



</FONT></PRE>



<P>This format makes it quite easy to tell where the pattern starts and ends, and



when the action starts and ends. All <TT>gawk</TT> programs are sets of these pattern-action



pairs, one after the other. Remember these pattern-action pairs are working on text



files, so a typical set of patterns might be matching a set of strings, and the actions



might be to print out parts of the line that matched.</P>
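<P>To make this concrete, here is a minimal sketch of a program with two pattern-action pairs. The file <TT>pets.sample</TT> and its contents are made up for illustration. Each line of input is tested against every pattern in turn, so a line that matches both patterns triggers both actions.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample text.
cat > pets.sample <<'EOF'
cat sleeps all day
dog barks at night
cat and dog share a bowl
EOF

# Two pattern-action pairs; $0 stands for the whole current line.
gawk '/cat/ { print "cat line: " $0 }
      /dog/ { print "dog line: " $0 }' pets.sample
# prints:
#   cat line: cat sleeps all day
#   dog line: dog barks at night
#   cat line: cat and dog share a bowl
#   dog line: cat and dog share a bowl
```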



<P>Suppose there isn't a pattern? In that case, every line matches and
the action is executed every time. If there is no action, <TT>gawk</TT> prints the
entire line that matched, unchanged.</P>



<P>Here are some simple examples. The <TT>gawk</TT> command<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">gawk '/tparker/' /etc/passwd



</FONT></PRE>



<P>will look for each line in the <TT>/etc/passwd</TT> file that contains the pattern



<TT>tparker</TT> and display it (there is no action, only a pattern). The output



from the command will be the one line in the <TT>/etc/passwd</TT> file that contains



the string <TT>tparker</TT>. If there is more than one line in the file with that



pattern, they all will be displayed. In this case, <TT>gawk</TT> is acting exactly



like the <TT>grep</TT> utility!</P>
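<P>The comparison with <TT>grep</TT> can be tried without touching the real password file. This sketch assumes a hypothetical copy of the sample lines in <TT>passwd.sample</TT> and runs both tools on it. For a plain string like this one, the two commands should print the same line.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample file holding the /etc/passwd lines used in this chapter.
cat > passwd.sample <<'EOF'
tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash
EOF

# Pattern with no action: gawk prints each matching line as is.
gawk '/tparker/' passwd.sample

# grep given the same string selects the same line.
grep 'tparker' passwd.sample
```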



<P>This example shows you two important things about <TT>gawk</TT>: It can be invoked
from the command line by giving it the pattern-action pair to work with and a filename,
and the pattern-action pair should be enclosed in single quotes. The quotes keep the shell from
interpreting characters such as <TT>$</TT> and spaces, and they mark where the
<TT>gawk</TT> program ends and the filename begins.</P>



<P>The <TT>gawk</TT> language is literal in its matching. The string <TT>cat</TT>



will match any lines with <TT>cat</TT> in them, whether the word &quot;cat&quot;



by itself or part of another word such as &quot;concatenate.&quot; To be exact, put



spaces on either side of the word. Also, case is important. We'll see how to expand



the matching in the section &quot;Metacharacters&quot; a little later in the chapter.</P>
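<P>The following sketch shows literal, case-sensitive matching on a made-up file named <TT>words.sample</TT>. The first command matches <TT>cat</TT> anywhere in a line; the second adds a space on either side of the word, as suggested above.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical sample text.
cat > words.sample <<'EOF'
the cat sat
concatenate two files
Cat with a capital C
a dog
EOF

# Matches "cat" alone and inside "concatenate", but not "Cat".
gawk '/cat/' words.sample
# prints:
#   the cat sat
#   concatenate two files

# Spaces on either side restrict the match to the separate word.
gawk '/ cat /' words.sample
# prints:
#   the cat sat
```

<P>Note that the space-padded pattern would miss a line that begins or ends with the word, because there is no space on that side.</P>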



<P>Jumping ahead slightly, we can introduce a <TT>gawk</TT> action, <TT>print</TT>. The command



<PRE><FONT COLOR="#0066FF">gawk '{print $3}' file2.data



</FONT></PRE>



<P>has no pattern, only an action, so it performs that action on every line in the file <TT>file2.data</TT>.
The action is <TT>print $3</TT>, which tells <TT>gawk</TT> to print the third field
of every line. The default field separator, whitespace, is used to tell where fields
begin and end. If we had tried the same command on the <TT>/etc/passwd</TT> file,
nothing useful would have been displayed, because the field separator used in that file is
the colon: a line with fewer than three space-separated fields produces only an empty line.</P>
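<P>Since the contents of <TT>file2.data</TT> are not shown, the sketch below uses two stand-in files: a hypothetical space-separated <TT>fields.sample</TT> and a copy of the earlier password lines in <TT>passwd.sample</TT>. It runs the same <TT>print $3</TT> action on both to show the difference the field separator makes.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical space-separated data.
cat > fields.sample <<'EOF'
alpha beta gamma delta
one two three four
EOF

# Hypothetical copy of the /etc/passwd sample lines.
cat > passwd.sample <<'EOF'
tparker:t36s62hsh:501:101:Tim Parker:/home/tparker:/bin/bash
etreijs:2ys639dj3h:502:101:Ed Treijs:/home/etreijs:/bin/tcsh
ychow:1h27sj:503:101:Yvonne Chow:/home/ychow:/bin/bash
EOF

# Third whitespace-separated field of every line.
gawk '{ print $3 }' fields.sample
# prints:
#   gamma
#   three

# Each password line has only two space-separated pieces (the space sits
# inside the comment field), so $3 is empty and only blank lines appear.
gawk '{ print $3 }' passwd.sample
```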



<P>We can combine the two commands to show a complete pattern-action pair:<FONT COLOR="#0066FF"></FONT>



<PRE><FONT COLOR="#0066FF">gawk '/UNIX/{print $2}' file2.data



</FONT></PRE>



<P>This command will search <TT>file2.data</TT> line by line, looking for the string



<TT>UNIX</TT>. If it finds <TT>UNIX</TT>, it prints the second field of that line



(record).
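<P>Here is a minimal sketch of the combined command. The real <TT>file2.data</TT> is not shown in this chapter, so the example uses a made-up stand-in named <TT>os.sample</TT>. Only the lines containing <TT>UNIX</TT> reach the action, and for each of them the second field is printed.</P>

```shell
# Fallback (assumption): use the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical stand-in for file2.data.
cat > os.sample <<'EOF'
UNIX Solaris workstation
DOS MS-DOS desktop
UNIX HP-UX server
EOF

# Pattern selects the lines; action prints field 2 of each selected line.
gawk '/UNIX/ { print $2 }' os.sample
# prints:
#   Solaris
#   HP-UX
```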







<DL>



	<DT></DT>



</DL>











<DL>



	<DD>



<HR>


