<HTML>

<HEAD>

<TITLE>Linux Unleashed, Third Edition:gawk</TITLE>

<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!--ISBN=0672313723//-->

<!--TITLE=Linux Unleashed, Third Edition//-->

<!--AUTHOR=Tim Parker//-->

<!--PUBLISHER=Macmillan Computer Publishing//-->

<!--IMPRINT=Sams//-->

<!--CHAPTER=25//-->

<!--PAGES=442-444//-->

<!--UNASSIGNED1//-->

<!--UNASSIGNED2//-->



<CENTER>

<TABLE BORDER>

<TR>

<TD><A HREF="438-442.html">Previous</A></TD>

<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>

<TD><A HREF="444-447.html">Next</A></TD>

</TR>

</TABLE>

</CENTER>

<P><BR></P>

<P>To make sense of the telephone directory the way we want to handle it, we have to find another way of structuring the data so that there is a field separator between the sections. For example, the following uses the slash character as the field separator:

</P>

<!-- CODE SNIP //-->

<PRE>

Smith/John/13 Wilson St./555-1283

Smith/John/2736 Artside Dr, Apt 123/555-2736

Smith/John/125 Westmount Cr/555-1728

</PRE>

<!-- END CODE SNIP //-->

<P>By default, <TT>gawk</TT> uses blank characters (spaces or tabs) as field separators unless instructed to use another character. When <TT>gawk</TT> splits on blanks, it doesn&#8217;t matter how many are in a row; a run of them is treated as a single separator for purposes of finding fields. This default can be overridden, for example with the <TT>-F</TT> command-line option.</P>
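<P>As a minimal sketch of both behaviors, the following feeds the slash-separated directory lines from above to <TT>gawk</TT> with <TT>-F/</TT>, then shows that a run of spaces or a tab counts as one separator by default. The variable names and the first line (a fallback to plain <TT>awk</TT> where <TT>gawk</TT> is not installed) are assumptions made for this illustration only.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# The slash-separated directory lines from the text above.
directory='Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1728'

# -F/ makes the slash the field separator, so $4 is the phone number.
phones=$(printf '%s\n' "$directory" | gawk -F/ '{ print $4 }')
printf '%s\n' "$phones"

# Default separator: four spaces, or one tab, each count as a single break.
second=$(printf 'alpha    beta\tgamma\n' | gawk '{ print $2 }')
printf '%s\n' "$second"
```

<P>With the slash as separator, the spaces inside the street address no longer split the address into several fields.</P>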

<H3><A NAME="Heading4"></A><FONT COLOR="#000077">Pattern-Action Pairs</FONT></H3>

<P>The <TT>gawk</TT> language has a particular format for almost all instructions. Each command is composed of two parts: a pattern and a corresponding action. Whenever the pattern is matched, <TT>gawk</TT> executes the action that matches that pattern.</P>

<P><I>Pattern-action pairs</I> can be thought of in more common terms to show how they work. Consider instructing someone how to get to the post office. You might say, &#8220;Go to the end of the street and turn right. At the stop sign, turn left. At the end of the street, go right.&#8221; You have created three pattern-action pairs with these instructions:</P>

<!-- CODE SNIP //-->

<PRE>

end of street: turn right

stop sign: turn left

end of street: turn right

</PRE>

<!-- END CODE SNIP //-->

<P>When these patterns are met, the corresponding action is taken. You wouldn&#8217;t turn right before you reached the end of the street, and you wouldn&#8217;t turn left anywhere but at the stop sign, so the pattern must be matched precisely for the action to be performed. This is a bit simplistic, but it gives you the basic idea.

</P>

<P>With <TT>gawk</TT>, the patterns to be matched are enclosed in a pair of slashes, and the actions are in a pair of braces:</P>

<!-- CODE SNIP //-->

<PRE>

/<I>pattern1</I>/&#123;<I>action1</I>&#125;

/<I>pattern2</I>/&#123;<I>action2</I>&#125;

/<I>pattern3</I>/&#123;<I>action3</I>&#125;

</PRE>

<!-- END CODE SNIP //-->

<P>This format makes it quite easy to tell where the pattern starts and ends, and when the action starts and ends. All <TT>gawk</TT> programs are sets of these pattern-action pairs, one after the other. Remember these pattern-action pairs are working on text files, so a typical set of patterns might be matching a set of strings, and the actions might be to print out parts of the line that matched.</P>

<P>Suppose there isn&#8217;t a pattern? In that case, the pattern matches every line and the action is executed for every line. If there is a pattern but no action, <TT>gawk</TT> prints each matching line in full, without change.</P>

<P>Consider the following example:</P>

<!-- CODE SNIP //-->

<PRE>

gawk '/tparker/' /etc/passwd

</PRE>

<!-- END CODE SNIP //-->

<P>The <TT>gawk</TT> command looks for each line in the <TT>/etc/passwd</TT> file that contains the pattern <TT>tparker</TT> and displays it (there is no action, only a pattern). The output from the command is the one line in the <TT>/etc/passwd</TT> file that contains the string <TT>tparker</TT>. If there is more than one line in the file with that pattern, they all are displayed. In this case, <TT>gawk</TT> is acting exactly like the <TT>grep</TT> utility!</P>
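<P>The two default cases can be checked on a small sample rather than on the real <TT>/etc/passwd</TT>. In this sketch the three password-style lines are invented for illustration, <TT>NR</TT> is the built-in <TT>gawk</TT> variable holding the current line number, and the first line is an assumed fallback to plain <TT>awk</TT> where <TT>gawk</TT> is missing.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Invented sample lines in /etc/passwd style.
sample='root:x:0:0:root:/root:/bin/sh
tparker:x:501:100:Tim Parker:/home/tparker:/bin/sh
ychow:x:502:100:Yvonne Chow:/home/ychow:/bin/sh'

# Pattern only: every matching line is printed unchanged.
by_gawk=$(printf '%s\n' "$sample" | gawk '/tparker/')

# grep with the same pattern selects the same line.
by_grep=$(printf '%s\n' "$sample" | grep 'tparker')

# Action only: the action runs once for every line of input.
all=$(printf '%s\n' "$sample" | gawk '{ print NR }')

printf '%s\n' "$by_gawk" "$all"
```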

<P>This example shows you two important things about <TT>gawk</TT>: It can be invoked from the command line by giving it the pattern-action pair to work with and a filename, and the pattern-action pair should be enclosed in single quotes so that the shell passes it to <TT>gawk</TT> unchanged, as one argument separate from the filename.</P>

<P>The <TT>gawk</TT> language is literal in its matching. The string <TT>cat</TT> will match any lines with <TT>cat</TT> in them, whether the word &#8220;cat&#8221; is by itself or part of another word such as &#8220;concatenate.&#8221; To narrow the match, you can put a space on each side of the word inside the slashes, although that also skips lines where the word begins or ends the line or touches punctuation. Also, case is important: <TT>cat</TT> does not match <TT>Cat</TT>. We&#8217;ll see how to expand the matching in the section &#8220;Metacharacters&#8221; a little later in the chapter.</P>
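<P>A short sketch of literal matching follows. The four sample lines are invented for illustration, and the first line is an assumed fallback to plain <TT>awk</TT> where <TT>gawk</TT> is missing.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Invented sample lines.
words='cat
concatenate
Cat
the cat sat'

# /cat/ matches the substring anywhere, but not the capitalized "Cat".
loose=$(printf '%s\n' "$words" | gawk '/cat/')

# / cat / needs a space on both sides, so the one-word line "cat" is missed.
spaced=$(printf '%s\n' "$words" | gawk '/ cat /')

printf 'loose:\n%s\nspaced:\n%s\n' "$loose" "$spaced"
```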

<P>Jumping ahead slightly, we can introduce a <TT>gawk</TT> command:</P>

<!-- CODE SNIP //-->

<PRE>

gawk '&#123;print &#36;3&#125;' file2.data

</PRE>

<!-- END CODE SNIP //-->

<P>The preceding command has no pattern, only an action, so it performs that action on every line in the file <TT>file2.data</TT>. The action is <TT>print &#36;3</TT>, which tells <TT>gawk</TT> to print the third field of every line. The default field separator, any run of spaces or tabs, is used to tell where fields begin and end. If we try the same command on the <TT>/etc/passwd</TT> file, we get little besides blank lines, because the field separator used in that file is the colon and a line without blanks is treated as a single field.</P>
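<P>The difference between the two files can be sketched with one line of each kind. Both sample lines are invented for illustration, <TT>-F:</TT> is the command-line option that sets the colon as field separator, and the first line is an assumed fallback to plain <TT>awk</TT> where <TT>gawk</TT> is missing.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# A blank-separated line: $3 is the third word.
third=$(printf '%s\n' 'one two three four' | gawk '{ print $3 }')

# A colon-separated line with no blanks is a single field, so $3 is empty.
passwd_line='tparker:x:501:100:Tim:/home/tparker:/bin/sh'
empty=$(printf '%s\n' "$passwd_line" | gawk '{ print $3 }')

# Telling gawk to split on colons makes $3 the numeric user ID.
uid=$(printf '%s\n' "$passwd_line" | gawk -F: '{ print $3 }')

printf 'third=[%s] empty=[%s] uid=[%s]\n' "$third" "$empty" "$uid"
```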

<P>We can combine the two commands to show a complete pattern-action pair:</P>

<!-- CODE SNIP //-->

<PRE>

gawk '/UNIX/&#123;print &#36;2&#125;' file2.data

</PRE>

<!-- END CODE SNIP //-->

<BLOCKQUOTE>

<P><FONT SIZE="-1"><HR><B>Tip:&nbsp;&nbsp;</B><BR>The quotation marks around the entire pattern-action pair are very important and should not be left off. Without them, the shell may interpret characters such as <TT>&#36;</TT>, braces, and spaces before <TT>gawk</TT> sees them, and the command might not execute properly. Use single quotation marks, and make sure they match (don&#8217;t use a single quotation mark at the beginning and a double quotation mark at the end).<HR></FONT>

</BLOCKQUOTE>

<P>This command searches <TT>file2.data</TT> line by line, looking for the string <TT>UNIX</TT>. If it finds <TT>UNIX</TT>, it prints the second column of that line (record).</P>
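<P>Because the contents of <TT>file2.data</TT> are not shown here, the following sketch runs the same pattern-action pair on three invented lines. The first line is an assumed fallback to plain <TT>awk</TT> where <TT>gawk</TT> is missing.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Invented stand-in for file2.data.
data='Linux kernel notes
UNIX history notes
BSD UNIX variants'

# Only lines containing UNIX trigger the action; the first line is skipped.
out=$(printf '%s\n' "$data" | gawk '/UNIX/{ print $2 }')
printf '%s\n' "$out"
```

<P>The pattern is tested against the whole line, not just the field being printed, so the third sample line matches and prints its second field even though that field is itself the matched word.</P>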

<P>You can combine more than one pattern-action pair in a command. For example, the command</P>

<!-- CODE SNIP //-->

<PRE>

gawk '/scandal/&#123;print &#36;1&#125; /rumor/&#123;print &#36;2&#125;' gossip_file

</PRE>

<!-- END CODE SNIP //-->

<P>scans <TT>gossip_file</TT> once, line by line. On each line it first tests for the pattern &#8220;scandal&#8221; and prints the first column if it matches, and then tests the same line for the pattern &#8220;rumor&#8221; and prints the second column if it matches. The file is not reread for each pattern-action pair; every pair is applied, in order, to each line as it is read.</P><P><BR></P>
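<P>The order of the output shows how several pairs are applied. In this sketch the three input lines are invented so that each printed field names its own line, and the first line is an assumed fallback to plain <TT>awk</TT> where <TT>gawk</TT> is missing.</P>

```shell
# Assumption: fall back to plain awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Invented stand-in for gossip_file; fields are labeled by line and column.
gossip='L1a L1b scandal
L2a L2b rumor
L3a L3b scandal rumor'

# Both pairs are tested against each line, in order, in a single pass.
out=$(printf '%s\n' "$gossip" | gawk '/scandal/{print $1} /rumor/{print $2}')
printf '%s\n' "$out"
```

<P>The output follows the order of the lines in the input (<TT>L1a</TT>, <TT>L2b</TT>, <TT>L3a</TT>, <TT>L3b</TT>). If the input were scanned once per pair, all the &#8220;scandal&#8221; results would appear before any &#8220;rumor&#8221; result.</P>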






</td>
</tr>
</table>

<!-- begin footer information -->





</body></html>
