<HTML>
<HEAD>
<TITLE>Linux Unleashed, Third Edition:gawk</TITLE>
<SCRIPT>
<!--
function displayWindow(url, width, height) {
var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>
<!--ISBN=0672313723//-->
<!--TITLE=Linux Unleashed, Third Edition//-->
<!--AUTHOR=Tim Parker//-->
<!--PUBLISHER=Macmillan Computer Publishing//-->
<!--IMPRINT=Sams//-->
<!--CHAPTER=25//-->
<!--PAGES=442-444//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="438-442.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="444-447.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<P>To make sense of the telephone directory the way we want to handle it, we have to find another way of structuring the data so that there is a field separator between the sections. For example, the following uses the slash character as the field separator:
</P>
<!-- CODE SNIP //-->
<PRE>
Smith/John/13 Wilson St./555-1283
Smith/John/2736 Artside Dr, Apt 123/555-2736
Smith/John/125 Westmount Cr/555-1728
</PRE>
<!-- END CODE SNIP //-->
<P>By default, <TT>gawk</TT> uses blank characters (spaces or tabs) as field separators unless instructed to use another character. If <TT>gawk</TT> is using spaces, it doesn’t matter how many are in a row; they are treated as a single block for purposes of finding fields. Naturally, there is a way to override this behavior, too.</P>
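<P>As a minimal sketch of both behaviors, the following assumes the slash-delimited directory above has been saved in a hypothetical file named <TT>phone.data</TT>. It uses the <TT>-F</TT> command-line option to name the separator and the built-in variable <TT>NF</TT> to count fields. The file name, the temporary directory, and the first line (which falls back to the system <TT>awk</TT> when <TT>gawk</TT> is not installed) are illustrative assumptions, not part of the original example.</P>

```shell
# Assumption: fall back to the system awk if gawk is not installed
# (only POSIX awk features are used in this sketch).
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# phone.data is a hypothetical file holding the slash-separated records.
dir=$(mktemp -d)
printf '%s\n' \
  'Smith/John/13 Wilson St./555-1283' \
  'Smith/John/2736 Artside Dr, Apt 123/555-2736' \
  'Smith/John/125 Westmount Cr/555-1728' > "$dir/phone.data"

# -F/ makes the slash the field separator, so the address (field 3)
# stays in one piece even though it contains spaces.
phones=$(gawk -F/ '{print $3 " -> " $4}' "$dir/phone.data")
echo "$phones"

# With the default separator, a run of spaces or a tab counts as one
# break between fields, so this line has three fields.
nfields=$(printf 'one   two\tthree\n' | gawk '{print NF}')
echo "$nfields"
```

<P>With the slash as the separator, each record splits into last name, first name, address, and telephone number, which is the structure the directory needs.</P>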
<H3><A NAME="Heading4"></A><FONT COLOR="#000077">Pattern-Action Pairs</FONT></H3>
<P>The <TT>gawk</TT> language has a particular format for almost all instructions. Each command is composed of two parts: a pattern and a corresponding action. Whenever the pattern is matched, <TT>gawk</TT> executes the action paired with that pattern.</P>
<P><I>Pattern-action pairs</I> can be thought of in more common terms to show how they work. Consider instructing someone how to get to the post office. You might say, “Go to the end of the street and turn right. At the stop sign, turn left. At the end of the street, go right.” You have created three pattern-action pairs with these instructions:</P>
<!-- CODE SNIP //-->
<PRE>
end of street: turn right
stop sign: turn left
end of street: turn right
</PRE>
<!-- END CODE SNIP //-->
<P>When a pattern is met, the corresponding action is taken. You wouldn’t turn right before you reached the end of the street, and you wouldn’t turn left until you got to the stop sign, so the pattern must be matched for its action to be performed. This is a bit simplistic, but it gives you the basic idea.
</P>
<P>With <TT>gawk</TT>, the patterns to be matched are enclosed in a pair of slashes, and the actions are in a pair of braces:</P>
<!-- CODE SNIP //-->
<PRE>
/<I>pattern1</I>/{<I>action1</I>}
/<I>pattern2</I>/{<I>action2</I>}
/<I>pattern3</I>/{<I>action3</I>}
</PRE>
<!-- END CODE SNIP //-->
<P>This format makes it quite easy to tell where the pattern starts and ends, and where the action starts and ends. All <TT>gawk</TT> programs are sets of these pattern-action pairs, one after the other. Remember that these pattern-action pairs work on text files, so a typical set of patterns might match a set of strings, and the actions might print parts of the lines that matched.</P>
<P>Suppose there isn’t a pattern? In that case, the pattern matches every time and the action is executed every time. If there is no action, <TT>gawk</TT> copies the entire line that matched without change.</P>
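<P>The two defaults described above can be sketched on a small made-up file. The file name <TT>sample.txt</TT>, its contents, and the fallback to the system <TT>awk</TT> are assumptions for illustration only.</P>

```shell
# Assumption: fall back to the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# sample.txt is a hypothetical three-line file.
dir=$(mktemp -d)
printf '%s\n' 'alpha one' 'beta two' 'alpha three' > "$dir/sample.txt"

# Pattern with no action: each matching line is printed unchanged.
pattern_only=$(gawk '/alpha/' "$dir/sample.txt")
echo "$pattern_only"

# Action with no pattern: the action runs on every line.
action_only=$(gawk '{print $2}' "$dir/sample.txt")
echo "$action_only"
```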
<P>Consider the following example:</P>
<!-- CODE SNIP //-->
<PRE>
gawk '/tparker/' /etc/passwd
</PRE>
<!-- END CODE SNIP //-->
<P>The <TT>gawk</TT> command looks for each line in the <TT>/etc/passwd</TT> file that contains the pattern <TT>tparker</TT> and displays it (there is no action, only a pattern). The output from the command is the one line in the <TT>/etc/passwd</TT> file that contains the string <TT>tparker</TT>. If there is more than one line in the file with that pattern, they all are displayed. In this case, <TT>gawk</TT> is acting exactly like the <TT>grep</TT> utility!</P>
<P>This example shows you two important things about <TT>gawk</TT>: It can be invoked from the command line by giving it the pattern-action pair to work with and a filename, and the pattern-action pair should be enclosed in single quotes. The quotes pass the program to <TT>gawk</TT> as a single argument, separate from the filename, and keep the shell from interpreting characters such as <TT>$</TT> and the braces.</P>
<P>The <TT>gawk</TT> language is literal in its matching. The string <TT>cat</TT> will match any line with <TT>cat</TT> in it, whether the word “cat” is by itself or part of another word such as “concatenate.” To narrow the match to the separate word, insert a space on each side of it in the pattern (this misses the word at the very start or end of a line). Matching is also case sensitive. We’ll see how to expand the matching in the section “Metacharacters” a little later in the chapter.</P>
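<P>A short sketch of this literal matching, using a made-up file (<TT>words.txt</TT> and its three lines are assumptions, as is the fallback to the system <TT>awk</TT>):</P>

```shell
# Assumption: fall back to the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# words.txt is a hypothetical file for illustration.
dir=$(mktemp -d)
printf '%s\n' 'the cat sat' 'concatenate files' 'Cat nap' > "$dir/words.txt"

# /cat/ matches the string anywhere in a line, including inside
# "concatenate"; "Cat" is skipped because matching is case sensitive.
substring=$(gawk '/cat/' "$dir/words.txt")
echo "$substring"

# Spaces inside the slashes are part of the pattern, so only the
# line with " cat " as a separate word matches.
spaced=$(gawk '/ cat /' "$dir/words.txt")
echo "$spaced"
```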
<P>Jumping ahead slightly, we can introduce a <TT>gawk</TT> command:</P>
<!-- CODE SNIP //-->
<PRE>
gawk '{print $3}' file2.data
</PRE>
<!-- END CODE SNIP //-->
<P>The preceding command has an action but no pattern, so it performs that action on every line in the file <TT>file2.data</TT>. The action is <TT>print $3</TT>, which tells <TT>gawk</TT> to print the third field of every line. The default field separator, whitespace, is used to tell where fields begin and end. If we try the same command on the <TT>/etc/passwd</TT> file, nothing useful displays because the field separator used in that file is the colon. A line with no blanks in it counts as a single field, so its third field is empty.</P>
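<P>The effect of the separator can be sketched with a made-up line in the style of <TT>/etc/passwd</TT>. The file name <TT>passwd.sample</TT> and the account details are hypothetical, and the <TT>-F</TT> option is used here only as one way of naming the colon as the separator.</P>

```shell
# Assumption: fall back to the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# passwd.sample is a hypothetical one-line file in /etc/passwd style.
dir=$(mktemp -d)
printf '%s\n' 'tparker:x:501:100::/home/tparker:/bin/bash' > "$dir/passwd.sample"

# Default separator: the line contains no blanks, so it is a single
# field and $3 is empty (gawk prints an empty line).
default_out=$(gawk '{print $3}' "$dir/passwd.sample")
echo "default separator: [$default_out]"

# With the colon as the separator, $3 is the third colon-delimited field.
colon_out=$(gawk -F: '{print $3}' "$dir/passwd.sample")
echo "colon separator: [$colon_out]"
```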
<P>We can combine the two commands to show a complete pattern-action pair:</P>
<!-- CODE SNIP //-->
<PRE>
gawk '/UNIX/{print $2}' file2.data
</PRE>
<!-- END CODE SNIP //-->
<BLOCKQUOTE>
<P><FONT SIZE="-1"><HR><B>Tip: </B><BR>The quotation marks around the entire pattern-action pair are very important and should not be left off. Without them, the command might not execute properly. Make sure the quotation marks match (don’t use a single quotation mark at the beginning and a double quotation mark at the end).<HR></FONT>
</BLOCKQUOTE>
<P>This command searches <TT>file2.data</TT> line by line, looking for the string <TT>UNIX</TT>. If it finds <TT>UNIX</TT>, it prints the second column of that line (record).</P>
<P>You can combine more than one pattern-action pair in a command. For example, the command</P>
<!-- CODE SNIP //-->
<PRE>
gawk '/scandal/{print $1} /rumor/{print $2}' gossip_file
</PRE>
<!-- END CODE SNIP //-->
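<P>reads <TT>gossip_file</TT> one line at a time and tests each line against every pattern-action pair, in the order in which the pairs are written. A line containing “scandal” has its first column printed, and a line containing “rumor” has its second column printed. A line containing both triggers both actions. The file is read only once, no matter how many pattern-action pairs there are, so the output follows the order of the lines in the file.</P><P><BR></P>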
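<P>A minimal sketch of how several pairs behave, using made-up contents for <TT>gossip_file</TT> (the three lines and the fallback to the system <TT>awk</TT> are assumptions). Because <TT>gawk</TT> reads each line once and tries every pair on it in turn, the output follows the order of the input lines rather than the order of the patterns.</P>

```shell
# Assumption: fall back to the system awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Hypothetical contents for gossip_file.
dir=$(mktemp -d)
printf '%s\n' \
  'rumor mill opens' \
  'scandal hits town' \
  'new rumor and scandal' > "$dir/gossip_file"

# Line 1 matches /rumor/ only    -> prints $2 ("mill")
# Line 2 matches /scandal/ only  -> prints $1 ("scandal")
# Line 3 matches both patterns   -> prints $1 ("new"), then $2 ("rumor")
out=$(gawk '/scandal/{print $1} /rumor/{print $2}' "$dir/gossip_file")
echo "$out"
```

<P>If the file were scanned once per pattern, both “scandal” results would appear before any “rumor” result. The interleaved output shows a single pass.</P>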
</td>
</tr>
</table>
<!-- begin footer information -->
</body></html>