<PRE>{name = $1
print name}</PRE>
<P>and
<BR></P>
<PRE>{name = $1}
{print name}</PRE>
<P>These two produce identical output.
<BR></P>
<H4 ALIGN="CENTER">
<CENTER><A ID="I29" NAME="I29">
<FONT SIZE=3><B>Variables</B>
<BR></FONT></A></CENTER></H4>
<P>Variables are an integral part of any programming language: they are the virtual boxes in which you store values, count things, and more. In this section, I talk about variables in awk. Awk has three types of variables: user-defined variables, field variables, and predefined variables that the language provides automatically. The next section is devoted to a discussion of built-in variables. Awk doesn't have variable declarations. A variable comes to life the first time it is mentioned; in a twist on René Descartes's philosophical conundrum, you use it, therefore it is. The section concludes with an example of turning an awk program into a shell script.
<BR></P>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="caution.gif" WIDTH = 37 HEIGHT = 35><B>CAUTION:</B> Since there are no declarations, be doubly careful to initialize all the variables you use. Using a variable before assigning it is not an error: it automatically starts with the value zero, or with the empty string when it is used as a string.
<BR></NOTE>
<HR ALIGN=CENTER>
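<P>To see what that automatic starting value means in practice, here is a minimal shell sketch; the variable names count, label, and total and the helper name uninit_demo are made up for the illustration. A variable that has never been assigned acts as 0 in arithmetic and as the empty string in string contexts, and += works on it with no setup.
<BR></P>

```shell
#!/bin/sh
# uninit_demo is a hypothetical helper name used only for this example.
uninit_demo() {
    awk 'BEGIN {
        print count + 1        # count was never assigned: used as 0, prints 1
        print "[" label "]"    # label was never assigned: used as "", prints []
        total += 5             # no initialization needed before +=
        print total            # prints 5
    }'
}
uninit_demo
```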
<H5 ALIGN="CENTER">
<CENTER><A ID="I30" NAME="I30">
<FONT SIZE=3><B>Naming</B>
<BR></FONT></A></CENTER></H5>
<P>The rule for naming user-defined variables is that they can be any combination of letters, digits, and underscores, as long as the name does not start with a digit. It is helpful to give a variable a name indicative of its purpose in the program. Variables already defined by awk are written in all uppercase. Since awk is case-sensitive, ofs is not the same variable as OFS, and capitalization (or lack thereof) is a common error. You have already seen field variables: a $ followed by a number, indicating a specific input field.
<BR></P>
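<P>Here is a small sketch of that capitalization trap, assuming any POSIX-compatible awk; the helper name case_demo and the input line are hypothetical. Assigning to ofs creates a brand-new user variable and leaves the output field separator alone, while assigning to OFS changes it. The statement $1 = $1 forces awk to rebuild the record so that the current OFS is used.
<BR></P>

```shell
#!/bin/sh
# case_demo is a hypothetical helper name used only for this example.
case_demo() {
    echo "alpha beta" | awk '{
        ofs = "-"    # lowercase: an ordinary user variable, OFS is unchanged
        $1 = $1      # rebuild $0 using the current OFS (still a space)
        print        # alpha beta
        OFS = "-"    # uppercase: the built-in output field separator
        $1 = $1      # rebuild $0 again, now joined with "-"
        print        # alpha-beta
    }'
}
case_demo
```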
<P>A variable holds a number, a string, or both. There is no type declaration, and type conversion is automatic when needed. Recall the car sales file used earlier. For illustration, suppose I enter the program <B>awk -F: '{ print $1 * 10 }' emp.data</B>, and awk obligingly provides the rest:
<BR></P>
<PRE>0
0
0
0
0</PRE>
<P>Of course, this makes no sense! The point is that awk did exactly what it was asked without complaint: it multiplied the name of the employee by ten. To do the arithmetic, awk converted each name to a number, and a string that does not begin with a number converts to zero. Ten times zero, needless to say, is zero.
<BR></P>
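<P>The conversion rule behind those zeros can be checked with a short sketch; the record Smith:42:7 days is invented data and convert_demo is a hypothetical helper. When awk needs a number, it uses the numeric prefix of the string, so a string that does not start with a number becomes 0.
<BR></P>

```shell
#!/bin/sh
# convert_demo is a hypothetical helper; the input record is invented.
convert_demo() {
    echo "Smith:42:7 days" | awk -F: '{
        print $1 * 10    # "Smith" has no leading number: 0 * 10 = 0
        print $2 * 10    # "42" converts to 42: prints 420
        print $3 * 10    # "7 days" converts to its leading number 7: prints 70
    }'
}
convert_demo
```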
<H5 ALIGN="CENTER">
<CENTER><A ID="I31" NAME="I31">
<FONT SIZE=3><B>Awk in a Shell Script</B>
<BR></FONT></A></CENTER></H5>
<P>Before examining the next example, review what you know about shell programming (Chapters 10-14). Remember, every file containing shell commands must be made executable before you can run it as a shell script. To do this, enter chmod +x <I>filename</I> from the command line.
<BR></P>
<P>Sometimes awk's automatic type conversion benefits you. Imagine that I'm still trying to build an office system with awk scripts, and this time I want to maintain a running monthly sales total based on a data file that contains individual monthly sales. It looks like this:
<BR></P>
<PRE>$ cat monthly.sales
John Anderson,12,23,7
Joe Turner,10,25,15
Susan Greco,15,13,18
Bob Burmeister,8,21,17</PRE>
<P>These need to be added together to calculate the running totals for each person's sales. Let a program do it!
<BR></P>
<PRE>$ cat total.awk
BEGIN {FS = ","; OFS = ","}   # split input on commas; keep commas in the output
{print $1, " monthly sales summary: " $2+$3+$4 }</PRE>
<P>That's the awk script, so let's see how it works:
<BR></P>
<PRE>$ awk -f total.awk monthly.sales
John Anderson, monthly sales summary: 42
Joe Turner, monthly sales summary: 50
Susan Greco, monthly sales summary: 46
Bob Burmeister, monthly sales summary: 46</PRE>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="caution.gif" WIDTH = 37 HEIGHT = 35><B>CAUTION:</B> Always run your program once to be sure it works before you make it part of a complicated shell script!
<BR></NOTE>
<HR ALIGN=CENTER>
<P>Your task has been reduced to entering the monthly sales figures in monthly.sales and editing total.awk to include the correct number of fields. (If you use a for loop such as for(i=2;i<=NF;i++), the number of fields is calculated correctly, but printing is a hassle and needs an if statement with 12 else if clauses.)
<BR></P>
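<P>The for loop mentioned above can be sketched as follows, assuming the same comma-separated layout as monthly.sales; the helper name sum_fields and the fourth month for Joe Turner are invented for the example. Because the loop runs from field 2 to NF, rows may have different numbers of months, and printing only the total avoids the month-by-month output problem.
<BR></P>

```shell
#!/bin/sh
# sum_fields is a hypothetical helper; reads files named as arguments, or stdin.
sum_fields() {
    awk 'BEGIN { FS = ","; OFS = "," }
    {
        total = 0                    # reset the total for each salesperson
        for (i = 2; i <= NF; i++)    # fields 2 through NF hold monthly figures
            total += $i
        print $1, " monthly sales summary: " total
    }' "$@"
}
printf 'John Anderson,12,23,7\nJoe Turner,10,25,15,30\n' | sum_fields
```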
<P>In this case, not having to wonder if a digit is part of a string or a number is helpful. Just keep an eye on the input data, since awk performs whatever actions you specify, regardless of the actual data type with which you're working.
<BR></P>
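<P>Here is one way to turn total.awk into a shell script, as a minimal sketch: the script name sales_summary is hypothetical, and the example works in a scratch directory so it does not overwrite your own files. The awk program is embedded in single quotes, and "$@" passes along any file names given on the command line.
<BR></P>

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Write the wrapper script; the quoted 'EOF' keeps $1 and "$@" literal.
cat > sales_summary <<'EOF'
#!/bin/sh
# Usage: sales_summary [file ...]   (reads standard input if no file is named)
awk 'BEGIN { FS = ","; OFS = "," }
     { print $1, " monthly sales summary: " $2+$3+$4 }' "$@"
EOF
chmod +x sales_summary        # make it executable, as described above

printf 'Susan Greco,15,13,18\n' > monthly.sales
./sales_summary monthly.sales
```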
<H5 ALIGN="CENTER">
<CENTER><A ID="I32" NAME="I32">
<FONT SIZE=3><B>Built-in Variables</B>
<BR></FONT></A></CENTER></H5>
<P>This section discusses the built-in variables found in awk. Because there are many versions of awk, and they all differ, I have included notes on the variables found in nawk, POSIX awk, and gawk. As before, unless otherwise noted, the variables of earlier releases may be found in the later implementations. Awk was released first and contains the core set of built-in variables used by all updates. Nawk expands the set. The POSIX awk specification encompasses all variables defined in nawk plus one additional variable. Gawk applies the POSIX awk standards and then adds some built-in variables that are found in gawk alone; the built-in variables noted when discussing gawk are unique to gawk. This list is a guideline, not a hard-and-fast rule. For instance, the built-in variable ENVIRON is formally introduced in the POSIX awk specification; it exists in gawk and in the System V implementation of nawk, but SunOS nawk doesn't have it. (See the section "'Oh man! I need help.'" in Chapter 5 for more information on how to use man pages.)
<BR></P>
<P>As I stated earlier, awk is case sensitive. In all implementations of awk, built-in variables are written entirely in upper case.
<BR></P>
<H6 ALIGN="CENTER">
<CENTER>
<FONT SIZE=3><B>Built-in Variables for Awk</B>
<BR></FONT></CENTER></H6>
<P>When awk first became a part of UNIX, the built-in variables were the bare essentials. As the name indicates, the variable FILENAME holds the name of the current input file. Recall the function finder code; type the new line below:
<BR></P>
<PRE>/function <I>functionname</I>/,/} \/\* end of <I>functionname</I>/ {print $0}
END {print ""; print "Found in the file " FILENAME}</PRE>
<P>This adds the finishing touch.
<BR></P>
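<P>A runnable sketch of the finished function finder follows. The sample file sample.src, its contents, and the function name greet are invented for illustration; the file follows the convention the function finder relies on, in which a function ends with a } /* end of name */ comment. Because FILENAME keeps the name of the last file read, the END action can still report it.
<BR></P>

```shell
#!/bin/sh
# Work in a throwaway directory; sample.src and greet are made-up names.
cd "$(mktemp -d)" || exit 1
cat > sample.src <<'EOF'
int unrelated;
function greet {
    say hello
} /* end of greet */
int more;
EOF

# find_greet is a hypothetical helper: print from "function greet" through
# the "} /* end of greet" line, then name the file. [}] is a literal brace.
find_greet() {
    awk '/function greet/,/[}] \/\* end of greet/ { print $0 }
         END { print ""; print "Found in the file " FILENAME }' sample.src
}
find_greet
```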
<P>The value of the variable FS determines the input field separator. The default value of FS is a single space, which awk treats specially: fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored. The built-in variable NF contains the number of fields in the current record (remember, fields are akin to words, and records are input lines). This value may change for each input record.
<BR></P>
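<P>The difference between the default FS and a single-character FS such as a colon shows up in NF, as this short sketch illustrates (nf_demo is a hypothetical helper name and the input lines are made up). With the default, a run of blanks counts as one separator; with -F:, every colon counts, so two colons in a row produce an empty field.
<BR></P>

```shell
#!/bin/sh
# nf_demo is a hypothetical helper name; the input lines are invented.
nf_demo() {
    # Default FS: the run of three spaces still counts as one separator.
    printf 'one two   three\n' | awk '{ print NF }'             # 3
    # FS is ":": the empty string between "::" is field 3.
    printf 'a:b::d\n' | awk -F: '{ print NF, "[" $3 "]" }'     # 4 []
}
nf_demo
```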
<P>What happens if within an awk script I have the following statement?
<BR></P>
<PRE>$3 = "Third field"</PRE>
<P>It replaces the value of $3 and rebuilds $0 from the fields, joining them with the output field separator. If the record had fewer than three fields, NF becomes 3 and any missing fields in between are created as empty strings. The total number of records read so far may be found in the variable NR. The variable OFS holds the value for the output field separator; its default value is a space. The output format for numbers resides in the variable OFMT, which has a default value of %.6g. This is the format print uses for non-integer numbers; its syntax comes from the C printf format string. ORS is the output record separator. Unless changed, the value of ORS is newline (\n).
<BR></P>
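<P>The following sketch, assuming any POSIX-compatible awk, checks the field-assignment behavior described above and shows OFS, ORS, and OFMT taking effect together; builtin_demo is a hypothetical helper name.
<BR></P>

```shell
#!/bin/sh
# builtin_demo is a hypothetical helper name used only for this example.
builtin_demo() {
    echo "one two" | awk '{
        $3 = "Third field"    # assigning past the last field extends the record
        print NF              # 3
        print                 # one two Third field  ($0 rebuilt with OFS)
    }'
    # "-" between fields, "|" plus newline after each record, two decimals.
    awk 'BEGIN { OFS = "-"; ORS = "|\n"; OFMT = "%.2f"
                 print "pi", 3.14159 }'                  # pi-3.14|
}
builtin_demo
```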
<H6 ALIGN="CENTER">
<CENTER>
<FONT SIZE=3><B>Built-in Variables for Nawk</B>
<BR></FONT></CENTER></H6>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="note.gif" WIDTH = 35 HEIGHT = 35><B>NOTE:</B> When awk was expanded in 1985, part of the expansion included adding more built-in variables.
<BR></NOTE>
<HR ALIGN=CENTER>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="caution.gif" WIDTH = 37 HEIGHT = 35><B>CAUTION:</B> Some implementations of UNIX simply replaced the old code with the new and didn't bother keeping both awk and nawk. System V and SunOS have both available, although on System V the program called awk includes the nawk expansions. Linux has neither awk nor nawk but uses gawk. The book <I>The AWK Programming Language</I> by the awk authors (Aho, Kernighan, and Weinberger) speaks of awk throughout, but the language it describes is called nawk on most systems.
<BR></NOTE>
<HR ALIGN=CENTER>
<P>The built-in variable ARGC holds the number of command-line arguments. The variable ARGV is an array containing the command-line arguments. Subscripts for ARGV begin with 0 and continue through ARGC-1. ARGV[0] is normally awk, the name of the interpreter itself. Command-line options such as -F and -f, and the program text, are not stored in ARGV. The variable FNR holds the number of the current record within the current input file. Like NR, this value changes with each new record, but FNR starts over at 1 with each new input file, so FNR is always <= NR. The built-in variable RLENGTH holds the length of the string matched by the match function. The variable RS holds the input record separator; its default value is a newline. The start of the string matched by the match function resides in RSTART. Together, RSTART and RLENGTH let you determine exactly what was matched. The variable SUBSEP contains the subscript separator used to join multiple array subscripts; its default value is "\034".
<BR></P>
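<P>The nawk variables can be exercised together in a short sketch, assuming nawk, gawk, or another POSIX-compatible awk. The file names first and second, the helper names, and the sales data are invented for illustration, and the example works in a scratch directory.
<BR></P>

```shell
#!/bin/sh
# Work in a throwaway directory; the file names first and second are made up.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > first
printf 'c\n' > second

# ARGC counts ARGV[0] plus the two file names; the program text is not in ARGV.
show_args() { awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print ARGV[i] }' first second; }

# FNR restarts at 1 for each file, while NR keeps counting across files.
show_counts() { awk '{ print FILENAME, FNR, NR }' first second; }

# match() sets RSTART and RLENGTH; substr() then pulls out the matched text.
show_match() {
    echo "order ID: AB-1234 shipped" |
        awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }'
}

# A two-part subscript is stored as a single string joined with SUBSEP.
show_subsep() {
    awk 'BEGIN { sales["Joe", 3] = 15
                 for (key in sales) { split(key, part, SUBSEP); print part[1], part[2], sales[key] } }'
}

show_args; show_counts; show_match; show_subsep
```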
<H6 ALIGN="CENTER">
<CENTER>
<FONT SIZE=3><B>Built-in Variables for POSIX Awk</B>
<BR></FONT></CENTER></H6>
<P>The POSIX awk specification introduces one new built-in variable beyond those in nawk. The built-in variable ENVIRON is an array that holds the values of the current environment variables. (Environment variables are discussed more thoroughly later in this chapter.) The subscripts of ENVIRON are the names of the environment variables themselves, and each ENVIRON element is the value of that variable. For instance, ENVIRON["HOME"] on my PC under Linux is "/home". Using ENVIRON can remove system-dependent values from awk source code in some cases, but the values themselves still differ from system to system: ENVIRON["HOME"] at work is "/usr/anne", while my SunOS account doesn't have an ENVIRON variable at all because SunOS nawk is not POSIX compliant.
<BR></P>
<P>Here's an example of how you could work with the environment variables:
<BR></P>
<PRE>ENVIRON["EDITOR"] == "vi" {print NR,$0}</PRE>
<P>This program prints my program listings with line numbers if I am using vi as my default editor. More on this example later in the chapter.
<BR></P>
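<P>The listing program can be tried without changing your real environment, since the shell lets you set EDITOR for a single command. In this sketch, listing is a hypothetical helper that feeds two made-up lines to the program; only the run with EDITOR set to vi prints numbered lines. It assumes an awk that provides ENVIRON, which rules out SunOS nawk.
<BR></P>

```shell
#!/bin/sh
# listing is a hypothetical helper; $1 is the editor name to pretend is in use.
listing() {
    printf 'first line\nsecond line\n' |
        EDITOR="$1" awk 'ENVIRON["EDITOR"] == "vi" { print NR, $0 }'
}
listing vi       # prints both lines with their numbers
listing emacs    # prints nothing: the pattern never matches
```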
<H6 ALIGN="CENTER">
<CENTER>
<FONT SIZE=3><B>Built-in Variables in Gawk</B>
<BR></FONT></CENTER></H6>
<P>The GNU group further enhanced awk by adding four new variables to gawk, its public re-implementation of awk. Fortunately, gawk does not differ between UNIX versions as much as awk and nawk do. These built-in variables are in addition to those mentioned in the POSIX specification as described above. The variable CONVFMT contains the conversion format for numbers. Its default value is "%.6g", and it governs the number-to-string conversions awk performs internally, such as when a number is concatenated with a string, whereas OFMT governs numbers written by print. The variable FIELDWIDTHS allows a programmer the option of having fixed field widths rather than a single-character field separator. FIELDWIDTHS holds a string of numbers separated by spaces or tabs (\t), one width per field, so fields need not all be the same width. When FIELDWIDTHS is set, gawk splits each input record into fields using those widths, and the value of FS is disregarded. Assigning a new value to FS overrides the use of FIELDWIDTHS and restores the default behavior.
<BR></P>
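<P>The split between CONVFMT and OFMT is easiest to see side by side. This sketch assumes an awk that implements CONVFMT (gawk does, as do other POSIX-compliant awks); the variable x is arbitrary, and conv_demo is a hypothetical helper.
<BR></P>

```shell
#!/bin/sh
# conv_demo is a hypothetical helper name used only for this example.
conv_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"; OFMT = "%.4f"
        x = 3.14159265
        s = x ""       # concatenation converts the number with CONVFMT
        print s        # 3.14 (already a string, so OFMT does not apply)
        print x        # 3.1416 (print formats the number with OFMT)
        print 17 ""    # 17 (integer values convert as integers)
    }'
}
conv_demo
```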
<P>To see where FIELDWIDTHS could be useful, let's imagine that you've just received a datafile from accounting that indicates the different employees in your group and their ages. It might look like:
<BR></P>
<PRE>$ cat gawk.datasample
1Swensen, Tim  24
1Trinkle, Dan  22
0Mitchel, Carl 27</PRE>
<P>The very first character, you find out, indicates whether they're hourly or salaried: a value of 1 means that they're salaried, and a value of 0 means hourly. How do you split that character off from the rest of the data? With the FIELDWIDTHS variable. Here's a simple gawk script that could attractively list the data:
<BR></P>
<PRE>BEGIN {FIELDWIDTHS = "1 8 1 4 1 2"}   # the widths are one quoted string
{ first = $4; sub(/ +$/, "", first)    # trim the padding after short first names
  if ($1 == 1) print "Salaried employee " $2, first " is " $6 " years old."
  else print "Hourly employee " $2, first " is " $6 " years old."
}</PRE>
<P>The output would look like:
<BR></P>
<PRE>Salaried employee Swensen, Tim is 24 years old.
Salaried employee Trinkle, Dan is 22 years old.
Hourly employee Mitchel, Carl is 27 years old.</PRE>
<HR ALIGN=CENTER>
<NOTE>
<IMG SRC="imp.gif" WIDTH = 68 HEIGHT = 35><B>TIP:</B> When calculating the FIELDWIDTHS values, don't forget any field separators: the spaces between words count toward the widths.
<BR></NOTE>
<HR ALIGN=CENTER>
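<P>FIELDWIDTHS exists only in gawk. In other awks, a portable way to get the same split is to cut the record apart by column position with substr(). This is a minimal sketch assuming records laid out in columns of width 1 8 1 4 1 2; the two sample records and the helper name fixed_width_demo are made up for the example.
<BR></P>

```shell
#!/bin/sh
# fixed_width_demo is a hypothetical helper; the two records are sample data
# laid out in columns of width 1 8 1 4 1 2.
fixed_width_demo() {
    printf '1Swensen, Tim  24\n0Mitchel, Carl 27\n' | awk '{
        type  = substr($0, 1, 1)    # column 1: 1 = salaried, 0 = hourly
        last  = substr($0, 2, 8)    # columns 2-9: last name and comma
        first = substr($0, 11, 4)   # columns 11-14: first name, space padded
        age   = substr($0, 16, 2)   # columns 16-17: age
        sub(/ +$/, "", first)       # drop the padding after short first names
        kind = (type == "1") ? "Salaried" : "Hourly"
        print kind " employee " last, first " is " age " years old."
    }'
}
fixed_width_demo
```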
<P>The variable IGNORECASE controls the case sensitivity of gawk regular expressions. If IGNORECASE has a nonzero value, pattern matching ignores case for regular expression operations. The default value of IGNORECASE is zero; all regular expression
operations are normally case sensitive.
<BR></P>
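<P>Because IGNORECASE is special only in gawk (in other awks it is just an ordinary user variable), a portable way to get case-insensitive matching is to lowercase the text before matching it. In this sketch, count_errors is a hypothetical helper and the log lines are invented; in gawk, setting IGNORECASE = 1 in a BEGIN action and matching /error/ would count the same two lines.
<BR></P>

```shell
#!/bin/sh
# count_errors is a hypothetical helper; the input lines are made up.
count_errors() {
    printf 'Error: disk full\nerror: retry\nall good\n' |
        awk 'tolower($0) ~ /error/ { n++ }    # match regardless of case
             END { print n + 0 }'
    # n + 0 makes the END action print 0 rather than "" when nothing matched
}
count_errors    # 2
```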
<H4 ALIGN="CENTER">
<CENTER><A ID="I33" NAME="I33">
<FONT SIZE=3><B>Conditions (No </B><B><I>IF</I></B><B>s, </B><B><I>&&</I></B><B>s or </B><B><I>but</I></B><B>s)</B>
<BR></FONT></A></CENTER></H4>
<P>Awk program statements are, by their very nature, conditional: if a pattern matches, then a specified action or actions occur. Actions, too, have a conditional form. This section discusses conditional flow. It focuses on the syntax of the if statement, but, as usual in awk, there are multiple ways to do something.
<BR></P>
<P>A conditional statement does a test before it performs the action. You have already seen one kind of test, the pattern match, which decides whether an action runs at all; the if statement puts a test inside the action itself. The last two sections introduced variables; now you can begin putting them to practical uses.
<BR></P>
<H5 ALIGN="CENTER">
<CENTER><A ID="I34" NAME="I34">
<FONT SIZE=3><B>The </B><B><I>if</I></B><B> Statement</B>
<BR></FONT></A></CENTER></H5>
<P>An if statement takes the form of the control structure familiar from most procedural programming languages, where E1 is an expression, as mentioned in the "Patterns" section earlier in this chapter:
<BR></P>
<PRE>if (E1) S2; else S3</PRE>
<P>While E1 is always a single expression, S2 and S3 may be either single statements or groups of statements enclosed in braces (that means conditions in conditions are legal syntax, but I am getting ahead of myself). Line breaks and indentation are, as usual in awk, entirely up to you. However, if S2 and the else statement are on the same line, and S2 is a single statement, a semicolon must separate S2 from the else statement. When awk encounters an if statement, evaluation occurs as follows: first E1 is evaluated, and if E1 is nonzero or nonnull (true), S2 is executed; if E1 is zero or null (false) and there's an else clause, S3 is executed. For instance, if you want to print a blank line when the third field has the value 25 and the entire line in all other cases, you could use a program snippet like this:
<BR></P>
<PRE>{ if ($3 == 25)
print ""
else
print $0 }</PRE>
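<P>Here is that snippet in runnable form, together with a variant whose S2 is a group of statements in braces; the helper names blank_on_25 and count_25 and the input lines are invented for the example.
<BR></P>

```shell
#!/bin/sh
# blank_on_25 and count_25 are hypothetical helper names; the input is made up.
blank_on_25() {
    printf 'a b 25\na b 26\n' | awk '{ if ($3 == 25)
        print ""
    else
        print $0 }'
}
count_25() {
    printf 'x 25\ny 7\n' | awk '{
        if ($2 == 25) { hits++; print "hit on line " NR }    # braces group two statements
        else print "miss: " $1
    }
    END { print hits + 0 " hit(s)" }'
}
blank_on_25
count_25
```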
<P>The else S3 portion of the if statement is completely optional, since sometimes your choice is limited to whether or not to have awk execute S2:
<BR></P>
<PRE>{ if ($3 == 25)
print "" }</PRE>
<P>Although the if statement is an action, E1 can test for a pattern match using the pattern-match operator ~. As you have already seen, you can use it to look for my name in the password file another way. The first way