📄 gawk.hlp
Each action can contain more than one statement or expression to be
executed, provided that they're separated by semicolons (;) and/or on
separate lines.

An omitted action is equivalent to
      { print $0 }
which prints the current record.
3 operators
 Relational operators
    ==      compare for equality
    !=      compare for inequality
    <, <=, >, >=
            numerical or lexical comparison (less than, less or equal,
            greater than, greater or equal, respectively)
    ~       match against a regular expression
    !~      match against a regular expression, but accept failed
            matches instead of successful ones
 Arithmetic operators
    +       addition
    -       subtraction
    *       multiplication
    /       division
    %       remainder
    ^, **   exponentiation ('**' is a synonym for '^', unless POSIX
            compatibility is specified, in which case it's invalid)
 Boolean operators (aka Logical operators)
    a value is considered false if it's 0 or a null string, it is true
    otherwise; the result of a boolean operation (and also of a
    comparison operation) will be 0 when false or 1 when true
    ||      or  [expression (a || b) is true if either a is true or b
            is true or both a and b are true; it is false otherwise;
            b is not evaluated unless a is false (ie, short-circuit)]
    &&      and  [expression (a && b) is true if both a and b are true;
            it is false otherwise; b is only evaluated if a is true]
    !       not  [expression (!a) is true if a is false, false
            otherwise]
    in      array membership; the keyword 'in' tests whether the value
            on the left represents a current subscript in the array
            named on the right
 Conditional operator
    ? :     the conditional operator takes three operands; the first is
            an expression to evaluate, the second is the expression to
            use if the first was true, the third is the expression to
            use if it was false  [simple example (a < b ?
            b : a) gives the maximum of a and b]
 Assignment operators
    =       store the value on the right into the variable or array
            slot on the left  [expression (a = b) stores the value of
            b in a]
    +=, -=, *=, /=, %=, ^=, **=
            perform the indicated arithmetic operation using the
            current value of the variable or array element on the left
            side and the expression on the right side, then store the
            result in the left side
    ++      increment by 1  [expression (++a) gets the current value of
            a and adds 1 to it, stores that back in a, and returns the
            new value; expression (a++) gets the current value of a,
            adds 1 to it, stores that back in a, but returns the
            original value of a]
    --      decrement by 1 (analogous to increment)
 String operators
    there is no explicit operator for string concatenation; two values
    and/or variables side-by-side are implicitly concatenated into a
    string (numeric values are first converted into their string
    equivalents)
 Conversion between numeric and string values
    there is no explicit operator for conversion; adding 0 to a string
    will force it to be converted to a number (the numeric value will
    be 0 if the string does not represent an integer or floating point
    number); the reverse, converting a number into a string, is done by
    concatenating a null string ("") to it  [the expression (5.75 "")
    evaluates to "5.75"]
 Field 'operator'
    $       prefixing a number or variable with a dollar sign ($)
            causes the appropriate record field to be returned  [($2)
            gives the second field of the record, ($NF) gives the last
            field (since the builtin variable NF is set to the number
            of fields in the current record)]
 Array subscript operator
    ,       multi-dimensional arrays are simulated by using comma (,)
            separated array indices; the actual index is generated by
            replacing commas with the value of builtin SUBSEP, then
            concatenating the expression into a string index
            [comma is also used to separate arguments in function calls
            and user-defined function definitions]
            [comma is *also* used to indicate a range pattern
            in an awk rule]
 Escape 'operator'
    \       In quoted character strings, the backslash (\) character
            causes the following character to be interpreted in a
            special manner  [string "one\ntwo" has an embedded newline
            character (linefeed on VMS, but treated as if it were both
            carriage-return and linefeed); string "\033[" has an ASCII
            'escape' character (which has octal value 033) followed by
            a 'left-bracket' character]
            Backslash is also used in regular expressions
 Redirection operators
    <       Read-from  -- valid with 'getline'
    >       Write-to (create new file)  -- valid with 'print' and
            'printf'
    >>      Append-to (create file if it doesn't already exist)
    |       Pipe-from/to  -- valid with 'getline', 'print', and
            'printf'
4 precedence
Operator precedence, listed from highest to lowest.  Assignment,
conditional, and exponentiation operators group from right to left;
all others group from left to right.  Parentheses may be used to
override the normal order.

    field ($)
    increment (++), decrement (--)
    exponentiation (^, **)
    unary plus (+), unary minus (-), boolean not (!)
    multiplication (*), division (/), remainder (%)
    addition (+), subtraction (-)
    concatenation (no special symbol; implied by context)
    relational (==, !=, <, >=, etc), and redirection (<, >, >>, |)
        Relational and redirection operators have the same precedence
        and use similar symbols; context distinguishes between them
    matching (~, !~)
    array membership ('in')
    boolean and (&&)
    boolean or (||)
    conditional (? :)
    assignment (=, +=, etc)
4 escaped_characters
Inside of a quoted string or constant regular expression, the
backslash (\) character gives special meaning to the character(s)
after it.  Special character letters are case sensitive.

    \\      results in one backslash in the string
    \a      is an 'alert' (<ctrl/G>,
            the ASCII <bell> character)
    \b      is a backspace (BS, <ctrl/H>)
    \f      is a form feed (FF, <ctrl/L>)
    \n      is a 'newline' (<ctrl/J>) [line feed treated as CR+LF]
    \r      is a carriage return (CR, <ctrl/M>) [re-positions at the
            beginning of the current line]
    \t      is a tab (HT, <ctrl/I>)
    \v      is a vertical tab (VT, <ctrl/K>)
    \###    is an arbitrary character, where '###' represents 1 to 3
            octal (ie, 0 thru 7) digits
    \x##    is an alternate arbitrary character, where '##' represents
            1 or more hexadecimal (ie, 0 thru 9 and/or A through F
            and/or a through f) digits; if more than two digits follow,
            the result is undefined; not recognized if POSIX
            compatibility mode is specified.
3 statements
A statement refers to a unit of instruction found in the action part
of an awk rule, and also found in the definition of a function.  The
distinction between action, statement, and expression usually won't
matter to an awk programmer.

Compound statements consist of multiple statements separated by
semicolons or newlines and enclosed within braces ({}).  They are
sometimes referred to as 'blocks'.
4 expressions
An expression such as 'a = 10' or 'n += i++' is a valid statement.

Function invocations such as 'reformat_field($3)' are also valid
statements.
4 if-then-else
A conditional statement in awk uses the same syntax as for the 'C'
programming language:  the 'if' keyword, followed by an expression in
parentheses, followed by a statement--or block of statements enclosed
within braces ({})--which will be executed if the expression is true
but skipped if it's false.  This can optionally be followed by the
'else' keyword and another statement--or block of statements--which
will be executed if (and only if) the expression was false.
5 examples
Simple example showing a statement used to control how many numbers
are printed on a given line.
    if ( ++i <= 10 )            #check whether this would be the 11th
        printf(" %5d", k)       #print on current line if not
    else {
        printf("\n %5d", k)     #print on next line if so
        i = 1                   #and reset the counter
    }

Another example ('next' is described under 'action-controls')
    if ($1 > $2) { print "rejected"; next } else diff = $2 - $1
4 loops
Three types of loop statements are available in awk.  Each uses the
same syntax as 'C'.

The simplest of the three is the 'while' statement.  It consists of
the 'while' keyword, followed by an expression enclosed within
parentheses, followed by a statement--or block of statements in braces
({})--which will be executed if the expression evaluates to true.  The
expression is evaluated before attempting to execute the statement; if
it's true, the statement is executed (the entire block of statements
if there is a block) and then the expression is re-evaluated.

The second type of loop is the do-while loop.  It consists of the 'do'
keyword, followed by a statement (usually a block of statements
enclosed within braces), followed by the 'while' keyword, followed by
a test expression enclosed within parentheses.  The statement--or
block--is always executed at least once.  Then the test expression is
evaluated, and the statement(s) re-executed if the result was true
(followed by re-evaluation of the test, and so on).

The most complex of the three loops is the 'for' statement, and it has
a second variant that is not found in 'C'.  The ordinary for-loop
consists of the 'for' keyword, followed by three semicolon-separated
expressions enclosed within parentheses, followed by a statement or
brace-enclosed block of statements.  The first of the three
expressions is an initialization clause; it is done before starting
the loop.  The second expression is used as a test, just like the
expression in a while-loop.  It is checked before attempting to
execute the statement block, and then re-checked after each execution
(if any) of the block.
The third expression is an 'increment' clause; it is evaluated after
an execution of the statement block and before re-evaluation of the
test (2nd) expression.  Normally, the increment clause will change a
variable used in the test clause, in such a fashion that the test
clause will eventually evaluate to false and cause the loop to finish.

Note to 'C' programmers:  the comma (,) operator commonly used in 'C'
for-loop expressions is not valid in awk.

The awk-specific variant of the for-loop is used for processing
arrays.  Its syntax is 'for' keyword, followed by variable_name 'in'
array_name (where 'var in array' is enclosed in parentheses), followed
by a statement (or block).  Each valid subscript value for the array
in question is successively placed--in no particular order--into the
specified 'index' variable.
5 while_example
# strip fields from the input record until there's nothing left
while (NF > 0) {
    $1 = ""     #this will affect the value of $0
    $0 = $0     #this causes $0 and NF to be re-evaluated
    print
}
5 do_while_example
# This is a variation of the while_example; it gives a slightly
# different display due to the order of operation.
# echo input record until all fields have been stripped
do {
    print       #output $0
    $1 = ""     #this will affect the value of $0
    $0 = $0     #this causes $0 and NF to be re-evaluated
} while (NF > 0)
5 for_example
# echo command line arguments (won't include option switches)
for ( i = 0; i < ARGC; i++ )  print ARGV[i]

# display contents of builtin environment array
for (itm in ENVIRON)
    print itm, ENVIRON[itm]
4 loop-controls
There are two special statements--both from 'C'--for changing the
behavior of loop execution.

The 'continue' statement is useful in a compound (block) statement;
when executed, it effectively skips the rest of the block so that the
increment-expression (only for for-loops) and loop-termination
expression can be re-evaluated.
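
The effect of 'continue' can be sketched with a short for-loop.  This
is only an illustration, assuming a POSIX shell with an awk on the
command path; the wrapper function name print_odds is hypothetical and
is not part of GAWK.

```shell
# Hypothetical helper: print the odd numbers from 1 to 9, one per line.
print_odds() {
    awk 'BEGIN {
        for (i = 1; i <= 9; i++) {
            if (i % 2 == 0)
                continue    # skip the print; i++ and the test still run
            print i
        }
    }'
}

print_odds
```

Because the increment-expression is still evaluated after 'continue',
the loop advances normally and simply omits the even values.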

The 'break' statement, when executed, effectively skips the rest of
the block and also treats the test expression as if it were false
(instead of actually re-evaluating it).  In this case, the
increment-expression of a for-loop is also skipped.

'break' is only allowed within a loop ('for', 'while', or 'do-while').
If 'continue' is used outside of a loop, it is treated like 'next'
(see action-controls).

Inside nested loops, both 'break' and 'continue' only apply to the
innermost loop.
4 action-controls
There are two special statements for controlling statement execution.

The 'next' statement, when executed, causes the rest of the current
action and all further pattern-action rules to be skipped, so that the
next input record will be immediately processed.  This is useful if an
early action knows that the current record will fail all the remaining
patterns; skipping those rules will reduce processing time.  An
extended form, 'next file', is also available.  It causes the
remainder of the current file to be skipped, and then either the next
input file will be processed, if any, or the END action will be
performed.  'next file' is not available in traditional awk.

The 'exit' statement causes GAWK execution to terminate.  All open
files are closed, and no further processing is done.  The END rule, if
any, is executed.  'exit' takes an optional numeric value as an
argument which is used as an exit status value, so that some sort of
indication of why execution has stopped can be passed on to the user's
environment.
4 other_statements
The delete statement is used to remove an element from an array.  The
syntax is 'delete' keyword followed by array name, followed by index
value enclosed in square brackets ([]).

The return statement is used in user-defined functions.  The syntax is
the keyword 'return' optionally followed by a string or numeric
expression.
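
A minimal sketch of 'delete' and 'return' used together, assuming a
POSIX shell with an awk on the command path.  The names
count_after_delete, count, and seen are hypothetical and chosen only
for illustration.

```shell
# Hypothetical helper: delete one array element, then report what is left.
count_after_delete() {
    awk '
    # User-defined function: returns the number of elements in arr.
    # The extra parameters k and n serve as local variables.
    function count(arr,    k, n) {
        n = 0
        for (k in arr)
            n++
        return n
    }
    BEGIN {
        seen["a"] = 1; seen["b"] = 1; seen["c"] = 1
        delete seen["b"]                    # remove a single element
        print count(seen), ("b" in seen)    # remaining count, membership
    }'
}

count_after_delete
```

After the delete, the 'in' test on the removed subscript yields 0 and
the function returns the count of the two remaining elements.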

See also subtopic 'functions IO_functions' for a description of
'print', 'printf', and 'getline'.
3 fields
When an input record is read, it is automatically split into fields
based on the current values of FS (builtin variable defining field
separator expression) and RS (builtin variable defining record
separator character).  The default value of FS is an expression which
matches one or more spaces and tabs; the default for RS is newline.

If the FIELDWIDTHS variable is set to a space separated list of
numbers (as in ``FIELDWIDTHS = "2 3 2"'') then the input is treated as
if it had fixed-width fields of the indicated sizes and the FS value
will be ignored.
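
Field splitting on FS can be sketched as follows.  This is an
illustrative example only, assuming a POSIX shell with an awk on the
command path; the function name split_on_colon and the sample record
are made up.  It sets FS rather than FIELDWIDTHS so that it does not
depend on GAWK-specific features.

```shell
# Hypothetical helper: split one colon-separated record into fields.
split_on_colon() {
    printf 'alpha:beta:gamma\n' |
        awk 'BEGIN { FS = ":" }         # set the field separator first
             { print NF, $2, $NF }      # field count, 2nd field, last field'
}

split_on_colon
```

With FS set to a colon the record has three fields, so the output line
is "3 beta gamma".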