<ol type="a"><li><p>If <b>FS</b> is &lt;space&gt;, skip leading and trailing &lt;blank&gt;s; fields shall be delimited by sets of one or more &lt;blank&gt;s.</p></li><li><p>Otherwise, if <b>FS</b> is any other character <i>c</i>, fields shall be delimited by each single occurrence of <i>c</i>.</p></li></ol></li><li><p>Otherwise, the string value of <b>FS</b> shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.</p></li></ol><p>Except for the <tt>'~'</tt> and <tt>"!~"</tt> operators, and in the <b>gsub</b>, <b>match</b>, <b>split</b>, and <b>sub</b> built-in functions, ERE matching shall be based on input records; that is, record separator characters (the first character of the value of the variable <b>RS</b>, &lt;newline&gt; by default) cannot be embedded in the expression, and no expression shall match the record separator character. If the record separator is not &lt;newline&gt;, &lt;newline&gt;s embedded in the expression can be matched. For the <tt>'~'</tt> and <tt>"!~"</tt> operators, and in those four built-in functions, ERE matching shall be based on text strings; that is, any character (including &lt;newline&gt; and the record separator) can be embedded in the pattern, and an appropriate pattern shall match any character. However, in all <i>awk</i> ERE matching, the use of one or more NUL characters in the pattern, input record, or text string produces undefined results.</p><h5><a name="tag_04_06_13_05"></a>Patterns</h5><p>A <i>pattern</i> is any valid <i>expression</i>, a range specified by two expressions separated by a comma, or one of the two special patterns <b>BEGIN</b> or <b>END</b>.</p><h5><a name="tag_04_06_13_06"></a>Special Patterns</h5><p>The <i>awk</i> utility shall recognize two special patterns, <b>BEGIN</b> and <b>END</b>. 
Each <b>BEGIN</b> pattern shall be matched once and its associated action executed before the first record of input is read (except possibly by use of the <b>getline</b> function - see <a href="#tag_04_06_13_14">Input/Output and General Functions</a> - in a prior <b>BEGIN</b> action) and before command line assignment is done. Each <b>END</b> pattern shall be matched once and its associated action executed after the last record of input has been read. These two patterns shall have associated actions.</p><p><b>BEGIN</b> and <b>END</b> shall not combine with other patterns. Multiple <b>BEGIN</b> and <b>END</b> patterns shall be allowed. The actions associated with the <b>BEGIN</b> patterns shall be executed in the order specified in the program, as are the <b>END</b> actions. An <b>END</b> pattern can precede a <b>BEGIN</b> pattern in a program.</p><p>If an <i>awk</i> program consists of only actions with the pattern <b>BEGIN</b>, and the <b>BEGIN</b> action contains no <b>getline</b> function, <i>awk</i> shall exit without reading its input when the last statement in the last <b>BEGIN</b> action is executed. If an <i>awk</i> program consists of only actions with the pattern <b>END</b> or only actions with the patterns <b>BEGIN</b> and <b>END</b>, the input shall be read before the statements in the <b>END</b> actions are executed.</p><h5><a name="tag_04_06_13_07"></a>Expression Patterns</h5><p>An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result is true, the pattern shall be considered to match, and the associated action (if any) shall be executed. If the result is false, the action shall not be executed.</p><h5><a name="tag_04_06_13_08"></a>Pattern Ranges</h5><p>A pattern range consists of two expressions separated by a comma; in this case, the action shall be performed for all records between a match of the first expression and the following match of the second expression, inclusive. 
At this point, the pattern range can be repeated starting at input records subsequent to the end of the matched range.</p><h5><a name="tag_04_06_13_09"></a>Actions</h5><p>An action is a sequence of statements as shown in the grammar in <a href="#tag_04_06_13_16">Grammar</a>. Any single statement can be replaced by a statement list enclosed in braces. The application shall ensure that statements in a statement list are separated by &lt;newline&gt;s or semicolons. Statements in a statement list shall be executed sequentially in the order that they appear.</p><p>The <i>expression</i> acting as the conditional in an <b>if</b> statement shall be evaluated and if it is non-zero or non-null, the following statement shall be executed; otherwise, if <b>else</b> is present, the statement following the <b>else</b> shall be executed.</p><p>The <b>if</b>, <b>while</b>, <b>do</b>... <b>while</b>, <b>for</b>, <b>break</b>, and <b>continue</b> statements are based on the ISO C standard (see <a href="xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from the ISO C Standard</i></a>), except that the Boolean expressions shall be treated as described in <a href="#tag_04_06_13_02">Expressions in awk</a>, and except in the case of:</p><pre><tt>for (</tt><i>variable</i> <tt>in</tt> <i>array</i><tt>)</tt></pre><p>which shall iterate, assigning each <i>index</i> of <i>array</i> to <i>variable</i> in an unspecified order. The results of adding new elements to <i>array</i> within such a <b>for</b> loop are undefined. If a <b>break</b> or <b>continue</b> statement occurs outside of a loop, the behavior is undefined.</p><p>The <b>delete</b> statement shall remove an individual array element. Thus, the following code deletes an entire array:</p><pre><tt>for (index in array) delete array[index]</tt></pre><p>The <b>next</b> statement shall cause all further processing of the current input record to be abandoned. 
The behavior is undefined if a <b>next</b> statement appears or is invoked in a <b>BEGIN</b> or <b>END</b> action.</p><p>The <b>exit</b> statement shall invoke all <b>END</b> actions in the order in which they occur in the program source and then terminate the program without reading further input. An <b>exit</b> statement inside an <b>END</b> action shall terminate the program without further execution of <b>END</b> actions. If an expression is specified in an <b>exit</b> statement, its numeric value shall be the exit status of <i>awk</i>, unless subsequent errors are encountered or a subsequent <b>exit</b> statement with an expression is executed.</p><h5><a name="tag_04_06_13_10"></a>Output Statements</h5><p>Both <b>print</b> and <b>printf</b> statements shall write to standard output by default. The output shall be written to the location specified by <i>output_redirection</i> if one is supplied, as follows:</p><pre><tt>&gt;</tt> <i>expression</i>
<tt>&gt;&gt;</tt> <i>expression</i>
<tt>|</tt> <i>expression</i></pre><p>In all cases, the <i>expression</i> shall be evaluated to produce a string that is used as a pathname into which to write (for <tt>'&gt;'</tt> or <tt>"&gt;&gt;"</tt>) or as a command to be executed (for <tt>'|'</tt>). Using the first two forms, if the file of that name is not currently open, it shall be opened, creating it if necessary and using the first form, truncating the file. The output then shall be appended to the file. As long as the file remains open, subsequent calls in which <i>expression</i> evaluates to the same string value shall simply append output to the file. The file remains open until the <b>close</b> function (see <a href="#tag_04_06_13_14">Input/Output and General Functions</a>) is called with an expression that evaluates to the same string value.</p><p>The third form shall write output onto a stream piped to the input of a command. The stream shall be created if no stream is currently open with the value of <i>expression</i> as its command name. 
The stream created shall be equivalent to one created by a call to the <a href="../functions/popen.html"><i>popen</i>()</a> function defined in the System Interfaces volume of IEEE Std 1003.1-2001 with the value of <i>expression</i> as the <i>command</i> argument and a value of <i>w</i> as the <i>mode</i> argument. As long as the stream remains open, subsequent calls in which <i>expression</i> evaluates to the same string value shall write output to the existing stream. The stream shall remain open until the <b>close</b> function (see <a href="#tag_04_06_13_14">Input/Output and General Functions</a>) is called with an expression that evaluates to the same string value. At that time, the stream shall be closed as if by a call to the <a href="../functions/pclose.html"><i>pclose</i>()</a> function defined in the System Interfaces volume of IEEE Std 1003.1-2001.</p><p>As described in detail by the grammar in <a href="#tag_04_06_13_16">Grammar</a>, these output statements shall take a comma-separated list of <i>expression</i>s referred to in the grammar by the non-terminal symbols <b>expr_list</b>, <b>print_expr_list</b>, or <b>print_expr_list_opt</b>. This list is referred to here as the <i>expression list</i>, and each member is referred to as an <i>expression argument</i>.</p><p>The <b>print</b> statement shall write the value of each expression argument onto the indicated output stream separated by the current output field separator (see variable <b>OFS</b> above), and terminated by the output record separator (see variable <b>ORS</b> above). All expression arguments shall be taken as strings, being converted if necessary; this conversion shall be as described in <a href="#tag_04_06_13_02">Expressions in awk</a>, with the exception that the <b>printf</b> format in <b>OFMT</b> shall be used instead of the value in <b>CONVFMT</b>. 
An empty expression list shall stand for the whole input record ($0).</p><p>The <b>printf</b> statement shall produce output based on a notation similar to the File Format Notation used to describe file formats in this volume of IEEE Std 1003.1-2001 (see the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a>). Output shall be produced as specified with the first <i>expression</i> argument as the string <i>format</i> and subsequent <i>expression</i> arguments as the strings <i>arg1</i> to <i>argn</i>, inclusive, with the following exceptions:</p><ol><li><p>The <i>format</i> shall be an actual character string rather than a graphical representation. Therefore, it cannot contain empty character positions. The &lt;space&gt; in the <i>format</i> string, in any context other than a <i>flag</i> of a conversion specification, shall be treated as an ordinary character that is copied to the output.</p></li><li><p>If the character set contains a <tt>'<img src="../images/delta.gif" border="0">'</tt> character and that character appears in the <i>format</i> string, it shall be treated as an ordinary character that is copied to the output.</p></li><li><p>The <i>escape sequences</i> beginning with a backslash character shall be treated as sequences of ordinary characters that are copied to the output. Note that these same sequences shall be interpreted lexically by <i>awk</i> when they appear in literal strings, but they shall not be treated specially by the <b>printf</b> statement.</p></li><li><p>A <i>field width</i> or <i>precision</i> can be specified as the <tt>'*'</tt> character instead of a digit string. 
In this case the next argument from the expression list shall be fetched and its numeric value taken as the field width or precision.</p></li><li><p>The implementation shall not precede or follow output from the <tt>d</tt> or <tt>u</tt> conversion specifier characters with &lt;blank&gt;s not specified by the <i>format</i> string.</p></li><li><p>The implementation shall not precede output from the <tt>o</tt> conversion specifier character with leading zeros not specified by the <i>format</i> string.</p></li><li><p>For the <tt>c</tt> conversion specifier character: if the argument has a numeric value, the character whose encoding is that value shall be output. If the value is zero or is not the encoding of any character in the character set, the behavior is undefined. If the argument does not have a numeric value, the first character of the string value shall be output; if the string does not contain any characters, the behavior is undefined.</p></li><li><p>For each conversion specification that consumes an argument, the next expression argument shall be evaluated. 
With the exception of the <tt>c</tt> conversion specifier character, the value shall be converted (according to the rules specified in <a href="#tag_04_06_13_02">Expressions in awk</a>) to the appropriate type for the conversion specification.</p></li><li><p>If there are insufficient expression arguments to satisfy all the conversion specifications in the <i>format</i> string, the behavior is undefined.</p></li><li><p>If any character sequence in the <i>format</i> string begins with a <tt>'%'</tt> character, but does not form a valid conversion specification, the behavior is unspecified.</p></li></ol><p>Both <b>print</b> and <b>printf</b> can output at least {LINE_MAX} bytes.</p><h5><a name="tag_04_06_13_11"></a>Functions</h5><p>The <i>awk</i> language has a variety of built-in functions: arithmetic, string, input/output, and general.</p><h5><a name="tag_04_06_13_12"></a>Arithmetic Functions</h5><p>The arithmetic functions, except for <b>int</b>, shall be based on the ISO C standard (see <a href="xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from the ISO C Standard</i></a>). The behavior is undefined in cases where the ISO C standard specifies that an error be returned or that the behavior is undefined. 
Although the grammar (see <a href="#tag_04_06_13_16">Grammar</a>) permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as optional in the following list (by displaying them within the <tt>"[]"</tt> brackets), such use is undefined.</p><dl compact><dt><b>atan2</b>(<i>y</i>,<i>x</i>)</dt><dd>Return arctangent of <i>y</i>/<i>x</i> in radians in the range [-<img src="../images/pi.gif" border="0">,<img src="../images/pi.gif" border="0">].</dd><dt><b>cos</b>(<i>x</i>)</dt><dd>Return cosine of <i>x</i>, where <i>x</i> is in radians.</dd><dt><b>sin</b>(<i>x</i>)</dt><dd>Return sine of <i>x</i>, where <i>x</i> is in radians.</dd><dt><b>exp</b>(<i>x</i>)</dt><dd>Return the exponential function of <i>x</i>.</dd><dt><b>log</b>(<i>x</i>)</dt><dd>Return the natural logarithm of <i>x</i>.</dd><dt><b>sqrt</b>(<i>x</i>)</dt><dd>Return the square root of <i>x</i>.</dd><dt><b>int</b>(<i>x</i>)</dt><dd>Return the argument truncated to an integer. Truncation shall be toward 0 when <i>x</i>&gt;0.</dd><dt><b>rand</b>()</dt><dd>Return a random number <i>n</i>, such that 0&lt;=<i>n</i>&lt;1.</dd><dt><b>srand</b>(<b>[</b><i>expr</i><b>]</b>)</dt><dd>Set the seed value for <i>rand</i> to <i>expr</i> or use the time of day if <i>expr</i> is omitted. The previous seed value shall be returned.</dd></dl><h5><a name="tag_04_06_13_13"></a>String Functions</h5><p>The string functions in the following list shall be supported. 
Although the grammar (see <a href="#tag_04_06_13_16">Grammar</a>) permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as optional in the following list (by displaying them within the <tt>"[]"</tt> brackets), such use is undefined.</p><dl compact><dt><b>gsub</b>(<i>ere</i>, <i>repl</i><b>[</b>, <i>in</i><b>]</b>)</dt><dd>Behave like <b>sub</b> (see below), except that it shall replace all occurrences of the regular expression (like the <a href="../utilities/ed.html"><i>ed</i></a> utility global substitute) in $0 or in the <i>in</i> argument, when specified.</dd><dt><b>index</b>(<i>s</i>, <i>t</i>)</dt><dd>Return the position, in characters, numbering from 1, in string <i>s</i> where string <i>t</i> first occurs, or zero if it does not occur at all.</dd><dt><b>length[</b>(<b>[</b><i>s</i><b>]</b>)<b>]</b></dt><dd>Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.</dd><dt><b>match</b>(<i>s</i>, <i>ere</i>)</dt><dd>Return the position, in characters, numbering from 1, in string <i>s</i> where the extended regular expression <i>ere</i> occurs, or zero if it does not occur at all. RSTART shall be set to the starting position (which is the same as the returned value), zero if no match is found; RLENGTH shall be set to the length of the matched string, -1 if no match is found.</dd><dt><b>split</b>(<i>s</i>, <i>a</i><b>[</b>, <i>fs</i><b>]</b>)</dt><dd>Split the string <i>s</i> into array elements <i>a</i>[1], <i>a</i>[2], ..., <i>a</i>[<i>n</i>], and return <i>n</i>. All elements of the array shall be deleted before the split is performed. The separation shall be done with the ERE <i>fs</i> or with the field separator <b>FS</b> if <i>fs</i> is not given. 
Each array element shall have a string value when created and, if appropriate, the array element shall be considered a numeric string (see <a href="#tag_04_06_13_02">Expressions in awk</a>). The effect of a null string as the value of <i>fs</i> is unspecified.</dd><dt><b>sprintf</b>(<i>fmt</i>, <i>expr</i>, <i>expr</i>, ...)</dt><dd>Format the expressions according to the <b>printf</b> format given by <i>fmt</i> and return the resulting string.</dd><dt><b>sub</b>(<i>ere</i>, <i>repl</i><b>[</b>, <i>in</i><b>]</b>)</dt><dd>Substitute the string <i>repl</i> in place of the first instance of the extended regular expression <i>ERE</i> in string <i>in</i> and return the number of substitutions. An ampersand (<tt>'&amp;'</tt>) appearing in the string <i>repl</i> shall be replaced by the string from <i>in</i> that matches the ERE. An ampersand preceded with a backslash (<tt>'\'</tt>) shall be interpreted as the literal ampersand character. An occurrence of two consecutive backslashes shall be interpreted as just a single literal backslash character. Any other occurrence of a backslash (for example, preceding any other character) shall be treated as a literal backslash character. Note that if <i>repl</i> is a string literal (the lexical token <b>STRING</b>; see <a href="#tag_04_06_13_16">Grammar</a>), the handling of the ampersand character occurs after any lexical processing, including any lexical backslash escape sequence processing. If <i>in</i> is specified and it is not an lvalue (see <a href="#tag_04_06_13_02">Expressions in awk</a>), the behavior is undefined. If <i>in</i> is omitted, <i>awk</i> shall use the current record ($0) in its place.</dd><dt><b>substr</b>(<i>s</i>, <i>m</i><b>[</b>, <i>n</i><b>]</b>)</dt><dd>Return the at most <i>n</i>-character substring of <i>s</i> that begins at position <i>m</i>, numbering from 1. 
If <i>n</i> is omitted, or if <i>n</i> specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string <i>s</i>.</dd><dt><b>tolower</b>(<i>s</i>)</dt><dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is an uppercase letter specified to have a <b>tolower</b> mapping by the <i>LC_CTYPE</i> category of the current locale shall be replaced in the returned string by the lowercase letter specified by the mapping. Other characters in <i>s</i> shall be unchanged in the returned string.</dd><dt><b>toupper</b>(<i>s</i>)</dt><dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is a lowercase letter specified to have a <b>toupper</b> mapping by the <i>LC_CTYPE</i> category of the current locale is replaced in the returned string by