awk -f hello
@end example

@noindent
Self-contained @code{awk} scripts are useful when you want to write a
program which users can invoke without knowing that the program is
written in @code{awk}.

@cindex shell scripts
@cindex scripts, shell
If your system does not support the @samp{#!} mechanism, you can get a
similar effect using a regular shell script. It would look something
like this:

@example
: The colon makes sure this script is executed by the Bourne shell.
awk '@var{program}' "$@@"
@end example

Using this technique, it is @emph{vital} to enclose the @var{program} in
single quotes to protect it from interpretation by the shell. If you
omit the quotes, only a shell wizard can predict the results.

The @samp{"$@@"} causes the shell to forward all the command line
arguments to the @code{awk} program, without interpretation. The first
line, which starts with a colon, is used so that this shell script will
work even if invoked by a user who uses the C shell.
@c Someday: (See @cite{The Bourne Again Shell}, by ??.)

@node Comments, Statements/Lines, Running gawk, Getting Started
@section Comments in @code{awk} Programs
@cindex @samp{#}
@cindex comments
@cindex use of comments
@cindex documenting @code{awk} programs
@cindex programs, documenting

A @dfn{comment} is some text that is included in a program for the sake
of human readers, and that is not really part of the program. Comments
can explain what the program does, and how it works. Nearly all
programming languages have provisions for comments, because programs are
typically hard to understand without their extra help.

In the @code{awk} language, a comment starts with the sharp sign
character, @samp{#}, and continues to the end of the line. The
@code{awk} language ignores the rest of a line following a sharp sign.
For example, we could have put the following into @file{th-prog}:@refill

@smallexample
# This program finds records containing the pattern @samp{th}. This is how
# you continue comments on additional lines.
/th/
@end smallexample

You can put comment lines into keyboard-composed throw-away @code{awk}
programs also, but this usually isn't very useful; the purpose of a
comment is to help you or another person understand the program at
a later time.@refill

@node Statements/Lines, When, Comments, Getting Started
@section @code{awk} Statements versus Lines

Most often, each line in an @code{awk} program is a separate statement or
separate rule, like this:

@example
awk '/12/  @{ print $0 @}
     /21/  @{ print $0 @}' BBS-list inventory-shipped
@end example

But sometimes statements can be more than one line, and lines can
contain several statements. You can split a statement into multiple
lines by inserting a newline after any of the following:@refill

@example
,       @{       ?       :       ||       &&       do       else
@end example

@noindent
A newline at any other point is considered the end of the statement.
(Splitting lines after @samp{?} and @samp{:} is a minor @code{gawk}
extension. The @samp{?} and @samp{:} referred to here are the
three-operand conditional expression described in
@ref{Conditional Exp, ,Conditional Expressions}.)@refill

@cindex backslash continuation
@cindex continuation of lines
If you would like to split a single statement into two lines at a point
where a newline would terminate it, you can @dfn{continue} it by ending the
first line with a backslash character, @samp{\}. This is allowed
absolutely anywhere in the statement, even in the middle of a string or
regular expression. For example:

@example
awk '/This program is too long, so continue it\
 on the next line/ @{ print $1 @}'
@end example

@noindent
We have generally not used backslash continuation in the sample programs in
this manual. Since in @code{gawk} there is no limit on the length of a line,
it is never strictly necessary; it just makes programs prettier.
We have preferred to make them even more pretty by keeping the statements
short. Backslash continuation is most useful when your @code{awk} program is
in a separate source file, instead of typed in on the command line. You should
also note that many @code{awk} implementations are more picky about where
you may use backslash continuation. For maximal portability of your @code{awk}
programs, it is best not to split your lines in the middle of a regular
expression or a string.@refill

@strong{Warning: backslash continuation does not work as described above
with the C shell.} Continuation with backslash works for @code{awk}
programs in files, and also for one-shot programs @emph{provided} you
are using a @sc{posix}-compliant shell, such as the Bourne shell or the
Bourne-again shell. But the C shell used on Berkeley Unix behaves
differently! There, you must use two backslashes in a row, followed by
a newline.@refill

@cindex multiple statements on one line
When @code{awk} statements within one rule are short, you might want to put
more than one of them on a line. You do this by separating the statements
with a semicolon, @samp{;}.
This also applies to the rules themselves.
Thus, the previous program could have been written:@refill

@example
/12/ @{ print $0 @} ; /21/ @{ print $0 @}
@end example

@noindent
@strong{Note:} the requirement that rules on the same line must be
separated with a semicolon is a recent change in the @code{awk}
language; it was done for consistency with the treatment of statements
within an action.

@node When, , Statements/Lines, Getting Started
@section When to Use @code{awk}

@cindex when to use @code{awk}
@cindex applications of @code{awk}
You might wonder how @code{awk} can be useful for you. Using additional
utility programs, more advanced patterns, field separators, arithmetic
statements, and other selection criteria, you can produce much more
complex output.
The @code{awk} language is very useful for producing
reports from large amounts of raw data, such as summarizing information
from the output of other utility programs like @code{ls}.
(@xref{More Complex, ,A More Complex Example}.)

Programs written with @code{awk} are usually much smaller than they would
be in other languages. This makes @code{awk} programs easy to compose and
use. Often @code{awk} programs can be quickly composed at your terminal,
used once, and thrown away. Since @code{awk} programs are interpreted, you
can avoid the usually lengthy edit-compile-test-debug cycle of software
development.

Complex programs have been written in @code{awk}, including a complete
retargetable assembler for 8-bit microprocessors (@pxref{Glossary}, for
more information) and a microcode assembler for a special purpose Prolog
computer. However, @code{awk}'s capabilities are strained by tasks of
such complexity.

If you find yourself writing @code{awk} scripts of more than, say, a few
hundred lines, you might consider using a different programming
language. Emacs Lisp is a good choice if you need sophisticated string
or pattern matching capabilities. The shell is also good at string and
pattern matching; in addition, it allows powerful use of the system
utilities. More conventional languages, such as C, C++, and Lisp, offer
better facilities for system programming and for managing the complexity
of large programs. Programs in these languages may require more lines
of source code than the equivalent @code{awk} programs, but they are
easier to maintain and usually run more efficiently.@refill

@node Reading Files, Printing, Getting Started, Top
@chapter Reading Input Files

@cindex reading files
@cindex input
@cindex standard input
@vindex FILENAME
In the typical @code{awk} program, all input is read either from the
standard input (by default the keyboard, but often a pipe from another
command) or from files whose names you specify on the @code{awk} command
line.
If you specify input files, @code{awk} reads them in order, reading
all the data from one before going on to the next. The name of the current
input file can be found in the built-in variable @code{FILENAME}
(@pxref{Built-in Variables}).@refill

The input is read in units called records, and processed by the
rules one record at a time. By default, each record is one line. Each
record is split automatically into fields, to make it more
convenient for a rule to work on its parts.

On rare occasions you will need to use the @code{getline} command,
which can do explicit input from any number of files
(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill

@menu
* Records::              Controlling how data is split into records.
* Fields::               An introduction to fields.
* Non-Constant Fields::  Non-constant Field Numbers.
* Changing Fields::      Changing the Contents of a Field.
* Field Separators::     The field separator and how to change it.
* Constant Size::        Reading constant width data.
* Multiple Line::        Reading multi-line records.
* Getline::              Reading files under explicit program control using the @code{getline} function.
* Close Input::          Closing an input file (so you can read from the beginning once more).
@end menu

@node Records, Fields, Reading Files, Reading Files
@section How Input is Split into Records

@cindex record separator
The @code{awk} language divides its input into records and fields.
Records are separated by a character called the @dfn{record separator}.
By default, the record separator is the newline character, defining
a record to be a single line of text.@refill

@iftex
@cindex changing the record separator
@end iftex
@vindex RS
Sometimes you may want to use a different character to separate your
records. You can use a different character by changing the built-in
variable @code{RS}. The value of @code{RS} is a string that says how
to separate records; the default value is @code{"\n"}, the string containing
just a newline character.
This is why records are, by default, single lines.

@code{RS} can have any string as its value, but only the first character
of the string is used as the record separator. The other characters are
ignored. @code{RS} is exceptional in this regard; @code{awk} uses the
full value of all its other built-in variables.@refill

@ignore
Someday this should be true!

The value of @code{RS} is not limited to a one-character string. It can
be any regular expression (@pxref{Regexp, ,Regular Expressions as Patterns}).
In general, each record
ends at the next string that matches the regular expression; the next
record starts at the end of the matching string. This general rule is
actually at work in the usual case, where @code{RS} contains just a
newline: a record ends at the beginning of the next matching string (the
next newline in the input) and the following record starts just after
the end of this string (at the first character of the following line).
The newline, since it matches @code{RS}, is not part of either record.@refill
@end ignore

You can change the value of @code{RS} in the @code{awk} program with the
assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
The new record-separator character should be enclosed in quotation marks to make
a string constant. Often the right time to do this is at the beginning
of execution, before any input has been processed, so that the very
first record will be read with the proper separator. To do this, use
the special @code{BEGIN} pattern
(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}). For
example:@refill

@example
awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
@end example

@noindent
changes the value of @code{RS} to @code{"/"}, before reading any input.
This is a string whose first character is a slash; as a result, records
are separated by slashes. Then the input file is read, and the second
rule in the @code{awk} program (the action with no pattern) prints each
record.
Since each @code{print} statement adds a newline at the end of
its output, the effect of this @code{awk} program is to copy the input
with each slash changed to a newline.

Another way to change the record separator is on the command line,
using the variable-assignment feature
(@pxref{Command Line, ,Invoking @code{awk}}).@refill

@example
awk '@{ print $0 @}' RS="/" BBS-list
@end example

@noindent
This sets @code{RS} to @samp{/} before processing @file{BBS-list}.

Reaching the end of an input file terminates the current input record,
even if the last character in the file is not the character in @code{RS}.

@ignore
@c merge the preceding paragraph and this stuff into one paragraph
@c and put it in an `expert info' section.
This produces correct behavior in the vast majority of cases, although
the following (extreme) pipeline prints a surprising @samp{1}. (There
is one field, consisting of a newline.)

@example
echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
@end example
@end ignore

The empty string, @code{""} (a string of no characters), has a special meaning
as the value of @code{RS}: it means that records are separated only
by blank lines. @xref{Multiple Line, ,Multiple-Line Records}, for more details.

@cindex number of records, @code{NR} or @code{FNR}
@vindex NR
@vindex FNR
The @code{awk} utility keeps track of the number of records that have
been read so far from the current input file. This value is stored in a
built-in variable called @code{FNR}. It is reset to zero when a new
file is started. Another built-in variable, @code{NR}, is the total
number of input records read so far from all files. It starts at zero
but is never automatically reset to