.IT e
logarithm,
exponential,
and integer part of their respective arguments.
.PP
The name of one of these built-in functions,
without argument or parentheses,
stands for the value of the function on the
whole record.
The program
.P1
length < 10 || length > 20
.P2
prints lines whose length
is less than 10 or greater
than 20.
.PP
The function
.UL substr(s,\ m,\ n)
produces the substring of
.UL s
that begins at position
.UL m
(origin 1)
and is at most
.UL n
characters long.
If
.UL n
is omitted, the substring goes to the end of
.UL s .
The function
.UL index(s1,\ s2)
returns the position where the string
.UL s2
occurs in
.UL s1 ,
or zero if it does not.
.PP
The function
.UL sprintf(f,\ e1,\ e2,\ ...)
produces the value of the expressions
.UL e1 ,
.UL e2 ,
etc.,
in the
.UL printf
format specified by
.UL f .
Thus, for example,
.P1
x = sprintf("%8.2f %10ld", $1, $2)
.P2
sets
.UL x
to the string produced by formatting
the values of
.UL $1
and
.UL $2 .
.NH 2
Variables, Expressions, and Assignments
.PP
.IT Awk
variables take on numeric (floating point)
or string values according to context.
For example, in
.P1
x = 1
.P2
.UL x
is clearly a number, while in
.P1
x = "smith"
.P2
it is clearly a string.
Strings are converted to numbers and
vice versa whenever context demands it.
For instance,
.P1
x = "3" + "4"
.P2
assigns 7 to
.UL x .
Strings which cannot be interpreted
as numbers in a numerical context
will generally have numeric value zero,
but it is unwise to count on this behavior.
.PP
By default, variables (other than built-ins) are initialized to the null string,
which has numerical value zero;
this eliminates the need for most
.UL BEGIN
sections.
For example, the sums of the first two fields can be computed by
.P1
      { s1 += $1; s2 += $2 }
END   { print s1, s2 }
.P2
.PP
Arithmetic is done internally in floating point.
The arithmetic operators are
.UL + ,
.UL \- ,
.UL \(** ,
.UL / ,
and
.UL %
(mod).
The C increment
.UL ++
and
decrement
.UL \-\-
operators are also available,
and so are the assignment operators
.UL += ,
.UL \-= ,
.UL *= ,
.UL /= ,
and
.UL %= .
These operators may all be used in
expressions.
.NH 2
Field Variables
.PP
Fields in
.IT awk
share essentially all of the properties of variables;
they may be used in arithmetic or string operations,
and may be assigned to.
Thus one can
replace the first field with a sequence number like this:
.P1
{ $1 = NR; print }
.P2
or
accumulate two fields into a third, like this:
.P1
{ $1 = $2 + $3; print $0 }
.P2
or assign a string to a field:
.P1
{ if ($3 > 1000)
        $3 = "too big"
  print
}
.P2
which replaces the third field by ``too big'' when it is,
and in any case prints the record.
.PP
Field references may be numerical expressions,
as in
.P1
{ print $i, $(i+1), $(i+n) }
.P2
Whether a field is deemed numeric or string depends on context;
in ambiguous cases like
.P1
if ($1 == $2) ...
.P2
fields are treated as strings.
.PP
Each input line is split into fields automatically as necessary.
It is also possible to split any variable or string
into fields:
.P1
n = split(s, array, sep)
.P2
splits the
string
.UL s
into
.UL array[1] ,
\&...,
.UL array[n] .
The number of elements found is returned.
If the
.UL sep
argument is provided, it is used as the field separator;
otherwise
.UL FS
is used as the separator.
.NH 2
String Concatenation
.PP
Strings may be concatenated.
For example
.P1
length($1 $2 $3)
.P2
returns the length of the first three fields.
Or in a
.UL print
statement,
.P1
print $1 " is " $2
.P2
prints
the two fields separated by `` is ''.
Variables and numeric expressions may also appear in concatenations.
.NH 2
Arrays
.PP
Array elements are not declared;
they spring into existence by being mentioned.
Subscripts may have
.ul
any
non-null
value, including non-numeric strings.
As an example of a conventional numeric subscript,
the statement
.P1
x[NR] = $0
.P2
assigns the current input record to
the
.UL NR -th
element of the array
.UL x .
In fact, it is possible in principle (though perhaps slow)
to process the entire input in a random order with the
.IT awk
program
.P1
      { x[NR] = $0 }
END   { \fI... program ...\fP }
.P2
The first action merely records each input line in
the array
.UL x .
.PP
Array elements may be named by non-numeric values,
which gives
.IT awk
a capability rather like the associative memory of
Snobol tables.
Suppose the input contains fields with values like
.UL apple ,
.UL orange ,
etc.
Then the program
.P1
/apple/   { x["apple"]++ }
/orange/  { x["orange"]++ }
END       { print x["apple"], x["orange"] }
.P2
increments counts for the named array elements,
and prints them at the end of the input.
.NH 2
Flow-of-Control Statements
.PP
.IT Awk
provides the basic flow-of-control statements
.UL if-else ,
.UL while ,
.UL for ,
and statement grouping with braces, as in C.
We showed the
.UL if
statement in section 3.3 without describing it.
The condition in parentheses is evaluated;
if it is true, the statement following the
.UL if
is done.
The
.UL else
part is optional.
.PP
The
.UL while
statement is exactly like that of C.
For example, to print all input fields one per line,
.P1
i = 1
while (i <= NF) {
        print $i
        ++i
}
.P2
.PP
The
.UL for
statement is also exactly that of C:
.P1
for (i = 1; i <= NF; i++)
        print $i
.P2
does the same job as the
.UL while
statement above.
.PP
There is an alternate form of the
.UL for
statement which is suited for accessing the
elements of an associative array:
.P1
for (i in array)
        \fIstatement\f3
.P2
does
.ul
statement
with
.UL i
set in turn to each element of
.UL array .
The elements are accessed in an apparently random order.
Chaos will ensue if
.UL i
is altered, or if any new elements are
accessed during the loop.
.PP
The expression in the condition part of an
.UL if ,
.UL while
or
.UL for
can include relational operators like
.UL < ,
.UL <= ,
.UL > ,
.UL >= ,
.UL ==
(``is equal to''),
and
.UL !=
(``not equal to'');
regular expression matches with the match operators
.UL ~
and
.UL !~ ;
the logical operators
.UL \||\|| ,
.UL && ,
and
.UL ! ;
and of course parentheses for grouping.
.PP
The
.UL break
statement causes an immediate exit
from an enclosing
.UL while
or
.UL for ;
the
.UL continue
statement
causes the next iteration to begin.
.PP
The statement
.UL next
causes
.IT awk
to skip immediately to
the next record and begin scanning the patterns from the top.
The statement
.UL exit
causes the program to behave as if the end of the input
had occurred.
.PP
Comments may be placed in
.IT awk
programs:
they begin with the character
.UL #
and end with the end of the line,
as in
.P1
print x, y      # this is a comment
.P2
.NH
Design
.PP
The
.UX
system
already provides several programs that
operate by passing input through a
selection mechanism.
.IT Grep ,
the first and simplest, merely prints all lines which
match a single specified pattern.
.IT Egrep
provides more general patterns, i.e., regular expressions
in full generality;
.IT fgrep
searches for a set of keywords with a particularly fast algorithm.
.IT Sed\|
.[
unix programm manual
.]
provides most of the editing facilities of
the editor
.IT ed ,
applied to a stream of input.
None of these programs provides
numeric capabilities,
logical relations,
or variables.
.PP
.IT Lex\|
.[
lesk lexical analyzer cstr
.]
provides general regular expression recognition capabilities,
and, by serving as a C program generator,
is essentially open-ended in its capabilities.
The use of
.IT lex ,
however, requires a knowledge of C programming,
and a
.IT lex
program must be compiled and loaded before use,
which discourages its use for one-shot applications.
.PP
.IT Awk
is an attempt
to fill in another part of the matrix of possibilities.
It
provides general regular expression capabilities
and an implicit input/output loop.
But it also provides convenient numeric processing,
variables,
more general selection,
and control flow in the actions.
It
does not require compilation or a knowledge of C.
Finally,
.IT awk
provides
a convenient way to access fields within lines;
it is unique in this respect.
.PP
.IT Awk
also tries to integrate strings and numbers
completely,
by treating all
quantities as both string and numeric,
deciding which representation is appropriate
as late as possible.
In most cases the user can simply ignore the differences.
.PP
Most of the effort in developing
.I awk
went into deciding what
.I awk
should or should not do
(for instance, it doesn't do string substitution)
and what the syntax should be
(no explicit operator for concatenation)
rather
than into writing or debugging the code.
We have tried
to make the syntax powerful
but easy to use and well adapted
to scanning files.
For example,
the absence of declarations, together with implicit initialization,
while probably a bad idea for a general-purpose programming language,
is desirable in a language
that is meant to be used for tiny programs
that may even be composed on the command line.
.PP
In practice,
.IT awk
usage seems to fall into two broad categories.
One is what might be called ``report generation'' \(em
processing an input to extract counts,
sums, sub-totals, etc.
This also includes the writing of trivial
data validation programs,
such as verifying that a field contains only numeric information
or that certain delimiters are properly balanced.
The combination of textual and numeric processing is invaluable here.
.PP
A second area of use is as a data transformer,
converting data from the form produced by one program
into that expected by another.
The simplest examples merely select fields, perhaps with rearrangements.
.NH
Implementation
.PP
The actual implementation of
.IT awk
uses the language development tools available
on the
.UC UNIX
operating system.
The grammar is specified with
.IT yacc ;
.[
yacc johnson cstr
.]
the lexical analysis is done by
.IT lex ;
the regular expression recognizers are
deterministic finite automata
constructed directly from the expressions.
An
.IT awk
program is translated into a parse tree which is then directly executed
by a simple interpreter.
.PP
.IT Awk
was designed for ease of use rather than processing speed;
the delayed evaluation of variable types
and the necessity to break input
into fields make high
speed difficult to achieve in any case.
Nonetheless,
the program has not proven to be unworkably slow.
.PP
Table I below shows the execution (user + system) time
on a PDP-11/70 of
the
.UC UNIX
programs
.IT wc ,
.IT grep ,
.IT egrep ,
.IT fgrep ,
.IT sed ,
.IT lex ,
and
.IT awk
on the following simple tasks:
.IP "\ \ 1."
count the number of lines.
.IP "\ \ 2."
print all lines containing ``doug''.
.IP "\ \ 3."
print all lines containing ``doug'', ``ken'' or ``dmr''.
.IP "\ \ 4."
print the third field of each line.
.IP "\ \ 5."
print the third and second fields of each line, in that order.
.IP "\ \ 6."
append all lines containing ``doug'', ``ken'', and ``dmr''
to files ``jdoug'', ``jken'', and ``jdmr'', respectively.
.IP "\ \ 7."
print each line prefixed by ``line-number\ :\ ''.
.IP "\ \ 8."
sum the fourth column of a table.
.LP
The program
.IT wc
merely counts words, lines and characters in its input;
we have already mentioned the others.
In all cases the input was a file containing
10,000 lines
as created by the
command
.IT "ls \-l" ;
each line has the form
.P1
-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx
.P2
The total length of this input is
452,960 characters.
Times for
.IT lex
do not include compile or load.
.PP
As might be expected,
.IT awk
is not as fast as the specialized tools
.IT wc ,
.IT sed ,
or the programs in the
.IT grep
family,
but
is faster than the more general tool
.IT lex .
In all cases, the tasks were
about as easy to express as
.IT awk
programs
as programs in these other languages;
tasks involving fields were
considerably easier to express as
.IT awk
programs.
Some of the test programs are shown in
.IT awk ,
.IT sed
and
.IT lex .
.[
$LIST$
.]
.1C
.TS
center;
c c c c c c c c c
c c c c c c c c c
c|n|n|n|n|n|n|n|n|.
				Task
Program	1	2	3	4	5	6	7	8
_
\fIwc\fR	8.6
\fIgrep\fR	11.7	13.1
\fIegrep\fR	6.2	11.5	11.6
\fIfgrep\fR	7.7	13.8	16.1
\fIsed\fR	10.2	11.6	15.8	29.0	30.5	16.1
\fIlex\fR	65.1	150.1	144.2	67.7	70.3	104.0	81.7	92.8
\fIawk\fR	15.0	25.6	29.9	33.3	38.9	46.4	71.4	31.1
_
.TE
.sp
.ce
\fBTable I.\fR Execution Times of Programs. (Times are in sec.)
.sp 2
.2C
.PP
The programs for some of these jobs are shown below.
The
.IT lex
programs are generally too long to show.
.LP
AWK:
.LP
.P1
1.  END {print NR}
.P2
.P1
2.  /doug/
.P2
.P1
3.  /ken|doug|dmr/
.P2
.P1
4.  {print $3}
.P2
.P1
5.  {print $3, $2}
.P2
.P1
6.  /ken/   {print >"jken"}
    /doug/  {print >"jdoug"}
    /dmr/   {print >"jdmr"}
.P2
.P1
7.  {print NR ": " $0}
.P2
.P1
8.      {sum = sum + $4}
    END {print sum}
.P2
.LP
SED:
.LP
.P1
1.  $=
.P2
.P1
2.  /doug/p
.P2
.P1
3.  /doug/p
    /doug/d
    /ken/p
    /ken/d
    /dmr/p
    /dmr/d
.P2
.P1
4.  /[^ ]* [ ]*[^ ]* [ ]*\e([^ ]*\e) .*/s//\e1/p
.P2
.P1
5.  /[^ ]* [ ]*\e([^ ]*\e) [ ]*\e([^ ]*\e) .*/s//\e2 \e1/p
.P2
.P1
6.  /ken/w jken
    /doug/w jdoug
    /dmr/w jdmr
.P2
.LP
LEX:
.LP
.P1
1.  %{
    int i;
    %}
    %%
    \en  i++;
    .   ;
    %%
    yywrap() {
        printf("%d\en", i);
    }
.P2
.P1
2.  %%
    ^.*doug.*$  printf("%s\en", yytext);
    .   ;
    \en  ;
.P2
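As a further illustration, not one of the timed test programs, the summing job of task 8 can be combined with the associative arrays and the for-in loop described earlier to give a sub-total per owner. This is only a sketch under stated assumptions: the three input lines and the names input, result, total and owner are made up for the example, and the shell wrapper and the use of sort are additions to make the output order predictable, since for-in visits elements in an apparently random order.

```shell
# Hypothetical sample input, shaped like the "ls -l" lines used in the timing tests.
input='-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx
-rw-rw-rw- 1 bwk 200 Oct 15 17:06 yyy
-rw-rw-rw- 1 ava 77 Oct 16 09:00 zzz'

# Add the fourth field into an array element named by the third field.
# Elements are created on first mention and start out as zero.
# sort is applied afterwards because for-in gives no particular order.
result=$(printf '%s\n' "$input" | awk '
      { total[$3] += $4 }
END   { for (owner in total) print owner, total[owner] }
' | sort)

printf '%s\n' "$result"
```

With the sample input above, the two lines owned by ava are added together (123 + 77), so the sketch prints one line per owner rather than one grand total.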