<HTML><HEAD><TITLE>Guide to awk</TITLE></HEAD><BODY><H3>How to get things done with awk?</H3>
<P><PRE>Author: Sakari Mattila <! sam@isd.canberra.edu.au>
Updated: 22-Nov-2002
Updated: 30-Mar-1999
Updated: 22-Jan-1996
First: 14-Sep-1994
</PRE></P>
<P><STRONG>awk</STRONG> is a pattern matching program. It takes two inputs: a data file and a command file. The data file contains text, that is, lines made up of words. It need not be ordinary text; any data composed of character groups (words) and lines is suitable. The default separator between character groups is the space, but other separators may be defined on the awk command line. More on the awk command line is at the end of this document. The command file contains the pattern matching instructions and is equivalent to an ordinary computer program. Feel free to think of awk as an interpreter that executes the commands from the command file on the data file.</P>
<P> <! 990430 ><B>awk</B> comes with all Unix and Linux operating systems. It is a command line utility. <B>awk</B> is also included in several Unix-like utility packages for MS Windows 95/98, MS Windows NT and other operating systems. Cygwin <A HREF="http://sourceware.cygnus.com/cygwin"> http://sourceware.cygnus.com/cygwin </A> is one source of these packages. <B>awk</B> source code in C is available with full Linux packages and GNU packages.</P>
<P><STRONG>awk</STRONG> can be used to extract parts of a large body of text, to format text and to extract information for other programs. It is a very versatile program, especially when used as part of a pipe. It is good practice to filter all extra control characters out of the awk input text file, because non-printable control characters cause errors with some awk versions. On Unix systems, the tr (translate characters) program is suitable for removing the offending characters. See the tr manual pages or the <A HREF="./sed-tricks.html">tr instructions </A> at the end of the <I> Short sed guide </I> for more information. </P>
<P>Each command is usually on one line. 
There are three execution types of commands: <BR><STRONG>1. starting commands,</STRONG> first word BEGIN, which are executed only once, before the first line of input is read; <BR><STRONG>2. pattern matching commands,</STRONG> each of which is executed once for each line in the data file; and <BR><STRONG>3. ending commands,</STRONG> first word END, which are executed only once, after the last line of input has been read. <BR>Pattern matching commands are executed from top to bottom, in the order they appear in the program. Lines are read from the data file one by one.</P>
<P>Every command sees only a <I> single line </I> from the data file at a time, together with all awk program variables. The whole line is subject to pattern matching. It is also automatically loaded into special variables. Variable $0 is the whole line, $1 is the first word, $2 the second word and so on. </P>
<P>The awk <A NAME = "pattern"> matching command </A> consists of two parts: a pattern part and a command part. The pattern part holds zero or more patterns to be matched against the line from the data file. If the pattern part is missing, the command matches every line and is always executed. A pattern consists of the character "/", a regular expression and a closing "/". More on <A HREF = "#Regular"> regular expressions </A> is below; until then, think of the regular expression as a plain text string. A typical pattern may look like:<PRE> /Smith /</PRE></P>
<P>Several patterns may be combined in the pattern part of a pattern matching command using the logical operators "!" (logical negation), "&&" (logical and) and "||" (logical or). A typical combined pattern may look like:<PRE> /Smith / || /Jones /</PRE>This matches if either of the text strings "Smith " or "Jones " is found on the line. A space between "/" and the text is significant:<PRE> /Jon/</PRE>matches "Jon", "Jones" and any word starting with "Jon". 
Because there is no space between "/" and "J", it also matches the letters "Jon" inside a word, as in "Newton-Jones".</P>
<P>There are more advanced forms of pattern; please see the Unix awk man page or an awk textbook.</P>
<P>The command part may be missing; then the default command, print the line, is executed. When present, the command is always enclosed in curly brackets "{" and "}". Brackets may be nested and may be used to extend the command over several lines. A multi-line command acts as if it were one long line. A complete awk command line may look like:<PRE> /Smith / || /Jones / { print NR, $0 }</PRE></P>
<P>NR is a predefined variable containing the line number and $0 is the contents of the whole line. Thus this command prints the line number and the original line. </P>
<P><STRONG>awk</STRONG> is <I> typeless</I>, which means that you can put anything, a number or a character string, into its variables and awk tries to make some sense of it. There are several built-in variables: <PRE>
 NF        number of words on this line
 NR        number of the record, i.e. the line number
 FILENAME  name of the input file
 FS        input field separator (space or tab character)
 RS        input record (line) separator (newline)
</PRE></P>
<P>The command may contain several instructions separated by ";". The common awk instructions are:<PRE>
 variable1 = variable2 or constant
 print (variables or constants to be printed)
 printf ("format string", variables to be printed)
 if (condition) command1 [ else command2 ]
 while (condition) command
 for (command1; (condition); command2) command3
</PRE></P>
<P>These instructions behave like the similar instructions in the C language. Please be careful with "=", which is assignment, and "==", which is the equality operator used in the condition parts of instructions. 
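As a minimal sketch of the difference, assuming a POSIX awk and two made-up input lines, the command below counts the lines whose first word equals "Smith". The variable name count and the sample data are illustrative only.

```shell
# Hypothetical sample data: two lines, the first word is a surname.
out=$(printf 'Smith 12\nJones 7\n' | awk '
    # "==" compares the first word with a string; "=" assigns to count.
    { if ($1 == "Smith") count = count + 1 }
    END { print count }')
echo "$out"
```

Writing "=" instead of "==" in the condition would assign "Smith" to $1 on every line, so the condition would always be true and both lines would be counted.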
It is possible to refine pattern matching with conditional statements, or even to replace the whole pattern part with them. However, the pattern matching part can handle partial words and regular expressions, whereas a plain comparison in an if-statement handles only whole words or string functions of words. More on <A HREF="#Instructions"> awk instructions </A> is near the end of this text. </P>
<P>Standard (POSIX) awk treats everything from "#" to the end of the line as a comment. In an implementation that lacks this, a comment can be included using an assignment and a character string, like:<PRE> { comment = "This is a comment text" }</PRE></P>
<P><STRONG>awk</STRONG> tries to execute commands as far as it is somehow possible, so the results may be astonishing. Syntax checking in awk is minimal. In addition to errors in the awk commands, unusual input characters or unexpected data may confuse an awk program. </P>
<P> The common awk functions are:<PRE>
 length(variable)
 substr(string, first-char, number-of-chars)
 int (numeric-variable)
 exp (numeric-variable)
 log (numeric-variable)
 sqrt (numeric-variable)
</PRE>Check C programming language manuals for details. Note that the C-like part of awk is a very small subset of C.</P>
<P><STRONG>awk</STRONG> is <I> stateless, </I> that is, it treats each new input line in the same way. However, you can use variables and conditional instructions to create states. It is important to know how, because most real tasks need states.</P>
<P>The most practical way to create a <I> state system </I>(state machine) is to reserve one variable as the state variable. It must be initialised in the BEGIN command, and its value then changes according to the patterns matched. 
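A minimal sketch of such a state variable, assuming made-up marker lines "START" and "STOP" and a variable named state (all hypothetical, not taken from any real file format): state is 0 while searching and 1 while printing.

```shell
# Hypothetical input: only the lines between START and STOP are wanted.
out=$(printf 'a\nSTART\nb\nc\nSTOP\nd\n' | awk '
    BEGIN   { state = 0 }              # start in the searching state
    /STOP/  { state = 0 }              # leave the printing state
            { if (state > 0) print }   # print only while in state 1
    /START/ { state = 1 }              # enter the printing state
')
echo "$out"
```

With this ordering the marker lines themselves are not printed; only "b" and "c" are.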
Because all command lines, or at least their pattern parts, are always executed for each input line, state-dependent commands must be guarded with if-instructions.</P>
<P>Some awk implementations do not allow BEGIN or END commands, but set all variables to zero or an empty string when the execution of the program starts.</P>
<P>An awk program performs the given operation and consists of one or more awk statements. The following awk program extracts letters sent by a given machine from a standard Unix mail file. The system has two states: state 0 is searching and state 1 is printing the selected mail. Variable p is the state variable. Note that the end pattern may be part of the start pattern when two letters from the selected machine follow each other immediately. Thus the order of the pattern matching statements is important. If several statements are expected to match the same line, they must be in order of selectivity, most selective last.</P>
<P>Here is the sample program consisting of five awk statements:<PRE>
 BEGIN { print FILENAME; p = 0 }
 /From / && / 199/ { p = 0 }
 /From / && /cc.adfa.oz.au/ { p = 1 }
 { if (p > 0) print $0 }
 END { print "End ..." }
</PRE> </P>
<P>The first line is executed only once, before any input is read. Many awk implementations leave FILENAME empty at that point, because no file has been opened yet. The second line is a conditional return to the search state. The third line is a conditional entry to the printing state. The fourth line is the guarded print statement, without a pattern part; it is active only when the program is in the printing state. The fifth line is executed only once, at the end of the input. In practice the end condition above is not selective enough. The text fragments "From " and "199" are likely to appear in a letter body, which would end the printing of the letter early. The main problem in this case is the lack of an end-of-letter mark: the end of a letter is known only when the program finds the beginning of the next letter.</P>
<P><A NAME = "Regular"> Regular expressions are a way to define conditional character strings </A>. 
One regular expression may equal, that is, match, several different character strings. A <STRONG><I>regular expression </I></STRONG> is a character string that contains ordinary characters and metacharacters, each metacharacter denoting one or more real characters. <BR>
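As a small sketch, assuming a POSIX awk, the common metacharacter "." (any single character) and three made-up names as input, one regular expression can match several different strings:

```shell
# Hypothetical input: three names, one per line.
out=$(printf 'Jon\nJan\nJoan\n' | awk '/^J.n$/ { print }')
echo "$out"
```

Here "^" and "$" anchor the match to the start and end of the line, so "Jon" and "Jan" are printed but "Joan" is not.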