<HTML>

<HEAD>

<TITLE>Linux Unleashed, Third Edition:gawk</TITLE>

<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!--ISBN=0672313723//-->

<!--TITLE=Linux Unleashed, Third Edition//-->

<!--AUTHOR=Tim Parker//-->

<!--PUBLISHER=Macmillan Computer Publishing//-->

<!--IMPRINT=Sams//-->

<!--CHAPTER=25//-->

<!--PAGES=450-453//-->

<!--UNASSIGNED1//-->

<!--UNASSIGNED2//-->



<CENTER>

<TABLE BORDER>

<TR>

<TD><A HREF="447-450.html">Previous</A></TD>

<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>

<TD><A HREF="453-456.html">Next</A></TD>

</TR>

</TABLE>

</CENTER>

<P><BR></P>

<P>Finally, we can impose some formatting on the output lines themselves. In an earlier example, you saw the use of <TT>\n</TT> to add a newline character. Sequences like this are called <I>escape codes</I>, because the backslash tells <TT>gawk</TT> to treat the character that follows as something other than itself. Table 25.5 shows the important escape codes that <TT>gawk</TT> supports.</P>

<TABLE WIDTH="100%"><CAPTION ALIGN=LEFT><B>Table 25.5.</B> Escape codes.

<TR>

<TH COLSPAN="2"><HR>

<TR>

<TH WIDTH="25%" ALIGN="LEFT">Code

<TH WIDTH="75%" ALIGN="LEFT">Description

<TR>

<TH COLSPAN="2"><HR>

<TR>

<TD><TT>\a</TT>

<TD>Bell

<TR>

<TD><TT>\b</TT>

<TD>Backspace

<TR>

<TD><TT>\f</TT>

<TD>Formfeed

<TR>

<TD><TT>\n</TT>

<TD>Newline

<TR>

<TD><TT>\r</TT>

<TD>Carriage return

<TR>

<TD><TT>\t</TT>

<TD>Tab

<TR>

<TD><TT>\v</TT>

<TD>Vertical tab

<TR>

<TD><TT>\ooo</TT>

<TD>Octal character <TT>ooo</TT>

<TR>

<TD><TT>\xdd</TT>

<TD>Hexadecimal character <TT>dd</TT>

<TR>

<TD><TT>\c</TT>

<TD>The literal character <TT>c</TT> (for example, <TT>\&quot;</TT> for a quotation mark)

<TR>

<TD COLSPAN="2"><HR>

</TABLE>

<P>You can, for example, escape a quotation mark by using the sequence <TT>\&quot;</TT>, which places a quotation mark in the string without interpreting it to mean something special:</P>

<!-- CODE SNIP //-->

<PRE>

&#123;printf &quot;I said \&quot;Hello\&quot; and he said \&quot;Hello\&quot;.&quot;&#125;

</PRE>

<!-- END CODE SNIP //-->

<P>Awkward-looking, perhaps, but necessary to avoid problems. You&#146;ll see lots more escape characters used in examples later in this chapter.

</P>
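<P>As a quick illustration of a few codes from Table 25.5, here is a minimal sketch that combines a tab, escaped quotation marks, and a newline in one <TT>printf</TT>. The input line and the <TT>user:</TT> label are invented for the example, and the command is spelled <TT>awk</TT> on the assumption that it resolves to <TT>gawk</TT> or a compatible implementation on your system.</P>

```shell
# Illustrative only: the input "tparker 42" and the "user:" label are made up.
# \t inserts a tab, \" a literal double quote, and \n a newline.
out=$(echo "tparker 42" | awk '{ printf "user:\t\"%s\"\n", $1 }')

# Show the formatted line: user:, a tab, then the first field in quotes.
printf '%s\n' "$out"
```

<P>The whole program sits inside single quotation marks so that the shell passes the backslashes through untouched and leaves their interpretation to <TT>awk</TT>.</P>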

<H4 ALIGN="LEFT"><A NAME="Heading9"></A><FONT COLOR="#000077">Changing Field Separators</FONT></H4>

<P>As I mentioned earlier, the default field separator is whitespace (any run of spaces or tabs). This is often inconvenient, as we found with the <TT>/etc/passwd</TT> file. You can change the field separator on the <TT>gawk</TT> command line by using the <TT>-F</TT> option followed by the separator you want to use:</P>

<!-- CODE SNIP //-->

<PRE>

gawk -F&quot;:&quot; '/tparker/&#123;print&#125;' /etc/passwd

</PRE>

<!-- END CODE SNIP //-->

<P>This command changes the field separator to a colon and searches the <TT>/etc/passwd</TT> file for lines containing the string <TT>tparker</TT>. The new field separator is put in quotation marks to avoid any confusion. Also, the <TT>-F</TT> option (it must be a capital F) comes before the first quotation mark enclosing the pattern-action pair. If it comes after, it won&#146;t be applied.</P>
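<P>To see what the new separator buys you, the following sketch prints individual colon-delimited fields rather than the whole line. The two records are hypothetical stand-ins in the <TT>/etc/passwd</TT> layout, so the real file is never read, and <TT>awk</TT> is assumed to be <TT>gawk</TT> or a compatible implementation.</P>

```shell
# Hypothetical passwd-style records; the real /etc/passwd is not touched.
sample='root:x:0:0:root:/root:/bin/sh
tparker:x:501:100:Tim Parker:/home/tparker:/bin/bash'

# -F":" comes before the quoted program, so $1 and $5 are colon-delimited fields.
out=$(printf '%s\n' "$sample" | awk -F":" '/tparker/ { print $1, $5 }')

# Prints the login name followed by the comment field of the matching record.
printf '%s\n' "$out"
```

<P>With the default whitespace separator, the fifth field of the second record would not exist at all, because <TT>Tim</TT> and <TT>Parker</TT> would be split apart while the colons were left in place.</P>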

<H4 ALIGN="LEFT"><A NAME="Heading10"></A><FONT COLOR="#000077">Metacharacters</FONT></H4>

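<P>Earlier I mentioned that <TT>gawk</TT> is particular about its pattern-matching habits. The pattern <TT>cat</TT> matches any line that contains those three letters in sequence, including words such as &#147;concatenate.&#148; Sometimes you want to be more exact in the matching. If you only want to match the word &#147;cat&#148; but not &#147;concatenate,&#148; put spaces on each side of the pattern. Keep in mind that the spaces are matched literally, so this form does not match &#147;cat&#148; at the very start or end of a line:</P>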

<!-- CODE SNIP //-->

<PRE>

/ cat / &#123;print&#125;

</PRE>

<!-- END CODE SNIP //-->

<P>What about matching different cases? That&#146;s where the <TT>or</TT> instruction, represented by a vertical bar, comes in.</P>

<!-- CODE SNIP //-->

<PRE>

/ cat | CAT / &#123;print&#125;

</PRE>

<!-- END CODE SNIP //-->

<P>The preceding pattern will match &#147;cat&#148; or &#147;CAT&#148; on a line. However, what about &#147;Cat&#148;? For that, we need a way to list alternative characters within a pattern. With <TT>gawk</TT>, we use square brackets for this. To match any combination of &#147;cat&#148; in upper- or lowercase, write the pattern like this:</P>

<!-- CODE SNIP //-->

<PRE>

/ [Cc][Aa][Tt] / &#123;print&#125;

</PRE>

<!-- END CODE SNIP //-->

<P>This can get pretty awkward, but it&#146;s seldom necessary. To match just &#147;Cat&#148; and &#147;cat,&#148; for example, use the following pattern:

</P>

<!-- CODE SNIP //-->

<PRE>

/ [Cc]at / &#123;print&#125;

</PRE>

<!-- END CODE SNIP //-->
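<P>Here is a small sketch of the bracket pattern at work. The four input lines are invented for the illustration, and <TT>awk</TT> is assumed to be <TT>gawk</TT> or a compatible implementation.</P>

```shell
# Made-up input: two lines that should match and two that should not.
sample='my cat sleeps
My Cat sleeps
we concatenate files
a CAT here'

# [Cc] accepts either case for the first letter only; the surrounding spaces
# keep "concatenate" from matching, and "CAT" fails because "at" is lowercase.
out=$(printf '%s\n' "$sample" | awk '/ [Cc]at / { print }')

printf '%s\n' "$out"
```

<P>Only the first two lines are printed. Swapping in <TT>/ [Cc][Aa][Tt] /</TT> would pick up the fourth line as well.</P>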

<P>A useful matching operator is the tilde (<TT>~</TT>). This is used when you want to look for a match in a particular field in a record. Consider the following example:</P>

<!-- CODE SNIP //-->

<PRE>

&#36;5 ~ /tparker/

</PRE>

<!-- END CODE SNIP //-->

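<P>This pattern matches any record where the fifth field contains <TT>tparker</TT>. It is similar to the <TT>==</TT> operator, except that <TT>==</TT> tests for exact equality, whereas <TT>~</TT> succeeds if the pattern matches anywhere in the field. The matching operator can also be negated with <TT>!~</TT>:</P>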

<!-- CODE SNIP //-->

<PRE>

&#36;5 !~ /tparker/

</PRE>

<!-- END CODE SNIP //-->

<P>This pattern finds any record where the fifth field does not contain <TT>tparker</TT>.</P>
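<P>One way to get a feel for <TT>~</TT>, <TT>!~</TT>, and <TT>==</TT> is to run all three against the same input. The sketch below uses three invented records whose fifth fields are <TT>tparker</TT>, <TT>tparker2</TT>, and <TT>root</TT>. The variable names are just labels for this example, and <TT>awk</TT> is assumed to be <TT>gawk</TT> or a compatible implementation.</P>

```shell
# Made-up records; only the fifth field matters here.
sample='a b c d tparker
a b c d tparker2
a b c d root'

# ~ succeeds when the regular expression matches anywhere in the field.
match=$(printf '%s\n' "$sample" | awk '$5 ~ /tparker/ { print $5 }')

# == succeeds only when the field is exactly the given string.
equal=$(printf '%s\n' "$sample" | awk '$5 == "tparker" { print $5 }')

# !~ selects the records that ~ rejects.
nomatch=$(printf '%s\n' "$sample" | awk '$5 !~ /tparker/ { print $5 }')

printf 'match:\n%s\nequal:\n%s\nnomatch:\n%s\n' "$match" "$equal" "$nomatch"
```

<P>The <TT>~</TT> test reports both <TT>tparker</TT> and <TT>tparker2</TT>, the <TT>==</TT> test reports only <TT>tparker</TT>, and the <TT>!~</TT> test reports only <TT>root</TT>.</P>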

<P>A few characters (called <I>metacharacters</I>) have special meaning to <TT>gawk</TT>. Many of these metacharacters are familiar to shell users because they are carried over from UNIX shells. The metacharacters shown in Table 25.6 can be used in <TT>gawk</TT> patterns.</P>

<TABLE WIDTH="100%"><CAPTION ALIGN=LEFT><B>Table 25.6.</B> Metacharacters.

<TR>

<TH COLSPAN="4"><HR>

<TR>

<TH WIDTH="20%" ALIGN="LEFT">Metacharacter

<TH WIDTH="25%" ALIGN="LEFT">Meaning

<TH WIDTH="20%" ALIGN="LEFT">Example

<TH WIDTH="35%" ALIGN="LEFT">Meaning of Example

<TR>

<TH COLSPAN="4"><HR>

<TR>

<TD VALIGN="TOP"><TT>^</TT>

<TD>The beginning of the field

<TD VALIGN="TOP"><TT>&#36;3 ~ /^b/</TT>

<TD>Matches if the third field starts with b

<TR>

<TD VALIGN="TOP"><TT>&#36;</TT>

<TD VALIGN="TOP">The end of the field

<TD VALIGN="TOP"><TT>&#36;3 ~ /b&#36;/</TT>

<TD>Matches if the third field ends with b

<TR>

<TD VALIGN="TOP"><TT>.</TT>

<TD VALIGN="TOP">Matches any single character

<TD VALIGN="TOP"><TT>&#36;3 ~ /i.m/</TT>

<TD>Matches any record whose third field contains i, any one character, and then m

<TR>

<TD><TT>|</TT>

<TD>Or.

<TD><TT>/cat|CAT/</TT>

<TD>Matches cat or CAT

<TR>

<TD VALIGN="TOP"><TT>*</TT>

<TD>Zero or more repetitions of a character

<TD VALIGN="TOP"><TT>/UNI*X/</TT>

<TD VALIGN="TOP">Matches UNX, UNIX, UNIIX, UNIIIX, and so on

<TR>

<TD VALIGN="TOP"><TT>&#43;</TT>

<TD>One or more repetitions of a character

<TD VALIGN="TOP"><TT>/UNI&#43;X/</TT>

<TD>Matches UNIX, UNIIX, and so on, but not UNX

<TR>

<TD VALIGN="TOP"><TT>&#123;a,b&#125;</TT>

<TD>Between a and b repetitions of a character (both integers); older versions of <TT>gawk</TT> need the <TT>--re-interval</TT> option

<TD VALIGN="TOP"><TT>/UNI&#123;1,3&#125;X/</TT>

<TD VALIGN="TOP">Matches only UNIX, UNIIX, and UNIIIX

<TR>

<TD VALIGN="TOP"><TT>?</TT>

<TD>Zero or one repetition of a character

<TD VALIGN="TOP"><TT>/UNI?X/</TT>

<TD VALIGN="TOP">Matches UNX and UNIX only

<TR>

<TD VALIGN="TOP"><TT>[]</TT>

<TD>Any one character from the set

<TD VALIGN="TOP"><TT>/I[BDG]M/</TT>

<TD>Matches IBM, IDM, and IGM

<TR>

<TD VALIGN="TOP"><TT>[^]</TT>

<TD VALIGN="TOP">Not in the set

<TD VALIGN="TOP"><TT>/I[^DE]M/</TT>

<TD>Matches all three-character strings starting with I and ending in M, except IDM and IEM

<TR>

<TD COLSPAN="4"><HR>

</TABLE>

<P>Some of these metacharacters are used frequently. You will see some examples later in this chapter.

</P><P><BR></P>
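<P>In the meantime, here is a minimal sketch that exercises the <TT>*</TT>, <TT>+</TT>, and <TT>?</TT> rows of Table 25.6, anchored with <TT>^</TT> and <TT>&#36;</TT> so that the whole line must match. The four input lines and the variable names are invented for the example, and <TT>awk</TT> is assumed to be <TT>gawk</TT> or a compatible implementation.</P>

```shell
# Made-up input lines, one candidate string per line.
sample='UNX
UNIX
UNIIX
XUNIX'

# * allows zero or more I characters between UN and X.
star=$(printf '%s\n' "$sample" | awk '/^UNI*X$/ { print }')

# + requires at least one I, so UNX drops out.
plus=$(printf '%s\n' "$sample" | awk '/^UNI+X$/ { print }')

# ? allows at most one I, so UNIIX drops out.
quest=$(printf '%s\n' "$sample" | awk '/^UNI?X$/ { print }')

printf 'star:\n%s\nplus:\n%s\nquest:\n%s\n' "$star" "$plus" "$quest"
```

<P>The line <TT>XUNIX</TT> is rejected by all three patterns because <TT>^</TT> pins the match to the start of the line. Without the anchors, each pattern would match it too.</P>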

<CENTER>

<TABLE BORDER>

<TR>

<TD><A HREF="447-450.html">Previous</A></TD>

<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>

<TD><A HREF="453-456.html">Next</A></TD>

</TR>

</TABLE>

</CENTER>






<!-- begin footer information -->





</body></html>
