<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 29] 29.6 Counting Lines, Words, and Characters: wc </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:45:06Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch29_01.htm" TITLE="29. Spell Checking, Word Counting, and Textual Analysis">
<LINK REL="prev" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary ">
<LINK REL="next" HREF="ch29_07.htm" TITLE="29.7 Count How Many Times Each Word Is Used "></HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000"><DIV CLASS="htmlnav"><H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1><MAP NAME="srchmap"><AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 29.5 Adding Words to ispell's Dictionary " BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 29<BR>Spell Checking, Word Counting, and Textual Analysis</FONT></B></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_07.htm" TITLE="29.7 Count How Many Times Each Word Is Used "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 29.7 Count How Many
Times Each Word Is Used " BORDER="0"></A></TD></TR></TABLE>&nbsp;<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1"><H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-2730">29.6 Counting Lines, Words, and Characters: wc </A></H2>
<P CLASS="para"><A CLASS="indexterm" NAME="UPT-ART-2730-IX-WORDS-COUNTING"></A><A CLASS="indexterm" NAME="UPT-ART-2730-IX-TEXT-COUNTING-ELEMENTS-OF"></A><A CLASS="indexterm" NAME="UPT-ART-2730-IX-WC-COMMAND"></A><A CLASS="indexterm" NAME="UPT-ART-2730-IX-COUNTING-TEXT-ELEMENTS"></A><A CLASS="indexterm" NAME="UPT-ART-2730-IX-LINES-IN-FILES-COUNTING"></A><A CLASS="indexterm" NAME="UPT-ART-2730-IX-CHARACTERS-COUNTING"></A>The <EM CLASS="emphasis">wc</EM> (word count) command counts the number of lines, words, and characters in the files you specify. (<SPAN CLASS="link">Like most UNIX utilities (<A CLASS="linkend" HREF="ch01_30.htm" TITLE="Redirecting Input and Output ">1.30</A>)</SPAN>, <EM CLASS="emphasis">wc</EM> reads from its standard input if you don't specify a filename.) For example, the file <EM CLASS="emphasis">letter</EM> has 120 lines, 734 words, and 4297 characters:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>wc letter</B></CODE>
     120     734    4297 letter</PRE></BLOCKQUOTE></P>
<P CLASS="para">You can restrict what is counted by specifying the options <EM CLASS="emphasis">-l</EM> (count lines only), <EM CLASS="emphasis">-w</EM> (count words only), and <EM CLASS="emphasis">-c</EM> (count characters only; strictly speaking, bytes). For example, you can count the number of lines in a file:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>wc -l letter</B></CODE>
     120 letter</PRE></BLOCKQUOTE></P>
<P CLASS="para">or you can count the number of files in a directory:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>cd man_pages</B></CODE>
% <CODE CLASS="userinput"><B>ls | wc -w</B></CODE>
     233</PRE></BLOCKQUOTE></P>
<P CLASS="para">The first example uses
a file as input; the second example pipes the output of an <EM CLASS="emphasis">ls</EM> command to the input of <EM CLASS="emphasis">wc</EM>. (Be aware that the <SPAN CLASS="link"><EM CLASS="emphasis">-a</EM> option (<A CLASS="linkend" HREF="ch16_11.htm" TITLE="Showing Hidden Files with ls -A and -a ">16.11</A>)</SPAN> makes <EM CLASS="emphasis">ls</EM> list dot files. If your <EM CLASS="emphasis">ls</EM> command is <SPAN CLASS="link">aliased (<A CLASS="linkend" HREF="ch10_02.htm" TITLE="Aliases for Common Commands ">10.2</A>)</SPAN> to include <EM CLASS="emphasis">-a</EM> or other options that add words to the normal output, such as the line <CODE CLASS="literal">total </CODE><CODE CLASS="replaceable"><I>nnn</I></CODE> from <EM CLASS="emphasis">ls -l</EM>, then you may not get the results you want. Filenames that contain spaces will also be counted as more than one word.)</P>
<P CLASS="para">The fact that you can pipe the output of a command through <EM CLASS="emphasis">wc</EM> lets you use <EM CLASS="emphasis">wc</EM> to perform addition and subtraction. For example, I once wrote a shell script that involved, among other things, splitting files into several pieces, and I needed the script to keep track of how many files were created.
(The script ran <SPAN CLASS="link"><EM CLASS="emphasis">csplit</EM> (<A CLASS="linkend" HREF="ch35_10.htm" TITLE="Splitting Files by Context: csplit ">35.10</A>)</SPAN> on each file, producing an arbitrary number of new files named <EM CLASS="emphasis">file.00</EM>, <EM CLASS="emphasis">file.01</EM>, <EM CLASS="emphasis">file.02</EM>, etc.) Here's the code I used to solve this problem:</P>
<P CLASS="para"><TABLE CLASS="screen.co" BORDER="1"><TR><TH VALIGN="TOP"><PRE CLASS="calloutlist"><A CLASS="co" HREF="ch09_16.htm" TITLE="9.16 Command Substitution ">`...`</A> <A CLASS="co" HREF="ch49_06.htm" TITLE="49.6 Quick Arithmetic with expr ">expr</A> </PRE></TH><TD VALIGN="TOP"><PRE CLASS="screen">before=`ls "$file"* | wc -l`            # count the file
<EM CLASS="emphasis">   split the file by running it through csplit</EM>
after=`ls "$file"* | wc -l`             # count file plus new splits
num_files=`expr $after - $before`       # evaluate the difference</PRE></TD></TR></TABLE></P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-32221"></A>As another trick, the following command will tell you how many more words are in <EM CLASS="emphasis">new.file</EM> than in <EM CLASS="emphasis">old.file</EM>:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>expr `wc -w &lt; new.file` - `wc -w &lt; old.file`</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para">[The C and Korn shells have built-in arithmetic commands and don't really need <EM CLASS="emphasis">expr</EM>, but <EM CLASS="emphasis">expr</EM> works in all shells. <EM CLASS="emphasis">-JP</EM>&nbsp;]</P>
<P CLASS="para">Notice that you should have <EM CLASS="emphasis">wc</EM> read each input file from its standard input by using a <CODE CLASS="literal">&lt;</CODE> character.
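</P>
<P CLASS="para">Redirecting with <CODE CLASS="literal">&lt;</CODE> works the same way when a shell with built-in arithmetic does the subtraction instead of <EM CLASS="emphasis">expr</EM>. Here's a minimal sketch, assuming a POSIX shell such as <EM CLASS="emphasis">ksh</EM>, <EM CLASS="emphasis">bash</EM>, or a modern <EM CLASS="emphasis">sh</EM> (the original Bourne shell has no <CODE CLASS="literal">$(( ))</CODE> arithmetic). The function name <EM CLASS="emphasis">word_diff</EM> is made up for this illustration; it isn't one of the book's example programs:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen"># word_diff: hypothetical helper, not from the book.  Prints how many
# more words file $1 has than file $2 (a negative number if it has fewer).
# Needs a POSIX shell for $(( )); the original Bourne shell lacks it.
word_diff() {
    count_1=`wc -w &lt; "$1"`   # reading through &lt; keeps the filename out
    count_2=`wc -w &lt; "$2"`
    # Any leading spaces in wc's output are harmless inside $(( )).
    echo $(( $count_1 - $count_2 ))
}</PRE></BLOCKQUOTE></P>
<P CLASS="para">For example, <CODE CLASS="literal">word_diff new.file old.file</CODE> prints the same number as the <EM CLASS="emphasis">expr</EM> command above.</P>
<P CLASS="para">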
If instead you say:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>expr `wc -w new.file` - `wc -w old.file`</B></CODE></PRE></BLOCKQUOTE></P>
<P CLASS="para"><EM CLASS="emphasis">wc</EM> also prints each filename after its count, so the filenames show up in the <EM CLASS="emphasis">expr</EM> expression and produce a syntax error.[1]</P>
<BLOCKQUOTE CLASS="footnote"><P CLASS="para">[1] You could also type <CODE CLASS="literal">cat new.file | wc -w</CODE>, but this involves two commands, so it's <SPAN CLASS="link">less efficient (<A CLASS="linkend" HREF="ch13_02.htm" TITLE="One Argument with a cat Isn't Enough ">13.2</A>)</SPAN>.</P></BLOCKQUOTE>
<TABLE CLASS="para.programreference" BORDER="1"><TR><TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="count.it">count.it</A><BR></TH><TD VALIGN="TOP">Taking this concept further, here's a simple shell script to calculate the difference in word count between two files:<A CLASS="indexterm" NAME="AUTOID-32245"></A></TD></TR></TABLE>
<P CLASS="para"><TABLE CLASS="screen.co" BORDER="1"><TR><TH VALIGN="TOP"><PRE CLASS="calloutlist">&#13;<A CLASS="co" HREF="ch08_06.htm" TITLE="8.6 Output Command-Line Arguments ">echo</A> &#13;</PRE></TH><TD VALIGN="TOP"><PRE CLASS="screen">count_1=`wc -w &lt; &quot;$1&quot;`   # number of words in file 1
count_2=`wc -w &lt; &quot;$2&quot;`   # number of words in file 2
diff_12=`expr $count_1 - $count_2`   # difference in word count
# if $diff_12 is negative, reverse order and don't show the minus sign:
case &quot;$diff_12&quot; in
-*) echo &quot;$2 has `expr $diff_12 : '-\(.*\)'` more words than $1&quot; ;;
*)  echo &quot;$1 has $diff_12 more words than $2&quot; ;;
esac</PRE></TD></TR></TABLE></P>
<P CLASS="para">If this script were called <EM CLASS="emphasis">count.it</EM>, then you could invoke it like this:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>count.it draft.2 draft.1</B></CODE>
draft.1 has 23 more words than draft.2</PRE></BLOCKQUOTE></P>
<P CLASS="para">You could modify this script to count lines or
characters.</P>
<BLOCKQUOTE CLASS="note"><P CLASS="para"><STRONG>NOTE:</STRONG> <A CLASS="indexterm" NAME="AUTOID-32259"></A>With many versions of <EM CLASS="emphasis">wc</EM>, the output has leading spaces unless the counts are very large. This can cause trouble in scripts if you aren't careful. For instance, in the script above, a command such as:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">echo &quot;$1 has $count_1 words&quot;</PRE></BLOCKQUOTE></P>
<P CLASS="para">might print:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">draft.2 has       79 words</PRE></BLOCKQUOTE></P>
<P CLASS="para">See the extra spaces? Understanding how the shell handles <SPAN CLASS="link">quoting (<A CLASS="linkend" HREF="ch08_14.htm" TITLE="Bourne Shell Quoting ">8.14</A>)</SPAN> will help here. If you can, let the shell read the <EM CLASS="emphasis">wc</EM> output and remove the extra spaces. For example, without quotes, the shell passes four separate words to <EM CLASS="emphasis">echo</EM>, and <EM CLASS="emphasis">echo</EM> puts a single space between them:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">echo $1 has $count_1 words</PRE></BLOCKQUOTE></P>
<P CLASS="para">that might print:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">draft.2 has 79 words</PRE></BLOCKQUOTE></P>
<P CLASS="para">That's especially important to understand when you use <EM CLASS="emphasis">wc</EM> with commands like <EM CLASS="emphasis">test</EM> or <EM CLASS="emphasis">expr</EM>, which don't expect spaces in their arguments. If you can't use the shell to strip out the spaces, delete them by piping the <EM CLASS="emphasis">wc</EM> output through <SPAN CLASS="link"><CODE CLASS="literal">tr&nbsp;-d&nbsp;'&nbsp;'</CODE> (<A CLASS="linkend" HREF="ch35_11.htm" TITLE="Hacking on Characters with tr ">35.11</A>)</SPAN>.</P></BLOCKQUOTE>
<P CLASS="para">Finally, two notes about file size:</P>
<UL CLASS="itemizedlist"><LI CLASS="listitem"><P CLASS="para"><EM CLASS="emphasis">wc -c</EM> isn't an efficient way to
count the characters in large numbers of files. <EM CLASS="emphasis">wc</EM> usually opens and reads each file, which takes time. The fourth or fifth column of output from <EM CLASS="emphasis">ls -l</EM> (depending on your version) gives the character count without opening the file. You can sum <EM CLASS="emphasis">ls -l</EM> counts for multiple files with the <A CLASS="indexterm" NAME="AUTOID-32293"></A><SPAN CLASS="link"><EM CLASS="emphasis">addup</EM> (<A CLASS="linkend" HREF="ch49_07.htm" TITLE="Total a Column with addup ">49.7</A>)</SPAN> command. For example:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>ls -l </B></CODE><CODE CLASS="replaceable"><I>files</I></CODE><CODE CLASS="userinput"><B> | addup 4</B></CODE>
670518</PRE></BLOCKQUOTE></P></LI>
<LI CLASS="listitem"><P CLASS="para">Using character counts (as in the item above) doesn't give you the total disk space used by files. That's because disk space is allocated in whole blocks: in general, a file's last block is only partly filled, and even a tiny file takes at least one disk block to store. The <SPAN CLASS="link"><EM CLASS="emphasis">du</EM> (<A CLASS="linkend" HREF="ch24_09.htm" TITLE="How Much Disk Space?
">24.9</A>)</SPAN> command gives accurate disk usage.</P></LI></UL><A CLASS="indexterm" NAME="AUTOID-32307"></A><A CLASS="indexterm" NAME="AUTOID-32308"></A><A CLASS="indexterm" NAME="AUTOID-32309"></A><A CLASS="indexterm" NAME="AUTOID-32310"></A><A CLASS="indexterm" NAME="AUTOID-32311"></A><A CLASS="indexterm" NAME="AUTOID-32312"></A><DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">DG</SPAN>, <SPAN CLASS="authorinitials">JP</SPAN></P></DIV></DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 29.5 Adding Words to ispell's Dictionary " BORDER="0"></A></TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_07.htm" TITLE="29.7 Count How Many Times Each Word Is Used "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 29.7 Count How Many Times Each Word Is Used " BORDER="0"></A></TD></TR><TR><TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">29.5 Adding Words to ispell's Dictionary </TD><TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD><TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">29.7 Count How Many Times Each Word Is Used </TD></TR></TABLE><HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><IMG SRC="gifs/smnavbar.gif" SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation"><MAP NAME="map"><AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf"><AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power
Tools"><AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell"><AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor"><AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk"><AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell"><AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System"></MAP></DIV></BODY></HTML>
