<HTML><!--Distributed by F -->
<HEAD>
<TITLE>[Chapter 29] 29.7 Count How Many Times Each Word Is Used </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:45:09Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch29_01.htm" TITLE="29. Spell Checking, Word Counting, and Textual Analysis">
<LINK REL="prev" HREF="ch29_06.htm" TITLE="29.6 Counting Lines, Words, and Characters: wc ">
<LINK REL="next" HREF="ch29_08.htm" TITLE="29.8 Find a a Doubled Word ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav">
<H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap">
<AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book">
</MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_06.htm" TITLE="29.6 Counting Lines, Words, and Characters: wc "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 29.6 Counting Lines, Words, and Characters: wc " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 29<BR>Spell Checking, Word Counting, and Textual Analysis</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_08.htm" TITLE="29.8 Find a a Doubled Word "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 29.8 Find a a Doubled Word " BORDER="0"></A></TD>
</TR>
</TABLE>
&nbsp;
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
</DIV>
<DIV CLASS="SECT1">
<H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-4670">29.7 Count How Many Times Each Word Is Used </A></H2>
<TABLE CLASS="para.programreference" BORDER="1">
<TR>
<TH VALIGN="TOP"><A CLASS="programreference" HREF="examples/index.htm" TITLE="wordfreq">wordfreq</A><BR></TH>
<TD VALIGN="TOP"><A CLASS="indexterm" NAME="AUTOID-32319"></A><A CLASS="indexterm" NAME="AUTOID-32322"></A><A CLASS="indexterm" NAME="AUTOID-32324"></A>The <EM CLASS="emphasis">wordfreq</EM> script counts the number of occurrences of each word in its input.
If you give it files, it reads from them; otherwise it reads standard input.
The <EM CLASS="emphasis">-i</EM> option folds uppercase into lowercase (uppercase letters
will count the same as lowercase).</TD>
</TR>
</TABLE>
<P CLASS="para">Here's this book's Preface run through <EM CLASS="emphasis">wordfreq</EM>:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>wordfreq ch00</B></CODE>
 141 the
  98 to
  84 and
  84 of
  71 a
  55 in
  44 that
  38 book
  32 we
  ...</PRE></BLOCKQUOTE></P>
<P CLASS="para">The script was taken from a long-ago
<SPAN CLASS="link">Usenet (<A CLASS="linkend" HREF="ch01_33.htm" TITLE="UNIX Networking and Communications ">1.33</A>)</SPAN>
posting by Carl Brandauer.
Here is Carl's original script (with a few small edits):</P>
<P CLASS="para"><TABLE CLASS="screen.co" BORDER="1">
<TR>
<TH VALIGN="TOP"><PRE CLASS="calloutlist">&#13;<A CLASS="co" HREF="ch35_11.htm" TITLE="35.11 Hacking on Characters with tr ">tr</A> <A CLASS="co" HREF="ch36_01.htm" TITLE="36.1 Putting Things in Order ">sort</A> <A CLASS="co" HREF="ch35_20.htm" TITLE="35.20 Quick Reference: uniq ">uniq</A> <A CLASS="co" HREF="ch35_17.htm" TITLE="35.17 Making Text in Columns with pr ">-4</A> </PRE></TH>
<TD VALIGN="TOP"><PRE CLASS="screen">cat $* |                        # tr reads the standard input
tr &quot;[A-Z]&quot; &quot;[a-z]&quot; |            # Convert all uppercase to lowercase
tr -cs &quot;a-z'&quot; &quot;\012&quot; |          # replace all characters not a-z or '
                                # with a new line. i.e. one word per line
sort |                          # uniq expects sorted input
uniq -c |                       # Count number of times each word appears
sort +0nr +1d |                 # Sort first from most to least frequent,
                                # then alphabetically
pr -w80 -4 -h &quot;Concordance for $*&quot;     # Print in four columns</PRE></TD>
</TR>
</TABLE></P>
<P CLASS="para">&#13;The version on the disc is somewhat different.
It adjusts the <EM CLASS="emphasis">tr</EM> commands for the script's <EM CLASS="emphasis">-i</EM> option.
The disc version also doesn't use <EM CLASS="emphasis">pr</EM> to make output in four
columns, though you can add that to your copy of the script&nbsp;- or just
pipe the <EM CLASS="emphasis">wordfreq</EM> output through <EM CLASS="emphasis">pr</EM> on the command line
when you need it.</P>
<P CLASS="para">The second <EM CLASS="emphasis">tr</EM> command above (with the <CODE CLASS="literal">-cs</CODE> options)
is for the Berkeley version of <EM CLASS="emphasis">tr</EM>.
For System V <EM CLASS="emphasis">tr</EM>, the command should be:</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">tr -cs &quot;[a-z]'&quot; &quot;[\012*]&quot;</PRE></BLOCKQUOTE></P>
<P CLASS="para">If you aren't sure which version of <EM CLASS="emphasis">tr</EM> you have, see article
<A CLASS="xref" HREF="ch35_11.htm" TITLE="Hacking on Characters with tr ">35.11</A>.
You could use <SPAN CLASS="link"><EM CLASS="emphasis">deroff</EM> (<A CLASS="linkend" HREF="ch29_10.htm" TITLE="Just the Words, Please ">29.10</A>)</SPAN>
instead.</P>
<P CLASS="para">One of the beauties of a simple script like this is that you can
tweak it if you don't like the way it counts.
For example, if you want hyphenated words like <EM CLASS="emphasis">copy-editor</EM>
to count as one, add a hyphen to the <CODE CLASS="literal">tr&nbsp;-cs</CODE> expression:
<CODE CLASS="literal">&quot;[a-z]'-&quot;</CODE> (System V) or <CODE CLASS="literal">&quot;-a-z'&quot;</CODE>
(Berkeley).</P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">JP</SPAN>, <SPAN CLASS="authorinitials">TOR</SPAN></P></DIV>
</DIV>
<DIV CLASS="htmlnav">
<P></P>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0">
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_06.htm" TITLE="29.6 Counting Lines, Words, and Characters: wc "><IMG SRC="gifs/txtpreva.gif" SRC="gifs/txtpreva.gif" ALT="Previous: 29.6 Counting Lines, Words, and Characters: wc " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_08.htm" TITLE="29.8 Find a a Doubled Word "><IMG SRC="gifs/txtnexta.gif" SRC="gifs/txtnexta.gif" ALT="Next: 29.8 Find a a Doubled Word " BORDER="0"></A></TD>
</TR>
<TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">29.6 Counting Lines, Words, and Characters: wc </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">29.8 Find a a Doubled Word </TD>
</TR>
</TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<IMG SRC="gifs/smnavbar.gif" SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map">
<AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf">
<AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell">
<AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor">
<AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk">
<AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell">
<AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System">
</MAP>
</DIV>
</BODY>
</HTML>