<HTML><!--Distributed by F --><HEAD><TITLE>[Chapter 29] 29.4 Inside spell </TITLE>
<META NAME="DC.title" CONTENT="UNIX Power Tools">
<META NAME="DC.creator" CONTENT="Jerry Peek, Tim O'Reilly &amp; Mike Loukides">
<META NAME="DC.publisher" CONTENT="O'Reilly &amp; Associates, Inc.">
<META NAME="DC.date" CONTENT="1998-08-04T21:45:02Z">
<META NAME="DC.type" CONTENT="Text.Monograph">
<META NAME="DC.format" CONTENT="text/html" SCHEME="MIME">
<META NAME="DC.source" CONTENT="1-56592-260-3" SCHEME="ISBN">
<META NAME="DC.language" CONTENT="en-US">
<META NAME="generator" CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0">
<LINK REV="made" HREF="mailto:online-books@oreilly.com" TITLE="Online Books Comments">
<LINK REL="up" HREF="ch29_01.htm" TITLE="29. Spell Checking, Word Counting, and Textual Analysis">
<LINK REL="prev" HREF="ch29_03.htm" TITLE="29.3 How Do I Spell That Word? ">
<LINK REL="next" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary ">
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<DIV CLASS="htmlnav">
<H1><IMG SRC="gifs/smbanner.gif" ALT="UNIX Power Tools" USEMAP="#srchmap" BORDER="0"></H1>
<MAP NAME="srchmap">
<AREA SHAPE="RECT" COORDS="0,0,466,58" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book">
</MAP>
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_03.htm" TITLE="29.3 How Do I Spell That Word? "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 29.3 How Do I Spell That Word? " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><B><FONT FACE="ARIEL,HELVETICA,HELV,SANSERIF" SIZE="-1">Chapter 29<BR>Spell Checking, Word Counting, and Textual Analysis</FONT></B></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 29.5 Adding Words to ispell's Dictionary " BORDER="0"></A></TD>
</TR></TABLE>&nbsp;<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"></DIV>
<DIV CLASS="SECT1">
<H2 CLASS="sect1"><A CLASS="title" NAME="UPT-ART-7963">29.4 Inside spell </A></H2>
<P CLASS="para">[If you have <SPAN CLASS="link"><EM CLASS="emphasis">ispell</EM> (<A CLASS="linkend" HREF="ch29_02.htm" TITLE="Check Spelling Interactively with ispell ">29.2</A>)</SPAN>,
there's not a whole lot of reason for using <I CLASS="filename">spell</I> any more.
Not only is <I CLASS="filename">ispell</I> more powerful, it's a heck of a lot easier to update
its spelling dictionaries.
Nonetheless, we decided to include this
article, because it makes clear the kinds of rules that spelling
checkers go through to expand on the words in their dictionaries.
-TOR]</P>
<P CLASS="para">On many UNIX systems,
the directory <EM CLASS="emphasis">/usr/lib/spell</EM> contains the main program invoked by the
<EM CLASS="emphasis">spell</EM> command along with auxiliary programs and data files.</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>ls -l /usr/lib/spell</B></CODE>
total 888
-rwxr-xr-x   1 bin          545 Dec  9  1988 compress
-rwxr-xr-x   1 bin        16324 Dec  9  1988 hashcheck
-rwxr-xr-x   1 bin        14828 Dec  9  1988 hashmake
-rw-r--r--   1 bin        53872 Dec  9  1988 hlista
-rw-r--r--   1 bin        53840 Dec  9  1988 hlistb
-rw-r--r--   1 bin         6336 Dec  9  1988 hstop
-rw-rw-rw-   1 root      252312 Nov 27 16:24 spellhist
-rwxr-xr-x   1 bin        21634 Dec  9  1988 spellin
-rwxr-xr-x   1 bin        23428 Dec  9  1988 spellprog</PRE></BLOCKQUOTE></P>
<P CLASS="para">On some systems, the <EM CLASS="emphasis">spell</EM> command is a shell script that pipes its input through
<SPAN CLASS="link"><EM CLASS="emphasis">deroff -w</EM> (<A CLASS="linkend" HREF="ch29_10.htm" TITLE="Just the Words, Please ">29.10</A>)</SPAN>
and
<SPAN CLASS="link"><EM CLASS="emphasis">sort -u</EM> (<A CLASS="linkend" HREF="ch36_06.htm" TITLE="Miscellaneous sort Hints ">36.6</A>)</SPAN>
to remove formatting codes and prepare a sorted word list, one word per line.
On other systems, it is a stand-alone program that does these steps
internally.
Two separate spelling lists are maintained, one for American usage
and one for British usage (invoked with the <EM CLASS="emphasis">-b</EM> option to <EM CLASS="emphasis">spell</EM>).
These lists, <EM CLASS="emphasis">hlista</EM> and <EM CLASS="emphasis">hlistb</EM>, cannot be read or updated directly.
They are compressed files, compiled from a list of words
represented as nine-digit hash codes.
(Hash coding is a special technique used to quickly search for information.)</P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-32021"></A>The main program invoked by <EM CLASS="emphasis">spell</EM> is
<EM CLASS="emphasis">spellprog</EM>.
It loads the list of hash codes from either <EM CLASS="emphasis">hlista</EM> or <EM CLASS="emphasis">hlistb</EM> into a
table, and looks for the hash code corresponding to each word on the
sorted word list.
This eliminates all words (or hash codes) actually found in the
spelling list.
For the remaining words, <EM CLASS="emphasis">spellprog</EM> tries to see if it can derive a
recognizable word by performing various operations on the word stem,
based on suffix and prefix rules.
<A CLASS="indexterm" NAME="AUTOID-32028"></A>A few of these manipulations follow:</P>
<BLOCKQUOTE CLASS="blockquote"><P CLASS="para">-y+iness
+ness
-y+i+less
+less
-y+ies
-t+ce
-t+cy</P></BLOCKQUOTE>
<P CLASS="para">The new words created as a result of these manipulations will be
checked once more against the spell table.
However, before the stem-derivative rules are applied, the remaining
words are checked against a table of hash codes built from the file <EM CLASS="emphasis">hstop</EM>.
<A CLASS="indexterm" NAME="AUTOID-32034"></A>The stop list contains typical misspellings that stem-derivative
operations might allow to pass.
For instance, the misspelled word <EM CLASS="emphasis">thier</EM> would be converted into <EM CLASS="emphasis">thy</EM>
using the suffix rule -y+ier.
The <EM CLASS="emphasis">hstop</EM> file accounts for as many cases of this type
of error as possible.</P>
<P CLASS="para">The final output consists of words not found in the spell list, even
after the program tried to search for their stems, and words that
were found in the stop list.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-32041"></A>You can get a better sense of these rules in action by using the <EM CLASS="emphasis">-v</EM>
or <EM CLASS="emphasis">-x</EM> option.
The <EM CLASS="emphasis">-v</EM> option eliminates the last lookup in the table, and produces
a list of words that are not actually in the spelling list along
with possible derivatives.
It allows you to see which words were found as a result of
stem-derivative
operations, and prints the rule used.
(Refer to the <EM CLASS="emphasis">sample</EM> file in article
<A CLASS="xref" HREF="ch29_01.htm#UPT-ART-4080" TITLE="The UNIX spell Command ">29.1</A>.)</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>spell -v sample</B></CODE>
Alcuin
ditroff
LaserWriter
PostScript
printerr
TranScript
+out  output
+s    uses</PRE></BLOCKQUOTE></P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-32053"></A>The <EM CLASS="emphasis">-x</EM> option makes <EM CLASS="emphasis">spell</EM> begin at the stem-derivative stage, and
prints the various attempts it makes to find the word stem of each word.</P>
<P CLASS="para"><BLOCKQUOTE CLASS="screen"><PRE CLASS="screen">% <CODE CLASS="userinput"><B>spell -x sample</B></CODE>
...
=into
=LaserWriter
=LaserWrite
=LaserWrit
=laserWriter
=laserWrite
=laserWrit
=output
=put
...
LaserWriter
...</PRE></BLOCKQUOTE></P>
<P CLASS="para">The stem is preceded by an equal sign (<CODE CLASS="literal">=</CODE>).
At the end of the output are the words whose
stem does not appear in the spell list.</P>
<P CLASS="para"><A CLASS="indexterm" NAME="AUTOID-32064"></A>One other file you should know about is <EM CLASS="emphasis">spellhist</EM>.
On some systems,
each time you run <EM CLASS="emphasis">spell</EM>, the output is appended through
<SPAN CLASS="link"><EM CLASS="emphasis">tee</EM> (<A CLASS="linkend" HREF="ch13_09.htm" TITLE="Send Output Two or More Places with tee ">13.9</A>)</SPAN>
into <EM CLASS="emphasis">spellhist</EM>, in effect creating a list of all the
misspelled or unrecognized words for your site.
The <EM CLASS="emphasis">spellhist</EM> file is something of a &quot;garbage&quot; file that keeps on
growing.
You will want to reduce it or remove it periodically.
To extract useful information from this <EM CLASS="emphasis">spellhist</EM>, you might use the
<EM CLASS="emphasis">sort</EM>
and
<SPAN CLASS="link"><EM CLASS="emphasis">uniq&nbsp;-c</EM> (<A CLASS="linkend" HREF="ch35_20.htm" TITLE="Quick Reference: uniq ">35.20</A>)</SPAN>
commands to
compile a
list of misspelled words or special terms that occur most frequently
(see article
<A CLASS="xref" HREF="ch29_07.htm" TITLE="Count How Many Times Each Word Is Used ">29.7</A>
for a similar example).
It is possible to add these words back into the basic spelling
dictionary, but this is too complex a process to describe here.
It's probably easier just to use a <SPAN CLASS="link">local spelling dictionary (<A CLASS="linkend" HREF="ch29_01.htm#UPT-ART-4080" TITLE="The UNIX spell Command ">29.1</A>)</SPAN>.
Even better, use <EM CLASS="emphasis">ispell</EM>; not only is it a more powerful
spelling program, it is much easier to
<SPAN CLASS="link">update the word lists it uses (<A CLASS="linkend" HREF="ch29_05.htm" TITLE="Adding Words to ispell's Dictionary ">29.5</A>)</SPAN>.</P>
<DIV CLASS="sect1info"><P CLASS="SECT1INFO">- <SPAN CLASS="authorinitials">DD</SPAN> <SPAN CLASS="bibliomisc">from <CITE CLASS="citetitle">UNIX Text Processing</CITE>, Hayden Books, 1987</SPAN></P></DIV>
</DIV>
<DIV CLASS="htmlnav"><P></P><HR ALIGN="LEFT" WIDTH="515" TITLE="footer">
<TABLE WIDTH="515" BORDER="0" CELLSPACING="0" CELLPADDING="0"><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_03.htm" TITLE="29.3 How Do I Spell That Word? "><IMG SRC="gifs/txtpreva.gif" ALT="Previous: 29.3 How Do I Spell That Word? " BORDER="0"></A></TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="book" HREF="index.htm" TITLE="UNIX Power Tools"><IMG SRC="gifs/txthome.gif" ALT="UNIX Power Tools" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172"><A CLASS="SECT1" HREF="ch29_05.htm" TITLE="29.5 Adding Words to ispell's Dictionary "><IMG SRC="gifs/txtnexta.gif" ALT="Next: 29.5 Adding Words to ispell's Dictionary " BORDER="0"></A></TD>
</TR><TR>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH="172">29.3 How Do I Spell That Word? </TD>
<TD ALIGN="CENTER" VALIGN="TOP" WIDTH="171"><A CLASS="index" HREF="index/idx_0.htm" TITLE="Book Index"><IMG SRC="gifs/index.gif" ALT="Book Index" BORDER="0"></A></TD>
<TD ALIGN="RIGHT" VALIGN="TOP" WIDTH="172">29.5 Adding Words to ispell's Dictionary </TD>
</TR></TABLE>
<HR ALIGN="LEFT" WIDTH="515" TITLE="footer"><IMG SRC="gifs/smnavbar.gif" USEMAP="#map" BORDER="0" ALT="The UNIX CD Bookshelf Navigation">
<MAP NAME="map">
<AREA SHAPE="RECT" COORDS="0,0,73,21" HREF="../index.htm" ALT="The UNIX CD Bookshelf">
<AREA SHAPE="RECT" COORDS="74,0,163,21" HREF="index.htm" ALT="UNIX Power Tools">
<AREA SHAPE="RECT" COORDS="164,0,257,21" HREF="../unixnut/index.htm" ALT="UNIX in a Nutshell">
<AREA SHAPE="RECT" COORDS="258,0,321,21" HREF="../vi/index.htm" ALT="Learning the vi Editor">
<AREA SHAPE="RECT" COORDS="322,0,378,21" HREF="../sedawk/index.htm" ALT="sed &amp; awk">
<AREA SHAPE="RECT" COORDS="379,0,438,21" HREF="../ksh/index.htm" ALT="Learning the Korn Shell">
<AREA SHAPE="RECT" COORDS="439,0,514,21" HREF="../lrnunix/index.htm" ALT="Learning the UNIX Operating System">
</MAP></DIV></BODY></HTML>
