<HTML>
<HEAD>
<TITLE>Linux Complete Command Reference:User Commands:EarthWeb Inc.-</TITLE>
</HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<SCRIPT>
<!--
function displayWindow(url, width, height) {
var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
<!-- ISBN=0672311046 //-->
<!-- TITLE=Linux Complete Command Reference//-->
<!-- AUTHOR=Red Hat//-->
<!-- PUBLISHER=Macmillan Computer Publishing//-->
<!-- IMPRINT=Sams//-->
<!-- CHAPTER=01 //-->
<!-- PAGES=0001-0736 //-->
<!-- UNASSIGNED1 //-->
<!-- UNASSIGNED2 //-->
<P><CENTER>
<a href="0278-0279.html">Previous</A> | <a href="../ewtoc.html">Table of Contents</A> | <a href="0281-0282.html">Next</A></CENTER></P>
<A NAME="PAGENUM-280"><P>Page 280</P></A>
<P>If the -s (strip) option is specified, words that are in the specified
hash-file are removed from the word list. This can
be useful with personal dictionaries.
</P>
<P>The -l option can be used to specify an alternate
affix-file for munching dictionaries in languages other than English.
</P>
<P>The -c option can be used to convert dictionaries that were built with an older affix file, without risk of
accidentally introducing unintended affix combinations into the dictionary.
</P>
<P>The -T option allows dictionaries to be converted to a canonical string-character format. The suffix specified is looked up
in the affix file (-l switch) to determine the string-character format used for the input file; the output always uses the
canonical string-character format. For example, a dictionary collected from TeX source files might be converted to canonical format
by specifying -T tex.
</P>
<P>The -w option is passed on to ispell.
</P>
<H3><A NAME="ch01_ 130">
findaffix
</A></H3>
<P>The findaffix shell script is an aid to writers of new language descriptions in choosing affixes. The given dictionary
files (standard input if none are given) are examined for possible prefixes
(-p switch) or suffixes (-s switch, the default).
Each commonly occurring affix is presented along with a count of the number of times it appears and an estimate of the
number of bytes that would be saved in a dictionary hash file if it were added to the language table. Only affixes that generate
legal roots (found in the original input) are listed.
</P>
<P>If the -c option is not given, the output lines are in the following format:
</P>
<!-- CODE SNIP //-->
<PRE>
strip/add/count/bytes
</PRE>
<!-- END CODE SNIP //-->
<P>where strip is the string that should be stripped from a root word before adding the affix,
add is the affix to be added, count is a count of the number of times that this
strip/add combination appears, and bytes is an estimate of the number of
bytes that might be saved in the raw dictionary file if this combination is added to the affix file. The field separator in the
output is the character specified by the
-t switch; the default is a slash (/).
</P>
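<P>As a minimal sketch of post-processing this format, the following Python function splits one output line into its four fields. The function name parse_findaffix_line is hypothetical, and the sketch assumes that the separator character never occurs inside the strip or add strings.
</P>

```python
def parse_findaffix_line(line, sep="/"):
    """Split one default-format line into (strip, add, count, bytes).

    Assumption: `sep` is the field separator given to findaffix with -t
    (a slash by default) and does not occur inside the affix strings.
    """
    strip, add, count, saved = line.rstrip("\n").split(sep)
    return strip, add, int(count), int(saved)
```

<P>An empty strip field, as in /er/12/40, simply yields an empty string for strip.
</P>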
<P>If the -c (clean output) option is given, the appearance of the output is made visually cleaner (but harder to post-process)
by changing it to
</P>
<!-- CODE SNIP //-->
<PRE>
-strip+add&lt;tab&gt;count&lt;tab&gt;bytes
</PRE>
<!-- END CODE SNIP //-->
<P>where strip, add, count, and bytes are as before, and
&lt;tab&gt; represents the ASCII tab character.
</P>
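<P>The clean format can still be parsed under some assumptions. The sketch below (parse_clean_line is a hypothetical name) assumes that every line begins with a minus sign, that the first plus sign separates strip from add, and that neither string contains a tab. A strip string that itself contains a plus sign would be split incorrectly, which is one reason this format is harder to post-process.
</P>

```python
def parse_clean_line(line):
    """Split one clean-format line into (strip, add, count, bytes).

    Assumptions: the affix field starts with "-", the first "+" ends the
    strip string, and the three fields are separated by ASCII tabs.
    """
    affix, count, saved = line.rstrip("\n").split("\t")
    if not affix.startswith("-"):
        raise ValueError("expected a leading '-' in " + repr(affix))
    strip, add = affix[1:].split("+", 1)
    return strip, add, int(count), int(saved)
```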
<P>The method used to generate possible affixes will also generate longer affixes that have common headers or trailers.
For example, the two words moth and mother will generate not only the obvious substitution
+er but also -h+her and -th+ther (and possibly even longer ones, depending on the value of
min, the minimum stem length set with the -m switch). To prevent cluttering the output with such affixes, any
affix pair that shares a common header (or, for prefixes, trailer) string longer than
elim characters (default 1; set with the -e switch) will be suppressed. You may want to set
elim to a value greater than 1 if your language has string characters; usually, the need for this
parameter will become obvious when you examine the output of your
findaffix run.
</P>
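<P>The moth and mother example can be reproduced with a short Python sketch. It is an illustration of the rule as literally described here, not the script's actual implementation; the names shared_header_len and suffix_candidates are hypothetical, and only suffix generation for a single word pair is covered.
</P>

```python
def shared_header_len(a, b):
    """Length of the common leading string of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix_candidates(root, word, min_stem=3, elim=1):
    """List (strip, add) pairs that turn root into word.

    min_stem: shortest stem allowed (the "min" value in the text).
    elim: pairs whose strip and add strings share a common header
          longer than this many characters are suppressed.
    """
    common = shared_header_len(root, word)
    pairs = []
    # Try the longest stem first, then progressively shorter ones.
    for stem_len in range(common, min_stem - 1, -1):
        strip, add = root[stem_len:], word[stem_len:]
        if shared_header_len(strip, add) > elim:
            continue  # redundant longer affix, suppressed
        pairs.append((strip, add))
    return pairs
```

<P>With min_stem=2 and a large elim, suffix_candidates("moth", "mother") returns the three pairs +er, -h+her, and -th+ther. Read literally, an elim of 1 suppresses only -th+ther, because the strip and add strings of -h+her share a header of exactly one character, which is not longer than elim.
</P>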
<P>Normally, the affixes are sorted according to the estimate of bytes saved. The
-f switch may be used to cause the affixes to be sorted by frequency of appearance.
</P>
<P>To save output file space, affixes that occur fewer than 10 times are eliminated; this limit may be changed with the
-l switch. The -M switch specifies a maximum affix length (default
8). Affixes longer than this will not be reported. (This
saves on temporary disk space and makes the script run faster.)
</P>
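<P>The effect of the sorting and filtering switches can be sketched in Python as follows. The Affix record and the select_affixes function are hypothetical names. The sketch assumes that the maximum length is measured on the add string and that sorting is in descending order; neither detail is stated in the text.
</P>

```python
from typing import NamedTuple


class Affix(NamedTuple):
    strip: str
    add: str
    count: int
    bytes_saved: int


def select_affixes(records, low=10, max_len=8, by_frequency=False):
    """Drop rare or overlong affixes, then sort the remainder.

    low: minimum number of occurrences (the -l value).
    max_len: maximum affix length (the -M value), assumed to apply to `add`.
    by_frequency: sort by count (as with -f) instead of bytes saved.
    """
    kept = [r for r in records if r.count >= low and max_len >= len(r.add)]
    if by_frequency:
        return sorted(kept, key=lambda r: r.count, reverse=True)
    return sorted(kept, key=lambda r: r.bytes_saved, reverse=True)
```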
<P>Affixes which generate stems shorter than three characters are suppressed. (A stem is the word after the
strip string has been removed, and before the add string has been added.) This reduces both the running time and the size of the output file.
This limit may be changed with the -m switch. The minimum stem length should only be set to
1 if you have a lot of free time and disk space (in the range of many days and hundreds of megabytes).
</P>
<P>The findaffix script requires a nonblank field-separator character for internal use. Normally, this character is a slash
(/), but if the slash appears as a character in the input word list, a different character can be specified with the
-t switch.
</P>
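<P>One way to choose a safe separator before running findaffix is to scan the word list for characters that never appear in it. The following Python sketch does this; pick_separator and its candidate list are assumptions made for illustration, and the chosen character would then be passed with the -t switch.
</P>

```python
def pick_separator(words, candidates="/:|,;"):
    """Return the first nonblank candidate that appears in none of the words.

    The candidate list is an arbitrary choice; the slash comes first
    because it is the default separator.
    """
    used = set("".join(words))
    for ch in candidates:
        if ch not in used:
            return ch
    raise ValueError("every candidate separator occurs in the word list")
```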
<P>ispell dictionaries should be expanded before being fed to
findaffix; in addition, characters that are not in the
English alphabet (if any) should be translated to lowercase.
</P>
</td>
</tr>
</table>
<!-- begin footer information -->
</body></html>