<HTML>



<HEAD>

<TITLE>Linux Complete Command Reference:User Commands:EarthWeb Inc.-</TITLE>


<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<SCRIPT>
<!--
function displayWindow(url, width, height) {
        var Win = window.open(url,"displayWindow",'width=' + width +
',height=' + height + ',resizable=1,scrollbars=yes');
}
//-->
</SCRIPT>
</HEAD>





<!-- ISBN=0672311046 //-->

<!-- TITLE=Linux Complete Command Reference//-->

<!-- AUTHOR=Red Hat//-->

<!-- PUBLISHER=Macmillan Computer Publishing//-->

<!-- IMPRINT=Sams//-->

<!-- CHAPTER=01 //-->

<!-- PAGES=0001-0736 //-->

<!-- UNASSIGNED1 //-->

<!-- UNASSIGNED2 //-->



<P><CENTER>

<a href="0278-0279.html">Previous</A> | <a href="../ewtoc.html">Table of Contents</A> | <a href="0281-0282.html">Next</A></CENTER></P>







<A NAME="PAGENUM-280"><P>Page 280</P></A>





<P>If the -s (strip) option is specified, words that are in the specified

hash-file are removed from the word list. This can

be useful with personal dictionaries.

</P>



<P>The -l option can be used to specify an alternate

affix-file for munching dictionaries in languages other than English.

</P>



<P>The -c option can be used to convert dictionaries that were built with an older affix file, without risk of

accidentally introducing unintended affix combinations into the dictionary.

</P>



<P>The -T option allows dictionaries to be converted to a canonical string-character format. The suffix specified is looked up

in the affix file (-l switch) to determine the string-character format used for the input file; the output always uses the

canonical string-character format. For example, a dictionary collected from TeX source files might be converted to canonical format

by specifying -T tex.

</P>



<P>The -w option is passed on to ispell.

</P>



<H3><A NAME="ch01_ 130">

findaffix

</A></H3>



<P>The findaffix shell script is an aid to writers of new language descriptions in choosing affixes. The given dictionary

files (standard input if none are given) are examined for possible prefixes

(-p switch) or suffixes (-s switch, the default).

Each commonly occurring affix is presented along with a count of the number of times it appears and an estimate of the

number of bytes that would be saved in a dictionary hash file if it were added to the language table. Only affixes that generate

legal roots (found in the original input) are listed.

</P>



<P>If the -c option is not given, the output lines are in the following format:

</P>



<!-- CODE SNIP //-->

<PRE>

strip/add/count/bytes

</PRE>

<!-- END CODE SNIP //-->



<P>where strip is the string that should be stripped from a root word before adding the affix,

add is the affix to be added, count is a count of the number of times that this

strip/add combination appears, and bytes is an estimate of the number of

bytes that might be saved in the raw dictionary file if this combination is added to the affix file. The field separator in the

output is the character specified by the

-t switch; the default is a slash (/).

</P>
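
<P>As a minimal sketch of post-processing this format, the Python below splits one output line into its four fields. The names AffixLine and parse_affix_line are hypothetical, the sample line is made up for illustration, and treating count and bytes as integers is an assumption.</P>

```python
from collections import namedtuple

# Hypothetical record type for one findaffix output line.
AffixLine = namedtuple("AffixLine", "strip add count bytes_saved")


def parse_affix_line(line, sep="/"):
    """Split one strip/add/count/bytes line on the field separator.

    sep should match the character given to the -t switch (default slash).
    Assumption: count and bytes are printed as plain integers.
    """
    strip, add, count, bytes_saved = line.rstrip("\n").split(sep)
    return AffixLine(strip, add, int(count), int(bytes_saved))


# Made-up sample line, not real findaffix output.
print(parse_affix_line("y/ies/42/130"))
```

<P>If the separator was changed with the -t switch, pass the same character as sep.</P>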



<P>If the -c (clean output) option is given, the appearance of the output is made visually cleaner (but harder to post-process)

by changing it to

</P>



<!-- CODE SNIP //-->

<PRE>

-strip+add&lt;tab&gt;count&lt;tab&gt;bytes

</PRE>

<!-- END CODE SNIP //-->



<P>where strip, add, count, and bytes are as before, and

&lt;tab&gt; represents the ASCII tab character.

</P>
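
<P>The clean format can still be parsed, with some care. The sketch below is one possible approach, not part of findaffix itself. The function name parse_clean_line is hypothetical and the sample line is made up. It assumes that strip contains no plus sign, and it tolerates a missing leading minus sign in case an empty strip string is printed that way.</P>

```python
def parse_clean_line(line):
    """Parse one clean-format line: -strip+add TAB count TAB bytes.

    Assumptions: strip contains no "+", and count and bytes are integers.
    """
    affix, count, bytes_saved = line.rstrip("\n").split("\t")
    # Drop the leading "-" that introduces the strip string, if present.
    if affix.startswith("-"):
        affix = affix[1:]
    # The first "+" separates the strip string from the add string.
    strip, plus, add = affix.partition("+")
    if not plus:
        raise ValueError("no '+' in affix field: " + repr(affix))
    return strip, add, int(count), int(bytes_saved)


# Made-up sample line, not real findaffix output.
print(parse_clean_line("-y+ies\t42\t130"))
```

<P>A strip or add string that itself contains a plus or minus sign would make this format ambiguous, which is one reason the default format is easier to post-process.</P>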



<P>The method used to generate possible affixes will also generate longer affixes which have common headers or trailers.

For example, the two words moth and mother will generate not only the obvious substitution

+er but also -h+her and -th+ther (and possibly even longer ones, depending on the value of

min). To prevent cluttering the output with such affixes, any

affix pair that shares a common header (or, for prefixes, trailer) string longer than

elim characters (default 1) will be suppressed. You may want to set

elim to a value greater than 1 if your language has string characters; usually, the need for this

parameter will become obvious when you examine the output of your

findaffix run.

</P>
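
<P>The moth and mother example can be illustrated with a short Python sketch. This is not the algorithm used by the findaffix script; it is a literal reading of the rule described above, and the function names are hypothetical. It enumerates the suffix substitutions that turn one word into the other and drops those whose strip and add strings share a common header longer than elim characters.</P>

```python
def common_header_len(a, b):
    """Length of the longest string that both a and b start with."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidate_suffixes(root, derived, min_stem=3):
    """Yield (strip, add) pairs that turn root into derived.

    The stem is the shared leading part of both words; it is shortened
    one character at a time down to min_stem characters.
    """
    shared = common_header_len(root, derived)
    for stem_len in range(shared, min_stem - 1, -1):
        yield root[stem_len:], derived[stem_len:]


def keep(pair, elim=1):
    """Literal reading of the rule: suppress a pair whose strip and add
    strings share a common header longer than elim characters."""
    strip, add = pair
    return not (common_header_len(strip, add) > elim)


pairs = list(candidate_suffixes("moth", "mother", min_stem=2))
print(pairs)
print([p for p in pairs if keep(p)])
```

<P>With a minimum stem length of 2, the sketch produces the three substitutions named in the text; under this reading of the rule and an elim of 1, only the -th+ther pair is dropped.</P>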



<P>Normally, the affixes are sorted according to the estimate of bytes saved. The

-f switch may be used to cause the affixes to be sorted by frequency of appearance.

</P>



<P>To save output file space, affixes which occur fewer than 10 times are eliminated; this limit may be changed with the

-l switch. The -M switch specifies a maximum affix length (default

8). Affixes longer than this will not be reported. (This

saves on temporary disk space and makes the script run faster.)

</P>



<P>Affixes which generate stems shorter than three characters are suppressed. (A stem is the word after the

strip string has been removed, and before the add string has been added.) This reduces both the running time and the size of the output file.

This limit may be changed with the -m switch. The minimum stem length should only be set to

1 if you have a lot of free time and disk space (in the range of many days and hundreds of megabytes).

</P>



<P>The findaffix script requires a nonblank field-separator character for internal use. Normally, this character is a slash

(/), but if the slash appears as a character in the input word list, a different character can be specified with the

-t switch.

</P>
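
<P>One way to choose a safe separator before running the script is to scan the word list for a character that never occurs in it. The Python below is a minimal sketch of that check; the function name pick_separator and the list of candidate characters are assumptions made for illustration.</P>

```python
def pick_separator(words, candidates="/:|#%"):
    """Return the first nonblank candidate that appears in none of the words.

    The candidate list is an arbitrary illustration; the slash comes first
    because it is the default separator.
    """
    used = set("".join(words))
    for ch in candidates:
        if ch not in used and not ch.isspace():
            return ch
    raise ValueError("every candidate separator occurs in the word list")


print(pick_separator(["moth", "mother"]))
print(pick_separator(["and/or", "moth"]))
```

<P>The character returned would then be the one to supply with the -t switch.</P>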



<P>ispell dictionaries should be expanded before being fed to

findaffix; in addition, characters that are not in the

English alphabet (if any) should be translated to lowercase.

</P>















<!-- begin footer information -->







</body></html>
