
📄 select-vocab.1

select-vocab(1)                                                select-vocab(1)

NAME
       select-vocab - Select a maximum-likelihood vocabulary from a mixture
       of corpora.

SYNOPSIS
       select-vocab [ -options ... ] -heldout file f1 f2 ... fn

DESCRIPTION
       select-vocab picks a vocabulary from the union of the vocabularies of
       files f1 through fn in order to maximize the likelihood of the heldout
       file. When invoked as above, the program prints the (unsorted) list of
       all words in the input corpora together with their weights. This list
       may subsequently be sorted in decreasing order of weight, and a
       vocabulary may be chosen by picking a suitable threshold weight and
       ignoring words whose weight falls below it.

       A number of automatically detected formats are supported for the input
       files f1 through fn. They can be count files, which are characterized
       by each line ending in a number, ARPA language models in
       ngram-format(5), or simply text files. If a text file's name ends in
       ".sentid", the first field of each line is assumed to be a sentence
       identifier and is discarded. Furthermore, all of the input files can
       also be compressed (if gzip is installed and available on the system).

OPTIONS
       -help  Prints a short help message.

       -heldout file
              Likelihood maximization is performed on the contents of file.
              This file may also be in any of the formats supported for the
              input corpora, namely: text, counts, sentid, or ARPA-lm.

       -quiet Suppresses printing of progress and other informative messages
              during execution. By default the script writes these to the
              standard error stream.

       -scale n
              The combined final counts are scaled by n before being written
              out. This makes it possible to sort the output list numerically
              with sort(1). The default scale is 1e6.

NOTES
       This implementation corrects a minor error in the algorithm
       specification in [1]. The paper describes corpus-level interpolation,
       but the script actually does word-level interpolation.

       The program is written in perl(1) and requires it to be installed in
       order to run.

SEE ALSO
       ngram-count(1), ngram-format(5), training-scripts(1).

       [1] A. Venkataraman and W. Wang, "Techniques for effective vocabulary
       selection", in Proceedings of Eurospeech, Geneva, 2003.

BUGS
       Probably. Send bug reports, fixes, modifications and enhancements to
       Anand Venkataraman (anand@speech.sri.com).

SOURCE
       Download as part of the SRILM toolkit, or stand-alone from
       http://www.speech.sri.com/people/anand/downloads/selvoc-v1.tar.gz

AUTHORS
       Anand Venkataraman <anand@speech.sri.com>
       Wen Wang <wwang@speech.dsri.com>
       Copyright 2003 SRI International

SRILM Tools              $Date: 2003/12/14 02:43:14 $          select-vocab(1)
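
The sort-and-threshold step described under DESCRIPTION can be sketched in shell as follows. This is only an illustration under stated assumptions: the file names (weights.txt, vocab.txt, heldout.txt, corpus1.txt, corpus2.txt), the threshold value, and the sample weights are hypothetical, and the output layout of one "word weight" pair per line is assumed rather than taken from the man page.

```shell
# Hypothetical sample standing in for select-vocab output.
# ASSUMPTION: one whitespace-separated "word weight" pair per line.
cat > weights.txt <<'EOF'
the 51234.7
zebra 3.2
of 30211.0
xylem 0.8
speech 812.5
EOF

# With SRILM installed, the list would instead come from something like
# (input file names are placeholders):
#   select-vocab -quiet -heldout heldout.txt corpus1.txt corpus2.txt > weights.txt

# Threshold weight chosen arbitrarily for this sketch.
threshold=100

# Sort on the weight column numerically, largest first, then keep only
# the words whose weight is at least the threshold.
sort -k2,2nr weights.txt |
    awk -v t="$threshold" '$2 >= t { print $1 }' > vocab.txt

cat vocab.txt
```

Plain numeric sorting (sort -n) is used here because, per the -scale option, the weights are scaled before being written so that sort(1) can order them numerically. A suitable threshold depends on the corpora and the desired vocabulary size.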
