ngram.1
       -ppl textfile
              Compute sentence scores (log probabilities) and perplexities
              from the sentences in textfile, which should contain one
              sentence per line.  The -debug option controls the level of
              detail printed, even though output is to stdout (not
              stderr).

              -debug 0
                     Only summary statistics for the entire corpus are
                     printed, as well as partial statistics for each input
                     portion delimited by escaped lines (see -escape).
                     These statistics include the number of sentences,
                     words, out-of-vocabulary words and zero-probability
                     tokens in the input, as well as its total log
                     probability and perplexity.  Perplexity is given with
                     two different normalizations: counting all input
                     tokens (``ppl'') and excluding end-of-sentence tags
                     (``ppl1'') (see the sketch below).

              -debug 1
                     Statistics for individual sentences are printed.

              -debug 2
                     Probabilities for each word, plus LM-dependent
                     details about backoff used etc., are printed.

              -debug 3
                     Probabilities for all words are summed in each
                     context, and the sum is printed.  If this differs
                     significantly from 1, a warning message to stderr
                     will be issued.

       -nbest file
              Read an N-best list in nbest-format(5) and rerank the
              hypotheses using the specified LM.  The reordered N-best
              list is written to stdout.  If the N-best list is given in
              ``NBestList1.0'' format and contains composite
              acoustic/language model scores, then -decipher-lm and the
              recognizer language model and word transition weights (see
              below) need to be specified so the original acoustic scores
              can be recovered.

       -nbest-files filelist
              Process multiple N-best lists whose filenames are listed in
              filelist.

       -write-nbest-dir dir
              Deposit rescored N-best lists into directory dir, using
              filenames derived from the input ones.

       -decipher-nbest
              Output rescored N-best lists in Decipher 1.0 format, rather
              than SRILM format.

       -no-reorder
              Output rescored N-best lists without sorting the hypotheses
              by their new combined scores.

       -split-multiwords
              Split multiwords into their components when reading N-best
              lists; the rescored N-best lists thus no longer contain
              multiwords.  (Note this is different from the -multiwords
              option, which leaves the input word stream unchanged and
              splits multiwords only for the purpose of LM probability
              computation.)

       -max-nbest n
              Limits the number of hypotheses read from an N-best list.
              Only the first n hypotheses are processed.
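       The two perplexity normalizations reported under -debug 0 differ
       only in the token count used as the denominator.  The following
       minimal Python sketch (not part of SRILM; it assumes base-10 log
       probabilities and that OOV and zero-probability tokens are
       excluded from the normalization counts, matching SRILM's summary
       output) recomputes both values from the summary statistics:

              def srilm_perplexities(logprob10, words, sentences, oovs=0):
                  # logprob10: total base-10 log probability from -ppl.
                  # ``ppl'' counts end-of-sentence tags (one per
                  # sentence); ``ppl1'' excludes them.
                  denom_ppl = words - oovs + sentences
                  denom_ppl1 = words - oovs
                  ppl = 10.0 ** (-logprob10 / denom_ppl)
                  ppl1 = 10.0 ** (-logprob10 / denom_ppl1)
                  return ppl, ppl1

              # Example: 10 sentences, 100 words, total logprob -250
              print(srilm_perplexities(-250.0, 100, 10))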
       -rescore file
              Similar to -nbest, but the input is processed as a stream
              of N-best hypotheses (without header).  The output consists
              of the rescored hypotheses in SRILM format (the third of
              the formats described in nbest-format(5)).

       -decipher-lm model-file
              Designates the N-gram backoff model (typically a bigram)
              that was used by the Decipher(TM) recognizer in computing
              composite scores for the hypotheses fed to -rescore or
              -nbest.  Used to compute acoustic scores from the composite
              scores.

       -decipher-order N
              Specifies the order of the Decipher N-gram model used
              (default is 2).

       -decipher-nobackoff
              Indicates that the Decipher N-gram model does not contain
              backoff nodes, i.e., all recognizer LM scores are correct
              up to rounding.

       -decipher-lmw weight
              Specifies the language model weight used by the recognizer.
              Used to compute acoustic scores from the composite scores.

       -decipher-wtw weight
              Specifies the word transition weight used by the
              recognizer.  Used to compute acoustic scores from the
              composite scores.

       -escape string
              Set an ``escape string'' for the -ppl, -counts, and
              -rescore computations.  Input lines starting with string
              are not processed as sentences and are instead passed
              unchanged to stdout.  This allows associated information to
              be passed through to scoring scripts etc.

       -counts countsfile
              Perform a computation similar to -ppl, but based only on
              the N-gram counts found in countsfile.  Probabilities are
              computed for the last word of each N-gram, using the other
              words as contexts, and scaled by the associated N-gram
              count.  Summary statistics are output at the end, as well
              as before each escaped input line.

       -count-order n
              Use only counts of order n in the -counts computation.  The
              default value is 0, meaning use all counts.

       -counts-entropy
              Weight the log probabilities for -counts processing by the
              joint probabilities of the N-grams.  This effectively
              computes the sum over p(w,h) log p(w|h), i.e., the entropy
              of the model (see the sketch below).  In debugging mode,
              both the conditional log probabilities and the
              corresponding joint probabilities are output.

       -skipoovs
              Instruct the LM to skip over contexts that contain
              out-of-vocabulary words, instead of using a backoff
              strategy in these cases.

       -noise noise-tag
              Designate noise-tag as a vocabulary item that is to be
              ignored by the LM.  (This is typically used to identify a
              noise marker.)  Note that the LM specified by -decipher-lm
              does NOT ignore this noise-tag, since the DECIPHER
              recognizer treats noise as a regular word.
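       The quantity accumulated by -counts-entropy is the sum over p(w,h)
       log p(w|h); its negation is the conditional entropy of the model.
       The following toy Python sketch (not part of SRILM; the bigram
       counts are invented, and estimating both the joint and conditional
       probabilities from raw counts is a simplification, since ngram
       takes the conditional probabilities from the LM) illustrates the
       computation:

              import math
              from collections import Counter

              # Toy (history, word) bigram counts standing in for
              # the contents of a countsfile.
              counts = Counter({("the", "cat"): 3,
                                ("the", "dog"): 1,
                                ("a", "cat"): 2})
              total = sum(counts.values())
              hist_totals = Counter()
              for (h, w), c in counts.items():
                  hist_totals[h] += c

              entropy_sum = 0.0
              for (h, w), c in counts.items():
                  p_joint = c / total          # p(w,h), from counts
                  p_cond = c / hist_totals[h]  # p(w|h); ngram uses the LM
                  entropy_sum += p_joint * math.log10(p_cond)

              # entropy_sum is sum_{w,h} p(w,h) log10 p(w|h); its
              # negation is the conditional entropy in log-10 units.
              print(entropy_sum)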
       -noise-vocab file
              Read several noise tags from file, instead of, or in
              addition to, the single noise tag specified by -noise.

       -reverse
              Reverse the words in a sentence for LM scoring purposes.
              (This assumes the LM used is a ``right-to-left'' model.)
              Note that the LM specified by -decipher-lm is always
              applied to the original, left-to-right word sequence.

SEE ALSO
       ngram-count(1), ngram-class(1), lm-scripts(1), ppl-scripts(1),
       pfsg-scripts(1), nbest-scripts(1), ngram-format(5),
       nbest-format(5), classes-format(5).

       J. A. Bilmes and K. Kirchhoff, ``Factored Language Models and
       Generalized Parallel Backoff,'' Proc. HLT-NAACL, pp. 4-6,
       Edmonton, Alberta, 2003.

       S. F. Chen and J. Goodman, ``An Empirical Study of Smoothing
       Techniques for Language Modeling,'' TR-10-98, Computer Science
       Group, Harvard Univ., 1998.

       K. Kirchhoff et al., ``Novel Speech Recognition Models for
       Arabic,'' Johns Hopkins University Summer Research Workshop 2002,
       Final Report.

       R. Kneser, J. Peters and D. Klakow, ``Language Model Adaptation
       Using Dynamic Marginals,'' Proc. Eurospeech, pp. 1971-1974,
       Rhodes, 1997.

       A. Stolcke and E. Shriberg, ``Statistical Language Modeling for
       Speech Disfluencies,'' Proc. IEEE ICASSP, pp. 405-409, Atlanta,
       GA, 1996.

       A. Stolcke, ``Entropy-based Pruning of Backoff Language Models,''
       Proc. DARPA Broadcast News Transcription and Understanding
       Workshop, pp. 270-274, Lansdowne, VA, 1998.

       A. Stolcke et al., ``Automatic Detection of Sentence Boundaries
       and Disfluencies Based on Recognized Words,'' Proc. ICSLP, pp.
       2247-2250, Sydney, 1998.

       M. Weintraub et al., ``Fast Training and Portability,'' Research
       Note No. 1, Center for Language and Speech Processing, Johns
       Hopkins University, Baltimore, Feb. 1996.

BUGS
       Some LM types (such as Bayes-interpolated and factored LMs)
       currently do not support the -write-lm function.

       For the -limit-vocab option to work correctly with hidden-event
       and class N-gram LMs, the event/class vocabularies have to be
       specified by options (-hidden-vocab and -classes, respectively).
       Embedding event/class definitions in the LM file only will not
       work correctly.

       Sentence generation is slow and takes time proportional to the
       vocabulary size.

       The file given by -classes is read multiple times if -limit-vocab
       is in effect or if a mixture of LMs is specified.  This will lead
       to incorrect behavior if the argument of -classes is stdin
       (``-'').

       Also, -limit-vocab will not work correctly with LM operations that
       require the entire vocabulary to be enumerated, such as
       -adapt-marginals or perplexity computation with -debug 3.
       Support for factored LMs is experimental, and many LM operations
       supported by standard N-grams (such as -limit-vocab) are not
       implemented yet.

AUTHORS
       Andreas Stolcke <stolcke@speech.sri.com>
       Jing Zheng <zj@speech.sri.com>
       Copyright 1995-2006 SRI International

SRILM Tools          $Date: 2006/07/30 04:54:35 $            ngram(1)
