-ppl textfile
       Compute sentence scores (log probabilities) and perplexities
       from the sentences in textfile, which should contain one
       sentence per line.  The -debug option controls the level of
       detail printed, even though output is to stdout (not stderr).

       -debug 0
              Only summary statistics for the entire corpus are
              printed, as well as partial statistics for each input
              portion delimited by escaped lines (see -escape).  These
              statistics include the number of sentences, words,
              out-of-vocabulary words and zero-probability tokens in
              the input, as well as its total log probability and
              perplexity.  Perplexity is given with two different
              normalizations: counting all input tokens (``ppl'') and
              excluding end-of-sentence tags (``ppl1'').

       -debug 1
              Statistics for individual sentences are printed.

       -debug 2
              Probabilities for each word, plus LM-dependent details
              about the backoff used etc., are printed.

       -debug 3
              Probabilities for all words are summed in each context,
              and the sum is printed.  If this differs significantly
              from 1, a warning message is issued to stderr.

-nbest file
       Read an N-best list in nbest-format(5) and rerank the
       hypotheses using the specified LM.  The reordered N-best list
       is written to stdout.  If the N-best list is given in
       ``NBestList1.0'' format and contains composite
       acoustic/language model scores, then -decipher-lm and the
       recognizer language model and word transition weights (see
       below) need to be specified so the original acoustic scores can
       be recovered.

-nbest-files filelist
       Process multiple N-best lists whose filenames are listed in
       filelist.

-write-nbest-dir dir
       Deposit rescored N-best lists into directory dir, using
       filenames derived from the input ones.

-decipher-nbest
       Output rescored N-best lists in Decipher 1.0 format, rather
       than SRILM format.
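The two perplexity normalizations reported by -ppl can be illustrated with a short sketch.  This is not SRILM code, and the corpus totals below are hypothetical; it only assumes that SRILM reports base-10 log probabilities and that ``ppl'' counts one end-of-sentence tag per sentence while ``ppl1'' excludes them.

```python
def perplexities(logprob, words, sentences):
    """Illustrative only: ppl normalizes the total log probability by
    all tokens (words plus one </s> tag per sentence); ppl1 excludes
    the end-of-sentence tags.  logprob is a base-10 log probability."""
    ppl = 10.0 ** (-logprob / (words + sentences))
    ppl1 = 10.0 ** (-logprob / words)
    return ppl, ppl1

# Hypothetical corpus totals: 90 words in 10 sentences, logprob -200.
ppl, ppl1 = perplexities(logprob=-200.0, words=90, sentences=10)
```

Since ppl1 divides the same log probability by fewer tokens, it is always at least as large as ppl.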
-no-reorder
       Output rescored N-best lists without sorting the hypotheses by
       their new combined scores.

-split-multiwords
       Split multiwords into their components when reading N-best
       lists; the rescored N-best lists thus no longer contain
       multiwords.  (Note this is different from the -multiwords
       option, which leaves the input word stream unchanged and splits
       multiwords only for the purpose of LM probability computation.)

-max-nbest n
       Limits the number of hypotheses read from an N-best list.  Only
       the first n hypotheses are processed.

-rescore file
       Similar to -nbest, but the input is processed as a stream of
       N-best hypotheses (without header).  The output consists of the
       rescored hypotheses in SRILM format (the third of the formats
       described in nbest-format(5)).

-decipher-lm model-file
       Designates the N-gram backoff model (typically a bigram) that
       was used by the Decipher(TM) recognizer in computing composite
       scores for the hypotheses fed to -rescore or -nbest.  Used to
       compute acoustic scores from the composite scores.

-decipher-order N
       Specifies the order of the Decipher N-gram model used (default
       is 2).

-decipher-nobackoff
       Indicates that the Decipher N-gram model does not contain
       backoff nodes, i.e., all recognizer LM scores are correct up to
       rounding.

-decipher-lmw weight
       Specifies the language model weight used by the recognizer.
       Used to compute acoustic scores from the composite scores.

-decipher-wtw weight
       Specifies the word transition weight used by the recognizer.
       Used to compute acoustic scores from the composite scores.

-escape string
       Set an ``escape string'' for the -ppl, -counts, and -rescore
       computations.  Input lines starting with string are not
       processed as sentences and are passed unchanged to stdout
       instead.
       This allows associated information to be passed through to
       scoring scripts etc.

-counts countsfile
       Perform a computation similar to -ppl, but based only on the
       N-gram counts found in countsfile.  Probabilities are computed
       for the last word of each N-gram, using the other words as
       contexts, and scaling by the associated N-gram count.  Summary
       statistics are output at the end, as well as before each
       escaped input line.

-count-order n
       Use only counts of order n in the -counts computation.  The
       default value is 0, meaning use all counts.

-counts-entropy
       Weight the log probabilities for -counts processing by the
       joint probabilities of the N-grams.  This effectively computes
       the sum over p(w,h) log p(w|h), i.e., the entropy of the model.
       In debugging mode, both the conditional log probabilities and
       the corresponding joint probabilities are output.

-skipoovs
       Instruct the LM to skip over contexts that contain
       out-of-vocabulary words, instead of using a backoff strategy in
       these cases.

-noise noise-tag
       Designate noise-tag as a vocabulary item that is to be ignored
       by the LM.  (This is typically used to identify a noise
       marker.)  Note that the LM specified by -decipher-lm does NOT
       ignore this noise-tag, since the DECIPHER recognizer treats
       noise as a regular word.

-noise-vocab file
       Read several noise tags from file, instead of, or in addition
       to, the single noise tag specified by -noise.

-reverse
       Reverse the words in a sentence for LM scoring purposes.  (This
       assumes the LM used is a ``right-to-left'' model.)  Note that
       the LM specified by -decipher-lm is always applied to the
       original, left-to-right word sequence.

SEE ALSO
       ngram-count(1), ngram-class(1), lm-scripts(1), ppl-scripts(1),
       pfsg-scripts(1), nbest-scripts(1), ngram-format(5),
       nbest-format(5), classes-format(5).

       J. A. Bilmes and K.
       Kirchhoff, ``Factored Language Models and Generalized Parallel
       Backoff,'' Proc. HLT-NAACL, pp. 4-6, Edmonton, Alberta, 2003.

       S. F. Chen and J. Goodman, ``An Empirical Study of Smoothing
       Techniques for Language Modeling,'' TR-10-98, Computer Science
       Group, Harvard Univ., 1998.

       K. Kirchhoff et al., ``Novel Speech Recognition Models for
       Arabic,'' Johns Hopkins University Summer Research Workshop
       2002, Final Report.

       R. Kneser, J. Peters and D. Klakow, ``Language Model Adaptation
       Using Dynamic Marginals,'' Proc. Eurospeech, pp. 1971-1974,
       Rhodes, 1997.

       A. Stolcke and E. Shriberg, ``Statistical Language Modeling for
       Speech Disfluencies,'' Proc. IEEE ICASSP, pp. 405-409, Atlanta,
       GA, 1996.

       A. Stolcke, ``Entropy-based Pruning of Backoff Language
       Models,'' Proc. DARPA Broadcast News Transcription and
       Understanding Workshop, pp. 270-274, Lansdowne, VA, 1998.

       A. Stolcke et al., ``Automatic Detection of Sentence Boundaries
       and Disfluencies Based on Recognized Words,'' Proc. ICSLP, pp.
       2247-2250, Sydney, 1998.

       M. Weintraub et al., ``Fast Training and Portability,'' in
       Research Note No. 1, Center for Language and Speech Processing,
       Johns Hopkins University, Baltimore, Feb. 1996.

BUGS
       Some LM types (such as Bayes-interpolated and factored LMs)
       currently do not support the -write-lm function.

       For the -limit-vocab option to work correctly with hidden-event
       and class N-gram LMs, the event/class vocabularies have to be
       specified by options (-hidden-vocab and -classes,
       respectively).  Embedding event/class definitions in the LM
       file only will not work correctly.

       Sentence generation is slow and takes time proportional to the
       vocabulary size.

       The file given by -classes is read multiple times if
       -limit-vocab is in effect or if a mixture of LMs is specified.
       This will lead to incorrect behavior if the argument of
       -classes is stdin (``-'').

       Also, -limit-vocab will not work correctly with LM operations
       that require the entire vocabulary to be enumerated, such as
       -adapt-marginals or perplexity computation with -debug 3.

       Support for factored LMs is experimental, and many LM
       operations supported by standard N-grams (such as -limit-vocab)
       are not implemented yet.

AUTHORS
       Andreas Stolcke <stolcke@speech.sri.com>
       Jing Zheng <zj@speech.sri.com>

       Copyright 1995-2006 SRI International

SRILM Tools            $Date: 2006/07/30 04:54:35 $            ngram(1)
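The recovery of acoustic scores from Decipher composite scores (see -decipher-lm, -decipher-lmw, and -decipher-wtw above) amounts to subtracting the weighted LM and word-transition contributions.  A minimal sketch, under the assumption that the composite score decomposes as acoustic + lmw * lm_score + wtw * num_words; the function name and all values below are hypothetical, not SRILM code.

```python
def acoustic_from_composite(composite, lm_score, num_words, lmw, wtw):
    """Sketch: invert composite = acoustic + lmw*lm_score + wtw*num_words.
    lm_score is the (Decipher) LM log probability for the hypothesis;
    lmw and wtw are the recognizer's LM and word transition weights."""
    return composite - lmw * lm_score - wtw * num_words

# Hypothetical scores for one hypothesis:
acoustic = acoustic_from_composite(
    composite=-1000.0, lm_score=-20.0, num_words=5, lmw=8.0, wtw=0.0)
```

Once the acoustic score is separated out, it can be recombined with a new LM score during rescoring, which is why the original recognizer weights must be supplied exactly.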