ngram.1
greater than M*m, the i=M weights are used.) The N-gram counts themselves are given in an indexed directory structure rooted at dir, in an external file, or, if file is the string --, starting on the line following the counts keyword.

-vocab file
       Initialize the vocabulary for the LM from file. This is especially useful if the LM itself does not specify a complete vocabulary, e.g., as with -null.

-vocab-aliases file
       Reads vocabulary alias definitions from file, consisting of lines of the form

              alias word

       This causes all tokens alias to be mapped to word.

-nonevents file
       Read a list of words from file that are to be considered non-events, i.e., that should only occur in LM contexts, but not as predictions. Such words are excluded from sentence generation (-gen) and probability summation (-ppl -debug 3).

-limit-vocab
       Discard LM parameters on reading that do not pertain to the words specified in the vocabulary. The default is that words used in the LM are automatically added to the vocabulary. This option can be used to reduce the memory requirements for large LMs that are going to be evaluated only on a small vocabulary subset.

-unk
       Indicates that the LM contains the unknown word, i.e., is an open-class LM.

-map-unk word
       Map out-of-vocabulary words to word, rather than the default <unk> tag.

-tolower
       Map all vocabulary to lowercase. Useful if case conventions for text/counts and language model differ.

-multiwords
       Split input words consisting of multiwords joined by underscores into their components, before evaluating LM probabilities.

-mix-lm file
       Read a second N-gram model for interpolation purposes. The second and any additional interpolated models can also be class N-grams (using the same -classes definitions), but are otherwise constrained to be standard N-grams, i.e., the options -df, -tagged, -skip, and -hidden-vocab do not apply to them.

       NOTE: Unless -bayes (see below) is specified, -mix-lm triggers a static interpolation of the models in memory. In most cases a more efficient, dynamic interpolation is sufficient, requested by -bayes 0 (see the example at the end of this option group). Also, mixing models of different type (e.g., word-based and class-based) will only work correctly with dynamic interpolation.

-lambda weight
       Set the weight of the main model when interpolating with -mix-lm. Default value is 0.5.

-mix-lm2 file
-mix-lm3 file
-mix-lm4 file
-mix-lm5 file
-mix-lm6 file
-mix-lm7 file
-mix-lm8 file
-mix-lm9 file
       Up to 9 more N-gram models can be specified for interpolation.

-mix-lambda2 weight
-mix-lambda3 weight
-mix-lambda4 weight
-mix-lambda5 weight
-mix-lambda6 weight
-mix-lambda7 weight
-mix-lambda8 weight
-mix-lambda9 weight
       These are the weights for the additional mixture components, corresponding to -mix-lm2 through -mix-lm9. The weight for the -mix-lm model is 1 minus the sum of -lambda and -mix-lambda2 through -mix-lambda9.

-loglinear-mix
       Implement a log-linear (rather than linear) mixture LM, using the parameters above.
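For example, a minimal sketch of the dynamic interpolation recommended in the NOTE under -mix-lm; main.lm, other.lm, and test.txt are placeholder filenames, and -lm and -ppl are the model-reading and perplexity options described elsewhere in this page:

       ngram -lm main.lm -mix-lm other.lm -lambda 0.7 -bayes 0 -ppl test.txt

With -bayes 0 the two models are combined dynamically per context, and the -lambda value of 0.7 acts as the prior weight of the main model, as described under -bayes below.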
-bayes length
       Interpolate the second and the main model using posterior probabilities for local N-gram contexts of length length. The -lambda value is used as a prior mixture weight in this case.

-bayes-scale scale
       Set the exponential scale factor on the context likelihood in conjunction with the -bayes function. Default value is 1.0.

-cache length
       Interpolate the main LM (or the one resulting from operations above) with a unigram cache language model based on a history of length words.

-cache-lambda weight
       Set interpolation weight for the cache LM. Default value is 0.05.

-dynamic
       Interpolate the main LM (or the one resulting from operations above) with a dynamically changing LM. LM changes are indicated by the tag ``<LMstate>'' starting a line in the input to -ppl, -counts, or -rescore, followed by a filename containing the new LM.

-dynamic-lambda weight
       Set interpolation weight for the dynamic LM. Default value is 0.05.

-adapt-marginals LM
       Use an LM obtained by adapting the unigram marginals to the values specified in the LM in ngram-format(5), using the method described in Kneser et al. (1997). The LM to be adapted is that constructed according to the other options.

-base-marginals LM
       Specify the baseline unigram marginals in a separate file LM, which must be in ngram-format(5) as well. If not specified, the baseline marginals are taken from the model to be adapted, but this might not be desirable, e.g., when Kneser-Ney smoothing was used.

-adapt-marginals-beta B
       The exponential weight given to the ratio between adapted and baseline marginals. The default is 0.5.

-adapt-marginals-ratios
       Compute and output only the log ratio between the adapted and the baseline LM probabilities. These can be useful as a separate knowledge source in N-best rescoring.

The following options specify the operations performed on/with the LM constructed as per the options above.

-renorm
       Renormalize the main model by recomputing backoff weights for the given probabilities.

-prune threshold
       Prune N-gram probabilities if their removal causes (training set) perplexity of the model to increase by less than threshold relative.

-prune-lowprobs
       Prune N-gram probabilities that are lower than the corresponding backed-off estimates. This generates N-gram models that can be correctly converted into probabilistic finite-state networks.

-minprune n
       Only prune N-grams of length at least n. The default (and minimum allowed value) is 2, i.e., only unigrams are excluded from pruning. This option applies to both -prune and -prune-lowprobs.

-rescore-ngram file
       Read an N-gram LM from file and recompute its N-gram probabilities using the LM specified by the other options; then renormalize and evaluate the resulting new N-gram LM.

-write-lm file
       Write a model back to file. The output will be in the same format as read by -lm, except if operations such as -mix-lm or -expand-classes were applied, in which case the output will contain the generated single N-gram backoff model in ARPA ngram-format(5).

-write-bin-lm file
       Write a model to file using a binary data format.
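As an illustration of the pruning and writing operations above, a hedged sketch of shrinking a model and saving the result; big.lm and pruned.lm are placeholder filenames:

       ngram -lm big.lm -prune 1e-8 -write-lm pruned.lm

This removes every N-gram (of length 2 and up, per the -minprune default) whose removal increases training-set perplexity by less than a relative 1e-8, and writes the smaller model in the format read by -lm.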
       This is only supported by certain model types, specifically, N-gram backoff models. Binary model files should be recognized automatically by the -read function.

-write-vocab file
       Write the LM's vocabulary to file.

-gen number
       Generate number random sentences from the LM.

-seed value
       Initialize the random number generator used for sentence generation using seed value. The default is to use a seed that should be close to unique for each invocation of the program.
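For instance, a minimal sketch of reproducible sentence generation; main.lm is a placeholder filename:

       ngram -lm main.lm -gen 10 -seed 42

This prints 10 random sentences drawn from the model. Fixing -seed makes repeated invocations produce the same output, whereas omitting it yields a near-unique seed per run, as noted above.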