ngram.1
greater than M*m, the i=M weights are used.) The N-gram counts themselves are given in an indexed directory structure rooted at dir, in an external file, or, if file is the string --, starting on the line following the counts keyword.

-vocab file
       Initialize the vocabulary for the LM from file. This is especially useful if the LM itself does not specify a complete vocabulary, e.g., as with -null.

-vocab-aliases file
       Reads vocabulary alias definitions from file, consisting of lines of the form

              alias word

       This causes all tokens alias to be mapped to word.

-nonevents file
       Read a list of words from file that are to be considered non-events, i.e., that should only occur in LM contexts, but not as predictions. Such words are excluded from sentence generation (-gen) and probability summation (-ppl -debug 3).

-limit-vocab
       Discard LM parameters on reading that do not pertain to the words specified in the vocabulary. The default is that words used in the LM are automatically added to the vocabulary. This option can be used to reduce the memory requirements for large LMs that are going to be evaluated only on a small vocabulary subset.

-unk
       Indicates that the LM contains the unknown word, i.e., is an open-class LM.

-map-unk word
       Map out-of-vocabulary words to word, rather than the default <unk> tag.

-tolower
       Map all vocabulary to lowercase. Useful if case conventions for text/counts and language model differ.

-multiwords
       Split input words consisting of multiwords joined by underscores into their components, before evaluating LM probabilities.

-mix-lm file
       Read a second N-gram model for interpolation purposes. The second and any additional interpolated models can also be class N-grams (using the same -classes definitions), but are otherwise constrained to be standard N-grams, i.e., the options -df, -tagged, -skip, and -hidden-vocab do not apply to them.

       NOTE: Unless -bayes (see below) is specified, -mix-lm triggers a static interpolation of the models in memory. In most cases a more efficient, dynamic interpolation is sufficient, requested by -bayes 0 (see the example at the end of this option group). Also, mixing models of different type (e.g., word-based and class-based) will only work correctly with dynamic interpolation.

-lambda weight
       Set the weight of the main model when interpolating with -mix-lm. Default value is 0.5.

-mix-lm2 file
-mix-lm3 file
-mix-lm4 file
-mix-lm5 file
-mix-lm6 file
-mix-lm7 file
-mix-lm8 file
-mix-lm9 file
       Up to 9 more N-gram models can be specified for interpolation.

-mix-lambda2 weight
-mix-lambda3 weight
-mix-lambda4 weight
-mix-lambda5 weight
-mix-lambda6 weight
-mix-lambda7 weight
-mix-lambda8 weight
-mix-lambda9 weight
       These are the weights for the additional mixture components, corresponding to -mix-lm2 through -mix-lm9. The weight for the -mix-lm model is 1 minus the sum of -lambda and -mix-lambda2 through -mix-lambda9.

-loglinear-mix
       Implement a log-linear (rather than linear) mixture LM, using the parameters above.
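For example, a minimal sketch of the dynamic interpolation recommended in the NOTE under -mix-lm; main.lm, other.lm, and test.txt are placeholder filenames, and -lm and -ppl are the model-reading and perplexity options described elsewhere in this page:

       ngram -lm main.lm -mix-lm other.lm -lambda 0.7 -bayes 0 -ppl test.txt

With -bayes 0 the two models are combined dynamically per context, and the -lambda value of 0.7 acts as the prior weight of the main model, as described under -bayes below.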
-bayes length
       Interpolate the second and the main model using posterior probabilities for local N-gram contexts of length length. The -lambda value is used as a prior mixture weight in this case.

-bayes-scale scale
       Set the exponential scale factor on the context likelihood in conjunction with the -bayes function. Default value is 1.0.

-cache length
       Interpolate the main LM (or the one resulting from operations above) with a unigram cache language model based on a history of length words.

-cache-lambda weight
       Set interpolation weight for the cache LM. Default value is 0.05.

-dynamic
       Interpolate the main LM (or the one resulting from operations above) with a dynamically changing LM. LM changes are indicated by the tag ``<LMstate>'' starting a line in the input to -ppl, -counts, or -rescore, followed by a filename containing the new LM.

-dynamic-lambda weight
       Set interpolation weight for the dynamic LM. Default value is 0.05.

-adapt-marginals LM
       Use an LM obtained by adapting the unigram marginals to the values specified in the LM in ngram-format(5), using the method described in Kneser et al. (1997). The LM to be adapted is that constructed according to the other options.

-base-marginals LM
       Specify the baseline unigram marginals in a separate file LM, which must be in ngram-format(5) as well. If not specified, the baseline marginals are taken from the model to be adapted, but this might not be desirable, e.g., when Kneser-Ney smoothing was used.

-adapt-marginals-beta B
       The exponential weight given to the ratio between adapted and baseline marginals. The default is 0.5.

-adapt-marginals-ratios
       Compute and output only the log ratio between the adapted and the baseline LM probabilities. These can be useful as a separate knowledge source in N-best rescoring.

The following options specify the operations performed on/with the LM constructed as per the options above.

-renorm
       Renormalize the main model by recomputing backoff weights for the given probabilities.

-prune threshold
       Prune N-gram probabilities if their removal causes (training set) perplexity of the model to increase by less than threshold relative.

-prune-lowprobs
       Prune N-gram probabilities that are lower than the corresponding backed-off estimates. This generates N-gram models that can be correctly converted into probabilistic finite-state networks.

-minprune n
       Only prune N-grams of length at least n. The default (and minimum allowed value) is 2, i.e., only unigrams are excluded from pruning. This option applies to both -prune and -prune-lowprobs.

-rescore-ngram file
       Read an N-gram LM from file and recompute its N-gram probabilities using the LM specified by the other options; then renormalize and evaluate the resulting new N-gram LM.

-write-lm file
       Write a model back to file. The output will be in the same format as read by -lm, except if operations such as -mix-lm or -expand-classes were applied, in which case the output will contain the generated single N-gram backoff model in ARPA ngram-format(5).

-write-bin-lm file
       Write a model to file using a binary data format.
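As an illustration of the pruning and writing operations above, a hedged sketch of shrinking a model and saving the result; big.lm and pruned.lm are placeholder filenames:

       ngram -lm big.lm -prune 1e-8 -write-lm pruned.lm

This removes every N-gram (of length 2 and up, per the -minprune default) whose removal increases training-set perplexity by less than a relative 1e-8, and writes the smaller model in the format read by -lm.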
       This is only supported by certain model types, specifically, N-gram backoff models. Binary model files should be recognized automatically by the -read function.

-write-vocab file
       Write the LM's vocabulary to file.

-gen number
       Generate number random sentences from the LM.

-seed value
       Initialize the random number generator used for sentence generation using seed value. The default is to use a seed that should be close to unique for each invocation of the program.
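For instance, a minimal sketch of reproducible sentence generation; main.lm is a placeholder filename:

       ngram -lm main.lm -gen 10 -seed 42

This prints 10 random sentences drawn from the model. Fixing -seed makes repeated invocations produce the same output, whereas omitting it yields a near-unique seed per run, as noted above.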