lm-scripts(1)                                                    lm-scripts(1)

NAME
       lm-scripts, add-dummy-bows, change-lm-vocab, empty-sentence-lm,
       get-unigram-probs, make-hiddens-lm, make-lm-subset, make-sub-lm,
       remove-lowprob-ngrams, reverse-lm, sort-lm - manipulate N-gram
       language models

SYNOPSIS
       add-dummy-bows [lm-file] > new-lm-file
       change-lm-vocab -vocab vocab -lm lm-file -write-lm new-lm-file
              [-tolower] [-subset] [ngram-options...]
       empty-sentence-lm -prob p -lm lm-file -write-lm new-lm-file
              [ngram-options...]
       get-unigram-probs [linear=1]
       make-hiddens-lm [lm-file] > hiddens-lm-file
       make-lm-subset count-file|- [lm-file|-]
       make-sub-lm [maxorder=N] [lm-file] > new-lm-file
       remove-lowprob-ngrams [lm-file] > new-lm-file
       reverse-lm [lm-file] > new-lm-file
       sort-lm [lm-file] > sorted-lm-file

DESCRIPTION
       These scripts perform various useful manipulations on N-gram models in
       their textual representation.  Most operate on backoff N-grams in ARPA
       ngram-format(5).

       Since these tools are implemented as scripts they don't automatically
       input or output compressed model files correctly, unlike the main SRILM
       tools.  However, since most scripts work with data from standard input
       or to standard output (by leaving out the file argument, or specifying
       it as ``-'') it is easy to combine them with gunzip(1) or gzip(1) on
       the command line.

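       The same stdin/stdout convention can be used from a driver program
       instead of a shell pipeline.  The following is a minimal Python sketch,
       not part of SRILM: the helper name filter_compressed_lm is
       hypothetical, and it assumes the chosen script is on PATH and reads the
       model from standard input when no file argument is given.

```python
import gzip
import subprocess


def filter_compressed_lm(command, src_gz, dst_gz):
    """Pipe a gzip-compressed LM through `command` and gzip the result.

    `command` is an argument list such as ["sort-lm"]; the script is
    assumed to read the model on stdin and write the new model to stdout.
    The whole model is held in memory, so this suits small to medium LMs.
    """
    # Decompress the input model ourselves, since the scripts do not.
    with gzip.open(src_gz, "rb") as src:
        lm_text = src.read()

    # check=True raises CalledProcessError if the script exits non-zero.
    result = subprocess.run(
        command, input=lm_text, stdout=subprocess.PIPE, check=True
    )

    # Re-compress whatever the script wrote to standard output.
    with gzip.open(dst_gz, "wb") as dst:
        dst.write(result.stdout)
```

       A call such as filter_compressed_lm(["sort-lm"], "old.lm.gz",
       "sorted.lm.gz") would then correspond to running gunzip, sort-lm and
       gzip in a pipeline; the file names here are placeholders.
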
       Also note that many of the scripts take their options with the gawk(1)
       syntax option=value instead of the more common -option value.

       add-dummy-bows adds dummy backoff weights to N-grams, even where they
       are not required, to satisfy some broken software that expects backoff
       weights on all N-grams (except those of highest order).

       change-lm-vocab modifies the vocabulary of an LM to be that in vocab.
       Any N-grams containing out-of-vocabulary words are removed, new words
       receive a unigram probability, and the model is renormalized.  The
       -tolower option causes case distinctions to be ignored.  -subset only
       removes words from the LM vocabulary, without adding any.  Any
       remaining ngram-options are passed to ngram(1), and can be used to set
       debugging level, N-gram order, etc.

       empty-sentence-lm modifies an LM so that it allows the empty sentence
       with probability p.  This is useful to modify existing LMs that are
       trained on non-empty sentences only.  ngram-options are passed to
       ngram(1), and can be used to set debugging level, N-gram order, etc.

       make-hiddens-lm constructs an N-gram model that can be used with the
       ngram -hiddens option.  The new model contains intra-utterance sentence
       boundary tags ``<#s>'' with the same probability as the original model
       had final sentence tags </s>.  Also, utterance-initial words are not
       conditioned on <s> and there is no penalty associated with
       utterance-final </s>.  Such a model might work better if the test
       corpus is segmented at places other than proper <s> boundaries.

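       To illustrate the kind of text-level edit these scripts make, the
       behavior described above for add-dummy-bows can be sketched in Python.
       This is an illustration under stated assumptions, not the actual
       script: it assumes a well-formed ARPA file whose \data\ header lists
       every order, whitespace-separated fields, and a dummy weight of 0 (the
       log10 of a backoff weight of 1).

```python
import re


def add_dummy_bows(lines):
    """Yield ARPA LM lines, appending a backoff weight of 0 to every
    N-gram below the highest order that does not already have one."""
    max_order = 0  # highest order announced in the \data\ header
    order = 0      # order of the current section; 0 means outside N-gram lists
    for line in lines:
        line = line.rstrip("\n")
        header = re.match(r"ngram\s+(\d+)\s*=", line)
        section = re.match(r"\\(\d+)-grams:", line)
        if order == 0 and header:
            max_order = max(max_order, int(header.group(1)))
        elif section:
            order = int(section.group(1))
        elif line.startswith("\\"):
            order = 0  # \data\ or \end\ marker
        elif order and order < max_order and line.strip():
            # An entry is: logprob, N words, optional backoff weight.
            # Exactly N + 1 fields means the backoff weight is missing.
            if len(line.split()) == order + 1:
                line += "\t0"
        yield line
```

       Applied to an open model file, the generator can be written back out
       one line at a time, for example with print(line) for each yielded line.
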
       make-lm-subset forms a new LM containing only the N-grams found in the
       count-file, in ngram-count(1) format.  The result still needs to be
       renormalized with ngram -renorm (which will also adjust the N-gram
       counts in the header).

       make-sub-lm removes N-grams of order exceeding N.  This function is now
       redundant, since all SRILM tools can do this implicitly (without using
       extra memory and with very small time overhead) when reading N-gram
       models with the appropriate -order parameter.

       remove-lowprob-ngrams eliminates N-grams whose probability is lower
       than that which they would receive through backoff.  This is useful
       when building finite-state networks for N-gram models.  However, this
       function is now performed much faster by ngram(1) with the
       -prune-lowprobs option.

       reverse-lm produces a new LM that generates sentences with
       probabilities equal to those of the reversed sentences in the input
       model.

       sort-lm sorts the N-grams in an LM in lexicographic order (left-most
       words being the most significant).  This is not a requirement for
       SRILM, but might be necessary for some other LM software.  (The LMs
       output by SRILM are sorted somewhat differently, reflecting the
       internal data structures used; that is also the order that should give
       best cache utilization when using SRILM to read models.)

       get-unigram-probs extracts the unigram probabilities in a simple table
       format from a backoff language model.  The linear=1 option causes
       probabilities to be output on a linear (instead of log) scale.

SEE ALSO
       ngram-format(5), ngram(1).

BUGS
       These are quick-and-dirty scripts, what do you expect?

       reverse-lm supports only bigram LMs, and can produce improper
       probability estimates as a result of inconsistent marginals in the
       input model.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995-2006 SRI International

SRILM Tools               $Date: 2006/11/18 22:32:45 $           lm-scripts(1)
