LM(3)                                                                    LM(3)

NAME
       LM - Generic language model

SYNOPSIS
       #include <LM.h>

DESCRIPTION
       The LM class specifies a minimal language model interface and provides
       some generic utilities. LM inherits from Debug, and the debugging
       level of an LM object determines whether, and how much, verbose
       information is printed by various functions.

CLASS MEMBERS
       LM(Vocab &vocab)
              Initializing an LM object requires specifying the vocabulary
              over which the LM is defined. The vocab object can be shared
              among different LM instances. The LM object can modify vocab
              as a side effect, e.g., as a result of reading an LM from a
              file.

       LogP wordProb(VocabIndex word, const VocabIndex *context)
       LogP wordProb(VocabString word, const VocabString *context)
              Returns the conditional log probability of word given a
              history. The history is given in reversed order (most recent
              word first) in context, and terminated by Vocab_Null. Word
              and history can be specified either by strings or by indices.
              All functional LM subclasses have to implement at least the
              first version.

       LogP wordProbRecompute(VocabIndex word, const VocabIndex *context)
              Returns the same conditional log probability as wordProb(),
              but on the promise that context is identical to the one
              passed in the last call to wordProb(). This often allows an
              efficient implementation that speeds up repeated lookups in
              the same context.

       LogP sentenceProb(const VocabIndex *sentence, TextStats &stats)
       LogP sentenceProb(const VocabString *sentence, TextStats &stats)
              Returns the total log probability of a string of words (a
              sentence). The data in the stats object is incremented to
              reflect the statistics of the sentence.

       unsigned pplFile(File &file, TextStats &stats,
                        const char *escapeString = 0)
              Reads sentences from file, computing their probabilities and
              aggregate perplexity, and updating stats. The debugging level
              of the LM object determines how much information is printed
              to stderr:

              debuglevel 0              total statistics only
              debuglevel 1              per-sentence statistics
              debuglevel 2              word probabilities
              debuglevel 3 and greater  LM-specific information

              Lines in file that start with escapeString are copied to the
              output. This allows extra information in the input file to be
              passed through unchanged.

       unsigned rescoreFile(File &file, double lmScale, double wtScale,
                            LM &oldLM, double oldLmScale, double oldWtScale,
                            const char *escapeString = 0)
              Reads N-best hypotheses and scores from file, replaces the LM
              scores with new ones computed from the current model, and
              prints the new scores (including hypotheses) to stdout.
              lmScale and wtScale are the LM and word transition weights,
              respectively. oldLM is the LM whose scores are included in
              the aggregate scores read from the input (provided so that
              they can be subtracted out), and oldLmScale and oldWtScale
              are the old LM and word transition weights, respectively.
              Lines in file that start with escapeString are copied to the
              output.

       void setState(const char *state)
              This is a generic interface to change the internal ``state''
              of an LM. The default implementation of this function does
              nothing, but certain LM subclass implementations may
              interpret the state string to assume different internal
              configurations.
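
       The reversed-context convention of wordProb() and the way a sentence
       score accumulates from per-word scores can be sketched with a toy
       bigram model. The sketch below is self-contained and does not use
       the SRILM headers: ToyIndex, Toy_None, ToyBigramLM and toyPerplexity
       are hypothetical names, the base-10 logarithm is an assumption, and
       the perplexity formula shown (counting the end-of-sentence token) is
       one common convention rather than a statement of what pplFile()
       prints.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the SRILM types.
typedef unsigned ToyIndex;
const ToyIndex Toy_None = (ToyIndex)-1;  // terminates a context array

struct ToyBigramLM {
    // (previous word, word) -> P(word | previous word)
    std::map<std::pair<ToyIndex, ToyIndex>, double> probs;

    // context[0] is the most recent word; the array ends with Toy_None.
    // Returns a base-10 log probability (the base is an assumption here).
    double wordProb(ToyIndex word, const ToyIndex *context) const {
        assert(context != 0);
        // A bigram model only needs the most recent word, context[0].
        auto it = probs.find(std::make_pair(context[0], word));
        if (it == probs.end()) return -HUGE_VAL;  // log of zero
        return std::log10(it->second);
    }

    // Sums the conditional log probabilities of every word in the
    // sentence, plus the end tag, each given its reversed history.
    double sentenceProb(const std::vector<ToyIndex> &sentence,
                        ToyIndex startTag, ToyIndex endTag) const {
        std::vector<ToyIndex> history;  // most recent word first
        history.push_back(startTag);
        history.push_back(Toy_None);
        double total = 0.0;
        for (ToyIndex w : sentence) {
            total += wordProb(w, history.data());
            history.insert(history.begin(), w);  // newest word goes first
        }
        total += wordProb(endTag, history.data());
        return total;
    }
};

// Perplexity from a total base-10 log probability over numTokens
// predicted tokens (words plus end-of-sentence tags in this sketch).
double toyPerplexity(double totalLogP, unsigned numTokens) {
    return std::pow(10.0, -totalLogP / numTokens);
}
```

       With three predicted tokens of probability 0.1 each, the total log
       probability is -3 and the perplexity under this convention is 10.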

       Prob wordProbSum(const VocabIndex *context)
              Returns the sum of all word probabilities in context. Useful
              for checking the well-definedness of a model.

       VocabIndex generateWord(const VocabIndex *context)
              Returns a word index from the vocabulary, randomly generated
              according to the conditional probabilities in context.

       VocabIndex *generateSentence(unsigned maxWords = maxWordsPerLine,
                                    VocabIndex *sentence = 0)
       VocabString *generateSentence(unsigned maxWords = maxWordsPerLine,
                                     VocabString *sentence = 0)
              Generates a random sentence of length up to maxWords. The
              result is placed in sentence if specified, or in a static
              buffer otherwise.

       void *contextID(const VocabIndex *context)
              Returns an implementation-dependent value that identifies the
              word context used to compute a conditional probability. (The
              context actually used may be shorter than what is specified
              in context.)

       Boolean isNonWord(VocabIndex word)
              Returns true if word is not a regular word in the LM, i.e.,
              not one that the LM computes probabilities for (for example,
              a non-event tag such as sentence-start).

       Boolean read(File &file, Boolean limitVocab = false)
              Reads an LM from file. Returns true if the file contents were
              formatted correctly and an internal LM representation could
              be successfully constructed from them. The optional second
              argument controls whether words not already in the vocabulary
              are to be added automatically.

       void write(File &file)
              Writes the LM to file in a format that can be read back by
              read().
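
       The ideas behind wordProbSum() and generateWord() can be illustrated
       on a plain probability table for one fixed context. This is a
       minimal sketch, not the SRILM implementation: toyWordProbSum and
       toyGenerateWord are hypothetical names, the table dist stands in for
       the model's conditional distribution, and inverse-CDF sampling is
       assumed here only because it is a standard way to draw from a
       discrete distribution. The uniform draw u is passed in by the caller
       so that the function stays deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dist[i] is assumed to hold P(word i | context) for one fixed context.

// For a well-defined model the sum should be (close to) 1.
double toyWordProbSum(const std::vector<double> &dist) {
    double sum = 0.0;
    for (double p : dist) sum += p;
    return sum;
}

// Inverse-CDF sampling: returns the first index whose cumulative
// probability exceeds the uniform draw u, with u in [0, 1).
std::size_t toyGenerateWord(const std::vector<double> &dist, double u) {
    assert(!dist.empty() && u >= 0.0 && u < 1.0);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < dist.size(); i++) {
        cumulative += dist[i];
        if (u < cumulative) return i;
    }
    // Guards against rounding when the sum falls slightly below 1.
    return dist.size() - 1;
}
```

       A sum that differs noticeably from 1 indicates that the table does
       not define a proper distribution for that context, which is the kind
       of problem a sum check is meant to expose.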

       Vocab &vocab
              The vocabulary object associated with the LM (set at
              initialization).

       VocabIndex noiseIndex
              The index of the noise tag, i.e., a word that is skipped when
              computing probabilities.

       const char *stateTag
              A string introducing ``state'' information that should be
              passed to the LM. Input lines starting with this tag are
              handed to setState() by pplFile() and rescoreFile().

       Boolean reverseWords
              If set to true, the LM reverses word order before computing
              sentence probabilities. This means wordProb() is expected to
              compute conditional probabilities based on right contexts.

SEE ALSO
       Vocab(3).

BUGS

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995, 1996 SRI International

SRILM                    $Date: 2005/04/26 03:33:56 $                    LM(3)