nbest-scripts(1)                                        nbest-scripts(1)

NAME
       nbest-scripts, combine-rover-controls, compare-sclite,
       compute-sclite, fix-ctm, merge-nbest, nbest-error,
       nbest-posteriors, nbest-rover, nbest-vocab, nbest2-to-nbest1,
       rescore-acoustic, rescore-decipher, rescore-reweight,
       sentid-to-sclite - rescore and evaluate N-best lists

SYNOPSIS
       rescore-decipher [-bytelog] [-nodecipherlm] [-multiwords]
              [-pretty mapfile] [-ngram-tool program] [-filter command]
              [-norescore] [-lm-only] [-count-oovs] [-limit-vocab]
              [-vocab-aliases mapfile] [-fast] nbest-file-list score-dir
              -lm ... lm-options ...
       rescore-acoustic old-nbest-dir|old-file-list old-ac-weight
              new-score-dir1 new-ac-weight1 ... new-nbest-dir [max-nbest]
       rescore-reweight [-multiwords] score-dir|file-list [lmw [wtw
              [score-dir1 score-weight1 ...] [max-nbest]]]
       rescore-minimize-wer score-dir [lmw [wtw [max-nbest]]]
       nbest2-to-nbest1 [nbest-file]
       nbest-rover [ sentid-list | - ] control-file [ posterior-file
              [ nbest-lattice-options ] ]
       combine-rover-controls [ lambda=weights ] rover-control [ ... ]
       nbest-posteriors [ weight=W lmw=lmw wtw=wtw postscale=S
              max_nbest=M ] nbest-file
       merge-nbest [ max_nbest=M multiwords=1 nopauses=1 ] nbest-file ...
       nbest-vocab [nbest-list...]
       nbest-error score-dir|file-list refs [nbest-lattice-option...]
       sentid-to-sclite hyps
       sentid-to-ctm hyps
       fix-ctm ctmfile
       compute-sclite -r refs -h hyps [-h hyps ...] [-S subset ...]
              [-multiwords|-M] [-noperiods] [-R] [-g glmfile] [-H] [-v]
              [sclite-options...]
       compare-sclite -r refs -h1 hyps1 -h2 hyps2 [-S subset]
              [-multiwords|-M] [sclite-options...]

DESCRIPTION
       These scripts perform common tasks on N-best hypotheses in
       nbest-format(5), especially those needed for rescoring and for
       extracting and evaluating 1-best hypotheses.

       rescore-decipher applies a language model implemented by ngram(1)
       to the N-best lists listed in nbest-file-list. The N-best files
       may be in compressed format. The rescored N-best lists are stored
       in directory score-dir. All following arguments are passed to
       ngram(1) and are used to control the language model. The
       following options are handled by rescore-decipher itself:

       -bytelog
              causes scores to be output on the bytelog scale (see
              nbest-format(5)).

       -nodecipherlm
              indicates that the recognizer language model is not being
              provided (with -decipher-lm). (This is only possible if
              the N-best lists are not in ``NBestList1.0'' format.)

       -multiwords
              specifies that N-best lists contain words joined by
              underscores, which are to be split into their components
              prior to rescoring.

       -pretty mapfile
              specifies a word mapping file that allows individual words
              to be globally replaced by strings of zero or more other
              words, e.g., to remove vocabulary mismatches between the
              input N-best lists and the rescoring LM.
              The mapfile contains one mapping per line, the first field
              specifying the word to be replaced and subsequent fields
              forming the replacement string.

       -ngram-tool program
              specifies a non-standard program to perform the actual LM
              evaluation (by default, ngram(1) is used). Such a program
              must understand ngram's command-line options related to
              N-best rescoring.

       -filter command
              specifies a command that is used to filter the N-best
              hypotheses prior to evaluating the language model. This
              may be used for more general textual rewriting so that
              non-standard LMs can be applied. The output N-best lists
              will contain the filtered hypotheses.

       -norescore
              causes N-best lists to be simply reformatted from one of
              the Decipher formats into the SRILM N-best format,
              separating acoustic and LM scores, without replacing the
              existing LM scores. In this case only the ngram(1) options
              -decipher-lmw and -decipher-wtw are relevant, and others
              are ignored. -norescore and -filter may be used together
              to perform textual rewriting of N-best lists.

       -lm-only
              dumps out LM scores only, instead of complete N-best
              lists.

       -count-oovs
              writes the count of out-of-vocabulary and zero-probability
              words to the output score files (instead of rescored
              N-best lists).

       -limit-vocab
              saves memory by arranging for ngram(1) to load only those
              N-gram parameters that are relevant to the vocabulary of
              the N-best lists to be rescored. After determining the
              N-best vocabulary the -limit-vocab option is passed to
              ngram(1).

       -vocab-aliases map
              declares that certain words are to be treated as
              alternative spellings of the same word for LM evaluation;
              see the same option for ngram(1). The map is filtered of
              unused words when used in conjunction with -limit-vocab,
              and then passed on to ngram(1).
       -fast  performs rescoring using only functions built into
              ngram(1). This avoids some computational and I/O overhead
              and therefore runs faster, but the options -filter,
              -pretty, and -lm-only are not supported, and -nodecipherlm
              is obligatory.

       rescore-acoustic replaces the acoustic scores in a set of N-best
       lists by a weighted combination of new scores. The old N-best
       lists are given by either a directory old-nbest-dir or a file
       list old-file-list; old-ac-weight is the weight given to the old
       scores. Directories containing the new scores are listed
       alternating with the corresponding weights; each score directory
       must contain one file per waveform segment, each having the same
       file basename as the corresponding original N-best list. The new
       scores should appear in a single column per file, one per line.
       The N-best lists containing the new combined acoustic scores are
       written to new-nbest-dir. The optional max-nbest argument can be
       used to limit the length of the N-best lists output. Also, when a
       new score file is encountered containing fewer than max-nbest
       lines, the missing scores are set to the lowest score encountered
       so far.

       rescore-reweight combines the scores in N-best lists with a set
       of weights and outputs the 1-best hypotheses. The N-best files
       are found in directory score-dir or listed in file-list. Optional
       arguments set the language model weight lmw (default 8), the word
       transition weight wtw (default 0), and the maximum number
       max-nbest of hypotheses to consider (default all). Optionally,
       any number of additional score directories and associated weights
       score-dir1 score-weight1 score-dir2 score-weight2 ... can be
       specified, following the wtw parameter.
       These additional scores are combined with those contained in the
       N-best lists themselves as in rescore-acoustic (using unit weight
       for the original acoustic scores). -multiwords indicates that
       multi-words are to be split into their components. The output
       format for 1-best hypotheses is
              sentid w1 w2 ...
       where sentid is the sentence ID derived from the N-best
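
The weighted score combination and 1-best selection described for rescore-reweight can be illustrated with a minimal Python sketch. It is not the script's actual implementation. It assumes each hypothesis carries an acoustic score, an LM score, and a word list, and that the total score is acoustic + lmw * LM + wtw * (number of words) plus the weighted additional scores. That formula and the names best_hypothesis, split_multiwords, and format_1best are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical in-memory hypothesis: (acoustic score, LM score, words).
Hyp = Tuple[float, float, List[str]]


def split_multiwords(words: Sequence[str]) -> List[str]:
    """Split underscore-joined multiwords into their components."""
    return [part for word in words for part in word.split("_") if part]


def best_hypothesis(
    hyps: Sequence[Hyp],
    lmw: float = 8.0,            # language model weight (documented default 8)
    wtw: float = 0.0,            # word transition weight (documented default 0)
    extra_scores: Sequence[Sequence[float]] = (),
    extra_weights: Sequence[float] = (),
    max_nbest: int = 0,          # 0 means "consider all hypotheses"
) -> Optional[List[str]]:
    """Return the words of the highest-scoring hypothesis, or None if empty.

    Assumed total: acoustic + lmw * lm + wtw * len(words)
                   + sum(weight_k * extra_k[i]) for hypothesis i.
    The acoustic score gets unit weight, as the text above states.
    """
    if max_nbest > 0:
        hyps = hyps[:max_nbest]
    best_words: Optional[List[str]] = None
    best_total = float("-inf")
    for i, (acoustic, lm, words) in enumerate(hyps):
        total = acoustic + lmw * lm + wtw * len(words)
        for scores, weight in zip(extra_scores, extra_weights):
            total += weight * scores[i]
        if total > best_total:
            best_total, best_words = total, list(words)
    return best_words


def format_1best(sentid: str, words: Sequence[str]) -> str:
    """Format one output line as 'sentid w1 w2 ...'."""
    return " ".join([sentid, *words])


if __name__ == "__main__":
    nbest = [
        (-100.0, -10.0, ["hello", "world"]),
        (-95.0, -12.0, ["hello_there", "world"]),
    ]
    words = best_hypothesis(nbest)
    print(format_1best("sent1", split_multiwords(words)))
```

Changing lmw shifts the balance between acoustic and LM evidence, so the selected hypothesis can change from one weight setting to another.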