nbest-optimize(1)

NAME
     nbest-optimize - optimize score combination for N-best word error
     minimization

SYNOPSIS
     nbest-optimize [-help] option ... [ scoredir ... ]

DESCRIPTION
     nbest-optimize reads a set of N-best lists, additional score files,
     and corresponding reference transcripts, and optimizes the score
     combination weights so as to minimize the word error of a classifier
     that performs word-level posterior probability maximization.  The
     optimized weights are meant to be used with nbest-lattice(1) and the
     -use-mesh option, or the nbest-rover script (see nbest-scripts(1)).
     nbest-optimize determines both the best relative weighting of
     knowledge source scores and the optimal -posterior-scale parameter
     that controls the peakedness of the posterior distribution.

     The optimization is performed by gradient descent on a smoothed
     (sigmoidal) approximation of the true 0/1 word error function
     (Katagiri et al. 1990).  Therefore, the result can only be expected
     to be a local minimum of the error surface.  (A more global search
     can be attempted by specifying different starting points.)  Another
     approximation is that the error function is computed assuming a
     fixed multiple alignment of all N-best hypotheses and the reference
     string, which tends to slightly overestimate the true pairwise error
     between any single hypothesis and the reference.

     An alternative search strategy uses a simplex-based "Amoeba" search
     on the (non-smoothed) word error function (Press et al. 1988).  The
     search is restarted multiple times to avoid local minima.

     Alternatively, nbest-optimize can also optimize weights for a
     standard, 1-best hypothesis rescoring that selects entire (sentence)
     hypotheses (-1best option).  In this mode sentence-level error
     counts may be read from external files, or computed on the fly from
     the reference strings.  The weights obtained are meant to be used
     for N-best list rescoring with rescore-reweight (see
     nbest-scripts(1)).

OPTIONS
     Each filename argument can be an ASCII file, a compressed file
     (name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.

     -help
          Print option summary.

     -version
          Print version information.

     -debug level
          Controls the amount of output (the higher the level, the
          more).  At level 1, error statistics at each iteration are
          printed.  At level 2, word alignments are printed.  At level
          3, the full score matrix is printed.  At level 4, detailed
          information about word hypothesis ranking is printed for each
          training iteration and sample.

     -nbest-files file-list
          Specifies the set of N-best files as a list of filenames.
          Three sets of standard scores are extracted from the N-best
          files: the acoustic model score, the language model score,
          and the number of words (for insertion penalty computation).
          See nbest-format(5) for details.

     -refs references
          Specifies the reference transcripts.  Each line in references
          must contain the sentence ID (the last component in the
          N-best filename path, minus any suffixes) followed by zero or
          more reference words (see EXAMPLES).

     -insertion-weight W
          Weight insertion errors by a factor W.  This may be useful to
          optimize for keyword spotting tasks where insertions have a
          cost different from deletion and substitution errors.
     -word-weights file
          Read a table of words and weights from file.  Each word error
          is weighted according to the word-specific weight.  The
          default weight is 1, and is used if a word has no specified
          weight.  Also, when this option is used, substitution errors
          are counted as the sum of a deletion and an insertion error,
          as opposed to counting as 1 error as in traditional word
          error computation.

     -1best
          Select optimization for standard sentence-level hypothesis
          selection.

     -1best-first
          Optimize first using -1best mode, then switch to full
          optimization.  This is an effective way to quickly bring the
          score weights near an optimal point, and then fine-tune them
          jointly with the posterior scale parameter.

     -errors dir
          In 1-best mode, optimize for error counts that are stored in
          separate files in directory dir.  Each N-best list must have
          a matching error counts file of the same basename in dir.
          Each file contains 7 columns of numbers in the format

               wcr wer nsub ndel nins nerr nw

          Only the last two columns (number of errors and words,
          respectively) are used (see EXAMPLES).  If this option is
          omitted, errors will be computed from the N-best hypotheses
          and the reference transcripts.

     -max-nbest n
          Limits the number of hypotheses read from each N-best list to
          the first n.

     -rescore-lmw lmw
          Sets the language model weight used in combining the language
          model log probabilities with acoustic log probabilities.
          This is used to compute initial aggregate hypothesis scores.

     -rescore-wtw wtw
          Sets the word transition weight used to weight the number of
          words relative to the acoustic log probabilities.  This is
          used to compute initial aggregate hypothesis scores.

     -posterior-scale scale
          Initial value for scaling log posteriors.  The total weighted
          log score is divided by scale when computing normalized
          posterior probabilities.  This controls the peakedness of the
          posterior distribution.  The default value is whatever was
          chosen for -rescore-lmw, so that language model scores are
          scaled to have weight 1, and acoustic scores have weight
          1/lmw.

     -combine-linear
          Compute aggregate scores by linear combination, rather than
          log-linear combination.  (This is appropriate if the input
          scores represent log-posterior probabilities.)

     -non-negative
          Constrain the search to non-negative weight values.

     -vocab file
          Read the N-best list vocabulary from file.  This option is
          mostly redundant since words found in the N-best input are
          implicitly added to the vocabulary.

     -tolower
          Map vocabulary to lowercase, eliminating case distinctions.

     -multiwords
          Split multiwords (words joined by '_') into their components
          when reading N-best lists.
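EXAMPLES
     The invocations below are illustrative sketches; file and
     directory names (nbest.list, refs.txt, prosody-scores/) are
     placeholders, and only options documented above are used.

     Optimize word-posterior combination weights from a list of N-best
     files, reference transcripts, and one directory of additional
     knowledge source scores, reading at most 100 hypotheses per list
     and starting from a language model weight of 8:

          nbest-optimize -nbest-files nbest.list -refs refs.txt \
                  -max-nbest 100 -rescore-lmw 8 prosody-scores/

     A 1-best warm start can speed up convergence before the full
     word-error objective is optimized jointly with the posterior
     scale:

          nbest-optimize -1best-first -nbest-files nbest.list \
                  -refs refs.txt prosody-scores/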
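     The reference file given to -refs pairs each sentence ID with its
     transcript, one utterance per line; a plausible refs.txt (IDs and
     words are made up):

          sw2001-A-0001 yes i think so
          sw2001-A-0002 okay

     The -word-weights table pairs words with weights; assuming one
     word and its weight per line, a small made-up example that
     down-weights a filler word and up-weights two keywords relative to
     the default weight of 1:

          uh      0.5
          okay    2.0
          cancel  2.0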
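     Finally, a 1-best optimization against precomputed sentence-level
     error counts, with one 7-column counts file per N-best list in a
     (hypothetical) directory errcounts/:

          nbest-optimize -1best -nbest-files nbest.list \
                  -errors errcounts/ prosody-scores/

     A line in such a counts file follows the format shown under
     -errors; for a 10-word reference with 2 errors (1 substitution,
     1 deletion, 0 insertions) it could read

          0.8 0.2 1 1 0 2 10

     where only the last two columns (2 errors, 10 words) are actually
     used by nbest-optimize.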