lattice-tool.1
              relative to the larger of the two sets.

       -reduce-before-pruning
              Perform lattice reduction before posterior-based pruning.
              The default order is to first prune, then reduce.

       -pre-reduce-iterate I
              Perform iterative reduction prior to lattice expansion,
              but after pause elimination.

       -post-reduce-iterate I
              Perform iterative reduction after lattice expansion and
              pause node recovery.  Note: this is not recommended as it
              changes the weights assigned from the specified language
              model.

       -no-nulls
              Eliminate NULL nodes from lattices.

       -no-pause
              Eliminate pause nodes from lattices (and do not recover
              them after lattice expansion).

       -compact-pause
              Use compact encoding of pause nodes that saves nodes but
              allows optional pauses where they might not have been
              included in the original lattice.

       -loop-pause
              Add self-loops on pause nodes.

       -insert-pause
              Insert optional pauses after every word in the lattice.
              The structure of inserted pauses is affected by
              -compact-pause and -loop-pause.

       -collapse-same-words
              Perform an operation on the final lattices that collapses
              all nodes with the same words, except null nodes, pause
              nodes, or nodes with noise words.  This can reduce the
              lattice size dramatically, but also introduces new paths.

       -connectivity
              Check the connectedness of lattices.

       -compute-node-entropy
              Compute the node entropy of lattices.

       -compute-posteriors
              Compute node posterior probabilities (which are included
              in HTK lattice output).

       -density
              Compute and output lattice densities.

       -ref-list file
              Read reference word strings from file.  Each line starts
              with a sentence ID (the basename of the lattice file
              name), followed by the words.
              This and the next option trigger computation of lattice
              word errors (minimum word error counts of any path
              through a lattice).

       -ref-file file
              Read reference word strings from file.  Lines must
              contain reference words only, and must be matched to
              input lattices in the order processed.

       -write-refs file
              Write the references back to file (for validation).

       -add-refs P
              Add the reference words as an additional path to the
              lattice, with probability P.  Unless -no-pause is
              specified, optional pause nodes between words are also
              added.  Note that this operation is performed before
              lattice reduction and expansion, so the new path can be
              merged with existing ones, and the probabilities for the
              new path can be reassigned from an LM later.

       -noise-vocab file
              Read a list of ``noise'' words from file.  These words
              are ignored when computing lattice word errors, when
              decoding the best word sequence using -viterbi-decode or
              -posterior-decode, or when collapsing nodes with
              -collapse-same-words.

       -keep-pause
              Causes the pause word ``-pau-'' to be treated like a
              regular word.  It prevents pause from being implicitly
              added to the list of noise words.

       -ignore-vocab file
              Read a list of words that are to be ignored in lattice
              operations, similar to pause tokens.  Unlike noise words
              (see above) they are also skipped during LM evaluation.
              With this option and -keep-pause, pause words are not
              ignored by default.

       -split-multiwords
              Split lattice nodes with multiwords into a sequence of
              non-multiword nodes.  This option is necessary to compute
              lattice error of multiword lattices against non-multiword
              references, but may be useful in its own right.

       -split-multiwords-after-lm
              Perform multiword splitting after lattice expansion using
              the specified LM.
              This should be used if the LM uses multiwords, but the
              final lattices are not supposed to contain multiwords.

       -multiword-dictionary file
              Read a dictionary from file containing multiword
              pronunciations and word boundary markers (a ``|'' phone
              label).  Specifying such a dictionary allows the
              multiword splitting options to infer accurate time marks
              and pronunciation information for the multiword
              components.

       -multi-char C
              Designate C as the character used for separating
              multiword components.  The default is an underscore
              ``_''.

       -operation O
              Perform a lattice algebra operation O on the lattice or
              lattices processed, with the second operand specified by
              -in-lattice2.  Operations currently supported are
              concatenate and or, for serial and parallel lattice
              combination, respectively, and are applied after all
              other lattice manipulations.

       -viterbi-decode
              Print out the word sequence corresponding to the highest
              probability path.

       -posterior-decode
              Print out the word sequence with lowest expected word
              error.

       -output-ctm
              Output word sequences in NIST CTM (conversation time
              mark) format.  Note that word start times will be
              relative to the lattice start time, the first column will
              contain the lattice name, and the channel field is always
              1.  The word confidence field contains posterior
              probabilities if -posterior-decode is in effect.  This
              option also implies -acoustic-mesh.

       -hidden-vocab file
              Read a subvocabulary from file and constrain word meshes
              to only align those words that are either all in or
              outside the subvocabulary.  This may be used to keep
              ``hidden event'' tags from aligning with regular words.

       -dictionary-align
              Use the dictionary pronunciations specified with
              -dictionary to induce a word distance metric used for
              word mesh alignment.
              See the nbest-lattice(1) -dictionary option.

       -nbest-decode N
              Generate the up to N highest scoring paths through a
              lattice and write them out in nbest-format(5), along with
              optional additional score files to store knowledge
              sources encoded in the lattice.  Further options are
              needed to specify the location of N-best lists and score
              files, described below under "N-BEST DECODING".
              Duplicate hypotheses that differ only in pause and words
              specified with -ignore-vocab are removed from the N-best
              output.  If the -multiwords option is specified,
              duplicates due to multiwords are also eliminated.

       -nbest-duplicates K
              Allow up to K duplicate word hypotheses to be output in
              N-best decoding.

       -nbest-max-stack M
              Limits the depth of the hypothesis stack used in N-best
              decoding to M entries, which may be useful for limiting
              memory use and runtime.

       -nbest-viterbi
              Use a Viterbi algorithm to generate N-best, rather than
              A-star.  This uses less memory but may take more time.

       -ppl file
              Read sentences from file and compute the maximum
              probability (of any path) assigned to them by the lattice
              being processed.  Effectively, the lattice is treated as
              a (deficient) language model.  The output detail is
              controlled by the -debug option, similar to ngram -ppl
              output.  (In particular, -debug 2 enables tracing of
              lattice nodes corresponding to sentence prefixes.)  Pause
              words in file are treated as regular words and have to
              match pause nodes in the lattice, unless -no-pause is
              specified, in which case pauses in both lattice and input
              sentences are ignored.

       The following options control transition weight assignment:

       -order n
              Set the maximal N-gram order to be used for transition
              weight assignment (the default is 3).

       -lm file
              Read N-gram language model from file.
              This option also triggers weight reassignment and lattice
              expansion.

       -multiwords
              Resolve multiwords in the lattice without splitting
              nodes.  This is useful in rescoring lattices containing
              multiwords with an LM that does not use multiwords.

       -classes file
              Interpret the LM as an N-gram over word classes.  The
              expansions of the classes are given in file in
              classes-format(5).  Tokens in the LM that are not defined
              as classes in file are assumed to be plain words, so that
              the LM can contain mixed N-grams over both words and word
              classes.

       -simple-classes
              Assume a "simple" class model: each word is a member of
              at most one word class, and class expansions are exactly
              one word long.

       -mix-lm file
              Read a second N-gram model for interpolation purposes.
              The second and any additional interpolated models can
              also be class N-grams (using the same -classes
              definitions).

       -factored
              Interpret the files specified by -lm, -mix-lm, etc. as
              factored N-gram model specifications.  See ngram(1) for
              more details.

       -lambda weight
              Set the weight of the main model when interpolating with
              -mix-lm.  Default value is 0.5.

       -mix-lm2 file
       -mix-lm3 file
       -mix-lm4 file
       -mix-lm5 file
       -mix-lm6 file
       -mix-lm7 file
       -mix-lm8 file
       -mix-lm9 file
              Up to 9 more N-gram models can be specified for
              interpolation.

       -mix-lambda2 weight
       -mix-lambda3 weight
       -mix-lambda4 weight
       -mix-lambda5 weight
       -mix-lambda6 weight
       -mix-lambda7 weight
       -mix-lambda8 weight
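
The role of the -lambda weight in the two-model case (-lm plus -mix-lm) can be illustrated with a small Python sketch. It assumes ordinary linear interpolation of probabilities and log10 scores; the function name interpolate_log10 is hypothetical and is not part of lattice-tool, and the sketch does not cover the additional -mix-lm2 ... -mix-lm9 components.

```python
import math


def interpolate_log10(logp_main, logp_mix, lam=0.5):
    """Linearly interpolate two log10 probabilities.

    lam is the weight of the main model (assumed to play the role of
    -lambda); the second model receives the remaining weight 1 - lam.
    This is an illustrative helper, not code taken from lattice-tool.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Mix in the probability domain, then convert back to log10.
    p = lam * 10.0 ** logp_main + (1.0 - lam) * 10.0 ** logp_mix
    if p == 0.0:
        return float("-inf")
    return math.log10(p)


if __name__ == "__main__":
    # Main model assigns probability 0.1, second model 0.01, equal weights.
    print(interpolate_log10(-1.0, -2.0, lam=0.5))
```

With lam set to 1.0 the second model is ignored and the main model's score is returned unchanged, which is a quick way to check that a weight is being applied to the intended model.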