📄 wlat-format.5
wlat-format(5)                                                  wlat-format(5)

NAME
       wlat-format - File format for SRILM word posterior lattices

SYNOPSIS
       Word lattices:

       version 2
       name s
       initial i
       final f
       node n w a p n1 p1 n2 p2 ...
       ...

       Word meshes (confusion networks):

       name s
       numaligns N
       posterior P
       align a w1 p1 w2 p2 ...
       reference a w
       hyps a w h1 h2 ...
       info a w start dur ascore gscore phones phonedurs
       ...

DESCRIPTION
       Word posterior lattices and meshes are lattices generated by aligning
       N-best hypotheses with nbest-lattice(1), or by aligning PFSG or HTK
       lattices with lattice-tool(1).  They compactly encode possible word
       hypothesis sequences and their posterior probabilities.  (Word meshes
       have become generally known as "confusion networks" or "sausages.")

       A word lattice is a partially ordered directed graph with nodes
       representing word hypotheses.  Nodes are identified by non-negative
       integers.  The file format specifies the initial node i, the final
       node f, and any number of additional nodes n.  For each node n the
       following associated information is given on the same line: the word
       identity w (the string "NULL" is used with initial and final nodes),
       the alignment position a (identical values in this field identify
       hypotheses that occur at the same position), and the word posterior
       probability p.  Following these values, zero or more transitions to
       successor nodes are specified, each given by the node index n_i and
       the transition posterior probability p_i.  In a properly normalized
       word lattice the transition posteriors p_i sum up to the node
       posterior p.

       Word meshes represent a more constrained lattice format in which word
       hypotheses are in a total order.  A mesh contains a number of
       alignment positions, and a set of mutually exclusive word hypotheses
       in each position (the "confusion sets").
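
       As an illustration of the word lattice layout described above, the
       following Python sketch reads the version, name, initial, final, and
       node lines and checks the normalization property.  It is not part of
       SRILM: the names Node, parse_wlat, unnormalized_nodes, and the sample
       lattice are hypothetical, fields are assumed to be separated by
       whitespace, and nodes without transitions (such as the final node)
       are assumed to be exempt from the normalization check.

```python
from dataclasses import dataclass, field

# Invented sample data, for illustration only.
SAMPLE_WLAT = """\
version 2
name utt1
initial 0
final 3
node 0 NULL 0 1 1 0.6 2 0.4
node 1 hello 1 0.6 3 0.6
node 2 yellow 1 0.4 3 0.4
node 3 NULL 2 1
"""


@dataclass
class Node:
    word: str         # word identity w ("NULL" for initial and final nodes)
    align: str        # alignment position a; kept as a string because only
                      # equality of values is described as meaningful
    posterior: float  # word posterior probability p
    # list of (successor node index, transition posterior) pairs
    transitions: list = field(default_factory=list)


def parse_wlat(text):
    """Parse word lattice text; return (header dict, {node index: Node})."""
    header, nodes = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key in ("version", "initial", "final"):
            header[key] = int(args[0])
        elif key == "name":
            header[key] = " ".join(args)
        elif key == "node":
            index, word, align, post = int(args[0]), args[1], args[2], float(args[3])
            rest = args[4:]
            if len(rest) % 2:
                raise ValueError(f"unpaired transition in line: {line!r}")
            trans = [(int(rest[i]), float(rest[i + 1]))
                     for i in range(0, len(rest), 2)]
            nodes[index] = Node(word, align, post, trans)
        else:
            raise ValueError(f"unrecognized keyword: {key!r}")
    return header, nodes


def unnormalized_nodes(nodes, tol=1e-6):
    """Indices of nodes whose transition posteriors do not sum to p.

    Assumption: nodes with no outgoing transitions are skipped.
    """
    return [n for n, node in nodes.items()
            if node.transitions
            and abs(sum(p for _, p in node.transitions) - node.posterior) > tol]
```

       Calling parse_wlat(SAMPLE_WLAT) yields the header fields and a table
       of nodes keyed by node index; unnormalized_nodes() then lists any
       node whose outgoing transition posteriors disagree with its own
       posterior.
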
       The word mesh represents all sentence hypotheses that can be generated
       by freely combining word hypotheses at each position.  The file format
       specifies the number of alignment positions N (the numaligns line) and
       the total posterior probability mass P (the posterior line) contained
       in the lattice, followed by one or more confusion set specifications.
       For each alignment position a, the hypothesized words w_i and their
       posterior probabilities p_i are listed in alternation.  The pseudo-word
       string *DELETE* represents an empty hypothesis.

       Optionally, the word mesh format encodes additional information about
       the hypothesis alignment from which it resulted.  The keyword
       reference specifies the correct word w that was aligned at position a.
       The keyword hyps is used to list the sentence hypotheses of which a
       certain word hypothesis was a part.  The word hypothesis is identified
       by an alignment position a and the word string w, and is followed by
       the integer IDs h_i (typically, the N-best ranks) of the associated
       sentence hypotheses.

       As another optional element, the word mesh can contain word-level
       acoustic and temporal information, following the keyword info, the
       alignment position a, and the word identity w.  This information is
       derived by nbest-lattice(1) from word- and phone-level backtraces of
       N-best hypotheses (as represented in Decipher NBestList2.0 format).
       The details of this information are defined in the SRILM class
       NBestWordInfo and subject to change, but currently include the
       following.

       start      word start time (in seconds from the beginning of the
                  waveform)
       dur        word duration (in seconds)
       ascore     acoustic model likelihood (log base 10)
       gscore     grammar (LM and pronunciation) score (log base 10)
       phones     sequence of phones in word (separated by colons)
       phonedurs  sequence of phone durations (in numbers of frames,
                  separated by colons)
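
       The word mesh keywords described above can be read with a short
       parser along the following lines.  This is a sketch under stated
       assumptions, not SRILM code: WordMesh, parse_mesh, sentences, and the
       sample mesh are hypothetical; alignment positions are assumed to be
       integers that sort in mesh order; unrecognized keywords are skipped
       because the format is described as subject to change; and the phones
       and phonedurs fields are kept as lists of strings, since their
       content differs for meshes derived from HTK lattices.

```python
from dataclasses import dataclass, field
from itertools import product

DELETE = "*DELETE*"  # pseudo-word representing an empty hypothesis
INFO_FIELDS = ("start", "dur", "ascore", "gscore", "phones", "phonedurs")

# Invented sample data, for illustration only.
SAMPLE_MESH = """\
name utt1
numaligns 2
posterior 1
align 0 hello 0.7 yellow 0.3
align 1 world 0.9 *DELETE* 0.1
reference 0 hello
hyps 0 hello 1 2
info 0 hello 0.25 0.4 -1200.5 -3.2 hh:ah:l:ow 8:10:9:13
"""


@dataclass
class WordMesh:
    name: str = ""
    numaligns: int = 0
    posterior: float = 0.0
    aligns: dict = field(default_factory=dict)      # a -> {word: posterior}
    references: dict = field(default_factory=dict)  # a -> reference word
    hyps: dict = field(default_factory=dict)        # (a, word) -> hypothesis IDs
    info: dict = field(default_factory=dict)        # (a, word) -> info record


def parse_mesh(text):
    """Parse word mesh text into a WordMesh."""
    mesh = WordMesh()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key == "name":
            mesh.name = " ".join(args)
        elif key == "numaligns":
            mesh.numaligns = int(args[0])
        elif key == "posterior":
            mesh.posterior = float(args[0])
        elif key == "align":
            pairs = args[1:]
            if len(pairs) % 2:
                raise ValueError(f"unpaired word/posterior in line: {line!r}")
            mesh.aligns[int(args[0])] = {
                pairs[i]: float(pairs[i + 1]) for i in range(0, len(pairs), 2)
            }
        elif key == "reference":
            mesh.references[int(args[0])] = args[1]
        elif key == "hyps":
            mesh.hyps[(int(args[0]), args[1])] = [int(h) for h in args[2:]]
        elif key == "info":
            if len(args) < 2 + len(INFO_FIELDS):
                raise ValueError(f"too few info fields in line: {line!r}")
            rec = dict(zip(INFO_FIELDS, args[2:]))  # extra fields are ignored
            for k in ("start", "dur", "ascore", "gscore"):
                rec[k] = float(rec[k])
            rec["phones"] = rec["phones"].split(":")
            rec["phonedurs"] = rec["phonedurs"].split(":")
            mesh.info[(int(args[0]), args[1])] = rec
        # any other keyword is skipped (assumption, see lead-in)
    return mesh


def sentences(mesh):
    """Yield each word sequence formed by choosing one word per position."""
    confusion_sets = [list(mesh.aligns[a]) for a in sorted(mesh.aligns)]
    for choice in product(*confusion_sets):
        yield [w for w in choice if w != DELETE]
```

       For the sample above, list(sentences(parse_mesh(SAMPLE_MESH)))
       enumerates the four word sequences obtained by freely combining the
       two confusion sets, with *DELETE* contributing no word.
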
       When word meshes are derived from HTK format lattices, the
       pronunciation field will consist of the HTK phone alignment
       information, which encodes both phone sequence and durations; the
       phone duration field in turn is used to encode the duration model
       scores, if present.  Note: The encoded information pertains to the
       word hypothesis with the highest per-unit-time acoustic score among
       all hypotheses of the same word aligned to a given word mesh position.

       Both formats optionally encode the associated utterance IDs in the
       name field.

       Word lattices and meshes can be converted to PFSG format using the
       script wlat-to-pfsg.

SEE ALSO
       nbest-lattice(1), lattice-tool(1), pfsg-scripts(1), pfsg-format(5),
       nbest-format(5).
       L. Mangu, E. Brill, & A. Stolcke, "Finding consensus in speech
       recognition: word error minimization and other applications of
       confusion networks," Computer Speech and Language 14(4), 373-400,
       2000.

BUGS
       Detailed alignment and acoustic information is so far only implemented
       for word meshes, although conceptually it would apply equally to word
       lattices.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 2001-2005 SRI International

SRILM File Formats        $Date: 2005/08/22 19:14:08 $         wlat-format(5)