.\" $Id: wlat-format.5,v 1.7 2005/08/22 19:14:08 stolcke Exp $
.TH wlat-format 5 "$Date: 2005/08/22 19:14:08 $" "SRILM File Formats"
.SH NAME
wlat-format \- File format for SRILM word posterior lattices
.SH SYNOPSIS
Word lattices:
.br
\fBversion 2\fP
.br
\fBname\fP \fIs\fP
.br
\fBinitial\fP \fIi\fP
.br
\fBfinal\fP \fIf\fP
.br
\fBnode\fP \fIn\fP \fIw\fP \fIa\fP \fIp\fP \fIn1\fP \fIp1\fP \fIn2\fP \fIp2\fP ...
.br
\&...
.br
Word meshes (confusion networks):
.br
\fBname\fP \fIs\fP
.br
\fBnumaligns\fP \fIN\fP
.br
\fBposterior\fP \fIP\fP
.br
\fBalign\fP \fIa\fP \fIw1\fP \fIp1\fP \fIw2\fP \fIp2\fP ...
.br
\fBreference\fP \fIa\fP \fIw\fP
.br
\fBhyps\fP \fIa\fP \fIw\fP \fIh1\fP \fIh2\fP ...
.br
\fBinfo\fP \fIa\fP \fIw\fP \fIstart\fP \fIdur\fP \fIascore\fP \fIgscore\fP \fIphones\fP \fIphonedurs\fP
.br
\&...
.SH DESCRIPTION
Word posterior lattices and meshes are lattices generated by aligning
N-best hypotheses with
.BR nbest-lattice (1),
or by aligning PFSG or HTK lattices with
.BR lattice-tool (1).
They compactly encode possible word hypothesis sequences and their
posterior probabilities.
(Word meshes have become generally known as ``confusion networks'' or
``sausages.'')
.PP
A word lattice is a partially ordered directed graph with nodes representing
word hypotheses.
Nodes are identified by non-negative integers.
The file format specifies the initial node
.IR i ,
the final node
.IR f ,
and any number of additional nodes
.IR n .
For each node
.I n
the following associated information is given on the same line:
the word identity
.I w
(the string ``NULL'' is used with initial and final nodes),
the alignment position
.I a
(identical values in this field identify hypotheses that occur at the
same position),
and the word posterior probability
.IR p .
Following these values, zero or more transitions to successor nodes
are specified, each given by the node index
.I ni
and the transition posterior probability
.IR pi .
In a properly normalized word lattice the transition posteriors
.I pi
sum up to the node posterior
.IR p .
.PP
Word meshes represent a more constrained lattice format in which
word hypotheses are in a total order.
A mesh contains a number of alignment positions, and a set of mutually
exclusive word hypotheses in each position (the ``confusion sets'').
The word mesh represents all sentence hypotheses that can be generated by
freely combining word hypotheses at each position.
The file format specifies the number of alignment positions
.I N
and the total posterior probability mass
.I P
contained in the lattice,
followed by one or more confusion set specifications.
For each alignment position
.IR a ,
the hypothesized words
.I wi
and their posterior probabilities
.I pi
are listed in alternation.
The pseudo-word string
.B *DELETE*
represents an empty hypothesis.
.PP
Optionally, the word mesh format encodes additional information about
the hypothesis alignment from which it resulted.
The keyword
.B reference
specifies the correct word
.I w
that was aligned at position
.IR a .
The keyword
.B hyps
is used to list the sentence hypotheses of which a certain word hypothesis
was a part.
The word hypothesis is identified by an alignment position
.I a
and the word string
.IR w ,
and is followed by the integer IDs
.I hi
(typically, the N-best ranks)
of the associated sentence hypotheses.
.PP
As another optional element, the word mesh can contain word-level acoustic and
temporal information,
following the keyword
.BR info ,
the alignment position
.IR a ,
and the word identity
.IR w .
This information is derived by
.BR nbest-lattice (1)
from word- and phone-level backtraces of N-best hypotheses (as represented
in Decipher NBestList2.0 format).
The details of this information are defined in the SRILM class
.B NBestWordInfo
and subject to change, but currently include the following:
.IR start :
word start time (in seconds from the beginning of the waveform);
.IR dur :
word duration (in seconds);
.IR ascore :
acoustic model likelihood (log base 10);
.IR gscore :
grammar (LM and pronunciation) score (log base 10);
.IR phones :
sequence of phones in word (separated by colons);
.IR phonedurs :
sequence of phone durations (in numbers of frames, separated by colons).
When word meshes are derived from HTK format lattices, the pronunciation field
will consist of the HTK phone alignment information, which encodes both
phone sequence and durations; the phone duration field in turn is used
to encode the duration model scores, if present.
.B Note:
The encoded information pertains to the word hypothesis with the highest
per-unit-time acoustic score among all hypotheses of the same word aligned
to a given word mesh position.
.PP
Both formats optionally encode the associated utterance IDs in the
.B name
field.
Word lattices and meshes can be converted to PFSG format using
the script
.BR wlat-to-pfsg .
.SH "SEE ALSO"
nbest-lattice(1), lattice-tool(1),
pfsg-scripts(1), pfsg-format(5), nbest-format(5).
.br
L. Mangu, E. Brill, & A. Stolcke, ``Finding consensus in speech recognition:
word error minimization and other applications of confusion networks,''
\fIComputer Speech and Language\fP 14(4), 373-400, 2000.
.SH BUGS
Detailed alignment and acoustic information is so far only implemented
for word meshes, although conceptually it would apply equally to word lattices.
.SH AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
.br
Copyright 2001-2005 SRI International
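.SH EXAMPLES
The following Python sketch is not part of SRILM; it is a minimal
illustration of the two line formats described in this page, under stated
assumptions.
It reads the name, numaligns, posterior and align lines of a word mesh,
picks the highest-posterior word at each alignment position (dropping the
*DELETE* pseudo-word), and separately checks whether the transition
posteriors on a single word lattice node line sum to the node posterior.
The function names, the sample mesh, and the sample node line are all
hypothetical.
Posteriors are assumed to be written as plain probabilities and alignment
positions as integers; the optional reference, hyps and info lines are
ignored.
.PP
.nf
```python
def parse_word_mesh(text):
    """Collect the header fields and confusion sets of a word mesh."""
    mesh = {"name": None, "numaligns": None, "posterior": None, "aligns": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key == "name":
            mesh["name"] = args[0]
        elif key == "numaligns":
            mesh["numaligns"] = int(args[0])
        elif key == "posterior":
            mesh["posterior"] = float(args[0])
        elif key == "align":
            position, pairs = int(args[0]), args[1:]
            # Words and posteriors alternate: w1 p1 w2 p2 ...
            mesh["aligns"][position] = [
                (pairs[i], float(pairs[i + 1]))
                for i in range(0, len(pairs) - 1, 2)
            ]
        # Other keywords (reference, hyps, info) are skipped in this sketch.
    return mesh


def best_words(mesh):
    """Return the top word per position, leaving out empty hypotheses."""
    words = []
    for position in sorted(mesh["aligns"]):
        word, _ = max(mesh["aligns"][position], key=lambda pair: pair[1])
        if word != "*DELETE*":
            words.append(word)
    return words


def parse_lattice_node(line):
    """Split a word lattice line of the form: node n w a p n1 p1 n2 p2 ..."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "node":
        raise ValueError("not a node line: " + line)
    rest = fields[5:]
    return {
        "index": int(fields[1]),
        "word": fields[2],
        "align": int(fields[3]),
        "posterior": float(fields[4]),
        "transitions": [
            (int(rest[i]), float(rest[i + 1]))
            for i in range(0, len(rest) - 1, 2)
        ],
    }


def is_normalized(node, tol=1e-6):
    """Check that transition posteriors sum to the node posterior."""
    total = sum(prob for _, prob in node["transitions"])
    return abs(total - node["posterior"]) <= tol


# Invented sample data, for illustration only.
SAMPLE_MESH = """name utt1
numaligns 3
posterior 1
align 0 the 0.9 a 0.1
align 1 *DELETE* 0.6 big 0.4
align 2 cat 0.7 cap 0.3
"""
SAMPLE_NODE = "node 3 cat 2 0.7 4 0.5 5 0.2"

if __name__ == "__main__":
    print(best_words(parse_word_mesh(SAMPLE_MESH)))
    print(is_normalized(parse_lattice_node(SAMPLE_NODE)))
```
.fi
.PP
For the invented sample mesh, the selected words are ``the'' and ``cat'';
position 1 contributes nothing because the empty hypothesis has the highest
posterior there.
The normalization check is only meaningful for nodes that list at least one
transition.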