📄 nbest-scripts.html

(default 0), and the maximum number <I>max-nbest</I> of hypotheses to consider (default all). Optionally, any number of additional score directories and associated weights <I>score-dir1 score-weight1 score-dir2 score-weight2</I> ... can be specified, following the <I>wtw</I> parameter. These additional scores are combined with those contained in the N-best lists themselves as in <B>rescore-acoustic</B> (using unit weight for the original acoustic scores). <B>-multiwords</B> indicates that multiwords are to be split into their components. The output format for 1-best hypotheses is<BR>
	<I>sentid</I> <I>w1</I> <I>w2</I> ...<BR>
where <I>sentid</I> is the sentence ID derived from the N-best filename, followed by the words of the hypothesis.
<P>
<B>rescore-minimize-wer</B> is similar to <B>rescore-reweight</B> but picks hypotheses using the word error minimization algorithm of <A HREF="nbest-lattice.html">nbest-lattice(1)</A>.
<P>
<B>nbest2-to-nbest1</B> converts an N-best list in ``NBestList2.0'' format to ``NBestList1.0'', for the benefit of programs that have not yet been updated to deal with the new format.
<P>
<B>nbest-rover</B> combines hypotheses from multiple N-best lists at the word level, by performing the same kind of word error minimization as <A HREF="nbest-lattice.html">nbest-lattice(1)</A>, in a generalization of the ROVER algorithm. <I>sentid-list</I> is a file listing sentence IDs. These must match the filenames in a set of N-best directories, which are specified in a <I>control-file</I>. The format for the latter is<BR>
	<I>dir1</I> <I>lmw1</I> <I>wtw1</I> <I>w1</I> [<I>n1</I> [<I>s1</I>]]<BR>
	<I>dir2</I> <I>lmw2</I> <I>wtw2</I> <I>w2</I> [<I>n2</I> [<I>s2</I>]]<BR>
	...<BR>
Each line specifies an N-best directory, the language model and word transition weights to be used in score combination, and a weight to be applied to the posterior probabilities. An optional next-to-last parameter for each N-best list allows the lists to be truncated to the top <I>n1</I>, <I>n2</I>, etc.,
hypotheses. The final optional parameter sets the posterior distribution scaling factor, which defaults to the language model weight. Optionally, <I>control-file</I> can also contain lines of the form<BR>
	<I>dir</I> <I>w</I> <B>+</B><BR>
These indicate that additional score files can be found in directory <I>dir</I> and that the scores found therein should be added to the following N-best list set with weight <I>w</I>. Several lines of this form may precede a regular N-best directory specification; the corresponding additive combination of multiple scores is then performed.<BR>
If ``-'' is specified for <I>sentid-list</I>, the sentence IDs are inferred from the contents of the first directory <I>dir1</I> specified in <I>control-file</I>. If <I>posterior-file</I> is specified on the command line, posterior word probability estimates are written to that file. Any additional arguments are passed as options to the underlying <A HREF="nbest-lattice.html">nbest-lattice(1)</A> invocation.<BR>
<B>nbest-rover</B> can process N-best lists in any of the formats described in <A HREF="nbest-format.html">nbest-format(5)</A>, <I>as long as all N-best lists for a given utterance are in the same format</I>. When Decipher formats are used, only their acoustic scores are used.
<P>
<B>combine-rover-controls</B> takes one or more <B>nbest-rover</B> control files as arguments and outputs a new control file that specifies the combination of the input files. By default, each input system is given equal weight. Directory names in the input files are adjusted to reflect the relative location of the input files. The optional <B>lambda=</B> argument may be used to specify a space-separated list of system weights that replaces the uniform default.
<P>
<B>nbest-posteriors</B> rescales the scores in an N-best list to reflect (weighted) posterior probabilities. The output is the same N-best list with acoustic scores set to the log (base 10) of the posterior hypothesis probabilities and LM scores set to
zero. <B>postscale=</B><I>S</I> attenuates the posterior distribution by dividing combined log scores by <I>S</I> (the default is <I>S</I>=<I>lmw</I>). If <B>weight=</B><I>W</I> is specified, the posteriors are multiplied by <I>W</I>. <B>max_nbest=</B><I>M</I> limits the number of hypotheses used to the top <I>M</I>. This script is used mostly as a helper in <B>nbest-rover</B>.
<P>
<B>merge-nbest</B> merges hypotheses from one or more N-best lists into a single list, collapsing hypotheses that occur in more than one input list. If all input lists use the same <A HREF="nbest-format.html">nbest-format(5)</A>, then the output will also be in that format and contain the information from the first list in which a hypothesis was encountered. Otherwise, the output will be in SRI Decipher(TM) NBestList1.0 format and contain acoustic scores and word strings only. The <B>max_nbest=</B><I>M</I> option limits input to the first <I>M</I> hypotheses from each input list. <B>multiwords=1</B> merges hypotheses that are identical after resolving multiwords. <B>nopauses=1</B> merges hypotheses that are identical after removal of pause words.
<P>
<B>nbest-vocab</B> outputs the vocabulary used in a set of N-best lists. (The N-best files cannot be compressed, but may be concatenated and supplied via stdin.)
<P>
<B>nbest-error</B> computes the overall oracle word error rate of a set of N-best lists in directory <I>score-dir</I> or listed in <I>file-list</I>. The reference answers are given in <I>refs</I> in the format output by <B>rescore-reweight</B> (see above). Additional arguments are passed to the underlying invocation of <A HREF="nbest-lattice.html">nbest-lattice(1)</A>, and can be used to limit the depth of the N-best list, compute lattice error rather than N-best error, etc.
<P>
<B>sentid-to-sclite</B> converts 1-best hypotheses and references in the format used here to the ``trn'' format expected by the NIST <A HREF="sclite.html">sclite(1)</A> scoring
software.
<P>
<B>sentid-to-ctm</B> converts 1-best hypotheses and references in the format used here to NIST <A HREF="ctm.html">ctm(5)</A> format. The script relies on an encoding of conversation IDs, channel, and utterance time marks in the sentence IDs and may need adjustment to local conventions.
<P>
<B>fix-ctm</B> converts output produced by the <B>-output-ctm</B> option of <A HREF="nbest-lattice.html">nbest-lattice(1)</A> and <A HREF="lattice-tool.html">lattice-tool(1)</A> to a format suitable for scoring with NIST <A HREF="sclite.html">sclite(1)</A>. It, too, relies on information encoded in the sentence IDs and may need adjustments.
<P>
<B>compute-sclite</B> is a wrapper around the NIST <A HREF="sclite.html">sclite(1)</A> scoring tool. <I>refs</I> and <I>hyps</I> are the reference and hypothesized transcripts, respectively. The <I>refs</I> file can be either in "sentid" format or in <A HREF="stm.html">stm(5)</A> format. In the latter case, <I>hyps</I> will be converted to <A HREF="ctm.html">ctm(5)</A> format using the <B>sentid-to-ctm</B> helper script. The <I>hyps</I> file can be either in "sentid" format or in <A HREF="ctm.html">ctm(5)</A> format. More than one <B>-h</B> option can be given to combine the contents of multiple hypothesis files. Optionally, <B>-S</B> specifies a sorted list of sentence IDs, <I>subset</I>, to score. Multiple <B>-S</B> options may be given, to form the intersection of several subsets. <B>-multiwords</B> or <B>-M</B> splits ``multiwords'' joined by underscores into their component words prior to scoring. <B>-noperiods</B> deletes periods from the hypotheses prior to scoring (typically used to bridge different conventions for spelled letters). <B>-R</B> preserves reject words in the hypotheses for scoring (as appropriate if references also contain rejects). <B>-g</B> <I>glmfile</I> enables filtering of references and hypotheses by the NIST <B>csrfilt.sh</B> script, controlled by the filter file <I>glmfile</I> (this is only possible with an stm
reference file). In that case, the <B>-H</B> option causes hesitations (as defined by the filter) to be deleted from the output for scoring purposes. <B>-v</B> displays the complete command used to invoke <B>sclite</B>. Any additional options are passed to <B>sclite</B>, e.g., to control its output actions or alignment mode.
<P>
<B>compare-sclite</B> scores two sets of hypotheses <I>hyps1</I> and <I>hyps2</I> for the same test set and computes in how many cases the first or second set had lower word error. The remaining options are as for <B>compute-sclite</B>. The script ignores hypotheses for sentences that do not appear in both hypothesis files, to ensure comparable scoring results.
<H2> SEE ALSO </H2>
<A HREF="nbest-format.html">nbest-format(5)</A>, <A HREF="ngram.html">ngram(1)</A>, <A HREF="nbest-lattice.html">nbest-lattice(1)</A>, <A HREF="nbest-optimize.html">nbest-optimize(1)</A>, <A HREF="sclite.html">sclite(1)</A>, <A HREF="stm.html">stm(5)</A>, <A HREF="ctm.html">ctm(5)</A>.<BR>
J.G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)", <I>Proc. IEEE Automatic Speech Recognition and Understanding Workshop</I>, Santa Barbara, CA, 347-352, 1997.<BR>
A. Stolcke et al., "The SRI March 2000 Hub-5 Conversational Speech Transcription System", <I>Proc. NIST Speech Transcription Workshop</I>, College Park, MD, 2000.
<H2> BUGS </H2>
<B>sentid-to-sclite</B> has some built-in assumptions about the structure of sentence IDs and may need to be modified for <B>compute-sclite</B> and <B>compare-sclite</B> to work.
<P>
<B>rescore-decipher</B> <B>-pretty</B> may not work correctly with the <B>-limit-vocab</B> option if the word mapping adds to the vocabulary subset used in the N-best lists.
<H2> AUTHOR </H2>
Andreas Stolcke &lt;stolcke@speech.sri.com&gt;.<BR>
Copyright 1995-2006 SRI International
</BODY>
</HTML>