%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/* developed at:                                               */
%/*                                                             */
%/*      Speech Vision and Robotics group                       */
%/*      Cambridge University Engineering Department            */
%/*      http://svr-www.eng.cam.ac.uk/                          */
%/*                                                             */
%/*      Entropic Cambridge Research Laboratory                 */
%/*      (now part of Microsoft)                                */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/* Copyright: Microsoft Corporation                            */
%/*            1995-2000 Redmond, Washington USA                */
%/*            http://www.microsoft.com                         */
%/*                                                             */
%/*            2001-2002 Cambridge University                   */
%/*            Engineering Department                           */
%/*                                                             */
%/* Use of this software is governed by a License Agreement     */
%/* ** See the file License for the Conditions of Use **        */
%/* ** This banner notice must not be removed **                */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young and Julian Odell 24/11/97
%
\newpage
\mysect{HResults}{HResults}

\mysubsect{Function}{HResults-Function}

\index{hresults@\htool{HResults}|(}
\htool{HResults} is the \HTK\ performance analysis tool. It reads in a set
of label files (typically output from a recognition tool such as
\htool{HVite}) and compares them with the corresponding reference
transcription files. For the analysis of speech recognition output, the
comparison is based on a Dynamic Programming-based string alignment
procedure. For the analysis of word-spotting output, the comparison uses
the standard US NIST FOM metric.

When used to calculate the sentence accuracy using DP, the basic output is
recognition statistics for the whole file set in the format
\begin{verbatim}
  --------------------------- Overall Results -------------------
  SENT: %Correct=13.00 [H=13, S=87, N=100]
  WORD: %Corr=53.36, Acc=44.90 [H=460,D=49,S=353,I=73,N=862]
  ===============================================================
\end{verbatim}
The first line gives the sentence-level accuracy based on the total number
of label files which are identical to the transcription files. The second
line is the word accuracy based on the DP matches between the label files
and the transcriptions\footnote{The choice of ``Sentence'' and ``Word''
here is the usual case but is otherwise arbitrary. \htool{HResults} just
compares label sequences. The sequences could be paragraphs, sentences,
phrases or words, and the labels could be phrases, words, syllables or
phones, etc. Options exist to change the output designations `SENT' and
`WORD' to whatever is appropriate.}. In this second line, $H$ is the
number of correct labels, $D$ is the number of deletions, $S$ is the
number of substitutions, $I$ is the number of insertions and $N$ is the
total number of labels in the defining transcription files. The
percentage of labels correctly recognised is given by
\begin{equation}
  \mbox{\%Correct} = \frac{H}{N} \times 100\%
\end{equation}
and the accuracy is computed by
\begin{equation}
  \mbox{Accuracy} = \frac{H-I}{N} \times 100\%
\end{equation}
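As a check on these definitions, substituting the word-level counts from
the example output above ($H=460$, $I=73$, $N=862$) reproduces the
reported figures:
\begin{equation}
  \mbox{\%Correct} = \frac{460}{862} \times 100\% = 53.36\%, \qquad
  \mbox{Accuracy} = \frac{460-73}{862} \times 100\% = 44.90\%
\end{equation}
Note that deletions and substitutions lower both measures (via $H$),
whereas insertions are penalised only by the accuracy figure.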
In addition to the standard \HTK\ output format, \htool{HResults}
provides an alternative similar to that used in the US NIST scoring
package, i.e.\
\begin{verbatim}
 |=============================================================|
 |         | # Snt |  Corr    Sub    Del    Ins    Err  S. Err |
 |-------------------------------------------------------------|
 | Sum/Avg |   87  |  53.36  40.95   5.68   8.47  55.10  87.00 |
 `-------------------------------------------------------------'
\end{verbatim}
When \htool{HResults} is used to generate a confusion matrix, the values
are as follows:
\begin{description}
\item[\%c] The percentage correct in the row; that is, how many times a
  phone instance was correctly labelled.
\item[\%e] The percentage of incorrectly labelled phones in the row as a
  percentage of the total number of labels in the set.
\end{description}
An example from the HTKDemo routines:
\begin{verbatim}
====================== HTK Results Analysis =======================
  Date: Thu Jan 10 19:00:03 2002
  Ref : labels/bcplabs/mon
  Rec : test/te1.rec
      : test/te2.rec
      : test/te3.rec
------------------------ Overall Results --------------------------
SENT: %Correct=0.00 [H=0, S=3, N=3]
WORD: %Corr=63.91, Acc=59.40 [H=85, D=35, S=13, I=6, N=133]
------------------------ Confusion Matrix -------------------------
       S    C    V    N    L  Del [ %c / %e]
   S   6    1    0    1    0    0 [75.0/1.5]
   C   2   35    3    1    0   18 [85.4/4.5]
   V   0    1   28    0    1   12 [93.3/1.5]
   N   0    1    0    7    0    1 [87.5/0.8]
   L   0    1    1    0    9    4 [81.8/1.5]
 Ins   2    2    0    2    0
===================================================================
\end{verbatim}
Reading across the rows, \%c is the number of correct instances divided
by the total number of instances in the row, and \%e is the number of
incorrect instances in the row divided by the total number of labels $N$.
Note that, as the figures above imply, the Del column is excluded from
both counts: for row V, $\mbox{\%c} = 28/(1+28+1) = 93.3\%$ and
$\mbox{\%e} = 2/133 = 1.5\%$.

Optional extra outputs available from \htool{HResults} are
\begin{itemize}
  \item recognition statistics on a per file basis
  \item recognition statistics on a per speaker basis
  \item recognition statistics from the best of N alternatives
  \item time-aligned transcriptions
  \item confusion matrices
\end{itemize}
For comparison purposes, it is also possible to assign two labels to the
same equivalence class (see the {\tt -e} option). Also, the {\em null}
label {\tt ???} is defined so that making any label equivalent to the
null label means that it will be ignored in the matching process. Note
that the order of the equivalence arguments is important: to ensure that
label {\tt X} is ignored, the command line option \verb+-e ??? X+ would
be used. Label files containing triphone labels of the form {\tt A-B+C}
can be optionally stripped down to just the class name {\tt B} via the
{\tt -s} switch, as in the sketch below.
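As an illustrative sketch (the file names, the model list {\tt phonelist}
and the use of {\tt sil} as the silence label are assumptions for this
example, not fixed conventions), a phone-level scoring run that strips
triphone contexts and ignores silences might be invoked as
\begin{verbatim}
  # Map triphones A-B+C to B (-s), ignore "sil" labels (-e ??? sil),
  # and load the reference transcriptions from refs.mlf (-I).
  HResults -s -e ??? sil -I refs.mlf phonelist recout.mlf
\end{verbatim}
Here the recognition output MLF {\tt recout.mlf} is passed directly as a
command line filename argument, as recommended in the Use section below.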
The word spotting mode of scoring can be used to calculate hits, false
alarms and the associated figure of merit (FOM) for each of a set of
keywords. Optionally it can also calculate ROC information over a range
of false alarm rates. A typical output is as follows
\begin{verbatim}
------------------------ Figures of Merit -------------------------
    KeyWord:    #Hits     #FAs  #Actual      FOM
          A:        8        1       14    30.54
          B:        4        2       14    15.27
    Overall:       12        3       28    22.91
-------------------------------------------------------------------
\end{verbatim}
which shows the number of hits and false alarms (FA) for two keywords
\texttt{A} and \texttt{B}. A label in the test file with start time $t_s$
and end time $t_e$ constitutes a hit if there is a corresponding label in
the reference file such that $t_s < t_m < t_e$, where $t_m$ is the
mid-point of the reference label. Note that for keyword scoring, the test
transcriptions must include a score with each labelled word spot and all
transcriptions must include boundary time information.

The FOM gives the \% of hits averaged over the range of 1 to 10 FAs per
hour. It is calculated by first ordering all spots for a particular
keyword according to the match score. Then, for each FA rate $f$, the
number of hits is counted starting from the top of the ordered list and
stopping when $f$ false alarms have been encountered. This corresponds to
an \textit{a posteriori} setting of the keyword detection threshold and
effectively gives an upper bound on keyword spotting performance.
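A word-spotting run might then be scored along the following lines
(again, the file names and the keyword list {\tt keywords} are
illustrative assumptions):
\begin{verbatim}
  # Word spotting analysis (-w) with the false alarm time unit
  # set to 2.0 hours (-u); refs.mlf holds the reference labels (-I).
  HResults -w -u 2.0 -I refs.mlf keywords recout.mlf
\end{verbatim}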
\mysubsect{Use}{HResults-Use}

\htool{HResults} is invoked by typing the command line
\begin{verbatim}
   HResults [options] hmmList recFiles ...
\end{verbatim}
This causes \htool{HResults} to be applied to each {\tt recFile} in turn.
The {\tt hmmList} should contain a list of all model names for which
result information is required. Note, however, that since the context
dependent parts of a label can be stripped, this list is not necessarily
the same as the one used to perform the actual recognition. For each
{\tt recFile}, a transcription file with the same name but the extension
{\tt .lab} (or some user specified extension; see the {\tt -X} option) is
read in and matched with it. The {\tt recFiles} may be master label files
(MLFs), but note that even if such an MLF is loaded using the {\tt -I}
option, the list of files to be checked still needs to be passed, either
as individual command line arguments or via a script with the {\tt -S}
option. For this reason, it is simpler to pass the {\tt recFile} MLF as
one of the command line filename arguments. For loading reference label
file MLFs, the {\tt -I} option must be used. The reference labels and the
recognition labels must have different file extensions.

The available options are
\begin{optlist}
  \ttitem{-a s} change the label \texttt{SENT} in the output to \texttt{s}.
  \ttitem{-b s} change the label \texttt{WORD} in the output to \texttt{s}.
  \ttitem{-c} when comparing labels convert to upper case. Note that case
    is still significant for equivalences (see \texttt{-e} below).
  \ttitem{-d N} search the first \texttt{N} alternatives for each test
    label file to find the most accurate match with the reference labels.
    Output results will be based on the most accurate match to allow
    N-best error rates to be found.
  \ttitem{-e s t} the label {\tt t} is made equivalent to the label
    {\tt s}. More precisely, {\tt t} is assigned to an equivalence class
    of which {\tt s} is the identifying member.
  \ttitem{-f} Normally, \htool{HResults} accumulates statistics for all
    input files and just outputs a summary on completion. This option
    forces match statistics to be output for each input test file.
  \ttitem{-g fmt} This sets the test label format to {\tt fmt}. If this
    is not set, the {\tt recFiles} should be in the same format as the
    reference files.
  \ttitem{-h} Output the results in the same format as the US NIST
    scoring software.
  \ttitem{-k s} Collect and output results on a speaker by speaker basis
    (as well as globally). \texttt{s} defines a pattern which is used to
    extract the speaker identifier from the test label file name. In
    addition to the pattern matching metacharacters \texttt{*} and
    \texttt{?} (which match zero or more characters and a single
    character respectively), the character \texttt{\%} matches any
    character whilst including it as part of the speaker identifier.
  \ttitem{-m N} Terminate after collecting statistics from the first
    \texttt{N} files.
  \ttitem{-n} Set US NIST scoring software compatibility.
  \ttitem{-p} This option causes a phoneme confusion matrix to be output.
  \ttitem{-s} This option causes all phoneme labels with the form
    {\tt A-B+C} to be converted to {\tt B}. It is useful for analysing
    the results of phone recognisers using context dependent models.
  \ttitem{-t} This option causes a time-aligned transcription of each
    test file to be output provided that it differs from the reference
    transcription file.
  \ttitem{-u f} Changes the time unit for calculating false alarm rates
    (for word spotting scoring) to \texttt{f} hours (default is 1.0).
  \ttitem{-w} Perform word spotting analysis rather than string accuracy
    calculation.
  \ttitem{-z s} This redefines the null class name to {\tt s}. The
    default null class name is {\tt ???}, which may be difficult to
    manage in shell script programming.
  \stdoptG
  \stdoptI
  \stdoptL
  \stdoptX
\end{optlist}
\stdopts{HResults}

\mysubsect{Tracing}{HResults-Tracing}

\htool{HResults} supports the following trace options where each trace
flag is given using an octal base
\begin{optlist}
  \ttitem{00001} basic progress reporting.
  \ttitem{00002} show error rate for each test alternative.
  \ttitem{00004} show speaker identifier matches.
  \ttitem{00010} warn about non-keywords found during word spotting.
  \ttitem{00020} show detailed word spotting scores.
  \ttitem{00040} show memory usage.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the \texttt{TRACE}
configuration variable.
\index{hresults@\htool{HResults}|)}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../htkbook"
%%% End: