hdecode.tex

来自「隐马尔可夫模型源代码」· TEX 代码 · 共 256 行
TEX
256 行



%/*                                                             */


%/*                          ___                                */


%/*                       |_| | |_/   SPEECH                    */


%/*                       | | | | \   RECOGNITION               */


%/*                       =========   SOFTWARE                  */ 


%/*                                                             */


%/*                                                             */


%/* ----------------------------------------------------------- */


%/*   Use of this software is governed by a License Agreement   */


%/*    ** See the file License for the Conditions of Use  **    */


%/*    **     This banner notice must not be removed      **    */


%/*                                                             */


%/* ----------------------------------------------------------- */


%


% HTKBook - Steve Young and Julian Odell - 24/11/97


%


\newpage


\mysect{HDecode}{HDecode}





{\bf\large WARNING: In contrast to the rest of HTK, \htool{HDecode} has


been specifically written for speech recognition. Known restrictions


are:


\begin{itemize}


\item only works for cross-word triphones;


\item \texttt{sil} and \texttt{sp} models are reserved as silence


models and are, by default, automatically added to the end of all


``words'' in the pronunciation dictionary.


\item lattices generated with \htool{HDecode} must be {\em merged} 


using \htool{HLRescore}


to remove duplicate word paths prior to being used for lattice rescoring


with \htool{HDecode} and \htool{HVite}.


\end{itemize}


For an example of the use of \htool{HDecode} see the Resource


Management recipe, section 11, in the \texttt{samples} tar-ball


that is available for download


}





\mysubsect{Function}{HDecode-Function}





\index{hdecode@\htool{HDecode}|(}


\htool{HDecode} is a large vocabulary word recogniser.  


Similar to \htool{HVite}, it transcribes speech files using a HMM model set


and a dictionary (vocabulary). The best transcription hypothesis will be


generated in the Master Label File (MLF) format. Optionally,


multiple hypotheses can also be generated as a word lattice in the form


of the HTK Standard Lattice Format (SLF).





The search space of the recognition process is defined by a model based 


network, produced from expanding a supplied language model or a word 


level lattice using the dictionary. In the absence of a word lattice,


a language model must be supplied to perform a \emph{full decoding}. 


The current version of \htool{HDecode} only supports bigram full decoding.


When a word lattice is supplied, the use of a language model is optional. 


This mode of operation is known as \emph{lattice rescoring}.


The acoustic and language model scores can be adjusted using the


\texttt{-a} and \texttt{-s} options respectively. 


In the case where the supplied dictionary


contains pronunciation probability information, the corresponding scale


factor can be adjusted using the \texttt{-r} option. Use the \texttt{-q} option


to control the type of information to be included in the generated lattices.





\htool{HDecode}, when compiled with the \texttt{MODALIGN} compile directive,


can also be used to align the HMM models to a given word level lattice


(also known as model marking the lattice). When using the default


\texttt{Makefile} supplied with \htool{HDecode}, this binary will be


made and stored in \htool{HDecode.mod}.





\htool{HDecode} supports shared parameters and appropriately pre-computes 


output probabilities. The runtime of the decoding process can be adjusted by


changing the pruning beam width (see the \texttt{-t} option), 


word end beam width (see the \texttt{-v} option) and the maximum model 


pruning (see the \texttt{-u} option).


\htool{HDecode} also allows probability calculation to be carried out in


blocks at the same time. The block size (in frames) can be specified using


the \texttt{-k} option. However, when CMLLR adaptation is used, probabilities


have to be calculated one frame at a time (i.e. using \texttt{-k 1})\footnote{


This is due to the different caching mechanism used in \htool{HDecode} and 


the \htool{HAdapt} module}.


Speaker adaptation is supported by \htool{HDecode} only in terms of


using a speaker specific linear adaptation transformation. The use of


an adaptation transformation is enabled using the \texttt{-m} option.


The path, name and extension of the transformation matrices are specified


using the \texttt{-J} and the file names are derived from the name of the


speech file using a \emph{mask} (see the \texttt{-h} option).


Online (batch or incremental) adaptations are not supported by \htool{HDecode}.





Note that for lattices rescoring word lattices must be deterministic.


Duplicated paths and pronunciation variants are not permitted.  See


\htool{HLRescore} reference page for information on how to produce


deterministic lattices.








\mysubsect{Use}{HDecode-Use}





\htool{HDecode} is invoked via the command line


\begin{verbatim}


   HDecode [options] dictFile hmmList testFiles ...


\end{verbatim}


HDecode will then either load a N-gram language model file (\texttt{-w s}) 


and create a decoding network for the test files, which is


the {\em full decoding} mode, or create a


new network for each test file from the corresponding 


word lattice, which is the {\em lattice rescoring} mode. 


When a new network is created for each test file the path name


of the label (or lattice) file to load is determined from the


test file name and the \texttt{-L} and \texttt{-X} options


described below.





The \texttt{hmmList} should contain a list of the models required to


construct the network from the word level representation.





The recogniser output is written in the form of a label file whose


path name is determined from the test file name and the \texttt{-l} and 


\texttt{-y} options described below. The list of test files can be stored 


in a script file if required.


When performing lattice recognition (see \texttt{-z s} option described


below) the output lattice file contains multiple alternatives and the


format is determined by the \texttt{-q} option.





The detailed operation of \htool{HDecode} is controlled by the following


command line options


\begin{optlist}





  \ttitem{-a f} Set acoustic scale factor to \texttt{f}.


                This factor post-multiplies the acoustic likelihoods


                from the word lattices.  (default value 1.0).





  \ttitem{-d dir} This specifies the directory to search for the


        HMM definition files corresponding to the labels used in


        the recognition network.





  \ttitem{-h mask} Set the mask for determining which transform names are 


	to be used for the input transforms. 





  \ttitem{-i s} Output transcriptions to MLF \texttt{s}.





  \ttitem{-k i} Set frame block size in output probability calculation for


  diagonal covariance systems.





  \ttitem{-l dir} This specifies the directory to store the  output label 


        files.  If this option is not used then \htool{HDecode} will store 


        the label files in the same directory as the data. 


        When output is directed to an MLF, this option can be used to


      add a path to each output file name.  In particular, setting the option


      \verb+-l '*'+ will cause a label file named \texttt{xxx} to be prefixed


      by the pattern \verb+"*/xxx"+ in the output MLF file.  This is useful


      for generating MLFs which are independent of the location of the 


      corresponding data files.





  \ttitem{-m  } Use an input transform. (default is off)





  \ttitem{-n i} Use \texttt{i} tokens in each state to perform


        lattice recognition. (default is 32 tokens per state)





  \ttitem{-o s} Choose how the output labels should be formatted.


        \texttt{s} is a string with certain letters (from \texttt{NSCTWMX}) 


        indicating binary flags that control formatting options. 


        \texttt{N} normalise acoustic scores by dividing by the duration


        (in frames) of the segment.


        \texttt{S} remove scores from output label.  By default 


        scores will be set to the total likelihood of the segment.


        \texttt{C} Set the transcription labels to start and end on


        frame centres. By default start times are set to the start


        time of the frame and end times are set to the end time of 


        the frame.


        \texttt{T} Do not include times in output label files.


        \texttt{W} Do not include words in output label files


        when performing state or model alignment.


        \texttt{M} Do not include model names in output label


        files when performing state and model alignment.


        \texttt{X} Strip the triphone context.





  \ttitem{-p f}  Set the word insertion log probability to \texttt{f} 


        (default 0.0).





  \ttitem{-q s} Choose how the output lattice should be formatted.


         \texttt{s} is a string with certain letters (from \texttt{ABtvaldmnr})


         indicating binary flags that control formatting options.


         \texttt{A} attach word labels to arcs rather than nodes.


         \texttt{B} output lattices in binary for speed.


         \texttt{t} output node times.


         \texttt{v} output pronunciation information.


         \texttt{a} output acoustic likelihoods.


         \texttt{l} output language model likelihoods.


         \texttt{d} output word alignments (if available).


         \texttt{m} output within word alignment durations.


         \texttt{n} output within word alignment likelihoods.


         \texttt{r} output pronunciation probabilities.





  \ttitem{-r f} Set the dictionary pronunciation probability scale 


        factor to \texttt{f}. (default value 1.0).





  \ttitem{-s f} Set the grammar scale factor to \texttt{f}.


        This factor post-multiplies the language model likelihoods


        from the word lattices.  (default value 1.0).


 


  \ttitem{-t f [g]} Enable beam searching such that any model whose 


        maximum log probability token falls more than the main beam


        \texttt{f} below the maximum for all models is deactivated. 


        An extra parameter \texttt{g} can be specified as the relative 


        beam width. It may override the main beam width.





  \ttitem{-u i} Set the maximum number of active models to \texttt{i}.


        Setting \texttt{i} to \texttt{0} disables this limit (default 0).





  \ttitem{-v f [g]} Enable word end pruning.  Do not propagate tokens from


        word end nodes that fall more than \texttt{f} below the maximum 


        word end likelihood.  (default \texttt{0.0}). 


        An extra parameter \texttt{g} can be specified.





  \ttitem{-w s} Load language model from \texttt{s}.





  \ttitem{-x ext}  This sets the extension to use for HMM definition


      files to \texttt{ext}.





  \ttitem{-y ext}  This sets the extension for output label files to


        \texttt{ext} (default \texttt{rec}).





  \ttitem{-z ext}  Enable output of lattices with extension \texttt{ext}


                   (default off).





  \ttitem{-L dir} This specifies the directory to find input lattices. 





%   \ttitem{-R s}  Load 1-best alignment label file from \texttt{-s}.





  \ttitem{-X ext} Set the extension for the input lattice files 


        to be \texttt{ext}  (default value \texttt{lat}).





\stdoptE


\stdoptF


\stdoptG


\stdoptH


\stdoptJ


\stdoptK


\stdoptP





\end{optlist}


\stdopts{HDecode}





\mysubsect{Tracing}{HDecode-Tracing}





\htool{HDecode} supports the following trace options where each


trace flag is given using an octal base


\begin{optlist}


   \ttitem{0001} enable basic progress reporting.  


   \ttitem{0002} list observations.


   \ttitem{0004} show adaptation process.


   \ttitem{0010} show memory usage at start and finish.


\end{optlist}


Trace flags are set using the \texttt{-T} option or the \texttt{TRACE} 


configuration variable.


\index{hvite@@\htool{HDecode}|)}
hdecode.tex - 源码说明

本页面展示了「隐马尔可夫模型源代码」中的 hdecode.tex 源码文件，采用 TEX 编程语言编写，共 256 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与马尔可夫模型相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?