%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young and Dave Ollason   24/11/97
%
\newpage
\mysect{HERest}{HERest}

\mysubsect{Function}{HERest-Function}

\index{herest@\htool{HERest}|(}
This program is used to perform a single re-estimation of
the parameters of a set of HMMs using an {\em embedded training}
version of the Baum-Welch algorithm.  Training
data consists of one or more utterances, each of which has a
transcription in the form of a standard label file (segment
boundaries are ignored).  For each training utterance, a
composite model is effectively synthesised by concatenating
the phoneme models given by the transcription.
Each phone
model has the same set of accumulators allocated to it as are
used in HRest, but in \htool{HERest} they
are updated simultaneously by performing a standard Baum-Welch pass over
each training utterance using the composite model.

\htool{HERest} is intended to operate on HMMs with initial parameter
values estimated by HInit/HRest.
\htool{HERest} supports multiple mixture Gaussians, discrete and tied-mixture
HMMs, multiple data streams, parameter tying within and between models, and
full or diagonal covariance matrices. \htool{HERest} also supports tee-models
(see section~\ref{s:teemods}), for handling optional silence and non-speech
sounds. These may be placed between the units (typically words or phones)
listed in the transcriptions, but they cannot be used at the start or end of a
transcription. Furthermore, chains of tee-models are not permitted.

\htool{HERest} includes features to allow parallel operation where a network
of processors is available. When the training set is large, it can be split
into separate chunks that are processed in parallel on multiple
machines/processors, consequently speeding up the training process.
Like all re-estimation tools, \htool{HERest} allows
a floor to be set on each individual variance by defining a variance floor
macro for each data stream (see chapter~\ref{c:Training}).

\htool{HERest} supports two specific methods for initialisation of
model parameters, \textit{single pass retraining} and \textit{2-model
  re-estimation}.

\textit{Single pass retraining} is useful when the parameterisation of
the front-end (e.g. from MFCC to PLP coefficients) is to be modified.
Given a set of well-trained models, a set of new models using a
different parameterisation of the training data can be generated in a
single pass.
This is done by computing the forward and backward
probabilities using the original well-trained models and the original
training data, but then switching to a new set of training data to
compute the new parameter estimates.

In \textit{2-model re-estimation} one model set can be used to obtain
the forward-backward probabilities, which are then used to update the
parameters of another model set. In contrast to \textit{single pass
  retraining}, the two model sets are not required to be tied in the
same fashion.  This is particularly useful for training single
mixture models prior to decision-tree based state clustering. The use
of 2-model re-estimation in \htool{HERest} is triggered by setting the
config variables {\tt ALIGNMODELMMF} or {\tt ALIGNMODELDIR} and {\tt
  ALIGNMODELEXT} together with {\tt ALIGNHMMLIST} (see
section~\ref{s:twomodel}).

\htool{HERest} operates in two distinct stages.
\begin{enumerate}
\item
    In the first stage, one of the following two options applies
  \begin{enumerate}
  \item
    Each input data file contains training data which is
    processed and the accumulators for state occupation,
    state transition, means and variances are updated.
  \item
    Each data file contains a dump of the accumulators
    produced by previous runs of the program.  These
    are read in and added together to form a single set
    of accumulators.
  \end{enumerate}
\item
    In the second stage, one of the following options applies
  \begin{enumerate}
  \item
    The accumulators are used to calculate new
    estimates for the HMM parameters.
  \item
    The accumulators are dumped into a file.
  \end{enumerate}
\end{enumerate}
Thus, on a single processor the default combination 1(a) and 2(a) would
be used.  However, if N processors are available then the training
data would be split into N equal groups and \htool{HERest} would
be set to process one data set on each processor using the combination
1(a) and 2(b).
When all processors had finished, the program would then
be run again using the combination 1(b) and 2(a)
to load in the partial accumulators created by the N processors
and do the final parameter re-estimation.  The choice of which combination
of operations \htool{HERest} will perform is governed by the {\tt -p} option
switch as described below.

As a further performance optimisation, \htool{HERest} will also prune the
$\alpha$ and $\beta$ matrices.  By this means, a factor of 3 to 5
speed improvement and a similar reduction in memory requirements can be
achieved with negligible effects on training performance (see the {\tt
-t} option below).

\mysubsect{Use}{HERest-Use}

\htool{HERest} is invoked via the command line
\begin{verbatim}
   HERest [options] hmmList trainFile ...
\end{verbatim}
This causes the set of HMMs given in {\tt hmmList} to be loaded.
The given list of
training files is then used to perform one re-estimation cycle. As always,
the list of training files can be stored in a script file if required.  On
completion, \htool{HERest} outputs new updated versions of each HMM definition. If
the number of training examples falls below a specified threshold for some
particular HMM, then
the new parameters for that HMM are ignored and the original parameters
are used instead.

The detailed operation of \htool{HERest} is controlled by the following
command line options
\begin{optlist}

  \ttitem{-c f} Set the threshold for tied-mixture observation
      pruning to {\tt f}.
      For tied-mixture \texttt{TIEDHS} systems, only those
      mixture component probabilities which fall within {\tt f} of
      the maximum mixture component probability are used in calculating
      the state output probabilities (default 10.0).

  \ttitem{-d dir}
      Normally \htool{HERest} looks for HMM definitions
      (not already loaded via MMF files)
      in the current directory.  This option tells \htool{HERest} to look in
      the directory {\tt dir} to find them.

  \ttitem{-m N}  Set the minimum number of training examples
      required for any model to {\tt N}.  If the actual number
      falls below this value, the HMM is not updated and the original
      parameters are used for the new version (default value 3).

  \ttitem{-o ext}  This causes the file name extensions of the
      original models (if any) to be replaced by {\tt ext}.

  \ttitem{-p N}  This switch is used to set parallel mode operation.
      If {\tt p} is set to a positive integer {\tt N}, then \htool{HERest} will
      process the training files and then dump all the accumulators
      into a file called {\tt HERN.acc}.  If {\tt p} is set to 0, then
      it treats all file names input on the command line as the names
      of {\tt .acc} dump files.  It reads them all in, adds together
      all the partial accumulations and then re-estimates all the
      HMM parameters in the normal way.

  \ttitem{-r}  This enables single-pass retraining.  The list of training
      files is processed pair-by-pair.  For each pair, the first file
      should match the parameterisation of the original model set.  The
      second file should match the parameterisation of the required new
      set.  All speech input processing is controlled by configuration
      variables in the normal way except that the variables describing
      the old parameterisation are qualified by the name \texttt{HPARM1}
      and the variables describing the new parameterisation are
      qualified by the name \texttt{HPARM2}.  The stream widths for the
      old and the new must be identical.

  \ttitem{-s file} This causes statistics on occupation of each
      state to be output to the named file.  This file
      is needed for the {\tt RO} command of HHEd but it is also
      generally useful for assessing the amount of training material
      available for each HMM state.

  \ttitem{-t f [i l]} Set the pruning threshold to {\tt f}.
      During the
      backward probability calculation, at
      each time $t$
      all (log) $\beta$ values falling more than {\tt f} below the
      maximum $\beta$ value at that time are ignored.  During the
      subsequent forward pass, (log) $\alpha$ values are only
      calculated if there are corresponding valid $\beta$ values.
      Furthermore, if the ratio of the $\alpha \beta$ product divided
      by the total probability (as computed on the backward pass)
      falls below a fixed threshold then those values of $\alpha$
      and $\beta$ are ignored. Setting {\tt f} to zero disables
      pruning (default value 0.0).  Tight pruning thresholds can
      result in \htool{HERest} failing to process an utterance.
      If the {\tt i} and {\tt l} options are given, then a pruning
      error results in the threshold being increased by {\tt i} and
      utterance processing restarts.  If errors continue, this procedure will
      be repeated until the limit {\tt l} is reached.

  \ttitem{-u flags} By default, \htool{HERest} updates all of the HMM parameters,
      that is, means, variances, mixture weights and
      transition probabilities. This
      option causes just the parameters indicated by the {\tt flags}
      argument to be updated.  This argument is a string containing one
      or more of the letters {\tt m} (mean), {\tt v} (variance),
      {\tt t} (transition) and {\tt w} (mixture weight).  The
      presence of a letter enables
      the updating of the corresponding parameter set.

  \ttitem{-v f}  This sets the minimum variance (i.e. diagonal element of
      the covariance matrix) to the real value {\tt f} (default value
      0.0).

  \ttitem{-w f}  Any mixture weight which falls below the global
      constant {\tt MINMIX} is treated as being zero.
      When this parameter is set, all mixture weights are floored
      to {\tt f * MINMIX}.

  \ttitem{-x ext}  By default, \htool{HERest} expects a HMM definition for
      the label X to be stored in a file called {\tt X}.  This
      option causes \htool{HERest} to look for the HMM definition in the
      file {\tt X.ext}.

\stdoptB
\stdoptF
\stdoptG
\stdoptH
\stdoptI
\stdoptL
\stdoptM
\stdoptX
\end{optlist}
\stdopts{HERest}

\mysubsect{Tracing}{HERest-Tracing}

\htool{HERest} supports the following trace options where each
trace flag is given using an octal base
\begin{optlist}
   \ttitem{00001} basic progress reporting.
   \ttitem{00002} show the logical/physical HMM map.
   \ttitem{00004} report statistics on pruning.
   \ttitem{00010} show the alpha/beta matrices.
   \ttitem{00020} show the occupation counters.
   \ttitem{00040} show the transition counters.
   \ttitem{00100} show the mixture weight counters.
   \ttitem{00200} show the calculation of the output probabilities.
   \ttitem{00400} list the updated model parameters.
   \ttitem{01000} show the average percentage utilisation
          of tied mixture components.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the \texttt{TRACE}
configuration variable.
\index{herest@\htool{HERest}|)}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../htkbook"
%%% End:
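
\mysubsect{Example}{HERest-Example}

As a rough illustration of the parallel mode described in the Function
section, the following sketch shows how a single re-estimation might be
split across two processors.  All file and directory names
({\tt train1.scp}, {\tt train2.scp}, {\tt labs.mlf}, {\tt hmm0},
{\tt hmm1}) are hypothetical.  It is assumed that the standard options
{\tt -S}, {\tt -I}, {\tt -H} and {\tt -M} supply the script file, the
master label file, the input MMF and the output directory respectively,
that the {\tt N} in {\tt HERN.acc} is replaced by the value given to
{\tt -p}, and that the accumulator files are written to the output
directory.
\begin{verbatim}
   HERest -S train1.scp -I labs.mlf -H hmm0/hmmdefs -M hmm1 -p 1 hmmList
   HERest -S train2.scp -I labs.mlf -H hmm0/hmmdefs -M hmm1 -p 2 hmmList
   HERest -H hmm0/hmmdefs -M hmm1 -p 0 hmmList hmm1/HER1.acc hmm1/HER2.acc
\end{verbatim}
Under these assumptions, the first two commands correspond to the
combination 1(a) and 2(b), each processing half of the training data and
dumping its accumulators.  The third command corresponds to the
combination 1(b) and 2(a): it reads the two accumulator files, adds them
together and writes the re-estimated models.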