%/* ----------------------------------------------------------- */
%/* */
%/* ___ */
%/* |_| | |_/ SPEECH */
%/* | | | | \ RECOGNITION */
%/* ========= SOFTWARE */
%/* */
%/* */
%/* ----------------------------------------------------------- */
%/* Copyright: Microsoft Corporation */
%/* 1995-2000 Redmond, Washington USA */
%/* http://www.microsoft.com */
%/* */
%/* Use of this software is governed by a License Agreement */
%/* ** See the file License for the Conditions of Use ** */
%/* ** This banner notice must not be removed ** */
%/* */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young and Dave Ollason 24/11/97
%
\newpage
\mysect{HERest}{HERest}
\mysubsect{Function}{HERest-Function}
\index{herest@\htool{HERest}|(} This program is used to perform a
single re-estimation of the parameters of a set of HMMs, or linear
transforms, using an {\em embedded training} version of the Baum-Welch
algorithm. Training data consists of one or more utterances each of
which has a transcription in the form of a standard label file
(segment boundaries are ignored). For each training utterance, a
composite model is effectively synthesised by concatenating the
phoneme models given by the transcription. Each phone model has the
same set of accumulators allocated to it as are used in \htool{HRest} but in
\htool{HERest} they are updated simultaneously by performing a
standard Baum-Welch pass over each training utterance using the
composite model.
\htool{HERest} is intended to operate on HMMs with initial parameter values
estimated by \htool{HInit}/\htool{HRest}.
\htool{HERest} supports multiple mixture Gaussians, discrete and tied-mixture
HMMs, multiple data streams, parameter tying within and between models, and
full or diagonal covariance matrices. \htool{HERest} also supports tee-models
(see section~\ref{s:teemods}), for handling optional silence and non-speech
sounds. These may be placed between the units (typically words or phones)
listed in the transcriptions but they cannot be used at the start or end of a
transcription. Furthermore, chains of tee-models are not permitted.
\htool{HERest} includes features to allow parallel operation where a network
of processors is available. When the training set is large, it can be split into separate chunks that are processed in parallel on multiple machines or processors, thereby speeding up the training process.
Like all re-estimation tools, \htool{HERest} allows a floor to be set on
each individual variance by defining a variance floor macro for each data
stream (see chapter~\ref{c:Training}). The configuration variable {\tt
VARFLOORPERCENTILE} allows the same thing to be done in a different way
which appears to improve recognition results. By setting this to e.g. 20,
the variances from each dimension are floored to the 20th percentile of the
distribution of variances for that dimension.
%as suggested in:
%\bibitem[Lee, Giachin, Rabiner, Pieraccini \& Rosenberg, 1992]{lee92csl}
%Lee C-H., Giachin E., Rabiner L.R., Pieraccini R. \& Rosenberg A.E.
%(1992). ``Improved Acoustic Modeling for Large Vocabulary Continuous
%Speech Recognition,'' {\it Computer Speech and Language} {\bf 6}, pp.
%103-127.
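For example, percentile-based variance flooring could be enabled with a
configuration entry such as the following (the value 20 is purely
illustrative, as in the text above):
\begin{verbatim}
# Illustrative configuration entry: floor the variances of each
# dimension at the 20th percentile of their distribution
VARFLOORPERCENTILE = 20
\end{verbatim}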
\htool{HERest} supports two specific methods for initialisation of
model parameters, \textit{single pass retraining} and \textit{2-model
re-estimation}.
\textit{Single pass retraining} is useful when the parameterisation of
the front-end (e.g. from MFCC to PLP coefficients) is to be modified.
Given a set of well-trained models, a set of new models using a
different parameterisation of the training data can be generated in a
single pass. This is done by computing the forward and backward
probabilities using the original well-trained models and the original
training data, but then switching to a new set of training data to
compute the new parameter estimates.
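As an illustration only, a single pass retraining run might look as
follows; this sketch assumes that the {\tt -r} switch selects single pass
retraining and that each line of the script file pairs an
original-parameterisation file with its re-parameterised counterpart (all
file names are hypothetical):
\begin{verbatim}
# retrain.scp (hypothetical): each line pairs old and new data files
#   data/tr01.mfc data/tr01.plp
#   data/tr02.mfc data/tr02.plp
HERest -r -C config -S retrain.scp -I words.mlf \
       -H mfc_hmms/MMF -M plp_hmms hmmlist
\end{verbatim}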
In \textit{2-model re-estimation} one model set can be used to obtain
the forward-backward probabilities which are then used to update the
parameters of another model set. In contrast to \textit{single pass
retraining}, the two model sets are not required to be tied in the
same fashion. This is particularly useful for the training of single
mixture models prior to decision-tree based state clustering. The use
of 2-model re-estimation in \htool{HERest} is triggered by setting the
config variables {\tt ALIGNMODELMMF} or {\tt ALIGNMODELDIR} and {\tt
ALIGNMODELEXT} together with {\tt ALIGNHMMLIST} (see section \ref{s:twomodel}).
As the model list can differ for the alignment model set, a separate set of
input transforms may be specified using the {\tt ALIGNXFORMDIR} and
{\tt ALIGNXFORMEXT} configuration variables.
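For example, 2-model re-estimation might be enabled with configuration
entries along the following lines (all file and directory names here are
hypothetical):
\begin{verbatim}
# Illustrative 2-model re-estimation settings (names are hypothetical)
ALIGNMODELMMF = align/MMF
ALIGNHMMLIST  = align/hmmlist
# alternatively use ALIGNMODELDIR and ALIGNMODELEXT instead of ALIGNMODELMMF
# optional input transforms for the alignment model set:
ALIGNXFORMDIR = align/xforms
ALIGNXFORMEXT = mllr1
\end{verbatim}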
When updating model parameters, \htool{HERest} operates in two distinct stages.
\begin{enumerate}
\item
In the first stage, one of the following two options applies
\begin{enumerate}
\item
Each input data file contains training data which is
processed and the accumulators for state occupation,
state transition, means and variances are updated.
\item
Each data file contains a dump of the accumulators
produced by previous runs of the program. These
are read in and added together to form a single set
of accumulators.
\end{enumerate}
\item
In the second stage, one of the following options applies
\begin{enumerate}
\item
The accumulators are used to calculate new
estimates for the HMM parameters.
\item
The accumulators are dumped into a file.
\end{enumerate}
\end{enumerate}
Thus, on a single processor the default combination 1(a) and 2(a) would
be used. However, if N processors are available then the
training data would be split into N equal groups and \htool{HERest} would
be set to process one data set on each processor using the combination
1(a) and 2(b).
When all processors had finished, the
program would then be run again using the combination 1(b) and 2(a)
to load in the partial accumulators created by the N processors
and do the final parameter re-estimation. The choice of which combination
of operations \htool{HERest} will perform is governed by the {\tt -p} option
switch as described below.
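As a sketch, assuming two processors and that each job dumps its
accumulator as {\tt HER1.acc} or {\tt HER2.acc} in the output directory
(all file names here are hypothetical), the scheme might be run as follows:
\begin{verbatim}
# Stage 1(a)+2(b): each job processes its own chunk of the data and
# dumps an accumulator file into the output directory hmm3
HERest -C config -S train.1.scp -I labs.mlf -p 1 \
       -H hmm2/MMF -M hmm3 hmmlist
HERest -C config -S train.2.scp -I labs.mlf -p 2 \
       -H hmm2/MMF -M hmm3 hmmlist
# Stage 1(b)+2(a): load the partial accumulators and re-estimate
HERest -C config -I labs.mlf -p 0 \
       -H hmm2/MMF -M hmm3 hmmlist hmm3/HER1.acc hmm3/HER2.acc
\end{verbatim}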
As a further performance optimisation, \htool{HERest} will also prune the
$\alpha$ and $\beta$ matrices. By this means, a factor of 3 to 5
speed improvement and a similar reduction in memory requirements can be
achieved with negligible effects on training performance (see the {\tt
-t} option below).
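For illustration, a commonly used setting prunes with an initial beam of
250.0, widened in steps of 150.0 up to a maximum of 1000.0 whenever an
utterance fails to align (the file names here are hypothetical):
\begin{verbatim}
HERest -C config -S train.scp -I labs.mlf -t 250.0 150.0 1000.0 \
       -H hmm2/MMF -M hmm3 hmmlist
\end{verbatim}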
\htool{HERest} is able to make use of, and estimate, linear
transformations for model adaptation. There are three types of linear
transform that are used in \htool{HERest}.
\begin{itemize}
\item {\it Input transform}: the input transform is used to determine
the forward-backward probabilities, hence the component posteriors, for
estimating the model and transform parameters.
\item {\it Output transform}: the output transform is generated when the
{\tt -u} option is set to {\tt a}. The transform will be stored in the
current directory, or in the directory specified by the {\tt -K} option,
optionally with the specified transform extension.
\item {\it Parent transform}: the parent transform determines the
model, or features, on which the model set or transform is to be
generated. For transform estimation this allows {\em cascades} of transforms
to be used to adapt the model parameters. For model estimation this
supports {\em speaker adaptive training}. Note the current implementation
only supports adaptive training with CMLLR. Any parent transform can be
used when generating transforms.
\end{itemize}
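Purely as a sketch, an output transform might be estimated as follows;
the base class and regression class tree macros are assumed to be found
via the {\tt -J} directory, and all file names and extensions are
hypothetical:
\begin{verbatim}
# Estimate an output transform (-u a), storing it in xforms/ with the
# extension mllr1; base class / regression tree macros found via -J
HERest -C config.adapt -S adapt.scp -I adapt.mlf -u a \
       -J classes -K xforms mllr1 \
       -H hmm5/MMF hmmlist
\end{verbatim}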
When input or parent transforms are specified, the transforms may
physically be stored in multiple directories. Which transform to use
is determined using the following search order:
\begin{enumerate}
\item Any loaded macro that matches the transform (and its extension) name.
\item If it is a parent transform, the directory specified with the
{\tt -E} option.
\item The list of directories specified with the {\tt -J} option.
The directories are searched in the order that they are specified
in the command line.
\end{enumerate}
As the search order above looks for loaded macros first, it is
recommended that unique extensions are specified for each set of
transforms generated. Transforms may be stored together in
a single TMF; such TMFs may be loaded using the {\tt -H} option.
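As a purely illustrative sketch, a TMF might be loaded with {\tt -H} while
further transform directories are listed with {\tt -J} (searched in the
order given); all names below are hypothetical:
\begin{verbatim}
HERest -C config -S adapt.scp -I adapt.mlf -u a \
       -H hmm5/MMF -H xforms/speakers.tmf \
       -J xforms_pass1 mllr1 -J xforms_pass2 \
       -K xforms_out mllr2 hmmlist
\end{verbatim}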
When macros are specified for the regression class trees and
the base classes, the following search order is used:
\begin{enumerate}
\item Any loaded macro that matches the macro name.