This can be avoided by using an incremental threshold. For example, executing
\begin{verbatim}
 HERest -t 120.0 60.0 240.0 -S trainlist -I labs \
        -H dir1/hmacs -M dir2 hmmlist
\end{verbatim}
would cause \htool{HERest} to run normally at a beam width\index{beam width} of 120.0. However, if a pruning error\index{pruning errors} occurs, the beam is increased by 60.0 and \htool{HERest} reprocesses the offending training utterance. Repeated errors cause the beam width to be increased again, and this continues until either the utterance is successfully processed or the upper beam limit is reached, in this case 240.0. Note that errors which occur at very high beam widths are often caused by transcription errors; hence, it is best not to set the upper limit too high.

\centrefig{parher}{90}{\htool{HERest} Parallel Operation}
\index{model training!in parallel}
The second way of speeding up the operation of \htool{HERest} is to use more than one computer in parallel. The way that this is done is to divide the training data amongst the available machines and then to run \htool{HERest} on each machine such that each invocation of \htool{HERest} uses the same initial set of models but has its own private set of data. By setting the option {\tt -p N} where {\tt N} is an integer, \htool{HERest} will dump the contents of all its accumulators\index{accumulators} into a file called {\tt HERN.acc} rather than updating and outputting a new set of models. These dumped files are collected together and input to a new invocation of \htool{HERest} with the option {\tt -p 0} set. \htool{HERest} then reloads the accumulators from all of the dump files and updates the models in the normal way. This process is illustrated in Figure~\href{f:parher}.

To give a concrete example, suppose that four networked workstations were available to execute the \htool{HERest} command given earlier. The training files listed previously in \texttt{trainlist} would be split into four equal sets and a list of the files in each set stored in {\tt trlist1}, {\tt trlist2}, {\tt trlist3}, and {\tt trlist4}. On the first workstation, the command
\begin{verbatim}
 HERest -S trlist1 -I labs -H dir1/hmacs -M dir2 -p 1 hmmlist
\end{verbatim}
would be executed. This will load in the HMM definitions in {\tt dir1/hmacs}, process the files listed in {\tt trlist1} and finally dump its accumulators into a file called {\tt HER1.acc} in the output directory {\tt dir2}. At the same time, the command
\begin{verbatim}
 HERest -S trlist2 -I labs -H dir1/hmacs -M dir2 -p 2 hmmlist
\end{verbatim}
would be executed on the second workstation, and so on. When \htool{HERest} has finished on all four workstations, the following command will be executed on just one of them
\begin{verbatim}
 HERest -H dir1/hmacs -M dir2 -p 0 hmmlist dir2/*.acc
\end{verbatim}
where the list of training files has been replaced by the dumped accumulator files. This will cause the accumulated statistics to be reloaded and merged so that the model parameters can be re-estimated and the new model set output to \texttt{dir2}. The time to perform this last phase of the operation is very small; hence, the whole process will be around four times quicker than for the straightforward sequential case.
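Although the exact mechanics are installation-specific, the parallel runs and the final merge can be scripted. The following is a minimal sketch which assumes a shared filesystem and the purely hypothetical workstation names \texttt{host1} to \texttt{host4}:
\begin{verbatim}
 # run one HERest job per workstation, then merge the accumulators
 for N in 1 2 3 4 ; do
    ssh host$N "HERest -S trlist$N -I labs -H dir1/hmacs \
                       -M dir2 -p $N hmmlist" &
 done
 wait     # wait for all four background jobs to finish
 HERest -H dir1/hmacs -M dir2 -p 0 hmmlist dir2/*.acc
\end{verbatim}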
\mysect{Single-Pass Retraining}{singlepass}

In addition to re-estimating the parameters of a HMM set, \htool{HERest} also provides a mechanism for mapping a set of models trained using one parameterisation into another set based on a different parameterisation. This facility allows the front-end of a HMM-based recogniser to be modified without having to rebuild the models from scratch. This facility is known as single-pass retraining\index{single-pass retraining}. Given one set of well-trained models, a new set matching a different training data parameterisation can be generated in a single re-estimation pass. This is done by computing the forward and backward probabilities using the original models together with the original training data, but then switching to the new training data to compute the parameter estimates for the new set of models.

Single-pass retraining is enabled in \htool{HERest} by setting the \texttt{-r} switch. This causes the input training files to be read in pairs. The first of each pair is used to compute the forward/backward probabilities and the second is used to estimate the parameters for the new models. Very often, of course, data input to \HTK\ is modified by the \htool{HParm} module in accordance with parameters set in a configuration file. In single-pass retraining mode, configuration parameters can be prefixed by the pseudo-module names \texttt{HPARM1} and \texttt{HPARM2}. Then, when reading in the first file of each pair, only the \texttt{HPARM1} parameters are used, and when reading the second file of each pair, only the \texttt{HPARM2} parameters are used.\index{configuration parameters!switching}

As an example, suppose that a set of models has been trained on data with \texttt{MFCC\_E\_D} parameterisation and a new set of models using Cepstral Mean Normalisation (\texttt{\_Z}) is required. These two data parameterisations are specified in a configuration file (\texttt{config}) as two separate instances of the configuration variable \texttt{TARGETKIND}, i.e.
\begin{verbatim}
 # Single pass retraining
 HPARM1: TARGETKIND = MFCC_E_D
 HPARM2: TARGETKIND = MFCC_E_D_Z
\end{verbatim}
\htool{HERest} would then be invoked with the \texttt{-r} option set to enable single-pass retraining. For example,
\begin{verbatim}
 HERest -r -C config -S trainList -I labs -H dir1/hmacs -M dir2 hmmList
\end{verbatim}
The script file \texttt{trainList} contains a list of data file pairs. For each pair, the first file should match the parameterisation of the original model set and the second file should match that of the required new set. This will cause the model parameter estimation to be performed using the new set of training data, and a new set of models matching this data will be output to \texttt{dir2}. This process of single-pass retraining is a significantly faster route to a new set of models than training a fresh set from scratch.
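To make the pairing concrete, one possible layout of \texttt{trainList} is sketched below. The directory names \texttt{data/old} and \texttt{data/new} are purely hypothetical placeholders for wherever the original and new parameterisations of the training data are stored; it is the order of the entries that determines the pairing, the first file of each pair matching the original front-end and the second the new one.
\begin{verbatim}
 data/old/tr0001.mfc data/new/tr0001.mfc
 data/old/tr0002.mfc data/new/tr0002.mfc
 data/old/tr0003.mfc data/new/tr0003.mfc
\end{verbatim}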
\mysect{Two-model Re-Estimation}{twomodel}

Another method for initialisation of model parameters implemented in \htool{HERest} is two-model re-estimation. HMM sets often use the same basic units, such as triphones, but differ in the way the underlying HMM parameters are tied. In these cases, two-model re-estimation can be used to obtain the state-level alignment using one model set, which is then used to update the parameters of a second model set. This is helpful when the model set to be updated is less well trained.

A typical use of two-model re-estimation\index{two-model re-estimation} is the initialisation of state clustered triphone models. In the standard case, triphone models are obtained by cloning of monophone models and subsequent clustering of triphone states. However, the unclustered triphone models are considerably less powerful than state clustered triphone HMMs using mixtures of Gaussians. The consequence is poor state-level alignment, and thus poor parameter estimates, prior to clustering. This can be ameliorated by the use of well-trained \textit{alignment models} for computing the forward-backward probabilities. In the maximisation stage of the Baum-Welch algorithm, the state-level posteriors are used to re-estimate the parameters of the \textit{update model set}. Note that the corresponding models in the two sets must have the same number of states.

As an example, suppose that we would like to update a set of cloned single Gaussian monophone models in {\tt dir1/hmacs} using the well-trained state-clustered triphones in {\tt dir2/hmacs} as alignment models. Associated with each model set are the model lists {\tt hmmlist1} and {\tt hmmlist2} respectively. In order to use the second model set for alignment, a configuration file {\tt config.2model} containing
\begin{verbatim}
 # alignment model set for two-model re-estimation
 ALIGNMODELMMF = dir2/hmacs
 ALIGNHMMLIST = hmmlist2
\end{verbatim}
is necessary. \htool{HERest} only needs to be invoked using that configuration file.
\begin{verbatim}
 HERest -C config -C config.2model -S trainlist -I labs -H dir1/hmacs -M dir3 hmmlist1
\end{verbatim}
The models in directory {\tt dir1} are updated using the alignment models stored in directory {\tt dir2} and the result is written to directory {\tt dir3}. Note that {\tt trainlist} is a standard \HTK\ script and that the above command uses the capability of \htool{HERest} to accept multiple configuration files on the command line. If each HMM is stored in a separate file, the configuration variables {\tt ALIGNMODELDIR} and {\tt ALIGNMODELEXT} can be used instead, as illustrated below.

Only the state-level alignment is obtained using the alignment models. In the exceptional case that the update model set contains mixtures of Gaussians, component-level posterior probabilities are obtained from the update models themselves.
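If the alignment models are stored one per file rather than in a single MMF, a configuration along the following lines could be used instead; the extension \texttt{hmm} is merely an illustrative value:
\begin{verbatim}
 # alignment models stored as one definition per file
 ALIGNMODELDIR = dir2
 ALIGNHMMLIST = hmmlist2
 ALIGNMODELEXT = hmm
\end{verbatim}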
\mysect{Parameter Re-Estimation Formulae}{bwformulae}
\index{model training!re-estimation formulae}

For reference purposes, this section lists the various formulae employed within the \HTK\ parameter estimation tools. All are standard; however, the use of non-emitting states and multiple data streams leads to various special cases which are usually not covered fully in the literature. The following notation is used in this section
\begin{tabbing}
++ \= ++++++++ \= \kill
\> $N$ \> number of states \\
\> $S$ \> number of streams \\
\> $M_s$ \> number of mixture components in stream $s$\\
\> $T$ \> number of observations \\
\> $Q$ \> number of models in an embedded training sequence \\
\> $N_q$ \> number of states in the $q$'th model in a training sequence \\
\> $\bm{O}$ \> a sequence of observations \\
\> $\bm{o}_t$ \> the observation at time $t$, $1 \leq t \leq T $ \\
\> $\bm{o}_{st}$ \> the observation vector for stream $s$ at time $t$ \\
\> $a_{ij}$ \> the probability of a transition from state $i$ to $j$ \\
\> $c_{jsm}$ \> weight of mixture component $m$ in state $j$ stream $s$\\
\> $\bm{\mu}_{jsm}$ \> vector of means for the mixture component $m$ of state $j$ stream $s$\\
\> $\bm{\Sigma}_{jsm}$ \> covariance matrix for the mixture component $m$ of state $j$ stream $s$ \\
\> $\lambda$ \> the set of all parameters defining a HMM
\end{tabbing}

\subsection{Viterbi Training (\htool{HInit})}
\index{model training!Viterbi formulae}

In this style of model training, a set of training observations $\bm{O}^r, \;\; 1 \leq r \leq R$ is used to estimate the parameters of a single HMM by iteratively computing Viterbi alignments. When used to initialise a new HMM, the Viterbi segmentation is replaced by a uniform segmentation (i.e.\ each training observation is divided into $N$ equal segments) for the first iteration.

Apart from the first iteration on a new model, each training sequence $\bm{O}$ is segmented using a state alignment procedure which results from maximising
\[
   \phi_N(T) = \max_i \phi_i(T) a_{iN}
\]
for $1<i<N$ where
\[
   \phi_j(t) = \left[ \max_i \phi_i(t-1) a_{ij} \right] b_j(\bm{o}_t)
\]
with initial conditions given by
\[
   \phi_1(1) = 1
\]
\[
   \phi_j(1) = a_{1j} b_j(\bm{o}_1)
\]
for $1<j<N$. In this and all subsequent cases, the output probability $b_j(\cdot)$ is as defined in equations~\ref{e:cdpdf} and \ref{e:gnorm} in section~\ref{s:HMMparm}.

If $A_{ij}$ represents the total number of transitions from state $i$ to state $j$ in performing the above maximisations, then the transition probabilities can be estimated from the relative frequencies
\[
   \hat{a}_{ij} = \frac{A_{ij}}{\sum_{k=2}^{N}A_{ik}}
\]
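As a simple numeric illustration, with invented counts, suppose the alignments yield $A_{22}=300$, $A_{23}=90$ and $A_{24}=10$ transitions out of state 2. The estimates are then
\[
   \hat{a}_{22} = \frac{300}{400} = 0.75, \qquad
   \hat{a}_{23} = \frac{90}{400} = 0.225, \qquad
   \hat{a}_{24} = \frac{10}{400} = 0.025
\]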
The sequence of states which maximises $\phi_N(T)$ implies an alignment of training data observations with states. Within each state, a further alignment of observations to mixture components is made. The tool \htool{HInit} provides two mechanisms for this: for each state and each stream
\begin{enumerate}
\item use clustering to allocate each observation $\bm{o}_{st}$ to one of $M_s$ clusters, or
\item associate each observation $\bm{o}_{st}$ with the mixture component with the highest probability.
\end{enumerate}
In either case, the net result is that every observation is associated with a single unique mixture component. This association can be represented by the indicator function $\psi^r_{jsm}(t)$ which is 1 if $\bm{o}^r_{st}$ is associated with mixture component $m$ of stream $s$ of state $j$ and is zero otherwise. The means and variances are then estimated via simple averages
\newcommand{\vitsum}[2]{ \sum_{r=1}^R \sum_{t=1}^{T_r} #1 \psi^r_{js#2}(t)}
\[
   \hat{\bm{\mu}}_{jsm} = \frac{ \vitsum{}{m}\bm{o}^r_{st}}{\vitsum{}{m}}
\]
\[
   \hat{\bm{\Sigma}}_{jsm} = \frac{ \vitsum{}{m}(\bm{o}^r_{st} - \hat{\bm{\mu}}_{jsm}) (\bm{o}^r_{st} - \hat{\bm{\mu}}_{jsm})' }{\vitsum{}{m}}
\]
Finally, the mixture weights are based on the number of observations allocated to each component
\[
   \hat{c}_{jsm} = \frac{\vitsum{}{m}}{ \vitsum{\sum_{l=1}^{M_s}}{l} }
\]

\subsection{Forward/Backward Probabilities}
\index{model training!forward/backward formulae}

Baum-Welch training is similar to the Viterbi training described in the previous section except that the \textit{hard} boundary implied by the $\psi$ function is replaced by a \textit{soft} boundary function $L$ which represents the probability of an observation being associated with any given Gaussian mixture component. This \textit{occupation} probability is computed from the \textit{forward} and \textit{backward} probabilities.

For the isolated-unit style of training, the forward probability $\alpha_j(t)$ for $1<j<N$ and $1<t \leq T$ is calculated by the forward recursion
\[
   \alpha_j(t) = \left[ \sum_{i=2}^{N-1} \alpha_i(t-1) a_{ij} \right] b_j(\bm{o}_t)
\]
with initial conditions $\alpha_1(1) = 1$ and $\alpha_j(1) = a_{1j} b_j(\bm{o}_1)$ for $1<j<N$, mirroring the Viterbi recursion above with the maximisation replaced by a summation.
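For completeness, in the standard Baum-Welch formulation the occupation probability mentioned above is then obtained by combining the forward and backward probabilities as
\[
   L_j(t) = \frac{1}{P}\, \alpha_j(t)\, \beta_j(t)
\]
where $P = P(\bm{O}|\lambda)$ denotes the total likelihood of the observation sequence given the model.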