MMF is that the TMFs are considerably smaller than MMFs (especially
triphone MMFs). This section describes the format of the transform
model file in detail.

The mean transformation matrix is stored as a block diagonal
transformation matrix. The example block diagonal matrix ${\bf A}$
shown below contains three blocks. The first block represents the
transformation for only the static components of the feature vector,
while the second represents the deltas and the third the
accelerations. This block diagonal matrix example makes the
assumption that for the transformation, there is no correlation
between the statics, deltas and delta deltas. In practice this
assumption works quite well.
\[
{\bf A} \; = \; \left( \begin{array}{ccc}
  {\bf A}_s & {\bf 0} & {\bf 0} \\
  {\bf 0} & {\bf A}_\Delta & {\bf 0} \\
  {\bf 0} & {\bf 0} & {\bf A}_{\Delta^2}
\end{array} \right)
\]
This format reduces the number of transformation parameters that must
be learnt, making the adaptation process faster. It also reduces the
adaptation data required per transform when compared with the full
case. The block diagonal matrix also requires considerably less
storage than the full transform matrix. Note that for convenience a
full transformation matrix is also stored as a block diagonal matrix,
only in this case there is a single block. The variance
transformation is a diagonal matrix and as such is simply stored as a
vector.
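To put the saving in concrete terms, consider a nine-dimensional
feature vector split into three blocks, as in the TMF example shown
below in Figure~\ref{f:exampletmf}:
\[
\underbrace{9 \times 9 \; = \; 81}_{\mbox{full}}
\qquad \mbox{versus} \qquad
\underbrace{3 \times (3 \times 3) \; = \; 27}_{\mbox{block diagonal}}
\]
mean transformation parameters, a threefold reduction. The bias
offset and the diagonal variance transform have the same size in
either case.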
\noindent
Figure~\ref{f:exampletmf} shows a simple example of a TMF. In this
case the feature vector has nine dimensions, and the mean transform
has three diagonal blocks.

The TMF can be saved in ASCII or binary format. The user header is
always output in ASCII. The first two fields are speaker descriptor
fields. The next field \texttt{<MMFID>}, the MMF identifier, is
obtained from the global options macro in the MMF, while the
regression class tree identifier \texttt{<RCID>} is obtained from the
regression tree macro name in the MMF. If global adaptation is being
performed, then the \texttt{<RCID>} will contain the identifier
\texttt{global}, since a tree is unnecessary in the global case. Note
that the MMF and regression class tree identifiers are set within the
MMF using the tool \htool{HHEd}.

The final two fields are optional, but \htool{HEAdapt} outputs these
anyway for the user's convenience. They can be edited at any time (as
can all the fields if desired, although editing the \texttt{<MMFID>}
and \texttt{<RCID>} fields should be avoided). The \texttt{<CHAN>}
field should describe the adaptation data recording environment;
examples could be a particular microphone name, telephone channel or
various background noise conditions. The \texttt{<DESC>} field allows
the user to enter any other information deemed useful, for example
the speaker's dialect region.

\sideprog{exampletmf}{70}{A Simple example of a TMF}{
\hmkw{UID} djk \\
\hmkw{NAME} Dan Kershaw \\
\hmkw{MMFID} ECRL\_UK\_XWRD \\
\hmkw{RCID} global \\
\hmkw{CHAN} Standard \\
\hmkw{DESC} None \\
\hmkw{NBLOCKS} 3 \\
\hmkw{NODETHRESH} 700.0 \\
\hmkw{NODEOCC} 1 24881.8 \\
\hmkw{TRANSFORM} 1 \\
\> \hmkw{MEAN\_TR} 3 \\
\>\>\hmkw{BLOCK} 1 \\
\>\>\> \mbox{ }0.942 -0.032 -0.001 \\
\>\>\> -0.102 \mbox{ }0.922 -0.015 \\
\>\>\> -0.016 \mbox{ }0.045 \mbox{ }0.910 \\
\>\>\hmkw{BLOCK} 2 \\
\>\>\> \mbox{ }1.021 -0.032 -0.011 \\
\>\>\> -0.017 \mbox{ }1.074 -0.043 \\
\>\>\> -0.099 \mbox{ }0.091 \mbox{ }1.050 \\
\>\>\hmkw{BLOCK} 3 \\
\>\>\> \mbox{ }1.028 \mbox{ }0.032 \mbox{ }0.001 \\
\>\>\> -0.012 \mbox{ }1.014 -0.011 \\
\>\>\> -0.091 -0.043 \mbox{ }1.041 \\
\>\hmkw{BIASOFFSET} 9 \\
\>\> -0.357 \mbox{ }0.001 -0.002 \mbox{ }0.132 \mbox{ }0.072 \\
\>\>\mbox{ }0.006 \mbox{ }0.150 \mbox{ }0.138 \mbox{ }0.198 \\
\>\hmkw{VARIANCE\_TR} 9 \\
\>\> \mbox{ }0.936 \mbox{ }0.865 \mbox{ }0.848 \mbox{ }0.832 \mbox{ }0.829 \\
\>\>\mbox{ }0.786 \mbox{ }0.947 \mbox{ }0.869 \mbox{ }0.912}{}

Whenever a TMF is being used (in conjunction with an MMF), the MMF
identifier in the MMF is checked against that in the TMF. These
\textbf{must} match, since the TMF is dependent on the model set it
was constructed from. Unless the \texttt{<RCID>} field is set to
\texttt{global}, it is also checked for consistency against the
regression tree identifier in the MMF.

The rest of the TMF contains a further information header, followed
by all the transforms. The information header contains necessary
transform set information such as the number of blocks used, the node
occupation threshold used, and the node occupation counts. Each
transform has a regression class identifier number, the mean
transformation matrix ${\bf A}$, an optional bias vector ${\bf b}$ (as
in equation~\ref{e:decompmtrans}) and an optional variance
transformation diagonal matrix ${\bf H}$ (stored as a vector). The
example has both a bias offset and a variance transform.
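As an informal illustration of how the stored blocks act on a model,
the following sketch (Python, not HTK code; the function and variable
names are purely illustrative) assembles the full matrix ${\bf A}$
from its diagonal blocks and applies the mean transformation
$\hat{\bm{\mu}} = {\bf A}\bm{\mu} + {\bf b}$ to a single Gaussian
mean. The variance transform ${\bf H}$ is not applied here.
\begin{verbatim}
import numpy as np

def apply_mean_transform(blocks, bias, mean):
    """Apply a block diagonal mean transform: mu_hat = A * mu + b.

    blocks -- list of square arrays, the BLOCK entries of one transform
    bias   -- the BIASOFFSET vector b, or None if no bias is stored
    mean   -- the speaker independent mean vector mu
    """
    mean = np.asarray(mean, dtype=float)
    dim = len(mean)
    A = np.zeros((dim, dim))
    pos = 0
    for block in blocks:            # place each block on the diagonal of A
        n = block.shape[0]
        A[pos:pos + n, pos:pos + n] = block
        pos += n
    mu_hat = A @ mean
    if bias is not None:
        mu_hat = mu_hat + np.asarray(bias, dtype=float)
    return mu_hat
\end{verbatim}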
\mysect{Model Adaptation using MAP}{mapadapt}

Model adaptation can also be accomplished using a maximum a
posteriori (MAP) approach\index{adaptation!MAP}. This adaptation
process is sometimes referred to as Bayesian adaptation. MAP
adaptation involves the use of prior knowledge about the model
parameter distribution. Hence, if we know what the parameters of the
model are likely to be (before observing any adaptation data) using
the prior knowledge, we might well be able to make good use of the
limited adaptation data to obtain a decent MAP estimate. This type of
prior is often termed an informative prior. Note that if the prior
distribution indicates no preference as to what the model parameters
are likely to be (a non-informative prior), then the MAP estimate
obtained will be identical to that obtained using a maximum
likelihood approach.

For MAP adaptation purposes, the informative priors that are
generally used are the speaker independent model parameters. For
mathematical tractability conjugate priors are used, which results in
a simple adaptation formula. The update formula for a single stream
system for state $j$ and mixture component $m$ is
\hequation{
\hat{\bm{\mu}}_{jm} = \frac{ N_{jm} } { N_{jm} + \tau } \bar{\bm{\mu}}_{jm}
  + \frac{ \tau } { N_{jm} + \tau } \bm{\mu}_{jm}
}{meanmap}
where $\tau$ is a weighting of the a priori knowledge relative to the
adaptation speech data and $N_{jm}$ is the occupation likelihood of
the adaptation data, defined as
\[
N_{jm} = \liksum{jm}
\]
where $\bm{\mu}_{jm}$ is the speaker independent mean and
$\bar{\bm{\mu}}_{jm}$ is the mean of the observed adaptation data,
defined as
\[
\bar{\bm{\mu}}_{jm} = \frac{ \liksum{jm}\bm{o}^r_{t} }{ \liksum{jm} }
\]
As can be seen, if the occupation likelihood of a Gaussian component
($N_{jm}$) is small, then the mean MAP estimate will remain close to
the speaker independent component mean. With MAP adaptation, every
single mean component in the system is updated with a MAP estimate,
based on the prior mean, the weighting and the adaptation data.
Hence, MAP adaptation requires a new ``speaker-dependent'' model set
to be saved.

One obvious drawback to MAP adaptation is that it requires more
adaptation data to be effective when compared to MLLR, because MAP
adaptation is specifically defined at the component level. When
larger amounts of adaptation training data become available, MAP
begins to perform better than MLLR, due to this detailed update of
each component (rather than the pooled Gaussian transformation
approach of MLLR). In fact the two adaptation processes can be
combined to improve performance still further, by using the MLLR
transformed means as the priors for MAP adaptation (by replacing
$\bm{\mu}_{jm}$ in equation~\ref{e:meanmap} with the transformed mean
of equation~\ref{e:mtrans}). In this case components that have a low
occupation likelihood in the adaptation data (and hence would not
change much using MAP alone) are adapted using a regression class
transform in MLLR. An example usage is shown in the following
section.
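The interpolation in equation~\ref{e:meanmap} is straightforward to
sketch numerically. The following Python fragment (not HTK code; the
names are illustrative assumptions) computes the MAP mean estimate
for one Gaussian component from per-frame occupation likelihoods and
observation vectors:
\begin{verbatim}
import numpy as np

def map_mean_update(gamma, obs, mu_prior, tau):
    """MAP estimate of one Gaussian mean, as in equation meanmap.

    gamma    -- occupation likelihoods, one value per adaptation frame
    obs      -- the corresponding observation vectors (frames x dim)
    mu_prior -- the speaker independent (prior) mean
    tau      -- the prior weighting tau
    """
    gamma = np.asarray(gamma, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mu_prior = np.asarray(mu_prior, dtype=float)
    N = gamma.sum()                  # occupation likelihood N_jm
    if N == 0.0:
        return mu_prior              # no adaptation data: stay at the prior
    # observed adaptation data mean (weighted by the occupation likelihoods)
    mu_bar = (gamma[:, None] * obs).sum(axis=0) / N
    return (N / (N + tau)) * mu_bar + (tau / (N + tau)) * mu_prior
\end{verbatim}
As in the text, a small occupation likelihood leaves the estimate
near the prior mean, while a large one moves it towards the
adaptation data mean.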
\pagebreak
\mysect{Using \htool{HEAdapt}}{UsingHEAdapt}

At the outset \htool{HEAdapt} operates in a very similar fashion to
\htool{HERest}. Both use a frame/state alignment in order to
accumulate various statistics about the data. In \htool{HERest} these
statistics are used to estimate new model parameters, whilst in
\htool{HEAdapt} they are used to estimate the transformations for
each regression base class, or new model parameters. \htool{HEAdapt}
will currently only produce transforms with single stream data and
\texttt{PLAINHS} or \texttt{SHAREDHS} HMM systems (see
section~\ref{s:hmmsets} on HMM set kinds).

In outline, \htool{HEAdapt} works as follows. On startup,
\htool{HEAdapt} loads in a complete set of HMM definitions, including
the regression class tree and the base class number of each Gaussian
component. Note that \htool{HEAdapt} requires the MMF to contain a
regression class tree. Every training file must have an associated
label file which gives a transcription for that file. Only the
sequence of labels is used by \htool{HEAdapt}, and any boundary
location information is ignored. Thus, these transcriptions can be
generated automatically from the known orthography of what was said
and a pronunciation dictionary.

\centrefig{headaptrdp}{120}{File Processing in HEAdapt}

\htool{HEAdapt}\index{headapt@\htool{HEAdapt}} processes each
training file in turn. After loading it into memory, it uses the
associated transcription to construct a composite HMM which spans the
whole utterance. This composite HMM is made by concatenating
instances of the phone HMMs corresponding to each label in the
transcription. The Forward-Backward algorithm is then applied to
obtain a frame/state alignment, and the information necessary to form
the standard auxiliary function is accumulated at the Gaussian
component level. Note that this information is different from that
required in \htool{HERest} (see section~\ref{s:mllrformulae}).

When all of the training files have been processed (within the static
or incremental block), the regression base class statistics are
accumulated using the component level statistics. Next the regression
class tree is traversed and the new regression class transformations
are calculated for those regression classes containing a sufficient
occupation count at the lowest level in the tree, as described in
section~\ref{s:reg_classes}. Finally either the updated
(i.e.\ adapted) HMM set or the transformations are output. Note that
\htool{HEAdapt} produces a transform model file (TMF) containing
transforms that are estimated to \textit{transform from the input
MMF} to a new environment/speaker based on the adaptation data
presented.

%\index{forward-backward!embedded}
The mathematical details of the Forward-Backward algorithm are given
in section~\ref{s:bwformulae}, while the mathematical details for the