%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **    This banner notice must not be removed     **      */
%/*                                                             */
%/* ----------------------------------------------------------- */

\mychap{HMM Adaptation}{Adapt}

\sidepic{headapt}{80}{
Chapter~\ref{c:Training} described how the parameters are estimated
for plain continuous density HMMs within \HTK, primarily using the
embedded training tool \htool{HERest}. Using the training strategy
depicted in figure~\ref{f:subword}, together with other techniques,
one can produce high performance speaker independent acoustic models
for a large vocabulary recognition system. However, it is possible to
build improved acoustic models by tailoring a model set to a specific
speaker. By collecting data from a speaker and training a model set on
this speaker's data alone, the speaker's characteristics can be
modelled more accurately. Such systems are commonly known as
\textit{speaker dependent} systems, and on a typical word recognition
task, may have half the errors of a speaker independent system. The
drawback of speaker dependent systems is that a large amount of data
(typically hours) must be collected in order to obtain sufficient
model accuracy.}

Rather than training speaker dependent models, \textit{adaptation}
techniques can be applied. In this case, by using only a small amount
of data from a new speaker, a good speaker independent system model
set can be adapted to better fit the characteristics of this new
speaker.

Speaker adaptation techniques can be used in various different
modes\index{adaptation!adaptation modes}.
If the true transcription of the adaptation data is known then it is
termed \textit{supervised adaptation}\index{adaptation!supervised
adaptation}, whereas if the adaptation data is unlabelled then it is
termed \textit{unsupervised adaptation}\index{adaptation!unsupervised
adaptation}. In the case where all the adaptation data is available in
one block, e.g.\ from a speaker enrollment session, it is termed
\textit{static adaptation}. Alternatively, adaptation can proceed
incrementally as adaptation data becomes available, and this is termed
\textit{incremental adaptation}.
% \htool{HVite} can provide unsupervised incremental adaptation.

\HTK\ provides two tools to adapt continuous density HMMs.
\htool{HEAdapt}\index{headapt@\htool{HEAdapt}} performs offline
supervised adaptation using maximum likelihood linear regression
(MLLR) and/or maximum a-posteriori (MAP) adaptation, while
unsupervised adaptation is supported by \htool{HVite} (using only
MLLR). In this case \htool{HVite} not only performs recognition, but
simultaneously adapts the model set as the data becomes available
through recognition. Currently, MLLR adaptation can be applied in both
incremental and static modes, while MAP supports only static
adaptation. If MLLR and MAP adaptation are to be performed
simultaneously using \htool{HEAdapt} in the same pass, the restriction
is that the entire adaptation must be performed
statically\footnote{By using two passes, one could perform incremental
MLLR in the first pass (saving the new model or transform), followed
by a second pass, this time using MAP adaptation.}.

This chapter describes the supervised adaptation tool
\htool{HEAdapt}. The first sections of the chapter give an overview of
MLLR and MAP adaptation, and this is followed by a section describing
the general usage of \htool{HEAdapt} to build simple and more complex
adapted systems.
The chapter concludes with a section detailing the various formulae
used by the adaptation tool. The use of \htool{HVite} to perform
unsupervised adaptation is discussed in
section~\ref{s:unsup_adapt}.

\mysect{Model Adaptation using MLLR}{mllr}

\mysubsect{Maximum Likelihood Linear Regression}{whatismllr}

Maximum likelihood linear regression or MLLR\index{adaptation!MLLR}
computes a set of transformations that will reduce the mismatch
between an initial model set and the adaptation data\footnote{MLLR can
also be used to perform environmental compensation by reducing the
mismatch due to channel or additive noise effects.}. More
specifically, MLLR is a model adaptation technique that estimates a
set of linear transformations for the mean and variance parameters of
a Gaussian mixture HMM system.
%The set of
%transformations are estimated so as to maximise the likelihood of the
%adaptation data.
The effect of these transformations is to shift the component means
and alter the variances in the initial system so that each state in
the HMM system is more likely to generate the adaptation data. Note
that for computational reasons, MLLR is only implemented within \HTK\
for diagonal covariance, single stream, continuous density HMMs.

The transformation matrix used to give a new estimate of the adapted
mean is given by
\hequation{
\hat{\bm{\mu}} = \bm{W}\bm{\xi},
}{mtrans}
where $\bm{W}$ is the $n \times \left( n + 1 \right)$ transformation
matrix (where $n$ is the dimensionality of the data) and $\bm{\xi}$ is
the extended mean vector,
\[
\bm{\xi} = \left[\mbox{ }w\mbox{ }\mu_1\mbox{ }\mu_2\mbox{ }\dots\mbox{ }\mu_n\mbox{ }\right]^T
\]
where $w$ represents a bias offset whose value is fixed (within \HTK)
at 1.\\
Hence $\bm{W}$ can be decomposed into
\hequation{
\bm{W} = \left[\mbox{ }\bm{b}\mbox{ }\bm{A}\mbox{ }\right]
}{decompmtrans}
where $\bm{A}$ represents an $n \times n$ transformation matrix and
$\bm{b}$ represents a bias vector. The transformation matrix $\bm{W}$
is obtained by solving
a maximisation problem using the \textit{Expectation-Maximisation}
(EM) technique. This technique is also used to compute the variance
transformation matrix. Using EM results in the maximisation of a
standard \textit{auxiliary function}. (Full details are available in
section~\ref{s:mllrformulae}.)

\mysubsect{MLLR and Regression Classes}{reg_classes}
\index{adaptation!regression tree}

This adaptation method can be applied in a very flexible manner,
depending on the amount of adaptation data that is available. If a
small amount of data is available then a \textit{global} adaptation
transform\index{adaptation!global transforms} can be generated. A
global transform (as its name suggests) is applied to every Gaussian
component in the model set. However, as more adaptation data becomes
available, improved adaptation is possible by increasing the number of
transformations. Each transformation is now more specific and applied
to certain groupings of Gaussian components. For instance, the
Gaussian components could be grouped into the broad phone classes:
silence, vowels, stops, glides, nasals, fricatives, etc. The
adaptation data could now be used to construct more specific broad
class transforms to apply to these groupings.

Rather than specifying static component groupings or classes, a robust
and dynamic method is used for the construction of further
transformations as more adaptation data becomes available. MLLR makes
use of a \textit{regression class tree} to group the Gaussians in the
model set, so that the set of transformations to be estimated can be
chosen according to the amount and type of adaptation data that is
available. The tying of each transformation across a number of mixture
components makes it possible to adapt distributions for which there
were no observations at all.
With this process all models can be adapted and the adaptation process
is dynamically refined when more adaptation data becomes available.\\
The regression class tree is constructed so as to cluster together
components that are close in acoustic space, so that similar
components can be transformed in a similar way. Note that the tree is
built using the original speaker independent model set, and is thus
independent of any new speaker. The tree is constructed with a
centroid splitting algorithm, which uses a Euclidean distance
measure. For more details see section~\ref{s:hhedregtree}. The
terminal nodes or leaves of the tree specify the final component
groupings, and are termed the \textit{base (regression) classes}. Each
Gaussian component of a model set belongs to one particular base
class. The tool \htool{HHEd} can be used to build a binary regression
class tree, and to label each component with a base class number. Both
the tree and component base class numbers are saved automatically as
part of the MMF. Please refer to section~\ref{s:regtreemods} and
section~\ref{s:hhedregtree} for further details.

\sidefig{regtree1}{55}{A binary regression tree}{4}{}

Figure~\ref{f:regtree1} shows a simple example of a binary regression
tree with four base classes, denoted as $\{C_4, C_5, C_6, C_7\}$.
During ``dynamic'' adaptation, the occupation counts are accumulated
for each of the regression base classes. The diagram shows a solid
arrow and circle (or node), indicating that there is sufficient data
for a transformation matrix to be generated using the data associated
with that class. A dotted line and circle indicates that there is
insufficient data. For example, neither node 6 nor node 7 has
sufficient data; however, when pooled at node 3, there is sufficient
adaptation data. The amount of data that is deemed sufficient is set
by the user as a command-line option to \htool{HEAdapt} (see reference
section~\ref{s:HEAdapt}).

\htool{HEAdapt} uses a top-down approach to traverse the regression
class tree.
Here the search starts at the root node and progresses down the tree,
generating transforms only for those nodes which
\begin{enumerate}
\item have sufficient data \textbf{and}
\item are either terminal nodes (i.e.\ base classes) \textbf{or} have
any children without sufficient data.
\end{enumerate}
In the example shown in figure~\ref{f:regtree1}, transforms are
constructed only for regression nodes 2, 3 and 4, which can be denoted
as ${\bf W}_2$, ${\bf W}_3$ and ${\bf W}_4$. Hence when the
transformed model set is required, the transformation matrices (mean
and variance) are applied in the following fashion to the Gaussian
components in each base class:
\[ \left\{ \begin{array}{ccl}
{\bf W}_2 & \rightarrow & \left\{C_5\right\} \\
{\bf W}_3 & \rightarrow & \left\{C_6, C_7\right\} \\
{\bf W}_4 & \rightarrow & \left\{C_4\right\}
\end{array} \right\}
\]
At this point it is interesting to note that the global adaptation
case is the same as a tree with just a root node, and is in fact
treated as such.

\mysubsect{Transform Model File Format}{tmfs}

\htool{HEAdapt} estimates the required transformation statistics and
can either output a transformed MMF or a transform model file
(TMF)\index{adaptation!transform model file}. The advantage in storing
the transforms as opposed to an adapted