%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */

\mychap{HMM Adaptation}{Adapt}

\sidepic{headapt}{80}{
Chapter~\ref{c:Training} described how the parameters are estimated
for plain continuous density HMMs within \HTK, primarily using the
embedded training tool \htool{HERest}. Using the training strategy
depicted in figure~\ref{f:subword}, together with other techniques,
can produce high performance speaker independent acoustic models for
a large vocabulary recognition system. However, it is possible to
build improved acoustic models by tailoring a model set to a specific
speaker. By collecting data from a speaker and training a model set
on this speaker's data alone, the speaker's characteristics can be
modelled more accurately.
Such systems are commonly known as \textit{speaker dependent}
systems, and on a typical word recognition task, may have half the
errors of a speaker independent system. The drawback of speaker
dependent systems is that a large amount of data (typically hours)
must be collected in order to obtain sufficient model accuracy.}

Rather than training speaker dependent models, \textit{adaptation}
techniques can be applied. In this case, by using only a small amount
of data from a new speaker, a good speaker independent system model
set can be adapted to better fit the characteristics of this new
speaker.

Speaker adaptation techniques can be used in various different
modes\index{adaptation!adaptation modes}. If the true transcription
of the adaptation data is known then it is termed \textit{supervised
adaptation}\index{adaptation!supervised adaptation}, whereas if the
adaptation data is unlabelled then it is termed \textit{unsupervised
adaptation}\index{adaptation!unsupervised adaptation}.
In the case where all the adaptation data is available in one block,
e.g.\ from a speaker enrollment session, this is termed
\textit{static adaptation}. Alternatively, adaptation can proceed
incrementally as adaptation data becomes available, and this is
termed \textit{incremental adaptation}.
% \htool{HVite} can provide unsupervised incremental adaptation.

\HTK\ provides two tools to adapt continuous density HMMs.
\htool{HERest}\index{headapt@\htool{HERest}} performs offline
supervised adaptation using various forms of linear transformation
and/or maximum a-posteriori (MAP) adaptation, while unsupervised
adaptation is supported by \htool{HVite} (using only linear
transformations). In this case \htool{HVite} not only performs
recognition, but simultaneously adapts the model set as the data
becomes available through recognition.
Currently, linear transformation adaptation can be applied in both
incremental and static modes, while MAP supports only static
adaptation.

This chapter describes the operation of supervised adaptation with
the \htool{HERest} tool. The first sections of the chapter give an
overview of linear transformation schemes and MAP adaptation, and
this is followed by a section describing the general usage of
\htool{HERest} to build simple and more complex adapted systems. The
chapter concludes with a section detailing the various formulae used
by the adaptation tool. The use of \htool{HVite} to perform
unsupervised adaptation is discussed in
section~\ref{s:unsup_adapt}.

\mysect{Model Adaptation using Linear Transformations}{mllr}

\mysubsect{Linear Transformations}{whatismllr}

This section briefly discusses the forms of transform available. Note
that this form of adaptation is only available with diagonal
continuous density HMMs. The transformation matrices are all obtained
by solving a maximisation problem using the
\textit{Expectation-Maximisation} (EM) technique. Using EM results in
the maximisation of a standard \textit{auxiliary function}. (Full
details are available in section~\ref{s:mllrformulae}.)

\mysubsub{Maximum Likelihood Linear Regression ({\tt MLLRMEAN})}{mumllr}

Maximum likelihood linear regression or MLLR\index{adaptation!MLLR}
computes a set of transformations that will reduce the mismatch
between an initial model set and the adaptation data\footnote{MLLR
can also be used to perform environmental compensation by reducing
the mismatch due to channel or additive noise effects.}. More
specifically, MLLR is a model adaptation technique that estimates a
set of linear transformations for the mean and variance parameters of
a Gaussian mixture HMM system.
%The set of transformations is estimated so as to maximise the
%likelihood of the adaptation data.
The effect of these transformations is to shift the component means
and alter the variances in the initial system so that each state in
the HMM system is more likely to generate the adaptation data.

The transformation matrix used to give a new estimate of the adapted
mean is given by
\hequation{
        \hat{\bm{\mu}} = \bm{W}\bm{\xi},
}{mtrans}
where $\bm{W}$ is the $n \times \left( n + 1 \right)$ transformation
matrix (where $n$ is the dimensionality of the data) and $\bm{\xi}$
is the extended mean vector,
\[
        \bm{\xi} = \left[\mbox{ }w\mbox{ }\mu_1\mbox{ }\mu_2\mbox{ }\dots\mbox{ }\mu_n\mbox{ }\right]^T
\]
where $w$ represents a bias offset whose value is fixed (within
\HTK) at 1.\\
Hence $\bm{W}$ can be decomposed into
\hequation{
        \bm{W} = \left[\mbox{ }\bm{b}\mbox{ }\bm{A}\mbox{ }\right]
}{decompmtrans}
where $\bm{A}$ represents an $n \times n$ transformation matrix and
$\bm{b}$ represents a bias vector. This form of transform is referred
to in the code as {\tt MLLRMEAN}.

\mysubsub{Variance MLLR ({\tt MLLRVAR} and {\tt MLLRCOV})}{vmllr}

There are two standard forms of linear adaptation of the variances.
The first is of the form
\[
        \hat{\bm{\Sigma}}_{m} = \bm{B}_m^T\bm{H}_m\bm{B}_m
\]
where $\bm{H}_m$ is the linear transformation to be estimated and
$\bm{B}_m$ is the inverse of the Choleski factor of
$\bm{\Sigma}_{m}^{-1}$, so
\[
        \bm{\Sigma}_{m}^{-1} = \bm{C}_m\bm{C}_m^T
\]
and
\[
        \bm{B}_m = \bm{C}_m^{-1}
\]
This form of transform results in an effective full covariance matrix
if the transform matrix $\bm{H}_m$ is full. This makes likelihood
calculations highly inefficient. This form of transform is only
available with a diagonal transform and in conjunction with
estimating an MLLR transform.
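As a concrete illustration of the mean transform
$\hat{\bm{\mu}} = \bm{W}\bm{\xi}$ and the first variance form above,
the following minimal numerical sketch (hypothetical values;
Python/numpy, not HTK code) applies both transforms to a single
Gaussian component:

```python
import numpy as np

n = 3  # feature dimensionality (hypothetical value)

# --- MLLR mean transform: mu_hat = W xi, with W = [ b  A ] ---
A = np.array([[1.1, 0.0, 0.2],
              [0.0, 0.9, 0.0],
              [0.1, 0.0, 1.0]])       # n x n part of the transform
b = np.array([0.5, -0.2, 0.1])        # bias vector
W = np.hstack([b[:, None], A])        # n x (n+1) transform, W = [b A]

mu = np.array([1.0, 2.0, 3.0])        # original component mean
xi = np.concatenate([[1.0], mu])      # extended mean vector, bias w = 1
mu_hat = W @ xi                       # adapted mean

# The decomposed form gives the same result: mu_hat = A mu + b
assert np.allclose(mu_hat, A @ mu + b)

# --- First variance form: Sigma_hat = B^T H B, with B the inverse
# --- Choleski factor of Sigma^{-1}
Sigma = np.diag([0.5, 1.0, 2.0])      # diagonal component covariance
C = np.linalg.cholesky(np.linalg.inv(Sigma))  # Sigma^{-1} = C C^T
B = np.linalg.inv(C)
H = np.eye(n)                         # an identity H leaves Sigma unchanged
Sigma_hat = B.T @ H @ B
assert np.allclose(Sigma_hat, Sigma)
```

With a full $\bm{H}_m$, $\hat{\bm{\Sigma}}_m$ becomes a full
covariance matrix, which is why this form is expensive at
recognition time.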
The MLLR transform is used as a parent transform for estimating
$\bm{H}_m$. This form of transform is referred to in the code as
{\tt MLLRVAR}.

An alternative, more efficient, form of variance transformation is
also available. Here, the transformation of the covariance matrix is
of the form
\hequation{
        \hat{\bm{\Sigma}} = \bm{H}\bm{\Sigma}\bm{H},
}{covtrans}
where $\bm{H}$ is the $n\times n$ covariance transformation matrix.
This form of transformation, referred to in the code as
{\tt MLLRCOV}, can be efficiently implemented as a transformation of
the means and the features,
\hequation{
        {\cal N}(\bm{o};\bm{\mu},\bm{H}\bm{\Sigma}\bm{H}) =
        \frac{1}{|\bm{H}|}{\cal N}(\bm{H}^{-1}\bm{o};\bm{H}^{-1}\bm{\mu},\bm{\Sigma}) =
        {|\bm{A}|}{\cal N}(\bm{A}\bm{o};\bm{A}\bm{\mu},\bm{\Sigma})
}{covlike}
where $\bm{A}=\bm{H}^{-1}$. Using this form it is possible to
estimate and efficiently apply full transformations. {\tt MLLRCOV}
transformations are normally estimated using {\tt MLLRMEAN}
transformations as the parent transform.

\mysubsub{Constrained MLLR ({\tt CMLLR})}{cmllr}

Constrained maximum likelihood linear regression or
CMLLR\index{adaptation!CMLLR} computes a set of transformations that
will reduce the mismatch between an initial model set and the
adaptation data\footnote{CMLLR can also be used to perform
environmental compensation by reducing the mismatch due to channel or
additive noise effects.}. More specifically, CMLLR is a feature
adaptation technique that estimates a set of linear transformations
for the features.
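The efficiency of {\tt MLLRCOV} rests on the feature-space
equivalence above. The following minimal sketch (hypothetical values;
Python/numpy, not HTK code) evaluates both sides of the identity for
a single observation, noting that the Jacobian of the feature map
$\bm{o} \mapsto \bm{A}\bm{o}$ enters the density to the first power:

```python
import numpy as np

def gauss_pdf(o, mu, Sigma):
    """Density of N(o; mu, Sigma) for a full covariance Sigma."""
    n = len(mu)
    d = o - mu
    quad = d @ np.linalg.solve(Sigma, d)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.3, -0.7])
Sigma = np.diag([1.5, 0.8])           # diagonal model covariance
H = np.array([[1.2, 0.3],
              [0.3, 0.9]])            # symmetric covariance transform
A = np.linalg.inv(H)
o = np.array([0.5, 0.2])              # a feature vector

# Model-space evaluation with the transformed (full) covariance ...
lhs = gauss_pdf(o, mu, H @ Sigma @ H)
# ... equals a feature-space evaluation against the original diagonal
# covariance, scaled by the Jacobian |det A| of the map o -> A o.
rhs = np.abs(np.linalg.det(A)) * gauss_pdf(A @ o, A @ mu, Sigma)
assert np.isclose(lhs, rhs)
```

The right-hand side only ever evaluates a diagonal-covariance
Gaussian, which is what makes the full transform cheap to apply at
recognition time.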
The effect of these transformations is to shift the feature vector in
the initial system so that each state in the HMM system is more
likely to generate the adaptation data. Note that for computational
reasons, CMLLR is only implemented within \HTK\ for diagonal
covariance, continuous density HMMs.

The transformation matrix used to give a new estimate of the adapted
observation is given by
\hequation{
        \hat{\bm{o}} = \bm{W}\bm{\zeta},
}{mtrans2}
where $\bm{W}$ is the $n \times \left( n + 1 \right)$ transformation
matrix (where $n$ is the dimensionality of the data) and $\bm{\zeta}$
is the extended observation vector,
\[
        \bm{\zeta} = \left[\mbox{ }w\mbox{ }o_1\mbox{ }o_2\mbox{ }\dots\mbox{ }o_n\mbox{ }\right]^T
\]
where $w$ represents a bias offset whose value is fixed (within
\HTK) at 1.\\
Hence $\bm{W}$ can be decomposed into
\hequation{
        \bm{W} = \left[\mbox{ }\bm{b}\mbox{ }\bm{A}\mbox{ }\right]
}{decompmtrans2}
where $\bm{A}$ represents an $n \times n$ transformation matrix and
$\bm{b}$ represents a bias vector. This form of transform is referred
to in the code as {\tt CMLLR}.

\mysubsect{Input/Output/Parent Transformations}{whattransform}

There are three types of linear transform that may be used with the
HTK tools.
\begin{itemize}
\item {\it Input transform}: the input transform is used to determine
the forward-backward probabilities, hence the component posteriors,
for estimating model and transform parameters. MLLR transforms can be
iteratively estimated by refining the posteriors using a newly
estimated transform.
\item {\it Output transform}: the output transform is the transform
that is generated. The form of the transform is specified using the
appropriate configuration options.
\item {\it Parent transform}: the parent transform determines the
model, or features, on which the model set or transform is to be
generated. For transform estimation this allows {\em cascades} of
transforms to be used to adapt the model parameters.
For model estimation this supports {\em speaker adaptive
training}. Note that the current implementation only supports
adaptive training with CMLLR. Any parent transform can be used when
generating transforms.
\end{itemize}
There is no difference in the storage of the transform parameters
depending on whether it is to be a parent transform or an input
transform. There are also no restrictions on the base classes, or
regression classes, that are used for each transform.

\mysubsect{Base Class Definitions}{base_classes}

The first requirement to allow adaptation is to specify the set of
components that share the same transform. This is achieved using a
baseclass. The baseclass definition files use the same syntax for
defining components as the \htool{HHEd} command. However, for
baseclass definitions the components must always be specified.
\begin{figure}[htbp]
\begin{verbatim}
  ~b "global"
  <MMFIDMASK> CUED_WSJ*
  <PARAMETERS> MIXBASE
  <NUMCLASSES> 1
    <CLASS> 1  {*.state[2-4].mix[1-12]}
\end{verbatim}
\caption{Global base class definition}
\label{fig:globbase}
\end{figure}
The simplest form of transform uses a global transformation for all
components. Figure~\ref{fig:globbase} shows a global transformation
for a system where there are up to 3 emitting states and up to 12
Gaussian components per state.
\begin{figure}[htbp]
\begin{verbatim}
  ~b "baseclass_4.base"
  <MMFIDMASK> CUED_WSJ*
  <PARAMETERS> MIXBASE
  <NUMCLASSES> 4
    <CLASS> 1  {(one,sil).state[2-4].mix[1-12]}
    <CLASS> 2  {two.state[2-4].mix[1-12]}
    <CLASS> 3  {three.state[2-4].mix[1-12]}
    <CLASS> 4  {four.state[2-4].mix[1-12]}
\end{verbatim}
\caption{Four base classes definition}
\end{figure}
These baseclasses may be directly used to determine which components
share a particular transform.
However, a more general approach is to use a regression class tree.

\mysubsect{Regression Class Trees}{reg_classes}
\index{adaptation!regression tree}

To improve the flexibility of the adaptation process it is possible
to determine the appropriate set of baseclasses depending on the
amount of adaptation data that is available. If a small amount of
data is available then a \textit{global} adaptation
transform\index{adaptation!global transforms} can be generated. A
global transform (as its name suggests) is applied to every Gaussian
component in the model set. However, as more adaptation data becomes
available, improved adaptation is possible by increasing the number
of transformations. Each transformation is now more specific and
applied to certain groupings of Gaussian components. For instance,
the Gaussian components could be grouped into the broad phone
classes: silence, vowels, stops, glides, nasals, fricatives, etc. The
adaptation data could now be used to construct more specific broad
