
adapt.tex

\[
\bar{\bm{\mu}}_{jm} = \frac{\liksum{jm}\bm{o}^r_{t}}{\liksum{jm}}
\]
As can be seen, if the occupation likelihood of a Gaussian component
($N_{jm}$) is small, then the mean MAP estimate will remain close to the
speaker independent component mean. With MAP adaptation, every single mean
component in the system is updated with a MAP estimate, based on the prior
mean, the weighting and the adaptation data. Hence, MAP adaptation requires
a new ``speaker-dependent'' model set to be saved. One obvious drawback to
MAP adaptation is that it requires more adaptation data to be effective
when compared to MLLR, because MAP adaptation is specifically defined at
the component level. When larger amounts of adaptation training data become
available, MAP begins to perform better than MLLR, due to this detailed
update of each component (rather than the pooled Gaussian transformation
approach of MLLR). In fact the two adaptation processes can be combined to
improve performance still further, by using the MLLR transformed means as
the priors for MAP adaptation (by replacing $\bm{\mu}_{jm}$ in
equation~\ref{e:meanmap} with the transformed mean of
equation~\ref{e:mtrans}). In this case components that have a low
occupation likelihood in the adaptation data (and hence would not change
much using MAP alone) have been adapted using a regression class transform
in MLLR. An example usage is shown in the following section.

%\pagebreak

\mysect{Linear Transformation Estimation Formulae}{mllrformulae}

For reference purposes, this section lists the various formulae employed
within the \HTK\ adaptation tool\index{adaptation!MLLR formulae}. It is
assumed throughout that single stream data is used and that diagonal
covariances are also used. All are standard and can be found in various
literature.

The following notation is used in this section
\begin{tabbing}
++ \= ++++++++ \= \kill
\> $\mathcal{M}$ \> the model set\\
\> $\hat{\mathcal{M}}$ \> the adapted model set\\
\> $T$ \> number of observations \\
\> $m$ \> a mixture component \\
\> $\bm{O}$ \> a sequence of $d$-dimensional observations \\
\> $\bm{o}(t)$ \> the observation at time $t$, $1 \leq t \leq T$\\
\> $\bm{\zeta}(t)$ \> extended observation at time $t$, $1 \leq t \leq T$\\
\> $\bm{\mu}_{m_r}$ \> mean vector for the mixture component $m_r$\\
\> $\bm{\xi}_{m_r}$ \> extended mean vector for the mixture component $m_r$\\
\> $\bm{\Sigma}_{m_r}$ \> covariance matrix for the mixture component $m_r$\\
\> $L_{m_r}(t)$ \> the occupancy probability for the mixture component $m_r$\\
\> \> at time $t$
\end{tabbing}
To enable robust transformations to be trained, the transform matrices are
tied across a number of Gaussians. The set of Gaussians which share a
transform is referred to as a regression class. For a particular transform
case $\bm{W}_r$, the $M_r$ Gaussian components
$\left\{m_1, m_2, \dots, m_{M_r}\right\}$ will be tied together, as
determined by the regression class tree (see section~\ref{s:reg_classes}).
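To make the notation and the tying concrete, the following sketch applies a
single shared transform $\bm{W}_r$ to the extended mean
$\bm{\xi}_{m_r} = \left[1 \;\; \bm{\mu}_{m_r}^T\right]^T$ of every
component in a regression class. This is an illustrative Python fragment,
not \HTK\ code; the array shapes and names are assumptions.
\begin{verbatim}
# Applying one tied transform W_r to every Gaussian mean in a regression
# class: mu_hat = W_r xi, with xi = [1, mu_1, ..., mu_d]^T the extended mean.
import numpy as np

def apply_mean_transform(W, means):
    """W: (d, d+1) transform [b | A]; means: (M, d) tied component means."""
    xi = np.hstack([np.ones((len(means), 1)), means])  # extended means (M, d+1)
    return xi @ W.T                                    # adapted means (M, d)

# Two components tied in one regression class share the same W.
W = np.hstack([np.array([[0.5], [-0.2]]), np.eye(2)])  # bias b, A = identity
means = np.array([[1.0, 2.0], [3.0, 4.0]])
print(apply_mean_transform(W, means))          # [[1.5 1.8], [3.5 3.8]]
\end{verbatim}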
The standard auxiliary function shown below is used to estimate the
transforms.
\newcommand{\like}{L_{m_r}(t)}
\begin{eqnarray}
{\cal Q}({\cal M},{\hat{\cal M}}) = - \frac{1}{2}\sum_{r=1}^R
\sum_{m_r=1}^{M_r}\sum_{t=1}^T\like
\left[K^{(m)}+\log(|{\hat{\bm\Sigma}}_{m_r}|)
+({\bm o}(t)-{\hat{\bm\mu}}_{m_r})^T{\hat{\bm\Sigma}}_{m_r}^{-1}
({\bm o}(t)-{\hat{\bm\mu}}_{m_r})\right] \nonumber
\end{eqnarray}
where $K^{(m)}$ subsumes all constants and $\like$, the occupation
likelihood, is defined as
\[
\like = p(q_{m_r}(t)\;|\;\mathcal{M}, \bm{O}_T)
\]
where $q_{m_r}(t)$ indicates the Gaussian component $m_r$ at time $t$, and
$\bm{O}_T = \left\{\bm{o}(1),\dots,\bm{o}(T)\right\}$ is the adaptation
data. The occupation likelihood is obtained from the forward-backward
process described in section~\ref{s:bwformulae}.

\mysubsect{Mean Transformation Matrix ({\tt MLLRMEAN})}{mtransest}

Substituting the expressions for MLLR mean adaptation
\begin{eqnarray}
\hat{\bm{\mu}}_{m_r} = \bm{W}_r\bm{\xi}_{m_r}, \:\:\:\:
\hat{\bm{\Sigma}}_{m_r} = {\bm{\Sigma}}_{m_r}
\end{eqnarray}
into the auxiliary function, and using the fact that the covariance
matrices are diagonal, yields
\begin{eqnarray}
{\cal Q}({\cal M},{\hat{\cal M}}) = K - \frac{1}{2}\sum_{r=1}^R\sum_{j=1}^d
{\left({\bm{w}}_{rj}{\bf G}^{(j)}_r{\bm{w}}^T_{rj}
- 2{\bm{w}}_{rj}{\bf k}^{(j)T}_r\right)} \nonumber
\end{eqnarray}
where ${\bm{w}}_{rj}$ is the $j^{th}$ {\em row} of $\bm{W}_r$,
\begin{eqnarray}
{\bf G}^{(i)}_r=\sum_{m_r=1}^{M_r}\frac{1}{\sigma_{m_ri}^{2}}
{\bm\xi}_{m_r}{\bm\xi}^{T}_{m_r}\sum_{t=1}^T\like
\label{eq:gi_mllr}
\end{eqnarray}
and
\begin{eqnarray}
{\bf k}^{(i)}_{r} = \sum_{m_r=1}^{M_r}\sum\limits_{t=1}^T\like
\frac{1}{\sigma^{2}_{m_ri}} o_i(t){\bm\xi}^{T}_{m_r}
\end{eqnarray}
Differentiating the auxiliary function with respect to the transform
${\bm W}_r$, and then maximising it with respect to the transformed mean,
yields the following update
\begin{eqnarray}
{{\bm{w}}}_{ri} = {\bf k}_r^{(i)}{\bf G}_r^{(i)-1} \label{eq:mllrmeansol}
\end{eqnarray}
The above expressions assume that each base regression class $r$ has a
separate transform. If regression class trees are used then the shared
transform parameters may be simply estimated by combining the statistics of
the base regression classes. The regression class tree is used to generate
the classes dynamically, so it is not known a priori which regression
classes will be used to estimate the transform. This does not present a
problem, since $\bm{G}^{(i)}$ and $\bm{k}^{(i)}$ for the chosen regression
class may be obtained from its child classes (as defined by the tree). If
the parent node $R$ has children $\left\{R_1,\dots,R_C\right\}$ then
\[
{\bf k}^{(i)} = \sum_{c=1}^{C} {\bf k}^{(i)}_{R_c}
\]
and
\[
{\bf G}^{(i)} = \sum_{c=1}^{C} {\bf G}^{(i)}_{R_c}
\]
The same approach of combining statistics from multiple children can be
applied to all the estimation formulae in this section.

\mysubsect{Variance Transformation Matrix ({\tt MLLRVAR}, {\tt MLLRCOV})}{vtransest}

Estimation of the first variance transformation matrix is only available
for diagonal covariance Gaussian systems in the current implementation,
though full transforms can in theory be estimated.
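As a minimal sketch of equations \ref{eq:gi_mllr}--\ref{eq:mllrmeansol},
the following Python fragment accumulates the per-dimension statistics
${\bf G}^{(i)}_r$ and ${\bf k}^{(i)}_r$ for a single regression class and
then solves for each row of the mean transform. It is illustrative rather
than the \HTK\ implementation; the occupation likelihoods are assumed to
come from a forward-backward pass.
\begin{verbatim}
# MLLR mean transform update for one regression class:
#   G^(i) = sum_m (1/sigma_mi^2) xi_m xi_m^T sum_t L_m(t)
#   k^(i) = sum_m sum_t L_m(t) (1/sigma_mi^2) o_i(t) xi_m^T
#   w_i   = k^(i) G^(i)^-1            (row i of W_r)
import numpy as np

def estimate_mllr_mean(means, variances, obs, gammas):
    """means, variances: (M, d) diagonal Gaussians; obs: (T, d);
    gammas: (M, T) occupation likelihoods L_m(t)."""
    M, d = means.shape
    xi = np.hstack([np.ones((M, 1)), means])   # extended means (M, d+1)
    occ = gammas.sum(axis=1)                   # sum_t L_m(t) per component
    G = np.zeros((d, d + 1, d + 1))
    k = np.zeros((d, d + 1))
    for m in range(M):
        outer = np.outer(xi[m], xi[m])
        for i in range(d):
            G[i] += occ[m] / variances[m, i] * outer
            k[i] += (gammas[m] @ obs[:, i]) / variances[m, i] * xi[m]
    # each G^(i) is symmetric, so w_i G^(i) = k^(i) gives w_i = k^(i) G^(i)^-1
    return np.stack([np.linalg.solve(G[i], k[i]) for i in range(d)])
\end{verbatim}
With a regression class tree, the {\tt G} and {\tt k} accumulators of the
chosen class would simply be summed over its child classes before solving,
exactly as in the combination formulae above.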
The Gaussian covariance is transformed using\footnote{In the current
implementation of the code this form of transform can only be estimated in
addition to the {\tt MLLRMEAN} transform}
\[
\hat{\bm{\mu}}_{m_r} = \bm{\mu}_{m_r}, \:\:\:\:
\hat{\bm{\Sigma}}_{m_r} = \bm{B}_{m_r}^T\bm{H}_r\bm{B}_{m_r}
\]
where $\bm{H}_r$ is the linear transformation to be estimated and
$\bm{B}_{m_r}$ is the inverse of the Choleski factor of
$\bm{\Sigma}_{m_r}^{-1}$, so
\[
\bm{\Sigma}_{m_r}^{-1} = \bm{C}_{m_r}\bm{C}_{m_r}^T
\]
and
\[
\bm{B}_{m_r} = \bm{C}_{m_r}^{-1}
\]
After rewriting the auxiliary function, the transform matrix $\bm{H}_r$ is
estimated from
\[
\bm{H}_r = \frac{\sum_{m_r=1}^{M_r}\bm{C}_{m_r}^T
  \left[\sum_{t=1}^T\like(\bm{o}(t) - \hat{\bm{\mu}}_{m_r})
        (\bm{o}(t) - \hat{\bm{\mu}}_{m_r})^T\right]\bm{C}_{m_r}}
  {\sum_{m_r=1}^{M_r}\sum_{t=1}^T\like}
\]
Here, $\bm{H}_r$ is forced to be a diagonal transformation by setting the
off-diagonal terms to zero, which ensures that $\hat{\bm{\Sigma}}_{m_r}$ is
also diagonal.

The alternative form of variance adaptation is supported for full, block
and diagonal transforms. Substituting the expressions for variance
adaptation
\begin{eqnarray}
\hat{\bm{\mu}}_{m_r} = \bm{\mu}_{m_r}, \:\:\:\:
\hat{\bm{\Sigma}}_{m_r} = {\bm H}_r{\bm{\Sigma}}_{m_r}{\bm H}_r^T
\end{eqnarray}
into the auxiliary function, and using the fact that the covariance
matrices are diagonal, yields
\begin{eqnarray}
{\cal Q}({\cal M},{\hat{\cal M}}) = K + \frac{1}{2}\sum_{r=1}^R
\left[\beta_r\log({\bf c}_{ri}{\bf a}_{ri}^T)
-\sum_{j=1}^d{\left({\bf a}_{rj}{\bf G}^{(j)}_r{\bf a}^T_{rj}\right)}\right]
\nonumber
\end{eqnarray}
where
\begin{eqnarray}
\beta_r &=& \sum_{m_r=1}^{M_r}\sum_{t=1}^T\like\\
{\bf A}_r &=& {\bf H}_r^{-1}
\end{eqnarray}
${\bf a}_{ri}$ is the $i^{th}$ row of ${\bf A}_r$, the $1\times n$ row
vector ${\bf c}_{ri}$ is the vector of cofactors of ${\bf A}_r$,
$c_{rij}={\mbox{cof}}({\bf A}_{rij})$, and ${\bf G}^{(i)}_r$ is defined as
\begin{eqnarray}
{\bf G}^{(i)}_r=\sum_{m_r=1}^{M_r}\frac{1}{\sigma_{m_ri}^{2}}
\sum_{t=1}^T\like(\bm{o}(t)-\hat{\bm{\mu}}_{m_r})
(\bm{o}(t)-\hat{\bm{\mu}}_{m_r})^T
\label{eq:gi_mllr2}
\end{eqnarray}
Differentiating the auxiliary function with respect to the transform
${\bm A}_r$, and then maximising it, yields the following update
\begin{eqnarray}
{\bf a}_{ri} = {\bf c}_{ri}{\bf G}^{(i)-1}_r
\sqrt{\left(\frac{\beta_r}{{\bf c}_{ri}{\bf G}_r^{(i)-1}{\bf c}^T_{ri}}\right)}
\end{eqnarray}
This is an iterative optimisation scheme, as the cofactors mean that the
estimate of row $i$ is dependent on all the other rows (in that block).
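The row-by-row nature of this optimisation can be sketched as follows. This
Python fragment is illustrative only, not the \HTK\ code: it cycles over
the rows of ${\bf A}_r$, recomputing the cofactor row from the current
estimate via the adjugate identity
$\mbox{cof}({\bf A}) = \det({\bf A})\,{\bf A}^{-T}$, and assumes the
accumulated ${\bf G}^{(i)}_r$ statistics and $\beta_r$ are given.
\begin{verbatim}
# Iterative row update for the full variance transform A = H^-1:
#   a_i = c_i G^(i)^-1 * sqrt(beta / (c_i G^(i)^-1 c_i^T))
# where c_i is the i-th row of cofactors of the current estimate of A.
import numpy as np

def update_variance_transform(G, beta, n_iter=20):
    """G: (d, d, d) per-row statistics G^(i); beta: total occupancy."""
    d = G.shape[0]
    A = np.eye(d)                                # start from the identity
    for _ in range(n_iter):
        for i in range(d):
            # cofactor row: cof(A) = det(A) * inv(A)^T
            c = (np.linalg.det(A) * np.linalg.inv(A).T)[i]
            cG = np.linalg.solve(G[i], c)        # c_i G^(i)^-1 (G symmetric)
            A[i] = cG * np.sqrt(beta / (cG @ c))
    return A
\end{verbatim}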
For the diagonal transform case it is of course non-iterative, and
simplifies to the same form as the {\tt MLLRVAR} transform.

\mysubsect{Constrained MLLR Transformation Matrix ({\tt CMLLR})}{cmllrest}

Substituting the expressions for CMLLR adaptation\footnote{For efficiency
this transformation is implemented as
\begin{eqnarray}
\hat{\bm o}_r(t) = \bm{A}_r\bm{o}(t) + \bm{b}_r = \bm{W}_r\bm{\zeta}(t)
\end{eqnarray}}
\begin{eqnarray}
\hat{\bm{\mu}}_{m_r} = \bm{H}_r\bm{\mu}_{m_r} + \tilde{\bm{b}}_r, \:\:\:\:
\hat{\bm{\Sigma}}_{m_r} = {\bm H}_r{\bm{\Sigma}}_{m_r}{\bm H}_r^T
\end{eqnarray}
into the auxiliary function, and using the fact that the covariance
matrices are diagonal, yields
\begin{eqnarray}
{\cal Q}({\cal M},{\hat{\cal M}}) = K + \frac{1}{2}\sum_{r=1}^R
\left[\beta\log({\bf p}_{ri}\bm{w}_{ri}^T)
-\sum_{j=1}^d{\left(\bm{w}_{rj}{\bf G}^{(j)}_r\bm{w}^T_{rj}
- 2\bm{w}_{rj}{\bf k}^{(j)}_r\right)}\right] \nonumber
\end{eqnarray}
where
\begin{eqnarray}
\bm{W}_r = \left[\begin{array}{c c}
-\bm{A}_r\tilde{\bm{b}}_r & \bm{H}_r^{-1}
\end{array}\right]
= \left[\begin{array}{c c} \bm{b} & \bm{A} \end{array}\right]
\end{eqnarray}
$\bm{w}_{ri}$ is the $i^{th}$ row of $\bm{W}_r$, the $1\times n$ row vector
${\bf p}_{ri}$ is the zero-extended vector of cofactors of ${\bf A}_r$, and
${\bf G}^{(i)}_r$ and ${\bf k}^{(i)}_r$ are defined as
\begin{eqnarray}
{\bf G}^{(i)}_r=\sum_{m_r=1}^{M_r}\frac{1}{\sigma_{m_ri}^{2}}
\sum_{t=1}^T\like{\bm\zeta}(t){\bm\zeta}^{T}(t)
\label{eq:gi_mllr3}
\end{eqnarray}
and
\begin{eqnarray}
{\bf k}^{(i)}_r=\sum_{m_r=1}^{M_r}\frac{\mu_{m_ri}}{\sigma_{m_ri}^{2}}
\sum_{t=1}^T\like{\bm\zeta}^{T}(t)
\label{eq:gi_mllr4}
\end{eqnarray}
Differentiating the auxiliary function with respect to the transform
$\bm{W}_r$, and then maximising it, yields the following update
\begin{eqnarray}
\bm{w}_{ri} = \left(\alpha{{\bf p}_{ri}} + {\bf k}^{(i)}_r\right)
{\bf G}^{(i)-1}_r
\end{eqnarray}
where $\alpha$ satisfies
\begin{eqnarray}
\alpha^2{\bf p}_{ri}{\bf G}^{(i)-1}_r{\bf p}_{ri}^T
+ \alpha{\bf p}_{ri}{\bf G}^{(i)-1}_r{\bf k}^{(i)T}_r - \beta = 0
\label{eq:alpha_quad}
\end{eqnarray}
There are thus two possible solutions for $\alpha$. The solution that
yields the maximum increase in the auxiliary function (obtained by simply
substituting in the two options) is used. This is an iterative optimisation
scheme, as the cofactors mean that the estimate of row $i$ is dependent on
all the other rows (in that block).
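The root selection in equation~\ref{eq:alpha_quad} can be sketched as
follows. This Python fragment is again illustrative rather than the \HTK\
implementation: it solves the quadratic for $\alpha$, forms both candidate
rows, and keeps the one giving the larger per-row contribution to the
auxiliary function; the statistics ${\bf G}^{(i)}_r$, ${\bf k}^{(i)}_r$,
${\bf p}_{ri}$ and $\beta$ are assumed given.
\begin{verbatim}
# CMLLR row update: w_i = (alpha p_i + k_i) G^(i)^-1, where alpha solves
#   alpha^2 (p G^-1 p^T) + alpha (p G^-1 k^T) - beta = 0
import numpy as np

def cmllr_row_update(G_i, k_i, p_i, beta):
    Ginv_p = np.linalg.solve(G_i, p_i)       # G^(i)^-1 p_i^T (G symmetric)
    a = p_i @ Ginv_p                         # quadratic coefficient
    b = p_i @ np.linalg.solve(G_i, k_i)      # linear coefficient
    alphas = np.roots([a, b, -beta]).real    # the two solutions for alpha

    def aux(w):                              # row's auxiliary contribution
        pw = p_i @ w
        if pw <= 0.0:
            return -np.inf                   # log undefined for this root
        return beta * np.log(pw) - w @ G_i @ w + 2.0 * (w @ k_i)

    cands = [np.linalg.solve(G_i, a_ * p_i + k_i) for a_ in alphas]
    return max(cands, key=aux)               # root maximising the auxiliary
\end{verbatim}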
%%% Local Variables:
%%% mode: plain-tex
%%% TeX-master: "htkbook"
%%% End: