% adapt.tex

class transforms to apply to these groupings.

Rather than specifying static component groupings or classes, a robust
and dynamic method is used for the construction of further transformations
as more adaptation data becomes available. MLLR makes use of a
\textit{regression class tree} to group the Gaussians in the model set, so
that the set of transformations to be estimated can be chosen according to
the amount and type of adaptation data that is available. The tying of each
transformation across a number of mixture components makes it possible to
adapt distributions for which there were no observations at all. With this
process all models can be adapted and the adaptation process is dynamically
refined when more adaptation data becomes available.\\

The regression class tree is constructed so as to cluster together
components that are close in acoustic space, so that similar components can
be transformed in a similar way. Note that the tree is built using the
original speaker independent model set, and is thus independent of any new
speaker. The tree is constructed with a centroid splitting algorithm, which
uses a Euclidean distance measure. For more details see
section~\ref{s:hhedregtree}. The terminal nodes or leaves of the tree
specify the final component groupings, and are termed the \textit{base
(regression) classes}. Each Gaussian component of a model set belongs to
one particular base class. The tool \htool{HHEd} can be used to build a
binary regression class tree, and to label each component with a base class
number. Both the tree and component base class numbers can be saved as part
of the MMF, or simply stored separately. Please refer to
section~\ref{s:hhedregtree} for further details.

\sidefig{regtree1}{55}{A binary regression tree}{4}{}

Figure~\ref{f:regtree1} shows a simple example of a binary regression tree
with four base classes, denoted as $\{C_4, C_5, C_6, C_7\}$. During
``dynamic'' adaptation, the occupation counts are accumulated for each of
the regression base classes. The diagram shows a solid arrow and circle (or
node), indicating that there is sufficient data for a transformation matrix
to be generated using the data associated with that class. A dotted line
and circle indicates that there is insufficient data. For example, neither
node 6 nor node 7 has sufficient data; however, when pooled at node 3,
there is sufficient adaptation data. The amount of data that is deemed
sufficient is set as a configuration option for \htool{HERest} (see
reference section~\ref{s:HERest}).

\htool{HERest} uses a top-down approach to traverse the regression class
tree. Here the search starts at the root node and progresses down the tree,
generating transforms only for those nodes which
\begin{enumerate}
\item have sufficient data \textbf{and}
\item are either terminal nodes (i.e.\ base classes) \textbf{or} have
any children without sufficient data.
\end{enumerate}
In the example shown in figure~\ref{f:regtree1}, transforms are constructed
only for regression nodes 2, 3 and 4, which can be denoted as
${\bf W}_2$, ${\bf W}_3$ and ${\bf W}_4$.
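To make the node-selection rule concrete, the following Python sketch
(illustrative only, not HTK code; the tree layout, occupation counts and
threshold are invented for this example) applies the two conditions above
to the tree of figure~\ref{f:regtree1}:
\begin{verbatim}
# Illustrative sketch only (not HTK code): select the regression-tree
# nodes that would receive a transform, following the two rules above.
# The tree, occupation counts and threshold are invented examples.

def transform_nodes(root, children, occ, threshold):
    """Return the ids of the nodes for which a transform is generated."""
    selected = []

    def sufficient(n):
        return occ[n] >= threshold

    def visit(n):
        if not sufficient(n):
            return                 # too little data at this node
        kids = children.get(n, [])
        if not kids or any(not sufficient(k) for k in kids):
            selected.append(n)     # terminal node, or a child lacks data
        for k in kids:
            visit(k)               # children with enough data may refine further

    visit(root)
    return selected

# Tree of the figure: 1 -> (2,3), 2 -> (4,5), 3 -> (6,7); 4..7 are base classes.
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
occ = {1: 1000, 2: 600, 3: 400, 4: 500, 5: 100, 6: 200, 7: 200}
print(sorted(transform_nodes(1, children, occ, threshold=300)))   # -> [2, 3, 4]
\end{verbatim}
With these hypothetical counts nodes 5, 6 and 7 fall below the threshold,
so transforms are generated at nodes 2, 3 and 4, exactly as in the figure.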
Hence when the transformed model set is required, the transformation
matrices (mean and variance) are applied in the following fashion to the
Gaussian components in each base class:
\[
        \left\{
        \begin{array}{ccl}
                {\bf W}_2 & \rightarrow & \left\{C_5\right\} \\
                {\bf W}_3 & \rightarrow & \left\{C_6, C_7\right\} \\
                {\bf W}_4 & \rightarrow & \left\{C_4\right\}
        \end{array}
        \right\}
\]
At this point it is interesting to note that the global adaptation case is
the same as a tree with just a root node, and is in fact treated as such.

\begin{center}
\begin{figure}
\begin{verbatim}
    ~r "regtree_4.tree"
    <BASECLASS>~b "baseclass_4.base"
    <NODE> 1 2 2 3
    <NODE> 2 2 4 5
    <NODE> 3 2 6 7
    <TNODE> 4 1 1
    <TNODE> 5 1 2
    <TNODE> 6 1 3
    <TNODE> 7 1 4
\end{verbatim}
\caption{Regression class tree example}
\label{fig:regtree}
\end{figure}
\end{center}
An example of a regression class tree is shown in
figure~\ref{fig:regtree}. This uses the four base classes from the base
class macro ``baseclass\_4.base''. Since a binary regression tree is shown,
there are 4 terminal nodes.

\mysubsect{Linear Transform Format}{tmfs}
\htool{HERest} estimates the required transformation statistics and can
either output a set of transformation models, or a single transform model
file (TMF)\index{adaptation!transform model file}. The advantage of storing
the transforms as opposed to an adapted MMF is that the TMFs are
considerably smaller than MMFs (especially triphone MMFs). This section
describes the format in which the transforms are stored.

\noindent
\begin{figure}[htbp]
\begin{verbatim}
    ~a "cued"
    <ADAPTKIND> CLASS
    <BASECLASSES> ~b "global"
    <XFORMSET>
      <XFORMKIND> CMLLR
      <NUMXFORMS> 1
      <LINXFORM> 1 <VECSIZE> 5
        <OFFSET>
          <BIAS> 5
            -0.357 0.001 -0.002 0.132 0.072
        <LOGDET> ????
        <BLOCKINFO> 2 3 2
        <BLOCK> 1
          <XFORM> 3 3
             0.942 -0.032 -0.001
            -0.102  0.922 -0.015
            -0.016  0.045  0.910
        <BLOCK> 2
          <XFORM> 2 2
             1.028 -0.032
            -0.017  1.041
    <XFORMWGTSET>
      <CLASSXFORM> 1 1
\end{verbatim}
\caption{Example Constrained MLLR transform using hard weights}
\label{fig:cmllrxform}
\end{figure}
Figure~\ref{fig:cmllrxform} shows the format of a single transform. In the
same fashion as HMMs, all transforms are stored as macros. The header
information indicates how the transform was estimated, currently either
with a regression class tree ({\tt TREE}) or directly using the base
classes ({\tt BASE}). The base class macro is then specified. The form of
transformation is then described in the transform set. The code currently
supports constrained MLLR (illustrated), MLLR mean adaptation, MLLR full
variance adaptation and diagonal variance adaptation. Arbitrary block
structures are allowable. The assignment of base class to transform number
is specified at the end of the file.
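Since a constrained MLLR transform can be applied as a feature-space
transformation (as used for adaptive training later in this chapter), the
stored bias and block matrices fully determine the mapping. The following
Python sketch (illustrative only, not HTK code; the observation vector is
invented, while the bias and block values are copied from the constrained
MLLR figure above) assembles the 5-dimensional transform from its two
blocks of sizes 3 and 2 and applies it to a feature vector:
\begin{verbatim}
# Illustrative sketch only (not HTK code): build the block-diagonal
# matrix described by <BLOCKINFO>/<BLOCK> above and apply the
# constrained MLLR transform as a feature-space mapping A*o + b.
import numpy as np

bias = np.array([-0.357, 0.001, -0.002, 0.132, 0.072])
block1 = np.array([[ 0.942, -0.032, -0.001],
                   [-0.102,  0.922, -0.015],
                   [-0.016,  0.045,  0.910]])
block2 = np.array([[ 1.028, -0.032],
                   [-0.017,  1.041]])

# <BLOCKINFO> 2 3 2: two blocks, of sizes 3 and 2, on the diagonal
A = np.zeros((5, 5))
A[:3, :3] = block1
A[3:, 3:] = block2

o = np.array([1.0, 0.5, -0.3, 0.2, 0.1])   # hypothetical observation
o_adapted = A @ o + bias                   # adapted feature vector
print(o_adapted)
\end{verbatim}
The entries linking the two blocks are fixed at zero, which is why block
structures require fewer parameters to estimate and store than a full
transform.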
\mysubsect{Hierarchy of Transforms}{hieradapt}
It is possible to specify a hierarchy of transformations. This results from
using a parent transform during the training process.
\begin{figure}[htbp]
\begin{verbatim}
  ~a "mjfg"
  <ADAPTKIND> TREE
  <BASECLASSES> ~b "baseclass_4.base"
  <PARENTXFORM> ~a "cued"
  <XFORMSET>
    <XFORMKIND> MLLRMEAN
    <NUMXFORMS> 2
    <LINXFORM> 1 <VECSIZE> 5
      <OFFSET>
        <BIAS> 5
          -0.357 0.001 -0.002 0.132 0.072
      <BLOCKINFO> 2 3 2
      <BLOCK> 1
        <XFORM> 3 3
           0.942 -0.032 -0.001
          -0.102  0.922 -0.015
          -0.016  0.045  0.910
      <BLOCK> 2
        <XFORM> 2 2
           1.028 -0.032
          -0.017  1.041
    <LINXFORM> 2 <VECSIZE> 5
      <OFFSET>
        <BIAS> 5
          -0.357 0.001 -0.002 0.132 0.072
      <BLOCKINFO> 2 3 2
      <BLOCK> 1
        <XFORM> 3 3
           0.942 -0.032 -0.001
          -0.102  0.922 -0.015
          -0.016  0.045  0.910
      <BLOCK> 2
        <XFORM> 2 2
           1.028 -0.032
          -0.017  1.041
  <XFORMWGTSET>
    <CLASSXFORM> 1 1
    <CLASSXFORM> 2 1
    <CLASSXFORM> 3 1
    <CLASSXFORM> 4 2
\end{verbatim}
\caption{Example of an MLLR transform using a parent transform}
\label{fig:hiermllr}
\end{figure}
Figure~\ref{fig:hiermllr} shows the use of a set of MLLR transforms
generated using a parent CMLLR transform stored in the macro ``cued''. The
action of this transform is
\begin{enumerate}
\item Apply transform {\tt cued}
\item Apply transform {\tt mjfg}
\end{enumerate}
The parent transform is always applied {\it before} the transform itself.
A hierarchy of transforms therefore results automatically from using a
parent transform when estimating a transform.
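To illustrate the order of application, the following Python sketch
(illustrative only, not HTK code; all matrices, biases and vectors are
invented) treats the parent transform as a constrained MLLR feature-space
mapping and the transform itself as an MLLR mean transform, mirroring the
``cued''/``mjfg'' example above:
\begin{verbatim}
# Illustrative sketch only (not HTK code): order of application when a
# parent transform is used.  The parent is taken to be a constrained MLLR
# (feature-space) transform and the child an MLLR mean transform; all
# numerical values are invented.
import numpy as np

def apply_cmllr(obs, A, b):
    """Parent CMLLR: map an observation into the adapted feature space."""
    return A @ obs + b

def apply_mllr_mean(mean, A, b):
    """Child MLLR mean transform: adapt a Gaussian mean."""
    return A @ mean + b

dim = 5
A_parent, b_parent = 0.95 * np.eye(dim), np.full(dim, 0.10)   # parent (invented)
A_child,  b_child  = 1.02 * np.eye(dim), np.full(dim, -0.05)  # child  (invented)

obs  = np.ones(dim)      # hypothetical observation
mean = np.zeros(dim)     # hypothetical Gaussian mean

# 1. the parent transform is applied first (here, to the features) ...
obs_adapted = apply_cmllr(obs, A_parent, b_parent)
# 2. ... and then the transform itself (here, to the model means)
mean_adapted = apply_mllr_mean(mean, A_child, b_child)
print(obs_adapted, mean_adapted)
# Likelihoods would then be computed with obs_adapted against mean_adapted;
# the CMLLR log-determinant term is omitted in this sketch.
\end{verbatim}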
\mysubsect{Multiple Stream Systems}{streamadapt}
The specification of the base-class components is given in terms of the
Gaussian component. In HTK this is specified for a particular stream of the
HMM state. When multiple streams are used there are two situations to
consider\footnote{The current code in \htool{HHEd} for generating decision
trees does not support generating trees for multiple streams. However, the
code does support adaptation with hand-generated trees.}.

First, if the streams have the same number of components, then transforms
may be shared between different streams. For example it may be decided that
the same linear transform is to be used by the static stream, the delta
stream and the delta-delta stream.

Second, if the streams have different dimensions associated with them, then
the root node is a special node for which a transform cannot be generated.
It is required to partition the Gaussian components so that all subsequent
nodes have the same dimensionality associated with them.

\mysect{Adaptive Training with Linear Transforms}{adapttrain}
In order to improve the performance of systems when there are multiple
speakers, or acoustic environments, present in the training corpus,
adaptive training may be used. Here, rather than using adaptation
transformations only during testing, adaptation transforms are estimated
for each training speaker. The model, sometimes referred to as a {\em
canonical model}, is then estimated given the set of speaker transforms. In
the same fashion as standard training, the whole process can then be
repeated.

In the current implementation, adaptive training is only supported with
constrained MLLR as the transform for each speaker. As CMLLR is implemented
as one, or more, feature-space transformations, the estimation formulae in
section~\ref{s:bwformulae} are simply modified to accumulate statistics
using $\bm{A}^{(i)}\bm{o}+\bm{b}^{(i)}$ for all the data from speaker $i$,
rather than $\bm{o}$. The update formula for $\bm{\mu}_{jsm}$ then becomes
\newcommand{\satliksum}[1]{
                  \sum_{i=1}^I\sum_{r=1}^{R^i}\sum_{t=1}^{T_r} L^r_{#1}(t)}
\[
   \hat{\bm{\mu}}_{jsm} = \frac{
                \satliksum{jsm}(\bm{A}^{(i)}\bm{o}^r_{st}+\bm{b}^{(i)})}{\satliksum{jsm}}
\]
Specifying that adaptive training is to be used simply requires specifying
the parent transform that the model set should be built on. Note that
usually the parent transform will also be used as an input transform.

\mysect{Model Adaptation using MAP}{mapadapt}
Model adaptation can also be accomplished using a maximum a posteriori
(MAP) approach\index{adaptation!MAP}. This adaptation process is sometimes
referred to as Bayesian adaptation. MAP adaptation involves the use of
prior knowledge about the model parameter distribution. Hence, if we know
what the parameters of the model are likely to be (before observing any
adaptation data) using the prior knowledge, we might well be able to make
good use of the limited adaptation data to obtain a decent MAP estimate.
This type of prior is often termed an informative prior. Note that if the
prior distribution indicates no preference as to what the model parameters
are likely to be (a non-informative prior), then the MAP estimate obtained
will be identical to that obtained using a maximum likelihood approach.

For MAP adaptation purposes, the informative priors that are generally used
are the speaker independent model parameters. For mathematical
tractability, conjugate priors are used, which results in a simple
adaptation formula. The update formula for a single stream system for state
$j$ and mixture component $m$ is
\hequation{
\hat{\bm{\mu}}_{jm} = \frac{ N_{jm} } { N_{jm} + \tau } \bar{\bm{\mu}}_{jm} +
                      \frac{ \tau } { N_{jm} + \tau } \bm{\mu}_{jm}
}{meanmap}
where $\tau$ is a weighting of the a priori knowledge relative to the
adaptation speech data and $N_{jm}$ is the occupation likelihood of the
adaptation data, defined as
\[
   N_{jm} = \liksum{jm}
\]
Here $\bm{\mu}_{jm}$ is the speaker independent mean and
$\bar{\bm{\mu}}_{jm}$ is the mean of the observed adaptation data, defined
as
\[
   \bar{\bm{\mu}}_{jm} = \frac{\liksum{jm}\,\bm{o}^r_t}{\liksum{jm}}
\]
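As a worked illustration of the mean update above (the numbers are invented
for this example), consider a single dimension with speaker independent
mean $\mu_{jm} = 0.0$, adaptation-data mean $\bar{\mu}_{jm} = 1.0$,
occupation likelihood $N_{jm} = 30$ and weighting $\tau = 10$. The MAP
estimate is then
\[
   \hat{\mu}_{jm} = \frac{30}{30+10} \times 1.0 + \frac{10}{30+10} \times 0.0
                  = 0.75
\]
so with a reasonable amount of adaptation data the estimate lies close to
the observed adaptation mean, whereas when $N_{jm} \ll \tau$ it remains
close to the speaker independent prior mean.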
