notation to that used for tied-mixture HMMs.  A discrete HMM\index{discrete HMMs} can
have multiple data streams but the width of each stream must be
1.  The output probabilities are stored as logs in a scaled\index{discrete HMM!output probability scaling}
integer format such that if $d_{js}[v]$ is the stored discrete
probability for symbol $v$ in stream $s$ of state $j$, the true
probability is given by
\hequation{
  P_{js}[v] = \exp(-d_{js}[v]/2371.8)
}{dpscale}
Storage in the form of scaled logs allows discrete probability
HMMs to be implemented very efficiently since \HTK\ tools
mostly use log arithmetic and direct storage in log form
avoids the need for a run-time conversion.  The
range determined by the constant 2371.8 was selected to enable probabilities from 1.0 down to
0.000001 to be stored.

\putprog{tmixhmm2}{85}{HMM using Repeat Counts}{
\hmmc{h}{htm} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4 \\
\> \hmkw{State} 2 \hmkw{NumMixes} 5 \\
\> \>   \hmkw{TMix} mix 0.2 0.1 0.3*2 0.1\\
\>\hmkw{State} 3 \hmkw{NumMixes} 5 \\
\>  \>  \hmkw{TMix} mix 0.4 0.3 0.1*3\\
\>\hmkw{TransP} 4 \\
\> \> ... \\
\hmkw{EndHMM} }

As an example, Fig~\href{f:dischmm} shows the definition of a discrete
HMM called \textsf{dhmm1}.  As can be seen, this has two streams.  The codebook
for stream 1 is size 10 and for stream 2, it is size 2.  For consistency with
the representation used for continuous density HMMs, these sizes are
encoded in the \hmkw{NumMixes}\index{nummixes@$<$NumMixes$>$} specifier.

\mysect{Input Linear Transforms}{lintran}

When reading feature vectors from files HTK will coerce them to the
\texttt{TARGETKIND} specified in the config file. Often the
\texttt{TARGETKIND} will contain certain qualifiers (specifying for
example delta parameters).
In addition to this parameter coercion it
is possible to apply a linear transform before, or after, appending
delta, acceleration and third derivative parameters.

\putprog{lintran}{70}{Input Linear Transform}{
\hmmc{j}{lintran.mat} \\
\hmkw{MMFIdMask} *\\
\hmkw{MFCC} \\
\hmkw{PreQual}\\
\hmkw{LinXform}\\
\> \hmkw{VecSize} 2\\
\> \hmkw{BlockInfo} 1 2\\
\> \hmkw{Block} 1\\
\> \>   \hmkw{Xform} 2 5\\
\> \>   \> 1.0 0.1 0.2 0.1 0.4\\
\> \>   \> 0.2 1.0 0.1 0.1 0.1}

Figure~\ref{f:lintran} shows an example linear transform. The
\hmkw{PreQual} keyword specifies that the linear transform
is to be applied before the delta and delta-delta
parameters specified in \texttt{TARGETKIND} are added. The default
mode, no \hmkw{PreQual} keyword, applies the linear transform
after the addition of the qualifiers.

The linear transform fully supports projection from a higher number of
features to a smaller number of features. In the example, the
parameterised data must consist of 5 \texttt{MFCC}
parameters\footnote{If C0 or normalised log-energy are added
these will be stripped prior to applying the linear transform}.
The model sets that are generated using this transform have
a vector size of 2.

By default the linear transform is stored with the HMM. This is
achieved by adding the \hmkw{InputXform} keyword and specifying the
transform or macroname. To allow compatibility with tools only
supporting the old format models it is possible to specify that no
linear transform is to be stored with the model.
\begin{verbatim}
    # Do not store linear transform
    HMODEL: SAVEINPUTXFORM = FALSE
\end{verbatim}
In addition it is possible to specify the linear transform as a
\htool{HPARM} configuration variable, \texttt{MATTRANFN}.
\begin{verbatim}
    # Specifying an input linear transform
    HPARM: MATTRANFN = /home/test/lintran.mat
\end{verbatim}
When a linear transform is specified in this form it is not necessary
to have a macroname linked with it.
In this case the filename
will be used as the macroname (having stripped the directory name).

\mysect{Tee Models}{teemods}

Normally, the transition probability from the non-emitting entry
state to the non-emitting exit state of a HMM will be zero to ensure
that the HMM aligns with at least one observation vector.  Models which have
a non-zero entry to exit transition probability are referred to as {\it tee-models}.

Tee-models\index{tee-models} are useful for modelling optional transient effects
such as short pauses and noise bursts, particularly between words.

Although most \HTK\ tools support tee-models, they are incompatible with
those that work with isolated models such as \htool{HInit} and
\htool{HRest}. When a tee-model is loaded into one of these tools, its
entry to exit transition probability is reset to zero and the first row of
its transition matrix is renormalised.

\putprog{tmixhmm}{80}{Tied-Mixture HMM}{
\hmmc{h}{htm} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4 \\
\> \hmkw{State} 2 \hmkw{NumMixes} 5 \\
\> \>   \hmkw{TMix} mix 0.2 0.1 0.3 0.3 0.1\\
\>\hmkw{State} 3 \hmkw{NumMixes} 5 \\
\>  \>  \hmkw{TMix} mix 0.4 0.3 0.1 0.1 0.1\\
\>\hmkw{TransP} 4 \\
\> \>  0.0 1.0 0.0 0.0 \\
\>  \>  0.0 0.5 0.5 0.0 \\
\>  \>  0.0 0.0 0.6 0.4 \\
\> \>   0.0 0.0 0.0 0.0 \\
\hmkw{EndHMM} }

\mysect{Regression Class Trees for Adaptation}{regtreemods}

In order to perform adaptation \HTK\ generally \index{adaptation!regression tree}
requires the use of a
binary regression tree. Its use in the adaptation process is explained
in further detail in chapter~\ref{c:Adapt}.
After construction (see
section~\ref{s:hhedregtree}) the terminal nodes of the binary
regression tree contain mixture component groupings or
clusters. These clusters are referred to as regression base
classes. Each mixture component in an HMM set belongs to a unique
regression base class.
The binary regression tree is stored as part of
the HMM set, since its structure is necessary for the dynamic
adaptation procedure described in section~\ref{s:reg_classes}. Also,
each mixture component has a regression base class identifier (the
index of its terminal node) stored with it. An example is shown in
figure~\ref{f:macregtreedef} and corresponds with the tree shown in
figure~\ref{f:regtree1}. The example shows the use of the keyword {\sf
$<$HMMSetId$>$} used to store an identifier for this HMM set. This is
important because the regression tree is built based on this HMM set
and is hence specific to it. Many sets of transforms may be built that
can be applied to this HMM set, but only one HMM set can be
transformed by a transform set that utilises the regression tree.

The regression tree is described by non-terminal nodes {\sf $<$Node$>$} and
terminal nodes {\sf $<$TNode$>$}. Each node contains its index
followed by either the indices of its children (if it is a
non-terminal) or the number of mixture components clustered at a
terminal. Each mixture component as defined by the keyword {\sf
$<$Mixture$>$} has an {\sf $<$RClass$>$} keyword followed by the
regression base class index.

When an HMM definition is loaded, a check is made to see that all the
regression classes have been defined and that the total number of
mixture components loaded for each regression class matches the number
of mixture components defined in the regression tree.

The regression tree together with the mixture regression base class
numbers can be constructed automatically with the use of the tool
\htool{HHEd} (see section~\ref{s:hhedregtree}).

\mysect{Binary Storage Format}{binsave}

Throughout this chapter, a text-based representation
has been used for the external storage of HMM
definitions.
For experimental work, text-based storage allows
simple and direct access to HMM parameters and this can be invaluable.
However, when using very large HMM sets, storage in text form
is less practical since it is inefficient in its use of
memory and the time taken to load can be excessive due to
the large number of character-to-float conversions needed.

To solve these problems, \HTK\ also provides a binary storage
\index{HMM definition!binary storage}\index{binary storage}
format.  In binary mode, keywords are written as a single
colon followed by an 8-bit code representing the actual
keyword.  Any subsequent numerical information following
the keyword is then in binary.  Integers are written as
16-bit shorts and all floating-point numbers are written
as 32-bit single precision floats.  The repeat
factor used in the run-length encoding
scheme for tied-mixture and discrete HMMs is written as
a single byte.  Its presence immediately after a 16-bit
discrete log probability is indicated by setting the top
bit to 1 (this is the reason why the range of discrete log probabilities is limited to
0 to 32767 i.e.\ only 15 bits are used for the actual
value).  For tied-mixtures, the repeat count is signalled
by subtracting 2.0 from the weight.

Binary storage format and text storage format can be mixed
within and between input files.  Each time a keyword is
encountered, its coding is used to determine whether the
subsequent numerical information should be input in text
or binary form.
This means, for example, that binary
files can be manually patched by replacing a binary-format
definition by a text format definition\footnote{The fact that
this is possible does not mean that it is recommended practice!}.

\HTK\ tools provide a standard command line option (\texttt{-B}) to indicate
that HMM definitions should be output in binary format.
Alternatively, the Boolean configuration variable \texttt{SAVEBINARY}
\index{savebinary@\texttt{SAVEBINARY}} can be set to true to
force binary format output.

\putprog{dischmm}{80}{Discrete Probability HMM}{
\hmmt{o} \hmkw{DISCRETE} \hmkw{StreamInfo} 2 1 1 \\
\hmmc{h}{dhmm1} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4  \\
\> \hmkw{State} 2   \\
\>\>   \hmkw{NumMixes} 10 2 \\
\>\>   \hmkw{SWeights} 2 0.9 1.1 \\
\>\>   \hmkw{Stream} 1 \\
\> \>\>     \hmkw{DProb} 3288*4  32767*6 \\
\>\>   \hmkw{Stream} 2 \\
\>\> \>     \hmkw{DProb} 1644*2 \\
\> \hmkw{State} 3   \\
\>\>   \hmkw{NumMixes} 10 2 \\
\>\>   \hmkw{SWeights} 2 0.9 1.1 \\
\> \>  \hmkw{Stream} 1 \\
\> \> \>    \hmkw{DProb} 5461*10 \\
\> \>  \hmkw{Stream} 2 \\
\> \> \>    \hmkw{DProb} 1644*2 \\
\>\hmkw{TransP} 4 \\
\> \>  0.0 1.0 0.0 0.0 \\
\> \>  0.0 0.5 0.5 0.0 \\
\> \>  0.0 0.0 0.6 0.4 \\
\> \>  0.0 0.0 0.0 0.0 \\
\hmkw{EndHMM} }

\putprog{macregtreedef}{80}{MMF with a regression tree and classes}{
\hmmt{o} \> \hmkw{HMMSetId} ecrl\_us\_mono \\
\>    \hmkw{VecSize} 4 \hmkw{MFCC} \\
\hmmc{r}{ecrl\_us\_mono\_tree\_4} \\
\>    \hmkw{RegTree} 4 \\
\>    \hmkw{Node} 1 2 3 \\
\>    \hmkw{Node} 2 4 5 \\
\>    \hmkw{Node} 3 6 7 \\
\>    \hmkw{TNode} 4 30 \\
\>    \hmkw{TNode} 5 25 \\
\>    \hmkw{TNode} 6 40 \\
\>    \hmkw{TNode} 7 39 \\
\hmmc{s}{stateA} \hmkw{NumMixes} 3 \\
\>    \hmkw{Mixture} 1 0.34 \\
\>\>    \hmkw{RClass} 4 \\
\>\>    \hmmc{u}{mean51} \\
\>\>    \hmmc{v}{var65} \\
\>    \hmkw{Mixture} 2 0.52 \\
\>\>    \hmkw{RClass} 7 \\
\>\>    \hmmc{u}{mean32} \\
\>\>    \hmmc{v}{var65} \\
\>    \hmkw{Mixture} 3 0.14 \\
\>\>    \hmkw{RClass} 5 \\
\>\>    \hmmc{u}{mean12} \\
\>\>    \hmmc{v}{var3}}

\mysect{The HMM Definition Language}{hmmdef}

To conclude this chapter,
this section presents a formal\index{HMM definition!formal syntax} description
of the HMM definition language used by \HTK.

Syntax is described using an extended BNF notation in which
alternatives are separated by a vertical bar $|$, parentheses () denote
factoring, brackets [\ ] denote options, and braces \{\} denote zero or more
repetitions. All keywords are enclosed in angle brackets\footnote{
This definition covers the textual version only.  The syntax for
the binary format
is identical apart from the way that the lexical items are encoded.} and
the case of the
keyword name is not significant.
White space is not significant except within double-quoted strings.

The top level structure of a HMM definition is shown by the following
rule.
{\sf
\begin{tabbing}
++++ \= ++++++++ \= ++ \= +++++++++++++++++ \= +++ \=  \kill
\>  hmmdef = \> [ $\sim$h macro ] \\
\>\>         $<$BeginHMM$>$ \\
\>\>\>          [ globalOpts ] \\
\>\>\>          $<$NumStates$>$ short \\
\>\>\>          state \{ state \} \\
\>\>\>          [ regTree ] \\
\>\>\>          transP \\
\>\>\>          [ duration ] \\
\>\>         $<$EndHMM$>$
\end{tabbing}}
A HMM definition consists of an optional set of global options
\index{HMM definition!global options} followed by
the \hmkw{NumStates}\index{numstates@$<$NumStates$>$} keyword whose following
argument specifies the number of states in the model
inclusive of the non-emitting entry and exit states\footnote{
Integer numbers are specified as either \textsf{char} or \textsf{short}.
This has no effect on text-based definitions but for binary format it indicates
the underlying C type used to represent the number.}.
The information for each state is then given in turn, followed by the
parameters of the transition matrix and the model duration parameters, if any.
The name of the HMM is given by the \hmmt{h} macro.
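As an informal illustration of this rule (a sketch only, not part of the
formal definition), the tied-mixture model of figure~\ref{f:tmixhmm} can be
read against it item by item. The listing below assumes that the typeset
keywords correspond to plain text keywords in angle brackets and that the
macro name is written as a double-quoted string after \hmmt{h}; the name
\texttt{htm} and all parameter values are simply those of that figure.
\begin{verbatim}
    ~h "htm"
    <BeginHMM>
      <NumStates> 4
      <State> 2 <NumMixes> 5
        <TMix> mix 0.2 0.1 0.3 0.3 0.1
      <State> 3 <NumMixes> 5
        <TMix> mix 0.4 0.3 0.1 0.1 0.1
      <TransP> 4
        0.0 1.0 0.0 0.0
        0.0 0.5 0.5 0.0
        0.0 0.0 0.6 0.4
        0.0 0.0 0.0 0.0
    <EndHMM>
\end{verbatim}
Here the first line supplies the optional {\sf [ $\sim$h macro ]} item, the
two \hmkw{State} entries correspond to {\sf state \{ state \}}, and the
\hmkw{TransP} block corresponds to {\sf transP}. The optional
{\sf globalOpts}, {\sf regTree} and {\sf duration} items are not shown.
The value 4 given to \hmkw{NumStates} counts the two emitting states together
with the non-emitting entry and exit states.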
If the HMM is the
only definition within a file, the \hmmt{h} macro name can be omitted
and the HMM name is assumed to be the same as the file name.

The global options\index{global options} are common to all HMMs.  They can be given separately using a \hmmt{o} option macro