observation sequences which
consist of symbols drawn from a discrete and finite set of size
$M$.  As in the case of tied-mixture systems described above,
this set is often referred to as a \textit{codebook}.
The form of the output distributions in a discrete HMM was given in
equation~\ref{e:ddpdf}.  It consists of a table giving the
probability of each possible observation symbol.  Each symbol is
identified by an index in the range 1 to $M$ and hence the
probability of any symbol can be determined by a simple
table look-up operation.

For speech
applications, the observation symbols are generated by a
vector quantiser which typically associates a prototype speech
vector with each codebook\index{codebook} symbol.  Each incoming speech vector
is then represented by the symbol whose associated prototype
is closest.  The prototypes themselves are chosen
to cover the acoustic space and they are usually calculated
by clustering a representative sample of speech vectors.

In \HTK, discrete HMMs are specified using a very similar
notation to that used for tied-mixture HMMs.  A discrete HMM\index{discrete HMMs} can
have multiple data streams but the width of each stream must be
1.  The output probabilities are stored as logs in a scaled\index{discrete HMM!output probability scaling}
integer format such that if $d_{js}[v]$ is the stored discrete
probability for symbol $v$ in stream $s$ of state $j$, the true
probability is given by
\hequation{
  P_{js}[v] = \exp(-d_{js}[v]/2371.8)
}{dpscale}
Storage in the form of scaled logs allows discrete probability
HMMs to be implemented very efficiently since \HTK\ tools
mostly use log arithmetic and direct storage in log form
avoids the need for a run-time conversion.
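As a rough numerical check of this scaling, the stored values that appear
in the discrete HMM example of this chapter can be converted back to
probabilities.  The figures below are rounded, and it is assumed here
(this is not stated explicitly) that a stored value is obtained by rounding
$-2371.8 \ln P$ to the nearest integer:
\[
\begin{array}{rcl}
  \exp(-1644/2371.8)  & \approx & 0.50 \\
  \exp(-3288/2371.8)  & \approx & 0.25 \\
  \exp(-5461/2371.8)  & \approx & 0.10 \\
  \exp(-32767/2371.8) & \approx & 10^{-6}
\end{array}
\]
The last line uses the largest value that can be stored, 32767, and so
gives the smallest probability that can be represented.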
The
range determined by the constant 2371.8 was selected to enable
probabilities from 1.0 down to
0.000001 to be stored.

\putprog{tmixhmm2}{85}{HMM using Repeat Counts}{
\hmmc{h}{htm} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4 \\
\> \hmkw{State} 2 \hmkw{NumMixes} 5 \\
\> \>   \hmkw{TMix} mix 0.2 0.1 0.3*2 0.1\\
\>\hmkw{State} 3 \hmkw{NumMixes} 5 \\
\>  \>  \hmkw{TMix} mix 0.4 0.3 0.1*3\\
\>\hmkw{TransP} 4 \\
\> \> ... \\
\hmkw{EndHMM}
}

As an example, Fig~\href{f:dischmm} shows the definition of a discrete
HMM called \textsf{dhmm1}.  As can be seen, this has two streams.  The codebook
for stream 1 is size 10 and for stream 2, it is size 2.  For consistency with
the representation used for continuous density HMMs, these sizes are
encoded in the \hmkw{NumMixes}\index{nummixes@$<$NumMixes$>$} specifier.

\mysect{Input Linear Transforms}{lintran}

When reading feature vectors from files, \HTK\ will coerce them to the
\texttt{TARGETKIND} specified in the config file. Often the
\texttt{TARGETKIND} will contain certain qualifiers (specifying for
example delta parameters). In addition to this parameter coercion it
is possible to apply a linear transform before, or after, appending
delta, acceleration and third derivative parameters.

\putprog{lintran}{70}{Input Linear Transform}{
\hmmc{j}{lintran.mat} \\
\hmkw{MMFIdMask} *\\
\hmkw{MFCC} \\
\hmkw{PreQual}\\
\hmkw{LinXform}\\
\> \hmkw{VecSize} 2\\
\> \hmkw{BlockInfo} 1 2\\
\> \hmkw{Block} 1\\
\> \>   \hmkw{Xform} 2 5\\
\> \>   \> 1.0 0.1 0.2 0.1 0.4\\
\> \>   \> 0.2 1.0 0.1 0.1 0.1
}

Figure~\ref{f:lintran} shows an example linear transform. The
\hmkw{PreQual} keyword specifies that the linear transform
is to be applied before the delta and delta-delta
parameters specified in \texttt{TARGETKIND} are added. The default
mode, no \hmkw{PreQual} keyword, applies the linear transform
after the addition of the qualifiers.

The linear transform fully supports projection from a higher number of
features to a smaller number of features.
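As an illustrative sketch of such a projection, the \hmkw{Xform} matrix of
Figure~\ref{f:lintran} can be written as a matrix-vector product.  The
symbols $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{A}$ are introduced here
purely for exposition, and it is assumed that each row of the matrix
produces one component of the output vector:
\[
  \mathbf{y} = \mathbf{A}\,\mathbf{x}, \qquad
  \mathbf{A} = \left( \begin{array}{ccccc}
     1.0 & 0.1 & 0.2 & 0.1 & 0.4 \\
     0.2 & 1.0 & 0.1 & 0.1 & 0.1
  \end{array} \right)
\]
Under that assumption, an input vector whose five components are all equal
to 1 would be mapped to $\mathbf{y} = (1.8,\ 1.5)^{T}$, since the two rows
sum to 1.8 and 1.5 respectively.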
In the example, the parameterised data must consist of 5 \texttt{MFCC}
parameters\footnote{If C0 or normalised log-energy are added
these will be stripped prior to applying the linear transform}.
The model sets that are generated using this transform have
a vector size of 2.

By default the linear transform is stored with the HMM. This is
achieved by adding the \hmkw{InputXform} keyword and specifying the
transform or macroname. To allow compatibility with tools only
supporting the old format models, it is possible to specify that no
linear transform is to be stored with the model.
\begin{verbatim}
    # Do not store linear transform
    HMODEL: SAVEINPUTXFORM = FALSE
\end{verbatim}
In addition it is possible to specify the linear transform as a
\htool{HPARM} configuration variable, \texttt{MATTRANFN}.
\begin{verbatim}
    # Specifying an input linear transform
    HPARM: MATTRANFN = /home/test/lintran.mat
\end{verbatim}
When a linear transform is specified in this form it is not necessary
to have a macroname linked with it. In this case the filename
will be used as the macroname (having stripped the directory name).

\mysect{Tee Models}{teemods}

Normally, the transition probability from the non-emitting entry
state to the non-emitting exit state of a HMM will be zero to ensure
that the HMM aligns with at least one observation vector.  Models
which have a non-zero entry to exit transition probability are
referred to as {\it tee-models}.

Tee-models\index{tee-models} are useful for modelling optional transient effects
such as short pauses and noise bursts, particularly between words.

Although most \HTK\ tools support tee-models, they are incompatible with
those that work with isolated models such as \htool{HInit} and
\htool{HRest}.
When a tee-model is loaded into one of these tools, its
entry to exit transition probability is reset to zero and the first row of
its transition matrix is renormalised.

\putprog{tmixhmm}{80}{Tied-Mixture HMM}{
\hmmc{h}{htm} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4 \\
\> \hmkw{State} 2 \hmkw{NumMixes} 5 \\
\> \>   \hmkw{TMix} mix 0.2 0.1 0.3 0.3 0.1\\
\>\hmkw{State} 3 \hmkw{NumMixes} 5 \\
\>  \>  \hmkw{TMix} mix 0.4 0.3 0.1 0.1 0.1\\
\>\hmkw{TransP} 4 \\
\> \>  0.0 1.0 0.0 0.0 \\
\>  \>  0.0 0.5 0.5 0.0 \\
\>  \>  0.0 0.0 0.6 0.4 \\
\> \>   0.0 0.0 0.0 0.0 \\
\hmkw{EndHMM}
}

\mysect{Binary Storage Format}{binsave}

Throughout this chapter, a text-based representation
has been used for the external storage of HMM
definitions.  For experimental work, text-based storage allows
simple and direct access to HMM parameters and this can be invaluable.
However, when using very large HMM sets, storage in text form
is less practical since it is inefficient in its use of
memory and the time taken to load can be excessive due to
the large number of character to float conversions needed.

To solve these problems, \HTK\ also provides a binary storage\index{HMM definition!binary storage}\index{binary storage}
format.  In binary mode, keywords are written as a single
colon followed by an 8 bit code representing the actual
keyword.  Any subsequent numerical information following
the keyword is then in binary.  Integers are written as
16-bit shorts and all floating-point numbers are written
as 32-bit single precision floats.  The repeat
factor used in the run-length encoding
scheme for tied-mixture and discrete HMMs is written as
a single byte.  Its presence immediately after a 16-bit
discrete log probability is indicated by setting the top
bit to 1 (this is the reason why the range of
discrete log probabilities is limited to
0 to 32767 i.e.\ only 15 bits are used for the actual
value).
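As a small illustration of this flag, consider the text-format entry
\texttt{3288*4} used in the discrete HMM example of this chapter.  Following
the description above, and leaving aside the byte order of the file, which
is not specified here, the 16-bit value would be written with its top bit set,
\[
  3288 + 2^{15} = 36056
  \qquad (\mathtt{0x0CD8} \rightarrow \mathtt{0x8CD8}),
\]
and would be followed by a single byte holding the repeat factor 4.  Masking
off the top bit, $36056 \bmod 2^{15} = 3288$, recovers the log probability.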
For tied-mixtures, the repeat count is signalled
by subtracting 2.0 from the weight.

Binary storage format and text storage format can be mixed
within and between input files.  Each time a keyword is
encountered, its coding is used to determine whether the
subsequent numerical information should be input in text
or binary form.  This means, for example, that binary
files can be manually patched by replacing a binary-format
definition by a text format definition\footnote{The fact that
this is possible does not mean that it is recommended practice!}.

\HTK\ tools provide a standard command line option (\texttt{-B}) to indicate
that HMM definitions should be output in binary format.
Alternatively, the Boolean configuration variable
\texttt{SAVEBINARY}\index{savebinary@\texttt{SAVEBINARY}} can be set to true to
force binary format output.

\putprog{dischmm}{80}{Discrete Probability HMM}{
\hmmt{o} \hmkw{DISCRETE} \hmkw{StreamInfo} 2 1 1 \\
\hmmc{h}{dhmm1} \\
\hmkw{BeginHMM} \\
\> \hmkw{NumStates} 4  \\
\> \hmkw{State} 2   \\
\>\>   \hmkw{NumMixes} 10 2 \\
\>\>   \hmkw{SWeights} 2 0.9 1.1 \\
\>\>   \hmkw{Stream} 1 \\
\> \>\>     \hmkw{DProb} 3288*4  32767*6 \\
\>\>   \hmkw{Stream} 2 \\
\>\> \>     \hmkw{DProb} 1644*2 \\
\> \hmkw{State} 3   \\
\>\>   \hmkw{NumMixes} 10 2 \\
\>\>   \hmkw{SWeights} 2 0.9 1.1 \\
\> \>  \hmkw{Stream} 1 \\
\> \> \>    \hmkw{DProb} 5461*10 \\
\> \>  \hmkw{Stream} 2 \\
\> \> \>    \hmkw{DProb} 1644*2 \\
\>\hmkw{TransP} 4 \\
\> \>  0.0 1.0 0.0 0.0 \\
\> \>  0.0 0.5 0.5 0.0 \\
\> \>  0.0 0.0 0.6 0.4 \\
\> \>   0.0 0.0 0.0 0.0 \\
\hmkw{EndHMM}
}

\putprog{macregtreedef}{80}{MMF with a regression tree and classes}{
\hmmt{o} \> \hmkw{HMMSetId} ecrl\_us\_mono \\
\>    \hmkw{VecSize} 4 \hmkw{MFCC} \\
\hmmc{r}{ecrl\_us\_mono\_tree\_4} \\
\>    \hmkw{RegTree} 4 \\
\>    \hmkw{Node} 1 2 3 \\
\>    \hmkw{Node} 2 4 5 \\
\>    \hmkw{Node} 3 6 7 \\
\>    \hmkw{TNode} 4 30 \\
\>    \hmkw{TNode} 5 25 \\
\>    \hmkw{TNode} 6 40 \\
\>    \hmkw{TNode} 7 39 \\
\hmmc{s}{stateA} \hmkw{NumMixes} 3 \\
\>    \hmkw{Mixture} 1 0.34 \\
\>\>    \hmkw{RClass} 4 \\
\>\>    \hmmc{u}{mean51} \\
\>\>    \hmmc{v}{var65} \\
\>    \hmkw{Mixture} 2 0.52 \\
\>\>    \hmkw{RClass} 7 \\
\>\>    \hmmc{u}{mean32} \\
\>\>    \hmmc{v}{var65} \\
\>    \hmkw{Mixture} 3 0.14 \\
\>\>    \hmkw{RClass} 5 \\
\>\>    \hmmc{u}{mean12} \\
\>\>    \hmmc{v}{var3}
}

\mysect{The HMM Definition Language}{hmmdef}

To conclude this chapter,
this section presents a formal\index{HMM definition!formal syntax} description
of the HMM definition language used by \HTK.
Syntax is described using an extended BNF notation in which
alternatives are separated by a vertical bar $|$, parentheses () denote
factoring, brackets [\ ] denote options, and braces \{\} denote zero or more
repetitions.

All keywords are enclosed in angle brackets\footnote{This definition
covers the textual version only.  The syntax for
the binary format
is identical apart from the way that the lexical items are encoded.} and
the case of the
keyword name is not significant.
White space is not significant except within double-quoted strings.

The top level structure of a HMM definition is shown by the following
rule.
{\sf
\begin{tabbing}
++++ \= ++++++++ \= ++ \= +++++++++++++++++ \= +++ \=  \kill
\>  hmmdef = \> [ $\sim$h macro ] \\
\>\>         $<$BeginHMM$>$ \\
\>\>\>          [ globalOpts ] \\
\>\>\>          $<$NumStates$>$ short \\
\>\>\>          state \{ state \} \\
\>\>\>          [ regTree ] \\
\>\>\>          transP \\
\>\>\>          [ duration ] \\
\>\>         $<$EndHMM$>$
\end{tabbing}
}
A HMM definition consists of an optional set of global options\index{HMM definition!global options} followed by
the \hmkw{NumStates}\index{numstates@$<$NumStates$>$} keyword whose following
argument specifies the number of states in the model
inclusive of the non-emitting entry and exit
states\footnote{Integer numbers are specified as either \textsf{char} or \textsf{short}.
This has no effect on text-based definitions but for binary format it indicates
the underlying C type used to represent the number.}.
The information for each state is then given in turn, followed by
the parameters of the transition matrix and the
model duration parameters, if any.
The name of the HMM is given by the \hmmt{h} macro.  If the HMM is the
only definition within a file, the \hmmt{h} macro name can be omitted
and the HMM name is assumed to be the same as the file name.

The global options\index{global options} are common to all HMMs.  They can be given
separately using a \hmmt{o} option macro
{\sf
\begin{tabbing}
++++ \= ++++++++ \= ++ \= +++++++++++++++++ \= +++ \=  \kill
\> optmacro = \> $\sim$o globalOpts
\end{tabbing}
}
\noindent
or they can be included in one or more HMM definitions.  Global
options may be repeated but no definition can change a previous