
%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young 15/11/95
%
\mychap{Discrete and Tied-Mixture Models}{discmods}

\sidepic{Tool.disc}{80}{
Most of the discussion so far has focussed on using \HTK\ to model
sequences of continuous-valued vectors.  In contrast, this chapter is
mainly concerned with using \HTK\ to model sequences of discrete
symbols.  Discrete symbols arise naturally in modelling many types of
data, for example, letters and words, bitmap images, and DNA sequences.
Continuous signals can also be converted to discrete symbol sequences
by using a quantiser and, in particular, speech vectors can be
\textit{vector quantised} as described in section~\ref{s:vquant}.
In all cases, \HTK\ expects a set of $N$ discrete symbols to be
represented by the contiguous sequence of integer numbers from 1 to $N$.}

\index{discrete HMMs}
In \HTK\ discrete probabilities are regarded as being closely analogous
to the mixture weights of a continuous density system.  As a consequence,
the representation and processing of discrete HMMs shares a great deal
with continuous density models.  It follows from this that most of the
principles and practice developed already are equally applicable to
discrete systems, and hence this chapter can be quite brief.

The first topic covered concerns building HMMs for discrete symbol
sequences.  The use of discrete HMMs with speech is then presented.
The tool \htool{HQuant} is described and the method of converting
continuous speech vectors to discrete symbols is reviewed.  This is
followed by a brief discussion of tied-mixture systems, which can be
regarded as a compromise between continuous and discrete density
systems.  Finally, the use of the \HTK\ tool \htool{HSmooth} for
parameter smoothing by deleted interpolation is presented.

\mysect{Modelling Discrete Sequences}{discseq}

Building HMMs for discrete symbol sequences is essentially the same as
described previously for continuous density systems.  Firstly, a
prototype HMM definition must be specified in order to fix the model
topology.  For example, the following is a 3-state ergodic HMM in which
the emitting states are fully connected.
\begin{verbatim}
    ~o <DISCRETE> <StreamInfo> 1 1
    ~h "dproto"
    <BeginHMM>
       <NumStates> 5
       <State> 2 <NumMixes> 10
          <DProb> 5461*10
       <State> 3 <NumMixes> 10
          <DProb> 5461*10
       <State> 4 <NumMixes> 10
          <DProb> 5461*10
       <TransP> 5
           0.0 1.0 0.0 0.0 0.0
           0.0 0.3 0.3 0.3 0.1
           0.0 0.3 0.3 0.3 0.1
           0.0 0.3 0.3 0.3 0.1
           0.0 0.0 0.0 0.0 0.0
    <EndHMM>
\end{verbatim}
As described in chapter~\ref{c:HMMDefs}, the notation for discrete HMMs
borrows heavily from that used for continuous density models by
equating mixture components with symbol indices.  Thus, this definition
assumes that each training data sequence contains a single stream of
symbols indexed from 1 to 10.  In this example, all symbols in each
state have been set to be equally likely\footnote{Remember that
discrete probabilities are scaled such that 32767 is equivalent to a
probability of 0.000001 and 0 is equivalent to a probability of 1.0.}.
If prior information is available then this can of course be used to
set these initial values.
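The entries \texttt{<DProb> 5461*10} in the prototype above follow
directly from this scaling.  As a rough illustration only (this is not
HTK source code), the following Python sketch assumes that a stored
discrete probability is simply a negative natural log probability,
scaled so that 1.0 maps to 0 and $10^{-6}$ maps to 32767; under that
assumption, ten equally likely symbols with probability 0.1 each map to
approximately 5461.
\begin{verbatim}
    import math

    # Illustrative sketch only: the quantised form implied by the
    # footnote, i.e. a scaled negative log probability with
    # 0 <-> p = 1.0 and 32767 <-> p = 0.000001.  The exact constant
    # used inside HTK may differ.
    SCALE = 32767.0 / -math.log(1.0e-6)        # roughly 2371.8

    def prob_to_dprob(p):
        """Map a probability in (0,1] to the quantised range [0,32767]."""
        if p <= 1.0e-6:
            return 32767                       # smaller values are clipped
        return int(round(-SCALE * math.log(p)))

    def dprob_to_prob(d):
        """Inverse mapping, useful for checking stored values."""
        return math.exp(-d / SCALE)

    print(prob_to_dprob(1.0))                  # 0
    print(prob_to_dprob(0.1))                  # 5461, as in <DProb> 5461*10
    print(round(dprob_to_prob(5461), 3))       # 0.1
\end{verbatim}
In practice these quantised values are written and read by the \HTK\
tools themselves; the sketch is only intended to make the numbers in
the prototype definition easier to interpret.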
The training data needed to build a discrete HMM can take one of two
forms.  It can either be discrete (\texttt{SOURCEKIND=DISCRETE}), in
which case it consists of a sequence of 2-byte integer symbol indices,
or it can consist of continuous parameter vectors with an associated VQ
codebook.  This latter case is dealt with in the next section.  Here it
will be assumed that the data is symbolic and that it is therefore
stored in discrete form.\index{discrete data}

Given a set of training files listed in the script file
\texttt{train.scp}, an initial HMM could be estimated using
\begin{verbatim}
    HInit -T 1 -w 1.0 -o dhmm -S train.scp -M hmm0 dproto
\end{verbatim}
This use of \htool{HInit} is identical to that which would be used for
building whole word HMMs where no associated label file is assumed and
the whole of each training sequence is used to estimate the HMM
parameters.  Its effect is to read in the prototype stored in the file
\texttt{dproto} and then use the training examples to estimate initial
values for the output distributions and transition probabilities.  This
is done by firstly uniformly segmenting the data and, for each segment,
counting the number of occurrences of each symbol.  These counts are
then normalised to provide output distributions for each state.
\htool{HInit} then uses the Viterbi algorithm to resegment the data and
recompute the parameters.  This is repeated until convergence is
achieved or an upper limit on the iteration count is reached.  The
transition probabilities at each step are estimated simply by counting
the number of times that each transition is made in the Viterbi
alignments and normalising.  The final model is renamed \texttt{dhmm}
and stored in the directory \texttt{hmm0}.

When building discrete HMMs, it is important to floor the discrete
probabilities so that no symbol has a zero probability.  This is
achieved using the \texttt{-w} option, which specifies a floor value as
a multiple of a global constant called \texttt{MINMIX} whose value is
$10^{-5}$.
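As an illustration of the initialisation just described (again, a
sketch rather than HTK code), the following Python fragment performs
only the very first step for a discrete HMM: it uniformly segments each
training sequence across the emitting states, counts the symbols
falling in each segment, normalises the counts and applies a floor of
$w \times$ \texttt{MINMIX} as selected by the \texttt{-w} option.  The
subsequent Viterbi resegmentation carried out by \htool{HInit}, and the
exact way it handles flooring and segment boundaries, are not shown.
\begin{verbatim}
    MINMIX = 1.0e-5

    def initial_output_probs(sequences, num_states, num_symbols, floor_w=1.0):
        """Uniform-segmentation symbol counts, normalised and floored."""
        counts = [[0] * num_symbols for _ in range(num_states)]
        for seq in sequences:                  # each seq is a list of 1..N indices
            seg_len = len(seq) / num_states    # uniform segmentation
            for t, sym in enumerate(seq):
                state = min(int(t / seg_len), num_states - 1)
                counts[state][sym - 1] += 1
        floor = floor_w * MINMIX               # floor chosen via the -w option
        probs = []
        for row in counts:
            total = sum(row)
            if total > 0:
                p = [max(c / total, floor) for c in row]
            else:
                p = [1.0 / num_symbols] * num_symbols
            s = sum(p)                         # renormalise after flooring
            probs.append([x / s for x in p])
        return probs

    # three short symbol sequences, a model with 3 emitting states, 10 symbols
    seqs = [[1, 1, 2, 3, 3, 4, 5, 5, 6, 7],
            [1, 2, 2, 3, 4, 4, 5, 6, 6, 7],
            [1, 1, 2, 3, 4, 5, 5, 6, 7, 7]]
    for row in initial_output_probs(seqs, 3, 10):
        print([round(x, 4) for x in row])
\end{verbatim}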
The initialised HMM created by \htool{HInit} can then be further
refined, if desired, by using \htool{HRest} to perform Baum-Welch
re-estimation.  It would be invoked in a similar way to the above
except that there is now no need to rename the model.  For example,
\begin{verbatim}
    HRest -T 1 -w 1.0 -S train.scp -M hmm1 hmm0/dhmm
\end{verbatim}
would read in the model stored in \texttt{hmm0/dhmm} and write out a
new model of the same name to the directory \texttt{hmm1}.

\mysect{Using Discrete Models with Speech}{speechvq}

As noted in section~\ref{s:vquant}, discrete HMMs can be used to model
speech by using a vector quantiser to map continuous density vectors
into discrete symbols.  A vector quantiser depends on a so-called
\textit{codebook} which defines a set of partitions of the vector
space.  Each partition is represented by the mean value of the speech
vectors belonging to that partition and optionally a variance
representing the spread.  Each incoming speech vector is then matched
with each partition and assigned the index corresponding to the
partition which is closest using a Mahalanobis distance metric.

In \HTK\ such a codebook can be built using the tool \htool{HQuant}.
This tool takes as input a set of continuous speech vectors, clusters
them and uses the centroid and optionally the variance of each cluster
to define the partitions.  \htool{HQuant} can build both linear and
tree-structured codebooks.  To build a linear codebook, all training
vectors are initially placed in one cluster and the mean calculated.
The mean is then perturbed to give two means and the training vectors
are partitioned according to which mean is nearest to them.  The means
are then recalculated and the data is repartitioned.  At each cycle,
the total distortion (i.e.\ the total distance between the cluster
members and the mean) is recorded and repartitioning continues until
there is no significant reduction in distortion.  The whole process
then repeats by perturbing the mean of the cluster with the highest
distortion.  This continues until the required number of clusters has
been found.

Since all training vectors are reallocated at every cycle, this is an
expensive algorithm to compute.  The maximum number of iterations
within any single cluster increment can be limited using the
configuration variable
\texttt{MAXCLUSTITER}\index{maxclustiter@\texttt{MAXCLUSTITER}} and,
although this can speed up the computation significantly, the overall
training process is still computationally expensive.  Once built,
vector quantisation is performed by scanning all codebook entries and
finding the nearest entry.  Thus, if a large codebook is used, the
run-time VQ look-up operation can also be expensive.
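The clustering procedure just described is essentially a
binary-splitting algorithm.  The following Python sketch illustrates
that procedure but is not the \htool{HQuant} implementation: the
perturbation size, the convergence test and the use of a squared
Euclidean distance are arbitrary choices made here for clarity, and
\texttt{max\_iter} plays roughly the role of \texttt{MAXCLUSTITER}.
\begin{verbatim}
    import random

    def dist(x, m):                            # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(x, m))

    def mean_of(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    def repartition(data, means, max_iter=20, tol=1e-4):
        """Reassign vectors to their nearest mean and update the means until
        the total distortion shows no significant reduction."""
        prev = float("inf")
        for _ in range(max_iter):
            clusters = [[] for _ in means]
            total = 0.0
            for x in data:
                k = min(range(len(means)), key=lambda i: dist(x, means[i]))
                clusters[k].append(x)
                total += dist(x, means[k])
            means = [mean_of(c) if c else m for c, m in zip(clusters, means)]
            if prev - total < tol * max(total, 1.0):
                break
            prev = total
        return means, clusters

    def linear_codebook(data, size, eps=1e-3):
        """Grow a linear codebook by repeatedly splitting the cluster with
        the highest distortion, as described in the text above."""
        means = [mean_of(data)]
        while len(means) < size:
            means, clusters = repartition(data, means)
            distortions = [sum(dist(x, m) for x in c)
                           for c, m in zip(clusters, means)]
            worst = max(range(len(means)), key=lambda i: distortions[i])
            centre = means[worst]
            means[worst] = [v + eps for v in centre]   # perturb to give two means
            means.append([v - eps for v in centre])
        means, _ = repartition(data, means)
        return means

    # small synthetic example: 150 two-dimensional vectors, codebook of size 4
    random.seed(0)
    data = [[random.gauss(c, 0.3), random.gauss(c, 0.3)]
            for c in (0.0, 2.0, 4.0) for _ in range(50)]
    for m in linear_codebook(data, 4):
        print([round(v, 2) for v in m])
\end{verbatim}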
As an alternative to building a linear codebook, a tree-structured
codebook can be used.  The algorithm for this is essentially the same
as above except that every cluster is split at each stage, so that the
first cluster is split into two, these are split into four, and so on.
At each stage, the means are recorded so that when using the codebook
for vector quantising a fast binary search can be used to find the
appropriate leaf cluster.  Tree-structured codebooks are much faster to
build since there is no repeated reallocation of vectors, and much
faster in use since only $O(\log_2 N)$ distances need to be computed,
where $N$ is the size of the codebook.  Unfortunately, however,
tree-structured codebooks will normally incur higher VQ distortion for
a given codebook size.

When delta and acceleration coefficients are used, it is usually best
to split the data into multiple streams (see section~\ref{s:streams}).
In this case, a separate codebook is built for each stream.

As an example, the following invocation of \htool{HQuant} would
generate a linear codebook in the file \texttt{linvq} using the data
stored in the files listed in \texttt{vq.scp}.
\begin{verbatim}
   HQuant -C config -s 4 -n 3 64 -n 4 16 -S vq.scp linvq
\end{verbatim}
Here the configuration file \texttt{config} specifies the
\texttt{TARGETKIND} as being \texttt{MFCC\_E\_D\_A}, i.e.\ static
coefficients plus deltas plus accelerations plus energy.  The
\texttt{-s} option requests that this parameterisation be split into 4
separate streams.  By default, each individual codebook has 256
entries; however, the \texttt{-n} option can be used to specify
alternative sizes.  If a tree-structured codebook is wanted rather than
a linear codebook, the \texttt{-t} option should be set.  Also, the
default is to use Euclidean distances both for building the codebook
and for subsequent coding.  Setting the \texttt{-d} option causes a
diagonal covariance Mahalanobis metric to be used and the \texttt{-f}
option causes a full covariance Mahalanobis metric to be used.

\index{hcopy@\htool{HCopy}}
\sidefig{vqtohmm}{55}{VQ Processing}{-4}{
Once the codebook is built, normal speech vector files can be converted
to discrete files using \htool{HCopy}.  This was explained previously
in section~\ref{s:vquant}.  The basic mechanism is to add the qualifier
\texttt{\_V} to the
\texttt{TARGETKIND}.\index{qualifiers!aaav@\texttt{\_V}}  This causes
\htool{HParm} to append a codebook index to each constructed observation
