<!-- MATH \begin{displaymath}\ensuremath\boldsymbol{\mu}_{k}^{(i+1)} = \frac{\sum_{n=1}^{N} \ensuremath\mathbf{x}_nP(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})}
{\sum_{n=1}^{N} P(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})} \end{displaymath} --><P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><IMG WIDTH="212" HEIGHT="55" ALIGN="MIDDLE" BORDER="0" SRC="img173.gif" ALT="$\displaystyle \ensuremath\boldsymbol{\mu}_{k}^{(i+1)} = \frac{\sum_{n=1}^{N} \e......P(q_k^{(i)}\vert\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})} $"></DIV><P></P>
</LI><LI>update the variances:
<!-- MATH \begin{displaymath}\ensuremath\boldsymbol{\Sigma}_{k}^{(i+1)} = \frac{\sum_{n=1}^{N} P(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})\;(\ensuremath\mathbf{x}_n - \ensuremath\boldsymbol{\mu}_k^{(i+1)})(\ensuremath\mathbf{x}_n - \ensuremath\boldsymbol{\mu}_k^{(i+1)})^{\mathsf T}}
{\sum_{n=1}^{N} P(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})} \end{displaymath} --><P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><IMG WIDTH="372" HEIGHT="55" ALIGN="MIDDLE" BORDER="0" SRC="img174.gif" ALT="$\displaystyle \ensuremath\boldsymbol{\Sigma}_{k}^{(i+1)} = \frac{\sum_{n=1}^{N}......P(q_k^{(i)}\vert\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})} $"></DIV><P></P>
</LI><LI>update the priors:
<!-- MATH \begin{displaymath}P(q_k^{(i+1)}|\ensuremath\boldsymbol{\Theta}^{(i+1)}) = \frac{1}{N} \sum_{n=1}^{N} P(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)}) \end{displaymath} --><P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><IMG WIDTH="258" HEIGHT="58" ALIGN="MIDDLE" BORDER="0" SRC="img175.gif" ALT="$\displaystyle P(q_k^{(i+1)}\vert\ensuremath\boldsymbol{\Theta}^{(i+1)}) = \frac......P(q_k^{(i)}\vert\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)}) $"></DIV><P></P></LI></UL>
In the present case, all the data points participate in the update of all
the models, but their participation is weighted by the value of
<!-- MATH $P(q_k^{(i)}|\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})$ --><SPAN CLASS="MATH"><IMG WIDTH="97" HEIGHT="36" ALIGN="MIDDLE" BORDER="0" SRC="img167.gif" ALT="$ P(q_k^{(i)}\vert\ensuremath\mathbf{x}_n,\ensuremath\boldsymbol{\Theta}^{(i)})$"></SPAN>. (A code sketch of one complete iteration is given below, after this list.)
</LI><LI>Go to 1.
</LI></OL>
</LI><LI><B>Until</B>: the total likelihood increase for the training data
falls below some desired threshold.
</LI></UL>
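As a complement, here is a minimal MATLAB/Octave sketch of one complete EM
iteration (E-step followed by the M-step updates above), together with the
total log-likelihood used for the stopping test. It is only an illustration
under stated assumptions: the function name <TT>em_step</TT> is hypothetical,
<TT>mvnpdf</TT> is assumed to be available (Statistics Toolbox), and this is
not the code of the <TT>emalgo</TT> tool used below.
<PRE>
function [means, vars, priors, L] = em_step(X, means, vars, priors)
% One EM iteration for a mixture of Gaussians (illustrative sketch).
% X: N-by-d data, one observation per row; means{k}: 1-by-d row vectors;
% vars{k}: d-by-d covariance matrices; priors: 1-by-K vector.
[N, d] = size(X);
K = numel(means);
% E-step: posterior probabilities post(n,k) = P(q_k | x_n, Theta)
post = zeros(N, K);
for k = 1:K
  post(:,k) = priors(k) * mvnpdf(X, means{k}, vars{k});
end
px = sum(post, 2);                 % p(x_n | Theta), the mixture density
L  = sum(log(px));                 % total log-likelihood (stopping criterion)
post = post ./ repmat(px, 1, K);
% M-step: re-estimate the means, covariances and priors
for k = 1:K
  Nk = sum(post(:,k));
  mu = (post(:,k)' * X) / Nk;                          % weighted mean
  Xc = X - repmat(mu, N, 1);                           % centered data
  means{k}  = mu;
  vars{k}   = (Xc' * (Xc .* repmat(post(:,k), 1, d))) / Nk;
  priors(k) = Nk / N;
end
</PRE>
Iterating this step and stopping when <TT>L</TT> no longer increases
significantly corresponds to the "Until" test above.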
The global criterion in the present case is the joint likelihood of
all data with respect to all the models:
<P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><TABLE CELLPADDING="0" WIDTH="100%" ALIGN="CENTER"><TR VALIGN="MIDDLE"><TD NOWRAP ALIGN="RIGHT"><SPAN CLASS="MATH"><IMG WIDTH="122" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img176.gif" ALT="$\displaystyle {\cal L}(\ensuremath\boldsymbol{\Theta}) = \log p(X\vert\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP ALIGN="LEFT"><SPAN CLASS="MATH"><IMG WIDTH="124" HEIGHT="47" ALIGN="MIDDLE" BORDER="0" SRC="img177.gif" ALT="$\displaystyle = \log \sum_Q p(X,Q\vert\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT"> </TD></TR><TR VALIGN="MIDDLE"><TD> </TD><TD NOWRAP ALIGN="LEFT"><SPAN CLASS="MATH"><IMG WIDTH="175" HEIGHT="47" ALIGN="MIDDLE" BORDER="0" SRC="img178.gif" ALT="$\displaystyle = \log \sum_Q P(Q\vert X,\ensuremath\boldsymbol{\Theta})p(X\vert\ensuremath\boldsymbol{\Theta})$"> (Bayes)</SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT"> </TD></TR><TR VALIGN="MIDDLE"><TD> </TD><TD NOWRAP ALIGN="LEFT"><SPAN CLASS="MATH"><IMG WIDTH="177" HEIGHT="58" ALIGN="MIDDLE" BORDER="0" SRC="img179.gif" ALT="$\displaystyle = \log \sum_{k=1}^{K} P(q_k\vert X,\ensuremath\boldsymbol{\Theta}) p(X\vert\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT"> </TD></TR></TABLE></DIV><BR CLEAR="ALL"><P></P>Applying Jensen's inequality <!-- MATH $\left( \log \sum_j \lambda_j y_j \geq\sum_j \lambda_j \log y_j \mbox{ if } \sum_j \lambda_j = 1 \right)$ --><SPAN CLASS="MATH"><IMG WIDTH="264" HEIGHT="39" ALIGN="MIDDLE" BORDER="0" SRC="img180.gif" ALT="$ \left( \log \sum_j \lambda_j y_j \geq\sum_j \lambda_j \log y_j \mbox{ if } \sum_j \lambda_j = 1 \right)$"></SPAN>,
we obtain:
<P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><TABLE CELLPADDING="0" WIDTH="100%" ALIGN="CENTER"><TR VALIGN="MIDDLE"><TD NOWRAP ALIGN="RIGHT"><SPAN CLASS="MATH"><IMG WIDTH="87" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img181.gif" ALT="$\displaystyle {\cal L}(\ensuremath\boldsymbol{\Theta}) \ge J(\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP ALIGN="LEFT"><SPAN CLASS="MATH"><IMG WIDTH="180" HEIGHT="58" ALIGN="MIDDLE" BORDER="0" SRC="img182.gif" ALT="$\displaystyle = \sum_{k=1}^{K} P(q_k\vert X,\ensuremath\boldsymbol{\Theta}) \log p(X\vert\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT"> </TD></TR><TR VALIGN="MIDDLE"><TD> </TD><TD NOWRAP ALIGN="LEFT"><SPAN CLASS="MATH"><IMG WIDTH="211" HEIGHT="58" ALIGN="MIDDLE" BORDER="0" SRC="img183.gif" ALT="$\displaystyle = \sum_{k=1}^{K} \sum_{n=1}^{N} P(q_k\vert\ensuremath\mathbf{x}_n......bol{\Theta}) \log p(\ensuremath\mathbf{x}_n\vert\ensuremath\boldsymbol{\Theta})$"></SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT"> </TD></TR></TABLE></DIV><BR CLEAR="ALL"><P></P>Hence, the criterion <!-- MATH $J(\ensuremath\boldsymbol{\Theta})$ --><SPAN CLASS="MATH"><IMG WIDTH="36" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img184.gif" ALT="$ J(\ensuremath\boldsymbol{\Theta})$"></SPAN> represents a lower bound on <!-- MATH ${\cal L}(\ensuremath\boldsymbol{\Theta})$ --><SPAN CLASS="MATH"><IMG WIDTH="36" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img185.gif" ALT="$ {\cal L}(\ensuremath\boldsymbol{\Theta})$"></SPAN>. This criterion is locally maximized by the algorithm.<P><H3><A NAME="SECTION00033200000000000000">Experiment:</A></H3>
Use the EM explorer utility:
<PRE>
EMALGO EM algorithm explorer
Launch it with EMALGO(DATA,NCLUST) where DATA is the matrix
of observations (one observation per row) and NCLUST is the
desired number of clusters.
The clusters are initialized with a heuristic that spreads
them randomly around mean(DATA) with standard deviation
sqrtm(cov(DATA)*10). Their initial covariance is set to cov(DATA).
If you want to set your own initial clusters, use
EMALGO(DATA,MEANS,VARS) where MEANS and VARS are cell arrays
containing respectively NCLUST initial mean vectors and NCLUST
initial covariance matrices. In this case, the initial a-priori
probabilities are set equal to 1/NCLUST.
To set your own initial priors, use EMALGO(DATA,MEANS,VARS,PRIORS)
where PRIORS is a vector containing NCLUST a priori probabilities.
Example: for two clusters
means{1} = [1 2]; means{2} = [3 4];
vars{1} = [2 0;0 2]; vars{2} = [1 0;0 1];
emalgo(data,means,vars);</PRE>
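For the four-argument form mentioned in the help text (setting the initial
priors as well), a call could look as follows; the uniform values are only an
illustration for the two-cluster example above:
<PRE>
priors = [0.5 0.5];              % one a priori probability per cluster
emalgo(data,means,vars,priors);
</PRE>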
Launch <TT>emalgo</TT> again with the same dataset <TT>allvow</TT>. Do
several runs with different initializations of the algorithm:<OL><LI>5 clusters determined according to the default heuristic;
</LI><LI>initial <TT>MEANS</TT> values equal to some data points, and some
random <TT>VARS</TT> values (e.g., <TT>cov(allvow)</TT> for all the
classes); a sketch of this initialization is given after the list;
</LI><LI>the initial <TT>MEANS</TT> and <TT>VARS</TT> values found by the
<SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means algorithm.
</LI><LI>initial <TT>MEANS</TT> values equal to <!-- MATH $\{\ensuremath\boldsymbol{\mu}_{\text{/a/}},\ensuremath\boldsymbol{\mu}_{\text{/e/}}, \ensuremath\boldsymbol{\mu}_{\text{/i/}}, \ensuremath\boldsymbol{\mu}_{\text{/o/}},
\ensuremath\boldsymbol{\mu}_{\text{/y/}}\}$ --><SPAN CLASS="MATH"><IMG WIDTH="176" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img149.gif" ALT="$ \{\ensuremath\boldsymbol{\mu}_{\text{/a/}},\ensuremath\boldsymbol{\mu}_{\tex......emath\boldsymbol{\mu}_{\text{/o/}},\ensuremath\boldsymbol{\mu}_{\text{/y/}}\}$"></SPAN>, <TT>VARS</TT> values equal to <BR>
<!-- MATH $\{\ensuremath\boldsymbol{\Sigma}_{\text{/a/}}, \ensuremath\boldsymbol{\Sigma}_{\text{/e/}}, \ensuremath\boldsymbol{\Sigma}_{\text{/i/}},\ensuremath\boldsymbol{\Sigma}_{\text{/o/}}, \ensuremath\boldsymbol{\Sigma}_{\text{/y/}}\}$ --><SPAN CLASS="MATH"><IMG WIDTH="185" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img164.gif" ALT="$ \{\ensuremath\boldsymbol{\Sigma}_{\text{/a/}}, \ensuremath\boldsymbol{\Sigma}_......\boldsymbol{\Sigma}_{\text{/o/}}, \ensuremath\boldsymbol{\Sigma}_{\text{/y/}}\}$"></SPAN>, and <TT>PRIORS</TT> values
equal to
<!-- MATH $[P_{\text{/a/}},P_{\text{/e/}},P_{\text{/i/}},P_{\text{/o/}},P_{\text{/y/}}]$ --><SPAN CLASS="MATH"><IMG WIDTH="165" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img165.gif" ALT="$ [P_{\text{/a/}},P_{\text{/e/}},P_{\text{/i/}},P_{\text{/o/}},P_{\text{/y/}}]$"></SPAN>;
</LI><LI>initial <TT>MEANS</TT> and <TT>VARS</TT> values chosen by yourself.
</LI></OL>
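As an illustration of case 2 above, the initial means can be taken from
randomly chosen data points, with the same covariance for every cluster. This
is only a sketch: it assumes that <TT>allvow</TT> is already loaded, and the
variable names are arbitrary.
<PRE>
nclust = 5;
idx = randperm(size(allvow,1));      % random order of the observations
for k = 1:nclust
  means{k} = allvow(idx(k),:);       % a randomly picked data point
  vars{k}  = cov(allvow);            % same covariance for all clusters
end
emalgo(allvow,means,vars);
</PRE>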
(If you have time, also increase the number of clusters and experiment with
the algorithm again.)<P>Iterate the algorithm until the total likelihood converges
asymptotically. Observe the evolution of the clusters and of the total
likelihood curve. (In the EM case, the chart of data-point assignments is not
shown, because each data point participates in the update of every cluster.)
Observe the mean, variance and prior values found after the convergence of
the algorithm. Compare them with the values found in
section <A HREF="node2.html#gaussmod">2.2</A>.<P><H3><A NAME="SECTION00033300000000000000">Example:</A></H3>
<TT>» emalgo(allvow,5);</TT> <BR>
- or - <BR>
<TT>» means = { mu_a, mu_e, mu_i, mu_o, mu_y };</TT> <BR>
<TT>» vars = { sigma_a, sigma_e, sigma_i, sigma_o, sigma_y };</TT> <BR>
<TT>» emalgo(allvow,means,vars);</TT> <BR>Enlarge the window, then push the buttons, zoom etc.
After convergence, use:<BR>
<TT>» for k=1:5; disp(emalgo_result_means{k}); end</TT> <BR>
<TT>» for k=1:5; disp(emalgo_result_vars{k}); end</TT> <BR>
<TT>» for k=1:5; disp(emalgo_result_priors(k)); end</TT> <BR>to see the resulting means, variances and priors.<P><H3><A NAME="SECTION00033400000000000000">Question:</A></H3><OL><LI>Does the final solution depend on the initialization of the
algorithm?
</LI><LI>Describe the evolution of the total likelihood. Is it monotonic?
</LI><LI>In terms of optimization of the likelihood, what does the final
solution correspond to?
</LI><LI>Is the algorithm suitable for fitting Gaussian clusters?
</LI></OL><P><H2><A NAME="SECTION00034000000000000000">Questions on the <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means, Viterbi-EM and EM algorithms:</A></H2>
<H3><A NAME="SECTION00034100000000000000">Find the right statements:</A></H3>
<DL COMPACT><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>With the <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means algorithm the solution is independent
of the initialization.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>With the <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means algorithm the discriminant surfaces are
linear.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>The <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means algorithm is well suited for fitting
Gaussian clusters.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>In the <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN>-means algorithm the global criterion that is
minimized is the maximum likelihood.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>In all 3 algorithms the measure used as the global criterion
decreases monotonically.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>In the Viterbi-EM and EM algorithms the solution is
highly dependent upon initialization.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>The EM algorithm is best suited for fitting Gaussian
clusters.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>It is an easy task to guess the parameters for
initialization.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>With the Viterbi-EM algorithm the discriminant surfaces
are linear.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>With the EM algorithm the discriminant surfaces have the
form of (hyper)parabolas.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>The EM algorithm needs less computational effort than
the Viterbi-EM algorithm.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>In the EM algorithm and the Viterbi-EM algorithm, the
same global criterion is used.
</DD><DT><SPAN CLASS="MATH"><IMG WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0" SRC="img43.gif" ALT="$ \Box$"></SPAN></DT><DD>The EM algorithm finds a global optimum.
</DD></DL><P><DIV CLASS="navigation"><br><table border=0 cellspacing=0 callpadding=0 width=100% class="tut_nav"><tr valign=middle class="tut_nav"><td valign=middle align=left width=1% class="tut_nav"><A NAME="tex2html41" HREF="node2.html"><IMG ALIGN="absmiddle" BORDER="0" ALT="previous" SRC="prev.gif"></A></td><td valign=middle align=left class="tut_nav"> <A NAME="tex2html42" HREF="node2.html">Statistical pattern recognition</A></td></tr></table></DIV><!--End of Navigation Panel--></BODY></HTML>