- Start from $K$ initial Gaussian models $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ ($k = 1\ldots K$). Set the initial prior probabilities $P(q_k)$ to $1/K$.
- Do:
  1. Classify each data point using Bayes' rule.

     This step is equivalent to having a set $Q$ of boolean hidden variables that give a labeling of the data by taking the value 1 (belongs) or 0 (does not belong) for each class $q_k$ and each point $\mathbf{x}_n$. The value of $Q$ that maximizes $p(X,Q|\boldsymbol{\Theta})$ tells precisely which is the most probable model for each point of the whole set $X$ of training data.

     Hence, each data point $\mathbf{x}_n$ is assigned to its most probable cluster $q_k$.

  2. Update the parameters ($i$ is the iteration index; a MATLAB sketch of one full iteration follows this synopsis):
     - update the means: $\boldsymbol{\mu}_k^{(i+1)} =$ mean of the points belonging to $q_k^{(i)}$;
     - update the variances: $\boldsymbol{\Sigma}_k^{(i+1)} =$ variance of the points belonging to $q_k^{(i)}$;
     - update the priors:

       $P(q_k^{(i+1)}|\boldsymbol{\Theta}^{(i+1)}) = \dfrac{\text{number of training points belonging to } q_k^{(i)}}{\text{total number of training points}}$

  3. Go to 1.
- Until: no further change occurs.
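For concreteness, here is a minimal MATLAB sketch of one such iteration. It is only an illustration of the two steps above, not the code of the viterb utility; data (an N-by-d observation matrix), means and vars (1-by-K cell arrays) and priors (a 1-by-K vector) are assumed variable names, and gauss_pdf is a small helper defined on the spot.

    % Helper: Gaussian density evaluated at each row of X (implicit expansion, R2016b+).
    gauss_pdf = @(X, mu, S) exp(-0.5 * sum(((X - mu(:)') / S) .* (X - mu(:)'), 2)) ...
                            ./ sqrt((2*pi)^size(X, 2) * det(S));

    [N, K] = deal(size(data, 1), numel(means));

    % Step 1 -- classification: assign each point to its most probable class (Bayes' rule).
    score = zeros(N, K);
    for k = 1:K
        score(:, k) = priors(k) * gauss_pdf(data, means{k}, vars{k});
    end
    [~, labels] = max(score, [], 2);        % hard label: argmax_k P(q_k) p(x_n|q_k)

    % Step 2 -- re-estimate each class from the points assigned to it.
    for k = 1:K
        members   = data(labels == k, :);
        means{k}  = mean(members, 1);
        vars{k}   = cov(members);
        priors(k) = size(members, 1) / N;
    end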
The global criterion in the present case is

$\mathcal{L}(\boldsymbol{\Theta}) \;=\; \sum_{X} P(X|\boldsymbol{\Theta}) \;=\; \sum_{Q} \sum_{X} p(X,Q|\boldsymbol{\Theta}) \;=\; \sum_{k=1}^{K} \sum_{\mathbf{x}_n \in q_k} \log p(\mathbf{x}_n|\boldsymbol{\Theta}_k),$

and represents the joint likelihood of the data with respect to the models they belong to. This criterion is locally maximized by the algorithm.
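With the hypothetical labels, means and vars variables from the sketch above, this criterion can be evaluated along the following lines (again an illustration, not part of viterb):

    % Total log-likelihood of the hard-assigned training data.
    L = 0;
    for k = 1:numel(means)
        members = data(labels == k, :);
        L = L + sum(log(gauss_pdf(members, means{k}, vars{k})));
    end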
Experiment:

Use the Viterbi-EM explorer utility:
 VITERB Viterbi version of the EM algorithm

   Launch it with VITERB(DATA,NCLUST) where DATA is the matrix
   of observations (one observation per row) and NCLUST is the
   desired number of clusters.

   The clusters are initialized with a heuristic that spreads
   them randomly around mean(DATA) with standard deviation
   sqrtm(cov(DATA)). Their initial covariance is set to cov(DATA).

   If you want to set your own initial clusters, use
   VITERB(DATA,MEANS,VARS) where MEANS and VARS are cell arrays
   containing respectively NCLUST initial mean vectors and NCLUST
   initial covariance matrices. In this case, the initial a-priori
   probabilities are set equal to 1/NCLUST.

   To set your own initial priors, use VITERB(DATA,MEANS,VARS,PRIORS)
   where PRIORS is a vector containing NCLUST a priori probabilities.

   Example: for two clusters
     means{1} = [1 2]; means{2} = [3 4];
     vars{1} = [2 0;0 2]; vars{2} = [1 0;0 1];
      viterb(data,means,vars);
Launch viterb with the dataset allvow. Do several runs with different initializations of the algorithm:

1. 5 initial clusters determined according to the default heuristic;
2. initial MEANS values equal to some data points, and some random VARS values (try for instance cov(allvow) for all the classes); see the sketch after this list;
3. the initial MEANS, VARS and PRIORS values found by the $K$-means algorithm;
4. initial MEANS values equal to $\{\boldsymbol{\mu}_{\text{/a/}}, \boldsymbol{\mu}_{\text{/e/}}, \boldsymbol{\mu}_{\text{/i/}}, \boldsymbol{\mu}_{\text{/o/}}, \boldsymbol{\mu}_{\text{/y/}}\}$, VARS values equal to $\{\boldsymbol{\Sigma}_{\text{/a/}}, \boldsymbol{\Sigma}_{\text{/e/}}, \boldsymbol{\Sigma}_{\text{/i/}}, \boldsymbol{\Sigma}_{\text{/o/}}, \boldsymbol{\Sigma}_{\text{/y/}}\}$, and PRIORS values equal to $[P_{\text{/a/}}, P_{\text{/e/}}, P_{\text{/i/}}, P_{\text{/o/}}, P_{\text{/y/}}]$;
5. initial MEANS and VARS values chosen by yourself.
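For points 2 and 4, the initial values could be built along the following lines. This is only a sketch: the random index choice is one possibility among many, mu_a ... sigma_y denote the supervised estimates of section 2.2 (as in the example below), and the prior names P_a ... P_y are assumed here and may differ in your workspace.

    % Point 2: means taken from randomly chosen data points, cov(allvow) for every class.
    idx = randperm(size(allvow, 1), 5);     % 5 random row indices
    for k = 1:5
        means{k} = allvow(idx(k), :);
        vars{k}  = cov(allvow);
    end
    viterb(allvow, means, vars);

    % Point 4: supervised estimates from section 2.2, including the class priors.
    means  = { mu_a, mu_e, mu_i, mu_o, mu_y };
    vars   = { sigma_a, sigma_e, sigma_i, sigma_o, sigma_y };
    priors = [ P_a, P_e, P_i, P_o, P_y ];
    viterb(allvow, means, vars, priors);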
Iterate the algorithm until it converges. Observe the evolution of the clusters, of the data-point attribution chart and of the total likelihood curve. Observe the mean, variance and prior values found after the convergence of the algorithm. Compare them with the values computed in section 2.2 (using supervised training).

Example:
» viterb(allvow,5);

- or -

» means = { mu_a, mu_e, mu_i, mu_o, mu_y };
» vars = { sigma_a, sigma_e, sigma_i, sigma_o, sigma_y };
» viterb(allvow,means,vars);

Enlarge the window, then push the buttons, zoom, etc. After convergence, use:

» for k=1:5, disp(viterb_result_means{k}); end
» for k=1:5, disp(viterb_result_vars{k}); end
» for k=1:5, disp(viterb_result_priors(k)); end

to see the resulting means, variances and priors.

Question:

1. Does the final solution depend on the initialization of the algorithm?
2. Describe the evolution of the total likelihood. Is it monotonic?
3. In terms of optimization of the likelihood, what does the final solution correspond to?
4. What is the nature of the discriminant surfaces corresponding to the Gaussian classification?
5. Is the algorithm suitable for fitting Gaussian clusters?

EM algorithm for Gaussian clustering
Synopsis of the algorithm:

- Start from $K$ initial Gaussian models $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, $k = 1\ldots K$, with equal priors set to $P(q_k) = 1/K$.
- Do:
  1. Estimation step: compute the probability $P(q_k^{(i)}|\mathbf{x}_n, \boldsymbol{\Theta}^{(i)})$ that each data point $\mathbf{x}_n$ belongs to the class $q_k^{(i)}$:

     $P(q_k^{(i)}|\mathbf{x}_n, \boldsymbol{\Theta}^{(i)}) = \dfrac{P(q_k^{(i)}|\boldsymbol{\Theta}^{(i)}) \cdot p(\mathbf{x}_n|q_k^{(i)}, \boldsymbol{\Theta}^{(i)})}{p(\mathbf{x}_n|\boldsymbol{\Theta}^{(i)})} = \dfrac{P(q_k^{(i)}|\boldsymbol{\Theta}^{(i)}) \cdot p(\mathbf{x}_n|\boldsymbol{\mu}_k^{(i)}, \boldsymbol{\Sigma}_k^{(i)})}{\sum_j P(q_j^{(i)}|\boldsymbol{\Theta}^{(i)}) \cdot p(\mathbf{x}_n|\boldsymbol{\mu}_j^{(i)}, \boldsymbol{\Sigma}_j^{(i)})}$

     This step is equivalent to having a set $Q$ of continuous hidden variables, taking values in the interval $[0,1]$, that give a labeling of the data by telling to what extent a point $\mathbf{x}_n$ belongs to the class $q_k$. This represents a soft classification, since a point can belong, e.g., by 60% to class 1 and by 40% to class 2 (think of Schrödinger's cat, which is 60% alive and 40% dead as long as nobody opens the box or performs Bayesian classification). A small MATLAB sketch of this computation is given below.
  2. Maximization step:
     - update the means:
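As a rough illustration of the estimation step above (not the code of the actual tool), the soft memberships can be computed with the same hypothetical gauss_pdf, data, means, vars and priors names used in the earlier sketches:

    % Soft memberships P(q_k | x_n, Theta): one row per data point, one column per class.
    K = numel(means);
    post = zeros(size(data, 1), K);
    for k = 1:K
        post(:, k) = priors(k) * gauss_pdf(data, means{k}, vars{k});
    end
    post = post ./ sum(post, 2);   % normalize each row so the memberships sum to 1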
