Tutorial: Gaussian Statistics and Unsupervised Learning

Subsections:
- $K$-means algorithm
- Viterbi-EM algorithm for Gaussian clustering
- EM algorithm for Gaussian clustering
- Questions to $K$-means, Viterbi-EM and EM algorithm

Unsupervised training
In the previous section, we computed the models for the classes /a/, /e/,
/i/, /o/, and /y/ knowing a priori which training samples belong to which
class (we had a labeling of the training data). Hence, we performed a
supervised training of the Gaussian models. Now suppose that we only have
unlabeled training data that we want to separate into several classes (e.g.,
5 classes) without knowing a priori which point belongs to which class. This
is called unsupervised training. Several algorithms are available for that
purpose, among them the $K$-means, the EM (Expectation-Maximization), and the
Viterbi-EM algorithms.

All these algorithms are characterized by the following components:
- a set of models for the classes $q_k$ (not necessarily Gaussian), defined
  by a parameter set $\boldsymbol{\Theta}$ (means, variances, priors, ...);
- a measure of membership, indicating to what extent a data point "belongs"
  to a model;
- a "recipe" to update the model parameters as a function of the membership
  information.
The measure of membership usually takes the form of a distance measure or of
a probability measure. It replaces the missing labeling information and
permits the application of standard parameter estimation techniques. It also
implicitly defines a global criterion of "goodness of fit" of the models to
the data, e.g.:
- in the case of a distance measure, the models that are closer to the data
  characterize it better;
- in the case of a probability measure, the models with a larger likelihood
  for the data explain it better.
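As an illustration of these two flavours of membership, here is a minimal
MATLAB-style sketch computing both a squared Euclidean distance and a
Gaussian log-likelihood for a single data point. It is not part of the
tutorial's own code; the names x, mu_k and sigma_k and their values are
hypothetical.

    % Hypothetical example: membership of a 2-D point x in a class k
    % described by a mean mu_k and a covariance matrix sigma_k.
    x       = [730 1090];          % a data point (e.g., two formant values)
    mu_k    = [700 1100];          % class mean (assumed known)
    sigma_k = [100 0; 0 200];      % class covariance (assumed known)

    % 1) distance-based membership: squared Euclidean distance to the mean
    d_k = (x - mu_k) * (x - mu_k)';

    % 2) probability-based membership: Gaussian log-likelihood of x
    D    = length(x);
    ll_k = -0.5 * (x - mu_k) * inv(sigma_k) * (x - mu_k)' ...
           -0.5 * log(det(sigma_k)) - 0.5 * D * log(2*pi);

A smaller d_k, or a larger ll_k, means the point is better explained by
class k.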
<H3><A NAME="SECTION00031100000000000000">Synopsis of the algorithm:</A></H3><UL><LI>Start with <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN> initial prototypes <!-- MATH $\ensuremath\boldsymbol{\mu}_k$ --><SPAN CLASS="MATH"><IMG WIDTH="20" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img105.gif" ALT="$ \ensuremath\boldsymbol{\mu}_k$"></SPAN>, <!-- MATH $k=1,\ldots,K$ --><SPAN CLASS="MATH"><IMG WIDTH="81" HEIGHT="26" ALIGN="MIDDLE" BORDER="0" SRC="img140.gif" ALT="$ k=1,\ldots,K$"></SPAN>.
</LI><LI><B>Do</B>:<OL><LI>For each data-point <!-- MATH $\ensuremath\mathbf{x}_n$ --><SPAN CLASS="MATH"><IMG WIDTH="19" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img141.gif" ALT="$ \ensuremath\mathbf{x}_n$"></SPAN>, <!-- MATH $n=1,\ldots,N$ --><SPAN CLASS="MATH"><IMG WIDTH="81" HEIGHT="26" ALIGN="MIDDLE" BORDER="0" SRC="img142.gif" ALT="$ n=1,\ldots,N$"></SPAN>, compute the squared
Euclidean distance from the <!-- MATH $k^{\text{th}}$ --><SPAN CLASS="MATH"><IMG WIDTH="22" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img143.gif" ALT="$ k^{\text{th}}$"></SPAN> prototype:
<BR><DIV ALIGN="CENTER" CLASS="mathdisplay"><A NAME="eq:dist"></A><!-- MATH \begin{eqnarray}d_k(\ensuremath\mathbf{x}_n) & = & \, \left\| \ensuremath\mathbf{x}_n - \ensuremath\boldsymbol{\mu}_k \right\|^2 \nonumber \\ & = & (\ensuremath\mathbf{x}_n-\ensuremath\boldsymbol{\mu}_k)^{\mathsf T}(\ensuremath\mathbf{x}_n-\ensuremath\boldsymbol{\mu}_k) \nonumber
\end{eqnarray} --><TABLE CELLPADDING="0" ALIGN="CENTER" WIDTH="100%"><TR VALIGN="MIDDLE"><TD NOWRAP ALIGN="RIGHT"><IMG WIDTH="44" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img144.gif" ALT="$\displaystyle d_k(\ensuremath\mathbf{x}_n)$"></TD><TD WIDTH="10" ALIGN="CENTER" NOWRAP><IMG WIDTH="14" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img67.gif" ALT="$\displaystyle =$"></TD><TD ALIGN="LEFT" NOWRAP><IMG WIDTH="76" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img145.gif" ALT="$\displaystyle \left\Vert \ensuremath\mathbf{x}_n - \ensuremath\boldsymbol{\mu}_k \right\Vert^2$"></TD><TD CLASS="eqno" WIDTH=10 ALIGN="RIGHT"> </TD></TR><TR VALIGN="MIDDLE"><TD NOWRAP ALIGN="RIGHT"> </TD><TD WIDTH="10" ALIGN="CENTER" NOWRAP><IMG WIDTH="14" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img67.gif" ALT="$\displaystyle =$"></TD><TD ALIGN="LEFT" NOWRAP><IMG WIDTH="133" HEIGHT="32" ALIGN="MIDDLE" BORDER="0" SRC="img146.gif" ALT="$\displaystyle (\ensuremath\mathbf{x}_n-\ensuremath\boldsymbol{\mu}_k)^{\mathsf T}(\ensuremath\mathbf{x}_n-\ensuremath\boldsymbol{\mu}_k)$"></TD><TD CLASS="eqno" WIDTH=10 ALIGN="RIGHT"> </TD></TR></TABLE></DIV><BR CLEAR="ALL">
</LI><LI>Assign each data-point <!-- MATH $\ensuremath\mathbf{x}_n$ --><SPAN CLASS="MATH"><IMG WIDTH="19" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img141.gif" ALT="$ \ensuremath\mathbf{x}_n$"></SPAN> to its <SPAN CLASS="textit">closest prototype</SPAN>
<!-- MATH $\ensuremath\boldsymbol{\mu}_k$ --><SPAN CLASS="MATH"><IMG WIDTH="20" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img105.gif" ALT="$ \ensuremath\boldsymbol{\mu}_k$"></SPAN>, i.e., assign <!-- MATH $\ensuremath\mathbf{x}_n$ --><SPAN CLASS="MATH"><IMG WIDTH="19" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img141.gif" ALT="$ \ensuremath\mathbf{x}_n$"></SPAN> to the class <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img91.gif" ALT="$ q_k$"></SPAN> if
<!-- MATH \begin{displaymath}d_k(\ensuremath\mathbf{x}_n) \, < \, d_l(\ensuremath\mathbf{x}_n), \qquad \forall l \neq k\end{displaymath} --><P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><IMG WIDTH="179" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img147.gif" ALT="$\displaystyle d_k(\ensuremath\mathbf{x}_n) < d_l(\ensuremath\mathbf{x}_n), \qquad \forall l \neq k$"></DIV><P></P>
<EM>Note</EM>: using the squared Euclidean distance for the
classification gives the same result as using the true Euclidean
distance, since the square root is a monotonically growing
function. But the computational load is obviously smaller when the
square root is dropped.
</LI><LI>Replace each prototype with the mean of the data-points assigned to
the corresponding class;
</LI><LI>Go to 1.
</LI></OL><P></LI><LI><B>Until</B>: no further change occurs.
The global criterion for the present case is
$$ J = \sum_{k=1}^{K} \sum_{\mathbf{x}_n \in q_k} d_k(\mathbf{x}_n) $$
and represents the total squared distance between the data and the models
they belong to. This criterion is locally minimized by the algorithm.
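To make the synopsis concrete, here is a minimal MATLAB sketch of one
possible implementation of this loop. It is an illustration under
simplifying assumptions, not the kmeans explorer utility used below; the
names data, mu, labels and J are hypothetical.

    % Minimal K-means sketch: data is an N-by-D matrix of observations,
    % mu a K-by-D matrix of initial prototypes (e.g., K rows picked from data).
    [N, D]  = size(data);
    K       = size(mu, 1);
    labels  = zeros(N, 1);
    changed = true;

    while changed                            % "Until: no further change occurs"
        % 1) squared Euclidean distance of every point to every prototype
        d = zeros(N, K);
        for k = 1:K
            delta   = data - repmat(mu(k, :), N, 1);
            d(:, k) = sum(delta .^ 2, 2);
        end
        % 2) assign each point to its closest prototype
        [dmin, new_labels] = min(d, [], 2);
        changed = any(new_labels ~= labels);
        labels  = new_labels;
        % 3) replace each prototype with the mean of its assigned points
        for k = 1:K
            if any(labels == k)
                mu(k, :) = mean(data(labels == k, :), 1);
            end
        end
    end
    J = sum(dmin);   % global criterion: total squared distance to the prototypes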
Experiment:

Use the $K$-means explorer utility:

KMEANS K-means algorithm exploration tool
Launch it with KMEANS(DATA,NCLUST) where DATA is the matrix
of observations (one observation per row) and NCLUST is the
desired number of clusters.
The clusters are initialized with a heuristic that spreads
them randomly around mean(DATA) with standard deviation
sqrtm(cov(DATA)).
If you want to set your own initial clusters, use
KMEANS(DATA,MEANS) where MEANS is a cell array containing
NCLUST initial mean vectors.
Example: for two clusters
means{1} = [1 2]; means{2} = [3 4];
kmeans(data,means);

Launch kmeans with the data sample allvow, which was part of the file
vowels.mat and gathers all the simulated vowel data. Do several runs with
different initializations of the algorithm:

1. 5 initial clusters determined according to the default heuristic;
2. initial MEANS values equal to some data points (one possible way to do
   this is sketched after this list);
3. initial MEANS values equal to
   $\{\boldsymbol{\mu}_{\text{/a/}}, \boldsymbol{\mu}_{\text{/e/}},
   \boldsymbol{\mu}_{\text{/i/}}, \boldsymbol{\mu}_{\text{/o/}},
   \boldsymbol{\mu}_{\text{/y/}}\}$.
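For case 2, one possible way to pick the initial means from the data is the
following sketch. It assumes the two-argument form of randperm is available
(recent MATLAB versions); otherwise the first 5 entries of randperm(N) can be
used instead.

    % Pick 5 data points at random as initial means for kmeans (case 2).
    idx   = randperm(size(allvow, 1), 5);   % 5 distinct row indices
    means = cell(1, 5);
    for k = 1:5
        means{k} = allvow(idx(k), :);
    end
    kmeans(allvow, means);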
Iterate the algorithm until convergence. Observe the evolution of the cluster
centers, of the data-point attribution chart, and of the total squared
Euclidean distance. (It is possible to zoom these plots: left-click inside
the axes to zoom in $2\times$ centered on the point under the mouse;
right-click to zoom out; click and drag to zoom into an area; double-click to
reset the figure to the original view.) Observe the mean values found after
convergence of the algorithm.

Example:
» kmeans(allvow,5);

- or -

» means = { mu_a, mu_e, mu_i, mu_o, mu_y };
» kmeans(allvow,means);

Enlarge the window, then push the buttons, zoom, etc. After convergence, use:

» for k=1:5, disp(kmeans_result_means{k}); end

to see the resulting means.
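If the supervised means from the previous section (mu_a, ..., mu_y) are still
in the workspace, a small check such as the following can help see which
cluster ended up on which vowel. The matrix D is introduced here only for
illustration and is not part of the tutorial's own code.

    % Distance from each K-means centre to each supervised vowel mean.
    sup = {mu_a, mu_e, mu_i, mu_o, mu_y};
    D   = zeros(5, 5);
    for k = 1:5
        for j = 1:5
            D(k, j) = norm(kmeans_result_means{k}(:) - sup{j}(:));
        end
    end
    disp(D);   % row k = cluster k, column j = vowel j; small entries = likely match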
Think about the following questions:

1. Does the final solution depend on the initialization of the algorithm?
2. Describe the evolution of the total squared Euclidean distance.
3. What is the nature of the discriminant surfaces corresponding to a minimum
   Euclidean distance classification scheme?
4. Is the algorithm suitable for fitting Gaussian clusters?

Viterbi-EM algorithm for Gaussian clustering
<H3><A NAME="SECTION00032100000000000000">Synopsis of the algorithm:</A></H3><UL><LI>Start from <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img4.gif" ALT="$ K$"></SPAN> initial Gaussian models <!-- MATH ${\cal N}(\ensuremath\boldsymbol{\mu}_{k},\ensuremath\boldsymbol{\Sigma}_{k}),\; k=1\ldots K$ --><SPAN CLASS="MATH"><IMG WIDTH="146" HEIGHT="28" ALIGN="MIDDLE" BORDER="0" SRC="img151.gif" ALT="$ {\cal N}(\ensuremath\boldsymbol{\mu}_{k},\ensuremath\boldsymbol{\Sigma}_{k}),\; k=1\ldots K$"></SPAN>, characterized by the set of parameters <!-- MATH $\ensuremath\boldsymbol{\Theta}$ --><SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img61.gif" ALT="$ \ensuremath\boldsymbol{\Theta}$"></SPAN> (i.e., the
set of all means and variances <!-- MATH $\ensuremath\boldsymbol{\mu}_k$ --><SPAN CLASS="MATH"><IMG WIDTH="20" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img105.gif" ALT="$ \ensuremath\boldsymbol{\mu}_k$"></SPAN> and <!-- MATH $\ensuremath\boldsymbol{\Sigma}_k$ --><SPAN CLASS="MATH"><IMG WIDTH="22" HEIGHT="26" ALIGN="MIDDLE" BORDER="0" SRC="img106.gif" ALT="$ \ensuremath\boldsymbol{\Sigma}_k$"></SPAN>, <!-- MATH $k=1\ldots K$
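As a rough sketch of what one iteration of such a hard-assignment
("Viterbi") EM scheme for Gaussian clustering generally looks like: each
point is assigned to the Gaussian with the largest (log-)likelihood, and the
parameters are re-estimated from the points assigned to each class. This is
an illustration under standard assumptions, not the tutorial's own code; the
names data, mu, sigma and priors are hypothetical.

    % One Viterbi-EM iteration sketch: hard-assign each point to the Gaussian
    % with the highest log-likelihood, then re-estimate that Gaussian's
    % parameters from the points assigned to it.
    [N, D] = size(data);
    K      = numel(priors);
    logp   = zeros(N, K);
    for k = 1:K
        delta      = data - repmat(mu{k}, N, 1);
        logp(:, k) = log(priors(k)) ...
                     - 0.5 * sum((delta / sigma{k}) .* delta, 2) ...
                     - 0.5 * log(det(sigma{k})) - 0.5 * D * log(2*pi);
    end
    [dummy, labels] = max(logp, [], 2);      % hard ("Viterbi") assignment

    for k = 1:K                              % re-estimation from hard labels
        Xk        = data(labels == k, :);
        priors(k) = size(Xk, 1) / N;
        mu{k}     = mean(Xk, 1);
        sigma{k}  = cov(Xk);
    end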