<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><!--Converted with LaTeX2HTML 2002-2-1 (1.70)original version by: Nikos Drakos, CBLU, University of Leeds* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan* with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --><HTML><HEAD><TITLE>Statistical pattern recognition</TITLE><META NAME="description" CONTENT="Statistical pattern recognition"><META NAME="keywords" CONTENT="Gaussian"><META NAME="resource-type" CONTENT="document"><META NAME="distribution" CONTENT="global"><META NAME="Generator" CONTENT="LaTeX2HTML v2002-2-1"><META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css"><LINK REL="STYLESHEET" HREF="../../ci.css"><LINK REL="next" HREF="node3.html"><LINK REL="previous" HREF="node1.html"><LINK REL="up" HREF="Gaussian.html"><LINK REL="next" HREF="node3.html"></HEAD><BODY bgcolor="#ffffff"><DIV CLASS="navigation"><table border=0 cellspacing=0 callpadding=0 width=100% class="tut_nav"><tr valign=middle class="tut_nav"><td valign=middle align=left class="tut_nav"><i><b> <A NAME="tex2html32" HREF="Gaussian.html">Tutorial: Gaussian Statistics and Unsupervised Learning</A></b></i></td><td valign=middle align=right class="tut_nav"> <A NAME="tex2html25" HREF="node1.html"><IMG ALIGN="absmiddle" BORDER="0" ALT="previous" SRC="prev.gif"></A> <a href="index.html"><img ALIGN="absmiddle" BORDER="0" ALT="Contents" src="contents.gif"></a> <A NAME="tex2html33" HREF="node3.html"><IMG ALIGN="absmiddle" BORDER="0" ALT="next" SRC="next.gif"></A></dt></tr></table></DIV><!--End of Navigation Panel--><!--Table of Child-Links--><br><A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></A><UL CLASS="ChildLinks"><LI><A NAME="tex2html35" HREF="node2.html#SECTION00021000000000000000">A-priori class probabilities</A><LI><A NAME="tex2html36" HREF="node2.html#SECTION00022000000000000000">Gaussian modeling of classes</A><LI><A NAME="tex2html37" HREF="node2.html#SECTION00023000000000000000">Bayesian classification</A><LI><A NAME="tex2html38" HREF="node2.html#SECTION00024000000000000000">Discriminant surfaces</A></UL><!--End of Table of Child-Links--><HR><H1><A NAME="SECTION00020000000000000000">Statistical pattern recognition</A></H1><P><H2><A NAME="SECTION00021000000000000000"></A>
<A NAME="sec:apriori"></A><BR>A-priori class probabilities</H2>
<H3><A NAME="SECTION00021100000000000000">Experiment:</A></H3>
Load the data from the file ``vowels.mat''. This file contains a database
of 2-dimensional samples of speech features in the form of formant
frequencies (the first and second spectral formants, $[F_1,F_2]$). The
formant frequency samples represent features that would be extracted from
the speech signal for several occurrences of the vowels /a/, /e/, /i/,
/o/ and /y/ [1]. They are grouped in matrices of size $N \times 2$, where
each of the $N$ rows contains the two formant frequencies for one
occurrence of a vowel.

Assuming that the whole database adequately covers an imaginary language
made up only of /a/'s, /e/'s, /i/'s, /o/'s and /y/'s, compute the
probability $P(q_k)$ of each class $q_k$,
$k \in \{\text{/a/},\text{/e/},\text{/i/},\text{/o/},\text{/y/}\}$.
Which is the most common and which the least common phoneme in our
imaginary language?

Example:
» clear all; load vowels.mat; whos
» Na = size(a,1); Ne = size(e,1); Ni = size(i,1); No = size(o,1); Ny = size(y,1);
» N = Na + Ne + Ni + No + Ny;
» Pa = Na/N
» Pi = Ni/N
etc.
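To answer the question, the remaining priors can be computed in the same
way; a minimal sketch (assuming the counts Na...Ny computed above):

» Pe = Ne/N; Po = No/N; Py = Ny/N;
» P = [Pa Pe Pi Po Py]  % priors, in the order /a/, /e/, /i/, /o/, /y/
» [pmax,kmax] = max(P)  % kmax indexes the most common phoneme
» [pmin,kmin] = min(P)  % kmin indexes the least common phoneme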
<A NAME="gaussmod"></A><BR>Gaussian modeling of classes</H2><P><H3><A NAME="SECTION00022100000000000000">Experiment:</A></H3>
Plot each vowel's data as a cloud of points in the 2-D plane. Train the
Gaussian model corresponding to each class (use the mean and cov commands
directly). Plot the contours of the models (use the function
plotgaus(mu,sigma,color), where color = [R,G,B]).

Example:
» plotvow; % Plot the clouds of simulated vowel features
(Do not close the resulting figure; it will be used later on.)

Then compute and plot the Gaussian models:
» mu_a = mean(a);
» sigma_a = cov(a);
» plotgaus(mu_a,sigma_a,[0 1 1]);
» mu_e = mean(e);
» sigma_e = cov(e);
» plotgaus(mu_e,sigma_e,[0 1 1]);
etc.
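The remaining classes follow the same pattern; a minimal sketch, with an
arbitrary distinct colour for each vowel:

» mu_i = mean(i); sigma_i = cov(i); plotgaus(mu_i,sigma_i,[1 0 1]);
» mu_o = mean(o); sigma_o = cov(o); plotgaus(mu_o,sigma_o,[1 1 0]);
» mu_y = mean(y); sigma_y = cov(y); plotgaus(mu_y,sigma_y,[0 1 0]);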
<A NAME="sec:classification"></A><BR>Bayesian classification</H2><P>We will now find how to classify a feature vector <!-- MATH $\ensuremath\mathbf{x}_i$ --><SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img44.gif" ALT="$ \ensuremath\mathbf{x}_i$"></SPAN> from a data
sample (or several feature vectors <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img29.gif" ALT="$ X$"></SPAN>) as belonging to a certain
class <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img91.gif" ALT="$ q_k$"></SPAN>.<P><H3><A NAME="SECTION00023100000000000000">Useful formulas and definitions:</A></H3><UL><LI><EM>Bayes' decision rule</EM>:
  $$X \in q_k \quad \text{if} \quad P(q_k|X,\boldsymbol{\Theta}) \geq
  P(q_j|X,\boldsymbol{\Theta}), \quad \forall j \neq k$$

  This formula means: given a set of classes $q_k$, characterized by a
  set of known parameters in the model $\boldsymbol{\Theta}$, a set $X$ of one or
  more speech feature vectors (also called observations) belongs to the
  class which has the highest probability once we actually know (or
  ``see'', or ``measure'') the sample $X$. $P(q_k|X,\boldsymbol{\Theta})$ is
  therefore called the a posteriori probability, because it depends on
  having seen the observations, as opposed to the a priori probability
  $P(q_k|\boldsymbol{\Theta})$, which does not depend on any observation (but does,
  of course, depend on knowing how to characterize all the classes $q_k$,
  i.e., on knowing the parameter set $\boldsymbol{\Theta}$).

- For some classification tasks (e.g. speech recognition), it is
  practical to resort to Bayes' law, which makes use of likelihoods
  (see sec. 1.3) rather than trying to estimate the posterior probability
  $P(q_k|X,\boldsymbol{\Theta})$ directly. Bayes' law says:
<P></P><DIV ALIGN="CENTER" CLASS="mathdisplay"><A NAME="eq:decision-rule"></A><!-- MATH \begin{equation}P(q_k|X,\ensuremath\boldsymbol{\Theta}) = \frac{p(X|q_k,\ensuremath\boldsymbol{\Theta})\; P(q_k|\ensuremath\boldsymbol{\Theta})}{p(X|\ensuremath\boldsymbol{\Theta})}\end{equation} --><TABLE CLASS="equation" CELLPADDING="0" WIDTH="100%" ALIGN="CENTER"><TR VALIGN="MIDDLE"><TD NOWRAP ALIGN="CENTER"><SPAN CLASS="MATH"><IMG WIDTH="216" HEIGHT="47" ALIGN="MIDDLE" BORDER="0" SRC="img99.gif" ALT="$\displaystyle P(q_k\vert X,\ensuremath\boldsymbol{\Theta}) = \frac{p(X\vert q_k......k\vert\ensuremath\boldsymbol{\Theta})}{p(X\vert\ensuremath\boldsymbol{\Theta})}$"></SPAN></TD><TD NOWRAP CLASS="eqno" WIDTH="10" ALIGN="RIGHT">(<SPAN CLASS="arabic">4</SPAN>)</TD></TR></TABLE></DIV><BR CLEAR="ALL"><P></P>where <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="25" ALIGN="MIDDLE" BORDER="0" SRC="img91.gif" ALT="$ q_k$"></SPAN> is a class, <SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img29.gif" ALT="$ X$"></SPAN> is a sample containing one or more
feature vectors and <!-- MATH $\ensuremath\boldsymbol{\Theta}$ --><SPAN CLASS="MATH"><IMG WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0" SRC="img61.gif" ALT="$ \ensuremath\boldsymbol{\Theta}$"></SPAN> is the parameter set of all the class
models.<P></LI><LI>The speech features are usually considered equi-probable:
  $p(X|\boldsymbol{\Theta}) = \text{const.}$ (a uniform prior distribution for $X$).
  Hence, $P(q_k|X,\boldsymbol{\Theta})$ is proportional to
  $p(X|q_k,\boldsymbol{\Theta})\, P(q_k|\boldsymbol{\Theta})$ for all classes:

  $$P(q_k|X,\boldsymbol{\Theta}) \propto p(X|q_k,\boldsymbol{\Theta})\;
  P(q_k|\boldsymbol{\Theta}), \quad \forall k$$
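To make the decision rule concrete, here is a minimal sketch of Bayesian
classification of a single feature vector (not part of the original example
code). It assumes the Gaussian models (mu_a, sigma_a, ...) and the priors
(Pa, Pe, ...) computed above; gausspdf is a hypothetical inline
implementation of the 2-dimensional Gaussian likelihood:

» % Gaussian density: exp(-(x-mu)*inv(sigma)*(x-mu)'/2) / (2*pi*sqrt(det(sigma)))
» gausspdf = @(x,mu,sigma) exp(-0.5*(x-mu)/sigma*(x-mu)') / (2*pi*sqrt(det(sigma)));
» x = [700 1200];  % a hypothetical [F1,F2] observation
» post = [gausspdf(x,mu_a,sigma_a)*Pa, gausspdf(x,mu_e,sigma_e)*Pe, ...
          gausspdf(x,mu_i,sigma_i)*Pi, gausspdf(x,mu_o,sigma_o)*Po, ...
          gausspdf(x,mu_y,sigma_y)*Py];  % proportional to P(q_k|x,Theta)
» [pbest,kbest] = max(post)  % kbest indexes the winning class

Since $p(X|\boldsymbol{\Theta})$ is assumed constant, the entries of post are
proportional to the true posteriors, so taking their maximum implements
Bayes' decision rule exactly.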