\documentclass[a4paper]{article}
\usepackage{array,amsmath,amssymb,rotating,graphicx}
\input{tutorial_style.texinc}
\newcommand{\mat}[1]{{\tt >> #1} \\}
\newcommand{\com}[1]{{\tt #1}}
%\newcommand{\tit}[1]{{\noindent \bf #1 \\}}
\newcommand{\tab}{\hspace{1em}}
% for MATH MODE
\newcommand{\trn}{^{\mathsf T}} % transposition
\newcommand{\xv}{\ensuremath\mathbf{x}} % vector x
\newcommand{\muv}{\ensuremath\boldsymbol{\mu}} % vector mu
\newcommand{\Sm}{\ensuremath\boldsymbol{\Sigma}} % matrix Sigma
\newcommand{\Tm}{\ensuremath\boldsymbol{\Theta}} % matrix Theta
\newcommand{\Rf}{\ensuremath\mathbb{R}}
\setlength{\hoffset}{-1in}
\setlength{\voffset}{-1in}
\setlength{\topskip}{0cm}
\setlength{\headheight}{0cm}
\setlength{\headsep}{0cm}
\setlength{\textwidth}{16cm}
\setlength{\evensidemargin}{2.5cm}
\setlength{\oddsidemargin}{2.5cm}
\setlength{\textheight}{24cm}
\setlength{\topmargin}{2.5cm}
\setlength{\headheight}{0.5cm}
\setlength{\headsep}{0.5cm}
%\pagestyle{fancyplain}
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%% Make Title %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\author{\small Barbara Resch (minor changes by Erhard Rank)\\
Signal Processing and Speech Communication Laboratory\\
Inffeldgasse 16c/II\\
phone 873--4436}
\date{}
\MakeTutorialTitle{Gaussian Statistics and Unsupervised Learning}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection*{Abstract}
This tutorial presents the properties of the Gaussian probability
density function. Subsequently, supervised and unsupervised pattern
recognition methods are treated. Supervised classification algorithms
are based on a labeled data set. The knowledge about the class
membership of this training data set is used for the classification of
new samples. Unsupervised learning methods establish clusters from an
unlabeled training data set. Clustering algorithms such as the
$K$-means, the EM (expectation-maximization) algorithm, and the
Viterbi-EM algorithm are presented.
\subsubsection*{Usage} To make full use of this tutorial you have to
\begin{enumerate}
\item download the file
\HREF{http://www.igi.tugraz.at/lehre/CI/tutorials/Gaussian.zip}
{\texttt{Gaussian.zip}} which contains this tutorial \html{in
printable format (\href{Gaussian.pdf}{PDF} and
\href{Gaussian.ps.gz}{ps.gz})} and the accompanying Matlab
programs.
\item Unzip \texttt{Gaussian.zip} which will generate a
subdirectory named \texttt{Gaussian/matlab} where you can find
all the Matlab programs.
\item Add the path \texttt{Gaussian/matlab} to the Matlab search path,
for example with a command like
\verb#addpath('C:\Work\Gaussian\matlab')# if you are using a Windows
machine, or by using a command like
\verb#addpath('/home/jack/Gaussian/matlab')# if you are on a
Unix/Linux machine.
\end{enumerate}
\subsubsection*{Sources}
This tutorial is based on
% ftp://ftp.idiap.ch/pub/sacha/labs/labman1.pdf
% or http://www.ai.mit.edu/~murphyk/Software/HMM/labman2.pdf
\begin{itemize}
\item EPFL lab notes
``Introduction to Gaussian Statistics and Statistical Pattern
Recognition'' by Herv\'e Bourlard, Sacha Krstulovi\'c, and Mathew
Magimai-Doss.
\end{itemize}
\html{
\subsubsection*{Contents}
}
\section{Gaussian statistics}
%%%%%%%%%
%%%%%%%%%
%%%%%%%%%
\subsection{Samples from a Gaussian density}
\label{samples}
%%%%%%%%%
\subsubsection*{Useful formulas and definitions:}\label{sec:gausspdf}
\begin{itemize}
\item The {\em Gaussian probability density function (pdf)} for the
$d$-dimensional random variable $\xv \circlearrowleft {\cal
N}(\muv,\Sm)$ (i.e., variable $\xv \in \Rf^d$ following the
Gaussian, or Normal, probability law) is given by:
\begin{equation}
\label{eq:gauss}
g_{(\muv,\Sm)}(\xv) = \frac{1}{\sqrt{2\pi}^d
\sqrt{\det\left(\Sm\right)}} \, e^{-\frac{1}{2} (\xv-\muv)\trn
\Sm^{-1} (\xv-\muv)}
\end{equation}
where $\muv$ is the mean vector and $\Sm$ is the covariance matrix.
$\muv$ and $\Sm$ are the {\em parameters} of the Gaussian
distribution.
\item The mean vector $\muv$ contains the mean values of each
dimension, $\mu_i = E(x_i)$, with $E(x)$ being the \emph{expected
value} of $x$.
\item All of the variances $c_{ii}$ and covariances $c_{ij}$ are
collected together into the covariance matrix $\Sm$ of dimension
$d\times d$:
\begin{equation*}
\Sm =
\left[
\begin{array}{*{4}{c}}
      c_{11} & c_{12} & \cdots & c_{1d} \\
      c_{21} & c_{22} & \cdots & c_{2d} \\
      \vdots & \vdots & \ddots & \vdots \\
      c_{d1} & c_{d2} & \cdots & c_{dd} \\
\end{array}
\right]
\end{equation*}
The covariance $c_{ij}$ of two components $x_i$ and $x_j$ of $\xv$
measures their tendency to vary together, i.e., to co-vary,
\[ c_{ij} = E\left((x_i-\mu_i)\,(x_j-\mu_j)\right).\]
If two components $x_i$ and $x_j$, $i\ne j$, have zero covariance
$c_{ij} = 0$, they are {\em uncorrelated}, or {\em orthogonal} in the
statistical sense, which translates to a geometric sense (the
expectation acts as a scalar product of random variables; a null
scalar product means orthogonality). If all components of $\xv$ are
mutually uncorrelated, the covariance matrix is diagonal.
\item $\sqrt{\Sm}$ defines the {\em standard deviation} of the random
variable $\xv$. Beware: this square root is meant in the {\em matrix
sense}.
\item If $\xv \circlearrowleft {\cal N}(\mathbf{0},\mathbf{I})$ ($\xv$
follows a normal law with zero mean and unit variance; $\mathbf{I}$
denotes the identity matrix), and if $\mathbf{y} = \muv +
\sqrt{\Sm}\,\xv$, then $\mathbf{y} \circlearrowleft {\cal
N}(\muv,\Sm)$.
\end{itemize}
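As a quick numerical check of Eq.~(\ref{eq:gauss}), the pdf can be
evaluated directly in {\sc Matlab}. (This is only a sketch; the
variable names \com{mu}, \com{sigma}, and \com{x} are chosen for
illustration and are not part of the accompanying programs. The
identity $\det(2\pi\Sm) = (2\pi)^d \det(\Sm)$ is used to avoid
writing the $(2\pi)^{d/2}$ factor explicitly.) \\
\mat{mu = [730 1090]; sigma = [8000 0; 0 8000];}
\mat{x = [750 1100];  \% an arbitrary test point}
\mat{g = exp(-0.5*(x-mu)*inv(sigma)*(x-mu)') / sqrt(det(2*pi*sigma))}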
\subsubsection{Experiment:}
Generate samples $X$ of $N$ points, $X=\{\xv_1,
\xv_2,\ldots,\xv_N\}$, with $N=10000$, coming from a 2-dimensional
Gaussian process that has mean
\[
\muv = \left[ \begin{array}{c} 730 \\ 1090 \end{array} \right]
\]
and variance
\begin{itemize}
%%%%%
\item
8000 for both dimensions ({\em spherical process}) (sample $X_1$):
\[
\Sm_1 = \left[ \begin{array}{cc}
8000 & 0 \\
0 & 8000
\end{array} \right]
\]
%%%%%
\item
expressed as a {\em diagonal} covariance matrix (sample $X_2$):
\[
\Sm_2 = \left[ \begin{array}{cc}
8000 & 0 \\
0 & 18500
\end{array} \right]
\]
%%%%%
\item
expressed as a {\em full} covariance matrix (sample $X_3$):
\[
\Sm_3 = \left[ \begin{array}{cc}
8000 & 8400 \\
8400 & 18500
\end{array} \right]
\]
%%%%%
\end{itemize}
%
Use the function \com{gausview} (\com{>> help gausview}) to plot the
results as clouds of points in the 2-dimensional plane, and to view the
corresponding 2-dimensional probability density functions (pdfs) in 2D and
3D.
\subsubsection*{Example:}
\mat{N = 10000;}
\mat{mu = [730 1090]; sigma\_1 = [8000 0; 0 8000];}
\mat{X1 = randn(N,2) * sqrtm(sigma\_1) + repmat(mu,N,1);}
\mat{gausview(X1,mu,sigma\_1,'Sample X1');}
%
Repeat for the two other variance matrices $\Sm_2$ and $\Sm_3$.
Use the radio buttons to switch the plots on/off. Use the ``view''
buttons to switch between 2D and 3D. Use the mouse to rotate the plot
(must be enabled in \com{Tools} menu: \com{Rotate 3D}, or by the
$\circlearrowleft$ button).
\subsubsection*{Questions:}
By simple inspection of 2D views of the data and of the corresponding
pdf contours, how can you tell which sample corresponds to a spherical
process (as the sample $X_1$), which sample corresponds to a process
with a diagonal covariance matrix (as $X_2$), and which to a process
with a full covariance matrix (as $X_3$)?
\subsubsection*{Find the right statements:}
\begin{itemize}
\item[$\Box$] In process 1 the first and the second component of the
vectors $\xv_i$ are independent.
\item[$\Box$] In process 2 the first and the second component of the
vectors $\xv_i$ are independent.
\item[$\Box$] In process 3 the first and the second component of the
vectors $\xv_i$ are independent.
\item[$\Box$] If the first and second component of the vectors $\xv_i$
  are independent, the cloud of points and the pdf contours have the
  shape of a circle.
\item[$\Box$] If the first and second component of the vectors $\xv_i$
  are independent, the cloud of points and the pdf contours have to be
  elliptic, with the principal axes of the ellipses aligned with the
  abscissa and ordinate axes.
\item[$\Box$] For the covariance matrix $\Sm$ the elements have to
satisfy $c_{ij} = c_{ji}$.
\item[$\Box$] The covariance matrix has to be positive definite
  ($\xv\trn \Sm\, \xv > 0$ for all $\xv \ne \mathbf{0}$). (If yes,
  what happens if not? Try it out in \textsc{Matlab}.)
\end{itemize}
%\pagebreak
%%%%%%%%%
\subsection{Gaussian modeling: Mean and variance of a sample}
%%%%%%%%%
We will now estimate the parameters $\muv$ and $\Sm$ of the Gaussian
models from the data samples.
\subsubsection*{Useful formulas and definitions:}
\begin{itemize}
\item Mean estimator: $\displaystyle \hat{\muv} = \frac{1}{N}
\sum_{i=1}^{N} \xv_i$
\item Unbiased covariance estimator: $\displaystyle \hat{\Sm} =
  \frac{1}{N-1} \; \sum_{i=1}^{N} (\xv_i-\hat{\muv}) (\xv_i-\hat{\muv})\trn $
\end{itemize}
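Note that {\sc Matlab}'s \com{cov} command uses the unbiased
$1/(N-1)$ normalization by default, while \com{cov(X,1)} normalizes
by $1/N$ (the maximum-likelihood estimate). A quick sketch of the
difference, assuming a sample matrix \com{X} with one point per
row:\\
\mat{cov(X)    \% unbiased estimate, normalized by N-1}
\mat{cov(X,1)  \% maximum-likelihood estimate, normalized by N}
For large $N$ the two estimates are nearly identical; the distinction
matters only for small samples.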
\subsubsection{Experiment:}
Take the sample $X_3$ of 10000 points generated from ${\cal
N}(\muv,\Sm_3)$. Compute an estimate $\hat{\muv}$ of its mean and an
estimate $\hat{\Sm}$ of its variance:
\begin{enumerate}
\item with all the available points \hspace{1cm}$\hat{\muv}_{(10000)}
=$\hspace{3.5cm}$\hat{\Sm}_{(10000)} =$ \vspace{0.8cm}
\item with only 1000 points \hspace{1.95cm}$\hat{\muv}_{(1000)}
=$\hspace{3.7cm}$\hat{\Sm}_{(1000)} =$ \vspace{0.8cm}
\item with only 100 points \hspace{2.2cm}$\hat{\muv}_{(100)}
=$\hspace{3.9cm}$\hat{\Sm}_{(100)} =$ \vspace{0.8cm}
\end{enumerate}
Compare the estimated mean vector $\hat{\muv}$ to the original mean
vector $\muv$ by measuring the Euclidean distance that separates them.
Compare the estimated covariance matrix $\hat{\Sm}$ to the original
covariance matrix $\Sm_3$ by measuring the matrix 2-norm of their
difference (the norm $\|\mathbf{A}-\mathbf{B}\|_2$ constitutes a
measure of similarity of two matrices $\mathbf{A}$ and $\mathbf{B}$;
use {\sc Matlab}'s \com{norm} command).
\subsubsection{Example:}
In the case of 1000 points (case 2.): \\
\mat{X = X3(1:1000,:);}
\mat{N = size(X,1)}
\mat{mu\_1000 = sum(X)/N}
\textit{--or--}\\
\mat{mu\_1000 = mean(X)}
\mat{sigma\_1000 = (X - repmat(mu\_1000,N,1))' * (X - repmat(mu\_1000,N,1)) / (N-1)}
\textit{--or--}\\
\mat{sigma\_1000 = cov(X)}
\noindent
\mat{\% Comparison of means and covariances:}
\mat{e\_mu = sqrt((mu\_1000 - mu) * (mu\_1000 - mu)')}
\mat{\% (This is the Euclidean distance between mu\_1000 and mu)}
\mat{e\_sigma = norm(sigma\_1000 - sigma\_3)}
\mat{\% (This is the 2-norm of the difference between sigma\_1000 and sigma\_3)}
\subsubsection{Question:}
When comparing the estimated values $\hat{\muv}$ and $\hat{\Sm}$ to
the original values of $\muv$ and $\Sm_3$ (using the Euclidean
distance and the matrix 2-norm), what can you observe?
\subsubsection*{Find the right statements:}
\begin{itemize}
\item[$\Box$] An accurate mean estimate requires more points than an
accurate variance estimate.
\item[$\Box$] It is very important to have enough training examples to
estimate the parameters of the data generation process accurately.
\end{itemize}
%\pagebreak
\subsection{Likelihood of a sample with respect to a Gaussian model}
\label{sec:likelihood}
In the following we compute the likelihood of a sample point $\xv$
with respect to a given Gaussian model ${\cal N}(\muv,\Sm)$.
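As a sketch of what such a computation looks like in {\sc Matlab}
(the variable names \com{X}, \com{mu}, and \com{sigma} are
illustrative; the formula follows from taking the logarithm of
Eq.~(\ref{eq:gauss}) and summing over the $N$ points of a sample
\com{X}, one point per row):\\
\mat{D = X - repmat(mu,size(X,1),1);}
\mat{logL = -0.5*sum(sum((D*inv(sigma)).*D,2)) - 0.5*size(X,1)*log(det(2*pi*sigma))}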