% labman1.tex -- lab manual excerpt (from an original DSP textbook)
N}\left(\mu_{/i/},\Sigma_{/i/}\right)$ and ${\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$,
then ${\cal N}\left(\mu_{/i/},\Sigma_{/e/}\right)$ and
${\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$.}
\end{figure}
%%%%%%%%%%%%
The iso-likelihood lines for the Gaussian pdfs
${\cal N}\left(\mu_{/i/},\Sigma_{/i/}\right)$ and
${\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$, which we used before to model
the class /i/ and the class /e/, are plotted on the next page
(figure~\ref{iso}). On a second graph, the iso-likelihood lines for
${\cal N}\left(\mu_{/i/},\Sigma_{/e/}\right)$ and
${\cal N}\left(\mu_{/e/},\Sigma_{/e/}\right)$ (two pdfs with the same
covariance matrix $\Sigma_{/e/}$) are represented. On these figures, use a
colored pen to join the intersections of the level lines that correspond to
equal likelihoods.

\subsubsection*{Question\,:}

What is the nature of the surface that separates class /i/ from class /e/
when the two models have {\em different} variances? Can you explain the
origin of this form?

What is the nature of the surface that separates class /i/ from class /e/
when the two models have the {\em same} variances? Why is it different from
the previous discriminant surface?

\subsubsection*{Answer\,:}

\expl{In the case of different variances, the discriminant surface is a
second-order curve (here, a parabola). As a matter of fact, in the Gaussian
case, up to an additive constant\,:
\[
\log p(x|\Theta) \simeq
- \frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)
- \frac{1}{2} \log \left( \det\left(\Sigma\right) \right)
\]
which is a quadratic form in $x$. Therefore, the equation\,:
\[
\log p(x|\Theta_1) = \log p(x|\Theta_2)
\]
which describes the discriminant surface is necessarily a second-order
equation. Hence, its solution describes a second-order (parabolic)
discriminant surface.
\tab In the case where the covariance matrices are equal, the separation
between class 1 and class 2 does not depend upon the covariance $\Sigma$
any more\,:
\hfill {\em (Continued on next page...)}}

\expl{{\bf Answer, continued\,:}
\begin{align*}
& & - \frac{1}{2} (x-\mu_1)^T \Sigma^{-1} (x-\mu_1)
    - \frac{1}{2} \log \left( \det\left(\Sigma\right) \right)
& = - \frac{1}{2} (x-\mu_2)^T \Sigma^{-1} (x-\mu_2)
    - \frac{1}{2} \log \left( \det\left(\Sigma\right) \right) \\[2ex]
\Leftrightarrow & &
- \frac{1}{2} (x-\mu_1)^T \Sigma^{-1} (x-\mu_1)
& = - \frac{1}{2} (x-\mu_2)^T \Sigma^{-1} (x-\mu_2) \\[2ex]
\Leftrightarrow & &
- \frac{1}{2} x^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} x
- \frac{1}{2} \mu_1^T \Sigma^{-1} \mu_1
& = - \frac{1}{2} x^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} x
- \frac{1}{2} \mu_2^T \Sigma^{-1} \mu_2
\end{align*}
\medskip
Hence, the equation of the surface that separates class $q_1$ from class
$q_2$ becomes\,:
\[
( \mu_1 - \mu_2 )^T \Sigma^{-1} x
- \frac{1}{2} ( \mu_1 - \mu_2 )^T \Sigma^{-1} ( \mu_1 + \mu_2 )
= 0
\]
which is linear in $x$. In the equal covariance case, using a Bayesian
classifier with Gaussian densities is therefore equivalent to using
discriminant functions of the form\,:
\[
f_k(x) = w_k^T x + w_{k0}
\]
where
\begin{eqnarray}
w_k^T  & = & \mu_k^T \Sigma^{-1} \nonumber \\
w_{k0} & = & - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k \nonumber
\end{eqnarray}
\tab As a summary, we have shown that Bayesian classifiers with Gaussian
models separate the classes with combinations of second-order (parabolic)
surfaces. If the covariance matrices of the models are equal, the
second-order separation surfaces become simple hyper-planes.}
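The two situations can also be checked numerically. Below is a minimal
sketch (not part of the original lab tools), assuming that the /i/ and /e/
model parameters are available in the workspace as \com{mu\_i},
\com{sigma\_i}, \com{mu\_e} and \com{sigma\_e} (hypothetical names, with the
means stored as $1\times 2$ row vectors). It evaluates the difference of the
two Gaussian log-likelihoods on a grid and plots its zero level line, which
is precisely the discriminant surface discussed above.

\begin{verbatim}
 % Sketch: discriminant curve between two Gaussian models.
 % Assumes mu_i, mu_e (1x2 row vectors) and sigma_i, sigma_e (2x2
 % covariance matrices) -- hypothetical workspace names.
 lo = min(mu_i, mu_e) - 500;  hi = max(mu_i, mu_e) + 500;
 [X1,X2] = meshgrid(linspace(lo(1),hi(1),200), linspace(lo(2),hi(2),200));
 G = zeros(size(X1));          % log p(x|/i/) - log p(x|/e/) on the grid
 for r = 1:numel(X1)
   x  = [X1(r); X2(r)];
   gi = -0.5*(x-mu_i')'*inv(sigma_i)*(x-mu_i') - 0.5*log(det(sigma_i));
   ge = -0.5*(x-mu_e')'*inv(sigma_e)*(x-mu_e') - 0.5*log(det(sigma_e));
   G(r) = gi - ge;
 end
 contour(X1, X2, G, [0 0], 'k');  % the zero contour is the discriminant
 % With sigma_i ~= sigma_e this curve is a second-order curve; replacing
 % both covariances by the same matrix turns it into a straight line.
\end{verbatim}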
\bigskip
%%%%%%%%%%%%%%%%%%
\section{Unsupervised training}
\label{unsup}
%%%%%%%%%%%%%%%%%%

In the previous section, we have computed the models for classes /a/, /e/,
/i/, /o/ and /y/ by knowing a-priori which training sample belongs to which
class (i.e., we had a {\em labeling} of the training data). Hence, we have
performed a {\em supervised training} of Gaussian models. Now, suppose that
we only have unlabeled training data that we want to separate into several
classes (e.g., 5 classes) without knowing a-priori which point belongs to
which class. This is called {\em unsupervised training}. Several algorithms
are available for that purpose, among which the K-means, the Viterbi-EM and
the EM (Expectation-Maximization) algorithms.
%
%
\newcommand{\PBS}[1]{\let\temp=\\#1\let\\=\temp}
\newcommand{\RR}{\PBS\raggedright\hspace{0pt}}
\newcommand{\RL}{\PBS\raggedleft\hspace{0pt}}
\newcommand{\CC}{\PBS\centering\hspace{0pt}}
\begin{sidewaystable}[p]
\setlength{\arrayrulewidth}{1.2pt}
\renewcommand{\arraystretch}{2}
%\begin{center}
\hspace{-1.5em}
\begin{tabular}{|>{\RR}m{6em}|>{\RR}m{7.5em}|>{\CC}m{20em}|>{\RR}m{24.5em}|>{\CC}m{8em}|}
\hline
\centering\bf Algorithm &
\centering\bf Parameters &
\bf Membership measure &
\centering\bf Update method &
\bf Global criterion \\ \hline
%
K-means &
$\bullet$ mean $\mu_k$ &
Euclidean distance \linebreak
\[d_k(x_n)= \sqrt{(x_n-\mu_k)^T(x_n-\mu_k)}\] \linebreak
(or the square of it) &
Find the points closest to $q_k^{(old)}$, then\,:
\medskip
$\bullet$ $\mu_{k}^{(new)}$ = mean of the points closest to $q_k^{(old)}$ &
Least squares \\ \hline
%
% K-means w/ \linebreak Mahalanobis distance &
% $\bullet$ mean $\mu_k$ \linebreak
% $\bullet$ variance $\Sigma_k$ &
% Mahalanobis distance \linebreak (weighted Euclidean distance)
% \[d_k(x_n)= \sqrt{(x_n-\mu_k)^T\Sigma_k^{-1}(x_n-\mu_k)}\]
% &
% Find the points closest to $q_k^{(old)}$, then\,:
% \medskip
% $\bullet$ $\mu_{k}^{(new)}$ = mean of the points closest to $q_k^{(old)}$
% \medskip
% $\bullet$ $\Sigma_{k}^{(new)}$ = variance of the points closest to $q_k^{(old)}$ &
% Minimal \linebreak spread of the data \linebreak (in a Mahalanobis distance sense) \\ \hline
%
Viterbi-EM &
$\bullet$ mean $\mu_k$ \linebreak
$\bullet$ variance $\Sigma_k$ \linebreak
$\bullet$ priors $P(q_k|\Theta)$ &
Posterior probability
\[d_k(x_n) = P(q_k|x_n,\Theta)\]
\footnotesize
\[ \propto \frac{1}{\sqrt{2\pi}^d \sqrt{\det\left(\Sigma_k\right)}}
   \, e^{-\frac{1}{2} (x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k)}
   \cdot P(q_k|\Theta)\]
\normalsize &
Do a Bayesian classification of each data point, then\,:
\medskip
$\bullet$ $\mu_{k}^{(new)}$ = mean of the points belonging to $q_k^{(old)}$
\medskip
$\bullet$ $\Sigma_{k}^{(new)}$ = variance of the points belonging to $q_k^{(old)}$
\medskip
$\bullet$ $P(q_k^{(new)}|\Theta^{(new)})$ = number of training points
belonging to $q_k^{(old)}$ / total number of training points &
Maximum likelihood \\ \hline
%
EM &
$\bullet$ mean $\mu_k$ \linebreak
$\bullet$ variance $\Sigma_k$ \linebreak
$\bullet$ priors $P(q_k|\Theta)$ &
Posterior probability
\[d_k(x_n) = P(q_k|x_n,\Theta)\]
\footnotesize
\[ \propto \frac{1}{\sqrt{2\pi}^d \sqrt{\det\left(\Sigma_k\right)}}
   \, e^{-\frac{1}{2} (x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k)}
   \cdot P(q_k|\Theta)\]
\normalsize &
Compute $P(q_k^{(old)}|x_n,\Theta^{(old)})$ (soft classification), then\,:
\medskip
$\bullet$ $\mu_{k}^{(new)} =
  \frac{\sum_{n=1}^{N} x_n P(q_k^{(old)}|x_n,\Theta^{(old)})}
       {\sum_{n=1}^{N} P(q_k^{(old)}|x_n,\Theta^{(old)})}$
\medskip
$\bullet$ $\Sigma_{k}^{(new)} =
  \frac{\sum_{n=1}^{N} P(q_k^{(old)}|x_n,\Theta^{(old)})
        (x_n - \mu_k^{(new)})(x_n - \mu_k^{(new)})^T}
       {\sum_{n=1}^{N} P(q_k^{(old)}|x_n,\Theta^{(old)})}$
\medskip
$\bullet$ $P(q_k^{(new)}|\Theta^{(new)}) =
  \frac{1}{N} \sum_{n=1}^{N} P(q_k^{(old)}|x_n,\Theta^{(old)})$ &
Maximum likelihood \\ \hline
\end{tabular}
%\end{center}
\caption{\label{algos}Characteristics of some usual unsupervised clustering
algorithms.}
\end{sidewaystable}
%
%
All these algorithms are characterized by the following components\,:
\begin{itemize}
\item a set of models $q_k$ (not necessarily Gaussian), defined by some
parameters $\Theta$ (means, variances, priors, ...);
\item a measure of membership, telling to which extent a data point
``belongs'' to a model;
\item a ``recipe'' to update the model parameters as a function of the
membership information.
\end{itemize}
The measure of membership usually takes the form of a measure of distance
or of a measure of probability. It replaces the missing labeling
information and permits the application of standard parameter estimation
techniques. It also implicitly defines a global criterion of ``goodness of
fit'' of the models to the data, e.g.\,:
\begin{itemize}
\item in the case of a distance, the models that are globally closer to the
data characterize it better;
\item in the case of a probability measure, the models that yield a higher
likelihood for the data explain it better.
\end{itemize}
Table~\ref{algos} summarizes the components of each of the algorithms that
will be studied in the following experiments. More details will be given in
the corresponding subsections.
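As an illustration, the EM update rules of Table~\ref{algos} translate into
a few lines of MATLAB. The following is a minimal sketch of one EM iteration
for $K$ Gaussian models (it is not one of the lab tools); the names
\com{data}, \com{mu}, \com{sigma} and \com{prior} are purely illustrative,
with one observation per row of \com{data}.

\begin{verbatim}
 % Sketch of one EM iteration (soft classification + re-estimation),
 % following the update formulas of the table.
 % data: N x d matrix; mu{k}: 1 x d; sigma{k}: d x d; prior(k): scalar.
 [N,d] = size(data);
 K = length(mu);
 post = zeros(N,K);                       % P(q_k | x_n, Theta)
 for k = 1:K
   dx = data - repmat(mu{k},N,1);         % centered data
   ex = exp(-0.5*sum((dx*inv(sigma{k})).*dx, 2));
   post(:,k) = prior(k) * ex / ((2*pi)^(d/2)*sqrt(det(sigma{k})));
 end
 post = post ./ repmat(sum(post,2),1,K);  % normalize over the K models
 for k = 1:K                              % weighted re-estimation
   w = post(:,k);
   mu{k}    = sum(repmat(w,1,d).*data,1) / sum(w);
   dx       = data - repmat(mu{k},N,1);
   sigma{k} = (dx' * (repmat(w,1,d).*dx)) / sum(w);
   prior(k) = sum(w) / N;
 end
\end{verbatim}

Replacing the soft posteriors by hard 0/1 assignments (a Bayesian
classification of each point) turns this iteration into the Viterbi-EM
update of the same table.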
\pagebreak
%%%%%%%%%
\subsection{K-means algorithm}
%%%%%%%%%

\subsubsection*{Synopsis of the algorithm\,:}

\begin{itemize}
\item Start with $K$ initial prototypes $\mu_k$, $k=1,\cdots,K$.
\item {\bf Do}\,:
\begin{enumerate}
\item For each data-point $x_n$, $n=1,\cdots,N$, compute the squared
Euclidean distance from the $k^{th}$ prototype\,:
\begin{eqnarray}
\label{eq:dist}
d_k(x_n) & = & \left\| x_n - \mu_k \right\|^2 \nonumber \\
         & = & (x_n-\mu_k)^T(x_n-\mu_k)      \nonumber
\end{eqnarray}
\item Assign each data-point $x_n$ to its {\bf closest} prototype $\mu_k$,
i.e.\ assign $x_n$ to the class $q_k$ if\,:
\[
d_k(x_n) \, \leq \, d_l(x_n), \;\; \forall l \neq k
\]
{\em Note}\,: using the square of the Euclidean distance for the
classification gives the same result as using the true Euclidean distance,
since the square root is a monotonically increasing function. But the
computational load is obviously lighter when the square root is dropped.
\item Replace each prototype with the mean of the data-points assigned to
the corresponding class;
\item Go to 1.
\end{enumerate}
\item {\bf Until}\,: no further change occurs.
\end{itemize}
The global criterion defined in the present case is\,:
\[
J = \sum_{k=1}^{K} \sum_{x_n \in q_k} d_k(x_n)
\]
and represents the total squared distance between the data and the models
they belong to. This criterion is locally minimized by the algorithm.
% Alternatively, the Euclidean distance used in step 1 can be replaced by
% the Mahalanobis distance, which belongs to the class of weighted Euclidean
% distances\,:
% \[
% d_k(x_n)= \sqrt{(x_n-\mu_k)^T\Sigma_k^{-1}(x_n-\mu_k)}
% \]
% where $\Sigma_k$ is the covariance of the points associated with the class
% $q_k$.
% After the convergence of the algorithm, the final clusters of points may be
% used to make Gaussian models, since means and variances are then available.
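The synopsis above translates almost line by line into MATLAB. Here is a
minimal sketch of the loop, written independently of the \com{KMEANS}
explorer tool used in the next experiment; the names \com{data}, \com{mu}
and \com{assign} are purely illustrative.

\begin{verbatim}
 % Sketch of the K-means loop described above.
 % data: N x d matrix (one observation per row); mu: K x d initial prototypes.
 [N,d] = size(data);
 K = size(mu,1);
 assign = zeros(N,1);
 while true
   % Steps 1-2: squared Euclidean distances and closest-prototype assignment
   dist = zeros(N,K);
   for k = 1:K
     dx = data - repmat(mu(k,:),N,1);
     dist(:,k) = sum(dx.^2, 2);
   end
   [dmin, new_assign] = min(dist, [], 2);
   if isequal(new_assign, assign), break; end   % Until: no further change
   assign = new_assign;
   % Step 3: replace each prototype by the mean of its assigned points
   for k = 1:K
     if any(assign == k)
       mu(k,:) = mean(data(assign == k, :), 1);
     end
   end
 end
 J = sum(dmin);    % global criterion: total squared distance
\end{verbatim}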
\subsubsection*{Experiment\,:}

Use the K-means explorer utility\,:
\begin{verbatim}
 KMEANS K-means algorithm exploration tool

   Launch it with KMEANS(DATA,NCLUST) where DATA is the matrix
   of observations (one observation per row) and NCLUST is the
   desired number of clusters.

   The clusters are initialized with a heuristic that spreads
   them randomly around mean(DATA) with standard deviation
   sqrtm(cov(DATA)).

   If you want to set your own initial clusters, use
   KMEANS(DATA,MEANS) where MEANS is a cell array containing
   NCLUST initial mean vectors.

   Example: for two clusters
     means{1} = [1 2]; means{2} = [3 4];
     kmeans(data,means);
\end{verbatim}
\pagebreak

Launch it with the data sample \com{allvow}, which was part of the file
\com{vowels.mat} and gathers all the simulated vowel data. Do several runs
with different initializations of the algorithm\,:
\begin{enumerate}
\item 5 initial clusters determined according to the default heuristic;
\item some initial \com{MEANS} values equal to some data points;
\item some initial \com{MEANS} values equal to $\{\mu_{/a/}, \mu_{/e/},
\mu_{/i/}, \mu_{/o/}, \mu_{/y/}\}$.
\end{enumerate}
Iterate the algorithm until its convergence. Observe the evolution of the
cluster centers, of the data-point attribution chart and of the total
squared Euclidean distance. (It is possible to zoom these plots\,: left
click inside the axes to zoom $2\times$ centered on the point under the
mouse; right click to zoom out; click and drag to zoom into an area; double
click to reset the figure to the original view.)
% Observe the mean and variance values found after the convergence of the
% algorithm.
Observe the mean values found after the convergence of the algorithm.

\subsubsection*{Example\,:}

\mat{kmeans(allvow,5);}

- or - \\
\mat{means =  \{ mu\_a, mu\_e, mu\_i, mu\_o, mu\_y \};}
\mat{kmeans(allvow,means);}

Enlarge the window, then push the buttons, zoom, etc.

After the convergence, use\,:\\
\mat{for k=1:5, disp(kmeans\_result\_means\{k\}); end}
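To relate the unsupervised result to the supervised models of the previous
section, the converged means can be compared with
$\{\mu_{/a/}, \mu_{/e/}, \mu_{/i/}, \mu_{/o/}, \mu_{/y/}\}$, and the final
value of the global criterion $J$ can be recomputed by hand. A minimal
sketch, assuming that \com{allvow} is still in the workspace and that
\com{kmeans\_result\_means} holds the final means as row vectors (as
displayed above)\,:

\begin{verbatim}
 % Recompute the global criterion J from the converged K-means solution.
 % Assumes allvow (N x d data) and kmeans_result_means (cell array of
 % 1 x d row vectors), as used in the example above.
 N = size(allvow,1);
 K = length(kmeans_result_means);
 dist = zeros(N,K);
 for k = 1:K
   dx = allvow - repmat(kmeans_result_means{k}, N, 1);
   dist(:,k) = sum(dx.^2, 2);              % squared Euclidean distances
 end
 J = sum(min(dist, [], 2));                % total squared distance
 disp(J);
 % The converged means can also be compared with mu_a, mu_e, mu_i,
 % mu_o and mu_y to see which cluster corresponds to which vowel class.
\end{verbatim}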
