	\begin{eqnarray}
		\delta_{t+1}(j) & = & \max_{2 \leq i \leq N-1}
			\left[ \delta_{t}(i) \cdot a_{ij} \right]
			\cdot b_j(x_{t+1}),
		\;\;\;\; \begin{array}{l} 1 \leq t \leq T-1 \\ 2 \leq j \leq N-1 \end{array}
		\nonumber \\
		\psi_{t+1}(j) & = & \mbox{arg}\hspace{-0.5ex}\max_{\hspace{-3ex}2 \leq i \leq N-1}
			\left[ \delta_{t}(i) \cdot a_{ij} \right],
		\;\;\;\; \begin{array}{l} 1 \leq t \leq T-1 \\ 2 \leq j \leq N-1 \end{array}
		\nonumber
	\end{eqnarray}
%
{\it ``Optimal policy is composed of optimal sub-policies''}\,: find the
path that leads to a maximum likelihood, considering the best likelihood at
the previous step and the transitions from it; then multiply by the
likelihood of the current observation given the current state. Hence, the
best path is found by induction.
\item {\bf Termination}
	\begin{eqnarray}
		p^*(X|\Theta) & = & \max_{2 \leq i \leq N-1}
			\left[ \delta_{T}(i) \cdot a_{iN} \right] \nonumber \\
		q_T^* & = & \mbox{arg}\hspace{-0.5ex}\max_{\hspace{-3ex}2 \leq i \leq N-1}
			\left[ \delta_{T}(i) \cdot a_{iN} \right]
		\nonumber
	\end{eqnarray}
%
Find the best likelihood when the end of the observation sequence is
reached, given that the final state is the non-emitting state $N$.
%
\item {\bf Backtracking}
	\[
		Q^* = \{q_1^*,\cdots,q_T^*\} \;\;\;\;\mbox{so that}\;\;\;\;
		q_t^* = \psi_{t+1}(q_{t+1}^*), \;\;\;\; t = T-1, T-2, \cdots, 1
	\]
%
Read (decode) the best sequence of states from the $\psi_t$ vectors.
\end{enumerate}

\pagebreak

Hence, the Viterbi algorithm delivers {\em two} useful results, given an
observation sequence $X=\{x_1,\cdots,x_T\}$ and a model $\Theta$\,:
\begin{itemize}
\item the selection, among all the possible paths in the considered model,
of the {\em best path} $Q^* = \{q^*_1,\cdots,q^*_T\}$, i.e.\ the state
sequence that gives the maximum likelihood to the observation sequence~$X$;
%
\item the {\em likelihood along the best path}, $p(X,Q^*|\Theta) =
p^*(X|\Theta)$. As opposed to the forward procedure, where all the possible
paths are considered, the Viterbi algorithm computes the likelihood along
the best path only.
\end{itemize}
(For more detail about the Viterbi algorithm, refer to~\cite{RAB93},
chap.~6.4.1.)
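To make the recursion, termination and backtracking steps described above
concrete, a minimal MATLAB sketch of the algorithm in the probability
domain could look as follows (for illustration only; the variable layout,
with \com{A} holding the transition probabilities $a_{ij}$ and \com{B}
holding the emission likelihoods $b_j(x_t)$, is a convention of this sketch
and not the interface of the lab's own routines)\,:

\begin{verbatim}
% A : N x N transition matrix (states 1 and N are the non-emitting entry/exit states)
% B : T x N matrix with B(t,j) = b_j(x_t) for the emitting states j = 2..N-1
[T, N] = size(B);
delta = zeros(T, N);   psi = zeros(T, N);
delta(1, 2:N-1) = A(1, 2:N-1) .* B(1, 2:N-1);       % initialization (entry from state 1)
for t = 1:T-1                                       % recursion
  for j = 2:N-1
    [best, ibest] = max(delta(t, 2:N-1) .* A(2:N-1, j)');
    delta(t+1, j) = best * B(t+1, j);
    psi(t+1, j)   = ibest + 1;                      % +1: emitting states start at 2
  end
end
[pstar, iT] = max(delta(T, 2:N-1) .* A(2:N-1, N)'); % termination (exit towards state N)
q = zeros(1, T);   q(T) = iT + 1;
for t = T-1:-1:1                                    % backtracking
  q(t) = psi(t+1, q(t+1));
end
% q now holds the best state sequence Q*, and pstar the likelihood p*(X|Theta)
\end{verbatim}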
\subsubsection*{Questions\,:}

\begin{enumerate}
\item From an algorithmic point of view, what is the main difference
between the computation of the $\delta$ variable in the Viterbi algorithm
and that of the $\alpha$ variable in the forward procedure~?
\item Give the log version of the Viterbi algorithm.
\end{enumerate}

\subsubsection*{Answers\,:}

\expl{
\begin{enumerate}
\item The sums that appeared in the computation of $\alpha$ become $\max$
operations in the computation of $\delta$. Hence, the Viterbi procedure
takes less computational power than the forward recursion.
\item
\begin{enumerate}
\item {\bf Initialization}
%
	\begin{eqnarray}
		\delta_1^{(log)}(i) & = & \log a_{1i} + \log b_i(x_1),
		\;\;\;\; 2 \leq i \leq N-1 \nonumber \\
		\psi_1(i) & = & 0 \nonumber
	\end{eqnarray}
%
\item {\bf Recursion}
	\begin{eqnarray}
		\delta_{t+1}^{(log)}(j) & = & \max_{2 \leq i \leq N-1}
			\left[ \delta_{t}^{(log)}(i) + \log a_{ij} \right]
			+ \log b_j(x_{t+1}),
		\;\;\;\; \begin{array}{l} 1 \leq t \leq T-1 \\ 2 \leq j \leq N-1 \end{array}
		\nonumber \\
		\psi_{t+1}(j) & = & \mbox{\rm arg}\hspace{-0.5ex}\max_{\hspace{-3ex}2 \leq i \leq N-1}
		\left[ \delta_{t}^{(log)}(i) + \log a_{ij} \right],
		\;\;\;\; \begin{array}{l} 1 \leq t \leq T-1 \\ 2 \leq j \leq N-1 \end{array}
		\nonumber
	\end{eqnarray}
\item {\bf Termination}
	\begin{eqnarray}
		\log p^*(X|\Theta) & = & \max_{2 \leq i \leq N-1}
			\left[ \delta_{T}^{(log)}(i) + \log a_{iN} \right] \nonumber \\
		q_T^* & = & \mbox{\rm arg}\hspace{-0.5ex}\max_{\hspace{-3ex}2 \leq i \leq N-1}
			\left[ \delta_{T}^{(log)}(i) + \log a_{iN} \right]
		\nonumber
	\end{eqnarray}
%
\item {\bf Backtracking}
	\[
		Q^* = \{q_1^*,\cdots,q_T^*\} \;\;\;\;\mbox{so that}\;\;\;\;
		q_t^* = \psi_{t+1}(q_{t+1}^*), \;\;\;\; t = T-1, T-2, \cdots, 1
	\]
\end{enumerate}
In this version, the {\rm logsum} operation (which involves the computation
of an exponential) is avoided, reducing the computational load even
further.
\end{enumerate}
}

\subsubsection*{Experiments\,:}

\begin{enumerate}
\item Use the function \com{logvit} to find the best path of the sequences
$X_1, \cdots X_6$ with respect to the most likely model found in
section~\ref{sub:class} (i.e. $X_1$:\,HMM1, $X_2$:\,HMM3, $X_3$:\,HMM5,
$X_4$:\,HMM4, $X_5$:\,HMM6 and $X_6$:\,HMM2). Compare with the state
sequences $ST_1, \cdots ST_6$ originally used to generate $X_1, \cdots X_6$
(use the function \com{compseq}, which displays the first dimension of the
observations as a time series and allows you to compare the original
alignment with the Viterbi solution).
%
\item Use the function \com{logvit} to compute the probabilities of the
sequences $X_1, \cdots X_6$ along the best paths with respect to each model
$\Theta_1, \cdots \Theta_6$. Note your results below. Compare with the
log-likelihoods obtained in section~\ref{sub:class} with the forward
procedure.
\end{enumerate}

\subsubsection*{Examples\,:}

\begin{enumerate}
\item Best paths and comparison with the original paths\,: \\
\mat{figure;}
\mat{[STbest,bestProb] = logvit(X1,hmm1); compseq(X1,ST1,STbest);}
\mat{[STbest,bestProb] = logvit(X2,hmm3); compseq(X2,ST2,STbest);}
Repeat for the remaining sequences.
%
\item Probabilities along the best paths for all the models\,: \\
\mat{[STbest,bestProb(1,1)] = logvit(X1,hmm1);}
\mat{[STbest,bestProb(1,2)] = logvit(X1,hmm2);}
etc. \\
\mat{[STbest,bestProb(3,2)] = logvit(X3,hmm2);}
%
etc. (You can also use loops here; see the sketch after this list.)
To compare with the complete log-likelihood produced by the forward
recursion\,: \\
%
\mat{diffProb = logProb - bestProb}
\end{enumerate}
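\noindent
As suggested above, the whole \com{bestProb} table can also be filled with
two nested loops. A minimal sketch, assuming the sequences and models are
first gathered into cell arrays (these cell arrays are our own convention;
they are not predefined by the lab scripts)\,:

\begin{verbatim}
seqs   = {X1, X2, X3, X4, X5, X6};               % gather data and models
models = {hmm1, hmm2, hmm3, hmm4, hmm5, hmm6};   % (our own convention)
bestProb = zeros(length(seqs), length(models));
for i = 1:length(seqs)
  for j = 1:length(models)
    [STbest, bestProb(i,j)] = logvit(seqs{i}, models{j});
  end
end
% bestProb can then be compared with the forward log-likelihoods as above:
% diffProb = logProb - bestProb
\end{verbatim}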
\noindent
Likelihoods along the best path\,: \\[0.5em]
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{|c|c|c|c|c|c|c|c|} \hline
Sequence &
\small $\log p^*(X|\Theta_1)$ & \small $\log p^*(X|\Theta_2)$ &
\small $\log p^*(X|\Theta_3)$ & \small $\log p^*(X|\Theta_4)$ &
\small $\log p^*(X|\Theta_5)$ & \small $\log p^*(X|\Theta_6)$ &
\parbox[c][3em][c]{11ex}{Most likely\\ model} \\ \hline
X1 & & & & & & & \\ \hline
X2 & & & & & & & \\ \hline
X3 & & & & & & & \\ \hline
X4 & & & & & & & \\ \hline
X5 & & & & & & & \\ \hline
X6 & & & & & & & \\ \hline
\end{tabular}

\bigskip

\noindent
Difference between log-likelihoods and likelihoods along the best path\,: \\[0.5em]
\renewcommand{\arraystretch}{1.5}
\begin{tabular}{|c|c|c|c|c|c|c|} \hline
Sequence &
\small HMM1 & \small HMM2 &
\small HMM3 & \small HMM4 &
\small HMM5 & \small HMM6 \\ \hline
X1 & & & & & & \\ \hline
X2 & & & & & & \\ \hline
X3 & & & & & & \\ \hline
X4 & & & & & & \\ \hline
X5 & & & & & & \\ \hline
X6 & & & & & & \\ \hline
\end{tabular}

\subsubsection*{Question\,:}

Is the likelihood along the best path a good approximation of the real
likelihood of a sequence given a model~?

\subsubsection*{Answer\,:}

\expl{The values found for the two likelihoods differ within an acceptable
error margin. Furthermore, using the best-path likelihood does not, in most
practical cases, modify the classification results. Finally, it further
alleviates the computational load, since it replaces the sum or the logsum
by a max in the recursive part of the procedure. Hence, the likelihood
along the best path can be considered a good approximation of the true
likelihood.}

\pagebreak

%%%%%%%%%%%%%%%%%%
\section{Training of HMMs}
%%%%%%%%%%%%%%%%%%

Decoding or aligning acoustic feature sequences requires the prior
specification of the parameters of some HMMs. As explained in
section~\ref{sub:class}, these models play the role of stochastic templates
to which we compare the observations. But how can we determine templates
that efficiently represent the phonemes or the words that we want to
model~? The solution is to estimate the parameters of the HMMs from a
database containing observation sequences, in either a supervised or an
unsupervised way.

\subsubsection*{Questions\,:}

In the previous lab session, we learned how to estimate the parameters of
Gaussian pdfs given a set of training data. Suppose that you have a
database containing several utterances of the imaginary word /aiy/, and
that you want to train an HMM for this word. Suppose also that this
database comes with a {\em labeling} of the data, i.e. some data structures
that tell you where the phoneme boundaries are for each instance of the
word.
\begin{enumerate}
\item Which model architecture (ergodic or left-right) would you choose~?
With how many states~? Justify your choice.
\item How would you compute the parameters of the proposed HMM~?
\item Suppose you didn't have the phonetic labeling ({\em unsupervised
training}). Propose a recursive procedure to train the model, making use of
one of the algorithms studied during the present session.
\end{enumerate}

\subsubsection*{Answers\,:}

\expl{
\begin{enumerate}
\item It can be assumed that the observation sequences associated with each
distinct phoneme obey specific probability densities. As in the previous
lab, this means that the phonetic classes are assumed to be separable by
Gaussian classifiers. Hence, the word /aiy/ can be viewed as the result of
drawing samples from the pdf ${\cal N}_{/a/}$, then transiting to
${\cal N}_{/i/}$ and drawing samples again, and finally transiting to
${\cal N}_{/y/}$ and drawing samples. It therefore seems reasonable to
model the word /aiy/ by a {\em left-right} HMM with {\em three} emitting
states.
%
\item If we know the phonetic boundaries for each instance, we know to
which state each training observation belongs, and we can give a label
(/a/, /i/ or /y/) to each feature vector. Hence, we can use the mean and
variance estimators studied in the previous lab to compute the parameters
of the Gaussian density associated with each state (or each label).
\tab Knowing the labels, we can also count the transitions from each state
to the following states (itself or another state). By dividing the number
of transitions from a state to each destination state by the total number
of transitions leaving that state, we can determine the transition matrix.
%
\item The Viterbi procedure makes it possible to assign labels to a
sequence of feature vectors. Hence, it is possible to perform unsupervised
training in the following way\,:
%
\begin{enumerate}
\item Start with some arbitrary state sequences, which constitute an
initial labeling. (The initial sequences are usually made of even
distributions of phonetic labels along the length of each utterance.)
\item Update the model, relying on the current labeling.
\item Use the Viterbi algorithm to re-distribute the labels on the
training examples.
\item If the new distribution of labels differs from the previous one,
re-iterate (go to (b)). One can also stop when the likelihood of the
training data levels off, i.e. becomes asymptotic to an upper bound.
\end{enumerate}
%
The principle of this algorithm is similar to the Viterbi-EM algorithm used
to train the Gaussians during the previous lab. Similarly, there exists a
``soft'' version, called the {\em Baum-Welch} algorithm, where each state
participates in the labeling of the feature frames (this version uses the
forward recursion instead of the Viterbi algorithm). The Baum-Welch
algorithm is an EM algorithm specifically adapted to the training of HMMs
(see \cite{RAB93} for details), and is one of the most widely used training
algorithms in ``real world'' speech recognition. (A MATLAB sketch of the
loop (a)--(d) above is given right after these answers.)
\end{enumerate}
}
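\noindent
To make the Viterbi-style training procedure described above concrete, here
is a minimal MATLAB sketch of the loop (a)--(d). The cell array \com{utts},
the variable \com{nStates} and the helper \com{update\_model} are
hypothetical placeholders, not part of the lab toolbox; only \com{logvit}
is assumed to behave as in the experiments above\,:

\begin{verbatim}
% utts         : cell array of training utterances (one observation per row)
% nStates      : number of emitting states (e.g. 3 for the /aiy/ model)
% update_model : hypothetical helper that re-estimates the Gaussian parameters
%                of each state and the transition matrix from the current
%                labeling, as in the supervised case of answer 2 above
labels = cell(size(utts));
for k = 1:length(utts)                        % (a) initial labeling: distribute the
  T = size(utts{k}, 1);                       %     emitting states evenly in time
  labels{k} = ceil((1:T)' * nStates / T);
end
prevLogP = -Inf;
while true
  hmm = update_model(utts, labels);           % (b) update the model from the labels
  totalLogP = 0;
  for k = 1:length(utts)                      % (c) re-label with the Viterbi algorithm
    [labels{k}, logP] = logvit(utts{k}, hmm);
    totalLogP = totalLogP + logP;
  end
  if totalLogP - prevLogP < 1e-3, break, end  % (d) stop when the likelihood levels off
  prevLogP = totalLogP;
end
\end{verbatim}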
%%%%%%%%%%%%%%%%%% BIBLIO
\bibliography{general}
\bibliographystyle{alpha}

\vfill

%%%%%%%%%%%%%%%%%%
\section*{After the lab...}
%%%%%%%%%%%%%%%%%%

This lab manual can be kept as additional course material. If you want to
browse the experiments again, you can use the script\,:

\noindent \com{>> lab2demo}

\noindent
which will automatically redo all the computations and plots for you.

\end{document}
