   TB thresh macroname itemlist
\end{verbatim}
Apart from the clustering mechanism, there are some other differences between
\texttt{TC} and \texttt{TB}. Firstly, \texttt{TC} uses a distance metric between
states whereas \texttt{TB} uses a log likelihood criterion. Thus, the threshold
values are not directly comparable. Furthermore, \texttt{TC} supports any type
of output distribution whereas \texttt{TB} only supports single-Gaussian
continuous density output distributions.
Secondly, although the following describes only state clustering, the
\texttt{TB} command\index{tb@\texttt{TB} command} can also be used to cluster whole
models.

A phonetic decision tree is a binary tree in which a yes/no phonetic question
\index{phonetic questions}
is attached to each node. Initially all states in a given item list (typically
a specific phone state position) are placed at the
root node of a tree. Depending on each answer, the pool of states is
successively split, and this continues until the states have trickled
down to leaf nodes. All states in the same leaf node are then tied.
For example, Fig~\href{f:qstree} illustrates the case of tying the centre
states of all triphones of the phone /aw/ (as in ``out''). All of the states trickle
down the tree and, depending on the answers to the questions, they end up
at one of the shaded terminal nodes. For example, in the illustrated
case, the centre state of \texttt{s-aw+n} would join the second leaf
node from the right since its right context is a central consonant,
that context is a nasal, but its left context is not a central stop.
The question at each node is chosen to (locally) maximise the likelihood
of the training data given the final set of state tyings.

Before any tree building can take place, all of the possible phonetic
questions must be loaded into \htool{HHEd} using \texttt{QS} commands
\index{qs@\texttt{QS} command}.
Each
question takes the form ``Is the left or right context in the set P?'' where the
context is the model context as defined by its logical name. The set P is
represented by an item list and,
for convenience, every question is given a name. As an example, the
following command
\begin{verbatim}
   QS "L_Nasal" { ng-*,n-*,m-* }
\end{verbatim}
defines the question ``Is the left context a nasal?''.

It is possible to calculate the log likelihood of the training
data given any pool of states (or models). Furthermore, this can be done
without reference to the
training data itself since, for single-Gaussian distributions, the means, variances
and state occupation counts (input via a stats file) form sufficient statistics.
Splitting any pool into two will increase the log likelihood since it provides twice
as many parameters to model the same amount of data. The increase obtained when
each possible question is used can thus be calculated, and the question selected
which gives the biggest improvement.

Trees are therefore built using a top-down sequential optimisation process.
Initially all states (or models) are placed in a single cluster at the root
of the tree. The question is then found which gives the best split of the root
node. This process is repeated until the increase in log likelihood falls
below the threshold specified in the \texttt{TB} command.
As a final stage, the decrease in log likelihood is calculated for merging
terminal nodes with differing parents. Any pair of nodes for which this
decrease is less than the threshold used to stop splitting is then merged.
\index{tree optimisation}

As with the \texttt{TC} command, it is useful to prevent the creation of
clusters with very little associated training data. The \texttt{RO} command
can therefore be used in tree clustering as well as in data-driven clustering.
When used with trees, any split which would result in a total occupation count
falling below the specified value is prohibited.
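The question selection step described above can be sketched in Python. This is
a hypothetical illustration, not HTK source code: states are assumed to be
records holding their occupation count, mean and diagonal variance (the
sufficient statistics read from the stats file), and each question is a named
predicate on a state's context.

```python
import math

def pool_loglike(states, dim):
    """Approximate log likelihood of the data assigned to a pool of
    single-Gaussian states, computed from sufficient statistics only
    (occupation counts, means, diagonal variances)."""
    occ = sum(s["occ"] for s in states)
    if occ == 0.0:
        return 0.0
    L = 0.0
    for d in range(dim):
        # Pooled variance per dimension: E[x^2] - E[x]^2 over the pool.
        ex = sum(s["occ"] * s["mean"][d] for s in states) / occ
        ex2 = sum(s["occ"] * (s["var"][d] + s["mean"][d] ** 2)
                  for s in states) / occ
        var = max(ex2 - ex * ex, 1e-10)   # floor for numerical safety
        L += -0.5 * occ * (math.log(2.0 * math.pi) + 1.0 + math.log(var))
    return L

def best_question(pool, questions, dim):
    """Greedily pick the question whose yes/no split of the pool gives
    the largest increase in log likelihood."""
    base = pool_loglike(pool, dim)
    best, best_gain = None, -float("inf")
    for name, test in questions:          # test(state) -> True/False
        yes = [s for s in pool if test(s)]
        no = [s for s in pool if not test(s)]
        if not yes or not no:
            continue                      # question does not split the pool
        gain = pool_loglike(yes, dim) + pool_loglike(no, dim) - base
        if gain > best_gain:
            best_gain, best = gain, (name, yes, no)
    return best, best_gain
```

Tree building then applies \texttt{best\_question} recursively to each new
node, stopping when the best gain falls below the \texttt{TB} threshold or a
split would violate the \texttt{RO} occupancy limit.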
Note that the \texttt{RO}
command can also be used to load the required stats file. Alternatively,
the stats file can be loaded using the \texttt{LS} command
\index{ls@\texttt{LS} command}.

As with data-driven clustering, using the trace facilities provided by
\htool{HHEd} is recommended for monitoring and setting the appropriate
thresholds. Basic tracing provides the following summary data for each tree
\begin{verbatim}
 TB 350.00 aw_s3 {}
  Tree based clustering
   Start  aw[3] : 28    have   LogL=-86.899  occ=864.2
   Via    aw[3] : 5     gives  LogL=-84.421  occ=864.2
   End    aw[3] : 5     gives  LogL=-84.421  occ=864.2
 TB: Stats 28->5 [17.9%]  { 4537->285 [6.3%] total }
\end{verbatim}
This example corresponds to the case illustrated in Fig~\href{f:qstree}.
The \texttt{TB} command has been invoked with a threshold of 350.0 to cluster
the centre states of the triphones of the phone \textit{aw}.
At the start of clustering, with all 28 states in a single pool, the average
log likelihood per unit of occupation is -86.9, and on completion with
5 clusters this has increased to -84.4. The middle line labelled ``Via'' gives
the position after the tree has been built but before terminal nodes have been
merged (none were merged in this case). The last line summarises the overall
position. After building this tree, a total of 4537 states had been reduced
to 285 clusters.
\index{clustering!tracing in}

As noted at the start of this section, an important advantage of tree-based
clustering is that it allows triphone models which have no training data to be
synthesised. This is done in \htool{HHEd} using the \texttt{AU} command
\index{au@\texttt{AU} command} which has the form
\begin{verbatim}
   AU hmmlist
\end{verbatim}
Its effect is to scan the given \texttt{hmmlist}, and any physical models listed
which are not in the currently loaded set are synthesised. This is done by
descending the previously constructed trees for that phone and answering the
questions at each node based on the new unseen context.
When each leaf node is
reached, the state representing that cluster is used for the corresponding state
in the unseen triphone\index{unseen triphones!synthesising}.

The \texttt{AU} command can be used within the same edit script as the tree
building commands. However, it will often be the case that a new set of
triphones is needed at a later date, perhaps as a result of vocabulary changes.
To make this possible, a complete set of trees can be saved using the
\texttt{ST} command\index{st@\texttt{ST} command} and then later
reloaded using the \texttt{LT} command\index{lt@\texttt{LT} command}.
\index{decision trees!loading and storing}

\mysect{Mixture Incrementing}{upmix}

When building sub-word based continuous density systems, the final system
will typically consist of multiple mixture component
context-dependent HMMs. However, as indicated previously, the early
stages of triphone construction, particularly state tying, are best done
with single-Gaussian models. Indeed, if tree-based clustering is to be
used, there is no choice.
\index{mixture incrementing}\index{up-mixing}
In \HTK\ therefore, the conversion from single-Gaussian HMMs to multiple
mixture component HMMs is usually one of the final steps in building
a system. The mechanism provided to do this is the \htool{HHEd} \texttt{MU}
command, which will increase the number of components in a mixture by a
process called \textit{mixture splitting}. This approach to building a
multiple mixture component system is extremely
flexible since it allows the number of mixture components to be repeatedly
increased until the desired level of performance is achieved.
\index{mixture splitting}

The \texttt{MU} command\index{mu@\texttt{MU} command} has the form
\begin{verbatim}
   MU n itemList
\end{verbatim}
where \texttt{n} gives the new number of mixture components required and
\texttt{itemList} defines the actual mixture distributions to modify.
This
command works by repeatedly splitting the component with the largest mixture
weight until the required number of components is obtained. The actual
split is performed by copying the component, dividing the weights of both
copies by 2, and finally perturbing the means by plus or minus 0.2 standard
deviations. For example, the command
\begin{verbatim}
   MU 3 {aa.state[2].mix}
\end{verbatim}
would increase the number of mixture components in the output distribution
for state 2 of model \texttt{aa} to 3. Normally, however, the number of
components in all mixture distributions will be increased at the same time.
Hence, a command of the following form is more usual
\begin{verbatim}
   MU 3 {*.state[2-4].mix}
\end{verbatim}
It is usually a good idea to increment mixture components in
stages, for example, by incrementing by 1 or 2, re-estimating, then
incrementing by 1 or 2 again and re-estimating, and so on until the
required number of components is obtained. This also allows recognition
performance to be monitored to find the optimum.

One final point with regard to multiple mixture component distributions is that
all \HTK\ tools ignore mixture components whose weights fall below a threshold
value called \texttt{MINMIX} (defined in \texttt{HModel.h}). Such mixture
components are called {\it defunct}. Defunct mixture components can be
prevented by setting the \texttt{-w} option in \htool{HERest} so that all
mixture weights are floored to some level above
\texttt{MINMIX}\index{minmix@\texttt{MINMIX}}. If mixture weights
\index{mixture weight floor}
are allowed to fall below \texttt{MINMIX}, then the corresponding Gaussian
parameters will not be written out when the model containing that component
is saved.
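The splitting operation performed by \texttt{MU} can be sketched as follows.
This is a simplified illustration under assumed data structures (each
component as a record with a weight, mean vector and diagonal variance), not
the actual \htool{HHEd} implementation.

```python
import math

def mix_up(mixture, n):
    """Grow a mixture to n components by repeatedly splitting the
    component with the largest weight: copy it, halve both weights,
    and perturb the two means by +/- 0.2 standard deviations.
    Each component is a dict with 'weight', 'mean', 'var' (diagonal)."""
    while len(mixture) < n:
        heaviest = max(mixture, key=lambda c: c["weight"])
        half = heaviest["weight"] / 2.0
        offset = [0.2 * math.sqrt(v) for v in heaviest["var"]]
        heaviest["weight"] = half
        twin = {
            "weight": half,
            "mean": [m + o for m, o in zip(heaviest["mean"], offset)],
            "var": list(heaviest["var"]),
        }
        # Perturb the original copy in the opposite direction.
        heaviest["mean"] = [m - o for m, o in zip(heaviest["mean"], offset)]
        mixture.append(twin)
    return mixture
```

Note that the weights always sum to one after splitting, and the two offset
copies give re-estimation a useful starting point on either side of the
original mean.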
It is possible to recover from this, however, since the \texttt{MU} command
will replace defunct mixtures\index{defunct mixtures} before performing any
requested mixture component increment.

\mysect{Regression Class Tree Construction}{hhedregtree}

In order to perform most model adaptation tasks (see
chapter~\ref{c:Adapt}), it will be necessary to produce a binary regression
class tree\index{adaptation!regression tree}. This tree is stored in the MMF,
along with a regression base class identifier for each mixture component. An
example regression tree and how it may be used is shown in
subsection~\ref{s:reg_classes}. \htool{HHEd} provides the means to
construct a regression class tree for a given MMF, and is invoked
using the \texttt{RC} command\index{rc@\texttt{RC} command}. It is also
necessary to supply a statistics file, which is output using the \texttt{-s}
option of \htool{HERest}. The statistics file can be loaded by invoking the
\texttt{LS}\index{ls@\texttt{LS} command} command.

A centroid-splitting algorithm using a Euclidean distance measure is
used to grow the binary regression class tree to cluster the model
set's mixture components. Each leaf node therefore specifies a particular
mixture component cluster. This algorithm proceeds
as follows until the requested number of terminal nodes has been reached.
\begin{itemize}
\item Select a terminal node that is to be split.
\item Calculate the mean and variance from the mixture components clustered
at this node.
\item Create two children.
Initialise their means to the parent mean, perturbed in opposite
directions (one for each child) by a fraction of the variance.
\item For each component at the parent node, assign the component to one of
the children by using a Euclidean distance measure to ascertain which child
mean the component is closest to.
\item Once all the components have been assigned, calculate the new
means for the children, based on the component assignments.
\item Keep re-assigning components to the children and re-estimating
the child means until there is no change in assignments from one
iteration to the next. Now finalise the split.
\end{itemize}
As an example, the following \htool{HHEd} script would produce a
regression class tree with 32 terminal nodes, or regression base
classes:
\begin{verbatim}
   LS "statsfile"
   RC 32 "rtree"
\end{verbatim}
A further optional argument is possible with the \texttt{RC} command. This
argument allows the user to specify the non-speech class mixture components,
such as the silence mixture components, using an \texttt{itemlist}.
\begin{verbatim}
   LS "statsfile"
   RC 32 "rtree" {sil.state[2-4].mix}
\end{verbatim}
In this case, the first split made in the regression class tree
will separate the speech and non-speech sounds, after which the
tree building continues as usual.

\mysect{Miscellaneous Operations}{misedit}

The preceding sections have described the main \htool{HHEd} commands used for
building continuous density systems with tied parameters. A further group
of commands (\texttt{JO}, \texttt{TI} and \texttt{HK}) are used to build
tied-mixture systems and these are described in Chapter~\ref{c:discmods}.
Those remaining cover a miscellany of functions.
They are documented in the
reference entry for \htool{HHEd} and include commands to add and remove
state transitions\index{state transitions!adding/removing}
(\texttt{AT}\index{at@\texttt{AT} command},
\texttt{RT}\index{rt@\texttt{RT} command}); synthesise triphones from
biphones (\texttt{MT}\index{mt@\texttt{MT} command}); change the parameter
kind of an HMM (\texttt{SK}\index{sk@\texttt{SK} command});
modify stream dimensions (\texttt{SS}\index{ss@\texttt{SS} command},
\texttt{SU}\index{su@\texttt{SU} command},
\texttt{SW}\index{sw@\texttt{SW} command}); change or add an identifier name
to an MMF (\texttt{RN}\index{rn@\texttt{RN} command}); and expand
HMM sets by duplication, for example, as needed in making gender
dependent models (\texttt{DP}\index{dp@\texttt{DP} command}).

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "htkbook"
%%% End: