2 states from each of the \texttt{aa} models and 3 states from
each of the others.  Moving further down the tree, the item list
\begin{verbatim}
     { *.state[2-4].stream[1].mix[1,3].cov }
\end{verbatim}
denotes the set of all covariance vectors (or matrices) of the first and
third mixture components of stream 1, of states 2 to 4 of all HMMs.  Since many HMM systems
are single stream, the \texttt{stream} part of the path can be omitted if its value
is 1.  Thus, the above could have been written
\begin{verbatim}
     { *.state[2-4].mix[1,3].cov }
\end{verbatim}
These last two examples also show that indices\index{item lists!indexing} can be written as comma-separated
lists as well as ranges; for example, \texttt{[1,3,4-6,9]}
is a valid index list representing states 1, 3, 4, 5, 6, and 9.

When item lists are used as the argument to a \texttt{TI} command\index{ti@\texttt{TI} command}, the
kind of items represented by the list determines the macro type in a fairly
obvious way.  There are two non-obvious cases.  Firstly, lists ending
in \texttt{cov} generate \hmmt{v}, \hmmt{i}, \hmmt{c}, or \hmmt{x} macros as
appropriate.  Secondly, if an explicit set of mixture components is defined
as in
\begin{verbatim}
     { *.state[2].mix[1-5] }
\end{verbatim}
then \hmmt{m} macros are generated, but omitting
the indices altogether denotes a special case of mixture tying\index{tied-mixtures}
which is explained later in Chapter~\ref{c:discmods}.

To illustrate the use of item lists, some example \texttt{TI} commands
can now be given.
Firstly, when a set of context-dependent models is created, it can
be beneficial to share one transition matrix across all variants
of a phone rather than having a distinct transition matrix for each.
This could be achieved by adding \texttt{TI}
commands immediately after the \texttt{CL} command described in
the previous section, that is\index{tying!examples of}
\begin{verbatim}
    CL cdlist
    TI T_ah {*-ah+*.transP}
    TI T_eh {*-eh+*.transP}
    TI T_ae {*-ae+*.transP}
    TI T_ih {*-ih+*.transP}
     ... etc
\end{verbatim}

As a second example, a so-called Grand Variance\index{grand variance} HMM system can
be generated very easily with the following HHEd command
\begin{verbatim}
     TI "gvar" { *.state[2-4].mix[1].cov }
\end{verbatim}
where it is assumed that the HMMs are 3-state single mixture component models.  The effect
of this command is to tie all state distributions to a single global variance
vector.  For applications where there is limited training data, this technique
can improve performance, particularly in noise.

Speech recognition systems will often have distinct
models for silence and short pauses.  A silence model\index{silence model} \texttt{sil} may have
the normal 3-state topology whereas a short pause model \texttt{sp} may have just a
single state.  To avoid the two models \textit{competing} with each other, the
\texttt{sp} model state can be tied to the centre state of the \texttt{sil} model
thus
\begin{verbatim}
     TI "silst" { sp.state[2], sil.state[3] }
\end{verbatim}

So far nothing has been said about how the parameters are actually
determined when a set of items is replaced by a single shared representative.
When states are tied, the state with the broadest variances and as few as
possible zero mixture component weights is selected from the pool and used
as the representative.  When mean vectors are tied, the average of all the
mean vectors in the pool is used and when variances are tied, the largest
variance in the pool is used.
In all other cases, the last item in the
tie-list is arbitrarily chosen as representative.
All of these selection criteria are \textit{ad hoc}, but since
the tie operations are always followed by explicit re-estimation
using \htool{HERest}, the precise choice of representative for a tied
set is not critical.\index{tying!exemplar selection}

Finally, tied parameters can be
untied.  For example, subsequent refinements of the context-dependent model set
generated above with tied transition matrices might result in
a much more compact set of models for which individual transition
parameters could be robustly estimated.  This can be done using the
\texttt{UT} command\index{ut@\texttt{UT} command} whose effect is to untie all of the
items in its argument list.  For example, the command
\begin{verbatim}
     UT {*-iy+*.transP}
\end{verbatim}
would untie the transition parameters in all variants of the \texttt{iy}
phoneme.
This untying works by simply making unique copies of the tied parameters.
These untied parameters can then subsequently be re-estimated.

\mysect{Data-Driven Clustering}{ddclust}

In section~\ref{s:mkCDHMMs}, a method of triphone construction was described
which involved cloning all monophones and then re-estimating them using data
for which monophone labels have been replaced by triphone labels.  This will
lead to a very large set of models, and relatively little
training data for each model.  Applying the argument that context will not greatly affect
the centre states of triphone models, one way to reduce the total number of
parameters without significantly altering the models' ability to represent the different
contextual effects might be to tie all of the centre states across all
models derived from the same monophone.  This tying could be\index{clustering!data-driven}
done by writing an edit script of the form
\begin{verbatim}
     TI "iyS3" {*-iy+*.state[3]}
     TI "ihS3" {*-ih+*.state[3]}
     TI "ehS3" {*-eh+*.state[3]}
      .... etc
\end{verbatim}
Each \texttt{TI} command would tie all the centre states of all triphones
in each phone group.  Hence, if there were an average of 100 triphones
per phone group then the total number of states per group
would be reduced from 300 to 201 (100 left states, 100 right states and
one shared centre state).

Explicit tyings such as these can have some positive effect but overall they
are not very satisfactory.  Tying all centre states is too severe and, worse
still, the problem of undertraining for the left and right states remains.

A much better approach is to use clustering to decide which states to
tie.  \htool{HHEd} provides two mechanisms for this.  In this section
a data-driven clustering approach will be described and in
the next section, an alternative decision tree-based approach is presented.

Data-driven clustering is performed by the \index{model training!clustering}
\texttt{TC}\index{tc@\texttt{TC} command} and \texttt{NC}\index{nc@\texttt{NC} command}
commands.  These both invoke the same bottom-up (agglomerative) hierarchical
procedure.  Initially all states are placed in individual
clusters.  The pair of clusters which when combined would form the smallest
resultant cluster are merged.  This process repeats until either the
size of the largest
cluster reaches the threshold set by the \texttt{TC} command or
the total number of clusters has fallen to that
specified by the \texttt{NC} command.  The size of a cluster
is defined as the greatest distance between any two of its states.
The distance metric depends on the type of state distribution.
For single Gaussians, a weighted Euclidean distance between the means
is used and for tied-mixture systems a Euclidean distance between the
mixture weights is used.
For all other cases, the average probability
of each component mean with respect to the other state is used.
The details of the algorithm and these metrics are given in the reference
section for \htool{HHEd}.

\centrefig{tiedstate}{100}{Data-driven state tying}

As an example, the following \htool{HHEd} script would cluster and tie the
corresponding states of the triphone group for the phone \texttt{ih}
\begin{verbatim}
     TC 100.0 "ihS2" {*-ih+*.state[2]}
     TC 100.0 "ihS3" {*-ih+*.state[3]}
     TC 100.0 "ihS4" {*-ih+*.state[4]}
\end{verbatim}
In this example, each \texttt{TC} command performs clustering on the specified
set of states; each cluster is tied and output as a macro.  The macro name
is generated by appending the cluster index to
the macro name given in the command.  The effect of this command is
illustrated in Fig.~\href{f:tiedstate}.  Note that if a word-internal
triphone system is being built, it is sensible to include biphones as well
as triphones in the item list; for example, the first command above would
be written as
\begin{verbatim}
     TC 100.0 "ihS2" {(*-ih,ih+*,*-ih+*).state[2]}
\end{verbatim}

If the above \texttt{TC} commands are repeated for all phones, the resulting set of
tied-state models will have far
fewer parameters in total than the original untied set.  The numeric argument
immediately following the \texttt{TC} command name is the cluster threshold.  Increasing
this value will allow larger and hence fewer clusters.  The aim, of
course, is to strike the right balance between compactness and the acoustic
accuracy of the individual models.  In practice, the use of this command
requires some experimentation to find a good threshold value.  \htool{HHEd} provides
extensive trace output for monitoring clustering operations.  Note in this
respect that as well as setting tracing from the command line and the
configuration file, tracing in \htool{HHEd} can be set by the \texttt{TR} command.  Thus,
tracing can be controlled at the command level.
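
As a rough sketch of command-level tracing, the clustering commands for one
phone might be bracketed by \texttt{TR} commands so that detailed output is
produced only where it is wanted.  The trace values \texttt{2} and \texttt{0}
used here are illustrative assumptions rather than recommended settings; the
trace flags actually available are listed in the reference section for
\htool{HHEd}.
\begin{verbatim}
     TR 2
     TC 100.0 "ihS2" {*-ih+*.state[2]}
     TC 100.0 "ihS3" {*-ih+*.state[3]}
     TC 100.0 "ihS4" {*-ih+*.state[4]}
     TR 0
\end{verbatim}
Assuming that a value of 0 switches tracing off, any commands placed after the
final \texttt{TR} would then run without the additional output.
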
Further trace
information can be obtained by including the \texttt{SH} command\index{sh@\texttt{SH} command} at strategic
points in the edit script.  The effect of executing this command is to list
out all of the parameter tyings currently in force.

A potential problem with the use of the \texttt{TC} and \texttt{NC} commands is
that {\it outlier} states will tend to form their own singleton clusters\index{singleton clusters} for
which there is then insufficient data to properly train.  One solution to
this is to use the \texttt{RO} command\index{ro@\texttt{RO} command} to remove outliers\index{removing outliers}.  This command has
the form
\begin{verbatim}
     RO thresh "statsfile"
\end{verbatim}
where \texttt{statsfile} is the name of a statistics file\index{statistics file} output using the
\texttt{-s} option of \htool{HERest}.  This statistics file holds the
{\em occupation counts} for all states of the HMM set being trained.  The term
{\em occupation count} refers to the number of frames allocated to a
particular state and can be used as a measure of how much training data is
available for estimating the parameters of that state.  The \texttt{RO} command
must be executed {\it before} the \texttt{TC} or
\texttt{NC} commands used to do the actual clustering.  Its effect is simply to
read in the statistics information from the given file and then to set a flag
instructing the \texttt{TC} or \texttt{NC} commands to remove any outliers remaining at the conclusion
of the normal clustering process.  This is done by repeatedly finding the
cluster with the smallest total occupation count and merging it with its
nearest neighbour.
This process is repeated until all clusters have a total
occupation count which exceeds \texttt{thresh}, thereby ensuring that every
cluster of states will be properly trained in the subsequent re-estimation
performed by \htool{HERest}.\index{state tying}

On completion of the above clustering and tying procedures, many of the models
may be effectively identical, since acoustically similar triphones may share
common clusters for all their emitting states.  They are then, in effect,
so-called {\it generalised triphones}.\index{generalised triphones}  State tying
can be further exploited if the HMMs which are effectively equivalent are
identified and then tied via the physical-logical mapping\footnote{The physical
HMM which corresponds to several logical HMMs will be arbitrarily named after
one of them.} facility provided by HMM lists (see section~\ref{s:hmmsets}).  The
effect of this would be to reduce the total number of HMM definitions required.
\htool{HHEd} provides a compaction command to do all of this automatically.
For example, the command
\begin{verbatim}
     CO newList
\end{verbatim}
\index{co@\texttt{CO} command}will compact\index{model training!compacting} the currently loaded HMM set by identifying equivalent models
and then tying them via the new HMM list output to the file
\texttt{newList}.  Note, however, that for two HMMs to be tied, they
must be identical in all respects.
This is one of the reasons why transition parameters are often tied
across triphone groups; otherwise HMMs with identical states would still
be left distinct due to minor differences in their transition matrices.

\mysect{Tree-Based Clustering}{tbclust}\index{clustering!tree-based}

One limitation of the data-driven clustering procedure described above is
that it does not deal with triphones for which there are no examples in the
training data.
When building word-internal triphone systems, this problem can often
be avoided by careful design of the training database but when building large
vocabulary cross-word triphone systems \textit{unseen} triphones are unavoidable.
\index{unseen triphones}

\centrefig{qstree}{100}{Decision tree-based state tying}

\htool{HHEd} provides an alternative decision tree based clustering\index{decision tree-based clustering} mechanism
which provides a similar quality of clustering but offers a solution to the
unseen triphone problem.  Decision tree-based clustering is invoked
by the command \texttt{TB} which is analogous to the \texttt{TC} command
described above and has an identical form, that is
\begin{verbatim}