The philosophy of system construction in \HTK\ is that HMMs should be\index{HMM!build philosophy}
refined incrementally.  Thus, a typical progression is to start with a
simple set of single Gaussian context-independent phone models and then
iteratively refine them by expanding them to include context-dependency
and use multiple mixture component Gaussian distributions.
The tool \htool{HHEd}\index{hhed@\htool{HHEd}} is an HMM definition editor which will clone models\index{HMM!editor}
into context-dependent sets, apply a variety of parameter tyings
and increment the number of mixture components in specified distributions.
The usual process is to modify a set of HMMs in stages using \htool{HHEd}
and then re-estimate the parameters of the modified set using \htool{HERest}
after each stage.  To improve performance for specific speakers the tools
\htool{HERest}\index{herest@\htool{HERest}} and
\htool{HVite}\index{hvite@\htool{HVite}} can be used to adapt HMMs to better
model the characteristics of particular speakers using a small amount of
training or adaptation data.  The end result is a speaker-adapted system.

The single biggest problem in building context-dependent HMM systems is always data
insufficiency.  The more complex the model set, the more data is needed
to make robust estimates of its parameters, and since data is
usually limited, a balance must be struck between complexity and the available data.
For continuous density systems, this balance is achieved
by tying parameters together as mentioned above.  Parameter tying
allows data to be pooled so that the shared parameters can be robustly
estimated. In addition to continuous density systems, \HTK\ also supports
fully tied mixture systems and discrete probability systems.
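This incremental refinement cycle can be sketched as follows; the edit
script name, file names and directory layout here are purely
illustrative, not fixed conventions:
\begin{verbatim}
   # Apply an edit script (e.g. to increase the number of mixture
   # components) to the models in hmm4, writing the result to hmm5
   HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 mix.hed monophones

   # Re-estimate the parameters of the modified set
   HERest -C config -I train.mlf -S train.scp \
          -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones
\end{verbatim}
Each such editing stage is typically followed by one or more further
passes of \htool{HERest} before the next round of editing.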
For fully tied mixture and discrete probability systems, the data
insufficiency problem is usually addressed by smoothing
the distributions and the tool
\htool{HSmooth}\index{hsmooth@\htool{HSmooth}} is used for this.

\subsection{Recognition Tools}

\HTK\ provides a single recognition tool\index{recognition!tools} called
\htool{HVite}\index{hvite@\htool{HVite}}
which uses the token passing algorithm described in the previous
chapter to perform Viterbi-based speech recognition.  \htool{HVite}
takes as input a network describing the allowable word sequences,
a dictionary defining how each word is pronounced and a set of HMMs.
It operates by converting the word network to a phone network and
then attaching the appropriate HMM definition to each phone instance.
Recognition can then be performed on either a list of stored speech
files or on direct audio input.  As noted at the end of the last
chapter, \htool{HVite} can support cross-word triphones and it can
run with multiple tokens to generate
lattices containing multiple hypotheses.  It can also be configured
to rescore lattices and perform forced alignments.

The word networks needed to drive \htool{HVite} are usually either
simple word loops in which any word can follow any other word or they
are directed graphs representing a finite-state task grammar.  In the
former case, bigram probabilities are normally attached to the word
transitions.  Word networks are stored using
the \HTK\ standard lattice format\index{lattice!format}.  This is a
text-based format and\index{standard lattice format}\index{SLF}
hence word networks can be created directly using a text editor.
However, this is rather tedious and hence \HTK\ provides two
tools to assist in creating word networks.  Firstly, \htool{HBuild}
allows sub-networks to be created and used within higher level networks.
Hence, although the same low level notation is used, much duplication
is avoided.
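A typical recognition run over stored speech files using such a word
network might look like the following; all the file names here are
illustrative:
\begin{verbatim}
   HVite -H hmm9/macros -H hmm9/hmmdefs -S test.scp \
         -i recout.mlf -w wdnet dict monophones
\end{verbatim}
Here \texttt{wdnet} is the word network, \texttt{dict} the
pronunciation dictionary and \texttt{monophones} the list of HMMs to
use; the recognised transcriptions are written to \texttt{recout.mlf}.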
\htool{HBuild}\index{hbuild@\htool{HBuild}} can also be used to generate word loops
and it can read in a backed-off bigram language model and modify the
word loop transitions to incorporate the bigram probabilities.
Note that the label statistics tool \htool{HLStats} mentioned earlier
can be used to generate a backed-off bigram language model.

As an alternative to specifying a word network directly, a higher
level grammar notation can be used.  This notation is based on
the Extended Backus Naur Form (EBNF\index{EBNF}) used in compiler specification and
it is compatible with the grammar specification language used in
earlier versions of \HTK.  The tool \htool{HParse}\index{hparse@\htool{HParse}} is supplied
to convert this notation into the equivalent word network.

Whichever method is chosen to generate a word network, it is useful
to be able to see examples of the \textit{language} that it defines.
The tool \htool{HSGen}\index{hsgen@\htool{HSGen}} is provided to do this.  It takes as input
a network and then randomly traverses the network outputting word
strings.  These strings can then be inspected to ensure that they
correspond to what is required.  \htool{HSGen} can also compute
the empirical perplexity of the task.

Finally, the construction of large dictionaries can involve merging
several sources and performing a variety of transformations on each
source.  The dictionary management tool \htool{HDMan}\index{hdman@\htool{HDMan}} is supplied
to assist with this process.

\subsection{Analysis Tool}
\index{results analysis}

Once the HMM-based recogniser has been built, it is necessary
to evaluate its performance.
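Such an evaluation is typically driven by the analysis tool
\htool{HResults}; a minimal invocation, with illustrative file names,
might be:
\begin{verbatim}
   HResults -I testref.mlf monophones recout.mlf
\end{verbatim}
where \texttt{testref.mlf} holds the reference transcriptions and
\texttt{recout.mlf} the recogniser output.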
Evaluation is usually done by using the recogniser
to transcribe some pre-recorded test sentences and matching the
recogniser output with the correct reference transcriptions.
This comparison is performed by a tool called
\htool{HResults} which uses dynamic programming to align the two transcriptions
and then count substitution, deletion and insertion errors.
Options are provided to ensure that the
algorithms and output formats used by \htool{HResults}\index{hresults@\htool{HResults}} are compatible
with those used by the US National Institute of Standards and Technology
(NIST).  As well as global performance measures,
\htool{HResults} can also provide speaker-by-speaker breakdowns,
confusion matrices and time-aligned transcriptions.  For word spotting
applications, it can also compute \textit{Figure of Merit} (FOM) scores
and \textit{Receiver Operating Curve} (ROC)
information.\index{NIST}\index{FOM}\index{Figure of Merit}

\mysect{What's New In Version 3.3}{whatsnew}

This \index{new features!in Version 3.3} section lists the new
features in \HTK\ Version 3.3 compared to the preceding Version~3.2.
\begin{enumerate}
\item \htool{HERest} now incorporates the adaptation transform generation
  that was previously performed in \htool{HEAdapt}. The range of linear
  transformations and the ability to combine transforms hierarchically
  have now been included. The system also now supports adaptive training
  with constrained MLLR transforms.
\item Many other smaller changes and bug fixes have been integrated.
\end{enumerate}

\mysubsect{New In Version 3.2}{}

This \index{new features!in Version 3.2} section lists the new
features in \HTK\ Version 3.2 compared to the preceding Version~3.1.
\begin{enumerate}
\item The \htool{HLM} toolkit has been incorporated into HTK. It
  supports the training and testing of word or class-based n-gram
  language models.
\item \htool{HPARM} supports global feature space transforms.
\item \htool{HPARM} now supports third differentials
  ($\Delta\Delta\Delta$ parameters).
\item A new tool named \htool{HLRescore} offers support for a number
  of lattice post-processing operations such as lattice pruning,
  finding the 1-best path in a lattice and language model expansion of
  lattices.
\item \htool{HERest} supports 2-model re-estimation which allows the
  use of a separate alignment model set in the Baum-Welch
  re-estimation.
\item The initialisation of the decision-tree state clustering in
  \htool{HHEd} has been improved.
\item \htool{HHEd} supports a number of new commands related to
  variance flooring and decreasing the number of mixtures.
\item A major bug in the estimation of block-diagonal MLLR transforms
  has been fixed.
\item Many other smaller changes and bug fixes have been integrated.
\end{enumerate}

\mysubsect{New In Version 3.1}{}

This \index{new features!in Version 3.1} section lists the new
features in \HTK\ Version 3.1 compared to the preceding Version~3.0
which was functionally equivalent to Version~2.2.
\begin{enumerate}
\item \htool{HPARM} supports Perceptual Linear Prediction (PLP) feature
  extraction.
\item \htool{HPARM} supports Vocal Tract Length Normalisation (VTLN)
  by warping the frequency axis in the filterbank analysis.
\item \htool{HPARM} supports variance scaling.
\item \htool{HPARM} supports cluster-based cepstral mean and variance
  normalisation.
\item All tools support an extended filename syntax that can be used
  to deal with unsegmented data more easily.
\end{enumerate}

\mysubsect{New In Version 2.2}{}

This section lists the new features and refinements in \HTK\ Version
2.2 compared to the preceding Version 2.1.
\begin{enumerate}
\item Speaker adaptation is now supported via the \htool{HEAdapt} and
\htool{HVite} tools, which adapt a current set of models to a new speaker and/or
environment.
\begin{itemize}
\item \htool{HEAdapt} performs offline supervised adaptation using
maximum likelihood linear regression (MLLR) and/or maximum a-posteriori
(MAP) adaptation.
\item \htool{HVite} performs unsupervised adaptation using just MLLR.
\end{itemize}
Both tools can be used in a static mode, where all the
data is presented prior to any adaptation, or in an incremental fashion.
\item Improved support for PC WAV files\\
In addition to 16-bit PCM linear, HTK can now read
\begin{itemize}
\item 8-bit CCITT mu-law
\item 8-bit CCITT a-law
\item 8-bit PCM linear
\end{itemize}
\end{enumerate}

\mysubsect{Features Added To Version 2.1}{}

For the benefit of users of earlier versions of \HTK, this
\index{new features!in Version 2.1} section lists the main changes in
\HTK\ Version 2.1 compared to the preceding Version 2.0.
\begin{enumerate}
\item The speech input handling has been partially re-designed and a new
energy-based speech/silence detector has been incorporated into \htool{HParm}.
The detector is robust yet flexible and can be configured through a number of
configuration variables. Speech/silence detection can now be performed on
waveform files. The calibration of speech/silence detector parameters is now
accomplished by asking the user to speak an arbitrary sentence.
\item \htool{HParm} now allows a random noise signal to be added to waveform
data via the configuration parameter \texttt{ADDDITHER}. This prevents
numerical overflows which can occur with artificially created waveform data
under some coding schemes.
\item \htool{HNet} has been optimised for more efficient operation when
performing forced alignments of utterances using \htool{HVite}.
Further network optimisations tailored to biphone/triphone-based phone
recognition have also been incorporated.
\item \htool{HVite} can now produce partial recognition hypotheses even when
no tokens survive to the end of the network. This is accomplished by setting
the \htool{HRec} configuration parameter \texttt{FORCEOUT} to true.
\item Dictionary support has been extended to allow pronunciation probabilities
to be associated with different pronunciations of the same word. At the same
time, \htool{HVite} now allows the use of a pronunciation scale factor during
recognition.
\item \HTK\ now provides consistent support for reading and writing of \HTK\
binary files (waveforms, binary MMFs, binary SLFs, \htool{HERest} accumulators)
across different machine architectures incorporating automatic byte swapping.
By default, all binary data files handled by the tools are now written/read in
big-endian (\texttt{NONVAX}) byte order. The default behaviour can be changed
via the configuration parameters \texttt{NATURALREADORDER} and
\texttt{NATURALWRITEORDER}.
\item \htool{HWave} supports the reading of waveforms in Microsoft WAVE file
format.
\item \htool{HAudio} allows key-press control of live audio input.
\end{enumerate}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "htkbook"
%%% End:
