%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         developed at:                                       */
%/*                                                             */
%/*      Speech Vision and Robotics group                       */
%/*      Cambridge University Engineering Department            */
%/*      http://svr-www.eng.cam.ac.uk/                          */
%/*                                                             */
%/*      Entropic Cambridge Research Laboratory                 */
%/*      (now part of Microsoft)                                */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                 */
%/*                                                             */
%/*          2001-2002 Cambridge University                     */
%/*                    Engineering Department                   */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed     **     */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young 1/12/97
%
\mychap{An Overview of the \HTK\ Toolkit}{htkoview}

\sidepic{toolkit}{50}{}

The basic principles of HMM-based recognition were outlined in the
previous chapter and a number of the key \HTK\ tools have already been
mentioned.  This chapter describes the software architecture of a
\HTK\ tool.  It then gives a brief outline of all the \HTK\ tools and
the way that they are used together to construct and test HMM-based
recognisers.  For the benefit of existing \HTK\ users, the major
changes in recent versions of \HTK\ are listed.  The following chapter
will then illustrate the use of the \HTK\ toolkit by working through a
practical example of building a simple continuous speech recognition
system.

\mysect{\HTK\ Software Architecture}{softarch}

Much of the functionality of \HTK\ is built into the library modules.
These modules ensure that every tool interfaces to the outside world
in exactly the same way.  They also provide a central resource of
commonly used functions.
Fig.~\href{f:softarch} illustrates the software\index{software architecture}
structure of a typical \HTK\ tool and shows its input/output
interfaces.  User input/output and interaction with the operating
system are controlled by the library module
\htool{HShell}\index{hshell@\htool{HShell}} and all memory management
is controlled by \htool{HMem}\index{hmem@\htool{HMem}}.  Math support
is provided by \htool{HMath}\index{hmath@\htool{HMath}} and the signal
processing operations needed for speech analysis are in
\htool{HSigP}\index{hsigp@\htool{HSigP}}.\index{library modules}
Each of the file types required by \HTK\ has a dedicated interface
module.  \htool{HLabel}\index{hlabel@\htool{HLabel}} provides the
interface for label files,
\htool{HLM}\index{hlm@\htool{HLM}} for language model files,
\htool{HNet}\index{hnet@\htool{HNet}} for networks and lattices,
\htool{HDict}\index{hdict@\htool{HDict}} for dictionaries,
\htool{HVQ}\index{hvq@\htool{HVQ}} for VQ codebooks and
\htool{HModel}\index{hmodel@\htool{HModel}} for HMM definitions.
\sidefig{softarch}{75}{Software Architecture}{-4}{
All speech input and output at the waveform level is via
\htool{HWave} and at the parameterised level via \htool{HParm}.  As
well as providing a consistent interface, \htool{HWave} and
\htool{HLabel} support multiple file formats allowing data to be
imported from other systems.  Direct audio input is supported by
\htool{HAudio} and simple interactive graphics is provided by
\htool{HGraf}.  \htool{HUtil} provides a number of utility routines
for manipulating HMMs while \htool{HTrain} and \htool{HFB} contain
support for the various \HTK\ training tools.
\htool{HAdapt} provides support for the various \HTK\ adaptation
tools.  Finally, \htool{HRec} contains the main recognition processing
functions.}
\index{haudio@\htool{HAudio}}\index{hrec@\htool{HRec}}
\index{hutil@\htool{HUtil}}\index{hwave@\htool{HWave}}
\index{hparm@\htool{HParm}}\index{hgraf@\htool{HGraf}}
\index{htrain@\htool{HTrain}}

As noted in the next section, fine control over the behaviour of
these library modules is provided by setting configuration
variables\index{configuration variables}.  Detailed descriptions of
the functions provided by the library modules are given in the second
part of this book and the relevant configuration variables are
described as they arise.  For reference purposes, a complete list is
given in chapter~\ref{c:confvars}.

\mysect{Generic Properties of a HTK Tool}{genprops}

\HTK\ tools are designed to run with a traditional command-line style
interface.  Each tool\index{command line!options} has a number of
required arguments plus optional arguments.  The latter are always
prefixed by a minus sign.  As an example, the following command would
invoke the mythical \HTK\ tool called \htool{HFoo}
\begin{verbatim}
   HFoo -T 1 -f 34.3 -a -s myfile file1 file2
\end{verbatim}
This tool has two main arguments called \texttt{file1} and
\texttt{file2} plus four optional arguments.  Options are always
introduced by a single letter option name followed where appropriate
by the option value.  The option value is always separated from the
option name by a space.  Thus, the value of the \texttt{-f} option is
a real number, the value of the \texttt{-T} option is an integer
number and the value of the \texttt{-s} option is a string.  The
\texttt{-a} option has no following value and it is used as a simple
flag to enable or disable some feature of the tool.  Options whose
names are a capital letter have the same meaning across all tools.
For example, the \texttt{-T} option is always used to control the
trace output of a \HTK\ tool.

In addition to command line arguments, the operation of a tool can be
controlled by parameters stored in a configuration
file\index{configuration files}.  For example, if the command
\begin{verbatim}
   HFoo -C config -f 34.3 -a -s myfile file1 file2
\end{verbatim}
is executed, the tool \htool{HFoo} will load the parameters stored in
the configuration file \texttt{config} during its initialisation
procedures.  Multiple configuration files can be specified by
repeating the \verb|-C| option, e.g.\
\begin{verbatim}
   HFoo -C config1 -C config2 -f 34.3 -a -s myfile file1 file2
\end{verbatim}
Configuration parameters can sometimes be used as an alternative to
using command line arguments.  For example, trace options can always
be set within a configuration file.  However, the main use of
configuration files is to control the detailed behaviour of the
library modules on which all \HTK\ tools depend.

Although this style of command-line working may seem old-fashioned
when compared to modern graphical user interfaces, it has many
advantages.  In particular, it makes it simple to write shell scripts
to control \HTK\ tool execution.  This is vital for performing
large-scale system building and experimentation.  Furthermore,
defining all operations using text-based commands allows the details
of system construction or experimental procedure to be recorded and
documented.

Finally, note that a summary of the command line and options for any
\HTK\ tool can be obtained simply by executing the tool with no
arguments.

\mysect{The Toolkit}{toolkit}

The \HTK\ tools are best introduced by going through the processing
steps involved in building a sub-word based continuous speech
recogniser.
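Before describing the tools themselves, it is worth making the
configuration conventions of the previous section concrete.  A
configuration file such as the following could be passed to any tool
via the \texttt{-C} option.  The parameter names shown are standard
\HTK\ configuration variables, but the particular values are purely
illustrative:
\begin{verbatim}
   # hypothetical configuration file "config"
   SOURCEFORMAT = HTK          # format of the input files
   TARGETKIND   = MFCC_0_D_A   # parameterise into MFCCs plus
                               #   energy, deltas, accelerations
   TARGETRATE   = 100000.0     # frame period in 100ns units (10ms)
\end{verbatim}
As described above, each such parameter controls the behaviour of one
of the library modules rather than the tool itself.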
As shown in Fig.~\href{f:sysoview}, there are 4 main phases: data
preparation, training, testing and analysis.

\subsection{Data Preparation Tools}
\index{data preparation}

In order to build a set of HMMs, a set of speech data files and their
associated transcriptions are required.  Very often speech data will
be obtained from database archives, typically on CD-ROMs.  Before it
can be used in training, it must be converted into the appropriate
parametric form and any associated transcriptions must be converted
to have the correct format and use the required phone or word labels.
If the speech needs to be recorded, then the tool
\htool{HSLab}\index{hslab@\htool{HSLab}} can be used both to record
the speech and to manually annotate it with any required
transcriptions.

Although all \HTK\ tools can parameterise waveforms
\textit{on-the-fly}, in practice it is usually better to parameterise
the data just once.  The tool \htool{HCopy}\index{hcopy@\htool{HCopy}}
is used for this.  As the name suggests, \htool{HCopy} is used to copy
one or more source files to an output file.  Normally, \htool{HCopy}
copies the whole file, but a variety of mechanisms are provided for
extracting segments of files and concatenating files.  By setting the
appropriate configuration variables, all input files can be converted
to parametric form as they are read in.  Thus, simply copying each
file in this manner performs the required encoding.  The tool
\htool{HList}\index{hlist@\htool{HList}} can be used to check the
contents of any speech file and since it can also convert input
on-the-fly, it can be used to check the results of any conversions
before processing large quantities of data.  Transcriptions will also
need preparing.  Typically the labels used in the original source
transcriptions will not be exactly as required, for example, because
of differences in the phone sets used.  Also, HMM training might
require the labels to be context-dependent.
The tool \htool{HLEd}\index{hled@\htool{HLEd}} is a script-driven
label editor which is designed to make the required transformations
to label files.  \htool{HLEd} can also output files to a single
\textit{Master Label File} (MLF) which is usually more convenient for
subsequent processing.  Finally on data preparation,
\htool{HLStats}\index{hlstats@\htool{HLStats}} can gather and display
statistics on label files and, where required,
\htool{HQuant}\index{hquant@\htool{HQuant}} can be used to build a VQ
codebook in preparation for building a discrete probability HMM
system.

\subsection{Training Tools}

The second step of system building is to\index{training tools} define
the topology required for each HMM by writing a prototype definition.
\HTK\ allows HMMs to be built with any desired topology.  HMM
definitions can be stored externally as simple text files and hence
it is possible to edit them with any convenient text editor.
Alternatively, the standard \HTK\ distribution includes a number of
example HMM prototypes and a script to generate the most common
topologies automatically.  With the exception of the transition
probabilities, all of the HMM parameters given in the prototype
definition\index{prototype definition} are ignored.  The purpose of
the prototype definition is only to specify the overall
characteristics and topology of the HMM.  The actual parameters will
be computed later by the training tools.  Sensible values for the
transition probabilities must be given but the training process is
very insensitive to these.  An acceptable and simple strategy for
choosing these probabilities is to make all of the transitions out of
any state equally likely.

\centrefig{sysoview}{100}{\HTK\ Processing Stages}

The actual training process takes place in stages and it is
illustrated in more detail in Fig.~\href{f:tsubword}.  Firstly, an
initial set of models must be created.
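As an illustration of the prototype definitions just described, a
3-state left-to-right HMM with single-Gaussian output distributions
might be written as follows.  A toy 4-component feature vector is
used to keep the example short, and the means and variances are
placeholders since, as noted above, everything except the transition
probabilities is ignored.  States 1 and 5 are the non-emitting entry
and exit states:
\begin{verbatim}
   ~o <VecSize> 4 <MFCC>
   ~h "proto"
   <BeginHMM>
     <NumStates> 5
     <State> 2
       <Mean> 4       0.0 0.0 0.0 0.0
       <Variance> 4   1.0 1.0 1.0 1.0
     <State> 3
       <Mean> 4       0.0 0.0 0.0 0.0
       <Variance> 4   1.0 1.0 1.0 1.0
     <State> 4
       <Mean> 4       0.0 0.0 0.0 0.0
       <Variance> 4   1.0 1.0 1.0 1.0
     <TransP> 5
       0.0 1.0 0.0 0.0 0.0
       0.0 0.5 0.5 0.0 0.0
       0.0 0.0 0.5 0.5 0.0
       0.0 0.0 0.0 0.5 0.5
       0.0 0.0 0.0 0.0 0.0
   <EndHMM>
\end{verbatim}
Note that the transition matrix follows the equally-likely strategy
suggested above: each emitting state either stays put or moves on
with probability 0.5.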
If there is some speech data available for which the location of the
sub-word (i.e.\ phone) boundaries has been marked, then this can be
used as \textit{bootstrap data}.  In this case, the tools
\htool{HInit}\index{hinit@\htool{HInit}} and
\htool{HRest}\index{hrest@\htool{HRest}} provide {\it isolated word}
style training using the fully labelled
bootstrap\index{bootstrapping} data.  Each of the required HMMs is
generated individually.  \htool{HInit} reads in all of the bootstrap
training data and {\it cuts out} all of the examples of the required
phone.  It then iteratively computes an initial set of parameter
values using a {\it segmental k-means}
procedure\index{segmental k-means}.  On the first cycle, the training
data is uniformly segmented, each model state is matched with the
corresponding data segments and then means and variances are
estimated.  If mixture Gaussian models are being trained, then a
modified form of k-means clustering is used.  On the second and
successive cycles, the uniform segmentation is replaced by Viterbi
alignment.  The initial parameter values computed by \htool{HInit}
are then further re-estimated by \htool{HRest}.  Again, the fully
labelled bootstrap data is used but this time the segmental k-means
procedure is replaced by the Baum-Welch re-estimation procedure
described in the previous chapter.  When no bootstrap data is
available, a so-called \textit{flat start} can be used.  In this case
all of the phone models are initialised to be identical and have
state means and variances equal to the global speech mean and
variance.  The tool \htool{HCompV}\index{hcompv@\htool{HCompV}} can
be used for this.\index{flat start}

\centrefig{tsubword}{90}{Training Sub-word HMMs}

Once an initial set of models has been created, the tool
\htool{HERest} is used to perform {\em embedded
training}\index{embedded training} using the entire training set.
\htool{HERest}\index{herest@\htool{HERest}} performs a single
Baum-Welch re-estimation of the whole set of HMM phone models
simultaneously.  For each training utterance, the corresponding phone
models are concatenated and then the forward-backward algorithm is
used to accumulate the statistics of state occupation, means,
variances, etc., for each HMM in the sequence.  When all of the
training data has been processed, the accumulated statistics are used
to compute re-estimates of the HMM parameters.  \htool{HERest} is the
core \HTK\ training tool.  It is designed to process large databases,
it has facilities for pruning\index{pruning} to reduce computation
and it can be run in parallel across a network of machines.
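Taken together, the preparation and training steps described so far
might be scripted along the following lines.  The file and directory
names are hypothetical and only a few of each tool's options are
shown; the full option lists are given in the reference part of this
book:
\begin{verbatim}
   # encode the waveforms once, using a parameterisation config
   HCopy  -C config -S codetr.scp

   # flat start: set every model's means/variances to the
   # global values computed from the whole training set
   HCompV -C config -m -S train.scp -M hmm0 proto

   # one pass of embedded Baum-Welch re-estimation;
   # repeat, feeding hmm1 into hmm2, etc.
   HERest -C config -S train.scp -I phones.mlf \
          -H hmm0/hmmdefs -M hmm1 phonelist
\end{verbatim}
This also illustrates the point made earlier: because every tool is
driven from the command line, the entire system build can be recorded
as a shell script and rerun at will.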