📄 hdecode.tex
字号:
%/* */
%/* ___ */
%/* |_| | |_/ SPEECH */
%/* | | | | \ RECOGNITION */
%/* ========= SOFTWARE */
%/* */
%/* */
%/* ----------------------------------------------------------- */
%/* Use of this software is governed by a License Agreement */
%/* ** See the file License for the Conditions of Use ** */
%/* ** This banner notice must not be removed ** */
%/* */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young and Julian Odell - 24/11/97
%
\newpage
\mysect{HDecode}{HDecode}
{\bf\large WARNING: In contrast to the rest of HTK, \htool{HDecode} has
been specifically written for speech recognition. Known restrictions
are:
\begin{itemize}
\item only works for cross-word triphones;
\item \texttt{sil} and \texttt{sp} models are reserved as silence
models and are, by default, automatically added to the end of all
``words'' in the pronunciation dictionary.
\item lattices generated with \htool{HDecode} must be {\em merged}
using \htool{HLRescore}
to remove duplicate word paths prior to being used for lattice rescoring
with \htool{HDecode} and \htool{HVite}.
\end{itemize}
For an example of the use of \htool{HDecode} see the Resource
Management recipe, section 11, in the \texttt{samples} tar-ball
that is available for download
}
\mysubsect{Function}{HDecode-Function}
\index{hdecode@\htool{HDecode}|(}
\htool{HDecode} is a large vocabulary word recogniser.
Similar to \htool{HVite}, it transcribes speech files using a HMM model set
and a dictionary (vocabulary). The best transcription hypothesis will be
generated in the Master Label File (MLF) format. Optionally,
multiple hypotheses can also be generated as a word lattice in the form
of the HTK Standard Lattice Format (SLF).
The search space of the recognition process is defined by a model based
network, produced from expanding a supplied language model or a word
level lattice using the dictionary. In the absence of a word lattice,
a language model must be supplied to perform a \emph{full decoding}.
The current version of \htool{HDecode} only supports bigram full decoding.
When a word lattice is supplied, the use of a language model is optional.
This mode of operation is known as \emph{lattice rescoring}.
The acoustic and language model scores can be adjusted using the
\texttt{-a} and \texttt{-s} options respectively.
In the case where the supplied dictionary
contains pronunciation probability information, the corresponding scale
factor can be adjusted using the \texttt{-r} option. Use the \texttt{-q} option
to control the type of information to be included in the generated lattices.
\htool{HDecode}, when compiled with the \texttt{MODALIGN} compile directive,
can also be used to align the HMM models to a given word level lattice
(also known as model marking the lattice). When using the default
\texttt{Makefile} supplied with \htool{HDecode}, this binary will be
made and stored in \htool{HDecode.mod}.
\htool{HDecode} supports shared parameters and appropriately pre-computes
output probabilities. The runtime of the decoding process can be adjusted by
changing the pruning beam width (see the \texttt{-t} option),
word end beam width (see the \texttt{-v} option) and the maximum model
pruning (see the \texttt{-u} option).
\htool{HDecode} also allows probability calculation to be carried out in
blocks at the same time. The block size (in frames) can be specified using
the \texttt{-k} option. However, when CMLLR adaptation is used, probabilities
have to be calculated one frame at a time (i.e. using \texttt{-k 1})\footnote{
This is due to the different caching mechanism used in \htool{HDecode} and
the \htool{HAdapt} module}.
Speaker adaptation is supported by \htool{HDecode} only in terms of
using a speaker specific linear adaptation transformation. The use of
an adaptation transformation is enabled using the \texttt{-m} option.
The path, name and extension of the transformation matrices are specified
using the \texttt{-J} and the file names are derived from the name of the
speech file using a \emph{mask} (see the \texttt{-h} option).
Online (batch or incremental) adaptations are not supported by \htool{HDecode}.
Note that for lattices rescoring word lattices must be deterministic.
Duplicated paths and pronunciation variants are not permitted. See
\htool{HLRescore} reference page for information on how to produce
deterministic lattices.
\mysubsect{Use}{HDecode-Use}
\htool{HDecode} is invoked via the command line
\begin{verbatim}
HDecode [options] dictFile hmmList testFiles ...
\end{verbatim}
HDecode will then either load a N-gram language model file (\texttt{-w s})
and create a decoding network for the test files, which is
the {\em full decoding} mode, or create a
new network for each test file from the corresponding
word lattice, which is the {\em lattice rescoring} mode.
When a new network is created for each test file the path name
of the label (or lattice) file to load is determined from the
test file name and the \texttt{-L} and \texttt{-X} options
described below.
The \texttt{hmmList} should contain a list of the models required to
construct the network from the word level representation.
The recogniser output is written in the form of a label file whose
path name is determined from the test file name and the \texttt{-l} and
\texttt{-y} options described below. The list of test files can be stored
in a script file if required.
When performing lattice recognition (see \texttt{-z s} option described
below) the output lattice file contains multiple alternatives and the
format is determined by the \texttt{-q} option.
The detailed operation of \htool{HDecode} is controlled by the following
command line options
\begin{optlist}
\ttitem{-a f} Set acoustic scale factor to \texttt{f}.
This factor post-multiplies the acoustic likelihoods
from the word lattices. (default value 1.0).
\ttitem{-d dir} This specifies the directory to search for the
HMM definition files corresponding to the labels used in
the recognition network.
\ttitem{-h mask} Set the mask for determining which transform names are
to be used for the input transforms.
\ttitem{-i s} Output transcriptions to MLF \texttt{s}.
\ttitem{-k i} Set frame block size in output probability calculation for
diagonal covariance systems.
\ttitem{-l dir} This specifies the directory to store the output label
files. If this option is not used then \htool{HDecode} will store
the label files in the same directory as the data.
When output is directed to an MLF, this option can be used to
add a path to each output file name. In particular, setting the option
\verb+-l '*'+ will cause a label file named \texttt{xxx} to be prefixed
by the pattern \verb+"*/xxx"+ in the output MLF file. This is useful
for generating MLFs which are independent of the location of the
corresponding data files.
\ttitem{-m } Use an input transform. (default is off)
\ttitem{-n i} Use \texttt{i} tokens in each state to perform
lattice recognition. (default is 32 tokens per state)
\ttitem{-o s} Choose how the output labels should be formatted.
\texttt{s} is a string with certain letters (from \texttt{NSCTWMX})
indicating binary flags that control formatting options.
\texttt{N} normalise acoustic scores by dividing by the duration
(in frames) of the segment.
\texttt{S} remove scores from output label. By default
scores will be set to the total likelihood of the segment.
\texttt{C} Set the transcription labels to start and end on
frame centres. By default start times are set to the start
time of the frame and end times are set to the end time of
the frame.
\texttt{T} Do not include times in output label files.
\texttt{W} Do not include words in output label files
when performing state or model alignment.
\texttt{M} Do not include model names in output label
files when performing state and model alignment.
\texttt{X} Strip the triphone context.
\ttitem{-p f} Set the word insertion log probability to \texttt{f}
(default 0.0).
\ttitem{-q s} Choose how the output lattice should be formatted.
\texttt{s} is a string with certain letters (from \texttt{ABtvaldmnr})
indicating binary flags that control formatting options.
\texttt{A} attach word labels to arcs rather than nodes.
\texttt{B} output lattices in binary for speed.
\texttt{t} output node times.
\texttt{v} output pronunciation information.
\texttt{a} output acoustic likelihoods.
\texttt{l} output language model likelihoods.
\texttt{d} output word alignments (if available).
\texttt{m} output within word alignment durations.
\texttt{n} output within word alignment likelihoods.
\texttt{r} output pronunciation probabilities.
\ttitem{-r f} Set the dictionary pronunciation probability scale
factor to \texttt{f}. (default value 1.0).
\ttitem{-s f} Set the grammar scale factor to \texttt{f}.
This factor post-multiplies the language model likelihoods
from the word lattices. (default value 1.0).
\ttitem{-t f [g]} Enable beam searching such that any model whose
maximum log probability token falls more than the main beam
\texttt{f} below the maximum for all models is deactivated.
An extra parameter \texttt{g} can be specified as the relative
beam width. It may override the main beam width.
\ttitem{-u i} Set the maximum number of active models to \texttt{i}.
Setting \texttt{i} to \texttt{0} disables this limit (default 0).
\ttitem{-v f [g]} Enable word end pruning. Do not propagate tokens from
word end nodes that fall more than \texttt{f} below the maximum
word end likelihood. (default \texttt{0.0}).
An extra parameter \texttt{g} can be specified.
\ttitem{-w s} Load language model from \texttt{s}.
\ttitem{-x ext} This sets the extension to use for HMM definition
files to \texttt{ext}.
\ttitem{-y ext} This sets the extension for output label files to
\texttt{ext} (default \texttt{rec}).
\ttitem{-z ext} Enable output of lattices with extension \texttt{ext}
(default off).
\ttitem{-L dir} This specifies the directory to find input lattices.
% \ttitem{-R s} Load 1-best alignment label file from \texttt{-s}.
\ttitem{-X ext} Set the extension for the input lattice files
to be \texttt{ext} (default value \texttt{lat}).
\stdoptE
\stdoptF
\stdoptG
\stdoptH
\stdoptJ
\stdoptK
\stdoptP
\end{optlist}
\stdopts{HDecode}
\mysubsect{Tracing}{HDecode-Tracing}
\htool{HDecode} supports the following trace options where each
trace flag is given using an octal base
\begin{optlist}
\ttitem{0001} enable basic progress reporting.
\ttitem{0002} list observations.
\ttitem{0004} show adaptation process.
\ttitem{0010} show memory usage at start and finish.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the \texttt{TRACE}
configuration variable.
\index{hvite@@\htool{HDecode}|)}
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -