word in a 7 word vocabulary
\begin{tabbing}
+++\=++++++++\=++\= \kill
\> ENTRY \> - \> show, tell, give \\
\> show \> - \> me, all \\
\> tell \> - \> me, all \\
\> me \> - \> all \\
\> all \> - \> names, addresses \\
\> names \> - \> and, names, addresses, show, tell, EXIT \\
\> addresses \> - \> and, names, addresses, show, tell, EXIT \\
\> and \> - \> names, addresses, show, tell
\end{tabbing}
\htool{HParse} can generate a suitable lattice to represent this word-pair
grammar by using the following specification:
\begin{verbatim}
$TLOOP_BEGIN_FLLWRS = show|tell|give;
$TLOOP_END_PREDS = names|addresses;
$show_FLLWRS = me|all;
$tell_FLLWRS = me|all;
$me_FLLWRS = all;
$all_FLLWRS = names|addresses;
$names_FLLWRS = and|names|addresses|show|tell|TLOOP_END;
$addresses_FLLWRS = and|names|addresses|show|tell|TLOOP_END;
$and_FLLWRS = names|addresses|show|tell;
( sil <<
TLOOP_BEGIN+TLOOP_BEGIN_FLLWRS |
TLOOP_END_PREDS-TLOOP_END |
show+show_FLLWRS |
tell+tell_FLLWRS |
me+me_FLLWRS |
all+all_FLLWRS |
names+names_FLLWRS |
addresses+addresses_FLLWRS |
and+and_FLLWRS
>> sil )
\end{verbatim}
where it is assumed that each utterance begins and ends with a \texttt{sil}
model.
In this example, each set of contexts is defined by creating a variable
whose alternatives are the individual contexts. The actual context-dependent
loop is indicated by the \texttt{<< >>} brackets.
Each element in this loop is a single
variable name of the form \texttt{A-B+C} where \texttt{A} represents the left
context, \texttt{C} represents the right context and \texttt{B} is the actual word.
Each of \texttt{A}, \texttt{B} and \texttt{C} can be a node name or a
variable name. Note that this is the only case in which variable
names are expanded automatically and the usual
\texttt{\$} symbol is not used\footnote{If the base-names or left/right contexts of the context-dependent names in a context-dependent loop are variables,
no \texttt{\$} symbols are used when writing the context-dependent
node name.}. Both \texttt{A} and \texttt{C} are optional, and left and
right contexts can be mixed in the same context-dependent loop.
\mysubsect{Compatibility Mode}{HParse-Compatibility Mode}
In \htool{HParse} compatibility mode, the interpretation of the
EBNF network is that used by the \HTK\ V1.5 \htool{HVite} program,
in which \htool{HParse} EBNF notation was used to define both the word
level syntax and the dictionary.
Compatibility mode is aimed at converting files written for
\HTK\ V1.5 into their equivalent \HTK\ V2 representation.
\htool{HParse} therefore outputs the word level
portion of such an EBNF syntax as an \HTK\ V2 lattice file and
optionally stores the pronunciation information in
an \HTK\ V2 dictionary file. When operating in compatibility mode
without generating dictionary output, the pronunciation information
is discarded.
In compatibility mode, the reserved
node names \texttt{WD\_BEGIN} and \texttt{WD\_END} are used to delimit word
boundaries---nodes between a \texttt{WD\_BEGIN/WD\_END} pair are called
``word-internal'' while all other nodes are ``word-external''.
All \texttt{WD\_BEGIN/WD\_END} nodes
must have an ``external name'' attached that denotes the word.
It is a requirement that the number of \texttt{WD\_BEGIN} nodes and the number
of \texttt{WD\_END} nodes are equal and, furthermore, that there is no
direct connection from a \texttt{WD\_BEGIN} node to a \texttt{WD\_END} node.
For example, a portion of such an \HTK\ V1.5 network could be
\begin{verbatim}
$A = WD_BEGIN%A ax WD_END%A;
$ABDOMEN = WD_BEGIN%ABDOMEN ae b d ax m ax n WD_END%ABDOMEN;
$ABIDES = WD_BEGIN%ABIDES ax b ay d z WD_END%ABIDES;
$ABOLISH = WD_BEGIN%ABOLISH ax b aa l ih sh WD_END%ABOLISH;
... etc
( <
$A | $ABDOMEN | $ABIDES | $ABOLISH | ... etc
> )
\end{verbatim}
\htool{HParse} will output the connectivity of the words
in an \HTK\ V2 word lattice format file
and the pronunciation information in an \HTK\ V2 dictionary.
Word-external nodes are treated as words and stored in the lattice
with corresponding entries in the dictionary.
It should be noted that in \HTK\ V1.5 any EBNF network, including loops,
could appear between a \texttt{WD\_BEGIN/WD\_END} pair.
Care should therefore be taken with syntaxes that define very complex
sets of alternative pronunciations. It should also be noted
that each dictionary entry is limited in length to 100 phones.
If multiple instances of the same word are found in the expanded
\htool{HParse} network, a dictionary entry will be created for only the
first instance and subsequent instances are ignored (a warning is
printed). If words with a NULL external name are present then
the dictionary will contain a NULL output symbol.
Finally, since the implementation of the generation of
the \htool{HParse} network has been revised\footnote{With the added benefit
of rectifying some residual bugs in the \HTK\ V1.5 implementation.},
the semantics of variable definition and use have changed slightly.
Previously, variables could be redefined during network definition
and each use would follow the most recent definition. In \HTK\ V2 only
the final definition of any variable is used in network expansion.
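The difference can be illustrated with a small hypothetical fragment.
The variable names \texttt{x} and \texttt{y} and the node names
\texttt{a} to \texttt{d} are invented for illustration, and the reading
of ``most recent definition'' given below is an assumption based on the
description above rather than a statement about either implementation.
\begin{verbatim}
$x = a | b;
$y = $x c;
$x = d;
( $y | $x )
\end{verbatim}
Under the V1.5 interpretation, the use of \texttt{\$x} inside
\texttt{\$y} would expand to \texttt{a | b}, since that was the most
recent definition at that point, while the use in the final network would
expand to \texttt{d}. Under the V2 interpretation both uses
would expand to \texttt{d}, so syntaxes that rely on redefinition
may need distinct variable names before conversion.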
\mysubsect{Use}{HParse-Use}
\htool{HParse} is invoked via the command line
\begin{verbatim}
HParse [options] syntaxFile latFile
\end{verbatim}
\htool{HParse} will then read the set of EBNF rules
in \texttt{syntaxFile} and produce the output lattice in \texttt{latFile}.
The detailed operation of \htool{HParse} is controlled by the following
command line options
\begin{optlist}
\ttitem{-b} Output the lattice in binary format. This increases
the speed of subsequent loading (default ASCII text lattices).
\ttitem{-c} Set V1.5 compatibility mode. Compatibility mode can also
be set by using the configuration variable V1COMPAT
(default compatibility mode disabled).
\ttitem{-d s} Output dictionary to file {\tt s}. This is only
a valid option when operating in compatibility mode.
If not set no dictionary will be produced.
\ttitem{-l} Include language model log probabilities in the output.
These log probabilities are calculated as
$-\log (\mbox{number of followers})$ for each network node.
\end{optlist}
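As a worked illustration of the \texttt{-l} option, consider the word-pair
grammar given earlier. Assuming that natural logarithms are used and that
the follower counts correspond to the rows of the word-pair table
(with \texttt{EXIT} counted as a follower), \texttt{names} has six
followers, \texttt{show} has two and \texttt{me} has one, which would give
\[
  -\log 6 \approx -1.79, \qquad
  -\log 2 \approx -0.69, \qquad
  -\log 1 = 0
\]
respectively. Each follower of a node is thus treated as equally likely.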
\stdopts{HParse}
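As a minimal sketch of how these options combine, the first command below
converts a V1.5 style syntax into a lattice and a dictionary, and the
second writes a binary lattice with log probabilities attached.
The file names \texttt{v15.syn}, \texttt{words.lat}, \texttt{words.dct},
\texttt{wordpair.syn} and \texttt{wordpair.lat} are hypothetical.
\begin{verbatim}
HParse -c -d words.dct v15.syn words.lat
HParse -b -l wordpair.syn wordpair.lat
\end{verbatim}
Note that \texttt{-d} is paired with \texttt{-c} in the first command,
since dictionary output is only valid in compatibility mode.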
\mysubsect{Tracing}{HParse-Tracing}
\htool{HParse} supports the following trace options where each
trace flag is given using an octal base
\begin{optlist}
\ttitem{0001} basic progress reporting.
\ttitem{0002} show final \htool{HParse} network (before conversion to a lattice).
\ttitem{0004} print memory statistics after \htool{HParse} lattice generation.
\ttitem{0010} show progress through glue node removal.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the \texttt{TRACE}
configuration variable.
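For example, assuming that individual flags are combined by adding their
values, as is usual for octal flag sets, basic progress reporting
and display of the final network ($0001 + 0002 = 0003$) could be requested
together as follows. The value 3 is the same in octal and decimal, so the
example does not depend on how the argument is parsed.
\begin{verbatim}
HParse -T 3 syntaxFile latFile
\end{verbatim}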
\index{hparse@\htool{HParse}|)}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../htkbook"
%%% End: