% netdict.tex
J=2  S=4  E=3  l=-0.4
\end{verbatim}
Here the probabilities have been normalised to sum to 1; however, this
is not necessary.  The recogniser simply adds the scaled log probability
to the path score and hence it can be regarded as an additive
word transition penalty.\index{SLF!arc probabilities}

\mysect{Building a Word Network with \htool{HParse}}{usehparse}

Whilst the construction of a word level SLF network file by hand
is not difficult, it can be somewhat tedious.  In earlier versions
of \HTK, a high level grammar notation based on extended Backus-Naur
\index{extended Backus-Naur Form}Form (EBNF\index{EBNF}) was used
to specify recognition grammars.  This \textit{HParse} format was
read-in directly by the recogniser and compiled into a finite state
recognition network at run-time.
\inthisversion \textit{HParse} format is still supported, but in the
form of an \textit{off-line} compilation into an SLF word network which
can subsequently be used to drive a recogniser.

An HParse format\index{HParse format} grammar\index{grammar} consists of
an extended form of regular expression enclosed within parentheses.
Expressions are constructed from sequences of words and the
metacharacters
\begin{description}
\item[\texttt{|}] denotes alternatives
\item[\texttt{[ ]}] encloses options
\item[\texttt{\{ \}}] denotes zero or more repetitions
\item[\texttt{< >}] denotes one or more repetitions
\item[\texttt{<< >>}] denotes context-sensitive loop
\end{description}
The following examples will illustrate the use of all of these
except the last, which is a special-purpose facility provided
for constructing context-sensitive loops as found in, for example,
context-dependent phone loops and word-pair grammars.
The context-sensitive loop facility is described in the reference entry
for \htool{HParse}\index{hparse@\htool{HParse}}.

As a first example, suppose that a simple isolated word single digit
recogniser\index{digit recogniser} was required.  A suitable syntax
would be
\begin{verbatim}
   ( one | two | three | four | five |
     six | seven | eight | nine | zero )
\end{verbatim}
This would translate into the network shown in part (a) of
Fig.~\href{f:digitnets}.  If this HParse format syntax definition
was stored in a file called {\tt digitsyn}, the equivalent SLF word
network would be generated in the file \texttt{digitnet} by typing
\begin{verbatim}
   HParse digitsyn digitnet
\end{verbatim}
The above digit syntax assumes that each input digit is properly
end-pointed.  This requirement can be removed by adding a silence model
before and after the digit
\begin{verbatim}
   ( sil (one | two | three | four | five |
          six | seven | eight | nine | zero) sil )
\end{verbatim}
As shown by graph (b) in Fig.~\href{f:digitnets}, the allowable sequence
of models now consists of silence followed by a digit followed by
silence.
If a sequence of digits needed to be recognised, then angle brackets can
be used to indicate one or more repetitions; the HParse grammar
\begin{verbatim}
   ( sil < one | two | three | four | five |
           six | seven | eight | nine | zero > sil )
\end{verbatim}
would accomplish this.  Part (c) of Fig.~\href{f:digitnets} shows the
network that would result in this case.

\centrefig{digitnets}{120}{Example Digit Recognition Networks}

HParse\index{HParse format!variables} grammars can define variables to
represent sub-expressions.  Variable names start with a dollar symbol
and they are given values by definitions of the form
\begin{verbatim}
   $var = expression ;
\end{verbatim}
For example, the above connected digit grammar could be rewritten as
\begin{verbatim}
   $digit = one | two | three | four | five |
            six | seven | eight | nine | zero;
   ( sil < $digit > sil )
\end{verbatim}
Here \texttt{\$digit} is a variable whose value is the expression
appearing on the right hand side of the assignment.  Whenever the name
of a variable appears within an expression, the corresponding expression
is substituted.  Note however that variables must be defined before use;
hence, recursion is prohibited.

As a final refinement of the digit grammar, the start and end silence
can be made optional by enclosing them within square brackets thus
\begin{verbatim}
   $digit = one | two | three | four | five |
            six | seven | eight | nine | zero;
   ( [sil] < $digit > [sil] )
\end{verbatim}
Part (d) of Fig.~\href{f:digitnets} shows the network that would result
in this last case.

HParse format grammars are a convenient way of specifying task
grammars\index{task grammar} for interactive voice interfaces.
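The define-before-use rule makes variable expansion a single forward pass. The following sketch mimics that behaviour with plain textual substitution; it is purely illustrative and is not how \htool{HParse} works internally (the tool compiles the grammar directly into a network), and the function name \texttt{expand} is hypothetical.

```python
import re

def expand(grammar_lines):
    """Illustrative sketch of HParse-style variable expansion.
    Because variables must be defined before use, a single pass
    suffices and recursive definitions are impossible."""
    defs = {}
    for line in grammar_lines:
        line = line.strip()
        # substitute previously defined variables (longest name first,
        # so e.g. $number is not clobbered by a shorter $num)
        for name in sorted(defs, key=len, reverse=True):
            line = line.replace("$" + name, "( " + defs[name] + " )")
        m = re.match(r"\$(\w+)\s*=\s*(.*);$", line)
        if m:
            defs[m.group(1)] = m.group(2).strip()
        elif line:
            return line          # the final bracketed expression

expand(["$digit = one | two;",
        "( [sil] < $digit > [sil] )"])
# -> "( [sil] < ( one | two ) > [sil] )"
```

Because substitution happens before each definition is recorded, a variable's right hand side may itself use earlier variables, matching the "corresponding expression is substituted" semantics described above.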
As a final example, the following defines a simple grammar for the
control of a telephone by voice.
\begin{verbatim}
   $digit = one | two | three | four | five |
            six | seven | eight | nine | zero;
   $number = $digit { [pause] $digit };
   $scode = shortcode $digit $digit;
   $telnum = $scode | $number;
   $cmd = dial $telnum |
          enter $scode for $number |
          redial | cancel;
   $noise = lipsmack | breath | background;
   ( < $cmd | $noise > )
\end{verbatim}
The dictionary entries for \texttt{pause}, \texttt{lipsmack},
\texttt{breath} and \texttt{background} would reference HMMs trained
to model these types of noise, and the corresponding output symbols
in the dictionary would be null.

Finally, it should be noted that when the HParse
format\index{HParse format!in V1.5} was used in earlier versions of
\HTK, word grammars contained word pronunciations embedded within them.
This was done by using the reserved node names \texttt{WD\_BEGIN} and
\texttt{WD\_END} to delimit word boundaries.  To provide backwards
compatibility, \htool{HParse} can process these old format networks,
but when doing so it outputs a dictionary as well as a word network.
This compatibility mode\index{HParse format!compatibility mode} is
defined fully in the reference section; to use it, the configuration
variable \texttt{V1COMPAT}\index{v1compat@\texttt{V1COMPAT}} must be set
true or the \texttt{-c} option set.

Finally, on the topic of word networks\index{word networks!tee-models in},
it is important to note that any network containing an unbroken loop of
one or more tee-models will generate an error.  For example, if
\texttt{sp} is a single state tee-model used to represent short pauses,
then the following network would generate an
error\index{tee-models!in networks}
\begin{verbatim}
   ( sil < sp | $digit > sil )
\end{verbatim}
The intention here is to recognise a sequence of digits which may
optionally be separated by short pauses.
However, the syntax allows an endless sequence of \texttt{sp} models
and hence the recogniser could traverse this sequence without ever
consuming any input.  The solution to problems such as these is to
rearrange the network.  For example, the above could be written as
\begin{verbatim}
   ( sil < $digit sp > sil )
\end{verbatim}
%$

\mysect{Bigram Language Models}{biglms}

\index{language models!bigram}
Before continuing with the description of network generation and, in
particular, the use of \htool{HBuild}\index{hbuild@\htool{HBuild}}, the
use of bigram language models needs to be described.  Support for
statistical language models in \HTK\ is provided by the library module
\htool{HLM}.  Although the interface to
\htool{HLM}\index{hlm@\htool{HLM}} can support general
N-grams\index{N-grams}, the facilities for constructing and using
N-grams are limited to bigrams.

A bigram language model can be built using
\htool{HLStats}\index{hlstats@\htool{HLStats}} invoked as follows, where
it is assumed that all of the label files used for training are stored
in an MLF called \texttt{labs}
\begin{verbatim}
   HLStats -b bigfn -o wordlist labs
\end{verbatim}
All words used in the label files must be listed in the
\texttt{wordlist}.  This command will read all of the transcriptions in
\texttt{labs}, build a table of bigram counts in memory, and then output
a back-off bigram\index{back-off bigrams} to the file \texttt{bigfn}.
The formulae used for this are given in the reference entry for
\htool{HLStats}.  However, the basic idea is encapsulated in the
following formula
\[
   p(i,j) = \left\{
     \begin{array}{ll}
        (N(i,j) - D)/N(i) & \mbox{if $N(i,j) > t$} \\
        b(i)\, p(j)       & \mbox{otherwise}
     \end{array} \right.
\]
where $N(i,j)$ is the number of times word $j$ follows word $i$ and
$N(i)$ is the number of times that word $i$ appears.  Essentially, a
small part of the available probability mass is deducted from the
higher bigram counts and distributed amongst the infrequent bigrams.
This process is called \textit{discounting}.  The default value for the
discount constant $D$ is 0.5, but this can be altered using the
configuration variable
\texttt{DISCOUNT}\index{discount@\texttt{DISCOUNT}}.  When a bigram
count falls below the threshold $t$, the bigram is backed-off to the
unigram probability suitably scaled by a back-off weight in order to
ensure that all bigram probabilities for a given history sum to one.

Backed-off bigrams\index{back-off bigrams!ARPA MIT-LL format} are stored
in a text file using the standard ARPA MIT-LL format which, as used in
\HTK, is as follows
\begin{verbatim}
   \data\
   ngram 1=<num 1-grams>
   ngram 2=<num 2-grams>
   \1-grams:
   P(!ENTER)       !ENTER    B(!ENTER)
   P(W1)           W1        B(W1)
   P(W2)           W2        B(W2)
   ...
   P(!EXIT)        !EXIT     B(!EXIT)
   \2-grams:
   P(W1 | !ENTER)  !ENTER W1
   P(W2 | !ENTER)  !ENTER W2
   P(W1 | W1)      W1 W1
   P(W2 | W1)      W1 W2
   P(W1 | W2)      W2 W1
   ....
   P(!EXIT | W1)   W1 !EXIT
   P(!EXIT | W2)   W2 !EXIT
   \end\
\end{verbatim}
where all probabilities are stored as base-10 logs.  The default start
and end words, \texttt{!ENTER} and \texttt{!EXIT}, can be changed using
the \htool{HLStats} \texttt{-s} option.

For some applications, a simple matrix style of bigram representation
may be more appropriate.  If the \texttt{-o} option is omitted in the
above invocation of \htool{HLStats}, then a simple full bigram matrix
will be output using the format
\begin{verbatim}
   !ENTER   0   P(W1 | !ENTER)  P(W2 | !ENTER)  .....
   W1       0   P(W1 | W1)      P(W2 | W1)      .....
   W2       0   P(W1 | W2)      P(W2 | W2)      .....
   ...
   !EXIT    0   PN              PN              .....
\end{verbatim}
where the probability $P(w_j|w_i)$ is given by row $i$, column $j$ of
the matrix.  If there are a total of $N$ words in the vocabulary, then
\texttt{PN} in the above is set to $1/(N+1)$; this ensures that the last
row sums to one.  As a very crude form of smoothing, a floor can be set
using the \texttt{-f minp} option to prevent any entry falling below
\texttt{minp}.  Note, however, that this does not affect the bigram
entries in the first column, which are zero by definition.
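The discounting and back-off computation above can be sketched as follows. This is a simplified illustration, not \htool{HLStats}' exact algorithm (its formulae, including the treatment of sentence start and end, are given in its reference entry); in particular, it estimates unigram counts from the bigram table and assumes each history has at least one backed-off successor so that each row can sum to one.

```python
def backoff_bigram(counts, D=0.5, t=1):
    """Simplified back-off bigram estimation.
    counts maps (w1, w2) -> N(w1, w2); returns p(w2 | w1) for every
    word pair over the observed vocabulary."""
    N = {}                                        # unigram counts N(i)
    for (w1, _), n in counts.items():
        N[w1] = N.get(w1, 0) + n
    total = sum(N.values())
    p_uni = {w: n / total for w, n in N.items()}  # unigram probs p(j)

    p = {}
    for w1 in N:
        # discounted probabilities (N(i,j) - D)/N(i) for counts above t
        kept = {w2: (n - D) / N[w1]
                for (x, w2), n in counts.items() if x == w1 and n > t}
        # back-off weight b(i): spread the leftover mass over the
        # unigram probabilities of the remaining successors
        denom = 1.0 - sum(p_uni[w2] for w2 in kept)
        b = (1.0 - sum(kept.values())) / denom if denom > 0 else 0.0
        for w2 in N:
            p[(w1, w2)] = kept.get(w2, b * p_uni[w2])
    return p
```

Choosing the back-off weight this way is exactly what makes "all bigram probabilities for a given history sum to one": the mass removed by discounting is redistributed over the backed-off successors in proportion to their unigram probabilities.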
Finally, as with the storage of tied-mixture and discrete probabilities,
a run-length encoding scheme is used whereby any value can be followed
by an asterisk and a repeat count (see section~\ref{s:tmix}).

\mysect{Building a Word Network with \htool{HBuild}}{usehbuild}

\sidefig{decinet}{62}{Decimal Syntax}{-2}{
As mentioned in the introduction, the main function of \htool{HBuild}
is to allow a word-level network to be constructed from a main lattice
and a set of sub-lattices\index{sub-lattices}.  Any lattice can contain
node definitions which refer to other lattices.  This allows a
word-level recognition network to be decomposed into a number of
sub-networks which can be reused at different points in the network.
}\index{hbuild@\htool{HBuild}}