% netdict.tex (Hidden Markov Toolkit, HTK 3.2.1)
For example, suppose that decimal number
input was required.  A suitable network structure would be
as shown in Fig.~\href{f:decinet}.  However, to write this directly
in an SLF file would require the digit loop to be written twice.
This can be avoided by defining the digit loop as a sub-network
and referencing it within the main \textit{decimal} network as
follows
\begin{verbatim}
    # Digit network
    SUBLAT=digits
    N=14 L=23
    # define digits
    I=0  W=zero
    I=1  W=one
    I=2  W=two
    ...
    I=9  W=nine
    #  enter/exit & loop-back null nodes
    I=10 W=!NULL
    I=11 W=!NULL
    I=12 W=!NULL
    I=13 W=!NULL
    # null->null->digits
    J=0 S=10 E=11
    J=1 S=11 E=0
    J=2 S=11 E=1
    ...
    J=10 S=11 E=9
    # digits->null->null
    J=11 S=0 E=12
    ...
    J=20 S=9 E=12
    J=21 S=12 E=13
    # finally add loop back
    J=22 S=12 E=11
    .
    # Decimal network
    N=5 L=4
    # digits -> point -> digits
    I=0 W=start
    I=1 L=digits
    I=2 W=pause
    I=3 L=digits
    I=4 W=end
    # digits -> point -> digits
    J=0 S=0 E=1
    J=1 S=1 E=2
    J=2 S=2 E=3
    J=3 S=3 E=4
\end{verbatim}
The sub-network is identified by the field 
\texttt{SUBLAT}\index{sublat@\texttt{SUBLAT}} in the header
and it is terminated by a single period on a line by itself.  The
main body of the sub-network is written as normal.
Once defined, a sub-network can be substituted into a higher level
network using an \texttt{L} field in a node definition, as in nodes
1 and 3 of the decimal network above.

Of course, this process can be continued and a higher level network
could reference the decimal network wherever it needed decimal
number entry.

\centrefig{bobig}{100}{Back-off Bigram Word-Loop Network}

One of the commonest forms of recognition network is the word-loop\index{word-loop network}
where all vocabulary items are placed in parallel with a loop-back
to allow any word sequence to be recognised.
This is the basic
arrangement used in most dictation or transcription applications.

\htool{HBuild} can build such a loop automatically from a list
of words.  It can also read in a bigram in either ARPA MIT-LL 
format or HTK matrix format and attach a bigram probability to
each word transition.  Note, however, that using a full bigram
language model means that every distinct pair of words must
have its own unique loop-back transition.  This increases the size of
the network considerably and slows down the recogniser.
When a back-off bigram is used, however, backed-off transitions
can share a common loop-back transition.  Fig.~\href{f:bobig}
illustrates this.  When backed-off bigrams are input via an 
ARPA MIT-LL format file, \htool{HBuild} will exploit this where possible.

Finally, \htool{HBuild} can automatically construct a 
word-pair grammar\index{word-pair grammar} as used in the 
ARPA Naval Resource Management task.

\mysect{Testing a Word Network using \htool{HSGen}}{usehsgen}

When designing task grammars, it is useful to be able to check
that the language defined by the final word network is as envisaged.
One simple way to check this is to use the network as a generator by
randomly traversing it and outputting the name of each word node
encountered.  \HTK\ provides a very simple tool called 
\htool{HSGen}\index{hsgen@\htool{HSGen}}
for doing this.

As an example, if the file \texttt{bnet} contained the simple Bit-But
network described above and the file \texttt{bdic} contained a corresponding
dictionary, then the command
\begin{verbatim}
    HSGen bnet bdic
\end{verbatim}
would generate a random list of examples of the language
defined by \texttt{bnet}, for example,
\begin{verbatim}
    start bit but bit bit bit end
    start but bit but but end
    start bit bit but but end
    .... etc
\end{verbatim}
This is perhaps not too informative in this case but for more
complex grammars, this type of output can be quite illuminating.

\htool{HSGen} will also estimate the empirical entropy
by recording
the probability of each sentence generated\index{sentence generation}.  
To use this facility, it
is best to suppress the sentence output and generate a large number
of examples.  For example, executing
\begin{verbatim}
    HSGen -s -n 1000 -q bnet bdic
\end{verbatim}
where the \texttt{-s} option requests statistics, the \texttt{-q} option
suppresses the output and \texttt{-n 1000} asks for 1000 sentences,
would generate the following output
\begin{verbatim}
    Number of Nodes = 4 [0 null], Vocab Size = 4
    Entropy = 1.156462,  Perplexity = 2.229102
    1000 Sentences: average len = 5.1, min=3, max=19
\end{verbatim}

\mysect{Constructing a Dictionary}{usehdman}

As explained in section~\ref{s:netuse}, the word level network is expanded by
\htool{HNet} to create the network of HMM instances needed by the recogniser.
The way in which each word is expanded is determined from a
dictionary\index{dictionary!construction}.

A dictionary for use in \HTK\ has a very simple format.\index{dictionary!formats}
Each line consists of a single word pronunciation with format
\begin{verbatim}
    WORD [ '['OUTSYM']' ] [PRONPROB] P1 P2 P3 P4 ....
\end{verbatim}
where \texttt{WORD} represents the word, followed by the optional
parameters \texttt{OUTSYM} and \texttt{PRONPROB}, where
\texttt{OUTSYM} is the symbol to output when that word is
recognised (which must be enclosed in square brackets, \verb|[| and
\verb|]|) and \texttt{PRONPROB} is the pronunciation probability
(in the range $0.0$ to $1.0$).  \texttt{P1}, \texttt{P2}, \ldots is the sequence of
phones or HMMs to be used in recognising that word. The output symbol
and the pronunciation probability are optional. If an output symbol is
not specified, the name of the word itself is output. If a
pronunciation probability is not specified then a default of 1.0 is
assumed.
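As a small illustration of the two optional fields, the following
fragment sketches entries that use the field order given above.  The
words, output symbol, phone strings and probability values are all
hypothetical and are chosen only to show where each field sits; they
are not taken from any real dictionary.
\begin{verbatim}
    oh     [zero]  0.3  ow
    hello          0.5  hh ax l ow
\end{verbatim}
Under the format above, the first entry would output \texttt{zero}
when \texttt{oh} is recognised and carries a pronunciation probability
of 0.3, while the second has no output symbol, so the word name
\texttt{hello} itself would be output.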
Empty square brackets,
\texttt{[]}, can be used to suppress any output when that word is recognised.
For example, a dictionary might contain
\begin{verbatim}
    bit           b  ih t
    but           b  ah t
    dog    [woof] d  ao g
    cat    [meow] k  ae t
    start  []     sil
    end    []     sil
\end{verbatim}
\noindent
If any word has more than one pronunciation, then the word
has a repeated entry, for example,
\begin{verbatim}
    the           th iy
    the           th ax
\end{verbatim}
corresponding to the stressed and unstressed forms of the word
``the''.\index{dictionary!output symbols}

The pronunciations in a dictionary are normally at the phone
level as in the above examples.  However, if context-dependent
models are wanted, these can be included directly in the dictionary.
For example, the Bit-But entries might be written as
\begin{verbatim}
    bit           b+ih  b-ih+t  ih-t
    but           b+ah  b-ah+t  ah-t
\end{verbatim}
In principle, this is never necessary since \htool{HNet} can perform context
expansion automatically; however, it saves computation to do this
off-line as part of the dictionary construction process.  Of course,
this is only possible for word-internal context dependencies.
Cross-word dependencies can only be generated by \htool{HNet}.

\centrefig{dmaker}{110}{Dictionary Construction using \htool{HDMan}}

Pronouncing dictionaries are a valuable resource and, if produced
manually, they can require considerable investment.  There are
a number of commercial and public domain dictionaries available;
however, these will typically have differing formats and will
use different phone sets.  To assist in the process of
dictionary construction, \HTK\ provides a tool called \htool{HDMan}
which can be used to edit and merge differing source dictionaries
to form a single uniform dictionary.
The way that
\htool{HDMan}\index{hdman@\htool{HDMan}} works is illustrated in 
Fig.~\href{f:dmaker}.

Each source dictionary file must have one pronunciation per line and the
words must be sorted into alphabetical order.  The word entries must be
valid \HTK\ strings as defined in section~\ref{s:htkstrings}.  If an
arbitrary character sequence is to be allowed, then the input edit
script should have the command \texttt{IM RAW} as its first command.

The basic operation of \htool{HDMan} is to scan the input streams and, 
for each new word
encountered, copy the entry to the output.  In the figure, a word list
is also shown.  This is optional but, if included, \htool{HDMan} 
only copies words in the list.  Normally, \htool{HDMan}
copies just the first pronunciation that it finds for any word. Thus,
the source dictionaries are usually arranged in order of
\textit{reliability}, possibly preceded by a small dictionary of special
word pronunciations. For example, in Fig.~\href{f:dmaker}, the main
dictionary might be \texttt{Src2}.  \texttt{Src1} might be a small 
dictionary containing correct pronunciations for words in \texttt{Src2}
known to have errors in them. Finally, \texttt{Src3} might be a large
poor-quality dictionary (for example, it could be generated
by a rule-based text-to-phone system) which is included as a last-resort
source of pronunciations for words not in the main dictionary.

As shown in the figure, \htool{HDMan} can apply a set of editing
commands to each source dictionary and it can also edit the
output stream.  The commands available are described in full in
the reference section.  They operate in a similar way to
those in \htool{HLEd}.  Each set of commands is written in
an edit script with one command per line.  Each input edit script
has the same name as the corresponding source dictionary but with
the extension \texttt{.ded} added.  The output edit script is stored
in a file called \texttt{global.ded}\index{global@\texttt{global.ded}}.
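As a rough sketch of how the arrangement in the figure might be run,
the command below assumes the usual \htool{HDMan} calling convention in
which the output dictionary is named first and is followed by the
source dictionaries in priority order, with the \texttt{-w} option
naming the optional word list.  The file names \texttt{wlist} and
\texttt{dict} are hypothetical, and the reference section should be
consulted for the definitive list of options.
\begin{verbatim}
    HDMan -w wlist dict Src1 Src2 Src3
\end{verbatim}
With this ordering, a word that appears in \texttt{Src1} would take its
pronunciation from there, a word missing from \texttt{Src1} would fall
through to \texttt{Src2}, and \texttt{Src3} would only be consulted for
words found in neither.  Any edit scripts such as \texttt{Src1.ded} and
\texttt{global.ded} are located by name as described above, so they do
not appear on the command line.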
The commands provided
include replace and delete at the word and phone level, context-sensitive
replace and automatic conversions to left biphones, right biphones
and word-internal triphones.\index{dictionary!edit commands}

When \htool{HDMan} loads a dictionary it adds word boundary symbols to
the start and end of each pronunciation and then deletes them when
writing out the new dictionary.  The default for these word boundary
symbols is \texttt{\#} but it can be redefined using the \texttt{-b}
option.  The reason for this is to allow context-dependent edit 
commands to take account of word-initial and word-final phone 
positions.  The examples below will illustrate this.

Rather than go through each \htool{HDMan} edit command in detail, some examples
will illustrate the typical manipulations that can be performed
by \htool{HDMan}.  Firstly, suppose that a dictionary transcribed
unstressed ``-ed'' endings as \texttt{ih0 d}
but the required dictionary
does not mark stress but uses a schwa in such cases, that is,
the transformations\index{mp@\texttt{MP} command}\index{sp@\texttt{SP} command}
\begin{verbatim}
    ih0 d  #   ->   ax d
    ih0        ->   ih  (otherwise)
\end{verbatim}
are required.
These could be achieved by the following three commands
\begin{verbatim}
    MP axd0 ih0 d #
    SP axd0 ax d #
    RP ih ih0
\end{verbatim}
The context-sensitive replace is achieved by merging all sequences
of \texttt{ih0 d \#} and then splitting the result into the sequence
\texttt{ax d \#}.
The final \texttt{RP} command\index{rp@\texttt{RP} command} 
then unconditionally
replaces all occurrences of \texttt{ih0} by \texttt{ih}.

As a second similar example, suppose that all examples of \texttt{ax l}
(as in ``bottle'') are to be replaced by the single phone \texttt{el}
provided that the immediately following phone is a non-vowel.
This requires the use of the \texttt{DC} command\index{dc@\texttt{DC} command} 
to define a
context consisting of all non-vowels, then a merge using \texttt{MP}
as above followed by a context-sensitive replace
\begin{verbatim}
    DC nonv l r w y .... m n ng #
    MP axl ax l
    CR el * axl nonv
    SP axl ax l
\end{verbatim}
the final step converts all non-transformed cases of \texttt{ax l}