%/* ----------------------------------------------------------- */


%/*                                                             */


%/*                          ___                                */


%/*                       |_| | |_/   SPEECH                    */


%/*                       | | | | \   RECOGNITION               */


%/*                       =========   SOFTWARE                  */ 


%/*                                                             */


%/*                                                             */


%/* ----------------------------------------------------------- */


%/*         Copyright: Microsoft Corporation                    */


%/*          1995-2000 Redmond, Washington USA                  */


%/*                    http://www.microsoft.com                */


%/*                                                             */


%/*   Use of this software is governed by a License Agreement   */


%/*    ** See the file License for the Conditions of Use  **    */


%/*    **     This banner notice must not be removed      **    */


%/*                                                             */


%/* ----------------------------------------------------------- */


%


% HTKBook - Phil Woodland and Steve Young    24/11/97


%





\newpage


\mysect{HParse}{HParse}





\mysubsect{Function}{HParse-Function}





\index{hparse@\htool{HParse}|(}


The \htool{HParse} program generates word level lattice files (for use


with e.g. \htool{HVite}) from a text file syntax description containing a 


set of rewrite rules based on extended Backus-Naur Form (EBNF). 


The EBNF rules are used to generate an internal


representation of the corresponding finite-state network where \htool{HParse}


network nodes represent the words in the network, and are connected via


sets of links. This \htool{HParse} network is then converted to an \HTK\ V2 word


level lattice. The program provides one convenient way of defining such


word level lattices. 





\htool{HParse} also provides a {\em compatibility mode} for use


with \htool{HParse} syntax descriptions used in \HTK\ V1.5 where


the same format was used to define both the word level syntax 


and the dictionary.


In compatibility mode \htool{HParse} will output the word level


portion of such a syntax as an \HTK\ V2 lattice file (via \htool{HNet})


and the pronunciation information as an \HTK\ V2 dictionary file (via 


\htool{HDict}).





The lattice produced by \htool{HParse} will often contain a number


of \texttt{!NULL} nodes in order to reduce the number of arcs in the


lattice. The use of such \texttt{!NULL} nodes can both


reduce lattice size and increase efficiency when the lattice is used by recognition programs 


such as \htool{HVite}.
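
As a rough illustration of the saving, consider a point in the network where any one of $m$ words may be followed by any one of $n$ words. The symbols $m$ and $n$ are introduced here only for this sketch and are not \HTK\ notation. Linking the words directly needs one arc per word pair, whereas routing them through a single \texttt{!NULL} node needs one arc per word:
\[
  m \times n \;\; \mbox{arcs (direct)} \qquad \mbox{versus} \qquad m + n \;\; \mbox{arcs (via one \texttt{!NULL} node)}
\]
For $m = n = 4$ this is 16 arcs against 8. The lattice actually written by \htool{HParse} depends on the grammar, so these counts should be read as an idealised case only.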





\mysubsect{Network Definition}{HParse-Network Definition}





The syntax rules for the textual definition of the network are 


as follows.  Each node in the network has a \texttt{nodename}.


This node name will normally correspond to a word in the final syntax


network. Additionally, for use in compatibility mode,


each node can also have an external name.


{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= \kill


\>        name \> = \> char\{char\} \\


\>        nodename \> = \> name [ ``\%'' ( ``\%'' $|$ name ) ]


\end{tabbing}}





\noindent


Here \texttt{char} represents any character except one of the meta chars 


\texttt{\{ \} [ ] $<$ $>$ $|$ = \$ ( ) ; $\backslash$ / *}.   The latter may, 


however, be escaped using a backslash.  The first name in a \texttt{nodename}


represents the name of the node (``internal name''), and the second optional name is


the ``external'' name.  This is used only in compatibility mode, and is, by default,


the same as the internal name.
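
The node name forms described above might be written as in the following sketch. All three names are hypothetical and chosen only for illustration, and the lines are isolated fragments rather than a complete network definition.

\begin{verbatim}
   stop        /* internal name only; external name defaults to stop */
   two%too     /* internal name two, external name too               */
   and\/or     /* one name containing an escaped / meta character    */
\end{verbatim}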





Network definitions may also contain variables


{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= \kill


\>      variable \> = \> \$name


\end{tabbing}}


\noindent


Variables are identified by a leading \$ character.  They stand for


sub-networks and must be defined before they appear in the RHS of a rule


using the form


{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= ++++++++++++ \=  \kill


\>      subnet \> = \> variable ``='' expr ``;''


\end{tabbing}}


\noindent


An \texttt{expr} consists of a set of alternative sequences representing


parallel branches of the network. 


{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= ++++++++++++ \=  \kill


\>      expr \>  = \> sequence \{``$|$'' sequence\} \\


\>      sequence \> = \> factor\{factor\}


\end{tabbing}}


\noindent


Each sequence is a concatenation of factors, where a factor


is either a node name, a variable representing some sub-network or


an expression contained within various sorts of brackets.





{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= ++++++++++++ \=  \kill


\>   factor \> = \> ``('' expr ``)'' \> $|$ \\


\>\>\>            ``\{'' expr ``\}'' \> $|$ \\


\>\>\>            ``$<$'' expr ``$>$'' \> $|$ \\


\>\>\>         ``['' expr ``]'' \>  $|$ \\


\>\>\>             ``$<<$'' expr ``$>>$'' \> $|$ \\


\>\>\>               nodename \> $|$ \\


\>\>\>               variable 


\end{tabbing}}





Ordinary parentheses are used to bracket sub-expressions, curly braces \{ \} denote zero or more repetitions and angle brackets $<>$ denote one or more repetitions.  Square brackets [$\:$] are used to enclose optional items.  The double angle brackets are a special feature included for building context dependent loops and are explained further below.
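
The bracket forms can be combined freely. The fragment below is a minimal sketch only: the variable names \texttt{\$digit}, \texttt{\$num} and \texttt{\$req} and all of the words are hypothetical, and the rules are sub-network definitions rather than a complete network.

\begin{verbatim}
   $digit = one | two | three;
   $num   = < $digit >;                  /* one or more digits         */
   $req   = [ please ] dial $num         /* optional leading word      */
            { ( and | then ) $num };     /* zero or more further items */
\end{verbatim}

\noindent
Under these rules \texttt{\$req} would cover, for instance, \texttt{dial one} and \texttt{please dial one two and three}. Each variable is defined before it is used on the right-hand side of a later rule.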


Finally, the complete network is defined by a list of sub-network


definitions followed by a single expression within parentheses.


{\sf


\begin{tabbing}


++++++ \= ++++++++ \= ++ \= ++++++++++++ \=  \kill


\>    network \> = \> \{subnet\} ``('' expr ``)''


\end{tabbing}}


\noindent


Note that C style comments may be placed anywhere in the text of


the network definition.





As an example, the following network defines a syntax for some


simple edit commands


\begin{verbatim}


   $dir   = up | down | left | right;


   $mvcmd = move $dir | top | bottom;      


   $item  = char | word | line | page;


   $dlcmd = delete [$item];   /* default is char */


   $incmd = insert;


   $encmd = end [insert];


   $cmd = $mvcmd|$dlcmd|$incmd|$encmd;


   ({sil} < $cmd {sil} > quit)


\end{verbatim}
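
To make the effect of the brackets in this example concrete, the following word sequences were worked out by hand from the rules above. They are an illustration, not output from \htool{HParse}.

\begin{verbatim}
   Accepted:
      move up quit
      sil delete word sil sil insert quit
      delete end insert quit
      top bottom quit

   Not accepted:
      quit               (the angle brackets need at least one $cmd)
      move quit          (move must be followed by a $dir)
      delete word        (the final quit is missing)
\end{verbatim}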





Double angle brackets are used to


construct contextually consistent context-dependent loops such


as a word-pair grammar.\footnote{The expression between 


double angle brackets must be a simple list of alternative node names or


a variable which has such a list as its value.}


This function can also be used to generate consistent triphone loops 


for phone recognition\footnote{In \HTK\ V2 it is preferable for


these context-loop expansions to be done automatically via \htool{HNet},


to avoid requiring a dictionary entry for every context-dependent


model}.


The entry and exit conditions to a


context-dependent loop can be controlled by the invisible


pseudo-words TLOOP\_BEGIN and TLOOP\_END.  The right context of TLOOP\_BEGIN


defines the legal loop start nodes, and the left context of TLOOP\_END


defines the legal loop finishers. If TLOOP\_BEGIN/TLOOP\_END are not


present then all models are connected to the entry/exit of the loop.





A word-pair grammar simply defines the legal


set of words that can follow each word in the vocabulary.


To generate a network to represent such a grammar a


right context-dependent loop could be used.


The legal sets of sentence start and end words are defined as


above using TLOOP\_BEGIN/TLOOP\_END.





For example, the following lists the legal followers for each

