%
% HLMBook - Steve Young    13/01/97
%
% Updated - Gareth Moore   15/01/02
%

\newpage
\mysect{LBuild}{LBuild}

\mysubsect{Function}{LBuild-Function}

\index{lbuild@\htool{LBuild}|(}
\index{n-gram language model}
This program will read one or more input gram files and
generate/update a back-off $n$-gram language model as described in
section~\ref{s:mkngoview}. The \texttt{-n} option specifies the order of
the final model. Thus, to generate a trigram language model, the user
may simply invoke the tool with \texttt{-n 3}, which will cause it to
compute the FoF table and then generate the unigram, bigram and
trigram stages of the model. Note that intermediate model/FoF files
will not be generated.

As for all tools which process gram files, the input gram files must
each be sorted but they need not be sequenced. The counts in each
input file can be modified by applying a multiplier factor. Any $n$-gram
containing an id which is not in the word map is ignored; thus the
supplied word map will typically contain just those word and class ids
required for the language model under construction (see
\htool{LSubset}).

\htool{LBuild} supports Turing-Good and absolute discounting as
described in section~\ref{s:HLMdiscounts}.

\mysubsect{Use}{LBuild-Use}

\htool{LBuild} is invoked by typing the command line
\begin{verbatim}
   LBuild [options] wordmap outfile [mult] gramfile .. [mult] gramfile ..
\end{verbatim}
The given word map file is loaded and then the set of named gram files
are merged to form a single sorted stream of $n$-grams. Any $n$-grams
containing ids not in the word map are ignored.  The list of input
gram files can be interspersed with multipliers. These are
floating-point format numbers which must begin with a plus or minus
character (e.g. \texttt{+1.0}, \texttt{-0.5}, etc.). The effect of a
multiplier \texttt{x} is to scale the $n$-gram counts in the following
gram files by the factor \texttt{x}. A multiplier stays in effect
until it is redefined.
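
As an illustrative sketch of the multiplier syntax, the following
command line would build a trigram model from three gram files. The
file names \texttt{wmap}, \texttt{lm.tri}, \texttt{gram.0},
\texttt{gram.1} and \texttt{extra.0} are hypothetical placeholders
chosen for this example, not files supplied with the toolkit.
\begin{verbatim}
   LBuild -n 3 wmap lm.tri +1.0 gram.0 gram.1 +0.5 extra.0
\end{verbatim}
Assuming the behaviour described above, the multiplier \texttt{+1.0}
applies to both \texttt{gram.0} and \texttt{gram.1}, leaving their
counts unchanged, whereas \texttt{+0.5} takes effect from
\texttt{extra.0} onwards and halves the counts read from that file.
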
The output to \texttt{outfile} is a back-off
$n$-gram language model file in the specified file format.

See the \htool{LPCalc} options in section~\ref{s:coninlib} for
details on changing the discounting type from the default of
Turing-Good, as well as other configuration file options.

The allowable options to \htool{LBuild} are as follows
\begin{optlist}

  \ttitem{-c n c} Set cutoff for \texttt{n}-gram to \texttt{c}.

  \ttitem{-d n c} Set the weighted discount pruning threshold for
                  \texttt{n}-gram to \texttt{c} (Seymore-Rosenfeld
                  pruning).

  \ttitem{-f t} Set output model format to \texttt{t} (TEXT, BIN, ULTRA).

  \ttitem{-k n} Set discounting range for Good-Turing discounting to
                $[1..n]$.

  \ttitem{-l f} Build model by updating existing LM in \texttt{f}.

  \ttitem{-n n} Set final model order to \texttt{n}.

  \ttitem{-t f} Load the FoF file \texttt{f}. This is only used for
                Turing-Good discounting, and is not essential.

  \ttitem{-u c} Set the minimum occurrence count for unigrams to
                \texttt{c} (default is 1).

  \ttitem{-x} Produce a counts model.

\end{optlist}
\stdopts{LBuild}

\mysubsect{Tracing}{LBuild-Tracing}

\htool{LBuild} supports the following trace options where each
trace flag is given using an octal base
\begin{optlist}
\ttitem{00001}  basic progress reporting.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the
\texttt{TRACE} configuration variable.
\index{lbuild@\htool{LBuild}|)}
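
As a hedged example combining several of the options listed above with
tracing, the command line below would request a trigram model in text
format, with the bigram and trigram cutoffs both set to 1 and basic
progress reporting enabled. The file names are hypothetical
placeholders, and the cutoff values are chosen purely for illustration.
\begin{verbatim}
   LBuild -T 1 -n 3 -c 2 1 -c 3 1 -f TEXT wmap lm.tri gram.0 gram.1
\end{verbatim}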
