Although these string conventions are unnecessary in \HLM, the same
conventions are used to maintain compatibility with \HTK. However, a
number of options are provided to allow a mix of escaped and unescaped
text files to be handled. Word maps allow the type of escaping (\HTK\
or none) to be defined in their headers. When a degenerate form of
word map is used (i.e.\ a map with no header), the \htool{LWMap}
configuration variable \texttt{INWMAPRAW} may be set to true to
disable \HTK\ escaping. By default, \HLM\ tools output word lists and
maps in \HTK\ escaped form. However, this can be overridden by setting
the configuration variable \texttt{OUTWMAPRAW} to true. Similar
conventions apply to class maps. A degenerate class map can be read
in raw mode by setting the \htool{LCMap} configuration variable
\texttt{INCMAPRAW} to true, and a class map can be written in raw form
by setting \texttt{OUTCMAPRAW} to true.

Input/output of N-gram language model files is handled by the \HLM\
module \texttt{LModel}. Hence, by default, input/output of LMs stored
in the ARPA-MIT text format will assume \HTK\ escaping conventions.
This can be disabled for both input and output by setting
\texttt{RAWMITFORMAT} to true.
%
\mysect{Memory Management}{memman}

Memory management\index{memory management} is a very low-level function and is
mostly invisible to \HTK\ users. However, some applications require very large
amounts of memory. For example, building the models for a large vocabulary
continuous speech dictation system might require 150MB or more. Clearly, when
memory demands become this large, a proper understanding of the impact of
system design decisions on memory usage is important. The first step in this
is to have a basic understanding of memory allocation in \HTK.

Many \HTK\ tools dynamically construct large and complex data structures in
memory. To keep strict control over this and to reduce memory allocation
overheads to an absolute minimum, \HTK\ performs its own memory
management.
Thus, every time that a module or tool wishes to allocate some
memory, it does so by calling routines in
\htool{HMem}\index{hmem@\htool{HMem}}. At a slightly higher level, math objects
such as vectors and matrices are allocated by \htool{HMath}, but using the
primitives provided by \htool{HMem}.

To make memory allocation\index{memory!allocators} and de-allocation very fast,
tools create specific memory allocators for specific objects or groups of
objects. These memory allocators are divided into a sequence of blocks, and
they are organised as either Stacks\index{stacks}, M-heaps\index{M-heaps} or
C-heaps\index{C-heaps}. A Stack constrains allocation and de-allocation
requests to follow a last-allocated first-deallocated order
but allows objects of any size to be allocated. An M-heap allows an arbitrary
pattern of allocation and de-allocation requests to be made, but all allocated
objects must be the same size. Both of these memory allocation disciplines are
more restricted than the general mechanism supplied by the operating system,
and as a result, such memory operations are faster and avoid the storage
overhead of keeping hidden housekeeping information in each
allocated object. Finally, a C-heap uses the underlying operating system and
allows arbitrary allocation patterns, and as a result incurs the associated
time and space overheads. The use of C-heaps is avoided wherever possible.

Most tools provide one or more trace options which show how
much memory has been allocated.
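The two restricted disciplines can be illustrated with a minimal C
sketch. It is not the \htool{HMem} code: the types \texttt{ToyStack} and
\texttt{ToyMHeap} and their functions are hypothetical, each allocator
uses a single fixed block rather than a growing sequence of blocks, and
alignment is assumed to be satisfied by rounding sizes up to 16 bytes.
What it does show is why neither discipline needs a per-object header: a
Stack releases memory by moving a top pointer back, and an M-heap threads
its free elements through their own storage.
\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy allocators, NOT the HMem API. Each uses one fixed
   block obtained from the C heap; sizes are rounded up to TOY_ALIGN,
   assumed to be no stricter than the alignment malloc() guarantees. */
#define TOY_ALIGN 16
#define TOY_ROUND(n) (((n) + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN)

/* Stack: objects of any size, released last-allocated first-deallocated */
typedef struct {
   char  *base;   /* start of the block */
   size_t size;   /* block size in bytes */
   size_t top;    /* bytes currently in use */
} ToyStack;

int ToyStackInit(ToyStack *s, size_t size)
{
   s->base = malloc(size);
   s->size = size;
   s->top  = 0;
   return s->base != NULL;
}

void *ToyStackNew(ToyStack *s, size_t n)
{
   void *p;
   n = TOY_ROUND(n);
   if (n > s->size - s->top) return NULL;  /* a real allocator would add a block */
   p = s->base + s->top;
   s->top += n;
   return p;
}

/* Release p and everything allocated after it: just move the top back */
void ToyStackCut(ToyStack *s, void *p)
{
   s->top = (size_t)((char *)p - s->base);
}

/* M-heap: any allocation order, but all elements have the same size.
   Free elements are chained through their own storage. */
typedef struct {
   char  *base;
   size_t elemSize;   /* rounded up so a free element can hold a pointer */
   void  *freeList;
} ToyMHeap;

int ToyMHeapInit(ToyMHeap *h, size_t elemSize, size_t nElem)
{
   size_t i;
   if (elemSize < sizeof(void *)) elemSize = sizeof(void *);
   h->elemSize = TOY_ROUND(elemSize);
   h->base = malloc(h->elemSize * nElem);
   h->freeList = NULL;
   if (h->base == NULL) return 0;
   for (i = nElem; i > 0; i--) {  /* push so the lowest address comes first */
      void **e = (void **)(h->base + (i - 1) * h->elemSize);
      *e = h->freeList;
      h->freeList = e;
   }
   return 1;
}

void *ToyMHeapNew(ToyMHeap *h)
{
   void **e = h->freeList;
   if (e == NULL) return NULL;     /* a real allocator would add a block */
   h->freeList = *e;
   return e;
}

void ToyMHeapFree(ToyMHeap *h, void *p)
{
   *(void **)p = h->freeList;      /* reuse the freed element as a link */
   h->freeList = p;
}
\end{verbatim}
Cutting a Stack back with \texttt{ToyStackCut} also releases everything
allocated after the given object, which is exactly the last-allocated
first-deallocated constraint; the M-heap, in contrast, can free elements
in any order because every element is interchangeable.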
The following shows the form of the memory trace output\index{memory!statistics}:
\begin{verbatim}
 ---------------------- Heap Statistics ------------------------
 nblk=1, siz= 100000*1, used= 32056, alloc= 100000 : Global Stack[S]
 nblk=1, siz= 200*28, used= 100, alloc= 5600 : cellHeap[M]
 nblk=1, siz= 10000*1, used= 3450, alloc= 10000 : mlfHeap[S]
 nblk=2, siz= 7504*1, used= 9216, alloc= 10346 : nameHeap[S]
 ---------------------------------------------------------------
\end{verbatim}
Each line describes the status of one memory allocator and gives the number of
blocks allocated, the current block size (number of elements in block $\times$
the number of bytes in each element)\footnote{Block sizes typically grow as
more blocks are allocated.}, the total number of bytes in use by the tool and
the total number of bytes currently allocated to that allocator. The end of
each line gives the name of the allocator and its type: Stack[S], M-heap[M] or
C-heap[C]. The element size for Stacks will always be 1 but will be variable
in M-heaps.\index{memory!element sizes} The documentation for the
memory-intensive \HTK\ tools indicates what each of the main memory allocators
is used for, and this information allows the effects of various system design
choices to be monitored.

\mysect{Input/Output via Pipes and Networks}{iopipes}

Most types of file in \HTK\ can be input or output via a pipe\index{pipes}
instead of directly from or to disk. The mechanism for doing this is to assign
the required input or output filter\index{output filter} command to a
configuration parameter or to an environment variable; either can be used.
Within this command, any occurrence of the dollar symbol
\verb+$+ will be replaced by the name of the required file.
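A minimal C sketch of this substitution for input files is given below. It
is not the \htool{HShell} implementation: the function names are
hypothetical, it assumes a POSIX system (for \texttt{popen} and
\texttt{pclose}), and it reads the filter only from an environment
variable rather than also from the configuration. The file name is
substituted without any shell quoting, so names containing spaces or
shell metacharacters would need extra care.
\begin{verbatim}
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the shell command by replacing every '$' in the filter template
   with the file name. Returns a malloc'd string, or NULL on failure. */
char *ExpandFilter(const char *filter, const char *fname)
{
   size_t nDollar = 0, flen = strlen(fname), len;
   const char *p;
   char *cmd, *q;

   for (p = filter; *p; p++)
      if (*p == '$') nDollar++;
   len = strlen(filter) - nDollar + nDollar * flen;
   if ((cmd = malloc(len + 1)) == NULL) return NULL;
   for (p = filter, q = cmd; *p; p++) {
      if (*p == '$') { memcpy(q, fname, flen); q += flen; }
      else *q++ = *p;
   }
   *q = '\0';
   return cmd;
}

/* Open fname for reading, directly or through the input filter named by
   the environment variable envName (e.g. "HWAVEFILTER").
   The caller must use pclose() if *isPipe is set, fclose() otherwise.
   NOTE: fname is inserted into the command without shell quoting. */
FILE *OpenInput(const char *envName, const char *fname, int *isPipe)
{
   const char *filter = getenv(envName);
   FILE *f;
   char *cmd;

   *isPipe = 0;
   if (filter == NULL) return fopen(fname, "rb");
   if ((cmd = ExpandFilter(filter, fname)) == NULL) return NULL;
   f = popen(cmd, "r");            /* run the filter via /bin/sh */
   free(cmd);
   *isPipe = (f != NULL);
   return f;
}
\end{verbatim}
With \texttt{HWAVEFILTER} set to \verb+gunzip -c $+, a call such as
\verb+OpenInput("HWAVEFILTER", "spfile", &isPipe)+ would run
\verb+gunzip -c spfile+ and return the read end of the pipe.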
The output of the command will then be input to or output from the \HTK\ tool
via a pipe.\index{filters}

For example, the following command will
normally list the contents of the speech waveform file \texttt{spfile}
\begin{verbatim}
    HList spfile
\end{verbatim}
However, if the value of the environment variable \texttt{HWAVEFILTER}
is set as follows
\begin{verbatim}
    setenv HWAVEFILTER 'gunzip -c $'
\end{verbatim}
then the effect is to invoke the decompression filter\index{decompression
filter} \texttt{gunzip} with its input connected to the file \texttt{spfile}
and its output connected to \htool{HList} via a pipe. Each different type of
file has a unique associated variable so that multiple input and/or output
filters can be used. The full list of these is given in the summary section at
the end of this chapter.

\HTK\ is often used to process large amounts of data, and typically this
data is distributed across a network. In many systems, an attempt to open a
file can fail because of temporary network \textit{glitches}. In the majority
of cases, a second or third attempt to open the file a few seconds later will
succeed and all will be well. To allow this to be done automatically, \HTK\
tools can be configured to retry opening a file several times before giving up.
This is done simply by setting the configuration parameter
\texttt{MAXTRYOPEN}\index{maxtryopen@\texttt{MAXTRYOPEN}} to the required
number of retries\footnote{This does not work if input filters are used.}.
\index{files!network problems}\index{files!opening}

\mysect{Byte-swapping of HTK data files}{byteswap}
\index{natreadorder@\texttt{NATURALREADORDER}}
\index{natwriteorder@\texttt{NATURALWRITEORDER}}
\index{byte swapping}

Virtually all \HTK\ tools can read and write data to and from binary files. The
use of binary format as opposed to text can speed up the performance of the
tools and at the same time reduce the file size when manipulating large
quantities of data.
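Binary files store multi-byte numbers, so whether a file written on one
machine reads correctly on another depends on the byte order used. The
following minimal C sketch, which is not \HTK\ code and whose function
names are hypothetical, encodes a 32-bit float either in big-endian
(\texttt{NONVAX}) order whatever the host, or in the host's natural order
when a flag playing the role of \texttt{NATURALWRITEORDER} or
\texttt{NATURALREADORDER} is set. It assumes \texttt{float} is an
IEEE~754 single-precision value.
\begin{verbatim}
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first
   (VAX order), 0 otherwise. */
int HostIsLittleEndian(void)
{
   uint16_t one = 1;
   unsigned char b;
   memcpy(&b, &one, 1);
   return b == 1;
}

/* Encode x into 4 bytes. natural == 0: always big-endian (NONVAX);
   natural == 1: host order, mimicking NATURALWRITEORDER set to true.
   Assumes float is a 32-bit IEEE 754 value. */
void PutFloat32(unsigned char out[4], float x, int natural)
{
   uint32_t u;
   memcpy(&u, &x, 4);          /* copy the bit pattern safely */
   if (natural) { memcpy(out, &u, 4); return; }
   out[0] = (unsigned char)(u >> 24);
   out[1] = (unsigned char)(u >> 16);
   out[2] = (unsigned char)(u >> 8);
   out[3] = (unsigned char)u;
}

/* Decode 4 bytes written by PutFloat32 with the same natural flag */
float GetFloat32(const unsigned char in[4], int natural)
{
   uint32_t u;
   float x;
   if (natural)
      memcpy(&u, in, 4);
   else
      u = (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16 |
          (uint32_t)in[2] << 8  | (uint32_t)in[3];
   memcpy(&x, &u, 4);
   return x;
}
\end{verbatim}
Because the big-endian path builds the bytes with shifts, it produces the
same file contents on any host; only the natural-order path depends on the
machine, which is why it is safe only when every machine involved shares
the same byte order.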
Typical binary files used by the \HTK\ tools are speech
waveform/parameter files, binary master model files (MMF), binary accumulator
files used in HMM parameter estimation and binary lattice files. However, the
use of binary data format often introduces incompatibilities between different
machine architectures due to the different byte ordering conventions used to
represent numerical quantities. In such cases, byte swapping of the data is
required. To avoid incompatibilities across different machine architectures,
all \HTK\ binary data files are written out using big-endian (\texttt{NONVAX})
representation of numerical values. Similarly, during loading, \HTK\ binary
format files are assumed to be in \texttt{NONVAX} byte order. The default
behaviour can be altered using the configuration parameters
\texttt{NATURALREADORDER} and
\texttt{NATURALWRITEORDER}. Setting \texttt{NATURALREADORDER} to true will
instruct the \HTK\ tools to interpret the binary input data in the machine's
natural byte order (byte swapping will never take place). Similarly, setting
\texttt{NATURALWRITEORDER} to true will instruct the tools to write out data
using the machine's natural byte order. The default value of these two
configuration variables is false, which is the appropriate setting when using
\HTK\ in a multiple machine architecture environment. In an environment
consisting entirely of machines with little-endian (\texttt{VAX}) byte order,
both configuration parameters can be set to true, which will disable the byte
swapping procedure during reading and writing of data.

\mysect{Summary}{openvsum}

This section summarises the globally-used environment
variables\index{environment variables} and configuration
parameters\index{configuration parameters!operating environment}. It
also provides a list of all the standard command line options used
with \HTK.

Table~\href{t:openvcparms} lists all of the configuration parameters
along with a brief description. A missing module name means that it
is recognised by more than one module.
Table~\href{t:openvars} lists
all of the environment variables used by these modules. Finally,
Table~\href{t:stdopts} lists all of the standard options.

\begin{center}
\begin{tabular}{|p{1.4cm}|p{3.0cm}|p{6.4cm}|} \hline
Module & Name & Description \\ \hline
\htool{HShell} & \texttt{ABORTONERR} & Core dump on error (for debugging) \\
\htool{HShell} & \texttt{HWAVEFILTER} & Filter for waveform file input \\
\htool{HShell} & \texttt{HPARMFILTER} & Filter for parameter file input \\
\htool{HShell} & \texttt{HLANGMODFILTER} & Filter for language model file input \\
\htool{HShell} & \texttt{HMMLISTFILTER} & Filter for HMM list file input \\
\htool{HShell} & \texttt{HMMDEFFILTER} & Filter for HMM definition file input \\
\htool{HShell} & \texttt{HLABELFILTER} & Filter for label file input \\
\htool{HShell} & \texttt{HNETFILTER} & Filter for network file input \\
\htool{HShell} & \texttt{HDICTFILTER} & Filter for dictionary file input \\
\htool{HShell} & \texttt{LGRAMFILTER} & Filter for gram file input \\
\htool{HShell} & \texttt{LWMAPFILTER} & Filter for word map file input \\
\htool{HShell} & \texttt{LCMAPFILTER} & Filter for class map file input \\
\htool{HShell} & \texttt{LMTEXTFILTER} & Filter for text file input \\
\htool{HShell} & \texttt{HWAVEOFILTER} & Filter for waveform file output \\
\htool{HShell} & \texttt{HPARMOFILTER} & Filter for parameter file output \\
\htool{HShell} & \texttt{HLANGMODOFILTER} & Filter for language model file output \\
\htool{HShell} & \texttt{HMMLISTOFILTER} & Filter for HMM list file output \\
\htool{HShell} & \texttt{HMMDEFOFILTER} & Filter for HMM definition file output \\
\htool{HShell} & \texttt{HLABELOFILTER} & Filter for label file output \\
\htool{HShell} & \texttt{HNETOFILTER} & Filter for network file output \\
\htool{HShell} & \texttt{HDICTOFILTER} & Filter for dictionary file output \\
\htool{HShell} & \texttt{LGRAMOFILTER} & Filter for gram file output \\
\htool{HShell} & \texttt{LWMAPOFILTER} & Filter for word map file output \\
\htool{HShell} & \texttt{LCMAPOFILTER} & Filter for class map file output \\
\htool{HShell} & \texttt{MAXTRYOPEN} & Number of file open retries \\
\htool{HShell} & \texttt{NONUMESCAPES} & Prevent string output using \verb+\012+ format \\
\htool{HShell} & \texttt{NATURALREADORDER} & Enable natural read order for HTK binary files \\
\htool{HShell} & \texttt{NATURALWRITEORDER} & Enable natural write order for HTK binary files \\
\htool{HMem} & \texttt{PROTECTSTAKS} & Warn if stack is cut-back (debugging) \\
 & \texttt{TRACE} & Trace control (default=0) \\
 & \texttt{STARTWORD} & Set sentence start symbol ({\tt <s>}) \\
 & \texttt{ENDWORD} & Set sentence end symbol ({\tt </s>}) \\
 & \texttt{UNKNOWNNAME} & Set OOV class symbol ({\tt !!UNK}) \\
 & \texttt{RAWMITFORMAT} & Disable \HTK\ escaping for LM tools \\
\htool{LWMap} & \texttt{INWMAPRAW} & Disable \HTK\ escaping for input word lists and maps \\
\htool{LWMap} & \texttt{OUTWMAPRAW} & Disable \HTK\ escaping for output word lists and maps \\
\htool{LCMap} & \texttt{INCMAPRAW} & Disable \HTK\ escaping for input class lists and maps \\
\htool{LCMap} & \texttt{OUTCMAPRAW} & Disable \HTK\ escaping for output class lists and maps \\
\hline
\end{tabular}
\tabcap{openvcparms}{Configuration Parameters used in Operating Environment}
\end{center}

\vspace*{1cm}

\begin{center}
\begin{tabular}{|p{2.6cm}|p{8.2cm}|} \hline
Env Variable & Meaning \\ \hline
\texttt{HCONFIG} & Name of default configuration file \\
\texttt{HxxxFILTER} & Input/Output filters as above \\ \hline
\end{tabular}
\tabcap{openvars}{Environment Variables used in Operating Environment}
\end{center}

\vspace*{1cm}

\begin{center}
\begin{tabular}{|p{2.6cm}|p{8.2cm}|} \hline
Standard Option & Meaning \\ \hline
\texttt{-A} & Print command line arguments \\
\texttt{-B} & Store output HMM macro files in binary \\
\texttt{-C cf} & Configuration file is cf \\
\texttt{-D} & Display configuration variables \\
\texttt{-F fmt} & Set source data file format to fmt \\
\texttt{-G fmt} & Set source label file format to fmt \\
\texttt{-H mmf} & Load HMM macro file mmf \\
\texttt{-I mlf} & Load master label file mlf \\
\texttt{-J tmf} & Load transform model file tmf \\
\texttt{-K tmf} & Save transform model file tmf \\
\texttt{-L dir} & Look for label files in directory dir \\
\texttt{-M dir} & Store output HMM macro files in directory dir \\
\texttt{-O fmt} & Set output data file format to fmt \\
\texttt{-P fmt} & Set output label file format to fmt \\
\texttt{-Q} & Print command summary info \\
\texttt{-S scp} & Use command line script file scp \\
\texttt{-T N} & Set trace level to N \\
\texttt{-V} & Print version information \\
\texttt{-X ext} & Set label file extension to ext \\ \hline
\end{tabular}
\tabcap{stdopts}{Summary of Standard Options}
\end{center}
\index{standard options!summary}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "htkbook"
%%% End: 