<!-- $Id: lattice-tool.1,v 1.57 2006/09/20 21:05:57 stolcke Exp $ -->
<HTML><HEAD><TITLE>lattice-tool</TITLE></HEAD><BODY>
<H1>lattice-tool</H1>
<H2> NAME </H2>
lattice-tool - manipulate word lattices
<H2> SYNOPSIS </H2>
<B>lattice-tool</B> [<B>-help</B>] <I>option</I> ...
<H2> DESCRIPTION </H2>
<B>lattice-tool</B> performs operations on word lattices in
<A HREF="pfsg-format.html">pfsg-format(5)</A>
or in HTK Standard Lattice format (SLF).
Operations include size reduction, pruning, null-node removal,
weight assignment from language models, lattice word error computation,
and decoding of the best hypotheses.
<P>
Each input lattice is processed in turn, and a series of optional
operations is performed in a fixed sequence (regardless of the order
in which the corresponding options are specified).
The sequence of operations is as follows:
<DL>
<DT>1.<DD>Read input lattice.
<DT>2.<DD>Score pronunciations (if a dictionary was supplied).
<DT>3.<DD>Split multiword word nodes.
<DT>4.<DD>Posterior- and density-based pruning (before reduction).
<DT>5.<DD>Write word posterior lattice.
<DT>6.<DD>Perform word-posterior-based decoding.
<DT>7.<DD>Write word mesh (confusion network).
<DT>8.<DD>Compute word and transition posteriors (forward-backward algorithm),
and N-gram counts if specified.
<DT>9.<DD>Compute lattice density.
<DT>10.<DD>Check lattice connectivity.
<DT>11.<DD>Compute node entropy.
<DT>12.<DD>Compute lattice word error.
<DT>13.<DD>Output reference word posteriors.
<DT>14.<DD>Remove null nodes.
<DT>15.<DD>Lattice reduction.
<DT>16.<DD>Posterior- and density-based pruning (after reduction).
<DT>17.<DD>Remove pause nodes.
<DT>18.<DD>Lattice reduction (post-pause removal).
<DT>19.<DD>Language model replacement or expansion.
<DT>20.<DD>Pause recovery or insertion.
<DT>21.<DD>Lattice reduction (post-LM expansion).
<DT>22.<DD>Multiword splitting (post-LM expansion).
<DT>23.<DD>Merging of same-word nodes.
<DT>24.<DD>Lattice algebra operations (or, concatenation).
<DT>25.<DD>Viterbi-decode best hypothesis and/or generate N-best
lists.
<DT>26.<DD>Lattice-LM perplexity computation.
<DT>27.<DD>Writing output lattice.
</DD></DL>
<P>
The following options control which of these steps actually apply.
<H2> OPTIONS </H2>
Each filename argument can be an ASCII file, or a compressed file
(name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.
<DL>
<DT><B>-help</B>
<DD>Print option summary.
<DT><B>-version</B>
<DD>Print version information.
<DT><B>-debug</B> <I>level</I>
<DD>Set the debugging output level (0 means no debugging output).
Debugging messages are sent to stderr.
<DT><B>-in-lattice</B> <I>file</I>
<DD>Read the input lattice from <I>file</I>.
<DT><B>-in-lattice2</B> <I>file</I>
<DD>Read an additional input lattice (for binary lattice operations) from
<I>file</I>.
<DT><B>-in-lattice-list</B> <I>file</I>
<DD>Read a list of input lattices from <I>file</I>.
Lattice operations are applied to each filename listed in <I>file</I>.
<DT><B>-out-lattice</B> <I>file</I>
<DD>Write the result lattice to <I>file</I>.
<DT><B>-out-lattice-dir</B> <I>dir</I>
<DD>Write result lattices from processing of <B>-in-lattice-list</B>
to directory <I>dir</I>.
<DT><B>-read-mesh</B>
<DD>Assume input lattices are in word mesh (confusion network) format,
as described in <A HREF="wlat-format.html">wlat-format(5)</A>.
<DT><B>-write-internal</B>
<DD>Write output lattices with internal node numbering instead of compact,
consecutive numbering.
<DT><B>-overwrite</B>
<DD>Overwrite existing output lattice files.
<DT><B>-vocab</B> <I>file</I>
<DD>Initialize the vocabulary to the words listed in <I>file</I>.
This is useful in conjunction with <B>-limit-vocab</B>.
<DT><B>-limit-vocab</B>
<DD>Discard LM parameters on reading that do not pertain to the words
specified in the vocabulary.
The default is that words used in the LM are automatically added to the
vocabulary.
This option can be used to reduce the memory requirements for large LMs;
to this end, <B>-vocab</B> typically specifies the set of
words used in the lattices to be processed (which has to be generated
beforehand, see <A HREF="pfsg-scripts.html">pfsg-scripts(1)</A>).
<DT><B>-vocab-aliases</B> <I>file</I>
<DD>Read vocabulary alias definitions from <I>file</I>,
consisting of lines of the form
<BR> <I>alias</I> <I>word</I> <BR>
This causes all tokens <I>alias</I> to be mapped to <I>word</I>.
<DT><B>-unk</B>
<DD>Map lattice words not contained in the known vocabulary to the
unknown word tag.
This is useful if the rescoring LM contains a probability for the unknown
word (i.e., is an open-vocabulary LM).
The known vocabulary is given by what is specified by the <B>-vocab</B>
option, as well as all words in the LM used for rescoring.
<DT><B>-map-unk</B> <I>word</I>
<DD>Map out-of-vocabulary words to <I>word</I>,
rather than the default <B>&lt;unk&gt;</B> tag.
<DT><B>-tolower</B>
<DD>Map all vocabulary to lowercase.
<DT><B>-nonevents</B> <I>file</I>
<DD>Read a list of words from <I>file</I> that are used only as context
elements, and are not predicted by the LM, similar to ``&lt;s&gt;''.
If <B>-keep-pause</B> is also specified then pauses are not treated as
nonevents by default.
<DT><B>-max-time</B> <I>T</I>
<DD>Limit processing time per lattice to <I>T</I> seconds.
</DD></DL>
<P>
Options controlling lattice operations:
<DL>
<DT><B>-write-posteriors</B> <I>file</I>
<DD>Compute the posteriors of lattice nodes and transitions (using the
forward-backward algorithm) and write out a word posterior lattice in
<A HREF="wlat-format.html">wlat-format(5)</A>.
This and other options based on posterior probabilities make most sense
if the input lattice contains combined acoustic-language model weights.
<DT><B>-write-posteriors-dir</B> <I>dir</I>
<DD>Similar to the above, but posterior lattices are written to separate
files in directory <I>dir</I>, named after the utterance IDs.
<DT><B>-write-mesh</B> <I>file</I>
<DD>Construct a word confusion network ("sausage") from the lattice and
write it to <I>file</I>.
If reference words are available for the utterance (specified by
<B>-ref-file</B> or <B>-ref-list</B>) their alignment will be recorded
in the sausage.
<DT><B>-write-mesh-dir</B> <I>dir</I>
<DD>Similar, but write sausages to files in <I>dir</I> named after the
utterance IDs.
<DT><B>-init-mesh</B> <I>file</I>
<DD>Initialize the word confusion network by reading an existing sausage
from <I>file</I>.
This effectively aligns the lattice being processed to the existing
sausage.
<DT><B>-acoustic-mesh</B>
<DD>Preserve word-level acoustic information (times, scores, and
pronunciations) in sausages, encoded as described in
<A HREF="wlat-format.html">wlat-format(5)</A>.
<DT><B>-posterior-prune</B> <I>P</I>
<DD>Prune lattice nodes whose posterior is less than <I>P</I> times that
of the highest-posterior path.
<DT><B>-density-prune</B> <I>D</I>
<DD>Prune lattices such that the lattice density (non-null words per
second) does not exceed <I>D</I>.
<DT><B>-nodes-prune</B> <I>N</I>
<DD>Prune lattices such that the total number of non-null, non-pause nodes
does not exceed <I>N</I>.
<DT><B>-fast-prune</B>
<DD>Choose a faster pruning algorithm that does not recompute posteriors
after each iteration.
<DT><B>-write-ngrams</B> <I>file</I>
<DD>Compute posterior expected N-gram counts in lattices and output them
to <I>file</I>.
The maximal N-gram length is given by the <B>-order</B> option (see below).
The counts from all lattices processed are accumulated and output at the
end.
<DT><B>-write-ngram-index</B> <I>file</I>
<DD>Output an index file of all N-gram occurrences in the lattices
processed, including their start times, durations, and posterior
probabilities.
The maximal N-gram length is given by the <B>-order</B> option (see below).
<DT><B>-min-count</B> <I>C</I>
<DD>Prune N-grams with count less than <I>C</I> from the output of
<B>-write-ngrams</B> and <B>-write-ngram-index</B>.
In the
former case, the threshold applies to the aggregate occurrence counts;
in the latter case, the threshold applies to the posterior probability of
an individual occurrence.
<DT><B>-max-ngram-pause</B> <I>T</I>
<DD>Index only N-grams whose internal pauses (between words) do not exceed
<I>T</I> seconds (assuming time stamps are recorded in the input lattice).
<DT><B>-posterior-scale</B> <I>S</I>
<DD>Scale the transition weights by dividing by <I>S</I> for the purpose
of posterior probability computation.
If the input weights represent combined acoustic-language model scores,
this should be approximately the language model weight of the recognizer
in order to avoid overly peaked posteriors (the default value is 8).
<DT><B>-write-vocab</B> <I>file</I>
<DD>
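<P>
To illustrate how the options above combine, here is a sketch of two
typical invocations; the filenames, directory names, and threshold values
are hypothetical placeholders, not part of the manual, and only flags
documented above are used:

```shell
# Prune a single lattice by posterior mass (recomputing posteriors with
# an assumed LM weight of 8) and also write a confusion network:
lattice-tool \
    -in-lattice utt001.lat.gz \
    -posterior-scale 8 \
    -posterior-prune 1e-4 \
    -write-mesh utt001.mesh \
    -out-lattice utt001.pruned.lat.gz

# Batch processing: apply the same operations to every lattice named in a
# list file, limiting the vocabulary to reduce memory use with a large LM
# and mapping any remaining out-of-vocabulary words to the unknown tag:
lattice-tool \
    -in-lattice-list lattices.list \
    -vocab lattice-words.txt \
    -limit-vocab \
    -unk \
    -overwrite \
    -out-lattice-dir pruned/
```

Note that, per the DESCRIPTION section, pruning, rescoring, and output
happen in the fixed operation sequence regardless of the order in which
these flags appear on the command line.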