📄 pfsg-scripts.1
.\" $Id: pfsg-scripts.1,v 1.22 2006/10/05 19:43:07 stolcke Exp $
.TH pfsg-scripts 1 "$Date: 2006/10/05 19:43:07 $" "SRILM Tools"
.SH NAME
pfsg-scripts, add-classes-to-pfsg, add-pauses-to-pfsg, classes-to-fsm, fsm-to-pfsg, htklat-vocab, make-nbest-pfsg, make-ngram-pfsg, pfsg-from-ngram, pfsg-to-dot, pfsg-to-fsm, pfsg-vocab, wlat-stats, wlat-to-dot, wlat-to-pfsg \- create and manipulate finite-state networks
.SH SYNOPSIS
.B make-ngram-pfsg
[\c
.BI maxorder= N\c
][\c
.BR check_bows= 0|1\c
]
.RB [ no_empty_bo=1 ]
.RB [ version=1 ]
.RB [ top_level_name=\c
.IR name ]
.RB [ null=\c
.IR string ]
.RI [ lm-file ]
.BI > pfsg-file
.br
.B add-pauses-to-pfsg
.RB [ vocab=\c
.IR file ]
.RB [ pauselast=1 ]
.RB [ wordwrap=0 ]
.RB [ pause=\c
.IR pauseword ]
.RB [ version=1 ]
.RB [ top_level_name=\c
.IR name ]
.RB [ null=\c
.IR string ]
.RI [ pfsg-file ]
.br
.B add-classes-to-pfsg
.BI classes= classes
.RB [ null=\c
.IR string ]
.RI [ pfsg-file ]
.br
.B pfsg-from-ngram
.RI [ lm-file ]
.BI > pfsg-file
.br
.B make-nbest-pfsg
[\c
.BR notree= 0|1
.BI scale= S
.BI amw= A
.BI lmw= L
.BI wtw= W
]
.RI [ nbest-file ]
.br
.B pfsg-vocab
.RI [ pfsg-file ...]
.br
.B htklat-vocab
.RB [ quotes=1 ]
.RI [ htk-lattice-file ...]
.br
.B pfsg-to-dot
[\c
.BR show_probs= 0|1
.BR show_logs= 0|1
.BR show_nums= 0|1\c
]
.RI [ pfsg-file ]
.br
.B pfsg-to-fsm
[\c
.BI symbolfile= symbols
.BR symbolic= 0|1
.BI scale= S
.BI final_output= E\c
]
.RI [ pfsg-file ]
.br
.B fsm-to-pfsg
[\c
.BI pfsg_name= name
.BR transducer= 0|1
.BI scale= S\c
]
.RI [ fsm-file ]
.br
.B classes-to-fsm
.BI vocab= vocab
[\c
.BI isymbolfile= isymbols
.BI osymbolfile= osymbols
.BR symbolic= 0|1\c
]
.RI [ classes ]
.br
.B wlat-to-pfsg
.RI [ wlat-file ]
.br
.B wlat-to-dot
[\c
.BR show_probs= 0|1
.BR show_nums= 0|1\c
]
.RI [ wlat-file ]
.br
.B wlat-stats
.RI [ wlat-file ]
.SH DESCRIPTION
These scripts create and manipulate various forms of finite-state networks.
Note that they take options with the
.BR gawk (1)
syntax
.IB option = value
instead of the more common
.BI - option
.IR value .
.PP
Also, since these tools are implemented as scripts, they don't automatically
input or output compressed model files correctly, unlike the main
SRILM tools.
However, since most scripts work with data from standard input or
to standard output (by leaving out the file argument, or specifying it as
``-''), it is easy to combine them with
.BR gunzip (1)
or
.BR gzip (1)
on the command line.
.PP
.B make-ngram-pfsg
encodes a backoff N-gram model in
.BR ngram-format (5)
as a finite-state network in
.BR pfsg-format (5).
.BI maxorder= N
limits the N-gram length used in PFSG construction to
.IR N ;
the default is to use all N-grams occurring in the input model.
.B check_bows=1
enables a check for conditional probabilities that are smaller than the
corresponding backoff probabilities.
Such transitions should first be removed from the model with
.BR "ngram \-prune-lowprobs" .
.B no_empty_bo=1
prevents empty paths through the PFSG resulting from transitions through
the unigram backoff node.
.PP
.B add-pauses-to-pfsg
replaces the word nodes in an input PFSG with sub-PFSGs that allow an
optional pause before each word.
It also inserts an optional pause following the last word in the sentence.
A typical usage is
.br
	make-ngram-pfsg \fIngram\fP | \\
.br
	add-pauses-to-pfsg >\fIfinal-pfsg\fP
.br
The result is a PFSG suitable for use in a speech recognizer.
The option
.B pauselast=1
switches the order of words and pause nodes in the sub-PFSGs;
.B wordwrap=0
disables the insertion of sub-PFSGs altogether.
.PP
The options
.BI pause= pauseword
and
.BI top_level_name= name
allow changing the default names of the pause word and the top-level
grammar, respectively.
.B version=1
inserts a version line at the top of the output as required by the Nuance
recognition system (see NUANCE COMPATIBILITY below).
.B add-pauses-to-pfsg
uses a heuristic to distinguish word nodes in the input PFSG from
other nodes (NULL or sub-PFSGs).
The option
.BI vocab= file
lets one specify a vocabulary of word names to override this heuristic.
.PP
.B add-classes-to-pfsg
extends an input PFSG with expansions for word classes, defined in
.IR classes .
.I pfsg-file
should contain a PFSG generated from the N-gram portion of a class N-gram
model.
A typical usage is thus
.br
	make-ngram-pfsg \fIclass-ngram\fP | \\
.br
	add-classes-to-pfsg classes=\fIclasses\fP | \\
.br
	add-pauses-to-pfsg >\fIfinal-pfsg\fP
.br
.PP
.B pfsg-from-ngram
is a wrapper script that combines removal of low-probability N-grams,
conversion to PFSG, and adding of optional pauses to create a PFSG
for recognition.
.PP
.B make-nbest-pfsg
converts an N-best list in
.BR nbest-format (5)
into a PFSG which, when used in recognition,
allows exactly the hypotheses contained in the N-best list.
.B notree=1
creates separate PFSG nodes for all word instances; the default is to
construct a prefix-tree structured PFSG.
.BI scale= S
multiplies the total hypothesis scores by
.IR S ;
the default is 0, meaning that all hypotheses have identical probability
in the PFSG.
Three options,
.BR amw=\fIA\fP ,
.BR lmw=\fIL\fP ,
and
.BR wtw=\fIW\fP ,
control the score weighting in N-best lists that contain
separate acoustic and language model scores, setting the acoustic model weight to
.IR A ,
the language model weight to
.IR L ,
and the word transition weight to
.IR W .
.PP
.B pfsg-vocab
extracts the vocabulary used in one or more PFSGs.
.B htklat-vocab
does the same for lattices in HTK standard lattice format.
The
.B quotes=1
option enables processing of HTK quotes.
.PP
.B pfsg-to-dot
renders a PFSG in
.BR dot (1)
format for subsequent layout, printing, etc.
.B show_probs=1
includes transition probabilities in the output.
.B show_logs=1
includes log (base 10) transition probabilities in the output.
.B show_nums=1
includes node numbers in the output.
.PP
.B pfsg-to-fsm
converts a finite-state network in
.BR pfsg-format (5)
into an equivalent network in AT&T
.BR fsm (5)
format.
This involves moving output actions from nodes to transitions.
If
.BI symbolfile= symbols
is specified, the mapping from FSM output symbols is written to
.I symbols
for later use with the
.B \-i
or
.B \-o
options of
.BR fsm (1)
tools.
.B symbolic=1
preserves the word strings in the resulting FSA.
.BI scale= S
scales the transition weights by a factor
.IR S ;
the default is -1 (to conform to the default FSM semiring).
.BI final_output= E
forces the final FSA node to have output label
.IR E ;
this also forces creation of a unique final FSA node, which is
otherwise unnecessary if the final node has a null output.
.PP
.B fsm-to-pfsg
conversely transforms
.BR fsm (5)
format into
.BR pfsg-format (5).
This involves moving output actions from transitions to nodes, and
generally requires an increase in the number of nodes.
(The conversion is done such that
.B pfsg-to-fsm
and
.B fsm-to-pfsg
are exact inverses of each other.)
The
.BI pfsg_name= name
option sets the name field of the output PFSG.
.B transducer=1
indicates that the input is a transducer and that input:output pairs should
be preserved in the PFSG.
.BI scale= S
scales the transition weights by a factor
.IR S ;
the default is -1 (to conform to the default FSM semiring).
.PP
.B classes-to-fsm
converts a
.BR classes-format (5)
file into a transducer in
.BR fsm (5)
format, such that composing the transducer with
an FSA encoding a class language model results in an FSA for the
word language model.
The word vocabulary needs to be given in file
.IR vocab .
.BI isymbolfile= isymbols
and
.BI osymbolfile= osymbols
allow saving the input and output symbol tables of the transducer for
later use.
.B symbolic=1
preserves the word strings in the resulting FSA.
.PP
The following commands show the creation of an FSA encoding the class N-gram
grammar ``test.bo'' with vocabulary ``test.vocab'' and class expansions
``test.classes'':
.br
	classes-to-fsm vocab=test.vocab symbolic=1 \\
.br
		isymbolfile=CLASSES.inputs \\
.br
		osymbolfile=CLASSES.outputs \\
.br
		test.classes >CLASSES.fsm
.br
	make-ngram-pfsg test.bo | \\
.br
	pfsg-to-fsm symbolic=1 >test.fsm
.br
	fsmcompile -i CLASSES.inputs test.fsm  >test.fsmc
.br
	fsmcompile -t -i CLASSES.inputs -o CLASSES.outputs \\
.br
		CLASSES.fsm >CLASSES.fsmc
.br
	fsmcompose test.fsmc CLASSES.fsmc >result.fsmc
.br
.PP
.B wlat-to-pfsg
converts a word posterior lattice or mesh ("sausage") in
.BR wlat-format (5)
into
.BR pfsg-format (5).
.PP
.B wlat-to-dot
renders a
.BR wlat-format (5)
word lattice in
.BR dot (1)
format for subsequent layout, printing, etc.
.B show_probs=1
includes node posterior probabilities in the output.
.B show_nums=1
includes node indices in the output.
.PP
.B wlat-stats
computes statistics of word posterior lattices, including the number of
word hypotheses, the entropy (log base 10) of the sentence hypothesis
set represented, and the posterior expected number of words.
For word meshes that have been aligned with references, the 1-best and
oracle lattice error rates are also computed.
.SH "NUANCE COMPATIBILITY"
.PP
The Nuance recognizer (as of version 6.2) understands a variant of the
PFSG format; hence the scripts above should be useful in building
recognition systems for that recognizer.
.PP
A suitable PFSG can be generated from an N-gram backoff model
in ARPA
.BR ngram-format (5)
using the following command:
.br
	ngram -debug 1 -order \fIN\fP -lm \fILM.bo\fP -prune-lowprobs -write-lm - | \\
.br
	make-ngram-pfsg | \\
.br
	add-pauses-to-pfsg version=1 pauselast=1 pause=_pau_ top_level_name=.TOP_LEVEL >\fILM.pfsg\fP
.br
assuming the pause word in the dictionary is ``_pau_''.
Certain restrictions on the naming of words (e.g., no hyphens are allowed)
have to be respected.
.PP
The resulting PFSG can then be referenced in a Nuance grammar file, e.g.,
.br
	.TOP [NGRAM_PFSG]
.br
	NGRAM_PFSG:lm \fILM.pfsg\fP
.br
.PP
In newer Nuance versions the name for a non-emitting node was changed to
.BR NULNOD ,
and inter-word optional pauses are automatically added to the grammar.
This means that the PFSG should be created using
.br
	ngram -debug 1 -order \fIN\fP -lm \fILM.bo\fP -prune-lowprobs -write-lm - | \\
.br
	make-ngram-pfsg version=1 top_level_name=.TOP_LEVEL null=NULNOD >\fILM.pfsg\fP
.br
The
.B "null=NULNOD"
option should also be passed to
.BR add-classes-to-pfsg .
.PP
Starting with version 8, Nuance supports N-gram LMs.
However, you can still use SRILM to create LMs, as described above.
The syntax for inclusion of a PFSG has changed to
.br
	NGRAM_PFSG:slm \fILM.pfsg\fP
.br
.PP
Caveat: Compatibility with Nuance is purely due to historical circumstance and
is not supported.
.SH "SEE ALSO"
lattice-tool(1), ngram(1), ngram-format(5), pfsg-format(5), wlat-format(5),
nbest-format(5), classes-format(5), fsm(5), dot(1).
.SH BUGS
.B make-ngram-pfsg
should be reimplemented in C++ for speed and some size optimizations that
require more global operations on the PFSG.
.SH AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
.br
Copyright 1995-2005 SRI International
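
The DESCRIPTION notes that these scripts do not handle compressed files themselves but can be combined with gunzip(1) and gzip(1) through standard input and output. A minimal sketch of that idea follows. It is not part of SRILM: the function name `compressed_lm_to_pfsg` and the file names are hypothetical, and it assumes `make-ngram-pfsg` and `add-pauses-to-pfsg` are installed on `PATH`.

```shell
#!/bin/sh
# Sketch only: build a gzip-compressed PFSG from a gzip-compressed backoff LM.
# compressed_lm_to_pfsg is a hypothetical helper, not an SRILM command.
compressed_lm_to_pfsg() {
    lm_gz=$1      # input: gzip-compressed LM in ngram-format(5)
    pfsg_gz=$2    # output: gzip-compressed PFSG

    # With the file argument left out, each script reads standard input and
    # writes standard output, so (de)compression is done outside the scripts.
    gunzip -c "$lm_gz" |
        make-ngram-pfsg |
        add-pauses-to-pfsg |
        gzip > "$pfsg_gz"
}

# Usage (hypothetical file names):
#   compressed_lm_to_pfsg LM.bo.gz LM.pfsg.gz
```

Script options would be passed in the `option=value` form described above, for example `make-ngram-pfsg maxorder=3` in place of the bare command.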