.\" ngram.1

.BI \-mix-lambda2 " weight"
.TP
.BI \-mix-lambda3 " weight"
.TP
.BI \-mix-lambda4 " weight"
.TP
.BI \-mix-lambda5 " weight"
.TP
.BI \-mix-lambda6 " weight"
.TP
.BI \-mix-lambda7 " weight"
.TP
.BI \-mix-lambda8 " weight"
.TP
.BI \-mix-lambda9 " weight"
These are the weights for the additional mixture components, corresponding
to
.B \-mix-lm2
through
.BR \-mix-lm9 .
The weight for the
.B \-mix-lm
model is 1 minus the sum of
.B \-lambda
and
.B \-mix-lambda2
through
.BR \-mix-lambda9 .
.TP
.B \-loglinear-mix
Implement a log-linear (rather than linear) mixture LM, using the
parameters above.
.TP
.BI \-bayes " length"
Interpolate the second and the main model using posterior probabilities
for local N-gram contexts of length
.IR length .
The
.B \-lambda
value is used as a prior mixture weight in this case.
.TP
.BI \-bayes-scale " scale"
Set the exponential scale factor on the context likelihood in conjunction
with the
.B \-bayes
function.
Default value is 1.0.
.TP
.BI \-cache " length"
Interpolate the main LM (or the one resulting from the operations above) with
a unigram cache language model based on a history of
.I length
words.
.TP
.BI \-cache-lambda " weight"
Set the interpolation weight for the cache LM.
Default value is 0.05.
.TP
.B \-dynamic
Interpolate the main LM (or the one resulting from the operations above) with
a dynamically changing LM.
LM changes are indicated by the tag ``<LMstate>'' starting a line in the
input to
.BR \-ppl ,
.BR \-counts ,
or
.BR \-rescore ,
followed by a filename containing the new LM.
.TP
.BI \-dynamic-lambda " weight"
Set the interpolation weight for the dynamic LM.
Default value is 0.05.
.TP
.BI \-adapt-marginals " LM"
Use an LM obtained by adapting the unigram marginals to the values specified
in the
.I LM
in
.BR ngram-format (5),
using the method described in Kneser et al. (1997).
The LM to be adapted is the one constructed according to the other options.
.TP
.BI \-base-marginals " LM"
Specify the baseline unigram marginals in a separate file
.IR LM ,
which must be in
.BR ngram-format (5)
as well.
If not specified, the baseline marginals are taken from the model to be
adapted, but this might not be desirable, e.g., when Kneser-Ney smoothing
was used.
.TP
.BI \-adapt-marginals-beta " B"
The exponential weight given to the ratio between the adapted and baseline
marginals.
The default is 0.5.
.TP
.B \-adapt-marginals-ratios
Compute and output only the log ratio between the adapted and the baseline
LM probabilities.
These can be useful as a separate knowledge source in N-best rescoring.
.PP
The following options specify the operations performed on/with the LM
constructed as per the options above.
.TP
.B \-renorm
Renormalize the main model by recomputing backoff weights for the given
probabilities.
.TP
.BI \-prune " threshold"
Prune N-gram probabilities if their removal causes the (training set)
perplexity of the model to increase by less than
.I threshold
relative.
.TP
.B \-prune-lowprobs
Prune N-gram probabilities that are lower than the corresponding
backed-off estimates.
This generates N-gram models that can be correctly
converted into probabilistic finite-state networks.
.TP
.BI \-minprune " n"
Only prune N-grams of length at least
.IR n .
The default (and minimum allowed value) is 2, i.e., only unigrams are
excluded from pruning.
This option applies to both
.B \-prune
and
.BR \-prune-lowprobs .
.TP
.BI \-rescore-ngram " file"
Read an N-gram LM from
.I file
and recompute its N-gram probabilities using the LM specified by the
other options; then renormalize and evaluate the resulting new N-gram LM.
.TP
.BI \-write-lm " file"
Write a model back to
.IR file .
The output will be in the same format as read by
.BR \-lm ,
except if operations such as
.B \-mix-lm
or
.B \-expand-classes
were applied, in which case the output will contain the generated
single N-gram backoff model in ARPA
.BR ngram-format (5).
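.PP
For example, a command like the following (file names and weight values are
hypothetical) interpolates two backoff trigrams, prunes the combined model,
and writes the result out as a single ARPA-format model:
.PP
.nf
    ngram \-order 3 \-lm big.3bo \-mix-lm small.3bo \-lambda 0.7 \e
        \-prune 1e\-7 \-write-lm mixed.3bo
.fi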
.TP
.BI \-write-bin-lm " file"
Write a model to
.I file
using a binary data format.
This is only supported by certain model types, specifically, N-gram
backoff models.
Binary model files should be recognized automatically by the
.B \-read
function.
.TP
.BI \-write-vocab " file"
Write the LM's vocabulary to
.IR file .
.TP
.BI \-gen " number"
Generate
.I number
random sentences from the LM.
.TP
.BI \-seed " value"
Initialize the random number generator used for sentence generation
using seed
.IR value .
The default is to use a seed that should be close to unique for each
invocation of the program.
.TP
.BI \-ppl " textfile"
Compute sentence scores (log probabilities) and perplexities from
the sentences in
.IR textfile ,
which should contain one sentence per line.
The
.B \-debug
option controls the level of detail printed, even though output is
to stdout (not stderr).
.RS
.TP 10
.B "\-debug 0"
Only summary statistics for the entire corpus are printed,
as well as partial statistics for each input portion delimited by
escaped lines (see
.BR \-escape ).
These statistics include the number of sentences, words, out-of-vocabulary
words, and zero-probability tokens in the input,
as well as its total log probability and perplexity.
Perplexity is given with two different normalizations: counting all
input tokens (``ppl'') and excluding end-of-sentence tags (``ppl1'').
.TP
.B "\-debug 1"
Statistics for individual sentences are printed.
.TP
.B "\-debug 2"
Probabilities for each word, plus LM-dependent details about the backoff
used etc., are printed.
.TP
.B "\-debug 3"
Probabilities for all words are summed in each context, and
the sum is printed.
If this differs significantly from 1, a warning message
is issued to stderr.
.RE
.TP
.BI \-nbest " file"
Read an N-best list in
.BR nbest-format (5)
and rerank the hypotheses using the specified LM.
The reordered N-best list is written to stdout.
If the N-best list is given in ``NBestList1.0'' format and contains
composite acoustic/language model scores, then
.B \-decipher-lm
and the recognizer language model and word transition weights (see below)
need to be specified so the original acoustic scores can be recovered.
.TP
.BI \-nbest-files " filelist"
Process multiple N-best lists whose filenames are listed in
.IR filelist .
.TP
.BI \-write-nbest-dir " dir"
Deposit rescored N-best lists into directory
.IR dir ,
using filenames derived from the input ones.
.TP
.B \-decipher-nbest
Output rescored N-best lists in Decipher 1.0 format, rather than
SRILM format.
.TP
.B \-no-reorder
Output rescored N-best lists without sorting the hypotheses by their
new combined scores.
.TP
.B \-split-multiwords
Split multiwords into their components when reading N-best lists;
the rescored N-best lists thus no longer contain multiwords.
(Note this is different from the
.B \-multiwords
option, which leaves the input word stream unchanged and splits
multiwords only for the purpose of LM probability computation.)
.TP
.BI \-max-nbest " n"
Limits the number of hypotheses read from an N-best list.
Only the first
.I n
hypotheses are processed.
.TP
.BI \-rescore " file"
Similar to
.BR \-nbest ,
but the input is processed as a stream of N-best hypotheses (without header).
The output consists of the rescored hypotheses in
SRILM format (the third of the formats described in
.BR nbest-format (5)).
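.PP
For example, a command like the following (file names and weight values are
hypothetical) reranks the hypotheses in an N-best list with a new LM,
recovering the acoustic scores from the Decipher composite scores:
.PP
.nf
    ngram \-lm new.3bo \-nbest hyps.nbest \-max-nbest 100 \e
        \-decipher-lm decipher.2bo \-decipher-lmw 8 \-decipher-wtw 2
.fi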
.TP
.BI \-decipher-lm " model-file"
Designates the N-gram backoff model (typically a bigram) that was used by the
Decipher(TM) recognizer in computing composite scores for the hypotheses
fed to
.B \-rescore
or
.BR \-nbest .
Used to compute acoustic scores from the composite scores.
.TP
.BI \-decipher-order " N"
Specifies the order of the Decipher N-gram model used (default is 2).
.TP
.B \-decipher-nobackoff
Indicates that the Decipher N-gram model does not contain backoff nodes,
i.e., all recognizer LM scores are correct up to rounding.
.TP
.BI \-decipher-lmw " weight"
Specifies the language model weight used by the recognizer.
Used to compute acoustic scores from the composite scores.
.TP
.BI \-decipher-wtw " weight"
Specifies the word transition weight used by the recognizer.
Used to compute acoustic scores from the composite scores.
.TP
.BI \-escape " string"
Set an ``escape string'' for the
.BR \-ppl ,
.BR \-counts ,
and
.B \-rescore
computations.
Input lines starting with
.I string
are not processed as sentences and are instead passed unchanged to stdout.
This allows associated information to be passed to scoring scripts etc.
.TP
.BI \-counts " countsfile"
Perform a computation similar to
.BR \-ppl ,
but based only on the N-gram counts found in
.IR countsfile .
Probabilities are computed for the last word of each N-gram, using the
other words as contexts, and scaling by the associated N-gram count.
Summary statistics are output at the end, as well as before each
escaped input line.
.TP
.BI \-count-order " n"
Use only counts of order
.I n
in the
.B \-counts
computation.
The default value is 0, meaning use all counts.
.TP
.B \-counts-entropy
Weight the log probabilities for
.B \-counts
processing by the joint probabilities of the N-grams.
This effectively computes the sum over p(w,h) log p(w|h),
i.e., the entropy of the model.
In debugging mode, both the conditional log probabilities and the
corresponding joint probabilities are output.
.TP
.B \-skipoovs
Instruct the LM to skip over contexts that contain out-of-vocabulary
words, instead of using a backoff strategy in these cases.
.TP
.BI \-noise " noise-tag"
Designate
.I noise-tag
as a vocabulary item that is to be ignored by the LM.
(This is typically used to identify a noise marker.)
Note that the LM specified by
.B \-decipher-lm
does NOT ignore this
.I noise-tag
since the DECIPHER recognizer treats noise as a regular word.
.TP
.BI \-noise-vocab " file"
Read several noise tags from
.IR file ,
instead of, or in addition to, the single noise tag specified by
.BR \-noise .
.TP
.B \-reverse
Reverse the words in a sentence for LM scoring purposes.
(This assumes the LM used is a ``right-to-left'' model.)
Note that the LM specified by
.B \-decipher-lm
is always applied to the original, left-to-right word sequence.
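.PP
For example, per-sentence scores and perplexities for a test set (file names
hypothetical) can be computed with:
.PP
.nf
    ngram \-lm mixed.3bo \-ppl test.txt \-debug 1
.fi
.PP
Adding
.B \-escape
with a marker string would additionally pass any input lines starting with
that marker through to stdout unscored, for use by downstream scoring scripts.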
.SH "SEE ALSO"
ngram-count(1), ngram-class(1), lm-scripts(1), ppl-scripts(1),
pfsg-scripts(1), nbest-scripts(1),
ngram-format(5), nbest-format(5), classes-format(5).
.br
J. A. Bilmes and K. Kirchhoff, ``Factored Language Models and Generalized
Parallel Backoff,'' \fIProc. HLT-NAACL\fP, pp. 4\-6, Edmonton, Alberta, 2003.
.br
S. F. Chen and J. Goodman, ``An Empirical Study of Smoothing Techniques for
Language Modeling,'' TR-10-98, Computer Science Group, Harvard Univ., 1998.
.br
K. Kirchhoff et al., ``Novel Speech Recognition Models for Arabic,''
Johns Hopkins University Summer Research Workshop 2002, Final Report.
.br
R. Kneser, J. Peters and D. Klakow, ``Language Model Adaptation Using
Dynamic Marginals,'' \fIProc. Eurospeech\fP, pp. 1971\-1974, Rhodes, 1997.
.br
A. Stolcke and E. Shriberg, ``Statistical Language Modeling for Speech
Disfluencies,'' \fIProc. IEEE ICASSP\fP, pp. 405\-409, Atlanta, GA, 1996.
.br
A. Stolcke, ``Entropy-based Pruning of Backoff Language Models,''
\fIProc. DARPA Broadcast News Transcription and Understanding Workshop\fP,
pp. 270\-274, Lansdowne, VA, 1998.
.br
A. Stolcke et al., ``Automatic Detection of Sentence Boundaries and
Disfluencies Based on Recognized Words,'' \fIProc. ICSLP\fP, pp. 2247\-2250,
Sydney, 1998.
.br
M. Weintraub et al., ``Fast Training and Portability,''
Research Note No. 1, Center for Language and Speech Processing,
Johns Hopkins University, Baltimore, Feb. 1996.
.SH BUGS
Some LM types (such as Bayes-interpolated and factored LMs) currently do
not support the
.B \-write-lm
function.
.PP
For the
.B \-limit-vocab
option to work correctly with hidden-event and class N-gram LMs, the
event/class vocabularies have to be specified by options (\c
.B \-hidden-vocab
and
.BR \-classes ,
respectively).
Embedding event/class definitions in the LM file only will not work correctly.
.PP
Sentence generation is slow and takes time proportional to the vocabulary
size.
.PP
The file given by
.B \-classes
is read multiple times if
.B \-limit-vocab
is in effect or if a mixture of LMs is specified.
This will lead to incorrect behavior if the argument of
.B \-classes
is stdin (``-'').
.PP
Also,
.B \-limit-vocab
will not work correctly with LM operations that require the entire
vocabulary to be enumerated, such as
.B \-adapt-marginals
or perplexity computation with
.BR "\-debug 3" .
.PP
Support for factored LMs is experimental, and many LM operations supported
by standard N-grams (such as
.BR \-limit-vocab )
are not implemented yet.
.SH AUTHORS
Andreas Stolcke <stolcke@speech.sri.com>
.br
Jing Zheng <zj@speech.sri.com>
.br
Copyright 1995\-2006 SRI International
