.\" $Id: multi-ngram.1,v 1.3 2004/12/03 17:59:01 stolcke Exp $
.TH multi-ngram 1 "$Date: 2004/12/03 17:59:01 $" "SRILM Tools"
.SH NAME
multi-ngram \- build multiword N-gram models
.SH SYNOPSIS
.B multi-ngram
[\c
.BR \-help ]
option\&...
.SH DESCRIPTION
.B multi-ngram
builds N-gram language models that contain multiwords, i.e., compound words
that are a concatenation of words from some prior given model.
It will optionally generate multiword N-grams and insert them into
an existing, reference N-gram model, so as to cover multiwords
occurring in a specified vocabulary.
It will then assign probabilities to the multiword N-grams so that word
strings containing multiwords have the same probabilities as the strings
of component words in the reference model.
.PP
Note that the inverse operation (expanding a multiword N-gram to contain
only regular words) is subsumed by the
.B "ngram -expand-classes"
function.
.SH OPTIONS
Each filename argument can be an ASCII file, or a
compressed file (name ending in .Z or .gz), or ``-'' to indicate
stdin/stdout.
.TP
.B \-help
Print option summary.
.TP
.B \-version
Print version information.
.TP
.BI \-order " n"
Set the maximal N-gram order to be used from the reference model.
NOTE: The order of the model is not set automatically when a model
file is read, so the same file can be used at various orders.
To use models of order higher than 3 it is always necessary to specify this
option.
.TP
.BI \-multi-order " n"
The maximal N-gram order in the multiword-based model.
.TP
.BI \-debug " level"
Set the debugging output level (0 means no debugging output).
.TP
.BI \-vocab " file"
Words to be added to the model.
In particular, this should include all the multiwords to be added.
.TP
.BI \-multi-char " C"
Character used to delimit component words in multiwords
(an underscore character by default).
.TP
.BI \-lm " file"
Reference N-gram model.
.TP
.BI \-multi-lm " file"
Model containing multiwords; the N-grams in this model will be assigned
new probabilities based on the reference model.
If this option is
.I not
given then the multiword model will be generated by adding multiword
N-grams to the reference model.
.TP
.B \-prune-unseen-ngrams
This option prevents the insertion of multiword N-grams whose component
N-grams are not contained in the reference model.
For example, for a multiword bigram "a_b c_d" to be inserted, a trigram
reference model must contain the trigrams "a b c" and "b c d".
If the reference model were a bigram LM, it would have to contain
"a b", "b c", and "c d".
This option is important to control the size of the multiword LM for
large vocabularies.
.TP
.BI \-write-lm " file"
Output location of the generated multiword model.
.SH "SEE ALSO"
ngram(1), ngram-format(5).
.SH BUGS
This program is a hack for cases where the original training data is not
available and a multiword model has to be generated from an existing
model.
.br
The resulting model is no longer properly normalized, since the same word
string can potentially be represented with or without multiwords.
.br
The generation of multiword N-grams uses a heuristic algorithm that
works well for bigrams and trigrams, but is not exhaustive.
.SH AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
.br
Copyright 2000\-2004 SRI International
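.SH EXAMPLES
The Python sketch below is not part of SRILM and is not the program's
actual algorithm, which is heuristic.
It only illustrates one reading of the component N-gram check described under
.BR \-prune-unseen-ngrams :
expand each multiword into its component words, then require every window of
reference-model order over the expanded string to be present in the
reference model.
The names
.I component_ngrams
and
.I is_insertable
are hypothetical, as is the handling of word strings shorter than the
reference order (the whole string is then treated as a single N-gram).
.PP
.nf
```python
def component_ngrams(multiword_ngram, order, multi_char="_"):
    """List the reference-model N-grams a multiword N-gram depends on.

    Hypothetical helper: mirrors the example given for the
    prune-unseen-ngrams option, not the SRILM implementation.
    """
    # Expand every multiword into its component words.
    words = []
    for token in multiword_ngram.split():
        words.extend(token.split(multi_char))
    # Assumption: if the expanded string is shorter than the reference
    # order, the whole string is the only N-gram that is required.
    width = min(order, len(words))
    # Slide a window of the reference order over the expanded string.
    return [" ".join(words[i:i + width])
            for i in range(len(words) - width + 1)]


def is_insertable(multiword_ngram, reference_ngrams, order, multi_char="_"):
    """True if every component N-gram occurs in the reference model."""
    needed = component_ngrams(multiword_ngram, order, multi_char)
    return all(ngram in reference_ngrams for ngram in needed)
```
.fi
.PP
With the multiword bigram "a_b c_d", a reference order of 3 yields
"a b c" and "b c d", and a reference order of 2 yields "a b", "b c",
and "c d", matching the example given for the option.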