.\" $Id: ngram-format.5,v 1.4 2004/02/27 03:33:40 stolcke Exp $
.TH ngram-format 5 "$Date: 2004/02/27 03:33:40 $" "SRILM File Formats"
.SH NAME
ngram-format \- File format for ARPA backoff N-gram models
.SH SYNOPSIS
.br
\fB\\data\\\fP
.br
\fBngram 1=\fP\fIn1\fP
.br
\fBngram 2=\fP\fIn2\fP
.br
\&...
.br
\fBngram\fP \fIN\fP\fB=\fP\fInN\fP
.br
\fB\\1-grams:\fP
.br
\fIp\fP	\fIw\fP		[\fIbow\fP]
.br
\&...
.br
\fB\\2-grams:\fP
.br
\fIp\fP	\fIw1 w2\fP		[\fIbow\fP]
.br
\&...
.br
\fB\\\fP\fIN\fP\fB-grams:\fP
.br
\fIp\fP	\fIw1\fP ... \fIwN\fP
.br
\&...
.br
\fB\\end\\\fP
.SH DESCRIPTION
The so-called ARPA (or Doug Paul) format for N-gram backoff models
starts with a header, introduced by the keyword \fB\\data\\\fP,
listing the number of N-grams of each length.
Following that, N-grams are listed one per line, grouped into sections
by length, each section starting with the keyword \fB\\\fP\fIN\fP\fB-grams:\fP,
where
.I N
is the length of the N-grams to follow.
Each N-gram line starts with the logarithm (base 10) of the conditional probability
.I p
of that N-gram, followed by the words
.IR w1 ... wN
making up the N-gram.
These are optionally followed by the logarithm (base 10) of the
backoff weight for the N-gram.
The keyword \fB\\end\\\fP
concludes the model representation.
.PP
Backoff weights are required only for those N-grams
that form a prefix of longer N-grams in the model.
The highest-order N-grams in particular will not need backoff weights
(they would be useless).
.PP
Since log(0) (minus infinity) has no portable representation, such values
are mapped to a large negative number.
However, the designated dummy value (-99 in SRILM) is interpreted as log(0)
when read back from file into memory.
.PP
The correctness of the N-gram counts
.IR n1 ,
.IR n2 ,
\&...
in the header is not enforced by SRILM software when reading models
(although a warning is printed when an inconsistency is encountered).
This allows easy textual insertion or deletion of parameters in a model file.
The proper format can be recovered by passing the model through
the command
.br
	ngram -order \fIN\fP -lm \fIinput\fP -write-lm \fIoutput\fP
.PP
Note that the format is self-delimiting, allowing multiple models to
be stored in one file, or to be surrounded by ancillary information.
Some extensions of N-gram models in SRILM store additional parameters
after a basic N-gram section in the standard format.
.SH "SEE ALSO"
ngram(1), ngram-count(1), lm-scripts(1), pfsg-scripts(1).
.SH BUGS
The ARPA format does not allow N-grams that have only a backoff weight
associated with them, but no conditional probability.
This makes the format less general than would otherwise be useful
(e.g., to support pruned models, or ones containing a mix of words and
classes).
The
.BR ngram-count (1)
tool satisfies this constraint by inserting dummy probabilities where
necessary.
.PP
For simplicity, an N-gram model containing N-grams up to length
.I N
is referred to in the SRILM programs as an
.IR N -th
order model, although technically it represents a Markov model of order
.IR N -1.
.PP
There is no way to specify words with embedded whitespace.
.SH AUTHOR
The ARPA backoff format was developed by Doug Paul at MIT Lincoln Labs
for research sponsored by the U.S. Department of Defense
Advanced Research Projects Agency (ARPA).
.br
Man page by Andreas Stolcke <stolcke@speech.sri.com>.
.br
Copyright 1999, 2004 SRI International
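.SH EXAMPLES
The layout described in this page can be illustrated with a small reader.
The following Python sketch is not part of SRILM and is not how SRILM reads models.
The names parse_arpa, count_mismatches, LOG_ZERO and SAMPLE are hypothetical,
and the toy model values are made up for illustration.
It assumes whitespace-separated fields, reads only the first model in the input,
and skips any text before the \fB\\data\\\fP keyword.
.PP
.nf
```python
import math

LOG_ZERO = -99.0  # dummy value that stands for log10(0), as described for SRILM


def parse_arpa(text):
    """Parse the first ARPA backoff model found in a string.

    Returns (declared, ngrams): declared maps order -> count from the header,
    ngrams maps order -> {word tuple: (log10 prob, log10 backoff or None)}.
    """
    declared, ngrams = {}, {}
    order = None       # order of the section currently being read
    in_model = False   # True once the \data\ keyword has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if not in_model:
            in_model = line == "\\data\\"  # skip ancillary text before the model
            continue
        if line == "\\end\\":
            break
        if not line:
            continue
        if order is None and line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            declared[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])
            ngrams[order] = {}
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional and follows the N words.
            bow = float(fields[1 + order]) if len(fields) > 1 + order else None
            if logprob == LOG_ZERO:
                logprob = -math.inf
            ngrams[order][words] = (logprob, bow)
    return declared, ngrams


def count_mismatches(declared, ngrams):
    """Return {order: (declared, found)} wherever the header disagrees."""
    return {
        n: (declared.get(n, 0), len(ngrams.get(n, {})))
        for n in sorted(set(declared) | set(ngrams))
        if declared.get(n, 0) != len(ngrams.get(n, {}))
    }


# A made-up bigram model; the highest-order N-grams carry no backoff weight.
SAMPLE = """\
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-99\t<s>\t-0.30103
-0.47712\ta\t-0.30103
-0.47712\t</s>

\\2-grams:
-0.30103\t<s> a
-0.30103\ta </s>

\\end\\
"""

declared, ngrams = parse_arpa(SAMPLE)
```
.fi
.PP
In this sketch count_mismatches only reports header counts that disagree with
the sections, loosely mirroring the warning described above; it does not
rewrite the file.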
