ngram-format(5)                                   ngram-format(5)

NAME
       ngram-format - File format for ARPA backoff N-gram models

SYNOPSIS
       \data\
       ngram 1=n1
       ngram 2=n2
       ...
       ngram N=nN

       \1-grams:
       p    w         [bow]
       ...

       \2-grams:
       p    w1 w2          [bow]
       ...

       \N-grams:
       p    w1 ... wN
       ...

       \end\

DESCRIPTION
       The so-called ARPA (or Doug Paul) format for N-gram backoff
       models starts with a header, introduced by the keyword \data\,
       listing the number of N-grams of each length.  Following that,
       N-grams are listed one per line, grouped into sections by
       length, each section starting with the keyword \N-grams:,
       where N is the length of the N-grams to follow.  Each N-gram
       line starts with the logarithm (base 10) of the conditional
       probability p of that N-gram, followed by the words w1...wN
       making up the N-gram.  These are optionally followed by the
       logarithm (base 10) of the backoff weight for the N-gram.
       The keyword \end\ concludes the model representation.

       Backoff weights are required only for those N-grams that form
       a prefix of longer N-grams in the model.  The highest-order
       N-grams in particular will not need backoff weights (they
       would be useless).

       Since log(0) (minus infinity) has no portable representation,
       such values are mapped to a large negative number.  However,
       the designated dummy value (-99 in SRILM) is interpreted as
       log(0) when read back from file into memory.

       The correctness of the N-gram counts n1, n2, ... in the header
       is not enforced by SRILM software when reading models
       (although a warning is printed when an inconsistency is
       encountered).  This allows easy textual insertion or deletion
       of parameters in a model file.  The proper format can be
       recovered by passing the model through the command

            ngram -order N -lm input -write-lm output

       Note that the format is self-delimiting, allowing multiple
       models to be stored in one file, or to be surrounded by
       ancillary information.  Some extensions of N-gram models in
       SRILM store additional parameters after a basic N-gram section
       in the standard format.

SEE ALSO
       ngram(1), ngram-count(1), lm-scripts(1), pfsg-scripts(1).

BUGS
       The ARPA format does not allow N-grams that have only a
       backoff weight associated with them, but no conditional
       probability.  This makes the format less general than would
       otherwise be useful (e.g., to support pruned models, or ones
       containing a mix of words and classes).  The ngram-count(1)
       tool satisfies this constraint by inserting dummy
       probabilities where necessary.

       For simplicity, an N-gram model containing N-grams up to
       length N is referred to in the SRILM programs as an N-th order
       model, although technically it represents a Markov model of
       order N-1.

       There is no way to specify words with embedded whitespace.

AUTHOR
       The ARPA backoff format was developed by Doug Paul at MIT
       Lincoln Labs for research sponsored by the U.S. Department of
       Defense Advanced Research Projects Agency (ARPA).
       Man page by Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1999, 2004 SRI International

SRILM File Formats $Date: 2004/02/27 03:33:40 $   ngram-format(5)
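
As an illustration of the layout described above, the following is a minimal Python sketch of a reader for one ARPA model. It is not SRILM code: the names parse_arpa, count_mismatches and LOG_ZERO_DUMMY are hypothetical, and the sketch assumes whitespace-separated fields and a single model per input. Text before \data\ is skipped, the dummy value -99 is read back as minus infinity, and header counts are compared with the sections without being enforced.

```python
import math
import re
from collections import defaultdict

# Dummy value that stands for log10(0); -99 is the SRILM convention.
LOG_ZERO_DUMMY = -99.0


def _read_log(text):
    """Convert a log10 field, mapping the dummy value back to -infinity."""
    value = float(text)
    return -math.inf if value == LOG_ZERO_DUMMY else value


def parse_arpa(lines):
    """Parse the first ARPA model found in an iterable of text lines.

    Returns (header, ngrams): header maps N to the declared count, and
    ngrams[N] maps a tuple of N words to (log10_prob, log10_bow or None).
    """
    header = {}
    ngrams = defaultdict(dict)
    order = None  # None: before \data\; 0: in header; N: in \N-grams:
    for raw in lines:
        line = raw.strip()
        if order is None:
            # Ancillary text before the model is ignored.
            if line == "\\data\\":
                order = 0
            continue
        if not line:
            continue
        if line == "\\end\\":
            break
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            continue
        if order == 0:
            count = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if count:
                header[int(count.group(1))] = int(count.group(2))
            continue
        fields = line.split()
        if len(fields) < order + 1:
            continue  # malformed line: too few fields for this section
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional and follows the N words.
        bow = _read_log(fields[1 + order]) if len(fields) > order + 1 else None
        ngrams[order][words] = (_read_log(fields[0]), bow)
    return header, dict(ngrams)


def count_mismatches(header, ngrams):
    """List (N, declared, found) where the header disagrees with the data."""
    return [(n, declared, len(ngrams.get(n, {})))
            for n, declared in sorted(header.items())
            if declared != len(ngrams.get(n, {}))]


SAMPLE = """\
ancillary text before the model
\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-99 <s> -0.30103
-0.5 a -0.2
-0.7 </s>

\\2-grams:
-0.1 <s> a

\\end\\
"""

header, ngrams = parse_arpa(SAMPLE.splitlines())
mismatches = count_mismatches(header, ngrams)
```

Returning the mismatches as a list, instead of raising an error, mirrors the behaviour described above, where an inconsistent header produces only a warning.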
