ngram-merge(1)                                     ngram-merge(1)

NAME
       ngram-merge - merge N-gram counts

SYNOPSIS
       ngram-merge [-help] [-write outfile] [-float-counts] [--]
       infile1 infile2 ...

DESCRIPTION
       ngram-merge reads two or more lexicographically sorted N-gram
       count files (as produced by ngram-count -sort) and outputs the
       merged, sorted counts.  The output is thus suitable for
       subsequent merging steps.

       The input format consists of one N-gram count per line,

            word1 word2 ... wordn count

       The lines must be sorted lexicographically on the words,
       leftmost first.  The input may contain N-grams of different
       lengths.

       Each filename argument can be a plain ASCII count file, or a
       compressed file (name ending in .Z or .gz), or ``-'' to
       indicate stdin/stdout.

       ngram-merge is recommended in cases where the full counts would
       far exceed available real memory.  Although an arbitrary number
       of input count files is accepted, it is best to use the program
       as follows.  First, partition the input text into the largest
       chunks so that ngram-count can run in real memory.  Then merge
       the resulting sorted counts using ngram-merge pairwise, and
       continue doing so in a binary tree pattern until a single count
       file containing all N-grams remains.  This procedure is
       automated by the make-batch-counts and merge-batch-counts
       scripts.

OPTIONS
       Each filename argument can be an ASCII file, or a compressed
       file (name ending in .Z or .gz), or ``-'' to indicate
       stdin/stdout.

       -help  Print option and usage summary.

       -version
              Print version information.

       -write outfile
              Write merged counts to outfile, instead of standard
              output.

       -float-counts
              Process counts as floating point numbers.  By default
              counts are assumed to be unsigned integers.

       --     Indicates the end of options, in case the first input
              filename begins with ``-''.

SEE ALSO
       ngram-count(1), ngram(1), training-scripts(1).

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>
       Copyright 1995-2004 SRI International
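
The merge described under DESCRIPTION can be illustrated with a minimal Python sketch. It is not the ngram-merge implementation. It assumes that merging means summing the counts of identical N-grams, that lines are ordered by comparing their word tuples (so a shorter N-gram sorts before any longer N-gram it prefixes), and that counts are unsigned integers, as in the default mode. The names parse and merge_counts are hypothetical helpers introduced only for this illustration.

```python
import heapq
from itertools import groupby


def parse(line):
    """Split 'word1 ... wordn count' into (tuple_of_words, count)."""
    *words, count = line.split()
    return tuple(words), int(count)  # assumption: integer counts


def merge_counts(*streams):
    """Merge already-sorted count streams, summing counts of equal N-grams.

    Each stream is an iterable of text lines sorted on the word tuple.
    Only one pending line per stream is held at a time, so memory use
    does not grow with the size of the inputs.
    """
    parsed = [
        (parse(line) for line in stream if line.strip())
        for stream in streams
    ]
    merged = heapq.merge(*parsed, key=lambda item: item[0])
    for words, group in groupby(merged, key=lambda item: item[0]):
        yield words, sum(count for _, count in group)


# Two small sorted inputs with N-grams of different lengths.
counts_a = ["a 3", "a b 2", "b 1"]
counts_b = ["a 1", "a c 4", "b 5"]

result = list(merge_counts(counts_a, counts_b))
for words, count in result:
    print(" ".join(words), count)
```

Because the output is itself sorted on the word tuple, it could be fed back into merge_counts, which mirrors the pairwise, binary tree merging that the DESCRIPTION recommends.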
