ppl-scripts(1)                                          ppl-scripts(1)

NAME
       ppl-scripts, add-ppls, compare-ppls, compute-best-mix,
       compute-best-sentence-mix, hits-from-log, ppl-from-log,
       subtract-ppls - manipulate perplexities

SYNOPSIS
       add-ppls [ppl-file...]
       subtract-ppls ppl-file1 [ppl-file2...]
       ppl-from-log [ppl-file...]
       hits-from-log [ppl-file...]
       compare-ppls [mindelta=D] ppl-file1 ppl-file2
       compute-best-mix [lambda='l1 l2 ...' precision=P] ppl-file1
              [ppl-file2...]
       compute-best-sentence-mix [lambda='l1 l2 ...' precision=P]
              ppl-file1 [ppl-file2...]

DESCRIPTION
       These scripts process the output of the ngram(1) option -ppl to
       extract various useful information. They are particularly
       convenient in analyzing the performance (perplexity) of language
       models on specific subsets of the test data, or to compare and
       combine multiple models.

       add-ppls takes several ppl output files and computes an
       aggregate perplexity and corpus statistics. Its output is
       suitable for subsequent manipulation by add-ppls or
       subtract-ppls.

       subtract-ppls similarly computes an aggregate perplexity by
       removing the statistics of zero or more ppl-file2 from those in
       ppl-file1. Its output is suitable for subsequent manipulation by
       add-ppls or subtract-ppls.

       ppl-from-log recomputes the total perplexities and statistics
       from individual lines in ngram -debug 2 -ppl output. Combined
       with some filtering of that output this allows computing
       perplexities on interesting subsets of words.

       hits-from-log computes N-gram hit rates from ngram -debug 2 -ppl
       output.

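       The aggregation performed by add-ppls and subtract-ppls can be
       pictured with the Python sketch below. It is an illustration,
       not the scripts' actual code. It assumes the summary layout
       shown in the comment and the usual perplexity definition,
       10^(-logprob / (words - OOVs - zeroprobs + sentences)); check
       both against your own ngram -ppl output. The names PplStats and
       parse_summary are hypothetical.

```python
import math
import re
from dataclasses import dataclass

# Assumed layout of an `ngram -ppl` summary (verify against your own output):
#   file test.txt: 2 sentences, 8 words, 1 OOVs
#   0 zeroprobs, logprob= -14 ppl= 35.9 ppl1= 100
COUNTS_RE = re.compile(r"(\d+) sentences, (\d+) words, (\d+) OOVs")
PROBS_RE = re.compile(r"(\d+) zeroprobs, logprob= (\S+)")


def _perplexity(logprob, n_scored):
    """10^(-logprob / n_scored), or NaN when nothing was scored."""
    if n_scored <= 0:
        return math.nan
    return 10.0 ** (-logprob / n_scored)


@dataclass
class PplStats:
    sentences: int = 0
    words: int = 0
    oovs: int = 0
    zeroprobs: int = 0
    logprob: float = 0.0  # total log10 probability

    def __add__(self, other):
        # Counts and log probabilities are additive across test sets.
        return PplStats(self.sentences + other.sentences,
                        self.words + other.words,
                        self.oovs + other.oovs,
                        self.zeroprobs + other.zeroprobs,
                        self.logprob + other.logprob)

    def __sub__(self, other):
        # Remove the statistics of a subset from those of the whole.
        return PplStats(self.sentences - other.sentences,
                        self.words - other.words,
                        self.oovs - other.oovs,
                        self.zeroprobs - other.zeroprobs,
                        self.logprob - other.logprob)

    def ppl(self):
        """Perplexity counting one end-of-sentence token per sentence."""
        scored = self.words - self.oovs - self.zeroprobs + self.sentences
        return _perplexity(self.logprob, scored)

    def ppl1(self):
        """Perplexity over words only, without end-of-sentence tokens."""
        scored = self.words - self.oovs - self.zeroprobs
        return _perplexity(self.logprob, scored)


def parse_summary(text):
    """Read the last summary block found in `text`."""
    counts = COUNTS_RE.findall(text)
    probs = PROBS_RE.findall(text)
    if not counts or not probs:
        raise ValueError("no perplexity summary found")
    sentences, words, oovs = (int(x) for x in counts[-1])
    zeroprobs, logprob = probs[-1]
    return PplStats(sentences, words, oovs, int(zeroprobs), float(logprob))
```

       Perplexities themselves are not additive. Only the counts and
       the total log probability are summed (or subtracted), and the
       perplexity is recomputed from the result.
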
       compare-ppls tallies the number of words for which two language
       models produce the same, higher, or lower probabilities. The
       input files should be ngram -debug 2 -ppl output for the two
       models on the same test set. The parameter D is the minimum
       absolute difference for two log probabilities to be considered
       different (the default is 0).

       compute-best-mix takes the output of several ngram -debug 2 -ppl
       runs on the same test set and computes the optimal interpolation
       weights for the corresponding models, i.e., the weights that
       minimize the perplexity of an interpolated model. Initial
       weights may be specified as l1 l2 .... The computation is
       iterative and stops when the interpolation weights change by
       less than P (default 0.001).

       compute-best-sentence-mix similarly optimizes the weights for
       sentence-level interpolation of LMs. It requires input files
       generated by ngram -debug 1 -ppl. (Sentence-level mixtures can
       be implemented using the ngram -hmm option, by constructing a
       suitable HMM structure.)

SEE ALSO
       ngram(1).

BUGS
       All scripts depend on the idiosyncrasies of ngram -ppl output.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995-2002 SRI International

SRILM Tools            $Date: 2002/04/19 14:11:30 $     ppl-scripts(1)
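
       The iterative weight estimation described for compute-best-mix
       can be pictured with the Python sketch below. The text above
       says only that the computation is iterative; the
       expectation-maximization (EM) update used here is the standard
       technique for mixture weights and is an assumption, not a
       transcription of the script. The functions best_mix and
       mix_perplexity are hypothetical, and the per-word probabilities
       are taken as already extracted from the ngram -debug 2 -ppl
       output.

```python
import math


def best_mix(probs, lambdas=None, precision=0.001, max_iter=1000):
    """Estimate interpolation weights by EM.

    probs[i][t] is the probability model i assigns to word t of a shared
    test set. Iteration stops once no weight changes by `precision` or more.
    """
    n_models = len(probs)
    n_words = len(probs[0])
    if lambdas is None:
        lambdas = [1.0 / n_models] * n_models  # uniform starting point

    for _ in range(max_iter):
        posteriors = [0.0] * n_models
        for t in range(n_words):
            mix = sum(l * p[t] for l, p in zip(lambdas, probs))
            if mix == 0.0:
                continue  # no model can score this word; skip it
            for i in range(n_models):
                # E-step: share of the mixture probability owed to model i.
                posteriors[i] += lambdas[i] * probs[i][t] / mix
        total = sum(posteriors)
        if total == 0.0:
            break  # nothing scorable; keep the current weights
        # M-step: the new weights are the normalized posterior counts.
        new_lambdas = [x / total for x in posteriors]
        delta = max(abs(a - b) for a, b in zip(new_lambdas, lambdas))
        lambdas = new_lambdas
        if delta < precision:
            break
    return lambdas


def mix_perplexity(probs, lambdas):
    """Perplexity of the interpolated model over the scorable words."""
    logprob, scored = 0.0, 0
    for t in range(len(probs[0])):
        mix = sum(l * p[t] for l, p in zip(lambdas, probs))
        if mix > 0.0:
            logprob += math.log10(mix)
            scored += 1
    return 10.0 ** (-logprob / scored) if scored else math.nan
```

       For example, best_mix([[0.9, 0.1], [0.1, 0.9]]) leaves the
       uniform starting weights unchanged, because each model is
       better on exactly one of the two words.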