📄 nbest-scripts.1

       filename, followed by the words.

       rescore-minimize-wer is similar to rescore-reweight but picks
       hypotheses using the word error minimization algorithm of
       nbest-lattice(1).

       nbest2-to-nbest1 converts an N-best list in ``NBestList2.0''
       format to ``NBestList1.0'', for the benefit of programs that
       have not yet been updated to deal with the new format.

       nbest-rover combines hypotheses from multiple N-best lists at
       the word level, by performing the same kind of word error
       minimization as nbest-lattice(1), in a generalization of the
       ROVER algorithm.  sentid-list is a file listing sentence IDs.
       These must match the filenames in a set of N-best directories,
       which are specified in a control-file.  The format for the
       latter is

            dir1 lmw1 wtw1 w1 [n1 [s1]]
            dir2 lmw2 wtw2 w2 [n2 [s2]]
            ...

       Each line specifies an N-best directory, the language model
       and word transition weights to be used in score combination,
       and a weight to be applied to the posterior probabilities.  An
       optional next-to-last parameter for each N-best list allows
       the lists to be truncated to the top n1, n2, etc., hypotheses.
       The final optional parameter sets the posterior distribution
       scaling factor, which defaults to the language model weight.
       Optionally, control-file can also contain lines of the form

            dir w +

       These indicate that additional score files can be found in
       directory dir and that the scores found therein should be
       added to the following N-best list set with weight w.
       Several lines of this form may occur preceding a regular
       N-best directory specification; the corresponding additive
       combination of multiple scores is performed.

       If ``-'' is specified for sentid-list, the sentence IDs are
       inferred from the contents of the first directory dir1
       specified in control-file.  If posterior-file is specified on
       the command line, posterior word probability estimates are
       written to that file.  Any additional arguments are passed as
       options to the underlying nbest-lattice(1) invocation.

       nbest-rover can process N-best lists in any of the formats
       described in nbest-format(5), as long as all N-best lists for
       a given utterance are in the same format.  When Decipher
       formats are used, only their acoustic scores are used.

       combine-rover-controls takes one or more nbest-rover control
       files as arguments and outputs a new control file that
       specifies the combination of the input files.  By default,
       each input system is given equal weight.  Directory names in
       the input files are adjusted to reflect the relative location
       of the input files.  The optional lambda= argument may be used
       to specify a space-separated list of system weights that
       override the uniform default.

       nbest-posteriors rescales the scores in an N-best list to
       reflect (weighted) posterior probabilities.  The output is the
       same N-best list with acoustic scores set to the log (base 10)
       of the posterior hypothesis probabilities and LM scores set to
       zero.  postscale=S attenuates the posterior distribution by
       dividing combined log scores by S (the default is S=lmw).
       If weight=W is specified the posteriors are multiplied by W.
       max_nbest=M limits the number of hypotheses used to the top M.
       This script is used mostly as a helper in nbest-rover.

       merge-nbest merges hypotheses from one or more N-best lists
       into a single list, collapsing hypotheses that occur in more
       than one input list.  If all input lists use the same
       nbest-format(5) then the output will also be in that format
       and contain the information from the first list in which a
       hypothesis was encountered.  Otherwise, the output will be in
       SRI Decipher(TM) NBestList1.0 format and contain acoustic
       scores and word strings only.  The max_nbest=M option limits
       input to the first M hypotheses from each input list.
       multiwords=1 merges hypotheses that are identical after
       resolving multiwords.  nopauses=1 merges hypotheses that are
       identical after removal of pause words.

       nbest-vocab outputs the vocabulary used in a set of N-best
       lists.  (The N-best files cannot be compressed, but may be
       concatenated and supplied via stdin.)

       nbest-error computes the overall oracle word error rate of a
       set of N-best lists in directory score-dir or listed in
       file-list.  The reference answers are given in refs in the
       format output by rescore-reweight (see above).  Additional
       arguments are passed to the underlying invocation of
       nbest-lattice(1), and can be used to limit the depth of the
       N-best list, compute lattice error rather than N-best error,
       etc.
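
       The oracle word error rate reported by nbest-error can be
       illustrated with a small Python sketch.  This is not the
       script's implementation (which delegates to nbest-lattice(1));
       it only shows the definition: for each utterance, take the
       hypothesis in the N-best list with the fewest word errors
       against the reference, then divide the summed errors by the
       total number of reference words.  The function names and the
       in-memory dictionaries are assumptions made for illustration
       and do not correspond to any SRILM file format.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def oracle_wer(refs: Dict[str, List[str]],
               nbest: Dict[str, List[List[str]]]) -> float:
    """Oracle WER: per utterance, count the errors of the best hypothesis."""
    errors = 0
    words = 0
    for sentid, ref in refs.items():
        # Hypothetical convention: a missing or empty N-best list is
        # scored as a single empty hypothesis.
        hyps = nbest.get(sentid) or [[]]
        errors += min(edit_distance(ref, hyp) for hyp in hyps)
        words += len(ref)
    return errors / words if words else 0.0


# Toy data, invented for this example.
refs = {"utt1": "the cat sat".split(), "utt2": "hello world".split()}
nbest = {
    "utt1": ["the cat sat down".split(), "a cat sat".split()],
    "utt2": ["hello word".split(), "hello world".split()],
}
print(oracle_wer(refs, nbest))  # 1 error over 5 reference words: 0.2
```

       Truncating each list before calling oracle_wer corresponds
       loosely to limiting the depth of the N-best list as mentioned
       above.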
       sentid-to-sclite converts 1-best hypotheses and references in
       the format used here to the ``trn'' format expected by the
       NIST sclite(1) scoring software.

       sentid-to-ctm converts 1-best hypotheses and references in the
       format used here to NIST ctm(5) format.  The script relies on
       an encoding of conversation IDs, channel, and utterance time
       marks in the sentence IDs and may need adjustment to local
       conventions.

       fix-ctm converts output produced by the -output-ctm option of
       nbest-lattice(1) and lattice-tool(1) to a format suitable for
       scoring with NIST sclite(1).  It, too, relies on information
       encoded in the sentence IDs and may need adjustments.

       compute-sclite is a wrapper around the NIST sclite(1) scoring
       tool.  refs and hyps are the reference and hypothesized
       transcripts, respectively.  The refs file can be either in
       "sentid" format or in stm(5) format.  In the latter case, hyps
       will be converted to ctm(5) format using the sentid-to-ctm
       helper script.  The hyps file can be either in "sentid" format
       or in ctm(5) format.  More than one -h option can be given to
       combine the contents of multiple hypotheses files.
       Optionally, -S specifies a sorted list of sentence IDs,
       subset, to score.  Multiple -S options may be given, to form
       the intersection of several subsets.  -multiwords or -M splits
       ``multiwords'' joined by underscores into their component
       words prior to scoring.  -noperiods deletes periods from the
       hypotheses prior to scoring (typically used to bridge
       different conventions for spelled letters).
       -R preserves reject words in the hypotheses for scoring (as
       appropriate if references also contain rejects).  -g glmfile
       enables filtering of references and hypotheses by the NIST
       csrfilt.sh script, controlled by the filter file glmfile (this
       is only possible with an stm reference file).  In that case,
       the -H option causes hesitations (as defined by the filter) to
       be deleted from the output for scoring purposes.  -v displays
       the complete command used to invoke sclite.  Any additional
       options are passed to sclite, e.g., to control its output
       actions or alignment mode.

       compare-sclite scores two sets of hypotheses hyps1 and hyps2
       for the same test set and computes in how many cases the first
       or second set had lower word error.  The remaining options are
       as for compute-sclite.  The script ignores hypotheses for
       sentences that do not appear in both hypothesis files, to
       ensure comparable scoring results.

SEE ALSO
       nbest-format(5), ngram(1), nbest-lattice(1),
       nbest-optimize(1), sclite(1), stm(5), ctm(5).

       J.G. Fiscus, "A Post-Processing System to Yield Reduced Word
       Error Rates: Recognizer Output Voting Error Reduction
       (ROVER)", Proc. IEEE Automatic Speech Recognition and
       Understanding Workshop, Santa Barbara, CA, 347-352, 1997.

       A. Stolcke et al., "The SRI March 2000 Hub-5 Conversational
       Speech Transcription System", Proc.
       NIST Speech Transcription Workshop, College Park, MD, 2000.

BUGS
       sentid-to-sclite has some assumptions about the structure of
       sentence IDs built in and may need to be modified for
       compute-sclite and compare-sclite to work.

       rescore-decipher -pretty may not work correctly with the
       -limit-vocab option if the word mapping adds to the vocabulary
       subset used in the N-best lists.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995-2006 SRI International

SRILM Tools          $Date: 2006/07/29 18:42:28 $       nbest-scripts(1)