pfsg-scripts(1)                                   pfsg-scripts(1)

NAME
       pfsg-scripts, add-classes-to-pfsg, add-pauses-to-pfsg,
       classes-to-fsm, fsm-to-pfsg, htklat-vocab, make-nbest-pfsg,
       make-ngram-pfsg, pfsg-from-ngram, pfsg-to-dot, pfsg-to-fsm,
       pfsg-vocab, wlat-stats, wlat-to-dot, wlat-to-pfsg - create
       and manipulate finite-state networks

SYNOPSIS
       make-ngram-pfsg [maxorder=N] [check_bows=0|1] [no_empty_bo=1]
           [version=1] [top_level_name=name] [null=string]
           [lm-file] >pfsg-file
       add-pauses-to-pfsg [vocab=file] [pauselast=1] [wordwrap=0]
           [pause=pauseword] [version=1] [top_level_name=name]
           [null=string] [pfsg-file]
       add-classes-to-pfsg classes=classes [null=string] [pfsg-file]
       pfsg-from-ngram [lm-file] >pfsg-file
       make-nbest-pfsg [notree=0|1 scale=S amw=A lmw=L wtw=W]
           [nbest-file]
       pfsg-vocab [pfsg-file...]
       htklat-vocab [quotes=1] [htk-lattice-file...]
       pfsg-to-dot [show_probs=0|1 show_logs=0|1 show_nums=0|1]
           [pfsg-file]
       pfsg-to-fsm [symbolfile=symbols symbolic=0|1 scale=S
           final_output=E] [pfsg-file]
       fsm-to-pfsg [pfsg_name=name transducer=0|1 scale=S]
           [fsm-file]
       classes-to-fsm vocab=vocab [isymbolfile=isymbols
           osymbolfile=osymbols symbolic=0|1] [classes]
       wlat-to-pfsg [wlat-file]
       wlat-to-dot [show_probs=0|1 show_nums=0|1] [wlat-file]
       wlat-stats [wlat-file]

DESCRIPTION
       These scripts create and manipulate various forms of
       finite-state networks.  Note that they take options with the
       gawk(1) syntax option=value instead of the more common
       -option value.

       Also, since these tools are implemented as scripts they
       don't automatically input or output compressed model files
       correctly, unlike the main SRILM tools.  However, since most
       scripts work with data from standard input or to standard
       output (by leaving out the file argument, or specifying it
       as ``-'') it is easy to combine them with gunzip(1) or
       gzip(1) on the command line.

       make-ngram-pfsg encodes a backoff N-gram model in
       ngram-format(5) as a finite-state network in pfsg-format(5).
       maxorder=N limits the N-gram length used in PFSG
       construction to N; the default is to use all N-grams
       occurring in the input model.
       check_bows=1 enables a check for conditional probabilities
       that are smaller than the corresponding backoff
       probabilities.  Such transitions should first be removed
       from the model with ngram -prune-lowprobs.  no_empty_bo=1
       prevents empty paths through the PFSG resulting from
       transitions through the unigram backoff node.

       add-pauses-to-pfsg replaces the word nodes in an input PFSG
       with sub-PFSGs that allow an optional pause before each
       word.  It also inserts an optional pause following the last
       word in the sentence.  A typical usage is
            make-ngram-pfsg ngram | \
            add-pauses-to-pfsg >final-pfsg
       The result is a PFSG suitable for use in a speech
       recognizer.  The option pauselast=1 switches the order of
       words and pause nodes in the sub-PFSGs; wordwrap=0 disables
       the insertion of sub-PFSGs altogether.

       The options pause=pauseword and top_level_name=name allow
       changing the default names of the pause word and the
       top-level grammar, respectively.  version=1 inserts a
       version line at the top of the output as required by the
       Nuance recognition system (see NUANCE COMPATIBILITY below).
       add-pauses-to-pfsg uses a heuristic to distinguish word
       nodes in the input PFSG from other nodes (NULL or
       sub-PFSGs).  The option vocab=file lets one specify a
       vocabulary of word names to override this heuristic.

       add-classes-to-pfsg extends an input PFSG with expansions
       for word classes, defined in classes.  pfsg-file should
       contain a PFSG generated from the N-gram portion of a class
       N-gram model.
       A typical usage is thus
            make-ngram-pfsg class-ngram | \
            add-classes-to-pfsg classes=classes | \
            add-pauses-to-pfsg >final-pfsg

       pfsg-from-ngram is a wrapper script that combines removal of
       low-probability N-grams, conversion to PFSG, and adding of
       optional pauses to create a PFSG for recognition.

       make-nbest-pfsg converts an N-best list in nbest-format(5)
       into a PFSG which, when used in recognition, allows exactly
       the hypotheses contained in the N-best list.  notree=1
       creates separate PFSG nodes for all word instances; the
       default is to construct a prefix-tree structured PFSG.
       scale=S multiplies the total hypothesis scores by S; the
       default is 0, meaning that all hypotheses have identical
       probability in the PFSG.  Three options, amw=A, lmw=L, and
       wtw=W, control the score weighting in N-best lists that
       contain separate acoustic and language model scores, setting
       the acoustic model weight to A, the language model weight to
       L, and the word transition weight to W.

       pfsg-vocab extracts the vocabulary used in one or more
       PFSGs.  htklat-vocab does the same for lattices in HTK
       standard lattice format.  The quotes=1 option enables
       processing of HTK quotes.

       pfsg-to-dot renders a PFSG in dot(1) format for subsequent
       layout, printing, etc.  show_probs=1 includes transition
       probabilities in the output.  show_logs=1 includes log
       (base 10) transition probabilities in the output.
       show_nums=1 includes node numbers in the output.
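
       As a rough sketch of the subsequent layout step, the output
       of pfsg-to-dot can be piped straight into the Graphviz dot
       program.  The function name render_pfsg and the file names
       LM.pfsg.gz and LM.pdf are illustrative assumptions, not part
       of SRILM, and the sketch assumes that pfsg-to-dot reads
       standard input when the file argument is left out, as noted
       for most of these scripts in the DESCRIPTION:

```shell
# Hypothetical helper, not part of SRILM: render a gzip-compressed PFSG as PDF.
# $1 is the compressed PFSG file, $2 is the PDF file to write.
render_pfsg() {
    gunzip -c "$1" |
        pfsg-to-dot show_probs=1 show_nums=1 |
        dot -Tpdf -o "$2"
}

# Only attempt the conversion when both tools and the input file are available.
if command -v pfsg-to-dot >/dev/null 2>&1 &&
   command -v dot >/dev/null 2>&1 &&
   [ -r LM.pfsg.gz ]; then
    render_pfsg LM.pfsg.gz LM.pdf
else
    echo "pfsg-to-dot, dot, or LM.pfsg.gz not available; nothing rendered" >&2
fi
```

       Any other Graphviz output type, such as -Tsvg, can be used
       in place of -Tpdf.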

       pfsg-to-fsm converts a finite-state network in
       pfsg-format(5) into an equivalent network in AT&T fsm(5)
       format.  This involves moving output actions from nodes to
       transitions.  If symbolfile=symbols is specified, the
       mapping from FSM output symbols is written to symbols for
       later use with the -i or -o options of fsm(1) tools.
       symbolic=1 preserves the word strings in the resulting FSA.
       scale=S scales the transition weights by a factor S; the
       default is -1 (to conform to the default FSM semiring).
       final_output=E forces the final FSA node to have output
       label E; this also forces creation of a unique final FSA
       node, which is otherwise unnecessary if the final node has a
       null output.

       fsm-to-pfsg conversely transforms fsm(5) format into
       pfsg-format(5).  This involves moving output actions from
       transitions to nodes, and generally requires an increase in
       the number of nodes.  (The conversion is done such that
       pfsg-to-fsm and fsm-to-pfsg are exact inverses of each
       other.)  The pfsg_name=name parameter sets the name field of
       the output PFSG.  transducer=1 indicates that the input is a
       transducer and that input:output pairs should be preserved
       in the PFSG.  scale=S scales the transition weights by a
       factor S; the default is -1 (to conform to the default FSM
       semiring).

       classes-to-fsm converts a classes-format(5) file into a
       transducer in fsm(5) format, such that composing the
       transducer with an FSA encoding a class language model
       results in an FSA for the word language model.  The word
       vocabulary needs to be given in file vocab.
       isymbolfile=isymbols and osymbolfile=osymbols allow saving
       the input and output symbol tables of the transducer for
       later use.  symbolic=1 preserves the word strings in the
       resulting FSA.

       The following commands show the creation of an FSA encoding
       the class N-gram grammar ``test.bo'' with vocabulary
       ``test.vocab'' and class expansions ``test.classes'':
            classes-to-fsm vocab=test.vocab symbolic=1 \
                 isymbolfile=CLASSES.inputs \
                 osymbolfile=CLASSES.outputs \
                 test.classes >CLASSES.fsm
            make-ngram-pfsg test.bo | \
            pfsg-to-fsm symbolic=1 >test.fsm
            fsmcompile -i CLASSES.inputs test.fsm >test.fsmc
            fsmcompile -t -i CLASSES.inputs -o CLASSES.outputs \
                 CLASSES.fsm >CLASSES.fsmc
            fsmcompose test.fsmc CLASSES.fsmc >result.fsmc

       wlat-to-pfsg converts a word posterior lattice or mesh
       ("sausage") in wlat-format(5) into pfsg-format(5).

       wlat-to-dot renders a wlat-format(5) word lattice in dot(1)
       format for subsequent layout, printing, etc.  show_probs=1
       includes node posterior probabilities in the output.
       show_nums=1 includes node indices in the output.

       wlat-stats computes statistics of word posterior lattices,
       including the number of word hypotheses, the entropy (log
       base 10) of the sentence hypothesis set represented, and the
       posterior expected number of words.
       For word meshes that have been aligned with references, the
       1-best and oracle lattice error rates are also computed.

NUANCE COMPATIBILITY
       The Nuance recognizer (as of version 6.2) understands a
       variant of the PFSG format; hence the scripts above should
       be useful in building recognition systems for that
       recognizer.

       A suitable PFSG can be generated from an N-gram backoff
       model in ARPA ngram-format(5) using the following command:
            ngram -debug 1 -order N -lm LM.bo -prune-lowprobs \
                 -write-lm - | \
            make-ngram-pfsg | \
            add-pauses-to-pfsg version=1 pauselast=1 pause=_pau_ \
                 top_level_name=.TOP_LEVEL >LM.pfsg
       assuming the pause word in the dictionary is ``_pau_''.
       Certain restrictions on the naming of words (e.g., no
       hyphens are allowed) have to be respected.

       The resulting PFSG can then be referenced in a Nuance
       grammar file, e.g.,
            .TOP [NGRAM_PFSG]
            NGRAM_PFSG:lm LM.pfsg

       In newer Nuance versions the name for a non-emitting node
       was changed to NULNOD, and inter-word optional pauses are
       automatically added to the grammar.  This means that the
       PFSG should be created using
            ngram -debug 1 -order N -lm LM.bo -prune-lowprobs \
                 -write-lm - | \
            make-ngram-pfsg version=1 top_level_name=.TOP_LEVEL \
                 null=NULNOD >LM.pfsg
       The null=NULNOD option should also be passed to
       add-classes-to-pfsg.

       Starting with version 8, Nuance supports N-gram LMs.
       However, you can still use SRILM to create LMs, as described
       above.
       The syntax for inclusion of a PFSG has changed to
            NGRAM_PFSG:slm LM.pfsg

       Caveat: Compatibility with Nuance is purely due to
       historical circumstance and not supported.

SEE ALSO
       lattice-tool(1), ngram(1), ngram-format(5), pfsg-format(5),
       wlat-format(5), nbest-format(5), classes-format(5), fsm(5),
       dot(1).

BUGS
       make-ngram-pfsg should be reimplemented in C++ for speed and
       some size optimizations that require more global operations
       on the PFSG.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995-2005 SRI International

SRILM Tools        $Date: 2006/10/05 19:43:07 $   pfsg-scripts(1)