Vocab(3)                                                 Vocab(3)

NAME
       Vocab - Vocabulary indexing for SRILM

SYNOPSIS
       #include <Vocab.h>

DESCRIPTION
       The Vocab class represents sets of string tokens as typically
       used for vocabularies, word class names, etc.  Additionally,
       Vocab provides a mapping from such string tokens (type
       VocabString) to integers (type VocabIndex).  VocabIndex values
       are typically used to index words in language models to
       conserve space and speed up comparisons etc.  Thus, Vocab
       essentially implements a symbol table into which strings can
       be "interned."

TYPES
       VocabIndex
              A non-negative integer for representing a string
              internally.

       VocabString
              A character array representing a vocabulary item
              (e.g., a word).

CONSTANTS
       maxWordLength
              Maximum number of characters in a VocabString.

       Vocab_None
              A special VocabIndex used to denote no vocabulary
              item and to terminate VocabIndex arrays.

       Vocab_Unknown
       Vocab_SentStart
       Vocab_SentEnd
       Vocab_Pause
              Default VocabString values for some common,
              predefined vocabulary items: unknown word, sentence
              begin, sentence end, and pause, respectively.

CLASS MEMBERS
       Vocab(VocabIndex start = 0, VocabIndex end = 0x7fffffff)
              When initializing a Vocab object, start and end
              optionally set the minimum and maximum VocabIndex
              values assigned by the vocabulary.  Indices are
              allocated in increasing order starting at start.
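
       The interning idea and the start/end index range described
       above can be illustrated with a small standalone sketch.  It
       does not use SRILM; InternTable, SketchIndex, kSketchNone and
       intern() are hypothetical names, and returning a sentinel
       when the range is used up is an assumption of the sketch, not
       documented Vocab behavior.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins; these are not the SRILM definitions.
using SketchIndex = unsigned int;
const SketchIndex kSketchNone = static_cast<SketchIndex>(-1);

// Toy symbol table: each distinct string is interned once and gets the
// next free index in the closed range [start, end].
class InternTable {
public:
    explicit InternTable(SketchIndex start = 0, SketchIndex end = 0x7fffffff)
        : next_(start), end_(end) {}

    // Returns the index already assigned to `word`, or allocates the next
    // one. Returns kSketchNone when the range is exhausted (a choice made
    // for this sketch only).
    SketchIndex intern(const std::string &word) {
        auto it = byName_.find(word);
        if (it != byName_.end()) return it->second;
        if (next_ > end_) return kSketchNone;
        SketchIndex index = next_++;
        byName_.emplace(word, index);
        return index;
    }

private:
    SketchIndex next_;
    SketchIndex end_;
    std::unordered_map<std::string, SketchIndex> byName_;
};

// Usage: indices grow from `start`, and a repeated string maps to the
// same index, so later comparisons can use integers instead of strings.
void internTableExample() {
    InternTable table(100, 101);
    assert(table.intern("hello") == 100);
    assert(table.intern("world") == 101);
    assert(table.intern("hello") == 100);          // already interned
    assert(table.intern("again") == kSketchNone);  // range used up
}
```

       A hash map keeps the lookup cost independent of vocabulary
       size; any associative container would serve for the sketch.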

       VocabIndex addWord(VocabString name)
              Looks up the index of a word string name, adding
              the word if not already part of the vocabulary.

       VocabString getWord(VocabIndex index)
              Returns the VocabString for index, or 0 if the
              index isn't defined.

       VocabIndex getIndex(VocabString name)
              Returns the VocabIndex for word name, or Vocab_None
              if the word isn't defined.  (Unlike addWord(), this
              will not extend the vocabulary if the word is
              undefined.)

       void remove(VocabString name)
       void remove(VocabIndex index)
              Deletes a vocabulary item, either by name or by
              index.

       unsigned int numWords()
              Returns the number of current vocabulary entries.

       VocabIndex highIndex()
              Returns the highest VocabIndex value assigned so
              far.  The next word added will receive an index
              that is one greater.  When allocating various
              meaningful vocabulary subsets into contiguous
              ranges, this function can be used to determine the
              corresponding boundaries in VocabIndex space; these
              values can then be used to test subset membership
              etc.

       VocabIndex unkIndex
              The index of the unknown word (by default assigned
              to Vocab_Unknown).

       VocabIndex ssIndex
              The index of the sentence-start tag (by default
              assigned to Vocab_SentStart).

       VocabIndex seIndex
              The index of the sentence-end tag (by default
              assigned to Vocab_SentEnd).
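
       The use of highIndex() to delimit a contiguous vocabulary
       subset, as described above, might look roughly like the
       following.  ToyVocab is a hypothetical stand-in that only
       mimics the documented behavior of addWord(), getIndex() and
       highIndex(); IndexRange and addSubset() are likewise invented
       for the illustration.  The sketch assumes the subset is a
       non-empty list of words that are not yet in the vocabulary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using ToyIndex = unsigned int;                        // stand-in for VocabIndex
const ToyIndex kToyNone = static_cast<ToyIndex>(-1);  // stand-in for Vocab_None

// Hypothetical miniature vocabulary, not the SRILM class.
class ToyVocab {
public:
    // Returns the word's index, adding the word if it is new.
    ToyIndex addWord(const std::string &name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        ToyIndex idx = next_++;
        index_[name] = idx;
        return idx;
    }
    // Lookup only; never extends the vocabulary.
    ToyIndex getIndex(const std::string &name) const {
        auto it = index_.find(name);
        return it == index_.end() ? kToyNone : it->second;
    }
    // Highest index assigned so far (wraps around while still empty).
    ToyIndex highIndex() const { return next_ - 1; }

private:
    std::map<std::string, ToyIndex> index_;
    ToyIndex next_ = 0;
};

// Closed index range occupied by one subset of the vocabulary.
struct IndexRange {
    ToyIndex first;
    ToyIndex last;
    bool contains(ToyIndex i) const { return i >= first && i <= last; }
};

// Adds `words` (assumed new and non-empty) and records their boundaries.
IndexRange addSubset(ToyVocab &vocab, const std::vector<std::string> &words) {
    IndexRange range;
    range.first = vocab.highIndex() + 1;   // index the next new word receives
    for (const std::string &w : words) vocab.addWord(w);
    range.last = vocab.highIndex();
    return range;
}

// Usage: membership in the subset becomes two integer comparisons.
void subsetExample() {
    ToyVocab vocab;
    vocab.addWord("the");
    vocab.addWord("cat");
    IndexRange digits = addSubset(vocab, {"one", "two", "three"});
    assert(digits.contains(vocab.getIndex("two")));
    assert(!digits.contains(vocab.getIndex("cat")));
}
```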

       VocabIndex pauseIndex
              The index of the pause tag (by default assigned to
              Vocab_Pause).

       Boolean unkIsWord
              When true, the unknown word is considered a regular
              word (default false).

       Boolean toLower
              When true, all word strings are mapped to lowercase.
              This is convenient for combining vocabularies,
              language models, etc., whose vocabularies differ
              only in the case convention (default false).

       Boolean isNonEvent(VocabString word)
       Boolean isNonEvent(VocabIndex word)
              Tests a word string or index for being a
              "non-event", i.e., a token that is not assigned
              probability in a language model.  By default,
              sentence-start, pauses, and unknown words are
              non-events.

       unsigned read(File &file)
              Reads word strings from a file and adds them to the
              vocabulary.  For convenience, only the first word
              on each line is significant (so extra information
              could be contained in such a file).  Returns the
              number of words read.

       void write(File &file, Boolean sorted = true)
              Writes the vocabulary strings to a file in a format
              compatible with read().  The sorted argument
              controls whether the output is lexicographically
              sorted.

       Often one wants to manipulate not single vocabulary items,
       but strings of them, e.g., to represent sentences.  Word
       strings are represented as self-delimiting arrays of type
       VocabString * or VocabIndex *.  The last element in a string
       is 0 or Vocab_None, respectively.

       unsigned getWords(const VocabIndex *wids, VocabString *words,
       unsigned max)
              Extends getWord() to strings of words.  The result
              is placed in words, which must have room for at
              least max words.  Returns the actual number of
              indices in wids.

       unsigned addWords(const VocabString *words, VocabIndex *wids,
       unsigned max)
              Extends addWord() to strings of words.  The result
              is placed in wids, which must have room for at
              least max indices.  Returns the actual number of
              words in words.

       unsigned getIndices(const VocabString *words, VocabIndex *wids,
       unsigned max)
              Extends getIndex() to strings of words.  The result
              is placed in wids, which must have room for at
              least max indices.  Returns the actual number of
              words in words.

FUNCTIONS
       The following static member functions are utilities to
       manipulate strings of vocabulary items, independent of a
       particular vocabulary.

       unsigned parseWords(char *line, VocabString *words,
       unsigned max)
              Parses a character string line into
              whitespace-delimited words.  On return, words
              contains pointers to null-terminated substrings of
              line (whose contents are modified in the process).
              words must have room for at least max pointers.
              Returns the actual number of words parsed.

       unsigned length(const VocabIndex *words)
       unsigned length(const VocabString *words)
              Returns the number of items in a word string.

       Boolean contains(const VocabIndex *words, VocabIndex word)
              Returns true if word occurs among words.

       VocabIndex *reverse(VocabIndex *words)
       VocabString *reverse(VocabString *words)
              Reverses a string of words in place (and returns it
              as a result).

       void write(File &file, const VocabString *words)
              Writes a string of space-delimited words to a file.

       int compare(VocabIndex word1, VocabIndex word2)
       int compare(VocabString word1, VocabString word2)
              Compares two vocabulary items lexicographically.
              Returns -1, 0, +1 for less than, equal, or greater
              than, respectively.

       int compare(const VocabIndex *words1, const VocabIndex *words2)
       int compare(const VocabString *words1, const VocabString *words2)
              Extends the order of compare() to strings of words.

       For compatibility with the C library calling conventions,
       compare() cannot be a member function of a Vocab object.
       For index-based comparisons the associated vocabulary
       needs to be set globally.  This is achieved by calling the
       compareIndex() member function of a Vocab object.
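
       The self-delimiting array convention and the in-place
       parsing described for parseWords(), length() and reverse()
       can be sketched without SRILM as follows.  SeqString,
       seqParseWords(), seqLength() and seqReverse() are
       hypothetical names.  Writing the terminating 0 only when
       fewer than max words were found is an assumption of this
       sketch; the text above does not specify it.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <utility>

typedef const char *SeqString;   // stand-in for VocabString

// Counts the items of a 0-terminated array of strings.
unsigned seqLength(const SeqString *words) {
    unsigned n = 0;
    while (words[n] != 0) ++n;
    return n;
}

// Splits `line` in place at whitespace. Each separator that ends a word
// is overwritten with '\0', so `words` points into `line` itself.
// Stores at most `max` pointers; appends a 0 terminator if room remains.
unsigned seqParseWords(char *line, SeqString *words, unsigned max) {
    unsigned n = 0;
    char *p = line;
    while (n < max) {
        while (*p != '\0' && std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (*p == '\0') break;                 // no more words
        words[n++] = p;                        // start of a word
        while (*p != '\0' && !std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (*p != '\0') *p++ = '\0';           // cut the word off
    }
    if (n < max) words[n] = 0;
    return n;
}

// Reverses a 0-terminated word string in place and returns it.
SeqString *seqReverse(SeqString *words) {
    unsigned n = seqLength(words);
    for (unsigned i = 0; i < n / 2; ++i) std::swap(words[i], words[n - 1 - i]);
    return words;
}

// Usage: the buffer must be writable, since parsing modifies it.
void seqExample() {
    char line[] = "  the cat  sat\n";
    SeqString words[8];
    unsigned n = seqParseWords(line, words, 8);
    assert(n == 3 && seqLength(words) == 3);
    seqReverse(words);
    assert(std::strcmp(words[0], "sat") == 0);
    assert(std::strcmp(words[2], "the") == 0);
}
```

       Because the array carries its own terminator, no separate
       length argument has to be passed between these helpers.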

       ostream &operator<< (ostream &, const VocabString *words)
       ostream &operator<< (ostream &, const VocabIndex *words)
              These operators output strings of words to a
              stream.  For the second variant, the Vocab object
              used for interpreting indices needs to be identified
              globally by calling the use() member function on
              the object.

ITERATORS
       The VocabIter class provides iteration over vocabularies.
       An iteration returns the elements of a Vocab in some
       unspecified, but deterministic order.

       When copied or used in initialization of other objects,
       VocabIter objects retain the current "position" in an
       iteration.  This allows nested iterations that enumerate
       all pairs of distinct elements, etc.

       NOTE: While an iteration over a Vocab object is ongoing,
       no modifications are allowed to the object, except removal
       of the "current" vocabulary item.

       VocabIter(Vocab &vocab, Boolean sorted = false)
              Creates an iteration over vocab.  If sorted is set
              to true the vocabulary items will be enumerated in
              lexicographic order.

       void init()
              Reinitializes the iteration to its beginning.

       VocabString next()
       VocabString next(VocabIndex &index)
              Steps the iteration and returns the next word
              string.  Optionally, the associated word index is
              returned in index.  Returns 0 if the vocabulary is
              exhausted.

SEE ALSO
       LM(3), File(3)

BUGS
       There is no good way to synchronize VocabIndex values
       across multiple Vocab objects.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1995, 1996 SRI International

SRILM              $Date: 1996/07/13 01:35:40 $          Vocab(3)
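
       The iteration protocol from the ITERATORS section (optional
       sorting, next() returning 0 at the end, and copies that keep
       their position) can be mimicked in a standalone sketch.
       DemoTable, DemoIndex and DemoVocabIter are hypothetical; the
       sketch takes a snapshot of pointers into a std::unordered_map
       and assumes the map is not modified while iterators exist,
       which is stricter than the rule stated for VocabIter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using DemoIndex = unsigned int;   // stand-in for VocabIndex
using DemoTable = std::unordered_map<std::string, DemoIndex>;

// Hypothetical iterator over a word table, not the SRILM VocabIter.
class DemoVocabIter {
public:
    explicit DemoVocabIter(const DemoTable &table, bool sorted = false) {
        for (const auto &entry : table) items_.push_back(&entry);
        if (sorted) {
            std::sort(items_.begin(), items_.end(),
                      [](const DemoTable::value_type *a,
                         const DemoTable::value_type *b) {
                          return a->first < b->first;
                      });
        }
    }

    // Restarts the iteration from the beginning.
    void init() { pos_ = 0; }

    // Returns the next word, or 0 once every word has been visited.
    const char *next() {
        DemoIndex ignored = 0;
        return next(ignored);
    }

    // Same, but also reports the word's index through `index`.
    const char *next(DemoIndex &index) {
        if (pos_ >= items_.size()) return 0;
        const DemoTable::value_type *item = items_[pos_++];
        index = item->second;
        return item->first.c_str();
    }

private:
    std::vector<const DemoTable::value_type *> items_;
    std::size_t pos_ = 0;
};

// Usage: a copy starts where the original currently stands, so the inner
// loop visits only words after the outer one (all unordered pairs).
unsigned countDistinctPairs(const DemoTable &table) {
    unsigned pairs = 0;
    DemoVocabIter outer(table, true);
    while (outer.next() != 0) {
        DemoVocabIter inner(outer);
        while (inner.next() != 0) ++pairs;
    }
    return pairs;
}

void iterExample() {
    DemoTable table = {{"pear", 2}, {"apple", 0}, {"fig", 1}};
    assert(countDistinctPairs(table) == 3);
}
```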
