segment(1)                                             segment(1)

NAME
       segment - segment text using N-gram language model

SYNOPSIS
       segment [-help] option ...

DESCRIPTION
       segment infers a most likely segmentation (location of segment
       boundaries) from a text, based on a segment language model.
       The language model is a standard backoff N-gram model in ARPA
       ngram-format(5), modeling segmentation using the boundary tags
       <s> and </s>.  The program reads in a word sequence, finds the
       most likely locations of segment boundaries according to the
       language model, and outputs the word sequence with segment
       boundaries marked by <s> tags.

OPTIONS
       Each filename argument can be an ASCII file, or a compressed
       file (name ending in .Z or .gz), or ``-'' to indicate
       stdin/stdout.

       -help  Print option summary.

       -version
              Print version information.

       -order n
              Set the maximal N-gram order to be used, by default 3.
              NOTE: The order of the model is not set automatically
              when a model file is read, so the same file can be used
              at various orders.

       -debug level
              Set the debugging output level (0 means no debugging
              output).  Debugging messages are sent to stderr.

       -lm file
              Read the N-gram model from file.

       -text file
              Find the text to be segmented in file.  Default input
              is stdin.

       -continuous
              Process all words in the input as one sequence of
              words, irrespective of line breaks.  Normally each line
              is processed separately as a word sequence.

       -posteriors
              Use a forward-backward algorithm to compute the
              posterior probabilities of a segment boundary at each
              word transition, and hypothesize a boundary whenever
              the probability exceeds 0.5.  By default a Viterbi
              algorithm is used that computes the globally most
              likely segmentation.

              If -continuous is specified as well, then this option
              will produce one line of output per word, containing,
              respectively, the <s> tag (if appropriate), the word
              itself, and the posterior probability for a boundary
              preceding the word.

       -unk   Output the unknown word token <unk> for each input word
              not in the language model vocabulary.  The default is
              to output the input word unchanged.

       -stag string
              Use string to mark segment boundaries in the output.
              Default is the start-of-sentence symbol defined in the
              language model (<s>).

       -bias b
              Make a segment boundary a priori more likely by a
              factor of b.  This allows balancing of false
              detection/rejection errors.  The default is 1.

SEE ALSO
       ngram-count(1), ngram-format(5).
       A. Stolcke and E. Shriberg, ``Automatic Linguistic
       Segmentation of Spontaneous Speech,'' Proc. ICSLP, 1005-1008,
       1996.

BUGS
       Only N-gram models up to trigram order are used accurately.
       For higher-order models use the more general hidden-ngram(1).

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 1997-2004 SRI International

SRILM Tools        $Date: 2004/12/03 17:59:01 $        segment(1)
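
The options above can be combined from a script. The following is a minimal Python sketch, not part of SRILM: it assembles a segment command line from the documented options and splits one line of default (Viterbi) output back into segments. The helper names build_segment_command and split_segments and the file names model.lm and input.txt are hypothetical, and the parser assumes only what the DESCRIPTION states, namely that boundaries appear as a separate tag token (<s> unless changed with -stag).

```python
import shlex


def build_segment_command(lm_path, text_path="-", order=3,
                          posteriors=False, bias=None, stag=None):
    """Assemble an argument list for segment from documented options only.

    Hypothetical helper: it does not run anything, it only builds the list.
    """
    cmd = ["segment", "-lm", lm_path, "-text", text_path, "-order", str(order)]
    if posteriors:
        # Threshold boundary posteriors at 0.5 instead of using Viterbi.
        cmd.append("-posteriors")
    if bias is not None:
        # Values above 1 make boundaries a priori more likely.
        cmd += ["-bias", str(bias)]
    if stag is not None:
        # Replace the default boundary marker in the output.
        cmd += ["-stag", stag]
    return cmd


def split_segments(output_line, stag="<s>"):
    """Split one output line into lists of words, one list per segment.

    Assumption: the boundary tag is a whitespace-separated token. Empty
    segments (for example from a leading tag) are dropped.
    """
    segments, current = [], []
    for token in output_line.split():
        if token == stag:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # model.lm and input.txt are placeholder file names.
    cmd = build_segment_command("model.lm", "input.txt", bias=2)
    print(shlex.join(cmd))
    # A made-up output line, used only to exercise the parser.
    print(split_segments("<s> hello there <s> how are you"))
```

To actually run the tool, the list returned by build_segment_command could be passed to subprocess.run with capture_output=True, provided segment is installed and on the PATH. Output produced with -posteriors together with -continuous has a different, one-word-per-line layout and would need its own parser.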