.\" $Id: segment.1,v 1.6 2004/12/03 17:59:01 stolcke Exp $
.TH segment 1 "$Date: 2004/12/03 17:59:01 $" "SRILM Tools"
.SH NAME
segment \- segment text using N-gram language model
.SH SYNOPSIS
.B segment
[\c
.BR \-help ]
option\&...
.SH DESCRIPTION
.B segment
infers a most likely segmentation (location of segment boundaries)
from a text, based on a segment language model.
The language model is a standard backoff N-gram model in ARPA
.BR ngram-format (5),
modeling segmentation using the boundary tags <s> and </s>.
The program reads in a word sequence, finds the most likely locations of
segment boundaries according to the language model, and outputs the
word sequence with segment boundaries marked by <s> tags.
.SH OPTIONS
.PP
Each filename argument can be an ASCII file, or a
compressed file (name ending in .Z or .gz), or ``-'' to indicate
stdin/stdout.
.TP
.B \-help
Print option summary.
.TP
.B \-version
Print version information.
.TP
.BI \-order " n"
Set the maximal N-gram order to be used, by default 3.
NOTE: The order of the model is not set automatically when a model
file is read, so the same file can be used at various orders.
.TP
.BI \-debug " level"
Set the debugging output level (0 means no debugging output).
Debugging messages are sent to stderr.
.TP
.BI \-lm " file"
Read the N-gram model from
.IR file .
.TP
.BI \-text " file"
Find the text to be segmented in
.IR file .
Default input is stdin.
.TP
.B \-continuous
Process all words in the input as one sequence of words, irrespective of
line breaks.
Normally each line is processed separately as a word sequence.
.TP
.B \-posteriors
Use a forward-backward algorithm to compute the posterior probabilities
of a segment boundary at each word transition, and hypothesize a boundary
whenever the probability exceeds 0.5.
By default a Viterbi algorithm is used that computes
the globally most likely segmentation.
.br
If
.B \-continuous
is specified as well,
then this option will produce one line of output per word, containing,
respectively, the <s> tag (if appropriate), the word itself, and
the posterior probability for a boundary preceding the word.
.TP
.B \-unk
Output the unknown word token <unk> for each input word not in the language
model vocabulary.
The default is to output the input word unchanged.
.TP
.BI \-stag " string"
Use
.I string
to mark segment boundaries in the output.
Default is the start-of-sentence symbol defined in the language model (<s>).
.TP
.BI \-bias " b"
Make a segment boundary a priori more likely by a factor of
.IR b .
This allows balancing of false detection/rejection errors.
The default is 1.
.SH "SEE ALSO"
ngram-count(1), ngram-format(5).
.br
A. Stolcke and E. Shriberg, ``Automatic Linguistic Segmentation of
Spontaneous Speech,'' \fIProc. ICSLP\fP, 1005\-1008, 1996.
.SH BUGS
Only N-gram models up to trigram order are used accurately.
For higher-order models use the more general
.BR hidden-ngram (1).
.SH AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
.br
Copyright 1997\-2004 SRI International
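.SH EXAMPLES
.PP
The invocations below are illustrative sketches, not tested recipes.
The file names
.IR seg.lm ,
.IR seg.lm.gz ,
.IR input.txt ,
and
.IR output.txt ,
as well as the tag string <seg>, are hypothetical placeholders;
only options described under OPTIONS are used.
.PP
Segment each input line separately with a trigram segment model,
using the default Viterbi search:
.PP
.nf
.RS
segment \-lm seg.lm \-order 3 \-text input.txt > output.txt
.RE
.fi
.PP
Treat the whole input as one word sequence, hypothesize boundaries from
posterior probabilities, and make boundaries a priori twice as likely:
.PP
.nf
.RS
segment \-lm seg.lm \-continuous \-posteriors \-bias 2 \-text input.txt
.RE
.fi
.PP
Read a compressed model, take the text from stdin (the default input),
and mark boundaries with a custom tag instead of <s>:
.PP
.nf
.RS
segment \-lm seg.lm.gz \-stag "<seg>" < input.txt
.RE
.fi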