libbow-desc.texi
@samp{Libbow} is a library of C code intended for writing statistical
text-processing programs.  This distribution includes the library, as
well as a text classification front-end and a document retrieval
front-end.

@format
The library provides facilities for:
    Recursively descending directories, finding text files.
    Finding `document' boundaries when there are multiple docs per file.
    Tokenizing a text file, according to several different methods.
    Including N-grams among the tokens.
    Mapping strings to integers and back again, very efficiently.
    Building a sparse matrix of document/token counts.
    Pruning vocabulary by occurrence counts or by information gain.
    Building and manipulating word vectors.
    Setting word vector weights according to Naive Bayes, TFIDF, and a
      simple form of Probabilistic Indexing.
    Scoring queries for retrieval or classification.
    Writing all data structures to disk in a
      machine-architecture-independent format.
    Reading the document/token matrix from disk in an efficient,
      sparse fashion.
    Performing test/train splits, and automatic classification tests.
@end format

It should compile on most UNIX systems, and on Windows NT (with a GNU
build environment).

The code conforms to the GNU coding standards.  It is released under the
GNU Library General Public License.

@format
The library does not:
    Have parsing facilities.
    Do smoothing across N-gram models.
    Claim to be finished.
    Have good documentation.
    Claim to be bug-free.
    ...many other things.
@end format