NOTE: this file was last updated with r35. Check for updates at: http://code.google.com/p/icsiboost

Boosting is a meta-learning approach that combines an ensemble of weak classifiers into a strong classifier. Adaptive Boosting (AdaBoost) implements this idea as a greedy search for a linear combination of classifiers, overweighting the examples that each classifier misclassifies. icsiboost implements AdaBoost over stumps (one-level decision trees) on discrete and continuous attributes (words and real values). See http://en.wikipedia.org/wiki/AdaBoost and the papers by Y. Freund and R. Schapire for more details [1]. This approach is one of the simplest and most efficient ways of combining continuous and nominal values. Our implementation aims at training from millions of examples with hundreds of features in reasonable time and memory. (A minimal sketch of the algorithm is given in the EXAMPLES section at the end of this file.)

Getting and building the source:

    svn checkout http://icsiboost.googlecode.com/svn/trunk/ .
    cd icsiboost
    ./configure
    make
    cd src
    ./test_uci.sh

Program usage:

    USAGE: ./icsiboost [options] -S <stem>
      --version            print version info
      -S <stem>            defines model/data/names stem
      -n <iterations>      number of boosting iterations
      -E <smoothing>       set smoothing value (default=0.5)
      -V                   verbose mode
      -C                   classification mode -- reads examples from <stdin>
      -o                   long output in classification mode
      --cutoff <freq>      consider as unknown (?) nominal features occurring infrequently (not implemented)
      --jobs <threads>     number of threaded weak learners
      --do-not-pack-model  do not pack the model (to keep individual training steps)
      --output-weights     output training example weights at each iteration
      --model <model>      save/load the model to/from this file instead of <stem>.shyp
      --train <file>       bypass the <stem>.data filename to specify training examples
      --test <file>        output an additional error rate from another file during training (can be used multiple times, not implemented)
      --names <file>       use this column description file instead of <stem>.names
      --ignore <columns>   ignore a comma-separated list of columns (synonym with "ignore" in the names file)

The input data is defined in a format similar to the UCI repository (http://www.ics.uci.edu/~mlearn/MLRepository.html), except that you have to remove blank lines and comments and add a period at the end of each line. The data must include a <stem>.data file with training examples and a <stem>.names file describing the classes/features, and may include .dev and .test files used for error rate computation (this last feature has been deactivated). Hypothetical example files are shown in the EXAMPLES section at the end of this file.

icsiboost is still limited -- see the MISSING features below. The next releases will focus on code cleanup, stabilization and usability. After that, the project will diverge from BoosTexter by providing a different user interface (command line options, file formats...) and by implementing other approaches. In the long term, we may add script bindings (perl, python, ...) and a nice library interface.

NOTES:
 * revision 21: compatible models/classification mode.
 * revision 13: first usable release.
 * 64-bit compilation is now working. We are able to use that 32G of memory ;)
 * threads are working: use --jobs 8 to run 8 weak learners at the same time.

MISSING:
 * scored text (unlikely to be implemented)
 * n/f/s-grams (will not be implemented; can be emulated by a simple script)
 * dev/test score at each iteration (deactivated)
 * multi-labels
 * discrete AdaBoost.MR/MH (unlikely to be implemented)

PAPERS:
[1] Robert E. Schapire and Yoram Singer. "BoosTexter: A boosting-based system for text categorization". Machine Learning, 39(2/3):135-168, 2000.
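EXAMPLES:

The layout below follows the UCI-style conventions described above; the stem, class names, feature names and values are all invented for illustration, so treat this as a sketch rather than a verified icsiboost input. A hypothetical weather.names file lists the classes on the first line and one feature per subsequent line, each ended by a period:

    play, dont_play.
    temperature: continuous.
    outlook: sunny, overcast, rain.

The matching weather.data file holds one example per line, with feature values in the order declared above followed by the class, each line ended by a period:

    75, sunny, play.
    52, rain, dont_play.
    68, overcast, play.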
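Given the two files above, training and classification might look like the following; the flags are the documented ones, while the stem, iteration count and file name new_examples.txt are illustrative:

    # train for 100 boosting rounds; the model goes to weather.shyp by default
    ./icsiboost -S weather -n 100
    # classify new examples read from <stdin>, formatted like the .data file
    ./icsiboost -S weather -C < new_examples.txt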
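Finally, a minimal, self-contained sketch of textbook binary AdaBoost over decision stumps on a single continuous feature, meant only to illustrate the greedy search and example reweighting described at the top of this file. It is not icsiboost's implementation, which handles many features, nominal and text attributes, and multi-class problems; all names and data here are invented.

    /* adaboost_sketch.c -- build with: gcc adaboost_sketch.c -lm */
    #include <stdio.h>
    #include <math.h>

    #define N 8   /* training examples */
    #define T 10  /* boosting iterations */

    int main(void) {
        double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};    /* one continuous feature */
        int    y[N] = {1, 1, 1, -1, -1, -1, 1, 1}; /* labels in {-1,+1} */
        double w[N], alpha[T], thresh[T];
        int    pol[T];                             /* stump polarity */

        for (int i = 0; i < N; i++) w[i] = 1.0 / N;    /* start uniform */

        for (int t = 0; t < T; t++) {
            /* greedy step: pick the stump (threshold, polarity) with the
               lowest weighted error under the current example weights */
            double best = 1.0;
            for (int j = 0; j < N; j++) {
                double th = x[j] - 0.5;  /* candidate cut below each point */
                for (int s = -1; s <= 1; s += 2) {
                    double err = 0.0;
                    for (int i = 0; i < N; i++)
                        if (((x[i] > th) ? s : -s) != y[i]) err += w[i];
                    if (err < best) { best = err; thresh[t] = th; pol[t] = s; }
                }
            }
            /* classifier weight: the lower the error, the larger alpha */
            alpha[t] = 0.5 * log((1.0 - best) / (best + 1e-10));
            /* overweight the examples this stump got wrong, renormalize */
            double z = 0.0;
            for (int i = 0; i < N; i++) {
                int h = (x[i] > thresh[t]) ? pol[t] : -pol[t];
                w[i] *= exp(-alpha[t] * y[i] * h);
                z += w[i];
            }
            for (int i = 0; i < N; i++) w[i] /= z;
        }

        /* strong classifier: sign of the weighted vote of all stumps */
        for (int i = 0; i < N; i++) {
            double f = 0.0;
            for (int t = 0; t < T; t++)
                f += alpha[t] * ((x[i] > thresh[t]) ? pol[t] : -pol[t]);
            printf("x=%g gold=%+d predicted=%+d\n", x[i], y[i], f >= 0 ? 1 : -1);
        }
        return 0;
    }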