Libsvm is a simple, easy-to-use, and efficient software for SVM
classification and regression. It solves C-SVM classification, nu-SVM
classification, one-class-SVM, epsilon-SVM regression, and nu-SVM
regression. It also provides an automatic model selection tool for
C-SVM classification. This document explains the use of libsvm.

Libsvm is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Please read the COPYRIGHT file before using libsvm.

Table of Contents
=================

- Quick Start
- Installation and Data Format
- `svm-train' Usage
- `svm-predict' Usage
- Tips on Practical Use
- Examples
- Precomputed Kernels
- Library Usage
- Java Version
- Building Windows Binaries
- Additional Tools: Sub-sampling, Parameter Selection, Format checking, etc.
- Python Interface
- Additional Information

Quick Start
===========

If you are new to SVM and if the data is not large, please go to the
`tools' directory and use easy.py after installation. It does
everything automatically -- from data scaling to parameter selection.

Usage: easy.py training_file [testing_file]

More information about parameter selection can be found in
`tools/README.'

Installation and Data Format
============================

On Unix systems, type `make' to build the `svm-train' and `svm-predict'
programs. Run them without arguments to show their usage.

On other systems, consult `Makefile' to build them (e.g., see
'Building Windows binaries' in this file) or use the pre-built
binaries (Windows binaries are in the directory `windows').

The format of training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...

Each line contains an instance and is ended by a '\n' character. For
classification, <label> is an integer indicating the class label
(multi-class is supported). For regression, <label> is the target
value, which can be any real number. For one-class SVM, it is not
used, so it can be any number.
Except when using precomputed kernels (explained in another section),
<index>:<value> gives a feature (attribute) value: <index> is an
integer starting from 1 and <value> is a real number. Indices must be
in ASCENDING order. Labels in the testing file are only used to
calculate accuracy or errors. If they are unknown, just fill the
first column with any numbers.

A sample classification data set included in this package is
`heart_scale'. To check if your data is in a correct form, use
`tools/checkdata.py' (details in `tools/README').

Type `svm-train heart_scale', and the program will read the training
data and output the model file `heart_scale.model'. If you have a test
set called heart_scale.t, then type `svm-predict heart_scale.t
heart_scale.model output' to see the prediction accuracy. The `output'
file contains the predicted class labels.

There are some other useful programs in this package.

svm-scale:

	This is a tool for scaling input data files.

svm-toy:

	This is a simple graphical interface which shows how SVM
	separates data in a plane. You can click in the window to draw
	data points. Use the "change" button to choose class 1, 2, or 3
	(i.e., up to three classes are supported), the "load" button to
	load data from a file, the "save" button to save data to a
	file, the "run" button to obtain an SVM model, and the "clear"
	button to clear the window. You can enter options at the bottom
	of the window; the syntax of options is the same as for
	`svm-train'.

	Note that "load" and "save" consider data in the classification
	but not the regression case. Each data point has one label (the
	color), which must be 1, 2, or 3, and two attributes (x-axis
	and y-axis values) in [0,1].

	Type `make' in the respective directories to build them. You
	need the Qt library to build the Qt version (available from
	http://www.trolltech.com). You need the GTK+ library to build
	the GTK version (available from http://www.gtk.org). The
	pre-built Windows binaries are in the `windows' directory.
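For illustration only, the `<label> <index>:<value>` lines described
above can be read with a short Python sketch; `parse_libsvm_line` is a
hypothetical helper, not a tool shipped with this package:

```python
def parse_libsvm_line(line):
    """Parse one '<label> <index>:<value> ...' line into
    (label, {index: value}).  Indices are 1-based and must be
    ascending; absent indices are implicitly zero."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

# A line in heart_scale style: class label 1, features 1 and 3 set.
label, features = parse_libsvm_line("1 1:0.5 3:-1.2")
```

Note that the sparse convention means zero-valued features may simply
be omitted from a line (except in the precomputed-kernel format, where
all kernel values must appear).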
	We use Visual C++ on a 32-bit machine, so the maximal cache
	size is 2GB.

`svm-train' Usage
=================

Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
	0 -- C-SVC
	1 -- nu-SVC
	2 -- one-class SVM
	3 -- epsilon-SVR
	4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
	0 -- linear: u'*v
	1 -- polynomial: (gamma*u'*v + coef0)^degree
	2 -- radial basis function: exp(-gamma*|u-v|^2)
	3 -- sigmoid: tanh(gamma*u'*v + coef0)
	4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/k)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train an SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C in C-SVC (default 1)
-v n : n-fold cross validation mode

The k in the -g option means the number of attributes in the input
data.

Option -v randomly splits the data into n parts and calculates cross
validation accuracy/mean squared error on them.

`svm-predict' Usage
===================

Usage: svm-predict [options] test_file model_file output_file
options:
-b probability_estimates : whether to predict probability estimates, 0 or 1 (default 0); for one-class SVM only 0 is supported

model_file is the model file generated by svm-train.
test_file is the test data you want to predict.
svm-predict will produce output in the output_file.

Tips on Practical Use
=====================

* Scale your data.
For example, scale each attribute to [0,1] or [-1,+1].
* For C-SVC, consider using the model selection tool in the tools
  directory.
* nu in nu-SVC/one-class-SVM/nu-SVR approximates the fraction of
  training errors and support vectors.
* If data for classification are unbalanced (e.g., many positive and
  few negative), try different penalty parameters C by -wi (see
  examples below).
* Specify a larger cache size (i.e., a larger -m) for huge problems.

Examples
========

> svm-scale -l -1 -u 1 -s range train > train.scale
> svm-scale -r range test > test.scale

Scale each feature of the training data to be in [-1,1]. Scaling
factors are stored in the file range and then used for scaling the
test data.

> svm-train -s 0 -c 5 -t 2 -g 0.5 -e 0.1 data_file

Train a classifier with RBF kernel exp(-0.5|u-v|^2), C=5, and
stopping tolerance 0.1.

> svm-train -s 3 -p 0.1 -t 0 data_file

Solve SVM regression with linear kernel u'v and epsilon=0.1
in the loss function.

> svm-train -c 10 -w1 1 -w-1 5 data_file

Train a classifier with penalty 10 for class 1 and penalty 50
for class -1.

> svm-train -s 0 -c 100 -g 0.1 -v 5 data_file

Do five-fold cross validation for the classifier using
the parameters C = 100 and gamma = 0.1.

> svm-train -s 0 -b 1 data_file
> svm-predict -b 1 test_file data_file.model output_file

Obtain a model with probability information and predict test data
with probability estimates.

Precomputed Kernels
===================

Users may precompute kernel values and input them as training and
testing files. Then libsvm does not need the original
training/testing sets.

Assume there are L training instances x1, ..., xL. Let K(x, y) be the
kernel value of two instances x and y. The input formats are:

New training instance for xi:

<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)

New testing instance for any x:

<label> 0:? 1:K(x,x1) ... L:K(x,xL)

That is, in the training file the first column must be the "ID" of
xi. In testing, ? can be any value.

All kernel values including ZEROs must be explicitly provided.
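As an illustration only (this is a sketch, not a tool shipped with
libsvm), training rows in this format can be generated from sparse
vectors stored as {index: value} dictionaries; the linear kernel
K(x, y) = x'y is assumed here:

```python
def linear_kernel(x, y):
    """Linear kernel x'y for sparse vectors stored as {index: value}."""
    return sum(v * y.get(i, 0.0) for i, v in x.items())

def precomputed_rows(labels, xs):
    """Build '<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)' training rows,
    with the mandatory 0:i "ID" column first and every kernel value
    (including zeros) written out explicitly."""
    rows = []
    for i, x in enumerate(xs, start=1):
        values = ["0:%d" % i]
        values += ["%d:%g" % (j, linear_kernel(x, xj))
                   for j, xj in enumerate(xs, start=1)]
        rows.append("%s %s" % (labels[i - 1], " ".join(values)))
    return rows
```

Applied to the three training instances in the examples below, the
first row this produces is `15 0:1 1:4 2:6 3:1`.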
Any permutation or random subsets of the training/testing files are
also valid (see examples below).

Note: the format is slightly different from the precomputed kernel
package released in libsvmtools earlier.

Examples:

	Assume the original training data has three four-feature
	instances and the testing data has one instance:

	15  1:1 2:1 3:1 4:1
	45      2:3     4:3
	25          3:1
	15  1:1     3:1

	If the linear kernel is used, we have the following new
	training/testing sets:

	15  0:1 1:4 2:6 3:1
	45  0:2 1:6 2:18 3:0
	25  0:3 1:1 2:0 3:1
	15  0:? 1:2 2:0 3:1

	? can be any value.

	Any subset of the above training file is also valid. For
	example,

	25  0:3 1:1 2:0 3:1
	45  0:2 1:6 2:18 3:0

	implies that the kernel matrix is

		[K(2,2) K(2,3)] = [18 0]
		[K(3,2) K(3,3)] = [0  1]

Library Usage
=============

These functions and structures are declared in the header file
`svm.h'. You need to #include "svm.h" in your C/C++ source files and
link your program with `svm.cpp'. You can see `svm-train.c' and
`svm-predict.c' for examples showing how to use them.

Before you classify test data, you need to construct an SVM model
(`svm_model') using training data. A model can also be saved in
a file for later use. Once an SVM model is available, you can use it
to classify new data.

- Function: struct svm_model *svm_train(const struct svm_problem *prob,
					const struct svm_parameter *param);

    This function constructs and returns an SVM model according to
    the given training data and parameters.

    struct svm_problem describes the problem:

	struct svm_problem
	{
		int l;
		double *y;
		struct svm_node **x;
	};

    where `l' is the number of training data, and `y' is an array
    containing their target values (integers in classification, real
    numbers in regression). `x' is an array of pointers, each of
    which points to a sparse representation (array of svm_node) of
    one training vector.

    For example, if we have the following training data:

	LABEL	ATTR1	ATTR2	ATTR3	ATTR4	ATTR5
	-----	-----	-----	-----	-----	-----
	1	0	0.1	0.2	0	0
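The sparse representation that `x' points to can be illustrated with a
short Python sketch (the convention only; the C API itself uses arrays
of struct svm_node, which libsvm terminates with a node whose index is
-1):

```python
def dense_to_sparse(dense):
    """Convert a dense attribute vector to the 1-based (index, value)
    pairs of libsvm's sparse representation.  Zero entries are
    omitted, and a terminating (-1, 0.0) pair marks the end of the
    vector, mirroring the index = -1 sentinel node used in svm.h."""
    pairs = [(i, v) for i, v in enumerate(dense, start=1) if v != 0]
    return pairs + [(-1, 0.0)]
```

For instance, the first training row in the table above, with
attributes (0, 0.1, 0.2, 0, 0), keeps only attributes 2 and 3.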