📄 faq.html
字号:
<p>
obj is the optimal objective value of the dual SVM problem.
rho is the bias term in the decision function
sgn(w^Tx - rho).
nSV and nBSV are number of support vectors and bounded support
vectors (i.e., alpha_i = C). nu-svm is a somewhat equivalent
form of C-SVM where C is replaced by nu. nu simply shows the
corresponding parameter. More details are in
<a href="http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf">
libsvm document</a>.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f402"><b>Q: Can you explain more about the model file?</b></a>
<br/>
<p>
After the parameters, each line represents a support vector.
Support vectors are listed in the order of "labels" listed earlier.
(i.e., those from the first class in the "labels" list are
grouped first, and so on.)
If k is the total number of classes,
in front of each support vector, there are
k-1 coefficients
y*alpha where alpha are dual solution of the
following two class problems:
<br>
1 vs j, 2 vs j, ..., j-1 vs j, j vs j+1, j vs j+2, ..., j vs k
<br>
and y=1 in first j-1 coefficients, y=-1 in the remaining
k-j coefficients.
For example, if there are 4 classes, the file looks like:
<pre>
+-+-+-+--------------------+
|1|1|1| |
|v|v|v| SVs from class 1 |
|2|3|4| |
+-+-+-+--------------------+
|1|2|2| |
|v|v|v| SVs from class 2 |
|2|3|4| |
+-+-+-+--------------------+
|1|2|3| |
|v|v|v| SVs from class 3 |
|3|3|4| |
+-+-+-+--------------------+
|1|2|3| |
|v|v|v| SVs from class 4 |
|4|4|4| |
+-+-+-+--------------------+
</pre>
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f403"><b>Q: Should I use float or double to store numbers in the cache ?</b></a>
<br/>
<p>
We have float as the default as you can store more numbers
in the cache.
In general this is good enough but for few difficult
cases (e.g. C very very large) where solutions are huge
numbers, it might be possible that the numerical precision is not
enough using only float.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f404"><b>Q: How do I choose the kernel?</b></a>
<br/>
<p>
In general we suggest you to try the RBF kernel first.
A recent result by Keerthi and Lin
(<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/limit.ps.gz>
download paper here</a>)
shows that if RBF is used with model selection,
then there is no need to consider the linear kernel.
The kernel matrix using sigmoid may not be positive definite
and in general it's accuracy is not better than RBF.
(see the paper by Lin and Lin
(<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf>
download paper here</a>).
Polynomial kernels are ok but if a high degree is used,
numerical difficulties tend to happen
(thinking about dth power of (<1) goes to 0
and (>1) goes to infinity).
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f405"><b>Q: Does libsvm have special treatments for linear SVM?</b></a>
<br/>
<p>
No, at this point libsvm solves linear/nonlinear SVMs by the
same way.
Note that there are some possible
tricks to save training/testing time if the
linear kernel is used.
Hence libsvm is <b>NOT</b> particularly efficient for linear SVM,
especially for
using large C on
problems whose number of data is much larger
than number of attributes.
You can
<ul>
<li>
Use small C only. We have shown in the following paper
that after C is larger than a certain threshold,
the decision function is the same.
<p>
<a href="http://guppy.mpe.nus.edu.sg/~mpessk/">S. S. Keerthi</a>
and
<B>C.-J. Lin</B>.
<A HREF="papers/limit.ps.gz">
Asymptotic behaviors of support vector machines with
Gaussian kernel
</A>
.
<I><A HREF="http://mitpress.mit.edu/journal-home.tcl?issn=08997667">Neural Computation</A></I>, 15(2003), 1667-1689.
<li>
Check <a href=http://www.csie.ntu.edu.tw/~cjlin/bsvm>bsvm</a>,
which includes an efficient implementation for
linear SVMs.
More details can be found in the following study:
<p>
K.-M. Chung, W.-C. Kao,
T. Sun,
and
C.-J. Lin.
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/papers/linear.pdf">
Decomposition Methods for Linear Support Vector Machines.
</A>
<I><A HREF="http://mitpress.mit.edu/journal-home.tcl?issn=08997667">Neural Computation</A></I>,
16(2004), 1689-1704.
</ul>
<p> On the other hand, you do not really need to solve
linear SVMs. See the previous question about choosing
kernels for details.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f406"><b>Q: The number of free support vectors is large. What should I do?</b></a>
<br/>
<p>
This usually happens when the data are overfitted.
If attributes of your data are in large ranges,
try to scale them. Then the region
of appropriate parameters may be larger.
Note that there is a scale program
in libsvm.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f407"><b>Q: Should I scale training and testing data in a similar way?</b></a>
<br/>
<p>
Yes, you can do the following:
<br> svm-scale -s scaling_parameters train_data > scaled_train_data
<br> svm-scale -r scaling_parameters test_data > scaled_test_data
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f408"><b>Q: Does it make a big difference if I scale each attribute to [0,1] instead of [-1,1]?</b></a>
<br/>
<p>
For the linear scaling method, if the RBF kernel is
used and parameter selection is conducted, there
is no difference. Assume Mi and mi are
respectively the maximal and minimal values of the
ith attribute. Scaling to [0,1] means
<pre>
x'=(x-mi)/(Mi-mi)
</pre>
For [-1,1],
<pre>
x''=2(x-mi)/(Mi-mi)-1.
</pre>
In the RBF kernel,
<pre>
x'-y'=(x-y)/(Mi-mi), x''-y''=2(x-y)/(Mi-mi).
</pre>
Hence, using (C,g) on the [0,1]-scaled data is the
same as (C,g/2) on the [-1,1]-scaled data.
<p> Though the performance is the same, the computational
time may be different. For data with many zero entries,
[0,1]-scaling keeps the sparsity of input data and hence
may save the time.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f409"><b>Q: The prediction rate is low. How could I improve it?</b></a>
<br/>
<p>
Try to use the model selection tool grid.py in the python
directory find
out good parameters. To see the importance of model selection,
please
see my talk:
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/talks/freiburg.pdf">
A practical guide to support vector
classification
</A>
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f410"><b>Q: My data are unbalanced. Could libsvm handle such problems?</b></a>
<br/>
<p>
Yes, there is a -wi options. For example, if you use
<p>
svm-train -s 0 -c 10 -w1 1 -w-1 5 data_file
<p>
the penalty for class "-1" is larger.
Note that this -w option is for C-SVC only.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f411"><b>Q: What is the difference between nu-SVC and C-SVC?</b></a>
<br/>
<p>
Basically they are the same thing but with different
parameters. The range of C is from zero to infinity
but nu is always between [0,1]. A nice property
of nu is that it is related to the ratio of
support vectors and the ratio of the training
error.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f412"><b>Q: The program keeps running without showing any output. What should I do?</b></a>
<br/>
<p>
You may want to check your data. Each training/testing
data must be in one line. It cannot be separated.
In addition, you have to remove empty lines.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f413"><b>Q: The program keeps running (with output, i.e. many dots). What should I do?</b></a>
<br/>
<p>
In theory libsvm guarantees to converge if the kernel
matrix is positive semidefinite.
After version 2.4 it can also handle non-PSD
kernels such as the sigmoid (tanh).
Therefore, this means you are
handling ill-conditioned situations
(e.g. too large/small parameters) so numerical
difficulties occur.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f414"><b>Q: The training time is too long. What should I do?</b></a>
<br/>
<p>
For large problems, please specify enough cache size (i.e.,
-m).
Slow convergence may happen for some difficult cases (e.g. -c is large).
You can try to use a looser stopping tolerance with -e.
If that still doesn't work, you may want to contact us. We can show you some
tricks on improving the training time.
<p>
If you are using polynomial kernels, please check the question on the pow() function.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f415"><b>Q: How do I get the decision value(s)?</b></a>
<br/>
<p>
We print out decision values for regression. For classification,
we solve several binary SVMs for multi-class cases. You
can obtain values by easily calling the subroutine
svm_predict_values. Their corresponding labels
can be obtained from svm_get_labels.
Details are in
README of libsvm package.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -