In general this is good enough, but for a few difficult
cases (e.g. a very large C) where the solutions are huge
numbers, the numerical precision of float alone may not
be enough.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f404"><b>Q: How do I choose the kernel ?</b></a>
<br/>                                                                                

<p>
In general we suggest that you try the RBF kernel first.
A recent result by Keerthi and Lin
(<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/limit.ps.gz>
download paper here</a>)
shows that if RBF is used with model selection,
then there is no need to consider the linear kernel.
The kernel matrix using sigmoid may not be positive definite,
and in general its accuracy is not better than RBF
(see the paper by Lin and Lin;
<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf>
download paper here</a>).
Polynomial kernels are OK, but if a high degree is used,
numerical difficulties tend to happen
(consider that the dth power of a value &lt;1 goes to 0,
while that of a value &gt;1 goes to infinity).
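<p>
The remark about high polynomial degrees can be checked with a few lines of Python.
This is only a sketch: the helper name poly_kernel_value is hypothetical, and it uses
a plain power of the inner product, without the gamma and coef0 terms of the polynomial kernel.

```python
def poly_kernel_value(inner, degree):
    """Plain polynomial kernel value inner**degree (no gamma or coef0)."""
    return float(inner) ** degree

# Inner products slightly away from 1 drift quickly toward 0 or toward
# very large values as the degree grows.
for degree in (2, 10, 20):
    print(degree, poly_kernel_value(0.1, degree), poly_kernel_value(10.0, degree))
```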
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f405"><b>Q: Does libsvm have special treatments for linear SVM ?</b></a>
<br/>                                                                                

<p>

No, at this point libsvm solves linear and nonlinear SVMs in the
same way.
Note that there are some possible
tricks to save training/testing time if the
linear kernel is used, and libsvm does not use them.
Hence libsvm is <b>NOT</b> particularly efficient for linear SVMs,
especially for problems where the number of data instances is much larger
than the number of attributes.
If you plan to solve this type of problem, you may want 
to check <a href=http://www.csie.ntu.edu.tw/~cjlin/bsvm>bsvm</a>,
which includes an efficient implementation for
linear SVMs.
More details can be found in the following study:
K.-M. Chung, W.-C. Kao, 
T. Sun, 
and
C.-J. Lin.
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/papers/linear.pdf">
Decomposition Methods for Linear Support Vector Machines
</a>

<p> On the other hand, you do not really need to solve
linear SVMs. See the previous question about choosing
kernels for details.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f406"><b>Q: The number of free support vectors is large. What should I do ?</b></a>
<br/>                                                                                
 <p>
This usually happens when the data are overfitted.
If attributes of your data are in large ranges,
try to scale them. Then the region
of appropriate parameters may be larger.
Note that libsvm includes a scaling
program, svm-scale.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f407"><b>Q: Should I scale training and testing data in a similar way ?</b></a>
<br/>                                                                                
<p>
Yes, you can do the following:
<br> svm-scale -s scaling_parameters train_data > scaled_train_data
<br> svm-scale -r scaling_parameters test_data > scaled_test_data
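<p>
The idea behind the two commands is that the scaling parameters are computed from the
training data only and then reused unchanged on the test data. A minimal Python sketch of
that idea follows. The function names fit_scaling and apply_scaling are hypothetical, the
[-1,1] target range is an assumption, and mapping a constant attribute to 0 is a choice
made for this example, not necessarily what svm-scale does.

```python
def fit_scaling(rows, lower=-1.0, upper=1.0):
    """Record the per-attribute (min, max) of the training rows."""
    columns = list(zip(*rows))
    return {"lower": lower, "upper": upper,
            "ranges": [(min(c), max(c)) for c in columns]}

def apply_scaling(rows, params):
    """Scale rows linearly using previously recorded parameters."""
    lower, upper = params["lower"], params["upper"]
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, params["ranges"]):
            if hi == lo:
                out.append(0.0)  # constant attribute: assumption for this sketch
            else:
                out.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled

train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 40.0]]

params = fit_scaling(train)                 # learned from training data only
scaled_train = apply_scaling(train, params)
scaled_test = apply_scaling(test, params)   # same parameters reused
print(scaled_train, scaled_test)
```

Note that a test value outside the training range (40.0 here) ends up outside [-1,1]; this is expected when the training parameters are reused.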
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f408"><b>Q: Does it make a big difference  if I scale each attribute to [0,1] instead of [-1,1] ?</b></a>
<br/>                                                                                

<p>
For the linear scaling method, if the RBF kernel is
used and parameter selection is conducted, there
is no difference. Assume Mi and mi are 
respectively the maximal and minimal values of the
ith attribute. Scaling to [0,1] means
<pre>
                x'=(x-mi)/(Mi-mi)
</pre>
For [-1,1],
<pre>
                x''=2(x-mi)/(Mi-mi)-1.
</pre>
In the RBF kernel,
<pre>
                x'-y'=(x-y)/(Mi-mi), x''-y''=2(x-y)/(Mi-mi).
</pre>
Since the RBF kernel depends on the squared distance, which is four times
larger after [-1,1] scaling, using (C,g) on the [0,1]-scaled data is the
same as (C,g/4) on the [-1,1]-scaled data.

<p> Though the performance is the same, the computational
time may be different. For data with many zero entries,
[0,1]-scaling keeps the sparsity of the input data and hence
may save time.
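<p>
The relation between the two scalings can be checked numerically. Because
x''-y'' = 2(x'-y'), the squared distance in the RBF kernel exp(-g |x-y|^2) is multiplied by 4,
so the kernel value with g on [0,1]-scaled data should match the value with g/4 on
[-1,1]-scaled data. The sketch below uses made-up attribute ranges and points; the helper
names rbf and scale are hypothetical.

```python
import math

def rbf(x, y, g):
    """RBF kernel exp(-g |x - y|^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))

def scale(v, mins, maxs, lower, upper):
    """Linear per-attribute scaling of v to [lower, upper]."""
    return [lower + (upper - lower) * (a - lo) / (hi - lo)
            for a, lo, hi in zip(v, mins, maxs)]

mins, maxs = [0.0, 10.0], [4.0, 30.0]   # assumed mi and Mi for two attributes
x, y = [1.0, 12.0], [3.0, 25.0]
g = 0.8

k01 = rbf(scale(x, mins, maxs, 0.0, 1.0), scale(y, mins, maxs, 0.0, 1.0), g)
k11 = rbf(scale(x, mins, maxs, -1.0, 1.0), scale(y, mins, maxs, -1.0, 1.0), g / 4)
print(k01, k11)  # the two kernel values agree up to rounding
```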
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f409"><b>Q: The prediction rate is low. How could I improve it ?</b></a>
<br/>                                                                                
<p>
Try the model selection tool grid.py in the python
directory to find
good parameters. To see the importance of model selection,
please 
see my talk:
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/talks/freiburg.pdf">
A practical guide to support vector 
classification 
</A>
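<p>
To give a rough idea of what a grid-based model selection does, here is a minimal Python
sketch. It is not the code of grid.py: the exponent ranges are assumptions chosen for
illustration, and toy_accuracy is a made-up stand-in for a real cross-validation run
with svm-train.

```python
import math

def exponential_grid(begin, end, step):
    """Exponents begin, begin+step, ... up to and including end."""
    values = []
    e = begin
    while (step > 0 and e <= end) or (step < 0 and e >= end):
        values.append(e)
        e += step
    return values

def grid_search(cv_accuracy, log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """Try every (C, g) = (2^ec, 2^eg) pair and keep the best score."""
    best = None
    for ec in exponential_grid(*log2c):
        for eg in exponential_grid(*log2g):
            c, g = 2.0 ** ec, 2.0 ** eg
            acc = cv_accuracy(c, g)
            if best is None or acc > best[0]:
                best = (acc, c, g)
    return best

def toy_accuracy(c, g):
    """Made-up score that peaks at C = 2^3, g = 2^-5 (not a real SVM run)."""
    return 1.0 / (1.0 + (math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2)

best = grid_search(toy_accuracy)
print(best)
```

In practice cv_accuracy would run cross validation on the scaled training data for each pair.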
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f410"><b>Q: My data are unbalanced. Could libsvm handle such problems ?</b></a>
<br/>                                                                                
<p>
Yes, there is a -wi option. For example, if you use
<p>
 svm-train -s 0 -c 10 -w1 1 -w-1 5 data_file
<p>
the penalty for class "-1" is larger.
Note that this -w option is for C-SVC only.
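<p>
As a small worked example of the arithmetic, assuming each -wi weight multiplies the C of
class i, the command above gives the following per-class penalties. The helper name
effective_penalties is hypothetical.

```python
def effective_penalties(c, weights):
    """Per-class penalty, assuming each class weight multiplies C."""
    return {label: c * w for label, w in weights.items()}

# Corresponds to: -c 10 -w1 1 -w-1 5
penalties = effective_penalties(10, {1: 1, -1: 5})
print(penalties)
```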
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f411"><b>Q: What is the difference between nu-SVC and C-SVC ?</b></a>
<br/>                                                                                
<p>
Basically they are the same thing but with different
parameters. The range of C is from zero to infinity,
but nu is always in [0,1]. A nice property
of nu is that it is related to the ratio of 
support vectors and the ratio of training
errors.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f412"><b>Q: The program keeps running without showing any output. What should I do ?</b></a>
<br/>                                                                                
<p>
You may want to check your data. Each training/testing
instance must be on a single line; it cannot be split across lines.
In addition, you have to remove empty lines.
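<p>
A quick check for these two problems can be sketched in Python as follows. The sketch
assumes each line has the form "label index:value index:value ..."; the function name
check_lines and the exact checks are illustrative and not part of libsvm.

```python
import re

# One "index:value" pair, e.g. 3:0.5 or 12:-1e-3
_PAIR = re.compile(r"^\d+:[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")

def check_lines(lines):
    """Return (line_number, problem) for lines that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            problems.append((number, "empty line"))
            continue
        try:
            float(fields[0])
        except ValueError:
            # Typical symptom of an instance split across two lines.
            problems.append((number, "first field is not a numeric label"))
            continue
        bad = [f for f in fields[1:] if not _PAIR.match(f)]
        if bad:
            problems.append((number, "bad index:value pair %r" % bad[0]))
    return problems

sample = ["+1 1:0.5 3:1", "", "-1 2:0.25", "4:0.7 5:1e-3"]
print(check_lines(sample))
```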
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f413"><b>Q: The program keeps running (with output, i.e. many dots). What should I do ?</b></a>
<br/>                                                                                
<p>
In theory libsvm is guaranteed to converge if the kernel
matrix is positive semidefinite. 
After version 2.4 it can also handle non-PSD
kernels such as the sigmoid (tanh).
Therefore, this means you are
handling an ill-conditioned situation
(e.g. too large/small parameters), so numerical
difficulties occur.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f414"><b>Q: The training time is too long. What should I do ?</b></a>
<br/>                                                                                
<p>
This may happen for some difficult cases (e.g. -c is large).
You can try to use a looser stopping tolerance with -e.
If that still doesn't work, you may want to contact us. We can show you some
tricks for improving the training time.
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f415"><b>Q: How do I get the decision value(s) ?</b></a>
<br/>                                                                                
<p>
We print out decision values for regression. For classification,
we solve several binary SVMs for multi-class cases, so 
you can obtain the values by calling the subroutine
svm_predict_values. Their corresponding labels
can be obtained from svm_get_labels. 
Details are in the
README of the libsvm package. 

<p>
We do not recommend the following. But if you would
like to get values for 
TWO-class classification with labels +1 and -1
(note: +1 and -1 but not things like 5 and 10)
in the easiest way, simply add 
<pre>
		printf("%f\n", dec_values[0]*model->label[0]);
</pre>
after the line
<pre>
		svm_predict_values(model, x, dec_values);
</pre>
of the file svm.cpp.
Positive (negative)
decision values correspond to data predicted as +1 (-1).


<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f416"><b>Q: For some problem sets, if I use a large cache (i.e. large -m) on a Linux machine, why do I sometimes get a "segmentation fault" ?</b></a>
<br/>                                                                                
<p>

On 32-bit machines, the maximum addressable
memory is 4GB. The Linux kernel uses a 3:1
split, which means user space is 3G and
kernel space is 1G. Although there is
3G of user space, the maximum dynamically allocated
memory is 2G. So, if you specify -m near 2G,
the memory will be exhausted, and svm-train
will fail when it asks for more memory.
For more details, please read 
<a href=http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3BA164F6.BAFA4FB%40daimi.au.dk>
this article</a>.
<p>
There are two ways to solve this. If your
machine supports Intel's PAE (Physical Address
Extension), you can turn on the option HIGHMEM64G
in the Linux kernel, which uses a 4G:4G split for
kernel and user space. If it does not, you can
try a software `tub' which can eliminate the 2G
boundary for dynamically allocated memory. The `tub'
is available at 
<a href=http://www.bitwagon.com/tub.html>http://www.bitwagon.com/tub.html</a>.


<!--

This may happen only  when the cache is large, but each cached row is
not large enough. <b>Note:</b> This problem is specific to 
gnu C library which is used in linux.
The solution is as follows:

<p>
In our program we have malloc() which uses two methods 
to allocate memory from kernel. One is
sbrk() and another is mmap(). sbrk is faster, but mmap 
has a larger address
space. So malloc uses mmap only if the wanted memory size is larger
than some threshold (default 128k).
In the case where each row is not large enough (#elements < 128k/sizeof(float)) but we need a large cache ,
the address space for sbrk can be exhausted. The solution is to
lower the threshold to force malloc to use mmap
and increase the maximum number of chunks to allocate
with mmap.

<p>
Therefore, in the main program (i.e. svm-train.c) you want
to have
<pre>
      #include &lt;malloc.h&gt;
</pre>
and then in main():
<pre>
      mallopt(M_MMAP_THRESHOLD, 32768);
      mallopt(M_MMAP_MAX,1000000);
</pre>
You can also set the environment variables instead
of writing them in the program:
<pre>
$ M_MMAP_MAX=1000000 M_MMAP_THRESHOLD=32768 ./svm-train .....
</pre>
More information can be found by 
<pre>
$ info libc "Malloc Tunable Parameters"
</pre>
-->
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f417"><b>Q: How do I disable screen output of svm-train and svm-predict ?</b></a>
<br/>                                                                                
<p>
Simply update svm.cpp:
<pre>
#if 1
void info(char *fmt,...)
</pre>
to
<pre>
#if 0
void info(char *fmt,...)
</pre>
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f418"><b>Q: I would like to use my own kernel but find out that there are two subroutines for kernel evaluations: k_function() and kernel_function(). Which one should I modify ?</b></a>
<br/>                                                                                
<p>
The reason why we have two functions is as follows:
For the RBF kernel exp(-g |xi - xj|^2), if we calculate
xi - xj first and then the squared norm, there are 3n operations.
Thus we consider exp(-g (|xi|^2 - 2dot(xi,xj) +|xj|^2))
and by calculating all |xi|^2 in the beginning, 
the number of operations is reduced to 2n.
This is for training. For prediction we cannot
do this, so a regular subroutine using 3n operations is
needed.

<p>
The easiest way to have your own kernel is
to put the same code in these two
subroutines, replacing one of the existing kernels.
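<p>
The two ways of evaluating the RBF kernel described above can be compared with a short
Python sketch. This is not the code in svm.cpp; the function names and the sample data
are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf_direct(xi, xj, g):
    """Form xi - xj first, then take its squared norm."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(xi, xj)))

def rbf_with_cached_norms(xi, xj, sq_i, sq_j, g):
    """Use |xi|^2 - 2 dot(xi,xj) + |xj|^2 with precomputed squared norms."""
    return math.exp(-g * (sq_i - 2.0 * dot(xi, xj) + sq_j))

data = [[1.0, 2.0, 0.5], [0.0, -1.0, 2.0], [3.0, 0.5, 1.0]]
sq_norms = [dot(x, x) for x in data]  # computed once, before training
g = 0.1

# Both formulations should give the same kernel values up to rounding.
max_diff = 0.0
for i, xi in enumerate(data):
    for j, xj in enumerate(data):
        direct = rbf_direct(xi, xj, g)
        cached = rbf_with_cached_norms(xi, xj, sq_norms[i], sq_norms[j], g)
        max_diff = max(max_diff, abs(direct - cached))
print(max_diff)
```

With the squared norms cached, each kernel evaluation needs only one dot product over the attributes.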
<p align="right">
<a href="#_TOP">[Go Top]</a>  
<hr/>
  <a name="/Q4:_Training_and_prediction"></a>
<a name="f419"><b>Q: What method does libsvm use for multi-class SVM ? Why don't you use the "1-against-the rest" method ?</b></a>
<br/>                                                                                
<p>
It is one-against-one. We chose it after doing the following
comparison:
C.-W. Hsu and C.-J. Lin.
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/papers/multisvm.ps.gz">
A comparison of methods 
for multi-class support vector machines
</A>, 
<I>IEEE Transactions on Neural Networks</I>, 13(2002), 415-425.