an optimal linear discriminating function in this space (or an optimal hyperplane that fits
the training data, ...). In the case of SVM the kernel is not defined explicitly. Instead, a distance
between any two points in the hyperspace needs to be defined.</p>
<p>The solution is optimal in the sense that the margin between the separating hyperplane and the
nearest feature vectors from both classes (in the case of a 2-class classifier) is maximal. The
feature vectors closest to the hyperplane are called "support vectors", meaning that
the position of the other vectors does not affect the hyperplane (the decision function).</p>

<!-- TODO: insert formulae -->
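<p>As a placeholder for the missing formulae, the standard soft-margin (C-SVC) primal problem, as it is usually
written in the SVM literature (e.g. in [Burges98]), is:</p>
<pre>
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( w \cdot \Phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1,\dots,N,
</pre>
<p>where <em>&Phi;</em> is the implicit mapping induced by the kernel <em>d(x,y) = &Phi;(x)&bull;&Phi;(y)</em>,
<em>y<sub>i</sub> &isin; {-1,+1}</em> are the class labels, and <em>&xi;<sub>i</sub></em> are slack variables
whose total size is penalized via the constant <em>C</em>.</p>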

<p>There are many good references on SVM; here are a few to start with.</p>
<b>[Burges98] C. Burges. "A tutorial on support vector machines for pattern recognition", Knowledge Discovery and Data
Mining 2(2), 1998.</b><br>
(available online at <a href="http://citeseer.ist.psu.edu/burges98tutorial.html">http://citeseer.ist.psu.edu/burges98tutorial.html</a>).<br>
<b>LIBSVM - A Library for Support Vector Machines. By Chih-Chung Chang and Chih-Jen Lin</b><br>
(<a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/">http://www.csie.ntu.edu.tw/~cjlin/libsvm/</a>)


<hr><h3><a name="decl_CvSVM">CvSVM</a></h3>
<p class="Blurb">Support Vector Machines</p>
<pre>
class CvSVM : public CvStatModel
{
public:
    // SVM type
    enum { C_SVC=100, NU_SVC=101, ONE_CLASS=102, EPS_SVR=103, NU_SVR=104 };

    // SVM kernel type
    enum { LINEAR=0, POLY=1, RBF=2, SIGMOID=3 };

    CvSVM();
    virtual ~CvSVM();

    CvSVM( const CvMat* _train_data, const CvMat* _responses,
           const CvMat* _var_idx=0, const CvMat* _sample_idx=0,
           CvSVMParams _params=CvSVMParams() );

    virtual bool train( const CvMat* _train_data, const CvMat* _responses,
                        const CvMat* _var_idx=0, const CvMat* _sample_idx=0,
                        CvSVMParams _params=CvSVMParams() );

    virtual float predict( const CvMat* _sample ) const;
    virtual int get_support_vector_count() const;
    virtual const float* get_support_vector(int i) const;
    virtual void clear();

    virtual void save( const char* filename, const char* name=0 );
    virtual void load( const char* filename, const char* name=0 );

    virtual void write( CvFileStorage* storage, const char* name );
    virtual void read( CvFileStorage* storage, CvFileNode* node );
    int get_var_count() const { return var_idx ? var_idx->cols : var_all; }

protected:
    ...
};
</pre>

<hr><h3><a name="decl_CvSVMParams">CvSVMParams</a></h3>
<p class="Blurb">SVM training parameters</p>
<pre>
struct CvSVMParams
{
    CvSVMParams();
    CvSVMParams( int _svm_type, int _kernel_type,
                 double _degree, double _gamma, double _coef0,
                 double _C, double _nu, double _p,
                 CvMat* _class_weights, CvTermCriteria _term_crit );

    int         svm_type;
    int         kernel_type;
    double      degree; // for poly
    double      gamma;  // for poly/rbf/sigmoid
    double      coef0;  // for poly/sigmoid

    double      C;  // for CV_SVM_C_SVC, CV_SVM_EPS_SVR and CV_SVM_NU_SVR
    double      nu; // for CV_SVM_NU_SVC, CV_SVM_ONE_CLASS, and CV_SVM_NU_SVR
    double      p; // for CV_SVM_EPS_SVR
    CvMat*      class_weights; // for CV_SVM_C_SVC
    CvTermCriteria term_crit; // termination criteria
};
</pre>
<p><dl>
<dt>svm_type<dd>Type of SVM, one of the following types:<br>
                CvSVM::C_SVC - n-class classification (n>=2), allows imperfect separation of classes with
                               penalty multiplier <code>C</code> for outliers.<br>
                CvSVM::NU_SVC - n-class classification with possible imperfect separation. Parameter <code>nu</code>
                                (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of <code>C</code>.<br>
                CvSVM::ONE_CLASS - one-class SVM. All the training data are from the same class, SVM builds
                                   a boundary that separates the class from the rest of the feature space.<br>
                CvSVM::EPS_SVR - regression. The distance between feature vectors from the training set and
                                 the fitting hyperplane must be less than <code>p</code>. For outliers
                                 the penalty multiplier <code>C</code> is used.<br>
                CvSVM::NU_SVR - regression; <code>nu</code> is used instead of <code>p</code>.
<dt>kernel_type<dd>The kernel type, one of the following types:<br>
                CvSVM::LINEAR - no mapping is done, linear discrimination (or regression) is done in the original feature space.
                                It is the fastest option. <em>d(x,y) = x&bull;y == (x,y)</em><br>
                CvSVM::POLY - polynomial kernel: <em>d(x,y) = (gamma*(x&bull;y)+coef0)<sup>degree</sup></em><br>
                CvSVM::RBF - radial-basis-function kernel; a good choice in most cases:
                             <em>d(x,y) = exp(-gamma*|x-y|<sup>2</sup>)</em><br>
                CvSVM::SIGMOID - sigmoid function is used as a kernel:
                             <em>d(x,y) = tanh(gamma*(x&bull;y)+coef0)</em><br>
<dt>degree, gamma, coef0<dd>Parameters of the kernel, see the formulas above.
<dt>C, nu, p<dd>Parameters in the generalized SVM optimization problem.
<dt>class_weights<dd>Optional weights, assigned to particular classes.
                They are multiplied by <code>C</code> and thus affect the misclassification penalty for the different classes.
                The larger the weight, the larger the penalty for misclassifying data from the corresponding class.
<dt>term_crit<dd>Termination criteria for the iterative SVM training procedure
                 (which solves a partial case of a constrained quadratic optimization problem)
</dl><p>
The structure must be initialized and passed to the training method of <a href="#decl_CvSVM">CvSVM</a>.
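<p>For illustration, a minimal sketch of filling the structure for a 2-class problem with an RBF kernel might
look as follows (the numeric values are arbitrary and serve only as an example; the OpenCV 1.x header
<code>ml.h</code> is assumed):</p>
<pre>
#include "ml.h"

// Example (not prescriptive) parameter setup: C-SVC with an RBF kernel.
CvSVMParams params;
params.svm_type    = CvSVM::C_SVC;   // n-class classification with outlier penalty C
params.kernel_type = CvSVM::RBF;     // d(x,y) = exp(-gamma*|x-y|^2)
params.gamma       = 0.5;            // kernel parameter
params.C           = 10;             // penalty multiplier for outliers
params.term_crit   = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 1e-6 );
</pre>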


<hr><h3><a name="decl_CvSVM_train">CvSVM::train</a></h3>
<p class="Blurb">Trains SVM</p>
<pre>
bool CvSVM::train( const CvMat* _train_data, const CvMat* _responses,
                   const CvMat* _var_idx=0, const CvMat* _sample_idx=0,
                   CvSVMParams _params=CvSVMParams() );
</pre>
<p>The method trains the SVM model. It follows the conventions of the
generic <a href="#decl_CvStatModel_train">train</a> method with the following limitations:
only the CV_ROW_SAMPLE data layout is supported, the input variables are all ordered,
the output variables can be either categorical (<code>_params.svm_type=CvSVM::C_SVC</code> or
<code>_params.svm_type=CvSVM::NU_SVC</code>), ordered
(<code>_params.svm_type=CvSVM::EPS_SVR</code> or
<code>_params.svm_type=CvSVM::NU_SVR</code>), or not required at all
(<code>_params.svm_type=CvSVM::ONE_CLASS</code>),
and missing measurements are not supported.</p>
<p>All the other parameters are gathered in <a href="#decl_CvSVMParams">CvSVMParams</a>
structure.</p>
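<p>A complete minimal usage sketch, assuming the OpenCV 1.x headers <code>cxcore.h</code> and <code>ml.h</code>
(the tiny 2-D data set below is made up purely for illustration):</p>
<pre>
#include "cxcore.h"
#include "ml.h"
#include &lt;stdio.h&gt;

int main()
{
    // Tiny made-up training set: 4 samples (rows), 2 ordered input variables each.
    float train_data[] = { 1.f, 1.f,   1.f, 2.f,   5.f, 5.f,   6.f, 5.f };
    float responses[]  = { 1.f, 1.f, 2.f, 2.f };             // class labels

    CvMat train_mat = cvMat( 4, 2, CV_32FC1, train_data );   // CV_ROW_SAMPLE layout
    CvMat resp_mat  = cvMat( 4, 1, CV_32FC1, responses );

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.gamma = 0.5;
    params.C = 10;

    CvSVM svm;
    svm.train( &amp;train_mat, &amp;resp_mat, 0, 0, params );

    // Classify a new sample.
    float sample_data[] = { 5.5f, 4.5f };
    CvMat sample = cvMat( 1, 2, CV_32FC1, sample_data );
    printf( "predicted class: %.0f\n", svm.predict( &amp;sample ) );
    return 0;
}
</pre>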


<hr><h3><a name="decl_CvSVM_get_support_vector">CvSVM::get_support_vector*</a></h3>
<p class="Blurb">Retrieves the number of support vectors and the particular vector</p>
<pre>
int CvSVM::get_support_vector_count() const;
const float* CvSVM::get_support_vector(int i) const;
</pre>
<p>The methods can be used to retrieve the set of support vectors.</p>
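<p>For example, the support vectors of a trained model could be listed as follows (a sketch;
<code>svm</code> is assumed to be an already trained CvSVM instance):</p>
<pre>
// Print every support vector of a trained model (sketch).
int sv_count  = svm.get_support_vector_count();
int var_count = svm.get_var_count();          // number of components in each vector
for( int i = 0; i &lt; sv_count; i++ )
{
    const float* sv = svm.get_support_vector(i);
    printf( "support vector #%d:", i );
    for( int j = 0; j &lt; var_count; j++ )
        printf( " %g", sv[j] );
    printf( "\n" );
}
</pre>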


<!-- *****************************************************************************************
     *****************************************************************************************
     ***************************************************************************************** -->

<hr><h2><a name="ch_dtree">Decision Trees</a></h2>

<p>The ML classes discussed in this section implement the
Classification And Regression Tree (CART) algorithm, which is described in
<a href="#paper_Brieman84">[Breiman84]</a>.</p>
<p>The class <a href="#decl_CvDTree">CvDTree</a> represents a single decision tree that
may be used alone, or as a base class in tree ensembles
(see <a href=#ch_boosting>Boosting</a> and <a href=#ch_randomforest>Random Trees</a>).</p>
<p>A decision tree is a binary tree (i.e. a tree where each non-leaf node has exactly 2 child nodes).
It can be used either for classification, when
each tree leaf is marked with some class label (multiple leaves may have the same label),
or for regression, when each tree leaf is assigned a constant
(so the approximation function is piecewise constant).</p>
<h3>Predicting with Decision Trees</h3>
<p>To reach a leaf node, and thus
to obtain a response for the input feature vector, the prediction procedure starts
at the root node. From each non-leaf node the procedure goes to the left (i.e. selects the
left child node as the next observed node) or to the right, based on the value of
a certain variable whose index is stored in the observed node. The variable can be either
ordered or categorical. In the first case, the variable value is compared with a certain threshold
(which is also stored in the node); if the value is less than the threshold, the
procedure goes to the left,
otherwise to the right (for example, if the weight is less than 1 kilogram, the
procedure goes to the left,
else to the right). In the second case, the discrete variable value is tested for membership
in a certain subset of values (also stored in the node)
taken from the limited set of values the variable can have;
if it does belong, the procedure goes to the left, else to the right (for example,
if the color is green or red, go to the left, else to the right).
That is, in each node a pair of entities (&lt;variable_index&gt;, &lt;decision_rule (threshold/subset)&gt;)
is used.
This pair is called a split (a split on the variable #&lt;variable_index&gt;).
Once a leaf node is reached, the value assigned to this node is used as the output of the prediction procedure.</p>
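<p>In pseudocode the traversal can be sketched as follows (a simplified illustration with hypothetical
<code>Node</code>/<code>Split</code> types, not the actual CvDTree internals):</p>
<pre>
// Simplified sketch of decision-tree prediction; Node and Split are hypothetical
// types used only for illustration.
const Node* node = root;
while( node->left != 0 )                         // non-leaf nodes have two children
{
    const Split* split = node->split;            // (variable_index, decision rule)
    double x = sample[split->var_idx];           // value of the split variable
    bool go_left = split->is_categorical
        ? value_in_subset( (int)x, split->subset )  // categorical: subset test
        : x &lt; split->threshold;                     // ordered: threshold test
    node = go_left ? node->left : node->right;
}
return node->value;                              // class label or regression constant
</pre>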
<p>Sometimes certain features of the input vector are missing (for example, in the dark
it is difficult to determine the object color), and the prediction procedure
may get stuck at a certain node (in the above example, if the node is split on color).
To avoid such situations, decision trees use so-called
surrogate splits. That is, in addition to the best "primary" split, every tree node may also
be split on one or more other variables with nearly the same results.</p>

<h3>Training Decision Trees</h3>
<p>The tree is built recursively, starting from the root node. The whole training data set (feature
vectors and responses) is used to split the root node. In each node the optimum
decision rule (i.e. the best &quot;primary&quot; split) is found based on some criterion (in ML the <em>Gini</em> &quot;purity&quot; criterion is used
for classification, and the sum of squared errors is used for regression). Then, if necessary,
surrogate
splits are found that most closely resemble the results of the primary split on
the training data; all the data are divided between the left and the right child nodes
using the primary and the surrogate splits
(just as is done in the prediction procedure). Then the procedure recursively splits both the left and the right
nodes, etc. At each node the recursive procedure may stop (i.e. stop splitting the node further)
in one of the following cases:<br>
<ul>
<li>The depth of the tree branch being constructed has reached the specified maximum value.
<li>The number of training samples in the node is less than the specified threshold, i.e.
    the set is not statistically representative enough to split the node further.
<li>All the samples in the node belong to the same class (or, in the case of regression,
    the variation is too small).
<li>The best split found does not give any noticeable improvement compared to a random
    choice.
</ul>
</p>
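<p>In pseudocode, the per-node stopping test might look like this (a simplified sketch; the names
<code>Node</code>, <code>TrainParams</code> and the fields below are hypothetical, not the actual ML implementation):</p>
<pre>
// Sketch of the per-node stopping test; all names are hypothetical.
bool stop_splitting( const Node* node, const TrainParams&amp; p )
{
    if( node->depth >= p.max_depth )                  // the branch is deep enough
        return true;
    if( node->sample_count &lt; p.min_sample_count )     // too few samples in the node
        return true;
    if( node->is_pure() )                             // same class / negligible variation
        return true;
    if( node->best_split_quality &lt;= p.min_quality )   // no better than a random split
        return true;
    return false;
}
</pre>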
<p>Once the tree is built, it may, if needed, be pruned using a cross-validation procedure.
That is, some branches of the tree that may lead to model overfitting are cut off.
Normally this procedure is applied only to standalone decision trees, while tree ensembles
usually build trees that are small enough and use their own protection schemes against overfitting.
</p>

<h3>Variable importance</h3>
<p>
Besides its obvious use for prediction, a decision tree can also be used
for various kinds of data analysis.
One of the key properties of a constructed decision tree is that it is possible
to compute the importance (relative decisive power) of each variable. For example, in a spam
filter that uses the set of words occurring in a message as the feature vector, the variable importance
rating can be used to determine the most "spam-indicating" words and thus help to keep the dictionary
size reasonable.</p>
<p>The importance of each variable is computed over all the splits on this variable in the tree, both primary
and surrogate ones. Thus, to compute variable importance correctly, surrogate splits must be enabled
in the training parameters, even if there is no missing data.</p>
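<p>Conceptually, the computation can be sketched as accumulating split quality per variable (an illustration
only; <code>all_splits</code> is assumed to list every primary and surrogate split in the tree):</p>
<pre>
#include &lt;vector&gt;

// Sketch: variable importance accumulated as the sum of split qualities.
std::vector&lt;double&gt; importance( var_count, 0. );
for( size_t i = 0; i &lt; all_splits.size(); i++ )
    importance[ all_splits[i].var_idx ] += all_splits[i].quality;
// The values may then be normalized so that they sum to 1.
</pre>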

<p><a name="paper_Brieman84"><b>[Breiman84]
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984), "Classification and Regression Trees", Wadsworth.
</b></a></p>


<hr><h3><a name="decl_CvDTreeSplit">CvDTreeSplit</a></h3>
<p class="Blurb">Decision tree node split</p>
<pre>
struct CvDTreeSplit
{
    int var_idx;
    int inversed;
    float quality;
    CvDTreeSplit* next;
    union
    {
        int subset[2];
        struct
        {
            float c;
            int split_point;
        }
        ord;
    };
};
</pre>
<p><dl>
<dt>var_idx<dd>Index of the variable used in the split
<dt>inversed<dd>When it equals 1, the inverse split rule is used
(i.e. left and right branches are exchanged in the expressions below)
<dt>quality<dd>The split quality, a positive number. It is used to choose the 
best primary split, then to choose and sort the surrogate splits.
After the tree is constructed, it is also used to compute variable importance.
<dt>next<dd>Pointer to the next split in the node split list.
<dt>subset<dd>Bit array indicating the value subset in the case of a split on a categorical variable.<br>
              The rule is: <code>if var_value in subset then next_node&lt;-left else next_node&lt;-right</code>
<dt>c<dd>The threshold value in the case of a split on an ordered variable.<br>
         The rule is: <code>if var_value &lt; c then next_node&lt;-left else next_node&lt;-right</code>
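<p>For illustration, the two rules above could be evaluated like this (a sketch; the bit-indexing convention
shown for <code>subset</code> is an assumption, not necessarily the one used internally):</p>
<pre>
// Sketch of applying a CvDTreeSplit to the value of the split variable.
// go_left == true means the left branch is taken (before applying 'inversed').
bool go_left;
if( is_categorical_var )
{
    int v = (int)var_value;                           // category index, 0..63
    // Assumed convention: bit v of the 64-bit subset mask marks membership.
    go_left = ( split->subset[v >> 5] &amp; (1 &lt;&lt; (v &amp; 31)) ) != 0;
}
else
{
    go_left = var_value &lt; split->ord.c;               // ordered: threshold rule
}
if( split->inversed )
    go_left = !go_left;                               // swap left and right branches
</pre>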
