<dt>split_point<dd>Used internally by the training algorithm.
</dl>
<hr><h3><a name="decl_CvDTreeNode">CvDTreeNode</a></h3>
<p class="Blurb">Decision tree node</p>
<pre>
struct CvDTreeNode
{
    int class_idx;
    int Tn;
    double value;

    CvDTreeNode* parent;
    CvDTreeNode* left;
    CvDTreeNode* right;

    CvDTreeSplit* split;

    int sample_count;
    int depth;
    ...
};
</pre>
<p><dl>
<dt>value<dd>The value assigned to the tree node. It is either a class label,
or the estimated function value.
<dt>class_idx<dd>The class index assigned to the node, normalized
to the 0..class_count-1 range. It is used internally in classification trees
and tree ensembles.
<dt>Tn<dd>The tree index in an ordered sequence of trees. The indices are used during and
after the pruning procedure. The root node has the maximum value <code>Tn</code>
of the whole tree, child nodes have <code>Tn</code> less than or equal to
the parent's <code>Tn</code>,
and the nodes with <code>Tn≤<a href="#decl_CvDTree">CvDTree</a>::pruned_tree_idx</code> are not taken
into consideration at the prediction stage (the corresponding branches are
considered cut off), even
if they have not been physically deleted from the tree at the pruning stage.
<dt>parent, left, right<dd>Pointers to the parent node, left and right child nodes.
<dt>split<dd>Pointer to the first (primary) split.
<dt>sample_count<dd>The number of samples that fall into the node at the training stage.
It is used to resolve the difficult case when the variable of the primary split
is missing and the variables of all the surrogate splits are missing too:
the sample is directed to the left if <code>left->sample_count>right->sample_count</code> and
to the right otherwise (as illustrated in the traversal sketch below).
<dt>depth<dd>The node depth. The root node depth is 0, and a child node's depth is its parent's depth plus 1.
</dl>
<p>The numerous other fields of <code>CvDTreeNode</code> are used internally at the training stage.</p>
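<p>
For illustration, below is a minimal sketch of how the prediction stage descends
from the root to a leaf using these fields. It assumes the <code>CvDTreeSplit</code>
layout from the previous subsection (fields <code>var_idx</code>, <code>ord.c</code>,
<code>inversed</code> and <code>next</code>), handles ordered variables only, and takes
<code>pruned_tree_idx</code> and a per-sample missing mask as explicit arguments.
The helper name <code>descend</code> is hypothetical; in practice one simply calls
<code>CvDTree::predict</code>.</p>
<pre>
// Sketch only: walk from the root to a leaf the way prediction does.
// Categorical splits (the subset[] branch of CvDTreeSplit) are omitted.
const CvDTreeNode* descend( const CvDTreeNode* node, const float* sample,
                            const uchar* missing, int pruned_tree_idx )
{
    // nodes with Tn <= pruned_tree_idx are treated as cut off,
    // so such a node acts as a leaf even if it has children
    while( node->left && node->Tn > pruned_tree_idx )
    {
        int dir = 0;
        // try the primary split first, then the surrogate splits
        for( const CvDTreeSplit* split = node->split; split && !dir;
             split = split->next )
        {
            int vi = split->var_idx;
            if( missing && missing[vi] )
                continue; // this variable is missing, try the next surrogate
            dir = sample[vi] <= split->ord.c ? -1 : 1;
            if( split->inversed )
                dir = -dir;
        }
        if( !dir ) // all the split variables are missing: sample_count rule
            dir = node->left->sample_count > node->right->sample_count ? -1 : 1;
        node = dir < 0 ? node->left : node->right;
    }
    return node; // node->value is the class label or the function value
}
</pre>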
<hr><h3><a name="decl_CvDTreeParams">CvDTreeParams</a></h3>
<p class="Blurb">Decision tree training parameters</p>
<pre>
struct CvDTreeParams
{
    int max_categories;
    int max_depth;
    int min_sample_count;
    int cv_folds;
    bool use_surrogates;
    bool use_1se_rule;
    bool truncate_pruned_tree;
    float regression_accuracy;
    const float* priors;

    CvDTreeParams() : max_categories(10), max_depth(INT_MAX), min_sample_count(10),
        cv_folds(10), use_surrogates(true), use_1se_rule(true),
        truncate_pruned_tree(true), regression_accuracy(0.01f), priors(0)
    {}

    CvDTreeParams( int _max_depth, int _min_sample_count,
                   float _regression_accuracy, bool _use_surrogates,
                   int _max_categories, int _cv_folds,
                   bool _use_1se_rule, bool _truncate_pruned_tree,
                   const float* _priors );
};
</pre>
<p><dl>
<dt>max_depth<dd>This parameter specifies the maximum possible depth of the
tree. That is, the training algorithm attempts to split a node while its depth
is less than <code>max_depth</code>. The actual depth may be smaller
if the other termination criteria are met
(see the outline of the training procedure at the beginning of the section),
and/or if the tree is pruned.
<dt>min_sample_count<dd>A node is not split if the number of samples directed to the node
is less than the parameter value.
<dt>regression_accuracy<dd>Another stopping criterion, used only for regression trees. As soon as
the estimated node value differs from the responses of the node's training samples
by less than the parameter value, the node is not split further.
<dt>use_surrogates<dd>If <code>true</code>, surrogate splits are built. Surrogate splits are
needed to handle missing measurements and for variable importance estimation.
<dt>max_categories<dd>If a discrete variable, on which the training procedure tries to make a split,
takes more than <code>max_categories</code> values, finding the precise best subset
may take a very long time (the exact algorithm is exponential in the number of values).
Instead, many decision tree engines (including ML) find a sub-optimal split
in this case by clustering all the samples into <code>max_categories</code> clusters
(i.e. some categories are merged together).<br>
Note that this technique is used only in <code>N(>2)</code>-class classification problems.
In case of regression and 2-class classification the optimal split can be found efficiently
without employing clustering, so the parameter is not used in these cases.
<dt>cv_folds<dd>If this parameter is >1, the tree is pruned using <code>cv_folds</code>-fold
cross validation.
<dt>use_1se_rule<dd>If <code>true</code>, the tree is truncated a bit more by the pruning procedure.
This makes the tree more compact and more resistant to the training data noise,
but a bit less accurate.
<dt>truncate_pruned_tree<dd>If <code>true</code>, the cut-off nodes
(with <code>Tn</code>≤<code>CvDTree::pruned_tree_idx</code>) are physically
removed from the tree. Otherwise they are kept, and by decreasing
<code>CvDTree::pruned_tree_idx</code> (e.g. setting it to -1)
it is still possible to get the results from the original unpruned
(or less aggressively pruned) tree.
<dt>priors<dd>The array of a priori class probabilities, sorted by the class label value.
The parameter can be used to tune the decision tree preferences toward a certain class.
For example, if users want to detect some rare anomaly occurrence, the training
set will likely contain many more normal cases than anomalies, so
a very good classification performance can be achieved just by classifying
every case as normal. To avoid this, the priors can be specified, with
the anomaly probability artificially increased (up to 0.5 or even greater),
so that the weight of a misclassified anomaly becomes much bigger
and the tree is adjusted properly.
<p>A note about memory management: the field <code>priors</code>
is a pointer to an array of floats. The array should be allocated by the user and
can be released right after the <code>CvDTreeParams</code> structure is passed to the
<a href="#decl_CvDTreeTrainData">CvDTreeTrainData</a> or
<a href="#decl_CvDTree">CvDTree</a> constructors/methods (the methods
make a copy of the array).
</dl>
<p>
The structure contains all the decision tree training parameters.
There is a default constructor that initializes all the parameters with the default values
tuned for a standalone classification tree. Any of the parameters can then be overridden,
or the structure may be fully initialized using the advanced variant of the constructor,
as in the sketch below.</p>
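<p>
For example (a sketch for the 2-class anomaly-detection scenario described above;
the variable names are illustrative), the parameters could be prepared as follows.
Note that the <code>priors</code> array may live on the caller's stack because,
as noted above, the constructors/methods copy it:</p>
<pre>
// a priori class probabilities, sorted by the class label value:
// class 0 = "normal", class 1 = "anomaly"; the anomaly prior is
// artificially raised, as described above
float priors[] = { 0.3f, 0.7f };

CvDTreeParams params( 8,      // max_depth
                      20,     // min_sample_count
                      0.01f,  // regression_accuracy (ignored for classification)
                      true,   // use_surrogates (needed for missing data)
                      10,     // max_categories
                      10,     // cv_folds (prune with 10-fold cross-validation)
                      true,   // use_1se_rule
                      false,  // truncate_pruned_tree (keep the cut-off nodes)
                      priors );

// alternatively, start from the defaults and override individual fields
CvDTreeParams params2;
params2.max_depth = 8;
params2.priors = priors;
</pre>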
<hr><h3><a name="decl_CvDTreeTrainData">CvDTreeTrainData</a></h3>
<p class="Blurb">Decision tree training data and shared data for tree ensembles</p>
<pre>
struct CvDTreeTrainData
{
    CvDTreeTrainData();
    CvDTreeTrainData( const CvMat* _train_data, int _tflag,
                      const CvMat* _responses, const CvMat* _var_idx=0,
                      const CvMat* _sample_idx=0, const CvMat* _var_type=0,
                      const CvMat* _missing_mask=0,
                      const CvDTreeParams& _params=CvDTreeParams(),
                      bool _shared=false, bool _add_labels=false );
    virtual ~CvDTreeTrainData();

    virtual void set_data( const CvMat* _train_data, int _tflag,
                           const CvMat* _responses, const CvMat* _var_idx=0,
                           const CvMat* _sample_idx=0, const CvMat* _var_type=0,
                           const CvMat* _missing_mask=0,
                           const CvDTreeParams& _params=CvDTreeParams(),
                           bool _shared=false, bool _add_labels=false,
                           bool _update_data=false );

    virtual void get_vectors( const CvMat* _subsample_idx,
                              float* values, uchar* missing, float* responses,
                              bool get_class_idx=false );

    virtual CvDTreeNode* subsample_data( const CvMat* _subsample_idx );

    virtual void write_params( CvFileStorage* fs );
    virtual void read_params( CvFileStorage* fs, CvFileNode* node );

    // release all the data
    virtual void clear();

    int get_num_classes() const;
    int get_var_type(int vi) const;
    int get_work_var_count() const;

    virtual int* get_class_labels( CvDTreeNode* n );
    virtual float* get_ord_responses( CvDTreeNode* n );
    virtual int* get_labels( CvDTreeNode* n );
    virtual int* get_cat_var_data( CvDTreeNode* n, int vi );
    virtual CvPair32s32f* get_ord_var_data( CvDTreeNode* n, int vi );
    virtual int get_child_buf_idx( CvDTreeNode* n );

    ////////////////////////////////////

    virtual bool set_params( const CvDTreeParams& params );
    virtual CvDTreeNode* new_node( CvDTreeNode* parent, int count,
                                   int storage_idx, int offset );

    virtual CvDTreeSplit* new_split_ord( int vi, float cmp_val,
                                         int split_point, int inversed, float quality );
    virtual CvDTreeSplit* new_split_cat( int vi, float quality );
    virtual void free_node_data( CvDTreeNode* node );
    virtual void free_train_data();
    virtual void free_node( CvDTreeNode* node );

    int sample_count, var_all, var_count, max_c_count;
    int ord_var_count, cat_var_count;
    bool have_labels, have_priors;
    bool is_classifier;

    int buf_count, buf_size;
    bool shared;

    CvMat* cat_count;
    CvMat* cat_ofs;
    CvMat* cat_map;

    CvMat* counts;
    CvMat* buf;
    CvMat* direction;
    CvMat* split_buf;

    CvMat* var_idx;
    CvMat* var_type; // i-th element =
                     //   k<0  - ordered
                     //   k>=0 - categorical, see k-th element of cat_* arrays
    CvMat* priors;

    CvDTreeParams params;

    CvMemStorage* tree_storage;
    CvMemStorage* temp_storage;

    CvDTreeNode* data_root;

    CvSet* node_heap;
    CvSet* split_heap;
    CvSet* cv_heap;
    CvSet* nv_heap;

    CvRNG rng;
};
</pre>
<p>
This structure is mostly used internally for storing both standalone trees and tree ensembles
efficiently. Basically, it contains three types of information:
<ol>
<li>The training parameters, <a href="#decl_CvDTreeParams">CvDTreeParams</a> instance.
<li>The training data, preprocessed in order to find the best splits more efficiently.
For tree ensembles this preprocessed data is reused by all the trees.
Additionally, the training data characteristics that are shared by
all trees in the ensemble are stored here: variable types,
the number of classes, class label compression map etc.
<li>Buffers, memory storages for tree nodes, splits and other elements of the trees constructed.
</ol>
<p>
There are two ways of using this structure.
In simple cases (e.g. a standalone tree,
or a ready-to-use "black box" tree ensemble from ML, like <a href=#ch_randomforest>Random Trees</a>
or <a href=#ch_boosting>Boosting</a>) there is no need to care, or even to know, about the structure -
just construct the needed statistical model, train it and use it. The <code>CvDTreeTrainData</code>
structure is constructed and used internally. However, for custom tree algorithms
or other sophisticated cases, the structure may be constructed and used explicitly.
The scheme is the following (a sketch is given after the list):
<ol>
<li>The structure is initialized using the default constructor, followed by
<code>set_data</code> (or it is built using the full form of the constructor).
The parameter <code>_shared</code> must be set to <code>true</code>.
<li>One or more trees are trained using this data, see the special form of the method
<a href="#decl_CvDTree_train">CvDTree::train</a>.
<li>Finally, the structure can be released only after all the trees that use it are released.
</ol>
</p>
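<p>
Here is a sketch of the explicit scheme; <code>train_data</code>, <code>responses</code>
and <code>subsample_idx</code> are placeholder <code>CvMat</code> pointers assumed to be
prepared by the caller, and error handling is omitted:</p>
<pre>
// 1. build the shared training data (_shared must be true)
CvDTreeTrainData* shared_data = new CvDTreeTrainData(
    train_data, CV_ROW_SAMPLE, responses,
    0 /* var_idx */, 0 /* sample_idx */, 0 /* var_type */, 0 /* missing_mask */,
    CvDTreeParams(), true /* _shared */ );

// 2. train one or more trees on (subsamples of) the same data,
// using the special form of CvDTree::train
CvDTree tree1, tree2;
tree1.train( shared_data, 0 );             // 0: use all the samples
tree2.train( shared_data, subsample_idx ); // use a subsample

// ... use the trees ...

// 3. release the trees first, only then the shared data
tree1.clear();
tree2.clear();
delete shared_data;
</pre>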
<hr><h3><a name="decl_CvDTree">CvDTree</a></h3>