<A name=.E5.B8.B8.E7.94.A8libSVM.E8.B5.84.E6.96.99.E9.93.BE.E6.8E.A5></A>
<H2>Useful libSVM Resource Links</H2>
<P><A class="external text" title=http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 
href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/" 
rel=nofollow>Official site, with some tutorials and test data</A> </P>
<P><A class="external text" title=http://bbs.ir-lab.org/cgi-bin/leoboard.cgi 
href="http://bbs.ir-lab.org/cgi-bin/leoboard.cgi" 
rel=nofollow>The machine learning forum at Harbin Institute of Technology; very good</A> </P>
<P>A graduate student at Shanghai Jiao Tong University (Pattern Analysis and Machine Intelligence Laboratory) also wrote Chinese annotations for the libsvm 2.6 source code. The original link can no longer be found, so please search for it yourself; it is very well written. </P>
<A name=Decision_Trees></A>
<H1>Decision Trees</H1>
<P>The ML classes discussed in this section implement the Classification And 
Regression Trees (CART) algorithm described in [Breiman84]. </P>
<P>The class CvDTree represents a single decision tree that may be used alone, 
or as a base class in tree ensembles (see Boosting and Random Trees). </P>
<P>A decision tree is a binary tree (i.e. a tree in which each non-leaf node has 
exactly two child nodes). It can be used either for classification, where each tree 
leaf is marked with some class label (multiple leaves may have the same label), 
or for regression, where each tree leaf is assigned a constant (so the 
approximation function is piecewise constant). </P>
<P><B>Predicting with Decision Trees</B> </P>
<P>To reach a leaf node, and thus to obtain a response for the input feature 
vector, the prediction procedure starts at the root node. From each non-leaf 
node the procedure goes either to the left (i.e. selects the left child node as the 
next observed node) or to the right, based on the value of a certain variable 
whose index is stored in the observed node. The variable can be either ordered 
or categorical. In the first case, the variable value is compared with a 
threshold (also stored in the node); if the value is less than the threshold, 
the procedure goes to the left, otherwise to the right (for example, if the 
weight is less than 1 kilogram, go to the left, else to the right). In the second 
case, the discrete variable value is tested for membership in a certain subset of 
values (also stored in the node) out of the limited set of values the variable can 
take; if it belongs to the subset, the procedure goes to the left, else to the 
right (for example, if the color is green or red, go to the left, else to the 
right). That is, in each node a pair of entities (&lt;variable_index&gt;, 
&lt;decision_rule (threshold/subset)&gt;) is used. This pair is called a split (a 
split on the variable #&lt;variable_index&gt;). Once a leaf node is reached, the 
value assigned to this node is used as the output of the prediction procedure. </P>
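<P>The following minimal sketch illustrates the traversal just described. The 
ToyNode structure and the predict function are hypothetical simplifications written 
for this explanation only; the actual OpenCV structures are CvDTreeSplit and 
CvDTreeNode, documented below. </P><PRE>// Hypothetical, simplified node type used only to illustrate the traversal;
// it is not part of the OpenCV API.
struct ToyNode
{
    bool     is_leaf;
    double   value;        // response stored in a leaf node
    int      var_idx;      // index of the split variable (non-leaf nodes only)
    bool     is_ordered;   // ordered or categorical split variable
    float    threshold;    // threshold, used when the variable is ordered
    unsigned subset[2];    // 64-bit mask of categories sent to the left child
    ToyNode* left;
    ToyNode* right;
};

// Walk from the root to a leaf and return the value assigned to that leaf.
double predict(const ToyNode* node, const float* ordered_vals, const int* cat_vals)
{
    while (!node->is_leaf)
    {
        bool go_left;
        if (node->is_ordered)
            // ordered variable: go left when the value is below the threshold
            go_left = node->threshold > ordered_vals[node->var_idx];
        else
        {
            // categorical variable: go left when the value belongs to the subset
            int c = cat_vals[node->var_idx];
            go_left = ((node->subset[c / 32] >> (c % 32)) & 1) != 0;
        }
        node = go_left ? node->left : node->right;
    }
    return node->value;
}
</PRE>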
<P>Sometimes certain features of the input vector are missing (for example, in 
the dark it is difficult to determine the object's color), and the prediction 
procedure could get stuck at a certain node (in the mentioned example, if the 
node is split on color). To avoid such situations, decision trees use so-called 
surrogate splits. That is, in addition to the best "primary" split, every tree 
node may also be split on one or more other variables with nearly the same 
results. </P>
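<P>A minimal sketch of this fallback idea, assuming each node keeps its rules in an 
array ordered by quality (the primary split first, then the surrogates). The 
ToySplit structure and the helper below are hypothetical illustrations, not part of 
the OpenCV API. </P><PRE>// Hypothetical illustration of surrogate fallback at a single node.
// splits[0] is the primary split; splits[1..n_splits-1] are the surrogates.
// missing[k] tells whether feature k of the current sample is unknown.
struct ToySplit { int var_idx; float threshold; };

int choose_direction(const ToySplit* splits, int n_splits,
                     const float* features, const bool* missing)
{
    for (int i = 0; i != n_splits; ++i)
    {
        int k = splits[i].var_idx;
        if (!missing[k])   // use the first rule whose variable is actually known
            return features[k] >= splits[i].threshold ? +1 : -1;  // +1 right, -1 left
    }
    // Every tested variable is missing; the real implementation then falls back
    // to the child sample counts (see CvDTreeNode::sample_count below).
    return 0;
}
</PRE>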
<P><B>Training Decision Trees</B> </P>
<P>The tree is built recursively, starting from the root node. All of the 
training data (feature vectors and responses) is used to split the root 
node. In each node the optimum decision rule (i.e. the best "primary" split) is 
found based on some criterion (in ML the Gini "purity" criterion is used for 
classification, and the sum of squared errors is used for regression). Then, if 
necessary, the surrogate splits are found that best resemble the results of the 
primary split on the training data; all the data is divided between the left and 
the right child node using the primary and the surrogate splits (just as is done 
in the prediction procedure). Then the procedure recursively splits both the left 
and the right nodes, and so on. At each node the recursive procedure may stop 
(i.e. stop splitting the node further) in one of the following cases (a short 
parameter-setting sketch is given after the list): </P>
<UL>
  <LI>the depth of the tree branch being constructed has reached the specified 
  maximum value. 
  <LI>the number of training samples in the node is less than the specified 
  threshold, i.e. the set is not statistically representative enough to split the 
  node further. 
  <LI>all the samples in the node belong to the same class (or, in the case of 
  regression, the variation is too small). 
  <LI>the best split found does not give any noticeable improvement compared to 
  just a random choice. </LI></UL>
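<P>The stopping criteria above correspond to fields of the CvDTreeParams structure 
documented further below. The following usage sketch shows one way the training 
call typically looks with the OpenCV 1.x interface; the numeric values are 
arbitrary examples, and the exact parameter set should be checked against the 
headers of the version in use. </P><PRE>#include "ml.h"   // OpenCV machine learning module

// Hedged usage sketch: set the stopping criteria and train a single tree.
void train_tree_example(const CvMat* train_data, const CvMat* responses,
                        const CvMat* var_type, const CvMat* missing_mask)
{
    CvDTreeParams params;
    params.max_depth            = 8;      // stop when a branch reaches depth 8
    params.min_sample_count     = 10;     // stop when a node has fewer than 10 samples
    params.regression_accuracy  = 0.01f;  // "variation is too small" threshold (regression)
    params.use_surrogates       = true;   // needed for missing data and variable importance
    params.cv_folds             = 10;     // prune with 10-fold cross-validation
    params.truncate_pruned_tree = true;   // physically remove the pruned branches

    CvDTree tree;
    tree.train(train_data, CV_ROW_SAMPLE, responses,
               0 /* var_idx */, 0 /* sample_idx */,
               var_type, missing_mask, params);
}
</PRE>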
<P>When the tree is built, it may be pruned, if needed, using a cross-validation 
procedure. That is, some branches of the tree that may lead to model overfitting 
are cut off. Normally this procedure is applied only to standalone decision 
trees, while tree ensembles usually build small enough trees and use their own 
protection schemes against overfitting. </P>
<P><B>Variable importance</B> </P>
<P>Besides the obvious use of decision trees - prediction - a tree can also be 
used for various kinds of data analysis. One of the key properties of the 
constructed decision tree algorithms is that it is possible to compute the 
importance (relative decisive power) of each variable. For example, in a spam 
filter that uses the set of words occurring in a message as a feature vector, the 
variable importance rating can be used to determine the most "spam-indicating" 
words and thus help keep the dictionary size reasonable. </P>
<P>The importance of each variable is computed over all the splits on this 
variable in the tree, both primary and surrogate ones. Thus, to compute variable 
importance correctly, surrogate splits must be enabled in the training 
parameters, even if there is no missing data. </P>
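<P>A short hedged sketch of reading the computed importance values from a trained 
tree; it assumes the tree was trained with surrogate splits enabled, as noted 
above. </P><PRE>#include "ml.h"   // OpenCV machine learning module

// Hedged sketch: find the single most "decisive" variable of a trained tree.
int most_important_variable(CvDTree& tree)
{
    const CvMat* imp = tree.get_var_importance();  // 1 x var_count matrix
    if (!imp)
        return -1;   // importance not available
    int best = 0;
    for (int i = 1; i != imp->cols; ++i)
        if (cvGetReal1D(imp, i) > cvGetReal1D(imp, best))
            best = i;
    return best;     // index of the variable with the largest importance value
}
</PRE>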
<P>[Breiman84] Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984), 
"Classification and Regression Trees", Wadsworth. </P>
<A name=CvDTreeSplit></A>
<H2>CvDTreeSplit</H2>
<P>Decision tree node split </P><PRE>struct CvDTreeSplit
{
    int var_idx;
    int inversed;
    float quality;
    CvDTreeSplit* next;
    union
    {
        int subset[2];
        struct
        {
            float c;
            int split_point;
        }
        ord;
    };
};
</PRE>
<DL>
  <DT>var_idx
  <DD>Index of the variable used in the split 
  <DT>inversed
  <DD>When it equals 1, the inverse split rule is used (i.e. the left and right 
  branches are exchanged in the rule expressions given below) 
  <DT>quality
  <DD>The split quality, a positive number. It is used to choose the best 
  primary split, then to choose and sort the surrogate splits. After the tree is 
  constructed, it is also used to compute variable importance. 
  <DT>next
  <DD>Pointer to the next split in the node split list. 
  <DT>subset
  <DD>Bit array indicating the value subset in case of split on a categorical 
  variable. The rule is: if var_value in subset then next_node&lt;-left else 
  next_node&lt;-right 
  <DT>c
  <DD>The threshold value in case of split on an ordered variable. The rule is: 
  if var_value &lt; c then next_node&lt;-left else next_node&lt;-right 
  <DT>split_point
  <DD>Used internally by the training algorithm. </DD></DL>
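<P>A hedged illustration of how the two decision rules above can be evaluated for a 
CvDTreeSplit record; the helper function itself is hypothetical and not part of the 
OpenCV API. </P><PRE>// Hypothetical helper: decides the branch for one sample according to the
// rules described above; returns true when the sample goes to the left branch.
bool split_sends_left(const CvDTreeSplit* split, bool var_is_categorical,
                      float ord_value, int cat_value)
{
    bool left;
    if (var_is_categorical)
        // subset rule: if var_value in subset then next_node goes left
        left = (((unsigned)split->subset[cat_value / 32] >> (cat_value % 32)) & 1) != 0;
    else
        // threshold rule: if var_value is below c then next_node goes left
        left = !(ord_value >= split->ord.c);
    return split->inversed ? !left : left;   // inversed == 1 swaps the branches
}
</PRE>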
<A name=CvDTreeNode></A>
<H2>CvDTreeNode</H2>
<P>Decision tree node </P><PRE>struct CvDTreeNode
{
    int class_idx;
    int Tn;
    double value;

    CvDTreeNode* parent;
    CvDTreeNode* left;
    CvDTreeNode* right;

    CvDTreeSplit* split;

    int sample_count;
    int depth;
    ...
};
</PRE>
<DL>
  <DT>value
  <DD>The value assigned to the tree node. It is either a class label, or the 
  estimated function value. 
  <DT>class_idx
  <DD>The class index assigned to the node, normalized to the 0..class_count-1 
  range; it is used internally in classification trees and tree ensembles. 
  <DT>Tn
  <DD>The tree index in an ordered sequence of trees. The indices are used during 
  and after the pruning procedure. The root node has the maximum value Tn of the 
  whole tree, child nodes have Tn less than or equal to the parent's Tn, and the 
  nodes with Tn≤CvDTree::pruned_tree_idx are not taken into consideration at the 
  prediction stage (the corresponding branches are considered as cut-off), even 
  if they have not been physically deleted from the tree at the pruning stage. 
  <DT>parent, left, right
  <DD>Pointers to the parent node, left and right child nodes. 
  <DT>split
  <DD>Pointer to the first (primary) split. 
  <DT>sample_count
  <DD>The number of samples that fall into the node at the training stage. It is 
  used to resolve difficult cases: when the variable for the primary split is 
  missing and all the variables for the surrogate splits are missing too, the 
  sample is directed to the left if 
  left-&gt;sample_count&gt;right-&gt;sample_count and to the right otherwise. 
  <DT>depth
  <DD>The node depth; the root node's depth is 0, and a child node's depth is its 
  parent's depth + 1. </DD></DL>
<P>The numerous other fields of CvDTreeNode are used internally at the training 
stage. </P>
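<P>As a hedged illustration of the Tn semantics described above: at prediction time 
a node effectively behaves as a leaf either when it has no children or when its Tn 
indicates that the subtree below it has been cut off by pruning. </P><PRE>// Hypothetical illustration, not library code: whether descent stops at this
// node, given the current CvDTree::pruned_tree_idx value.
bool behaves_as_leaf(const CvDTreeNode* node, int pruned_tree_idx)
{
    // A node with Tn at or below pruned_tree_idx lies in a cut-off branch,
    // so its value is used directly and its children are never visited.
    return !node->left || node->Tn <= pruned_tree_idx;
}
</PRE>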
<A name=CvDTreeParams></A>
<H2>CvDTreeParams</H2>
<P>Decision tree training parameters </P><PRE>struct CvDTreeParams
{
    int max_categories;
    int max_depth;
    int min_sample_count;
    int cv_folds;
    ...
};
</PRE>
