return a single node with that value;
If R is empty, then return a single node whose value is
the most frequent of the values of the categorical attribute
found in records of S; [note that there will then be
errors, that is, records that will be improperly
classified];
Let D be the attribute with largest Gain(D,S)
among attributes in R;
Let {dj| j=1,2, .., m} be the values of attribute D;
Let {Sj| j=1,2, .., m} be the subsets of S consisting
respectively of records with value dj for attribute D;
Return a tree with root labeled D and arcs labeled
d1, d2, .., dm going respectively to the trees
ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
end ID3;
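For concreteness, here is a minimal Python sketch of this recursion
(discrete attribute values only; continuous ranges are discussed in the
C4.5 section below). The helper names entropy and gain, the record
format (a list of dictionaries keyed by attribute name), and the tree
representation (a class label for a leaf, or a pair
(attribute, {value: subtree}) for an interior node) are choices of this
sketch, not part of ID3 itself.

   import math
   from collections import Counter

   def entropy(records, target):
       # Info(T): entropy of the class distribution in `records`.
       counts = Counter(r[target] for r in records)
       total = len(records)
       return -sum((c / total) * math.log2(c / total) for c in counts.values())

   def gain(attr, records, target):
       # Gain(D,S): Info(S) minus the weighted entropy after splitting on `attr`.
       total = len(records)
       remainder = 0.0
       for value in set(r[attr] for r in records):
           subset = [r for r in records if r[attr] == value]
           remainder += len(subset) / total * entropy(subset, target)
       return entropy(records, target) - remainder

   def id3(attrs, target, records):
       # Returns a class label (leaf) or a pair (attribute, {value: subtree}).
       if not records:
           return "Failure"
       classes = [r[target] for r in records]
       if len(set(classes)) == 1:          # all records share one class value
           return classes[0]
       if not attrs:                       # no attributes left: majority class
           return Counter(classes).most_common(1)[0][0]
       best = max(attrs, key=lambda a: gain(a, records, target))
       branches = {}
       for value in set(r[best] for r in records):
           subset = [r for r in records if r[best] == value]
           branches[value] = id3([a for a in attrs if a != best], target, subset)
       return (best, branches)

On the golfing table (with Humidity already discretized into <=75 and
>75) a call like id3(['Outlook', 'Temperature', 'Humidity', 'Windy'],
'Play', records) should produce a tree of the kind shown below.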
In the Golfing example we obtain the following decision tree:
                         Outlook
                        /   |   \
                       /    |    \
             overcast /     |sunny \ rain
                     /      |       \
                  Play   Humidity   Windy
                          /   |      |   \
                         /    |      |    \
                  <=75  /  >75|  true|     \ false
                       /      |      |      \
                    Play  Don'tPlay Don'tPlay  Play
In the stock market case the decision tree is:
                   Age
                  / | \
                 /  |  \
             new/   |mid\ old
               /    |    \
             Up Competition Down
                    /   \
                  /      \
               no/        \yes
                /          \
              Up            Down
Here is the decision tree, just as produced by C4.5, for the voting
example introduced earlier.
Using Gain Ratios
The notion of Gain introduced earlier tends to favor attributes that
have a large number of values. For example, if we have an attribute D
that has a distinct value for each record, then Info(D,T) is 0, thus
Gain(D,T) is maximal. To compensate for this Quinlan suggests using
the following ratio instead of Gain:
                       Gain(D,T)
   GainRatio(D,T) = ---------------
                     SplitInfo(D,T)
where SplitInfo(D,T) is the information due to the split of T on the
basis of the values of attribute D. Thus SplitInfo(D,T) is
   I(|T1|/|T|, |T2|/|T|, .., |Tm|/|T|)
where {T1, T2, .., Tm} is the partition of T induced by the values of D.
In the case of our golfing example SplitInfo(Outlook,T) is
   -5/14*log(5/14) - 4/14*log(4/14) - 5/14*log(5/14) = 1.577
thus the GainRatio of Outlook is 0.246/1.577 = 0.156. And
SplitInfo(Windy,T) is
   -6/14*log(6/14) - 8/14*log(8/14) = 6/14*1.222 + 8/14*0.807 = 0.985
thus the GainRatio of Windy is 0.048/0.985 = 0.049.
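This arithmetic can be checked with a few lines of Python. The subset
sizes 5/4/5 for Outlook and 6/8 for Windy, and the gains 0.246 and
0.048, are the figures quoted above; the function split_info is this
sketch's own.

   import math

   def split_info(sizes):
       # SplitInfo(D,T) = I(|T1|/|T|, .., |Tm|/|T|) for a split into
       # subsets of the given sizes.
       total = sum(sizes)
       return -sum(s / total * math.log2(s / total) for s in sizes)

   # Outlook splits the 14 records into subsets of size 5 (sunny),
   # 4 (overcast) and 5 (rain).
   print(split_info([5, 4, 5]))           # ~1.577
   print(0.246 / split_info([5, 4, 5]))   # GainRatio(Outlook,T) ~0.156

   # Windy splits the 14 records into subsets of size 6 (true) and 8 (false).
   print(split_info([6, 8]))              # ~0.985
   print(0.048 / split_info([6, 8]))      # GainRatio(Windy,T)   ~0.049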
You can run PAIL to see how ID3 generates the decision tree.
C4.5 Extensions
C4.5 introduces a number of extensions of the original ID3 algorithm.
In building a decision tree we can deal with training sets that have
records with unknown attribute values by evaluating the gain, or the
gain ratio, for an attribute by considering only the records where
that attribute is defined.
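As a sketch of this idea (assuming unknown values are stored as None in
the record dictionaries, and reusing a gain function like the one in
the ID3 sketch above):

   def gain_on_defined(attr, records, target):
       # Evaluate the gain for `attr` using only the records where it is
       # defined; records with an unknown (None) value are left out of
       # this computation, not removed from the training set.
       defined = [r for r in records if r.get(attr) is not None]
       if not defined:
           return 0.0
       return gain(attr, defined, target)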
In using a decision tree, we can classify records that have unknown
attribute values by estimating the probability of the various possible
results. In our golfing example, if we are given a new record for
which the outlook is sunny and the humidity is unknown, we proceed as
follows:
We move from the Outlook root node to the Humidity node following
the arc labeled 'sunny'. Since we do not know the value of Humidity,
we look at the training records that reach this node: there are two
records with humidity at most 75 where one plays, and three records
with humidity over 75 where one does not play. Thus we can give as the
answer for this record the probabilities (0.4, 0.6) of playing or not
playing.
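Here is a sketch of this style of classification, using the
(attribute, {value: subtree}) tree representation assumed in the
earlier ID3 sketch; the fallback on the class frequencies of the
training records reaching the current node is the idea just described.

   from collections import Counter

   def classify(tree, record, records, target):
       # Returns a {class: probability} distribution for `record`.
       if not isinstance(tree, tuple):            # leaf: a certain answer
           return {tree: 1.0}
       attr, branches = tree
       value = record.get(attr)
       if value is None or value not in branches:
           # The value is unknown (or unseen): answer with the class
           # frequencies of the training records that reached this node,
           # e.g. sunny + unknown humidity gives 2 Play and 3 Don't Play,
           # i.e. (0.4, 0.6).
           counts = Counter(r[target] for r in records)
           total = sum(counts.values())
           return {c: n / total for c, n in counts.items()}
       subset = [r for r in records if r[attr] == value]
       return classify(branches[value], record, subset, target)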
We can deal with the case of attributes with continuous ranges as
follows. Say that attribute Ci has a continuous range. We examine the
values for this attribute in the training set. Say they are, in
increasing order, A1, A2, .., Am. Then for each value Aj, j=1,2,..m,
we partition the records into those that have Ci values up to and
including Aj, and those that have values greater than Aj. For each of
these partitions we compute the gain, or gain ratio, and choose the
partition that maximizes the gain.
In our Golfing example, for humidity, if T is the training set, we
determine the information for each partition and find the best partition
at 75. Then the range for this attribute becomes {<=75, >75}. Notice
that this method involves a substantial number of computations.
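A sketch of that threshold search, reusing the entropy helper from the
ID3 sketch (the record format and function name are again assumptions
of the sketch):

   def best_threshold(attr, records, target):
       # Try each observed value Aj of the continuous attribute as a cut
       # point (<= Aj versus > Aj) and return the cut with the largest gain.
       base = entropy(records, target)
       values = sorted(set(r[attr] for r in records))
       best_cut, best_gain = None, -1.0
       for cut in values[:-1]:     # cutting at the largest value leaves one side empty
           low = [r for r in records if r[attr] <= cut]
           high = [r for r in records if r[attr] > cut]
           g = (base
                - (len(low) / len(records)) * entropy(low, target)
                - (len(high) / len(records)) * entropy(high, target))
           if g > best_gain:
               best_cut, best_gain = cut, g
       return best_cut, best_gain

As noted above, on the golfing humidity values this search settles on
75, giving the {<=75, >75} range used in the tree.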
Pruning Decision Trees and Deriving Rule Sets
The decision tree built using the training set, because of the way it
was built, deals correctly with most of the records in the training set.
In fact, in order to do so, it may become quite complex, with long
and very uneven paths.
Pruning of the decision tree is done by replacing a whole subtree by a
leaf node. The replacement takes place if a decision rule establishes
that the expected error rate in the subtree is greater than in the
single leaf. For example, if the simple decision tree
              Color
             /     \
         red/       \blue
           /         \
       Success     Failure
is obtained with one training red Success record and two training blue
Failures, and then in the test set we find three red Failures and one
blue Success, we might consider replacing this subtree by a single
Failure node. After replacement we will have only two errors instead
of four.
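The error counts in this example can be checked directly; the sketch
below pools the training and test records that reach the Color subtree.

   # Records reaching the Color subtree, as (color, outcome) pairs.
   records = ([("red", "Success")] * 1 + [("blue", "Failure")] * 2 +   # training
              [("red", "Failure")] * 3 + [("blue", "Success")] * 1)    # test

   # Errors made by the subtree red -> Success, blue -> Failure.
   subtree = {"red": "Success", "blue": "Failure"}
   subtree_errors = sum(1 for color, outcome in records if subtree[color] != outcome)

   # Errors made by a single Failure leaf.
   leaf_errors = sum(1 for _, outcome in records if outcome != "Failure")

   print(subtree_errors, leaf_errors)   # 4 errors with the subtree, 2 with the leaf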
Winston shows how to use Fisher's exact test to determine if the
categorical attribute is truly dependent on a non-categorical attribute.
If it is not, then the non-categorical attribute need not appear in the
current path of the decision tree.
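As an illustration of such a test (not Winston's own code), one can use
scipy's fisher_exact on a 2x2 contingency table; the counts below are
made up for the sake of the example.

   from scipy.stats import fisher_exact

   # Hypothetical contingency table: rows are the two values of a
   # non-categorical attribute, columns are the two classes.
   table = [[6, 1],    # attribute value a: 6 Play, 1 Don't Play
            [2, 5]]    # attribute value b: 2 Play, 5 Don't Play

   oddsratio, p_value = fisher_exact(table)
   if p_value > 0.05:   # the significance threshold is our choice here
       print("no evidence of dependence; drop the attribute from this path")
   else:
       print("the categorical attribute appears to depend on this attribute")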
Quinlan and Breiman suggest more sophisticated pruning heuristics.
It is easy to derive a rule set from a decision tree: write a rule for
each path in the decision tree from the root to a leaf. In that rule the
left-hand side is easily built from the labels of the nodes and the
labels of the arcs, and the right-hand side is the class at the leaf.
The resulting rule set can be simplified:
Let LHS be the left hand side of a rule. Let LHS' be obtained from LHS
by eliminating some of its conditions. We can certainly replace LHS by
LHS' in this rule if the subsets of the training set that satisfy
respectively LHS and LHS' are equal.
A rule may be eliminated by using metaconditions such as "if no other
rule applies".
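A sketch of the path-to-rule translation and of the LHS simplification
test just described, again using the (attribute, {value: subtree}) tree
representation of the earlier sketches:

   def rules(tree, conditions=()):
       # Yield one (conditions, class) pair per root-to-leaf path;
       # `conditions` is a tuple of (attribute, value) tests, the
       # left-hand side of the rule.
       if not isinstance(tree, tuple):          # leaf
           yield conditions, tree
           return
       attr, branches = tree
       for value, subtree in branches.items():
           yield from rules(subtree, conditions + ((attr, value),))

   def simplify(rule, records):
       # Drop a condition from the LHS whenever the training records
       # satisfying the shorter LHS are exactly those satisfying the
       # original LHS.
       conditions, label = rule
       satisfying = lambda conds: [r for r in records
                                   if all(r[a] == v for a, v in conds)]
       original = satisfying(conditions)
       kept = list(conditions)
       for cond in list(conditions):
           shorter = [c for c in kept if c != cond]
           if satisfying(shorter) == original:
               kept = shorter
       return tuple(kept), label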
You can run the C45 program here.
Classification Models in the Undergraduate AI Course
It is easy to find implementations of ID3, for example a Prolog program
by Shoham and a nice PAIL module.
The software for C4.5 can be obtained with Quinlan's book. A wide
variety of training and test data is available, some provided by
Quinlan, some at specialized sites such as the University of
California at Irvine.
Student projects may involve the implementation of these algorithms.
More interesting is for students to collect or find a significant data
set, partition it into training and test sets, determine a decision
tree, simplify it, determine the corresponding rule set, and simplify
the rule set.
The study of methods to evaluate the error performance of a decision
tree is probably too advanced for most undergraduate courses.
References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification
   and Regression Trees. Wadsworth, 1984.
   A decision-science perspective on decision trees.
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
   A very readable, thorough book, with actual usable programs that are
   available on the internet. Also available are a number of interesting
   data sets.
Quinlan, J.R.: Simplifying decision trees. International Journal of
   Man-Machine Studies, 27, 221-234, 1987.
Winston, P.H.: Artificial Intelligence, Third Edition. Addison-Wesley,
   1992.
   Excellent introduction to ID3 and its use in building decision trees
   and, from them, rule sets.
ingargiola@cis.temple.edu