return a single node with that value;
If R is empty, then return a single node whose value is
the most frequent of the values of the categorical attribute
found in records of S; [note that there will then be
errors, that is, records that will be improperly
classified];
Let D be the attribute with largest Gain(D,S)
among attributes in R;
Let {dj| j=1,2, .., m} be the values of attribute D;
Let {Sj| j=1,2, .., m} be the subsets of S consisting
respectively of records with value dj for attribute D;
Return a tree with root labeled D and arcs labeled
d1, d2, .., dm going respectively to the trees
ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
end ID3;
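For concreteness, here is a minimal Python sketch of this recursion
(discrete attribute values only; continuous ranges are discussed in the
C4.5 section below). The helper names entropy and gain, the record
format (a list of dictionaries keyed by attribute name), and the tree
representation (a class label for a leaf, or a pair
(attribute, {value: subtree}) for an interior node) are choices of this
sketch, not part of ID3 itself.

   import math
   from collections import Counter

   def entropy(records, target):
       # Info(T): entropy of the class distribution in `records`.
       counts = Counter(r[target] for r in records)
       total = len(records)
       return -sum((c / total) * math.log2(c / total) for c in counts.values())

   def gain(attr, records, target):
       # Gain(D,S): Info(S) minus the weighted entropy after splitting on `attr`.
       total = len(records)
       remainder = 0.0
       for value in set(r[attr] for r in records):
           subset = [r for r in records if r[attr] == value]
           remainder += len(subset) / total * entropy(subset, target)
       return entropy(records, target) - remainder

   def id3(attrs, target, records):
       # Returns a class label (leaf) or a pair (attribute, {value: subtree}).
       if not records:
           return "Failure"
       classes = [r[target] for r in records]
       if len(set(classes)) == 1:          # all records share one class value
           return classes[0]
       if not attrs:                       # no attributes left: majority class
           return Counter(classes).most_common(1)[0][0]
       best = max(attrs, key=lambda a: gain(a, records, target))
       branches = {}
       for value in set(r[best] for r in records):
           subset = [r for r in records if r[best] == value]
           branches[value] = id3([a for a in attrs if a != best], target, subset)
       return (best, branches)

On the golfing table (with Humidity already discretized into <=75 and
>75) a call like id3(['Outlook', 'Temperature', 'Humidity', 'Windy'],
'Play', records) should produce a tree of the kind shown below.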
In the Golfing example we obtain the following decision tree:
                         Outlook
                        /   |   \
                       /    |    \
             overcast /     |sunny \ rain
                     /      |       \
                  Play   Humidity   Windy
                          /   |      |   \
                         /    |      |    \
                  <=75  /  >75|  true|     \ false
                       /      |      |      \
                    Play  Don'tPlay Don'tPlay  Play
In the stock market case the decision tree is:
                   Age
                  / | \
                 /  |  \
             new/   |mid\ old
               /    |    \
             Up Competition Down
                    /   \
                  /      \
               no/        \yes
                /          \
              Up            Down
Here is the decision tree, just as produced by C4.5, for the voting
example introduced earlier.
Using Gain Ratios
The notion of Gain introduced earlier tends to favor attributes that
have a large number of values. For example, if we have an attribute D
that has a distinct value for each record, then Info(D,T) is 0, thus
Gain(D,T) is maximal. To compensate for this Quinlan suggests using
the following ratio instead of Gain:
                       Gain(D,T)
   GainRatio(D,T) = ---------------
                     SplitInfo(D,T)
where SplitInfo(D,T) is the information due to the split of T on the
basis of the values of attribute D. Thus SplitInfo(D,T) is
   I(|T1|/|T|, |T2|/|T|, .., |Tm|/|T|)
where {T1, T2, .., Tm} is the partition of T induced by the values of D.
In the case of our golfing example SplitInfo(Outlook,T) is
   -5/14*log(5/14) - 4/14*log(4/14) - 5/14*log(5/14) = 1.577
thus the GainRatio of Outlook is 0.246/1.577 = 0.156. And
SplitInfo(Windy,T) is
   -6/14*log(6/14) - 8/14*log(8/14) = 6/14*1.222 + 8/14*0.807 = 0.985
thus the GainRatio of Windy is 0.048/0.985 = 0.049.
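This arithmetic can be checked with a few lines of Python. The subset
sizes 5/4/5 for Outlook and 6/8 for Windy, and the gains 0.246 and
0.048, are the figures quoted above; the function split_info is this
sketch's own.

   import math

   def split_info(sizes):
       # SplitInfo(D,T) = I(|T1|/|T|, .., |Tm|/|T|) for a split into
       # subsets of the given sizes.
       total = sum(sizes)
       return -sum(s / total * math.log2(s / total) for s in sizes)

   # Outlook splits the 14 records into subsets of size 5 (sunny),
   # 4 (overcast) and 5 (rain).
   print(split_info([5, 4, 5]))           # ~1.577
   print(0.246 / split_info([5, 4, 5]))   # GainRatio(Outlook,T) ~0.156

   # Windy splits the 14 records into subsets of size 6 (true) and 8 (false).
   print(split_info([6, 8]))              # ~0.985
   print(0.048 / split_info([6, 8]))      # GainRatio(Windy,T)   ~0.049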
You can run PAIL to see how ID3 generates the decision tree.
C4.5 Extensions
C4.5 introduces a number of extensions of the original ID3 algorithm.
In building a decision tree we can deal with training sets that have
records with unknown attribute values by evaluating the gain, or the
gain ratio, for an attribute by considering only the records where
that attribute is defined.
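As a sketch of this idea (assuming unknown values are stored as None in
the record dictionaries, and reusing a gain function like the one in
the ID3 sketch above):

   def gain_on_defined(attr, records, target):
       # Evaluate the gain for `attr` using only the records where it is
       # defined; records with an unknown (None) value are left out of
       # this computation, not removed from the training set.
       defined = [r for r in records if r.get(attr) is not None]
       if not defined:
           return 0.0
       return gain(attr, defined, target)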
In using a decision tree, we can classify records that have unknown
attribute values by estimating the probability of the various possible
results. In our golfing example, if we are given a new record for
which the outlook is sunny and the humidity is unknown, we proceed as
follows:
We move from the Outlook root node to the Humidity node following
the arc labeled 'sunny'. Since we do not know the value of Humidity,
we look at the training records that reach this node: there are two
records with humidity at most 75 where one plays, and three records
with humidity over 75 where one does not play. Thus we can give as the
answer for this record the probabilities (0.4, 0.6) of playing or not
playing.
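Here is a sketch of this style of classification, using the
(attribute, {value: subtree}) tree representation assumed in the
earlier ID3 sketch; the fallback on the class frequencies of the
training records reaching the current node is the idea just described.

   from collections import Counter

   def classify(tree, record, records, target):
       # Returns a {class: probability} distribution for `record`.
       if not isinstance(tree, tuple):            # leaf: a certain answer
           return {tree: 1.0}
       attr, branches = tree
       value = record.get(attr)
       if value is None or value not in branches:
           # The value is unknown (or unseen): answer with the class
           # frequencies of the training records that reached this node,
           # e.g. sunny + unknown humidity gives 2 Play and 3 Don't Play,
           # i.e. (0.4, 0.6).
           counts = Counter(r[target] for r in records)
           total = sum(counts.values())
           return {c: n / total for c, n in counts.items()}
       subset = [r for r in records if r[attr] == value]
       return classify(branches[value], record, subset, target)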
We can deal with the case of attributes with continuous ranges as
follows. Say that attribute Ci has a continuous range. We examine the
values for this attribute in the training set. Say they are, in
increasing order, A1, A2, .., Am. Then for each value Aj, j=1,2,..m,
we partition the records into those that have Ci values up to and
including Aj, and those that have values greater than Aj. For each of
these partitions we compute the gain, or gain ratio, and choose the
partition that maximizes the gain.
In our Golfing example, for humidity, if T is the training set, we
determine the information for each partition and find the best partition
at 75. Then the range for this attribute becomes {<=75, >75}. Notice
that this method involves a substantial number of computations.
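A sketch of that threshold search, reusing the entropy helper from the
ID3 sketch (the record format and function name are again assumptions
of the sketch):

   def best_threshold(attr, records, target):
       # Try each observed value Aj of the continuous attribute as a cut
       # point (<= Aj versus > Aj) and return the cut with the largest gain.
       base = entropy(records, target)
       values = sorted(set(r[attr] for r in records))
       best_cut, best_gain = None, -1.0
       for cut in values[:-1]:     # cutting at the largest value leaves one side empty
           low = [r for r in records if r[attr] <= cut]
           high = [r for r in records if r[attr] > cut]
           g = (base
                - (len(low) / len(records)) * entropy(low, target)
                - (len(high) / len(records)) * entropy(high, target))
           if g > best_gain:
               best_cut, best_gain = cut, g
       return best_cut, best_gain

As noted above, on the golfing humidity values this search settles on
75, giving the {<=75, >75} range used in the tree.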
Pruning Decision Trees and Deriving Rule Sets
The decision tree built using the training set, because of the way it
was built, deals correctly with most of the records in the training set.
In fact, in order to do so, it may become quite complex, with long
and very uneven paths.
Pruning of the decision tree is done by replacing a whole subtree by a
leaf node. The replacement takes place if a decision rule establishes
that the expected error rate in the subtree is greater than in the
single leaf. For example, if the simple decision tree
              Color
             /     \
         red/       \blue
           /         \
       Success     Failure
is obtained with one training red Success record and two training blue
Failures, and then in the test set we find three red Failures and one
blue Success, we might consider replacing this subtree by a single
Failure node. After replacement we will have only two errors instead
of four.
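The error counts in this example can be checked directly; the sketch
below pools the training and test records that reach the Color subtree.

   # Records reaching the Color subtree, as (color, outcome) pairs.
   records = ([("red", "Success")] * 1 + [("blue", "Failure")] * 2 +   # training
              [("red", "Failure")] * 3 + [("blue", "Success")] * 1)    # test

   # Errors made by the subtree red -> Success, blue -> Failure.
   subtree = {"red": "Success", "blue": "Failure"}
   subtree_errors = sum(1 for color, outcome in records if subtree[color] != outcome)

   # Errors made by a single Failure leaf.
   leaf_errors = sum(1 for _, outcome in records if outcome != "Failure")

   print(subtree_errors, leaf_errors)   # 4 errors with the subtree, 2 with the leaf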
Winston shows how to use Fisher's exact test to determine if the
categorical attribute is truly dependent on a non-categorical attribute.
If it is not, then the non-categorical attribute need not appear in the
current path of the decision tree.
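As an illustration of such a test (not Winston's own code), one can use
scipy's fisher_exact on a 2x2 contingency table; the counts below are
made up for the sake of the example.

   from scipy.stats import fisher_exact

   # Hypothetical contingency table: rows are the two values of a
   # non-categorical attribute, columns are the two classes.
   table = [[6, 1],    # attribute value a: 6 Play, 1 Don't Play
            [2, 5]]    # attribute value b: 2 Play, 5 Don't Play

   oddsratio, p_value = fisher_exact(table)
   if p_value > 0.05:   # the significance threshold is our choice here
       print("no evidence of dependence; drop the attribute from this path")
   else:
       print("the categorical attribute appears to depend on this attribute")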
Quinlan and Breiman suggest more sophisticated pruning heuristics.
It is easy to derive a rule set from a decision tree: write a rule for
each path in the decision tree from the root to a leaf. In that rule the
left-hand side is easily built from the labels of the nodes and the
labels of the arcs, and the right-hand side is the class at the leaf.
The resulting rule set can be simplified:
Let LHS be the left hand side of a rule. Let LHS' be obtained from LHS
by eliminating some of its conditions. We can certainly replace LHS by
LHS' in this rule if the subsets of the training set that satisfy
respectively LHS and LHS' are equal.
A rule may be eliminated by using metaconditions such as "if no other
rule applies".
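A sketch of the path-to-rule translation and of the LHS simplification
test just described, again using the (attribute, {value: subtree}) tree
representation of the earlier sketches:

   def rules(tree, conditions=()):
       # Yield one (conditions, class) pair per root-to-leaf path;
       # `conditions` is a tuple of (attribute, value) tests, the
       # left-hand side of the rule.
       if not isinstance(tree, tuple):          # leaf
           yield conditions, tree
           return
       attr, branches = tree
       for value, subtree in branches.items():
           yield from rules(subtree, conditions + ((attr, value),))

   def simplify(rule, records):
       # Drop a condition from the LHS whenever the training records
       # satisfying the shorter LHS are exactly those satisfying the
       # original LHS.
       conditions, label = rule
       satisfying = lambda conds: [r for r in records
                                   if all(r[a] == v for a, v in conds)]
       original = satisfying(conditions)
       kept = list(conditions)
       for cond in list(conditions):
           shorter = [c for c in kept if c != cond]
           if satisfying(shorter) == original:
               kept = shorter
       return tuple(kept), label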
You can run the C45 program here.
Classification Models in the Undergraduate AI Course
It is easy to find implementations of ID3, for example a Prolog program
by Shoham and a nice PAIL module.
The software for C4.5 can be obtained with Quinlan's book. A wide
variety of training and test data is available, some provided by
Quinlan, some at specialized sites such as the University of
California at Irvine.
Student projects may involve the implementation of these algorithms.
More interesting is for students to collect or find a significant data
set, partition it into training and test sets, determine a decision
tree, simplify it, determine the corresponding rule set, and simplify
the rule set.
The study of methods to evaluate the error performance of a decision
tree is probably too advanced for most undergraduate courses.
References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification
   and Regression Trees. Wadsworth, 1984.
   A decision-science perspective on decision trees.
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
   A very readable, thorough book, with actual usable programs that are
   available on the internet. Also available are a number of interesting
   data sets.
Quinlan, J.R.: Simplifying decision trees. International Journal of
   Man-Machine Studies, 27, 221-234, 1987.
Winston, P.H.: Artificial Intelligence, Third Edition. Addison-Wesley,
   1992.
   Excellent introduction to ID3 and its use in building decision trees
   and, from them, rule sets.
ingargiola@cis.temple.edu