From: strawman (只作鸳鸯不羡仙), Board: DataMining
Title: ID3 & C4.5 Tutorial
Posted at: Nanjing University Lily BBS (Mon Aug 19 11:20:59 2002), on-site mail
Building Classification Models: ID3 and C4.5
Introduction
Basic Definitions
The ID3 Algorithm
Using Gain Ratios
C4.5 Extensions
Pruning Decision Trees and Deriving Rule Sets
Classification Models in the undergraduate AI Course
References
Introduction
ID3 and C4.5 are algorithms introduced by Quinlan for inducing
Classification Models, also called Decision Trees, from data.
We are given a set of records. Each record has the same structure,
consisting of a number of attribute/value pairs. One of these attributes
represents the category of the record. The problem is to determine a
decision tree that on the basis of answers to questions about the
non-category attributes predicts correctly the value of the category
attribute. Usually the category attribute takes only the values {true,
false}, or {success, failure}, or something equivalent. In any case, one
of its values will mean failure.
For example, we may have the results of measurements taken by experts on
some widgets. For each widget we know the value of each measurement and
the decision that was made: pass, scrap, or repair. That is, each record
has the measurements as its non-categorical attributes and the
disposition of the widget as its categorical attribute.
Here is a more detailed example. We are dealing with records reporting
on weather conditions for playing golf. The categorical attribute
specifies whether or not to Play. The non-categorical attributes are:
ATTRIBUTE   | POSSIBLE VALUES
============+=======================
outlook     | sunny, overcast, rain
------------+-----------------------
temperature | continuous
------------+-----------------------
humidity    | continuous
------------+-----------------------
windy       | true, false
============+=======================
and the training data is:
OUTLOOK  | TEMPERATURE | HUMIDITY | WINDY | PLAY
=====================================================
sunny    |     85      |    85    | false | Don't Play
sunny    |     80      |    90    | true  | Don't Play
overcast |     83      |    78    | false | Play
rain     |     70      |    96    | false | Play
rain     |     68      |    80    | false | Play
rain     |     65      |    70    | true  | Don't Play
overcast |     64      |    65    | true  | Play
sunny    |     72      |    95    | false | Don't Play
sunny    |     69      |    70    | false | Play
rain     |     75      |    80    | false | Play
sunny    |     75      |    70    | true  | Play
overcast |     72      |    90    | true  | Play
overcast |     81      |    75    | false | Play
rain     |     71      |    80    | true  | Don't Play
Notice that in this example two of the attributes, Temperature and
Humidity, have continuous ranges. ID3 does not directly deal with such
cases, though below we examine how it can be extended to do so. A
decision tree is important not because it summarizes what we know, i.e.
the training set, but because we hope it will classify new cases
correctly. Thus when building classification models one should have both
training data to build the model and test data to verify how well it
actually works.
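For use in the Python sketches later in this tutorial, the golf training
set above could be encoded as a list of records. This is only an
illustrative choice of representation; the field names (outlook,
temperature, humidity, windy, play) are not part of the original data files.

# Hypothetical encoding of the golf training set as a list of dicts.
golf_data = [
    {"outlook": "sunny",    "temperature": 85, "humidity": 85, "windy": False, "play": "Don't Play"},
    {"outlook": "sunny",    "temperature": 80, "humidity": 90, "windy": True,  "play": "Don't Play"},
    {"outlook": "overcast", "temperature": 83, "humidity": 78, "windy": False, "play": "Play"},
    {"outlook": "rain",     "temperature": 70, "humidity": 96, "windy": False, "play": "Play"},
    {"outlook": "rain",     "temperature": 68, "humidity": 80, "windy": False, "play": "Play"},
    {"outlook": "rain",     "temperature": 65, "humidity": 70, "windy": True,  "play": "Don't Play"},
    {"outlook": "overcast", "temperature": 64, "humidity": 65, "windy": True,  "play": "Play"},
    {"outlook": "sunny",    "temperature": 72, "humidity": 95, "windy": False, "play": "Don't Play"},
    {"outlook": "sunny",    "temperature": 69, "humidity": 70, "windy": False, "play": "Play"},
    {"outlook": "rain",     "temperature": 75, "humidity": 80, "windy": False, "play": "Play"},
    {"outlook": "sunny",    "temperature": 75, "humidity": 70, "windy": True,  "play": "Play"},
    {"outlook": "overcast", "temperature": 72, "humidity": 90, "windy": True,  "play": "Play"},
    {"outlook": "overcast", "temperature": 81, "humidity": 75, "windy": False, "play": "Play"},
    {"outlook": "rain",     "temperature": 71, "humidity": 80, "windy": True,  "play": "Don't Play"},
]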
A simpler example from the stock market, involving only discrete ranges,
has Profit as its categorical attribute, with values {up, down}. Its
non-categorical attributes are:
ATTRIBUTE   | POSSIBLE VALUES
============+=======================
age         | old, midlife, new
------------+-----------------------
competition | no, yes
------------+-----------------------
type        | software, hardware
============+=======================
and the training data is:
AGE | COMPETITION | TYPE | PROFIT
====+=============+======+=======
old | yes         | swr  | down
old | no          | swr  | down
old | no          | hwr  | down
mid | yes         | swr  | down
mid | yes         | hwr  | down
mid | no          | hwr  | up
mid | no          | swr  | up
new | yes         | swr  | up
new | no          | hwr  | up
new | no          | swr  | up
For a more complex example, here are files that provide records for a
series of votes in Congress. The first file describes the structure of
the records. The second file provides the Training Set, and the third
the Test Set.
The basic ideas behind ID3 are:
In the decision tree, each node corresponds to a non-categorical
attribute and each arc to a possible value of that attribute. A leaf
of the tree specifies the expected value of the categorical attribute
for the records described by the path from the root to that leaf.
[This defines what a Decision Tree is; a small sketch of one way to
represent such a tree follows these three points.]
Each node of the decision tree should be associated with the
non-categorical attribute which is most informative among the attributes
not yet considered in the path from the root. [This establishes what a
"Good" decision tree is.]
Entropy is used to measure how informative a node is. [This defines what
we mean by "Good". By the way, this notion was introduced by Claude
Shannon in Information Theory.]
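As a concrete illustration (not part of the original tutorial), here is a
minimal sketch of how such a decision tree might be represented and
traversed in Python, using the stock market example above. The nested
tuple/dict layout and the classify function are illustrative choices only;
the tree shown is one consistent with the stock training data, with age at
the root and competition tested for the midlife companies.

# One possible encoding: an internal node is an (attribute, {value: subtree})
# pair, and a leaf is just the predicted value of the categorical attribute.
stock_tree = ("age", {
    "old": "down",
    "mid": ("competition", {"yes": "down", "no": "up"}),
    "new": "up",
})

def classify(tree, record):
    """Follow the arc matching the record's attribute value until a leaf is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree

print(classify(stock_tree, {"age": "mid", "competition": "no", "type": "swr"}))  # up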
C4.5 is an extension of ID3 that accounts for unavailable values,
continuous attribute value ranges, pruning of decision trees, rule
derivation, and so on.
Basic Definitions
If there are n equally probable possible messages, then the
probability p of each is 1/n and the information conveyed by a message
is -log(p) = log(n). [In what follows all logarithms are in base 2.]
That is, if there are 16 messages, then log(16) = 4 and we need 4 bits
to identify each message.
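As a quick check of this arithmetic (a trivial sketch, not from the
original text):

import math

messages = 16
p = 1 / messages
print(-math.log2(p), math.log2(messages))  # both 4.0: four bits identify one of 16 messages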
In general, if we are given a probability distribution P = (p1, p2, ..,
pn) then the Information conveyed by this distribution, also called the
Entropy of P, is:
I(P) = -(p1*log(p1) + p2*log(p2) + .. + pn*log(pn))
For example, if P is (0.5, 0.5) then I(P) is 1; if P is (0.67, 0.33)
then I(P) is 0.92; if P is (1, 0) then I(P) is 0. [Note that the more
uniform the probability distribution, the greater its information.]
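A minimal Python sketch of this computation (the function name info is an
illustrative choice); terms with probability 0 are skipped, following the
usual convention that 0*log(0) = 0:

import math

def info(dist):
    """Entropy I(P) of a probability distribution, in bits (logarithms in base 2)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(info([0.5, 0.5]))   # 1.0
print(info([2/3, 1/3]))   # ~0.918, the 0.92 quoted above for (0.67, 0.33)
print(info([1.0, 0.0]))   # 0.0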
If a set T of records is partitioned into disjoint exhaustive classes
C1, C2, .., Ck on the basis of the value of the categorical attribute,
then the information needed to identify the class of an element of T
is Info(T) = I(P), where P is the probability distribution of the
partition (C1, C2, .., Ck):
P = (|C1|/|T|, |C2|/|T|, ..., |Ck|/|T|)
In our golfing example, we have Info(T) = I(9/14, 5/14) = 0.94,
and in our stock market example we have Info(T) = I(5/10,5/10) = 1.0.
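Reusing the info function from the sketch above, Info(T) can be computed
directly from the class counts; the helper name info_T is again just an
illustrative choice:

def info_T(class_counts):
    """Info(T) for a partition into classes C1..Ck: entropy of (|Ci|/|T|)."""
    total = sum(class_counts)
    return info([c / total for c in class_counts])

print(round(info_T([9, 5]), 2))  # golfing example: 0.94
print(round(info_T([5, 5]), 2))  # stock market example: 1.0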
If we first partition T on the basis of the value of a non-categorical
attribute X into sets T1, T2, .., Tn then the information needed to
identify the class of an element of T becomes the weighted average of
the information needed to identify the class of an element of Ti, i.e.
the weighted average of Info(Ti):
Info(X,T) = Sum over i from 1 to n of (|Ti|/|T|) * Info(Ti)
In the case of our golfing example, for the attribute Outlook we have
Info(Outlook,T) = 5/14*I(2/5,3/5) + 4/14*I(4/4,0) + 5/14*I(3/5,2/5)
= 0.694
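A sketch of this weighted average in Python, reusing info_T from above;
the nested lists are the class counts of the golf data within each value
of Outlook, i.e. (Play, Don't Play) counts for sunny, overcast, and rain:

def info_X(partition_counts):
    """Info(X,T): weighted average of Info(Ti) over the subsets Ti induced by attribute X."""
    total = sum(sum(counts) for counts in partition_counts)
    return sum(sum(counts) / total * info_T(counts) for counts in partition_counts)

# Outlook splits the 14 golf records into sunny (2 Play, 3 Don't),
# overcast (4, 0), and rain (3, 2).
outlook_counts = [[2, 3], [4, 0], [3, 2]]
print(round(info_X(outlook_counts), 3))  # 0.694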
Consider the quantity Gain(X,T) defined as
Gain(X,T) = Info(T) - Info(X,T)
This represents the difference between the information needed to
identify an element of T and the information needed to identify an
element of T after the value of attribute X has been obtained, that is,
this is the gain in information due to attribute X.
In our golfing example, for the Outlook attribute the gain is:
Gain(Outlook,T) = Info(T) - Info(Outlook,T) = 0.94 - 0.694 = 0.246.
If we instead consider the attribute Windy, we find that Info(Windy,T)
is 0.892 and Gain(Windy,T) is 0.048. Thus Outlook offers a greater
informational gain than Windy.
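Continuing the running sketch (with the info_T and info_X helpers defined
above), the two gains quoted here can be reproduced from class counts
alone; the count pairs below are (Play, Don't Play) within each attribute
value:

def gain(total_counts, partition_counts):
    """Gain(X,T) = Info(T) - Info(X,T)."""
    return info_T(total_counts) - info_X(partition_counts)

golf_counts = [9, 5]                       # 9 Play, 5 Don't Play overall
outlook_counts = [[2, 3], [4, 0], [3, 2]]  # sunny, overcast, rain
windy_counts = [[3, 3], [6, 2]]            # windy = true, windy = false

print(round(gain(golf_counts, outlook_counts), 3))  # 0.247 (quoted as 0.246 above, from rounding 0.94 - 0.694)
print(round(gain(golf_counts, windy_counts), 3))    # 0.048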
We can use this notion of gain to rank attributes and to build decision
trees in which each node holds the attribute with the greatest gain among
the attributes not yet considered in the path from the root.
The intent of this ordering is twofold:
To create small decision trees, so that records can be identified after
only a few questions.
To match a hoped-for minimality of the process represented by the
records being considered (Occam's Razor).
The ID3 Algorithm
The ID3 algorithm is used to build a decision tree, given a set of
non-categorical attributes C1, C2, .., Cn, the categorical attribute C,
and a training set T of records.
function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
   If S is empty, return a single node with value Failure;
   If all records in S have the same value for the categorical attribute,
      return a single node with that value;
   If R is empty, return a single node with the most frequent categorical
      value found in S [some records will then be misclassified];
   Let D be the attribute in R with largest Gain(D,S), with values d1, .., dm,
      and let Sj be the subset of S whose records have value dj for D;
   Return a tree with root labeled D and arcs labeled d1, .., dm going
      respectively to the subtrees ID3(R-{D}, C, S1), .., ID3(R-{D}, C, Sm);
end ID3;
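For readers who prefer running code, here is a compact Python sketch of
the same recursion, written for discrete-valued attributes only (so it
applies to the stock market data as given, and to the golf data only once
the continuous attributes are discretized). The record format matches the
list-of-dicts encoding shown earlier, and all names are illustrative.

import math
from collections import Counter

def entropy(records, target):
    """Info(T): entropy of the categorical attribute over a set of records."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(records, attr, target):
    """Gain(X,T) = Info(T) - Info(X,T) for a discrete attribute X."""
    total = len(records)
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(records, target) - remainder

def id3(records, attributes, target):
    """Return a leaf value, or an (attribute, {value: subtree}) node."""
    if not records:
        return "Failure"
    classes = {r[target] for r in records}
    if len(classes) == 1:              # all records agree on the class
        return classes.pop()
    if not attributes:                 # no attributes left: majority class
        return Counter(r[target] for r in records).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, a, target))
    branches = {}
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = id3(subset, rest, target)
    return (best, branches)

stock_data = [
    {"age": "old", "competition": "yes", "type": "swr", "profit": "down"},
    {"age": "old", "competition": "no",  "type": "swr", "profit": "down"},
    {"age": "old", "competition": "no",  "type": "hwr", "profit": "down"},
    {"age": "mid", "competition": "yes", "type": "swr", "profit": "down"},
    {"age": "mid", "competition": "yes", "type": "hwr", "profit": "down"},
    {"age": "mid", "competition": "no",  "type": "hwr", "profit": "up"},
    {"age": "mid", "competition": "no",  "type": "swr", "profit": "up"},
    {"age": "new", "competition": "yes", "type": "swr", "profit": "up"},
    {"age": "new", "competition": "no",  "type": "hwr", "profit": "up"},
    {"age": "new", "competition": "no",  "type": "swr", "profit": "up"},
]
print(id3(stock_data, ["age", "competition", "type"], "profit"))
# ('age', {'old': 'down', 'mid': ('competition', {'yes': 'down', 'no': 'up'}), 'new': 'up'})
# (branch order may vary from run to run)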