	Rule 4: (296, lift 1.1)
		on thyroxine = t
		FTI > 65
		->  class negative  [0.997]
	
	Rule 5: (240, lift 1.1)
		TT4 > 153
		->  class negative  [0.996]
	
	Rule 6: (29, lift 1.1)
		thyroid surgery = t
		FTI > 65
		->  class negative  [0.968]
	
	Rule 7: (31, lift 42.7)
		thyroid surgery = f
		TSH > 6
		TT4 <= 37
		->  class primary  [0.970]

The rules are divided into four bands of roughly equal size, and a further summary is generated for both the training and test cases. Here is the output for the test cases: 
	Evaluation on test data (1000 cases):
	
		        Rules     
		  ----------------
		    No      Errors
	
		     7    5( 0.5%)   <<
	
	
		   (a)   (b)   (c)   (d)    <-classified as
		  ----  ----  ----  ----
		    32                      (a): class primary
		     1    39                (b): class compensated
		                            (c): class secondary
		     1     3         924    (d): class negative
	
	Rule utility summary:
	
		Rules	      Errors
		-----	      ------
		1-2	   56( 5.6%)
		1-4	   10( 1.0%)
		1-5	    6( 0.6%)

This shows that, when only the first two rules are used, the error rate on the test cases is 5.6%, dropping to 1.0% when the first four rules are used, and so on. The performance of the entire ruleset is not repeated since it is shown above the utility summary. 
Rule utility orderings are not given for cross-validations (see below). 
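
The same kind of cumulative tally can be reproduced outside See5. The sketch below is only an illustration: it assumes the rules have already been parsed into hypothetical (condition, class) pairs and that cases are plain dictionaries, and it reads the ruleset first-match, which is a simplification of how See5 actually interprets rulesets.

	# Illustrative rule-utility tally (not See5's code).
	def classify(case, rules, default="negative"):
	    # Fire the first rule whose conditions hold; otherwise fall back to the default class.
	    for condition, predicted in rules:
	        if condition(case):
	            return predicted
	    return default
	
	def utility_summary(rules, cases, bands):
	    # Print the cumulative error rate obtained from each rule prefix (band).
	    for upper in bands:
	        subset = rules[:upper]
	        errors = sum(classify(c, subset) != c["class"] for c in cases)
	        print("1-%d\t%d(%.1f%%)" % (upper, errors, 100.0 * errors / len(cases)))
	
	# Two toy rules mirroring Rules 4 and 7 above, applied to two hand-made cases.
	rules = [
	    (lambda c: c["on thyroxine"] == "t" and c["FTI"] > 65, "negative"),
	    (lambda c: c["thyroid surgery"] == "f" and c["TSH"] > 6 and c["TT4"] <= 37, "primary"),
	]
	cases = [
	    {"on thyroxine": "t", "FTI": 80, "thyroid surgery": "f", "TSH": 2, "TT4": 100, "class": "negative"},
	    {"on thyroxine": "f", "FTI": 50, "thyroid surgery": "f", "TSH": 9, "TT4": 30, "class": "primary"},
	]
	utility_summary(rules, cases, bands=[1, 2])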


Boosting
Another innovation incorporated in See5 is adaptive boosting, based on the work of Rob Schapire and Yoav Freund. The idea is to generate several classifiers (either decision trees or rulesets) rather than just one. When a new case is to be classified, each classifier votes for its predicted class and the votes are counted to determine the final class. 

But how can we generate several classifiers from a single dataset? As the first step, a single decision tree or ruleset is constructed as before from the training data (e.g. hypothyroid.data). This classifier will usually make mistakes on some cases in the data; the first decision tree, for instance, gives the wrong class for 7 cases in hypothyroid.data. When the second classifier is constructed, more attention is paid to these cases in an attempt to get them right. As a consequence, the second classifier will generally be different from the first. It also will make errors on some cases, and these become the focus of attention during construction of the third classifier. This process continues for a pre-determined number of iterations or trials, but stops if the most recent classifier is either extremely accurate or inaccurate. 
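
The reweighting mechanism can be sketched in a few lines. The code below is not See5's implementation; it is a generic two-class AdaBoost-style loop in the spirit of Freund and Schapire's scheme, using scikit-learn's DecisionTreeClassifier as a stand-in base learner and assuming X and y are already loaded as numeric arrays.

	# AdaBoost-style reweighting sketch (two-class form; not See5's own code).
	# Assumes X and y are an already-prepared numeric feature matrix and label array.
	import numpy as np
	from sklearn.tree import DecisionTreeClassifier
	
	def boost(X, y, trials=10):
	    n = len(y)
	    weights = np.full(n, 1.0 / n)           # start with equal case weights
	    classifiers, alphas = [], []
	    for _ in range(trials):
	        clf = DecisionTreeClassifier(max_depth=3)
	        clf.fit(X, y, sample_weight=weights)
	        pred = clf.predict(X)
	        err = weights[pred != y].sum()
	        if err == 0 or err >= 0.5:          # stop if extremely accurate or inaccurate
	            break
	        alpha = 0.5 * np.log((1 - err) / err)
	        classifiers.append(clf)
	        alphas.append(alpha)
	        # Misclassified cases get more weight, so the next tree pays them more attention.
	        weights *= np.exp(np.where(pred != y, alpha, -alpha))
	        weights /= weights.sum()
	    return classifiers, alphas
	
	def vote(classifiers, alphas, X):
	    # Each classifier casts a vote, weighted by its alpha, for its predicted class.
	    from collections import Counter
	    final = []
	    for row in np.atleast_2d(X):
	        tally = Counter()
	        for clf, a in zip(classifiers, alphas):
	            tally[clf.predict(row.reshape(1, -1))[0]] += a
	        final.append(tally.most_common(1)[0][0])
	    return final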

The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Naturally, constructing multiple classifiers requires more computation than building a single classifier -- but the effort can pay dividends! Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%. 

Selecting the Boost option with 10 trials causes ten decision trees to be generated. The summary of the trees' individual and aggregated performance on the 1000 test cases is: 

	Trial	    Decision Tree   
	-----	  ----------------  
		  Size      Errors  
	
	   0	    14    4( 0.4%)
	   1	     7   52( 5.2%)
	   2	    11    9( 0.9%)
	   3	    15   21( 2.1%)
	   4	     7   12( 1.2%)
	   5	    10    7( 0.7%)
	   6	     8    8( 0.8%)
	   7	    13   13( 1.3%)
	   8	    12   12( 1.2%)
	   9	    16   54( 5.4%)
	boost	          2( 0.2%)   <<

(Again, different hardware can lead to slightly different results.) The performance of the classifier constructed at each trial is summarized on a separate line, while the line labeled boost shows the result of voting all the classifiers. 

The decision tree constructed on Trial 0 is identical to that produced without the Boost option. Some of the subsequent trees produced by paying more attention to certain cases have relatively high overall error rates. Nevertheless, when the trees are combined by voting, the final predictions have a lower error rate of 0.2% on the test cases. 


Winnowing attributes
The decision trees and rulesets constructed by See5 do not generally use all of the attributes. The hypothyroid application has 22 predictive attributes (plus a class and a label attribute) but only six of them appear in the tree and the ruleset. This ability to pick and choose among the predictors is an important advantage of tree-based modeling techniques. 

Some applications, however, have an abundance of attributes! For instance, one approach to text classification describes each passage by the words that appear in it, so there is a separate attribute for each different word in a restricted dictionary. 

When there are numerous alternatives for each test in the tree or ruleset, it is likely that at least one of them will appear, if only by chance, to provide valuable predictive information. In applications like these it can be useful to pre-select a subset of the attributes that will be used to construct the decision tree or ruleset. The See5 mechanism to do this is called "winnowing" by analogy with the process of separating wheat from chaff (or, here, useful attributes from unhelpful ones). 
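
See5's own winnowing criterion is not spelled out in this excerpt, but the general idea of pre-selecting attributes before building the classifier can be sketched. The snippet below is only an illustration under assumed names (a pandas DataFrame `data` with a `diagnosis` class column and no missing values); it ranks attributes by random-forest importance as a stand-in for See5's relevance estimate and drops those below a threshold.

	# Illustrative attribute pre-selection ("winnowing") -- not See5's algorithm.
	# Assumes a pandas DataFrame `data` with class column 'diagnosis' and no missing values.
	import pandas as pd
	from sklearn.ensemble import RandomForestClassifier
	
	def winnow(data, class_col="diagnosis", keep_threshold=0.01):
	    X = pd.get_dummies(data.drop(columns=[class_col]))   # one-hot encode discrete attributes
	    y = data[class_col]
	    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
	    # Sum the importances of the dummy columns back onto the original attribute names.
	    importance = {}
	    for col, imp in zip(X.columns, forest.feature_importances_):
	        attribute = col.split("_")[0]
	        importance[attribute] = importance.get(attribute, 0.0) + imp
	    winnowed = [a for a, imp in importance.items() if imp < keep_threshold]
	    return data.drop(columns=winnowed), sorted(winnowed)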

Winnowing is not obviously relevant for the hypothyroid application since there are relatively few attributes. To illustrate the idea, however, here are the results when the Winnowing option is invoked: 

	See5 [Release 1.20a]	Wed Sep  1 11:02:48 2004
	
	    Options:
		Winnow attributes
	
	Class specified by attribute `diagnosis'
	
	Read 2772 cases (24 attributes) from hypothyroid.data
	
	Attributes winnowed:
	    age
	    sex
	    query on thyroxine
	    on antithyroid medication
	    sick
	    pregnant
	    I131 treatment
	    query hypothyroid
	    query hyperthyroid
	    lithium
	    tumor
	    goitre
	    hypopituitary
	    psych
	    T3
	    T4U
	    referral source
	
	Decision tree:
	
	TSH <= 6: negative (2472/2)
	TSH > 6:
	:...FTI <= 65: primary (72.4/13.9)
	    FTI > 65:
	    :...on thyroxine = t: negative (37.7)
	        on thyroxine = f:
	        :...thyroid surgery = t: negative (6.8)
	            thyroid surgery = f:
	            :...TT4 > 153: negative (6/0.1)
	                TT4 <= 153:
	                :...TT4 > 62: compensated (170.1/24.3)
	                    TT4 <= 62:
	                    :...TT4 <= 37: primary (2.5/0.2)
	                        TT4 > 37: compensated (4.5/0.4)
	
	
	Evaluation on training data (2772 cases):
	
		    Decision Tree   
		  ----------------  
		  Size      Errors  
	
		     8   12( 0.4%)   <<
	
	
		   (a)   (b)   (c)   (d)    <-classified as
		  ----  ----  ----  ----
		    60     3                (a): class primary
		     1   153                (b): class compensated
		                       2    (c): class secondary
		     5     1        2547    (d): class negative
	
	
	Evaluation on test data (1000 cases):
	
		    Decision Tree   
		  ----------------  
		  Size      Errors  
	
		     8    4( 0.4%)   <<
	
	
		   (a)   (b)   (c)   (d)    <-classified as
		  ----  ----  ----  ----
		    32                      (a): class primary
		     1    39                (b): class compensated
		                            (c): class secondary
		     1     2         925    (d): class negative
	
	
	Time: 0.0 secs

After analyzing the training cases, See5 winnows (discards) 17 of the 22 predictive attributes before the decision tree is built. Although it is smaller than the original tree, the new decision tree constructed from only five attributes is just as accurate on the test cases. 
Since winnowing the attributes can be a time-consuming process, it is recommended primarily for large applications (10,000 cases or more) where there is reason to suspect that many of the attributes have at best marginal relevance to the classification task. 


Softening thresholds
The first test in our initial decision tree asks whether the value of the attribute TSH is less than or equal to 6, or greater than 6. In the first case we go no further and predict that the case's class is negative; in the second we look at other information before making a decision. Thresholds like this are sharp by default, so that a case with a hypothetical value of 5.99 for TSH is treated quite differently from one with a value of 6.01. 

For some domains, this sudden change is quite appropriate -- for instance, there are hard-and-fast cutoffs for bands of the income tax table. For other applications, though, it is more reasonable to expect classification decisions to change more slowly with changes in attribute values. 

See5 contains an option to `soften' thresholds such as the 6 above. When this is invoked, each threshold is replaced by three values -- a lower bound lb, an upper bound ub, and a central value t -- that divide the attribute's range into three parts. If the attribute value in question is below lb or above ub, classification is carried out using the single branch corresponding to the `<=' or `>' result respectively. If the value lies between lb and ub, both branches of the tree are investigated and the results combined probabilistically. The values of lb and ub are determined by See5 based on an analysis of the apparent sensitivity of classification to small changes in the threshold. They need not be symmetric -- a fuzzy threshold can be sharper on one side than on the other. 
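
To make the combination concrete, here is a small sketch of how a single softened test might be evaluated. It is hypothetical code, not See5's: the excerpt does not specify the exact combination rule, so a simple linear blend of the two branches' class distributions is assumed, and low_branch/high_branch are hypothetical functions returning class probability distributions.

	# Evaluating one softened test (illustrative only; linear blend assumed).
	def soft_test(value, lb, ub, low_branch, high_branch):
	    if value <= lb:
	        return low_branch(value)                  # clearly on the `<=' side
	    if value >= ub:
	        return high_branch(value)                 # clearly on the `>' side
	    w = (value - lb) / (ub - lb)                  # 0 at lb, 1 at ub
	    low, high = low_branch(value), high_branch(value)
	    classes = set(low) | set(high)
	    return {c: (1 - w) * low.get(c, 0.0) + w * high.get(c, 0.0) for c in classes}
	
	# Example using the softened TSH test in the tree below (lb = 6, ub = 6.1):
	negative_leaf = lambda v: {"negative": 1.0}
	deeper_subtree = lambda v: {"compensated": 0.6, "primary": 0.4}   # made-up distribution
	print(soft_test(6.05, 6.0, 6.1, negative_leaf, deeper_subtree))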

Invoking the Fuzzy thresholds option gives the following decision tree: 

	TSH <= 6 (6.05): negative (2472/2)
	TSH >= 6.1 (6.05):
	:...FTI <= 64 (65.35):
	    :...thyroid surgery = t:
	    :   :...FTI <= 24 (38.25): negative (2.1)
	    :   :   FTI >= 52.5 (38.25): primary (2.1/0.1)
	    :   thyroid surgery = f:
	    :   :...TT4 <= 60 (61.5): primary (51/3.7)
	    :       TT4 >= 63 (61.5):
	    :       :...referral source in {WEST,SVHD}: primary (0)
	    :           referral source = STMW: primary (0.1)
	    :           referral source = SVHC: primary (1)
	    :           referral source = SVI: primary (3.8/0.8)
	    :           referral source = other:
	    :           :...TSH <= 19 (22.5): negative (6.4/2.7)
	    :               TSH >= 26 (22.5): primary (5.8/0.8)
	    FTI >= 65.7 (65.35):
	    :...on thyroxine = t: negative (37.7)
	        on thyroxine = f:
	        :...thyroid surgery = t: negative (6.8)
	            thyroid surgery = f:
	            :...TT4 >= 158 (153): negative (6/0.1)
	                TT4 <= 148 (153):
	                :...TT4 <= 31 (37.5): primary (2.5/0.2)
	                    TT4 >= 44 (37.5): compensated (174.6/24.8)

Each threshold is now of the form <= lb (t) or >= ub (t). In this example, most of the thresholds are still relatively tight, but notice the asymmetric threshold values for the test FTI <= 64. Soft thresholds slightly improve the classifier's accuracy on both training and test data. 
A final point: soft thresholds affect only decision tree classifiers -- they do not change the interpretation of rulesets. 


Advanced pruning options
Three further options enable aspects of the classifier-generation process to be tweaked. These are best regarded as advanced options to be used sparingly (if at all), and this section can be skipped without much loss. 

See5 constructs decision trees in two phases. A large tree is first grown to fit the data closely and is then `pruned' by removing parts that are predicted to have a relatively high error rate. This pruning process is first applied to every subtree to decide whether it should be replaced by a leaf or sub-branch, and then a global stage looks at the performance of the tree as a whole. 

Turning off the default Global pruning option disables this second pruning component and generally results in larger decision trees and rulesets. For the hypothyroid application, the tree increases in size from 14 to 15 leaves. 

The Pruning CF option affects the way that error rates are estimated and hence the severity of pruning; values smaller than the default (25%) cause more of the initial tree to be pruned, while larger values result in less pruning. 

The Minimum cases option constrains the degree to which the initial tree can fit the data. At each branch point in the decision tree, the stated minimum number of training cases must follow at least two of the branches. Values higher than the default (2 cases) can lead to an initial tree that fits the training data only approximately -- a form of pre-pruning. (This option is complicated by the presence of missing attribute values and by the use of differential misclassification costs, discussed below. Both cause adjustments to the apparent number of cases following a branch.) 
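
For readers more familiar with other tree libraries, these controls have loose analogues elsewhere. The snippet below maps them, purely for orientation, onto scikit-learn's DecisionTreeClassifier; the parameters are not equivalent to See5's options, and scikit-learn prunes by cost-complexity rather than by the confidence-based estimate behind Pruning CF.

	# Rough scikit-learn analogue of See5's pruning controls (not equivalent settings).
	from sklearn.tree import DecisionTreeClassifier
	
	tree = DecisionTreeClassifier(
	    min_samples_leaf=2,    # similar in spirit to Minimum cases: each branch
	                           # must retain at least this many training cases
	    ccp_alpha=0.001,       # cost-complexity pruning strength; a different
	                           # criterion that plays a role comparable to Pruning CF
	    random_state=0,
	)
	# tree.fit(X, y)           # X, y assumed to be an already-prepared dataset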


Sampling from large datasets
Even though See5 is relatively fast, building classifiers from large numbers of cases can take an inconveniently long time, especially when options such as boosting are employed. See5 incorporates a facility to extract a random sample from a dataset, construct a classifier from the sample, and then test the classifier on a disjoint collection of cases. By using a smaller set of training cases in this way, the process of generating a classifier is expedited, but at the cost of a possible reduction in the classifier's predictive performance. 

The Sample option with x% has two consequences. Firstly, a random sample containing x% of the cases in the application's data file is used to construct the classifier. Secondly, the classifier is evaluated on a non-overlapping set of test cases consisting of another (disjoint) sample of the same size as the training set (if x is less than 50%), or all cases that were not used in the training set (if x is greater than or equal to 50%). 
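
The split implied by the Sample option can be sketched with plain random sampling. The helper below is hypothetical, not See5's code: it draws an x% training sample and then forms the test set either as another disjoint sample of the same size (x < 50) or as everything left over (x >= 50).

	# Sketch of the Sample option's train/test split logic (hypothetical helper).
	import random
	
	def sample_split(cases, percent, seed=0):
	    rng = random.Random(seed)
	    shuffled = list(cases)
	    rng.shuffle(shuffled)
	    n_train = round(len(shuffled) * percent / 100.0)
	    train, rest = shuffled[:n_train], shuffled[n_train:]
	    test = rest[:n_train] if percent < 50 else rest
	    return train, test
	
	train, test = sample_split(range(2772), percent=30)
	print(len(train), len(test))   # 832 832 for a 30% sample of 2772 cases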
