<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD><TITLE>Manpage of C4.5</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<STYLE TYPE="text/css">DIV.section {
	MARGIN-LEFT: 2cm
}
</STYLE>
<LINK REL=StyleSheet HREF="../../../../stylesheet/main.css" TYPE="text/css">
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY>
<blockquote>
<H1>VERBOSE</H1>
<HR>
<A name=lbAB>&nbsp;</A> 
<H2>NAME</H2>A guide to the verbose output of the C4.5 decision tree generator 
<P><A name=lbAC></A> 
<H2>DESCRIPTION</H2>This document explains the output of the program <I>C4.5</I> 
when it is run with the verbosity level (option <B>v</B>) set to a value from 1 
to 3. 
<P><A name=lbAD></A> 
<H2>TREE BUILDING</H2>
<P><B>Verbosity level 1</B> 
<P>To build a decision tree from a set of data items each of which belongs to 
one of a set of classes, <I>C4.5</I> proceeds as follows: 
<DL compact>
  <DT>1.
  <DD>If all items belong to the same class, the decision tree is a leaf which 
  is labelled with this class. 
  <DT>2.
  <DD>Otherwise, <I>C4.5</I> attempts to find the best attribute to test in 
  order to divide the data items into subsets, and then builds a subtree from 
  each subset by invoking this procedure recursively. 
</DL>
<P>The best attribute to branch on at each stage is selected by determining 
  the information gain of a split on each of the attributes. If the selection 
  criterion being used is GAIN (option <B>g</B>), the best attribute is that 
  which divides the data items with the highest gain in information, whereas if 
  the GAINRATIO criterion (the default) is being used (and the gain is at least 
  the average gain across all attributes), the best attribute is that with the 
  highest ratio of information gain to potential information. 
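<P>Both criteria rest on the same two quantities. The following Python sketch 
shows one way to compute them (illustrative only, not the C4.5 source; the 
function and variable names are chosen here): 
<PRE>
import math

def info(weights):
    # Entropy (in bits) of a distribution given as per-class weights.
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w)

def gain_and_ratio(parent, subsets):
    # parent: the class labels before the split;
    # subsets: one list of class labels per branch of the split.
    classes = set(parent)
    def dist(labels):
        return [labels.count(c) for c in classes]
    n = len(parent)
    # Information still required after the split, weighted by branch size.
    residual = sum(len(s) / n * info(dist(s)) for s in subsets)
    gain = info(dist(parent)) - residual          # the GAIN criterion
    split_info = info([len(s) for s in subsets])  # "inf" in the output
    # GAINRATIO divides the gain by the potential information.
    return gain, (gain / split_info if split_info else 0.0)
</PRE>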
  <P>For discrete-valued attributes, a branch corresponding to each value of the 
  attribute is formed, whereas for continuous-valued attributes, a threshold is 
  found, thus forming two branches. If subset tests are being used (option 
  <B>s</B>), branches may be formed corresponding to a subset of values of a 
  discrete attribute being tested. 
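<P>For continuous attributes the candidate thresholds are values that actually 
occur in the data. A hedged sketch of the search, reusing 
<TT>gain_and_ratio</TT> from above (again illustrative Python, not the 
original C code): 
<PRE>
def best_threshold(values, classes, minobjs=2):
    # values: the known values of one continuous attribute;
    # classes: the matching class labels.  The test is only attempted
    # when enough known-valued cases exist (at least twice MINOBJS,
    # option m, default 2), as noted under verbosity level 2.
    if len(values) &lt; 2 * minobjs:
        return None
    pairs = sorted(zip(values, classes))
    best = None
    for i in range(len(pairs) - 1):
        cut = pairs[i][0]            # thresholds are occurring values
        if cut == pairs[i + 1][0]:
            continue                 # no item separates the two branches
        left = [c for v, c in pairs if v &lt;= cut]
        right = [c for v, c in pairs if v &gt; cut]
        g = gain_and_ratio(left + right, [left, right])[0]
        if best is None or g &gt; best[0]:
            best = (g, cut)
    return best                      # (gain, cut) as in the trace
</PRE>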
  <P>The verbose output shows the number of items from which a tree is being 
  constructed, as well as the total weight of these items. The weight of an item 
  is the probability that the item reaches this point in the tree; it is less 
  than 1.0 for items with an unknown value of some previously tested attribute. 
  <P>Shown for the best attribute are: 
  <PRE>
    cut  -  threshold (continuous attributes only)
    inf  -  the potential information of a split
    gain -  the gain in information of a split
    val  -  the gain or the gain/inf ratio (depending on the selection criterion)
  </PRE>
  <P>Also shown is the proportion of items at this point in the tree with an 
  unknown value for that attribute. Items with an unknown value for the 
  attribute being tested are distributed across all values in proportion to the 
  relative frequency of these values in the set of items being tested. 
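  <P>A sketch of that proportional distribution (illustrative Python; the 
  representation of an item as a dict with a "weight" field is an assumption 
  made here): 
  <PRE>
def distribute(items, attribute):
    # Partition items on a discrete attribute; items with an unknown
    # value go down every branch with correspondingly reduced weight.
    known = [it for it in items if it[attribute] is not None]
    unknown = [it for it in items if it[attribute] is None]
    total_known = sum(it["weight"] for it in known)
    branches = {}
    for it in known:
        branches.setdefault(it[attribute], []).append(it)
    for value, branch in branches.items():
        # Relative frequency of this value among the known items.
        fraction = sum(it["weight"] for it in branch) / total_known
        for it in unknown:
            copy = dict(it)
            copy["weight"] = it["weight"] * fraction   # now under 1.0
            branch.append(copy)
    return branches
  </PRE>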
  <P>If no split gives a gain in information, the set of items is made into a 
  leaf labelled with the most frequent class of items reaching this point in the 
  tree, and the message: 
  <P><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TT>no sensible splits&nbsp;&nbsp;<I>r1</I>/<I>r2</I> 
  <P>is given, where <I>r1</I> is the total weight of items reaching this point 
  in the tree, and <I>r2</I> is the weight of these which don't belong to the 
  class of this leaf. 
  <P>If a subtree is found to misclassify at least as many items as does 
  replacing the subtree with a leaf, then the subtree is replaced and the 
  following message given: 
  <P><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</TT>Collapse tree for <I>n</I> items to leaf <I>c</I> 
  <P>where <I>c</I> is the class assigned to the leaf. 
  <P>
  <P><B>Verbosity level 2</B> 
  <P>When determining the best attribute to test, also shown are the threshold 
  (continuous attributes only), information gain and potential information for a 
  split on each of the attributes. If a test on a continuous attribute has no 
  gain or there are insufficient cases with known values of the attribute on 
  which to base a test, appropriate messages are given. (Sufficient here means 
  at least twice MINOBJS, an integer which defaults to 2 but can be set with 
  option <B>m</B>.) The average gain across all attributes is also shown. 
  <P>If subset tests on discrete attributes are being used, then for each 
  attribute examined, the combinations of attribute values formed (at each 
  stage, the combination with the highest gain or gain ratio) are shown, 
  together with the potential information, the gain, and the gain or gain 
  ratio of the resulting split. 
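  <P>The subset search is greedy: starting from one subset per value, the pair 
  of subsets whose merge scores best is combined at each stage, and each such 
  stage is what the trace reports. A rough sketch under those assumptions 
  (illustrative Python; <TT>evaluate_split</TT> is a hypothetical stand-in for 
  the gain or gain-ratio computation): 
  <PRE>
from itertools import combinations

def find_subsets(values, items, classes):
    subsets = [frozenset([v]) for v in values]
    current = evaluate_split(subsets, items, classes)
    while len(subsets) &gt; 2:
        # Try every pairwise merge and keep the best-scoring one.
        best = None
        for a, b in combinations(subsets, 2):
            trial = [s for s in subsets if s not in (a, b)] + [a | b]
            score = evaluate_split(trial, items, classes)
            if best is None or score &gt; best[0]:
                best = (score, a, b)
        if best[0] &lt; current:
            break                  # no merge improves the criterion
        current, a, b = best
        subsets = [s for s in subsets if s not in (a, b)] + [a | b]
    return subsets
  </PRE>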
  <P>
  <P><B>Verbosity level 3</B> 
  <P>When determining the best attribute to test, also shown is the frequency 
  distribution table showing the total weight of items of each class with: 
  <PRE>
    - each value of the attribute (discrete attributes), or
    - values below and above the threshold (continuous attributes), or
    - values in each subset formed so far (subset tests).
  </PRE>
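  <P>A minimal way to picture that table (illustrative Python; the "class" and 
  "weight" item fields are assumptions carried over from the earlier sketches): 
  <PRE>
from collections import defaultdict

def freq_table(items, attribute):
    # table[cls][value]: total weight of items of class cls that have
    # the given value of the attribute.
    table = defaultdict(lambda: defaultdict(float))
    for it in items:
        table[it["class"]][it[attribute]] += it["weight"]
    return table
  </PRE>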

<P><A name=lbAE>&nbsp;</A> 
<H2>TREE PRUNING</H2>
<P><B>Verbosity level 1</B> 
<P>After the entire decision tree has been constructed, <I>C4.5</I> recursively 
examines each subtree to determine whether replacing it with a leaf or a branch 
would be beneficial. (Note: the numbers treated below as counts of items 
actually refer to the total weight of the items mentioned.) 
<P>Each leaf is shown as: 
<P><I>c</I> (<I>r1</I>:<I>r2</I>/<I>r3</I>) 
<PRE>
  with:
    <I>c</I>   -  the most frequent class at the leaf
    <I>r1</I>  -  the number of items at the leaf
    <I>r2</I>  -  misclassifications at the leaf
    <I>r3</I>  -  <I>r2</I> adjusted for additional errors
</PRE>

<P>Each test is shown as: 
<P><I>att</I>:[<I>n1</I>%&nbsp;&nbsp;N=<I>r4</I>&nbsp;&nbsp;tree=<I>r5</I>&nbsp;&nbsp;leaf=<I>r6</I>+<I>r7</I>&nbsp;&nbsp;br[<I>n2</I>]=<I>r8</I>] 
<PRE>
  with:
    <I>n1</I>  -  percentage of items at this subtree that are misclassified
    <I>r4</I>  -  the number of items in the subtree
    <I>r5</I>  -  misclassifications of this subtree
    <I>r6</I>  -  misclassifications if this subtree were a leaf
    <I>r7</I>  -  adjustment to <I>r6</I> for additional errors
    <I>n2</I>  -  number of the largest branch
    <I>r8</I>  -  total misclassifications if the subtree is replaced by its largest branch
</PRE>

<P>If replacing the subtree with a leaf or with its largest branch would 
reduce the number of errors, the subtree is replaced by whichever alternative 
yields the fewest errors. 
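<P>The "additional errors" adjustments (<I>r3</I> and <I>r7</I>) come from 
C4.5's pessimistic error estimate (an upper confidence limit on the error 
count, with the confidence level set by option <B>c</B>), and the replacement 
rule then reduces to a three-way comparison. A minimal sketch (illustrative 
Python; the inputs are assumed to be the adjusted counts from the display 
above): 
<PRE>
def prune_choice(tree_errors, leaf_errors, branch_errors):
    # tree_errors:   adjusted errors of the subtree as it stands
    # leaf_errors:   r6 + r7, errors if replaced by a leaf
    # branch_errors: r8, errors if replaced by the largest branch
    replacement = min(leaf_errors, branch_errors)
    if replacement &lt; tree_errors:    # replacing reduces the errors
        if leaf_errors &lt;= branch_errors:
            return "leaf"
        return "largest branch"
    return "keep subtree"
</PRE>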
<P>
<P><A name=lbAF></A> 
<H2>THRESHOLD SOFTENING</H2>
<P><B>Verbosity level 1</B> 
<P>In softening the thresholds of tests on continuous attributes (option 
<B>p</B>), upper and lower bounds for each test are calculated. For each such 
test, the following are shown: 
<DL compact>
  <DT>*
  <DD>Base errors - the number of items misclassified when the threshold has its 
  original value 
  <DT>*
  <DD>Items - the number of items tested (with a known value for this attribute) 

  <DT>*
  <DD>se - the standard deviation of the number of errors 
</DL>
<P>For each of the different attribute values, shown are: 
<DL compact>
  <DT>*
  <DD>Val &lt;= - the attribute value 
  <DT>*
  <DD>Errors - the errors with this value as threshold 
  <DT>*
  <DD>+Errors - the increase over Base errors (i.e. Errors minus Base errors) 
  <DT>*
  <DD>+Items - the number of items between this value and the original threshold 

  <DT>*
  <DD>Ratio - the ratio of +Errors to +Items 
</DL>
<P>The lower and upper bounds are then calculated so that the number of 
errors with each as threshold would be one standard deviation above the base 
errors. 
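<P>A sketch of that bound-finding scan (illustrative Python; <TT>rows</TT> 
stands for the per-value table above, and the real program may interpolate 
between tabulated values, so treat this as an approximation): 
<PRE>
def soften(rows, base_errors, se, threshold):
    # rows: non-empty list of (value, errors with that value as
    # threshold), sorted by value.  Find the nearest values on either
    # side of the original threshold whose error count reaches
    # Base errors + one standard error.
    target = base_errors + se
    lower = min((v for v, e in rows if v &lt; threshold and e &gt;= target),
                key=lambda v: threshold - v, default=rows[0][0])
    upper = min((v for v, e in rows if v &gt; threshold and e &gt;= target),
                key=lambda v: v - threshold, default=rows[-1][0])
    return lower, upper
</PRE>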
<P><A name=lbAG>&nbsp;</A> 
</blockquote>
  </BODY></HTML>
