<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!-- ===================================================================
  File    : dtree.html
  Contents: Description of decision and regression tree programs
  Author  : Christian Borgelt
==================================================================== -->
<html>
<head>
<title>Decision and Regression Trees</title>
</head>

<!-- =============================================================== -->
<body bgcolor=white>
<h1><a name="top">Decision and Regression Trees</a></h1>
<h3>(A Brief Documentation of the Programs
    dti / dtp / dtx / dtr)</h3>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3>Contents</h3>
<ul type=disc>
<li><a href="#intro">Introduction</a></li>
<li><a href="#domains">Determining Attribute Domains</a></li>
<li><a href="#induce">Inducing a Decision Tree</a></li>
<li><a href="#prune">Pruning a Decision Tree</a></li>
<li><a href="#exec">Executing a Decision Tree</a></li>
<li><a href="#xmat">Computing a Confusion Matrix</a></li>
<li><a href="#rules">Extracting Rules from a Decision Tree</a></li>
<li><a href="#other">Other Decision Tree Examples</a></li>
<li><a href="#copying">Copying</a></li>
<li><a href="#download">Download</a></li>
<li><a href="#contact">Contact</a></li>
</ul>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="intro">Introduction</a></h3>
<p>I am sorry that there is no detailed documentation yet. Below you
can find a brief explanation of how to grow a decision tree with the
program <tt>dti</tt>, how to prune a decision tree with the program
<tt>dtp</tt>, how to execute a decision tree with the program
<tt>dtx</tt>, and how to extract rules from a decision tree with the
program <tt>dtr</tt>.
For a list of options, call the programs without
any arguments.</p>

<p>Enjoy,<br>
<a href="http://fuzzy.cs.uni-magdeburg.de/~borgelt/">Christian Borgelt</a></p>

<p>As a simple example for the explanations below I use the dataset
in the file <tt>dtree/ex/drug.tab</tt>, which lists 12 records of
patient data (sex, age, and blood pressure) together with an effective
drug (effective w.r.t. some unspecified disease). The contents of this
file are:</p>

<pre>
  Sex    Age Blood_pressure Drug
  male   20  normal         A
  female 73  normal         B
  female 37  high           A
  male   33  low            B
  female 48  high           A
  male   29  normal         A
  female 52  normal         B
  male   42  low            B
  male   61  normal         B
  female 30  normal         A
  female 26  low            B
  male   54  high           A
</pre>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="domains">Determining Attribute Domains</a></h3>

<p>To induce a decision tree for the effective drug, one first
has to determine the domains of the table columns using the program
<tt>dom</tt> (to be found in the table package, see below):</p>

<pre>
  dom -a drug.tab drug.dom
</pre>

<p>The program <tt>dom</tt> assumes that the first line of the table
file contains the column names. (This is the case for the example file
<tt>drug.tab</tt>.) If you have a table file without column names, you
can let the program read the column names from another file (using the
<tt>-h</tt> option) or you can let the program generate default names
(using the <tt>-d</tt> option), which are simply the column numbers.
The <tt>-a</tt> option tells the program to determine the column data
types automatically.
Thus the values of the <tt>Age</tt> column are
automatically recognized as integer values.</p>

<p>After <tt>dom</tt> has finished, the contents of the file
<tt>drug.dom</tt> should look like this:</p>

<pre>
  dom(Sex) = { male, female };
  dom(Age) = ZZ;
  dom(Blood_pressure) = { normal, high, low };
  dom(Drug) = { A, B };
</pre>

<p>The special domain <tt>ZZ</tt> represents the set of integer numbers,
the special domain <tt>IR</tt> (not used here) the set of real numbers.
(The double <tt>Z</tt> and the <tt>I</tt> in front of the <tt>R</tt>
are intended to mimic the bold face or double stroke font used in
mathematics to write the set of integer or the set of real numbers.
All programs that need to read a domain description also recognize
a single <tt>Z</tt> or a single <tt>R</tt>.)</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="induce">Inducing a Decision Tree</a></h3>

<p>To induce a decision tree using the <tt>dti</tt> program
(<tt>dti</tt> is simply an abbreviation of Decision Tree Induction),
type</p>

<pre>
  dti -a drug.dom drug.tab drug.dt
</pre>

<p>You need not tell the program <tt>dti</tt> that the <tt>Drug</tt>
column contains the class, since by default it uses the last column as
the class column (the <tt>Drug</tt> column is the last column in the file
<tt>drug.tab</tt>). If a different column contains the class, you can
specify its name on the command line using the <tt>-c</tt> option,
e.g.
<tt>-c Drug</tt>.</p>

<p>At first glance it seems superfluous to provide the
<tt>dti</tt> program with a domain description, since it is also
given the table file and thus can determine the domains itself.
However, without a domain description the <tt>dti</tt> program would be
forced to use all columns in the table file and to use them with the
automatically determined data types. Occasions may arise in which
you want to induce a decision tree from a subset of the columns or in
which the numbers in a column are actually coded symbolic values. In
such a case the domain file provides a way to tell the <tt>dti</tt>
program which columns to use and what their data types are. To ignore a
column, simply remove the corresponding domain definition from the
domain description file (or comment it out; C-style
(<tt>/* ... */</tt>) and C++-style (<tt>// ...</tt>) comments are
supported). To change the data type of a column, simply change the
domain definition.</p>

<p>By default the program <tt>dti</tt> uses information gain ratio as
the attribute selection measure. Other measures can be selected via
the <tt>-e</tt> option. Call <tt>dti</tt> with option <tt>-!</tt> for
a list of available attribute selection measures.</p>

<p>With the above command the induced decision tree is written to the
file <tt>drug.dt</tt>. The contents of this file should look like
this:</p>

<pre>
  dtree(Drug) =
  { (Blood_pressure)
    normal:{ (Age|41)
             &lt;:{ A: 3 },
             &gt;:{ B: 3 }},
    high  :{ A: 3 },
    low   :{ B: 3 }};
</pre>

<p>Since the <tt>-a</tt> option was given, the colons after the values
of an attribute (here, for example, the values of the attribute
<tt>Blood_pressure</tt>) are aligned.
This makes a decision tree easier
to read, but may result in larger than necessary output files.</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="prune">Pruning a Decision Tree</a></h3>

<p>Although it is not necessary for our simple example, the induced
decision tree can be pruned, i.e., simplified by removing some
decisions. This is done by invoking the program <tt>dtp</tt>
(<tt>dtp</tt> is simply an abbreviation for Decision Tree Pruning):</p>

<pre>
  dtp -a drug.dt drug_p.dt
</pre>

<p>The table the decision tree was induced from can be given as a
third argument to the <tt>dtp</tt> program. In this case an additional
way of pruning (replacing an inner node (an attribute test) by its
largest child) is enabled.</p>

<p>By default <tt>dtp</tt> uses confidence level pruning with a confidence
level of 0.5 as the pruning method. The confidence level can be
changed via the <tt>-p</tt> option (pruning parameter), the pruning
method via the <tt>-m</tt> option.
Call <tt>dtp</tt> without arguments
for a list of available pruning methods.</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="exec">Executing a Decision Tree</a></h3>

<p>An induced decision tree can be used to classify new data using the
program <tt>dtx</tt> (<tt>dtx</tt> is simply an abbreviation for
Decision Tree Execution):</p>

<pre>
  dtx -a drug.dt drug.tab drug.cls
</pre>

<p><tt>drug.tab</tt> is the table file (since we do not have special
test data, we simply use the training data), <tt>drug.cls</tt> is
the output file. After <tt>dtx</tt> has finished, <tt>drug.cls</tt>
contains (in addition to the columns appearing in the decision tree,
and, for preclassified data, the class column) a new column <tt>dt</tt>,
which contains the class that is predicted by the decision tree.
You can give this new column a different name with the <tt>-p</tt>
