<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!-- ===================================================================
  File    : bayes.html
  Contents: Description of full and naive Bayes classifiers
  Author  : Christian Borgelt
==================================================================== -->
<html>
<head>
<title>Full and Naive Bayes Classifiers</title>
</head>

<!-- =============================================================== -->

<body bgcolor=white>
<h1><a name="top">Full and Naive Bayes Classifiers</a></h1>
<h3>(A Brief Documentation of the Programs bci / bcx / bcdb / corr)</h3>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3>Contents</h3>
<ul type=disc>
<li><a href="#intro">Introduction</a></li>
<li><a href="#domains">Determining Attribute Domains</a></li>
<li><a href="#induce">Inducing a Bayes Classifier</a></li>
<li><a href="#exec">Executing a Bayes Classifier</a></li>
<li><a href="#xmat">Computing a Confusion Matrix</a></li>
<li><a href="#gendb">Generating a Database</a></li>
<li><a href="#corr">Computing Covariances and Correlation Coefficients</a></li>
<li><a href="#copying">Copying</a></li>
<li><a href="#download">Download</a></li>
<li><a href="#contact">Contact</a></li>
</ul>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="intro">Introduction</a></h3>

<p>I am sorry that there is no detailed documentation yet. Below you
can find a brief explanation of how to induce a full or naive Bayes
classifier with the program <tt>bci</tt> and how to execute a Bayes
classifier with the program <tt>bcx</tt>.
For a list of options, call the programs without any arguments.</p>

<p>Enjoy,<br>
<a href="http://fuzzy.cs.uni-magdeburg.de/~borgelt/">Christian Borgelt</a></p>

<p>As a simple example for the explanations below I use the dataset
in the file <tt>bayes/ex/drug.tab</tt>, which lists 12 records of
patient data (sex, age, and blood pressure) together with an effective
drug (effective w.r.t. some unspecified disease). The contents of this
file are:</p>

<pre>
   Sex    Age Blood_pressure Drug
   male   20  normal         A
   female 73  normal         B
   female 37  high           A
   male   33  low            B
   female 48  high           A
   male   29  normal         A
   female 52  normal         B
   male   42  low            B
   male   61  normal         B
   female 30  normal         A
   female 26  low            B
   male   54  high           A
</pre>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="domains">Determining Attribute Domains</a></h3>

<p>To induce a Bayes classifier for the effective drug, one first
has to determine the domains of the table columns using the program
<tt>dom</tt> (to be found in the table package, see below):</p>

<pre>
  dom -a drug.tab drug.dom
</pre>

<p>The program <tt>dom</tt> assumes that the first line of the table
file contains the column names. (This is the case for the example file
<tt>drug.tab</tt>.) If you have a table file without column names, you
can let the program read the column names from another file (using the
<tt>-h</tt> option) or you can let the program generate default names
(using the <tt>-d</tt> option), which are simply the column numbers.
The <tt>-a</tt> option tells the program to determine the column data
types automatically.
Thus the values of the <tt>Age</tt> column are
automatically recognized as integer values.</p>

<p>After <tt>dom</tt> has finished, the contents of the file
<tt>drug.dom</tt> should look like this:</p>

<pre>
  dom(Sex) = { male, female };
  dom(Age) = ZZ;
  dom(Blood_pressure) = { normal, high, low };
  dom(Drug) = { A, B };
</pre>

<p>The special domain <tt>ZZ</tt> represents the set of integers,
the special domain <tt>IR</tt> (not used here) the set of real numbers.
(The double <tt>Z</tt> and the <tt>I</tt> in front of the <tt>R</tt>
are intended to mimic the bold face or double stroke font used in
mathematics to write the set of integers or the set of real numbers.
All programs that need to read a domain description also recognize
a single <tt>Z</tt> or a single <tt>R</tt>.)</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="induce">Inducing a Bayes Classifier</a></h3>

<p>Induce a naive Bayes classifier with the <tt>bci</tt> program
(<tt>bci</tt> is simply an abbreviation of Bayes Classifier
Induction):</p>

<pre>
  bci -sa drug.dom drug.tab drug.nbc
</pre>

<p>You need not tell the program <tt>bci</tt> that the <tt>Drug</tt>
column contains the class, since by default it uses the last column as
the class column (the <tt>Drug</tt> column is the last column in the
file <tt>drug.tab</tt>). If a different column contains the class,
you can specify its name on the command line using the <tt>-c</tt>
option, e.g.
<tt>-c Drug</tt>.</p>

<p>At first glance it seems superfluous to provide the
<tt>bci</tt> program with a domain description, since it is also
given the table file and thus can determine the domains itself.
Without a domain description, however, the <tt>bci</tt> program would
be forced to use all columns in the table file and to use them with the
automatically determined data types. Occasions may arise in which
you want to induce a naive Bayes classifier from a subset of the
columns or in which the numbers in a column are actually coded
symbolic values. In such a case the domain file provides a way to
tell the <tt>bci</tt> program about the columns to use and their
data types. To ignore a column, simply remove the corresponding
domain definition from the domain description file or comment it out.
C-style (<tt>/* ... */</tt>) and C++-style (<tt>// ... </tt>)
comments are supported. To change the data type of a column, simply
change the domain definition.</p>

<p>By default the program <tt>bci</tt> uses all attributes given in
the domain description file. However, it can also be instructed to
simplify the classifier by using only a subset of the attributes.
This is done with the options <tt>-sa</tt> or <tt>-sr</tt> (s for
simplify), the first of which is used in the example above. With the
first option, attributes are added one by one (a for add) as long as
the classification result improves on the training data. With the
second option, attributes are removed one by one (r for remove) as
long as the classification result does not get worse.</p>

<p>With the above command the induced naive Bayes classifier is
written to the file <tt>drug.nbc</tt>.
The contents of this file
should look like this:</p>

<pre>
  nbc(Drug) = {
    prob(Drug) = {
      A: 6,
      B: 6 };
    prob(Age|Drug) = {
      A: N(36.3333, 161.867) [6],
      B: N(47.8333, 310.967) [6] };
    prob(Blood_pressure|Drug) = {
      A:{ high: 3, low: 0, normal: 3 },
      B:{ high: 0, low: 3, normal: 3 }};
  };
</pre>

<p>The prior probabilities of the class attribute's values are stated
first (as absolute frequencies), followed by the conditional
probabilities of the descriptive attributes. For symbolic attributes
a simple frequency table is stored. For numeric attributes a normal
distribution is used, which is stated as
<tt>N(&mu;, &sigma;<sup>2</sup>) [n]</tt>. Here &mu; is the expected
value, &sigma;<sup>2</sup> is the variance, and <tt>n</tt> is the number
of tuples these parameters were estimated from. <tt>n</tt> may differ
from the number of cases for the corresponding class, since for some
tuples the value of the attribute may be missing.</p>

<p>In this example, however, since there are no missing values, the
value of <tt>n</tt> is identical to the number of cases for the
corresponding class.</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="exec">Executing a Bayes Classifier</a></h3>

<p>An induced naive Bayes classifier can be used to classify new data
using the program <tt>bcx</tt> (<tt>bcx</tt> is simply an abbreviation
for Bayes Classifier eXecution):</p>

<pre>
  bcx -a drug.nbc drug.tab drug.cls
</pre>

<p><tt>drug.tab</tt> is the table file (since we do not have special
test data, we simply use the training data), <tt>drug.cls</tt> is the
output file.
After <tt>bcx</tt> has finished, <tt>drug.cls</tt>
contains (in addition to the columns appearing in the naive Bayes
classifier, and, for preclassified data, the class column) a new
column <tt>bc</tt>, which contains the class that is predicted by
the naive Bayes classifier. You can give this new column a different
name with the <tt>-c</tt> option, e.g. <tt>-c predicted</tt>.</p>

<p>If the table contains preclassified data and the name of the
column containing the preclassification is the same as for the
training data, the error rate of the naive Bayes classifier is
determined and printed to the terminal.</p>

<p>The contents of the file <tt>drug.cls</tt> should look like this:</p>
