<HTML> <HEAD> <!--SCRIPT LANGUAGE="JavaScript" SRC="http://a1835.g.akamai.net/f/1835/276/3h/www.netlibrary.com/include/js/dictionary_library.js"></SCRIPT> <SCRIPT LANGUAGE="JavaScript"> if (!opener){document.onkeyup=parent.turnBookPage;} </SCRIPT!--> <META HTTP-EQUIV="Cache-Control" CONTENT="no-cache"> <META HTTP-EQUIV="Pragma" CONTENT="no-cache"> <META HTTP-EQUIV="Expires" CONTENT="-1"><META http-equiv="Content-Type" content="text/html; charset=windows-1252"><SCRIPT>var PrevPage="Page_103";var NextPage="Page_105";var CurPage="Page_104";var PageOrder="115";</SCRIPT> <TITLE>Document</TITLE> </HEAD> <BODY BGCOLOR="#FFFFFF"><CENTER><TABLE BORDER=0 WIDTH=100% CELLPADDING=0><TR><TD ALIGN=CENTER> <TABLE BORDER=0 CELLPADDING=2 CELLSPACING=0 WIDTH=100%> <TR> <TD ALIGN=LEFT><A HREF='Page_103.html'>Previous</A></TD> <TD ALIGN=RIGHT><A HREF='Page_105.html'>Next</A></TD> </TR> </TABLE></TD></TR><TR><TD ALIGN=LEFT><P><A NAME='JUMPDEST_Page_104'/><A NAME='{38D}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH='100%'><TR><TD ALIGN=RIGHT><FONT FACE='Times New Roman, Times, Serif' SIZE=2 COLOR=#FF0000>Page 104</FONT></TD></TR></TABLE><A NAME='{38E}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR> <TD ROWSPAN=5></TD> <TD COLSPAN=3 HEIGHT=12></TD> <TD ROWSPAN=5></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR><TD></TD> <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>Rules are not notably more compact than a tree, though, and in fact they are just what you would get by reading rules <I>from</I> a tree. In some situations, however, rules are considerably more compact than trees, particularly if it is possible to have a default rule that covers cases not specified by the other rules. One reason rules are popular is that each rule seems to represent an independent insight into the database. New rules can be added to an existing rule set without disturbing the ones already there, whereas making an addition to a tree structure reshapes the whole tree. 
Rules can often achieve surprisingly high accuracy, perhaps because the structure that many real-world data sets exhibit is quite rudimentary, and interactions between the attributes can be safely ignored.</FONT></TD><TD></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR> <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{38F}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR> <TD ROWSPAN=5></TD> <TD COLSPAN=3 HEIGHT=17></TD> <TD ROWSPAN=5></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR><TD></TD> <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3><B>Automatic Segmentation</B></FONT></TD><TD></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR> <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{390}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR> <TD ROWSPAN=5></TD> <TD COLSPAN=3 HEIGHT=12></TD> <TD ROWSPAN=5></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR><TD></TD> <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>The process of stratification or classification is automated by machine-learning algorithms on the basis of the data, rather than the hypothesis, or hunch, of the user. Attributes are chosen to split the set, and a tree is built for each subset, until all members of each subset belong to the same class. Which algorithm is best for you depends on the nature of your data mining problem.</FONT></TD><TD></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR> <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{391}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR> <TD ROWSPAN=5></TD> <TD COLSPAN=3 HEIGHT=12></TD> <TD ROWSPAN=5></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR><TD></TD> <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>Keep in mind what each of the algorithms was designed to do and how it operates on the data. CHAID was designed to detect statistical relationships between variables and is restricted to analysis of categorical variables, such as low, medium, and high. 
CART is designed to measure the degree of diversity of variables in making its splits; it looks to see which variable is the best splitter or separator in a database (in other words, which website customer attribute is the best "splitter" between your buyers and browsers). ID3 and C4.5 use the concept of "information gain" to make these splitting decisions: which online customer attributes tell you the most about buyers and nonbuyers on your website. The information conveyed by an outcome depends on its probability and is measured in bits as minus the logarithm to base 2 of that probability. The CART procedures induce strictly binary trees, while ID3 partitions by attribute values. CART also uses a statistical resampling technique for both error estimation and cost-complexity pruning. Lastly, keep in mind that the output of each algorithm differs somewhat: almost all generate some type of decision tree, and most also generate IF/THEN rules in varying numbers. Remember that CART can generate only binary trees, whereas CHAID, ID3, C4.5, and C5.0 can produce multiple-branch trees.</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3 COLOR=#FFFF00><!-- break --></FONT></TD><TD></TD></TR><TR> <TD COLSPAN=3></TD></TR><TR> <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{392}'/></FORM></P></TD></TR></TABLE><P><FONT SIZE=0 COLOR=WHITE></CENTER><A NAME="bottom"> </A><!-- netLibrary.com Copyright Notice --> </BODY></HTML>
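The "information gain" idea used by ID3 and C4.5 can be illustrated with a minimal Python sketch. This is not the code of any particular product; the function names, the visitor records, and the attribute names (referrer, returning, outcome) are hypothetical and chosen only for illustration. The sketch measures information in bits as minus the base-2 logarithm of a probability, then scores an attribute by how much it reduces the average information needed to tell buyers from browsers.

```python
import math
from collections import Counter

def surprisal_bits(p):
    """Information, in bits, of an outcome with probability p: -log2(p)."""
    return -math.log2(p)

def entropy_bits(labels):
    """Average information (entropy), in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum((n / total) * surprisal_bits(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting
    the rows on every distinct value of the given attribute."""
    before = entropy_bits([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy_bits(g) for g in groups.values())
    return before - after

# Hypothetical website visitor records (illustrative data only).
visitors = [
    {"referrer": "search", "returning": "yes", "outcome": "buyer"},
    {"referrer": "search", "returning": "no",  "outcome": "buyer"},
    {"referrer": "banner", "returning": "yes", "outcome": "browser"},
    {"referrer": "banner", "returning": "no",  "outcome": "browser"},
]

# Score each candidate attribute and pick the one with the highest gain.
gains = {a: information_gain(visitors, a, "outcome") for a in ("referrer", "returning")}
best_splitter = max(gains, key=gains.get)
```

In this made-up data, referrer separates buyers from browsers perfectly while returning tells you nothing, so referrer would be chosen as the first split. Note that the split creates one branch per attribute value, which is the ID3-style partition described above; a CART-style binary split would instead divide the values into exactly two groups.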