<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0057)http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm -->
<HTML><HEAD><TITLE>Analysis of Data Mining Algorithms</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1106" name=GENERATOR></HEAD>
<BODY>
<CENTER><FONT size=+2>Analysis of Data Mining Algorithms</FONT></CENTER>
<CENTER>by</CENTER>
<CENTER>Karuna Pande Joshi</CENTER><PRE>&nbsp;Copyright Karuna Pande Joshi 1997.</PRE>
<HR>

<P><B><FONT size=+1>Table of Contents</FONT></B> 
<OL>
  <LI><A 
  href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#Introduction">Introduction</A> 

  <LI><A 
  href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#class_algo">Classification 
  Algorithms</A> 
  <OL>
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#data_class_mtd">Data 
    Classification Methods</A> 
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#data_abst">Data 
    Abstraction</A> 
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#class_rule_learn">Classification-rule 
    learning</A> 
    <OL>
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#ID3 algorithm">ID3 
      algorithm</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#c4.5_algo">C4.5 
      algorithm</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#sliq_algo">SLIQ 
      algorithm</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#other_algos">Other 
      Algorithms</A> </LI></OL>
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#parallel_algos">Parallel 
    Algorithms</A> 
    <OL>
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#basic_idea">Basic 
      Idea</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#synch_tree">Synchronous 
      Tree Construction Approach</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#partition_tree">Partitioned 
      Tree Construction Approach</A> </LI></OL></LI></OL>
  <LI><A 
  href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#assoc_rules">Association 
  Rule Algorithms</A> 
  <OL>
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#apriori">Apriori 
    Algorithm</A> 
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#dist_parallel">Distributed/Parallel 
    Algorithms</A> </LI></OL>
  <LI><A 
  href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#seq_analysis">Sequential 
  Analysis</A> 
  <OL>
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#seq_patterns">Sequential 
    Patterns</A> 
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#algo_seq_pattern">Algorithms 
    for Finding Sequential Patterns</A> 
    <OL>
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#algorithm">Algorithm</A> 

      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#aprioriall">Algorithm 
      AprioriAll</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#apriorisome">Algorithm 
      AprioriSome</A> 
      <LI><A 
      href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#relative_perform">Relative 
      Performance of the two Algorithms</A> </LI></OL></LI></OL>
  <LI><A 
  href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#Conclusion">Conclusion</A> 

  <OL>
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#compare_algos">Comparing 
    Algorithms</A> 
    <LI><A 
    href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#Drawbacks">Drawbacks 
    of Existing Algorithms</A> </LI></OL></LI></OL><A 
href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#References">REFERENCES</A> 
<BR><A 
href="http://userpages.umbc.edu/~kjoshi1/data-mine/proj_rpt.htm#Appendix">APPENDIX 
A : URL Listing</A> 
<P>
<HR width="100%">

<CENTER><A name=Introduction></A><B><U><FONT color=#000000><FONT 
size=+2>Introduction</FONT></FONT></U></B></CENTER>
<P>With an enormous amount of data stored in databases and data warehouses, it 
is increasingly important to develop powerful tools for analysis of such data 
and mining interesting knowledge from it. Data mining is the process of inferring 
knowledge from such huge volumes of data. Data mining has three major components: 
<I>Classification</I> (or clustering), <I>Association Rules</I>, and <I>Sequence 
Analysis</I>. 

<P>By simple definition, in classification/clustering we analyze a set of data 
and generate a set of grouping rules which can be used to classify future data. 
For example, one may classify diseases and provide the symptoms which describe 
each class or subclass. This has much in common with traditional work in 
statistics and machine learning. However, there are important new issues which 
arise because of the sheer size of the data. One of the important problems in 
data mining is classification-rule learning, which involves finding rules 
that partition given data into predefined classes. In the data mining domain 
where millions of records and a large number of attributes are involved, the 
execution time of existing algorithms can become prohibitive, particularly in 
interactive applications. This is discussed in detail in <I>Chapter 2</I>. 
<P>An association rule is a rule which implies certain association relationships 
among a set of objects in a database. In this<I> </I>process we discover a set 
of association rules at multiple levels of abstraction from the relevant set(s) 
of data in a database. For example, one may discover a set of symptoms often 
occurring together with certain kinds of diseases and further study the reasons 
behind them. Since finding interesting association rules in databases may 
disclose some useful patterns for decision support, selective marketing, 
financial forecast, medical diagnosis, and many other applications, it has 
attracted a lot of attention in recent data mining research. Mining association 
rules may require iterative scanning of large transaction or relational 
databases, which is quite costly in processing. Therefore, efficient mining of 
association rules in transaction and/or relational databases has been studied 
substantially. This is discussed in detail in <I>Chapter 3</I>. 
<P>In <I>sequential analysis</I>, we seek to discover patterns that occur in 
sequence. This deals with data that appear in separate transactions (as opposed 
to data that appear in the same transaction, as in the case of association). For 
example: if a shopper buys item A in the first week of the month, then s/he buys 
item B in the second week, and so on. This is discussed in detail in <I>Chapter 
4</I>. 
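<P>The week-by-week example can be made concrete by checking whether one customer's chronologically ordered transactions support a sequential pattern. The helper below is a hypothetical sketch, not an algorithm from the report: 

```python
def supports(customer_transactions, pattern):
    """True if the pattern's itemsets occur, in order, in distinct
    (chronologically ordered) transactions of a single customer."""
    i = 0
    for basket in customer_transactions:  # baskets in time order
        if i < len(pattern) and pattern[i] <= basket:  # subset test
            i += 1
    return i == len(pattern)

# Customer buys A in week 1, then B in week 2: supports <A, B> but not <B, A>.
weeks = [{"A"}, {"B", "C"}]
print(supports(weeks, [{"A"}, {"B"}]))   # True
print(supports(weeks, [{"B"}, {"A"}]))   # False
```

Because each element of the pattern must fall in a later transaction than the previous one, a single basket containing both A and B does not support the sequence &lt;A, B&gt;.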
<P>There are many algorithms proposed that try to address the above aspects of 
data mining. Compiling a list of all algorithms suggested or used for these 
problems is an arduous task. I have thus limited the focus of this report to 
only some of the algorithms that have had more success than the others. I 
have included a list of URLs in Appendix A which can be referred to for more 
information on data mining algorithms. 
<P>
<HR>

<HR>

<CENTER><A name=class_algo></A><B><U><FONT size=+2>Classification 
Algorithms</FONT></U></B></CENTER>
<P>In data classification, one develops a description or model for each class in 
a database, based on the features present in a set of class-labeled training 
data. Many data classification methods have been studied, including 
decision-tree methods such as C4.5, statistical methods, neural networks, rough 
sets, and database-oriented methods. 
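<P>As a small illustration of how decision-tree methods such as C4.5 choose a split attribute, the sketch below computes entropy-based information gain over class-labeled records; the patient records and function names are hypothetical, and C4.5's actual criterion (gain ratio) refines this measure: 

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attr, label="class"):
    """Entropy reduction from partitioning the records on one attribute."""
    total = entropy([r[label] for r in records])
    n = len(records)
    for value in {r[attr] for r in records}:
        subset = [r[label] for r in records if r[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

patients = [
    {"fever": "yes", "class": "flu"},
    {"fever": "yes", "class": "flu"},
    {"fever": "no",  "class": "cold"},
    {"fever": "no",  "class": "flu"},
]
print(information_gain(patients, "fever"))
```

The attribute with the highest gain becomes the test at the current tree node, and the procedure recurses on each partition.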
<P><A name=data_class_mtd></A><FONT size=+1>Data Classification Methods</FONT> 
<P>In this paper, I discuss in detail some of the <I>machine-learning</I> 
algorithms that have been successfully applied in the initial stages of this 
field. The other methods listed below are only beginning to be applied to data 
mining and have not yet been very successful. This section briefly describes 
these other methods. Appendix A lists URLs that can be consulted for more 
information on these various methods. 
<UL>
  <LI><B>Statistical Algorithms</B> Statistical analysis systems such as SAS and 
  SPSS have been used by analysts to detect unusual patterns and explain 
  patterns using statistical models such as linear models. Such systems have 
  their place and will continue to be used. 
  <LI><B>Neural Networks </B>Artificial neural networks mimic the 
  pattern-finding capacity of the human brain and hence some researchers have 
  suggested applying Neural Network algorithms to pattern-mapping. Neural 
  networks have been applied successfully in a few applications that involve 
  classification. 
  <LI><B>Genetic algorithms</B> Optimization techniques that use processes such 
  as genetic combination, mutation, and natural selection in a design based on 
  the concepts of natural evolution. 
  <LI><B>Nearest neighbor method </B>A technique that classifies each record in 
  a dataset based on a combination of the classes of the k record(s) most 
  similar to it in a historical dataset. Sometimes called the k-nearest neighbor 
  technique. 
  <LI><B>Rule induction</B> The extraction of useful <I>if-then</I> rules from 
  data based on statistical significance. 
  <LI><B>Data visualization</B> The visual interpretation of complex 
  relationships in multidimensional data. </LI></UL><A name=data_abst></A><FONT 
size=+1>Data Abstraction</FONT> 
<P>Many existing algorithms suggest abstracting the test data before classifying 
it into various classes. There are several alternatives for doing abstraction 
before classification: A data set can be generalized to either a minimally 
generalized abstraction level, an intermediate abstraction level, or a rather 
high abstraction level. Too low an abstraction level may result in scattered 
classes, bushy classification trees, and difficulty in concise semantic 
interpretation, whereas too high a level may result in the loss of 
classification accuracy. The generalization-based multi-level classification 
process has been implemented in the DB-Miner system.[4] <BR>&nbsp; 
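<P>As an illustration of generalizing raw attribute values to a higher abstraction level, values can be climbed up a concept hierarchy; the hierarchy below is a hypothetical example, not taken from the DB-Miner system: 

```python
# Hypothetical concept hierarchy: city -> state -> country
hierarchy = {
    "Baltimore": "Maryland",
    "Rockville": "Maryland",
    "Maryland": "USA",
}

def generalize(value, levels):
    """Climb the concept hierarchy a given number of levels;
    values already at the top stay unchanged."""
    for _ in range(levels):
        value = hierarchy.get(value, value)
    return value

print(generalize("Baltimore", 1))  # Maryland
print(generalize("Baltimore", 2))  # USA
```

Classifying at the "state" level merges scattered city values into fewer, broader classes, at the cost of some classification accuracy, which is the trade-off described above.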
<P><A name=class_rule_learn></A><FONT size=+1>Classification-rule 
learning</FONT> 
<P>Classification-rule learning involves finding rules or decision trees that 
partition given data into predefined classes. For any realistic problem domain 
of the classification-rule learning, the set of possible decision trees is too 
large to be searched exhaustively. In fact, the computational complexity of 
finding an optimal classification decision tree is NP-hard. 
<P>Most of the existing induction-based algorithms use <B>Hunt</B>'s method as 
the basic algorithm.[2] Here is a recursive description of Hunt's method for 
constructing a decision tree from a set T of training cases with classes denoted 
{C<SUB>1</SUB>, C<SUB>2</SUB>, 
