From: GzLi (笑梨), Board: DataMining
Subject: Re: a review on Han's book
Posted at: Nanjing University Lily BBS (Sat Apr 20 23:58:54 2002), on-site mail

[ Quoting daniel (飞翔鸟): ]
:    http://cs.nju.edu.cn/people/zhouzh/zhouzh.files/publication/tnn02.pdf
The full text is as follows:

Data Mining: Concepts and Techniques by J. Han
and M. Kamber. (San Francisco, CA: Morgan Kaufmann,
2001, 550 pp., ISBN 1-55860-489-8). Reviewed by Zhi-Hua Zhou.
Written from a database perspective, the reviewed book is
organized into 10 chapters. Chapter 1 provides an introduction
to data mining. Chapter 2 focuses on data warehouses and online
analytical processing. Chapter 3 presents techniques for
preprocessing the data prior to mining. Chapter 4 introduces
the primitives of data mining that define the specification of a
data mining task. Chapter 5 is devoted to descriptive data
mining. Chapter 6 covers association rule mining. Chapter 7
presents techniques for classification and regression. Chapter 8
is devoted to cluster analysis. Chapter 9 focuses on data mining
in advanced data repository systems. Chapter 10 discusses
applications and challenges of data mining.
What is impressive about this book is that it covers almost
all aspects of the concepts and techniques of data mining. The
coverage is greatly extended by the bibliography, which
contains over 400 references. Useful bibliographic notes are
provided at the end of each chapter, giving a roadmap for
readers who wish to learn more from the literature. A
thorough index occupying 18 pages makes this book a good
reference book or handbook for data mining researchers.
Moreover, this book provides many algorithms geared to
the discovery of data patterns hidden in large, real databases.
These algorithms are presented in pseudo-code and are easy
to translate into concrete programming languages, which
may be especially helpful to data mining practitioners.
Furthermore, this book has several features that make it a
good choice as a textbook. First, the materials are
presented in a question-and-answer style. Second, each
chapter provides a set of exercises that could be used
as assignments. Third, a suite of slides is provided at the
book homepage (www.cs.sfu.ca/~han/dm_book).
However, in spite of its strengths and attractive features,
this book also has some drawbacks.
There are many typos and language errors in this book.
Although the authors have provided an erratum at the book
homepage, many errors are still not listed there.
Parts of the book's organization seem a bit chaotic. For
example, Section 4.4 discusses the architectures of data
mining systems, which has little relation to the main topic
of Chapter 4, that is, data mining primitives. It might be
better to merge this section into Section 2.6.
There are some ambiguities in this book. Examples are as
follows. In Section 2.2, some sales data are depicted as cubes
and called a data cube representation. But later it is said that
they are cuboids, and that the data cube is the lattice of
cuboids (a small sketch of this lattice follows this paragraph).
Such a description may confuse novices who are trying to
understand what a data cube is. It might be better to call
those a cube representation instead of a data cube
representation. In Section 4.1.4, novelty is described as an
objective interestingness measure. But it should at least be
mentioned that novelty is more often regarded as a subjective
interestingness measure. In Chapter 6, A ∪ B is used to denote
that events A and B occur simultaneously. However, A ∪ B usually
means that at least one of the events (A or B) occurs. It might
be better to replace A ∪ B with A and B. In Chapter 7, prediction
is used in parallel with classification. But in general,
approximating a real-valued function is referred to as regression
rather than prediction, and prediction encompasses both
classification and regression.
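
A minimal Python sketch of the cuboid lattice mentioned above, assuming
illustrative dimension names (time, item, location) that are not taken
from the book:

    from itertools import combinations

    # A data cube over n dimensions is the lattice of 2^n cuboids,
    # where each cuboid aggregates (groups by) one subset of the
    # dimensions. The dimension names below are only illustrative.
    dimensions = ("time", "item", "location")

    for k in range(len(dimensions), -1, -1):
        for subset in combinations(dimensions, k):
            name = ", ".join(subset) if subset else "apex (everything aggregated)"
            print(f"{k}-D cuboid: {name}")

    # The full 3-D cuboid (time, item, location) is the base cuboid;
    # the 0-D cuboid is the apex cuboid.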
Some claims in this book may not be entirely adequate. For
example, in Section 5.6.1, the authors list several differences
between descriptive mining methods and machine learning
methods, but some of them, such as the claim that descriptive
mining methods do not explicitly store the negative data while
machine learning methods do, are not fair.
Moreover, it may be better to add some material to this
book. In Section 4.1.4, subjective interestingness measures
such as actionability and unexpectedness could be described
so that readers can see what subjective measures look
like. In Section 7.4.2, the Laplacian correction to Naive
Bayesian learning should be presented; it is used when no
training data are available for some classes. For example,
suppose the class attribute buys is binary, and the instances are
described by two independent binary attributes, i.e., student
and credit. If all the training instances are positive (or all
negative), then the class label of a new instance (student, credit) should
be determined by comparing the following probabilities:

    ((#buys + 1) / (#total + 2))
      × ((#(student ∧ buys) + 1) / (#buys + 2))
      × ((#(credit ∧ buys) + 1) / (#buys + 2))

and

    ((#¬buys + 1) / (#total + 2))
      × ((#(student ∧ ¬buys) + 1) / (#¬buys + 2))
      × ((#(credit ∧ ¬buys) + 1) / (#¬buys + 2)),

where the number of training instances with property X is
denoted #X, the positive and negative values of attribute Y
are written Y and ¬Y respectively, and the total number of
training instances is denoted #total.
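
A minimal Python sketch of the Laplace-corrected estimates above,
assuming a tiny invented training set in which every instance is
positive (so the uncorrected estimates for the negative class would be
built from zero counts):

    # Minimal sketch of Laplace-corrected naive Bayes for the toy
    # example above: binary class `buys`, binary attributes `student`
    # and `credit`. The training rows are invented for illustration.
    train = [
        {"student": 1, "credit": 1, "buys": 1},
        {"student": 1, "credit": 0, "buys": 1},
        {"student": 0, "credit": 1, "buys": 1},
    ]

    def laplace_score(instance, label, data, attrs=("student", "credit")):
        """P(label) * prod_a P(a = instance[a] | label), Laplace-corrected."""
        total = len(data)
        in_class = [row for row in data if row["buys"] == label]
        n_class = len(in_class)
        score = (n_class + 1) / (total + 2)          # binary class: +1 / +2
        for a in attrs:
            n_match = sum(1 for row in in_class if row[a] == instance[a])
            score *= (n_match + 1) / (n_class + 2)   # binary attribute: +1 / +2
        return score

    new = {"student": 1, "credit": 1}   # the new instance (student, credit)
    p_pos = laplace_score(new, 1, train)
    p_neg = laplace_score(new, 0, train)
    print("predict buys" if p_pos > p_neg else "predict does not buy")

With the correction, both products are well defined even though the
negative class never appears in the training data.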
Overall, this is a good book that could benefit data mining
researchers, practitioners, and those who want to learn
something about data mining. It is also well suited for use as
a textbook. However, since there is still much room
for improvement, a second edition may be necessary for this
book to become an excellent, or even classic, work in
this area.

--
GzLi says:
     Joy and pain are coming and going both
     Be kind to yourself and others.

welcome to DataMining  http://DataMining.bbs.lilybbs.net
welcome to Matlab http://bbs.sjtu.edu.cn/cgi-bin/bbsdoc?board=Matlab

※ Origin: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 211.80.38.29]
