From: GzLi (笑梨), Board: DataMining
Subject: Re: a review on Han's book
Posted at: Nanjing University Lily BBS (Sat Apr 20 23:58:54 2002), on-site mail

[ Quoting daniel (飞翔鸟): ]
:    http://cs.nju.edu.cn/people/zhouzh/zhouzh.files/publication/tnn02.pdf
The full text is as follows:

Data Mining: Concepts and Techniques by J. Han
and M. Kamber. (San Francisco, CA: Morgan Kaufmann,
2001, 550 pp., ISBN 1-55860-489-8). Reviewed by Zhi-Hua Zhou.
Written from a database perspective, the reviewed book is
organized into 10 chapters. Chapter 1 provides an introduction
to data mining. Chapter 2 focuses on data warehouses and online
analytical processing. Chapter 3 presents techniques for
preprocessing the data prior to mining. Chapter 4 introduces
the primitives of data mining that define the specification of a
data mining task. Chapter 5 is devoted to descriptive data
mining. Chapter 6 covers association rule mining. Chapter 7
presents techniques for classification and regression. Chapter 8
is devoted to cluster analysis. Chapter 9 focuses on data mining
in advanced data repository systems. Chapter 10 discusses
applications and challenges of data mining.
What is impressive about this book is that it covers almost
all aspects of the concepts and techniques of data mining. The
coverage is greatly extended by the bibliography, which
contains over 400 references. Useful bibliographic notes are
provided at the end of each chapter, giving a roadmap for
readers who wish to learn more from the literature. A
thorough index occupying 18 pages makes this book a good
reference book or handbook for data mining researchers.
Moreover, this book provides many algorithms geared to
the discovery of data patterns hidden in large, real databases.
These algorithms are presented in pseudo-code and are easy
to translate into concrete programming languages, which
may be especially helpful to data mining practitioners.
Furthermore, this book has several features that make it a
good choice as a textbook. First, the materials are
presented in a question-and-answer style. Second, each
chapter provides a set of exercises that could be used
as assignments. Third, a suite of slides is provided at the
book homepage (www.cs.sfu.ca/~han/dm_book).
However, in spite of its strengths and attractive features,
this book also has some drawbacks.
There are many typos and language errors in this book.
Although the authors have provided an erratum at the book
homepage, many errors are still not listed there.
Parts of the book's organization seem a bit chaotic. For
example, Section 4.4 discusses the architectures of data
mining systems, which has little relation to the main topic
of Chapter 4, that is, data mining primitives. It might be
better to merge this section into Section 2.6.
There are some ambiguities in this book. Examples are as
follows. In Section 2.2, some sales data are depicted as cubes
and called a data cube representation. But later it is said that
they are cuboids, and that the data cube is the lattice of
cuboids (a small sketch of this lattice follows this paragraph).
Such a description may confuse novices who are trying to
understand what a data cube is. It might be better to call
those a cube representation instead of a data cube
representation. In Section 4.1.4, novelty is described as an
objective interestingness measure. But it should at least be
mentioned that novelty is more often regarded as a subjective
interestingness measure. In Chapter 6, A ∪ B is used to denote
that events A and B occur simultaneously. However, A ∪ B usually
means that at least one of the events (A or B) occurs. It might
be better to replace A ∪ B with A and B. In Chapter 7, prediction
is used in parallel with classification. But in general,
approximating a real-valued function is referred to as regression
rather than prediction, and prediction encompasses both
classification and regression.
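
A minimal Python sketch of the cuboid lattice mentioned above, assuming
illustrative dimension names (time, item, location) that are not taken
from the book:

    from itertools import combinations

    # A data cube over n dimensions is the lattice of 2^n cuboids,
    # where each cuboid aggregates (groups by) one subset of the
    # dimensions. The dimension names below are only illustrative.
    dimensions = ("time", "item", "location")

    for k in range(len(dimensions), -1, -1):
        for subset in combinations(dimensions, k):
            name = ", ".join(subset) if subset else "apex (everything aggregated)"
            print(f"{k}-D cuboid: {name}")

    # The full 3-D cuboid (time, item, location) is the base cuboid;
    # the 0-D cuboid is the apex cuboid.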
Some claims in this book may not be entirely adequate. For
example, in Section 5.6.1, the authors list several differences
between descriptive mining methods and machine learning
methods, but some of them, such as the claim that descriptive
mining methods do not explicitly store the negative data while
machine learning methods do, are not fair.
Moreover, it may be better to add some material to this
book. In Section 4.1.4, subjective interestingness measures
such as actionability and unexpectedness could be described
so that readers can see what subjective measures look
like. In Section 7.4.2, the Laplacian correction to Naive
Bayesian learning should be presented; it is used when no
training data are available for some classes. For example,
suppose the class attribute buys is binary, and the instances are
described by two independent binary attributes, i.e., student
and credit. If all the training instances are positive (or all
negative), then the class label of a new instance (student, credit) should
be determined by comparing the following probabilities:

    ((#buys + 1) / (#total + 2))
      × ((#(student ∧ buys) + 1) / (#buys + 2))
      × ((#(credit ∧ buys) + 1) / (#buys + 2))

and

    ((#¬buys + 1) / (#total + 2))
      × ((#(student ∧ ¬buys) + 1) / (#¬buys + 2))
      × ((#(credit ∧ ¬buys) + 1) / (#¬buys + 2)),

where the number of training instances with property X is
denoted #X, the positive and negative values of attribute Y
are written Y and ¬Y respectively, and the total number of
training instances is denoted #total.
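
A minimal Python sketch of the Laplace-corrected estimates above,
assuming a tiny invented training set in which every instance is
positive (so the uncorrected estimates for the negative class would be
built from zero counts):

    # Minimal sketch of Laplace-corrected naive Bayes for the toy
    # example above: binary class `buys`, binary attributes `student`
    # and `credit`. The training rows are invented for illustration.
    train = [
        {"student": 1, "credit": 1, "buys": 1},
        {"student": 1, "credit": 0, "buys": 1},
        {"student": 0, "credit": 1, "buys": 1},
    ]

    def laplace_score(instance, label, data, attrs=("student", "credit")):
        """P(label) * prod_a P(a = instance[a] | label), Laplace-corrected."""
        total = len(data)
        in_class = [row for row in data if row["buys"] == label]
        n_class = len(in_class)
        score = (n_class + 1) / (total + 2)          # binary class: +1 / +2
        for a in attrs:
            n_match = sum(1 for row in in_class if row[a] == instance[a])
            score *= (n_match + 1) / (n_class + 2)   # binary attribute: +1 / +2
        return score

    new = {"student": 1, "credit": 1}   # the new instance (student, credit)
    p_pos = laplace_score(new, 1, train)
    p_neg = laplace_score(new, 0, train)
    print("predict buys" if p_pos > p_neg else "predict does not buy")

With the correction, both products are well defined even though the
negative class never appears in the training data.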
Overall, this is a good book that could benefit data mining
researchers, practitioners, and those who want to learn
something about data mining. It is also well suited for use as
a textbook. However, since there is still much room
for improvement, a second edition may be necessary for this
book to become an excellent, or even classic, work in
this area.

--
GzLi says:
     Joy and pain are coming and going both
     Be kind to yourself and others.

welcome to DataMining  http://DataMining.bbs.lilybbs.net
welcome to Matlab http://bbs.sjtu.edu.cn/cgi-bin/bbsdoc?board=Matlab

※ Origin: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 211.80.38.29]
