Posted by: yaomc (白头翁&山东大汉), Board: DataMining
Subject: Statistics is the road from DM to KDD.
Posted at: Nanjing University Lily BBS (Thu Dec 20 15:56:15 2001), on-site post
STATISTICS IS THE ROAD FROM
DATA MINING TO KNOWLEDGE DISCOVERY
by Arnold Goodman, Associate Director,
UCI Center for Statistical Consulting, agoodman@uci.edu
I have spent forty years as a statistician within information technology
and I founded the Annual Symposia on the Interface of Computing Science
and Statistics. When I discovered data mining in 1997, I hoped that
data mining and statistics would contribute to each other and benefit
from each other in solving client problems, by the cross-fertilization
of their approaches and results.
Although Interface '01 featured both data mining and bioinformatics,
KDD-2001 not only did not feature statistics, but also seemed arrogant
in its mistreatment of statistics. Still, most data miners remain
ignorant of statistics, most statisticians remain ignorant of data
mining, and they continue to sarcastically criticize each other:
arrogance and ignorance are a self-destructive combination.
My comments need to be negative enough to penetrate the
atmosphere of ignorance and arrogance, yet positive enough to
motivate data miners to approach statistics productively. I offer
suggestions for data miners to improve and for program planners to
improve KDD-2002.
Unfortunately, the anti-statistical attitude will keep data mining
from reaching its actual potential. Such an attitude also makes it more
likely that data mining will follow artificial intelligence and expert
systems through the typical computer technology stages of hype, then
hope, and finally has-been.
Data mining becomes knowledge discovery when it interprets and
assesses the mined patterns and relationships, according to
"Principles of Data Mining" by David Hand, Heikki Mannila and Padhraic
Smyth. This analytic journey, from the questions discovered in data
through answers developed from data to reasons provided by data, depends
upon statistical methods and thinking.
Randomness in data can be handled only by statistics, not by any
amount of database technology: making sense of data has always been
statistics, whether admitted or not. Data miners must stop relying
mostly on computational algorithms and denying a requirement for
statistical modeling.
My suggestion to data miners is a shift in attitude and a new
appreciation of the requirements for:
  - Broad statistical thinking, to achieve what technology alone cannot
    achieve
  - A broader perspective on problems and a conscious openness to ideas
    from other disciplines
I challenge key data miners to start a constructive dialogue with
Interface statisticians and others.
KDD-2001 had twice the attendance and cost, but half the breadth and
quality, of Interface '01: just ask any data miner or statistician who
happened to attend both Interface '01 and KDD-2001. Everyone I spoke
to who was competent in statistics was underwhelmed by the KDD
program.
In the three-and-a-half days, there was one "pearl of wisdom" and it
involved data mining with statistics: Russ Altman characterized the
new biology as going from an idea, through collected data that suggest
a hypothesis, to more data for testing whether that hypothesis is true
or false.
The panel on sampling had no one who was knowledgeable in sampling:
why did statisticians not want to participate? Of those presentations
I attended, around 1/3 were technically excellent, 1/3 were only good,
and 1/3 were either fair or poor. That leaves much room for improvement
in 2002.
My corresponding suggestion to KDD-2002 program planners is a focused
increase in effort:
  - To improve breadth, invite more presenters and relevant tutorials,
    with prior guidance
  - For higher quality, provide stronger guidance to both your keynoters
    and your panelists
--
Welcome to http://datamining.bbs.lilybbs.net.
※ Source: Nanjing University Lily BBS, bbs.nju.edu.cn [FROM: 202.204.36.15]