Posted by: yaomc (白头翁&山东大汉), Board: DataMining
Subject: Statistics is the road from DM to KDD.
Posted from: Nanjing University Lily BBS (Thu Dec 20 15:56:15 2001), on-site post

STATISTICS IS THE ROAD FROM 
DATA MINING TO KNOWLEDGE DISCOVERY 

by Arnold Goodman, Associate Director,
UCI Center for Statistical Consulting, agoodman@uci.edu

I have spent forty years as a statistician within information
technology and I founded the Annual Symposia on the Interface of
Computing Science and Statistics. When I discovered data mining in
1997, I hoped that data mining and statistics would contribute to each
other and benefit from each other in solving client problems, by the
cross-fertilization of their approaches and results.

Although Interface '01 featured both data mining and bioinformatics,
KDD-2001 not only failed to feature statistics but also seemed arrogant
in its mistreatment of statistics. Still, most data miners remain
ignorant of statistics, most statisticians remain ignorant of data
mining, and they continue to criticize each other sarcastically:
arrogance and ignorance are a self-destructive combination.

My comments need to be sufficiently negative to penetrate the
atmosphere of ignorance and arrogance, yet sufficiently positive to
motivate data miners to approach statistics productively. I offer
suggestions for data miners to improve and for program planners to
improve KDD-2002.

Unfortunately, the anti-statistical attitude will keep data mining
from reaching its full potential. Such an attitude also increases the
probability that data mining will follow artificial intelligence and
expert systems through the typical computer technology stages of hype,
then hope, and finally has-been.

Data mining becomes knowledge discovery when it interprets and
assesses the mined patterns and relationships, according to
"Principles of Data Mining" by David Hand, Heikki Mannila and Padhraic
Smyth. This analytic journey, from the questions discovered in data
through answers developed from data to reasons provided by data,
depends upon statistical methods and thinking.

Randomness in data can be handled only by statistics, not by any 
amount of database technology: making sense of data has always been 
statistics, whether admitted or not. Data miners must stop relying 
mostly on computational algorithms and denying a requirement for 
statistical modeling. 

My suggestion to data miners is a shift in attitude and a new 
appreciation of the requirements for: 

- Broad statistical thinking to achieve what technology alone will not
  be capable of achieving
- A broader perspective on problems and a conscious openness to ideas
  from other disciplines

I challenge key data miners to start a constructive dialogue with
Interface statisticians and others.

KDD-2001 had twice the attendance and cost, but half the breadth and
quality, of Interface '01: just ask any data miner or statistician who
happened to attend both Interface '01 and KDD-2001. Everyone I spoke
to who was competent in statistics was underwhelmed by the KDD
program.

In the three-and-a-half days, there was one "pearl of wisdom" and it
involved data mining with statistics: Russ Altman characterized the
new biology as going from an idea, through collected data that suggest
a hypothesis, to more data for testing whether this hypothesis is true
or false.

The panel on sampling had no one who was knowledgeable in sampling:
why did statisticians not want to participate? Of those presentations
I attended, around 1/3 were technically excellent, 1/3 were only good,
and 1/3 were either fair or poor. That leaves much room for
improvement in 2002.

My corresponding suggestion to KDD-2002 program planners is a focused 
increase in effort: 

- To improve breadth, invite more presenters and relevant tutorials,
  with prior guidance
- For higher quality, provide stronger guidance to both your keynoters
  and your panelists
--

Welcome to http://datamining.bbs.lilybbs.net.

※ Source: Nanjing University Lily BBS, bbs.nju.edu.cn [FROM: 202.204.36.15]
