From: yaomc (白头翁&山东大汉), Board: DataMining
Subject: Some discussion of Han's BOOK.
Posted at: Nanjing University Lily BBS (Fri Jan 18 15:03:54 2002), on-site post

(1) BY:wnqian/
First of all, it is a well-organized, well-written book. However, I
think this book is not suitable for an introductory course or for
research. You can see that the contents of the book were mostly
contributed by researchers in recent years. MOST of them have not been
applied in REAL, LARGE, IMPORTANT (MISSION-CRITICAL) applications. So
this book only introduces a bunch of concepts and techniques (as the
book is named). Whether these concepts or techniques will survive is
unknown.
Therefore, it MAY mislead beginners into thinking that these techniques
are classic ones, if they use this book as an introductory textbook. I
have read some posts here, and found that some people really have been
misled by it. On the other hand, this book is a good list of recent
research work on data mining. But it is not suitable for research. For
each concept or technique, the book doesn't examine it in detail. You
learn next to nothing from it if you want to do research on a topic. If
you want to do research on any topic, there should be hundreds of
papers waiting for you to read. From this viewpoint, the only useful
part is the references. Then you may ask: "Why did they write this
book?" I think the answer is that this book is for people who want to
do data mining from an application domain. They know the data, they
know the requirements, but they don't know the techniques. So they need
to find something out. And they can't read papers. So they MAY use this
book to choose the right techniques.
Another thing I should mention is that Han's background is English and
databases. So he can write good papers/books from a literary aspect.
BUT much of his work is not solid! His standpoint is databases, which
may not be the mainstream in data mining/KDD. The viewpoint of people
from machine learning or statistics may be totally different from his.
I do some research on data mining. My homepage is:
http://www.cs.fudan.edu.cn/ch/third_web/WebDB/wnqian_English.htm
Any discussion is welcome!
(2) BY: ALEX/
That's a good point. I have recently joined this group to find out if
there is anything/anybody helpful to me. I'm now working for Lucent R&D
in Shenzhen, and personally find that data mining/KDD will be a trend in
the Telecom Industry (TMN), and it may be one of my goals in the future.
However, most of the technical discussions here are somewhat too
theoretical, and too far away from real application/implementation.
We can keep the discussion going.
Best Regards,
Alex
(3) BY:huj3
The viewpoint makes sense. However, I still think Han's book gives a
good introduction to the data mining field. Why? Because it covers the
major topics, important algorithms and ideas in a very systematic way.
Although for each topic there might be thousands of relevant papers,
most of them share some important ideas, which come from the milestone
papers. Just think how many variations of BIRCH, a hierarchical
clustering technique, emerged after BIRCH was proposed. Additionally,
it provides a huge bunch of references, which are valuable for
beginners.
A good introductory book is not necessarily an encyclopedia, which 
covers everything. Actually every book, every article has its own 
point of view. It focuses on something, and mentions others a little 
bit. That's it! Think about survey papers: is there any paper of that
kind that successfully covers everything, such as all the methodologies
and algorithms in its field? If so, the field must be too narrow.
 
Data mining is a multidisciplinary field. Researchers in databases,
machine learning, pattern recognition and statistics each make their
own contributions and, meanwhile, benefit from each other. Thinking
about it carefully, it is not difficult to find the differences between
their research paradigms. That's another long story.
As to theory and application, I don't think we pay too much attention
to the theoretical stuff. Actually, we have only touched the surface of
the theory so far. Take the telecommunication industry as an example.
How to analyze a data stream within an acceptable time limit is really
a theoretical problem, which comes from the application field. My
suggestion is that we shouldn't treat the data mining field from an
engineering point of view, which has dominated research for a long time
in China.
(4) BY: Wnqian
    Yes, as I said, Han's book does give a brief introduction. But it
is also not suitable for beginners.
Its organization of the whole field of data mining is quite good. But
each chapter, such as Cluster Analysis (Chapter 8), is far from good
enough. Han tries to organize each part in a hierarchical way, which
does help at times, but fails to find the relationships between the
contents. You can see that in his book, each subsection usually
corresponds to one paper, but lacks cross-references between sections.
 :)
BIRCH is important since it is the first clustering algorithm that can
handle VLDB well. But I think Han's book doesn't describe it well in
context. Similarly, DBSCAN and OPTICS can work because they have
indexing support, but still, Han et al. don't emphasize it, which may
greatly mislead readers.
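To make the indexing point concrete, here is a minimal sketch of the
eps-neighborhood query that density-based clustering repeats for every
point. It is only an illustration: the uniform grid is a simple stand-in
for a real spatial index, and the names build_grid, region_query and
brute_force_query are hypothetical, not taken from the book or from any
library.

```python
from collections import defaultdict
from math import floor, hypot


def build_grid(points, eps):
    """Bucket 2-D points into square cells of side eps (a toy spatial index)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(i)
    return grid


def region_query(points, grid, eps, i):
    """Indices of points within eps of point i.

    Only the 3x3 block of cells around point i is scanned, because any
    point within eps must fall in one of those cells.
    """
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if hypot(points[j][0] - x, points[j][1] - y) <= eps:
                    hits.append(j)
    return sorted(hits)


def brute_force_query(points, eps, i):
    """Same query without an index: scans every point."""
    x, y = points[i]
    return [j for j, (px, py) in enumerate(points)
            if hypot(px - x, py - y) <= eps]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.9, 0.9), (10.0, 10.0)]
    g = build_grid(pts, eps=1.0)
    print(region_query(pts, g, 1.0, 0))
```

Both functions return the same neighbors; the difference is that the
brute-force version touches every point for every query, which is what
becomes unaffordable on a very large database without index support.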
A good introductory book should make it clear that application and
research are different, and what has been applied in real applications
and what is still developing. From this point of view, Han's book does
not satisfy the requirement.
As for the comments on theory, I agree with you. In fact, I am also
doing research work. And I like it. But I think I should remind people
that when reading Han's book, we should think more.
Furthermore, people with different backgrounds have different
viewpoints. You and I have academic backgrounds, so of course we take
the position that data mining is quite a good research field. But from
the side of people from companies, how can you get them to think about
the algorithms of BIRCH, CURE, ROCK, DBSCAN, and the related complexity
issues?
It seems that you have quite a lot of thoughts about clustering and
data streams. It's really interesting. Discussion is welcome!

(5) BY: Alex
Hi Weining and Huj,
I have to say that I only know data mining on the surface, not its
theory and algorithms in detail.
Actually, I came up with this idea from my work. The system we design
is something very much like a data warehouse, or is a data warehouse:
we collect all types of telecom data from different management systems
and store it in a database with a unified model.
The problem here is how to use the data. The current status can be
called GIGO (garbage in, garbage out). Currently, most of the management
systems in Telecom focus on the Equipment level or the Network level,
but the leading edge and trend of telecom management systems is using
the data at the Service level.
I think I should introduce here some concepts of the Telecom Network,
which is totally different from the Data Network (Internet or TCP/IP
network) most of us are familiar with. A Telecom Network is basically
made up of the Telephone/Data Network (connecting to end users, public
phone users, leased line users), the Access Network (the interface
between the core transport network and the end-user network, e.g. ATM,
GB router), and the Transport Network (SDH/DWDM, core network,
backbone). For example, China Telecom builds the core network and most
of the access network, while some of the ISPs lease bandwidth and
provide the access network to Internet users.
The Service Level systems are bound closely to the ISPs. What the ISPs
care about when they lease 100M of bandwidth from China Telecom is the
QoS, or in terminology, the FM and PM info (Fault Management and
Performance Management). All of this data is already available in
different systems provided by different vendors.
As I have said, this service-level requirement has become hot recently.
In China, this is really in the beginning stages. Just think about it:
years ago, when there was only China Telecom, how could ISPs request
such services? But now, as the separation of the Telecom Industry in
China becomes reality, this is getting hotter.

Here is how I turned to data mining: when there are millions of
circuits in the data warehouse, how do we find the most problematic
circuits (performance is low, with a lot of service-affecting alarms)
and locate where the problem is? (What is a circuit: you can think of
it as, e.g., a 2M leased line from Shanghai to Shenzhen that provides a
co-located company with a circuit to build its private network.)
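As a starting point, the "most problematic circuits" question can be
sketched as a simple ranking before any real mining is done. Everything
here is an assumption made for illustration: the CircuitStats fields,
the use of availability as the performance measure, and the equal
weighting of the two signals are hypothetical, not part of any actual
FM/PM system.

```python
from dataclasses import dataclass


@dataclass
class CircuitStats:
    circuit_id: str
    availability: float  # assumed PM measure, fraction between 0 and 1
    sa_alarms: int       # assumed FM measure: service-affecting alarm count


def rank_problem_circuits(stats, top_n=10, alarm_weight=0.5):
    """Return the top_n circuits with the worst combined score.

    The score mixes unavailability (1 - availability) with the alarm
    count normalized by the largest count seen, so both terms lie in
    the range 0..1. Higher score means more problematic.
    """
    if not stats:
        return []
    max_alarms = max(s.sa_alarms for s in stats) or 1  # avoid dividing by 0

    def score(s):
        perf_term = 1.0 - s.availability
        alarm_term = s.sa_alarms / max_alarms
        return (1.0 - alarm_weight) * perf_term + alarm_weight * alarm_term

    return sorted(stats, key=score, reverse=True)[:top_n]


if __name__ == "__main__":
    sample = [
        CircuitStats("A", availability=0.999, sa_alarms=1),
        CircuitStats("B", availability=0.90, sa_alarms=40),
        CircuitStats("C", availability=0.95, sa_alarms=5),
    ]
    for c in rank_problem_circuits(sample, top_n=3):
        print(c.circuit_id)
```

A fixed-weight score like this is plain statistics; the data mining
part starts when the weights, or the grouping of circuits that fail for
the same reason, have to be discovered from the data.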
At first it looked to me like a statistical issue, but when thinking
deeper at the implementation/application level, it becomes data mining.

I'm quite busy with my assignment and can only study it in my spare
time. I hope I can get clues from you guys, as most of you should know
more about DM than I do.


Best Regards,
Alex
--

Welcome to http://datamining.bbs.lilybbs.net.

※ Modified: yaomc edited this post on Jan 18 16:02:13. [FROM: 202.204.36.15]
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn. [FROM: 202.204.36.15]
