From: roamingo (漫步鸥), Board: DataMining
Subject: Re: Are there any clustering algorithms for XML???
Posted at: Nanjing University Lily BBS (Sun Nov 18 10:34:11 2001), on-site post
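There doesn't seem to be any clustering work aimed specifically at XML documents, but
there is plenty of research on clustering English text documents.
I just downloaded TextMinerDemo.zip from the National University of Singapore; it
clusters text automatically using the LargeItems [1] method. To download
TextMinerDemo.zip from within China, visit:
http://dctc.sjtu.edu.cn/adaptive/x/view/A9903022/Clustering_Software
More clustering papers are available at:
http://dctc.sjtu.edu.cn/adaptive/x/view/A9903022/Clustering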
[1] K. Wang, C. Xu, and B. Liu. Clustering transactions using large items.
CIKM-99, 1999. http://citeseer.nj.nec.com/wang99clustering.html (wang99clusteri
http://www.comp.nus.edu.sg/~wangk/TM/index.html
What is TextMiner?
TextMiner is a tool built by a research team at the School of Computing,
Dr. Wang Ke and student Xu Chu, for clustering a collection of
transactions (such as text documents).
How does TextMiner work?
Traditionally, the similarity of a cluster of transactions is defined in
terms of pairwise similarity of the transactions in the cluster. For
example, all distance-based metrics are a form of pairwise similarity.
Approaches using pairwise similarity suffer from two serious limitations.
First, computing the similarity of a cluster is very expensive because
every pair of objects in the cluster must be examined in general. Second,
pairwise similarity is neither sufficient nor necessary for a cluster of
transactions to be similar. TextMiner completely abandons the notion of
pairwise similarity and adopts a novel idea of measuring the cluster
similarity using large items. An item is large in a cluster (such as a
keyword in a group of text documents) if it is contained in some minimum
fraction of the transactions in the cluster. Our clustering criterion is
that there are many large items within each cluster and little overlap
of such items across different clusters. Our research and experiments show
that this criterion often produces better clusterings than traditional
methods. For more details, see our recent publication.
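To make the "large item" idea concrete, here is a rough Python sketch of my own. It is not TextMiner's code and not the cost function from [1]; the function names (large_items, summarize), the sample documents, and the 0.6 minimum fraction are all made up for illustration. It only finds the large items of each cluster and reports which of them are large in more than one cluster.

```python
from collections import Counter


def large_items(cluster, min_support):
    """Items contained in at least `min_support` (a fraction) of the transactions.

    `cluster` is a list of transactions, each an iterable of items.
    Hypothetical helper, not part of TextMiner.
    """
    # Count each item once per transaction, however often it repeats inside it.
    counts = Counter(item for transaction in cluster for item in set(transaction))
    threshold = min_support * len(cluster)
    return {item for item, n in counts.items() if n >= threshold}


def summarize(clusters, min_support):
    """Return the large items of every cluster and the items large in 2+ clusters."""
    per_cluster = [large_items(c, min_support) for c in clusters]
    # How many clusters is each large item large in?
    spread = Counter(item for items in per_cluster for item in items)
    shared = {item for item, n in spread.items() if n > 1}
    return per_cluster, shared


# Made-up keyword sets standing in for text documents.
docs_a = [{"xml", "parser", "schema"}, {"xml", "schema", "dtd"}, {"xml", "parser"}]
docs_b = [{"cluster", "distance", "xml"}, {"cluster", "kmeans"}, {"cluster", "distance"}]

per_cluster, shared = summarize([docs_a, docs_b], min_support=0.6)
print(per_cluster)  # large items found in each cluster
print(shared)       # large items that overlap across clusters
```

Under the criterion quoted above, a good clustering would give each cluster many large items and leave the shared set small. Note that each transaction is scanned only once per cluster, so nothing here compares pairs of documents.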
Features At A Glance
Automated data preprocessing
Adaptive number of clusters
Hierarchical clustering
Minimum user input parameters
Friendly user interface
System Requirements
Processor: Pentium 75 or better
RAM: 32MB
HDD: 10MB free disk space
OS: Windows NT/98/95
Try TextMiner
TextMiner 1.0 is now available for trial download. Please read the agreements
before you proceed to download.
--
Read digitally, save a tree.
※ Edited: roamingo edited this post on Nov 18 10:39:30. [FROM: 202.120.7.27]
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn. [FROM: 202.120.7.27]