From: yaomc (白头翁&山东大汉), Board: DataMining
Subject: [Collection] Is there a clustering algorithm for XML???
Site: Nanjing University Lily BBS (Sat Dec 29 16:37:47 2001), local post

ssos (存在与虚无) wrote on Sun Nov 18 09:54:18 2001:


roamingo (漫步鸥) wrote on Sun Nov 18 10:34:11 2001:

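There does not seem to be any clustering work aimed specifically at XML documents,
but there is plenty of research on clustering English-language documents.
I just downloaded TextMinerDemo.zip from the National University of Singapore; it
clusters text automatically using the LargeItems [1] method. To download
TextMinerDemo.zip from within China, visit: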
http://dctc.sjtu.edu.cn/adaptive/x/view/A9903022/Clustering_Software

For more papers on clustering, visit:
http://dctc.sjtu.edu.cn/adaptive/x/view/A9903022/Clustering

[1] K. Wang, C. Xu, and B. Liu. Clustering transactions using large items.
CIKM-99, 1999. http://citeseer.nj.nec.com/wang99clustering.html (wang99clusteri

http://www.comp.nus.edu.sg/~wangk/TM/index.html

What is TextMiner?

TextMiner is a tool built by a research team at School of Computing, 
Dr. Wang Ke and student Xu Chu, for clustering a collection of 
transactions (such as text documents). 
          
How does TextMiner work?

Traditionally, the similarity of a cluster of transactions is defined in 
terms of pairwise similarity of the transactions in the cluster. For 
example, all distance-based metrics are a form of pairwise similarity. 
Approaches using pairwise similarity suffer from two serious limitations. 
First, computing the similarity of a cluster is very expensive because 
every pair of objects in the cluster must be examined in general. Second, 
pairwise similarity is neither sufficient nor necessary for a cluster of 
transactions to be similar. TextMiner completely abandons the notion of
pairwise similarity and adopts a novel idea of measuring the cluster 
similarity using large items. An item is large in a cluster (such as a
keyword in a group of text documents) if it is contained in some minimum
fraction of the transactions in the cluster. Our clustering criterion is
that there are many large items within each cluster and little overlapping 
of such items across different clusters. Our research and experiments show 
that this criterion often produces better clusterings than traditional
methods. For more details, see our recent publication.
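
As a rough illustration of the large-item definition above, the Python sketch below finds the items that are large in each cluster and counts how many of them are large in more than one cluster. The function names, the `min_support` parameter, and the toy data are assumptions made for this example; this is not TextMiner's code.

```python
from collections import Counter

def large_items(cluster, min_support=0.5):
    """Items contained in at least a min_support fraction of the transactions."""
    if not cluster:
        return set()
    # Count each item once per transaction.
    counts = Counter(item for t in cluster for item in set(t))
    threshold = min_support * len(cluster)
    return {item for item, n in counts.items() if n >= threshold}

def overlap(clusters, min_support=0.5):
    """Number of items that are large in more than one cluster."""
    seen = Counter()
    for c in clusters:
        seen.update(large_items(c, min_support))
    return sum(1 for n in seen.values() if n > 1)

# Toy clusters of keyword sets (hypothetical data).
clusters = [
    [{"xml", "tree", "parser"}, {"xml", "tree"}, {"xml", "schema"}],
    [{"kmeans", "distance"}, {"kmeans", "centroid"}],
]
first_large = large_items(clusters[0])
shared = overlap(clusters)
```

Under the criterion quoted above, a good clustering would have many large items per cluster and a small `overlap` value.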
          
Features At A Glance

  Automated data preprocessing
  Adaptive number of clusters
  Hierarchical clustering
  Minimum user input parameters
  Friendly user interface 
          
System Requirements 

  Processor: Pentium 75 or better 
  RAM: 32MB 
  HDD: 10MB free disk space 
  OS: Windows NT/98/95 
          
Try TextMiner

TextMiner 1.0 is now available for trial download. Please read the agreements
before you proceed to download.


ssos (存在与虚无) wrote on Sun Nov 18 10:57:30 2001:

thanks


ssos (存在与虚无) wrote on Sun Nov 18 11:20:23 2001:

Boo-hoo~~~~~
I couldn't find the algorithm I need :(

Is there a method that clusters XML based on its tree structure??


roamingo (漫步鸥) wrote on Sun Nov 18 20:07:10 2001:

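How would you cluster by tree structure? By comparing the structural similarity
between trees? Should the text content be taken into account? And what is the
concrete application? It is a fairly forward-looking idea; as XML becomes more
common on the web, it should have good prospects.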



ssos (存在与虚无) wrote on Mon Nov 19 09:45:28 2001:

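I think the text content should be considered.
Two tree structures can differ in structure and also in content, and the
clustering would be based on both.
Are there any good approaches??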


roamingo (漫步鸥) wrote on Mon Nov 19 13:08:35 2001:

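The basic approach to clustering is to define a distance between two objects.
Ideally the distance satisfies the triangle inequality:
L(AB) <= L(AC)+L(BC)
Then apply a classic method such as k-means or hierarchical clustering.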
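
A minimal sketch of this idea in Python, assuming Euclidean distance and a naive single-linkage hierarchical method (both are choices made for this example, not something prescribed in the thread). The helper names `satisfies_triangle`, `single_linkage`, and `euclid` are hypothetical.

```python
from itertools import permutations

def satisfies_triangle(points, dist, tol=1e-12):
    """Check L(AB) <= L(AC) + L(BC) for every ordered triple of sample points."""
    return all(dist(a, b) <= dist(a, c) + dist(b, c) + tol
               for a, b, c in permutations(points, 3))

def single_linkage(points, dist, k):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

def euclid(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def squared(a, b):
    """Squared Euclidean distance, which does NOT satisfy the triangle inequality."""
    return euclid(a, b) ** 2

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
groups = single_linkage(pts, euclid, k=2)
```

The check only tests the sample points it is given, so a passing result is evidence rather than proof. It can still catch a bad choice: `squared` fails on the three collinear points `(0,)`, `(1,)`, `(2,)`.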

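This works conveniently for continuous, low-dimensional data. For discrete data,
one can use distance measures designed for discrete values. Alternatively, one can
use a method that does not depend on distance at all: define an optimization
objective, then optimize until reaching some local minimum.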
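
Both options can be sketched briefly in Python. `jaccard_distance` is one common discrete distance on sets (it satisfies the triangle inequality). `local_search` is a greedy optimizer for a distance-free objective; the `cost` function here is a toy objective assumed for this example (small items plus large items shared between clusters), and the `min_support` value is arbitrary.

```python
from collections import Counter

def jaccard_distance(a, b):
    """A discrete distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def cost(clusters, min_support=0.6):
    """Toy objective (an assumption): count small items plus overlapping large items."""
    small, large_total, large_union = set(), 0, set()
    for c in clusters:
        if not c:
            continue
        counts = Counter(item for t in c for item in t)
        large = {i for i, n in counts.items() if n >= min_support * len(c)}
        small |= set(counts) - large
        large_total += len(large)
        large_union |= large
    return len(small) + (large_total - len(large_union))

def local_search(transactions, k, cost_fn=cost):
    """Move one transaction at a time while the cost drops; stop at a local minimum."""
    assign = [i % k for i in range(len(transactions))]  # arbitrary starting point

    def build(a):
        return [[t for t, c in zip(transactions, a) if c == j] for j in range(k)]

    current = cost_fn(build(assign))
    improved = True
    while improved:
        improved = False
        for i in range(len(transactions)):
            for j in range(k):
                if j == assign[i]:
                    continue
                trial = assign[:i] + [j] + assign[i + 1:]
                c = cost_fn(build(trial))
                if c < current:  # accept only strict improvements
                    assign, current, improved = trial, c, True
    return assign, current

# Hypothetical transactions (sets of items).
data = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
labels, final_cost = local_search(data, k=2)
```

Because only strict improvements are accepted and the cost is a non-negative integer, the loop terminates, but the result depends on the starting assignment and is only a local minimum.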

For a specific application, such as clustering XML documents, the right choice
has to be worked out case by case.
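
One possible starting point for the XML case, offered only as a sketch: measure structure by the set of root-to-element tag paths, measure content by the set of words in the text nodes, and mix the two Jaccard distances. The functions `tag_paths`, `words`, `xml_distance` and the weight `alpha` are all hypothetical choices for this example, not an established XML clustering method.

```python
import re
import xml.etree.ElementTree as ET

def tag_paths(root):
    """Collect the set of root-to-element tag paths, e.g. 'book/title'."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def words(root):
    """Collect the lowercase words found in all text nodes."""
    return set(re.findall(r"\w+", " ".join(root.itertext()).lower()))

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, with two empty sets treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def xml_distance(xml_a, xml_b, alpha=0.5):
    """Weighted mix of structural and content distance (alpha is an assumed weight)."""
    ra, rb = ET.fromstring(xml_a), ET.fromstring(xml_b)
    structural = jaccard_distance(tag_paths(ra), tag_paths(rb))
    content = jaccard_distance(words(ra), words(rb))
    return alpha * structural + (1 - alpha) * content

doc1 = "<book><title>XML clustering</title><author>Li</author></book>"
doc2 = "<book><title>XML clustering</title><year>2001</year></book>"
d = xml_distance(doc1, doc2)
```

Since each Jaccard distance satisfies the triangle inequality, so does their weighted sum, which means the result can be fed to a distance-based method such as hierarchical clustering. This sketch ignores element order, repetition, and attributes, which a real application may need.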


