From fmurtagh@eso.org Wed Apr 14 17:01:47 1993
Replied: Fri, 23 Apr 93 01:52:34 MDT
Replied: "To: fmurtagh@eso.org Fcc: soft/cluster"
Return-Path: fmurtagh@eso.org
Delivery-Date: Wed, 14 Apr 93 08:02:04 MDT
Return-Path: fmurtagh@eso.org
Return-Path: <fmurtagh@eso.org>
Received: from st2.hq.eso.org by icsia.ICSI.Berkeley.EDU (4.1/HUB$Revision: 1.2 $) id AA00679; Wed, 14 Apr 93 08:01:56 PDT
Date: Wed, 14 Apr 93 17:01:47 +0200
From: fmurtagh@eso.org
Message-Id: <9304141501.AA04435@st2.hq.eso.org>
Received: by st2.hq.eso.org (4.1/ eso-1.1) id AA04435; Wed, 14 Apr 93 17:01:47 +0200
To: stolcke@ICSI.Berkeley.EDU
Subject: cluster-2.7
Status: O

Just a quick comment on your code, which I pulled over today (I had it
before at some stage), if I may.

Your algorithm has nothing to do with the single link method (of course both
are hierarchical clustering methods). Given that your dissimilarity between
cluster centers is a straight Euclidean (or other metric) distance, and given
that your definition of cluster centers is a weighted average of the subcluster
centers, your algorithm is an implementation of the Centroid or UPGMC method.
The latter stands for something like 'Unweighted pair group method - centroids'
- see PHA Sneath and RR Sokal, Numerical Taxonomy, Freeman, 1973.

With a small change in the dissimilarity formula between cluster centers, to
also take account of cluster sizes, you could have Ward's minimum variance
method. Alternatively, with a small change in the cluster update formula, to
*not* take cluster size into account, you get the Median or Gower or WPGMC
method.

Note that in the case of the Centroid and Median methods, unlike Ward's method,
you can run into inversion or reversal problems, i.e. nonmonotonic variation in
the agglomerative criterion value. You may be taking care of this situation
(I'm not 100% sure) in the part of your 'cluster.c' code where you have a
comment: 'distance to new node is smaller than previous nnb, ...'.
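
As an illustration of the three methods contrasted above (a sketch only, not
taken from 'cluster.c' or from the original message): the Python below keeps a
center and a size per cluster and changes either the dissimilarity (Ward) or
the center update (Median). The function name merge_heights and the
three-point data set are hypothetical, chosen so that the Centroid method
shows an inversion. Squared Euclidean distance is assumed throughout.

```python
def merge_heights(points, method="centroid"):
    """Naive agglomeration on cluster centers.

    Returns the criterion value of each successive merge.
    method is "centroid" (UPGMC), "median" (WPGMC) or "ward".
    """
    # Each cluster is (center, size); every point starts as its own cluster.
    clusters = [(tuple(float(x) for x in p), 1) for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                # Squared Euclidean distance between the two centers.
                d = sum((a - b) ** 2 for a, b in zip(ci, cj))
                if method == "ward":
                    # Ward: weight the distance by the cluster sizes.
                    d *= ni * nj / (ni + nj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        if method == "median":
            # Median (WPGMC): midpoint, cluster sizes ignored.
            center = tuple((a + b) / 2 for a, b in zip(ci, cj))
        else:
            # Centroid and Ward: size-weighted average of the centers.
            center = tuple((ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((center, ni + nj))
        heights.append(d)
    return heights


# Hypothetical data: a nearly equilateral triangle.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)]
print(merge_heights(points, "centroid"))
print(merge_heights(points, "ward"))
```

On this data the Centroid run merges first at 4.0 and then at about 3.24, so
the criterion decreases (an inversion). The Ward run merges at 2.0 and then at
about 2.16, so it stays monotonic.
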
With due allowances made for non-unique situations, inversions cannot arise in
the case of the Ward method.

The single link method (as also the complete link or average link methods)
doesn't allow for the definition of a cluster center (which can subsequently be
used for calculating inter-cluster dissimilarity, as in the case of your
implementation).

And... Another thing. In carrying out an agglomeration, you scan the list of
objects remaining and their nearest neighbors. Each pair of 'reciprocal nearest
neighbors' (i.e. x is NN of y, and y is NN of x) can be agglomerated with a
guarantee that there will be no untoward knock-on effects when determining the
new cluster centers, i.e. updating will be sufficiently local. More on this,
etc., is in my 'Multidimensional Clustering Algorithms', Compstat Lectures 4,
Physica-Verlag, 1985.

Hope you don't mind me sending you this... Good luck with further development
of your program.

Fionn Murtagh
fmurtagh@eso.org

From murtagh@stsci.edu Fri Apr 23 10:08:46 1993
Return-Path: murtagh@stsci.edu
Delivery-Date: Fri, 23 Apr 93 07:09:05 MDT
Return-Path: murtagh@stsci.edu
Return-Path: <murtagh@stsci.edu>
Received: from STSCI.EDU by icsia.ICSI.Berkeley.EDU (4.1/HUB$Revision: 1.2 $) id AA12948; Fri, 23 Apr 93 07:08:54 PDT
Received: Fri, 23 Apr 93 10:08:47 EDT from NEMESIS.STSCI.EDU (ceres.stsci.edu) by stsci.edu (4.1)
Received: Fri, 23 Apr 93 10:08:46 EDT by ceres.stsci.edu (4.1)
Date: Fri, 23 Apr 93 10:08:46 EDT
From: Fionn Murtagh <murtagh@stsci.edu>
Message-Id: <9304231408.AA09009@NEMESIS.STSCI.EDU>
To: stolcke@ICSI.Berkeley.EDU
Subject: re: cluster-2.7
Status: O

Many thanks for the message.

There are of course clustering methods (and other methods which could also be
useful - princ. comp. analysis, etc.) available in the mainstream commercial
statistical packages - SAS, SPSS, BMDP, SYSTAT - and the manuals of these
packages often give a good, succinct overview of whatever methods are
available. These are small selections of methods, though.

I myself use the S-Plus statistical language and graphics environment quite a
bit. At some point soon I hope to put some of my own code onto the Statlib
server (see below). S-Plus is commercial, and is an augmented version of S,
developed by AT&T Bell Labs.

There is a package (again, commercial) called CLUSTAN which specializes in
clustering methods, and is available from Clustan Ltd. in Edinburgh.

In biology and such fields, there are packages - one which I have not used is
NT-SYS, available from F.J. Rohlf in SUNY - Stony Brook.

A clustering-related list which periodically has information on code, or
requests, is CLASS-L at Bitnet node sbccvm. (Send one-liner: 'subscribe CLASS-L
your-name' to listserv@sbccvm.bitnet)

Statlib is the premier location for 'public domain' statistics code; it
includes quite a bit of material for S, and it also includes some clustering
etc. code. The Xgobi 3-d spin/rotation/etc. data viewer is standalone and
(hopefully still) is available there. Algorithms from 'Applied Statistics'
have some (limited) clustering code, and there are also some things which I
put there (in area 'multiv' if I remember correctly) in the past (Fortran...).
To get started using Statlib, send the one-line message: 'send index' to
'statlib@lib.stat.cmu.edu'. Statlib is ftp'able at 'lib.stat.cum.edu', login
with username 'statlib'. (This access mechanism is probably available through
gopher also.)

Hope this helps...
As you can see there is a lot around, but it might not be in the shape that
you want it.

All the best,

Fionn Murtagh
fmurtagh@eso.org
murtagh@stsci.edu

Return-Path: crr@cogsci.psych.utah.edu
Delivery-Date: Fri, 02 Jul 93 12:28:36 MDT
Return-Path: crr@cogsci.psych.utah.edu
Return-Path: <crr@cogsci.psych.utah.edu>
Received: from cs.utah.edu by icsia.ICSI.Berkeley.EDU (4.1/HUB$Revision: 1.2 $) id AA13104; Fri, 2 Jul 93 12:28:32 PDT
Received: from cogsci.psych.utah.edu by cs.utah.edu (5.65/utah-2.21-cs) id AA24685; Fri, 2 Jul 93 13:28:30 -0600
Received: by cogsci.psych.utah.edu (AIX 3.2/UCB 5.64/4.03) id AA08078; Fri, 2 Jul 1993 13:17:17 -0600
Message-Id: <9307021917.AA08078@cogsci.psych.utah.edu>
To: fmurtagh%eso.org@cs.utah.edu
Cc: crr@cogsci.psych.utah.edu, stolcke%icsi.berkeley.edu@cs.utah.edu
Subject: Re: statlib
In-Reply-To: Your message of Fri, 02 Jul 93 09:26:11 +0200. <9307020726.AA01893@st2.hq.eso.org>
Date: Fri, 02 Jul 93 13:17:17 -0700
From: crr@cogsci.psych.utah.edu

> It has just worked okay for me. Note that the node name is
> lib.stat.cmu.edu (you wrote 'cum' instead of 'cmu' = Carnegie-Mellon
> Univ.). Note also that this ftp access requires you to log in with
> username 'statlib' (so it's not anonymous ftp), and then you are asked
> to give your email address as a password.

Got it now. The problem was that there was a typo in the mail message
you sent to Andreas Stolcke in response to his clustering program.

Thanks for your help.

Charlie
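
The 'reciprocal nearest neighbors' idea from the first message (x is NN of y,
and y is NN of x) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name reciprocal_nn_pairs
is hypothetical, distances are squared Euclidean, ties are broken by lowest
index, and the brute-force search is not how 'cluster.c' or any particular
package does it. The sketch only finds the pairs; whether merging them in any
order reproduces the same hierarchy depends on the clustering method.

```python
def reciprocal_nn_pairs(points):
    """Return index pairs (i, j), i < j, that are each other's nearest neighbor."""

    def sqdist(a, b):
        # Squared Euclidean distance between two coordinate tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(points)
    # nn[i] is the index of the nearest other point to point i.
    nn = [
        min((j for j in range(n) if j != i),
            key=lambda j, i=i: sqdist(points[i], points[j]))
        for i in range(n)
    ]
    # A pair is reciprocal when each point names the other as its NN.
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]


# Hypothetical one-dimensional data.
points = [(0.0,), (1.0,), (3.0,), (10.0,), (11.0,)]
print(reciprocal_nn_pairs(points))
```

Here points 0 and 1 form one reciprocal pair and points 3 and 4 another, while
point 2 has point 1 as its nearest neighbor without being point 1's nearest
neighbor in return.
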