% cluster_test1.m: MATLAB program for a clustering problem, with notes on
% the functions it uses
% Hierarchical cluster analysis using the shortest-distance (single linkage) method
%  ZSCORE Standardized z score.
%     Z = ZSCORE(D) returns the deviation of each column of D from its mean 
%     normalized by its standard deviation. This is known as the Z score of D.
%     For a column vector V, z score is Z = (V-mean(V))./std(V)
%     Note that the result is the standardized data itself, not the
%     standard deviation of each column.
%     ZSCORE is commonly used to preprocess data before computing distances for 
%     cluster analysis.
%--------------------------------------------------------------------------
%  PDIST Pairwise distance between observations.
%     Y = PDIST(X) returns a vector Y containing the Euclidean distances
%     between each pair of observations in the M-by-N data matrix X.
%     (By default, the distance between each pair of rows of X is Euclidean.)
%     Rows of X correspond to observations, columns correspond to variables. 
%      Y is an (M*(M-1)/2)-by-1 vector, corresponding to the M*(M-1)/2 pairs 
%     of observations in X.
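%     In total, M*(M-1)/2 distances are produced. Note that each row of X is
%     treated as one item (for example, one parameter observed over N
%     samples), so an M-by-N matrix X represents M items with N sampled
%     values each.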
%--------------------------------------------------------------------------
%     Y = PDIST(X, DISTANCE) computes Y using DISTANCE.  Choices are:
%  
%         'euclidean'   - Euclidean distance
%         'seuclidean'  - Standardized Euclidean distance, each coordinate
%                         in the sum of squares is inverse weighted by the
%                         sample variance of that coordinate
%         'cityblock'   - City Block distance
%         'mahalanobis' - Mahalanobis distance
%         'minkowski'   - Minkowski distance with exponent 2
%         'cosine'      - One minus the cosine of the included angle
%                         between observations (treated as vectors)
%         'correlation' - One minus the sample correlation between
%                         observations (treated as sequences of values).
%         'hamming'     - Hamming distance, percentage of coordinates
%                         that differ
%         'jaccard'     - One minus the Jaccard coefficient, the
%                         percentage of nonzero coordinates that differ
%         function      - A distance function specified using @, for
%                         example @DISTFUN
%  
%     A distance function must be of the form
%  
%           function D = DISTFUN(XI, XJ, P1, P2, ...),
%  
%     taking as arguments two L-by-N matrices XI and XJ each of which
%     contains rows of X, plus zero or more additional problem-dependent
%     arguments P1, P2, ..., and returning an L-by-1 vector of distances D,
%     whose Kth element is the distance between the observations XI(K,:)
%     and XJ(K,:).
%  
%     Y = PDIST(X, DISTFUN, P1, P2, ...) passes the arguments P1, P2, ...
%     directly to the function DISTFUN.
%  
%     Y = PDIST(X, 'minkowski', P) computes Minkowski distance using the
%     positive scalar exponent P.
%  
%     The output Y is arranged in the order of ((1,2),(1,3),..., (1,M),
%     (2,3),...(2,M),.....(M-1,M)), i.e. the upper right triangle of the full
%     M-by-M distance matrix.  To get the distance between the Ith and Jth
%     observations (I < J), either use the formula Y((I-1)*(M-I/2)+J-I), or
%     use the helper function Z = SQUAREFORM(Y), which returns an M-by-M
%     square symmetric matrix, with the (I,J) entry equal to distance between
%     observation I and observation J.
%  
%     Example:
%  
%        X = randn(100, 5);                 % some random points
%        Y = pdist(X, 'euclidean');         % unweighted distance
%        Wgts = [.1 .3 .3 .2 .1];           % coordinate weights
%        Ywgt = pdist(X, @weucldist, Wgts); % weighted distance
%  
%        function d = weucldist(XI, XJ, W) % weighted euclidean distance
%        d = sqrt((XI-XJ).^2 * W');
%--------------------------------------------------------------------------
%  SQUAREFORM Square matrix formatted distance. 
%     Z = squareform(Y) converts the output of the PDIST function into a
%     symmetric square matrix, so that Z(i,j) denotes the distance between
%     objects i and j in the original data.
%--------------------------------------------------------------------------
%  LINKAGE Create hierarchical cluster tree.
%     Z = LINKAGE(Y) creates a hierarchical cluster tree, using the single
%     linkage algorithm.  The input Y is a distance matrix such as is
%     generated by PDIST.  Y may also be a more general dissimilarity
%     matrix conforming to the output format of PDIST.
%  
%     Z = LINKAGE(Y, method) creates a hierarchical cluster tree using
%     the specified algorithm. The available methods are:
%  
%        'single'   --- nearest distance (shortest-distance method)
%        'complete' --- furthest distance (longest-distance method)
%        'average'  --- average distance (average-distance method)
%        'centroid' --- center of mass distance (the output Z is meaningful
%                       only if Y contains Euclidean distances)
%        'ward'     --- inner squared distance
%  
%     Cluster information will be returned in the matrix Z with size m-1
%     by 3, where m is the number of observations in the original data. 
%     Column 1 and 2 of Z contain cluster indices linked in pairs
%     to form a binary tree. The leaf nodes are numbered from 1 to
%     m. They are the singleton clusters from which all higher clusters
%     are built. Each newly-formed cluster, corresponding to Z(i,:), is
%     assigned the index m+i, where m is the total number of initial
%     leaves. Z(i,1:2) contains the indices of the two component
%     clusters which form cluster m+i. There are m-1 higher clusters
%     which correspond to the interior nodes of the output clustering
%     tree. Z(i,3) contains the corresponding linkage distances between
%     the two clusters which are merged in Z(i,:). For example, if there
%     are a total of 30 initial nodes, and at step 12 cluster 5 and cluster 7
%     are combined and their distance at this time is 1.5, then row 12
%     of Z will be (5,7,1.5). The newly formed cluster will have the
%     index 12+30=42. If cluster 42 shows up in a later row, that means
%     this newly formed cluster is being combined again into some bigger
%     cluster.
%  
%     The centroid method can produce a cluster tree that is not monotonic.
%     This occurs when the distance from the union of two clusters to a third
%     cluster is less than the distance from either individual cluster to
%     that third cluster. In such a case, sections of the dendrogram change
%     direction.  This is an indication that another method should be used.
%--------------------------------------------------------------------------
clc;
clear;
X=[7.90 39.77 8.49 12.94 19.27 11.05 2.04 13.29;
    7.68 50.37 11.35 13.3 19.25 14.59 2.75 14.87;
    9.42 27.93 8.20 8.14 16.17 9.42 1.55 9.76;
    9.16 27.98 9.01 9.32 15.99 9.10 1.82 11.35;
    10.06 28.64 10.52 10.05 16.18 8.39 1.96 10.81];
%X1=xlsread('d:\test1.xls');X=X';
BX=zscore(X);   % standardized data matrix
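% The standardization can be cross-checked against the formula quoted in
% the ZSCORE notes above, Z = (V-mean(V))./std(V), applied column by column.
% This is only a sketch: nObs, BXcheck and maxDiff are names introduced
% here for illustration and are not part of the original script.
nObs = size(X,1);                        % number of rows of X (5 here)
BXcheck = (X - repmat(mean(X),nObs,1)) ./ repmat(std(X),nObs,1);
maxDiff = max(abs(BX(:) - BXcheck(:)))   % should be at rounding-error level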
Y=pdist(X)  % pairwise Euclidean distances (computed on the raw X; BX is not used here)
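% A sketch, assuming one wants distances on the standardized data instead:
% Euclidean distance on BX should agree with the 'seuclidean' option applied
% to the raw X, because the column means cancel in every pairwise
% difference. Ystd, Yseu and maxGap are illustrative names only; Y above is
% left unchanged.
Ystd = pdist(BX);                  % Euclidean distance on standardized data
Yseu = pdist(X,'seuclidean');      % standardized Euclidean distance on raw data
maxGap = max(abs(Ystd - Yseu))     % should be at rounding-error level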
D=squareform(Y) % Euclidean distance matrix
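% A sketch of the index formula from the PDIST notes above,
% Y((I-1)*(M-I/2)+J-I) for I < J, compared with the matching entry of D.
% nRows, I, J, k and pairCheck are names introduced here for illustration.
nRows = size(X,1);                 % M in the formula (5 here)
I = 2; J = 4;                      % any pair of rows with I < J
k = (I-1)*(nRows-I/2) + J - I;     % position of the pair (I,J) in Y
pairCheck = [Y(k) D(I,J)]          % both entries should be the same distance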
Z=linkage(Y,'single')    % single linkage (shortest-distance method)
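% A sketch of how the rows of Z can be read, following the LINKAGE notes
% above: row i joins clusters Z(i,1) and Z(i,2) at distance Z(i,3), and the
% new cluster gets the index m+i. nLeaves and step are illustrative names.
nLeaves = size(X,1);               % m, the number of initial leaves
for step = 1:size(Z,1)
    fprintf('step %d: clusters %d and %d merge at distance %.4f, forming cluster %d\n', ...
        step, Z(step,1), Z(step,2), Z(step,3), nLeaves+step);
end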
T=cluster(Z,'maxclust',3)  % equivalent to T=clusterdata(X,3)
%find(T==3)  % members of cluster 3
[H,leafT]=dendrogram(Z) % plot the dendrogram; leafT is kept separate so T is not overwritten
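% A sketch for comparing the other methods listed in the LINKAGE notes
% above on the same distance vector Y. Zcomplete, Zaverage and mergeHeights
% are names introduced here for illustration. The first merge always joins
% the closest pair of observations, so the first row of heights should
% agree across the three methods; later rows may differ.
Zcomplete = linkage(Y,'complete'); % furthest distance
Zaverage  = linkage(Y,'average');  % average distance
mergeHeights = [Z(:,3) Zaverage(:,3) Zcomplete(:,3)]  % columns: single, average, complete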
