% cluster_test1.m
% Hierarchical cluster analysis using the single-linkage (shortest distance) method
% ZSCORE Standardized z score.
% Z = ZSCORE(D) returns the deviation of each column of D from its mean
% normalized by its standard deviation. This is known as the Z score of D.
% For a column vector V, z score is Z = (V-mean(V))./std(V)
% In other words, each column is centered on its mean and scaled by its
% standard deviation.
% ZSCORE is commonly used to preprocess data before computing distances for
% cluster analysis.
%--------------------------------------------------------------------------
% PDIST Pairwise distance between observations.
% Y = PDIST(X) returns a vector Y containing the Euclidean distances
% between each pair of observations in the M-by-N data matrix X.
% By default, the Euclidean distance between each pair of rows of X is used.
% Rows of X correspond to observations, columns correspond to variables.
% Y is an (M*(M-1)/2)-by-1 vector, corresponding to the M*(M-1)/2 pairs
% of observations in X.
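% In total, M*(M-1)/2 distances are produced. Note that each row of X is
% treated as one observation described by N values, so an M-by-N matrix
% holds M observations with N measurements each.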
%--------------------------------------------------------------------------
% Y = PDIST(X, DISTANCE) computes Y using DISTANCE. Choices are:
%
% 'euclidean' - Euclidean distance
% 'seuclidean' - Standardized Euclidean distance, each coordinate
% in the sum of squares is inverse weighted by the
% sample variance of that coordinate
% 'cityblock' - City Block distance
% 'mahalanobis' - Mahalanobis distance
% 'minkowski' - Minkowski distance with exponent 2
% 'cosine' - One minus the cosine of the included angle
% between observations (treated as vectors)
% 'correlation' - One minus the sample correlation between
% observations (treated as sequences of values).
% 'hamming' - Hamming distance, percentage of coordinates
% that differ
% 'jaccard' - One minus the Jaccard coefficient, the
% percentage of nonzero coordinates that differ
% function - A distance function specified using @, for
% example @DISTFUN
%
% A distance function must be of the form
%
% function D = DISTFUN(XI, XJ, P1, P2, ...),
%
% taking as arguments two L-by-N matrices XI and XJ each of which
% contains rows of X, plus zero or more additional problem-dependent
% arguments P1, P2, ..., and returning an L-by-1 vector of distances D,
% whose Kth element is the distance between the observations XI(K,:)
% and XJ(K,:).
%
% Y = PDIST(X, DISTFUN, P1, P2, ...) passes the arguments P1, P2, ...
% directly to the function DISTFUN.
%
% Y = PDIST(X, 'minkowski', P) computes Minkowski distance using the
% positive scalar exponent P.
%
% The output Y is arranged in the order of ((1,2),(1,3),..., (1,M),
% (2,3),...(2,M),.....(M-1,M)), i.e. the upper right triangle of the full
% M-by-M distance matrix. To get the distance between the Ith and Jth
% observations (I < J), either use the formula Y((I-1)*(M-I/2)+J-I), or
% use the helper function Z = SQUAREFORM(Y), which returns an M-by-M
% square symmetric matrix, with the (I,J) entry equal to distance between
% observation I and observation J.
%
% Example:
%
% X = randn(100, 5); % some random points
% Y = pdist(X, 'euclidean'); % unweighted distance
% Wgts = [.1 .3 .3 .2 .1]; % coordinate weights
% Ywgt = pdist(X, @weucldist, Wgts); % weighted distance
%
% function d = weucldist(XI, XJ, W) % weighted euclidean distance
% d = sqrt((XI-XJ).^2 * W');
%--------------------------------------------------------------------------
% SQUAREFORM Square matrix formatted distance.
% Z = squareform(Y) converts the output of PDIST function into a
% square format, so that Z(i,j) denotes the distance between the
% i and j objects in the original data. The result is a symmetric square
% matrix.
%--------------------------------------------------------------------------
% LINKAGE Create hierarchical cluster tree.
% Z = LINKAGE(Y) creates a hierarchical cluster tree, using the single
% linkage algorithm. The input Y is a distance matrix such as is
% generated by PDIST. Y may also be a more general dissimilarity
% matrix conforming to the output format of PDIST.
%
% Z = LINKAGE(Y, method) creates a hierarchical cluster tree using
% the specified algorithm. The available methods are:
%
% 'single' --- nearest distance (shortest distance method)
% 'complete' --- furthest distance (longest distance method)
% 'average' --- average distance (average distance method)
% 'centroid' --- center of mass distance (the output Z is meaningful
% only if Y contains Euclidean distances)
% 'ward' --- inner squared distance
%
% Cluster information will be returned in the matrix Z with size m-1
% by 3, where m is the number of observations in the original data.
% Column 1 and 2 of Z contain cluster indices linked in pairs
% to form a binary tree. The leaf nodes are numbered from 1 to
% m. They are the singleton clusters from which all higher clusters
% are built. Each newly-formed cluster, corresponding to Z(i,:), is
% assigned the index m+i, where m is the total number of initial
% leaves. Z(i,1:2) contains the indices of the two component
% clusters which form cluster m+i. There are m-1 higher clusters
% which correspond to the interior nodes of the output clustering
% tree. Z(i,3) contains the corresponding linkage distances between
% the two clusters which are merged in Z(i,:), e.g. if there are
% total of 30 initial nodes, and at step 12, cluster 5 and cluster 7
% are combined and their distance at this time is 1.5, then row 12
% of Z will be (5,7,1.5). The newly formed cluster will have an
% index 12+30=42. If cluster 42 shows up in a later row, that means
% this newly formed cluster is being combined again into some bigger
% cluster.
%
% The centroid method can produce a cluster tree that is not monotonic.
% This occurs when the distance from the union of two clusters to a third
% cluster is less than the distance from either individual cluster to
% that third cluster. In such a case, sections of the dendrogram change
% direction. This is an indication that another method should be used.
%--------------------------------------------------------------------------
clc;
clear;
X=[7.90 39.77 8.49 12.94 19.27 11.05 2.04 13.29;
7.68 50.37 11.35 13.3 19.25 14.59 2.75 14.87;
9.42 27.93 8.20 8.14 16.17 9.42 1.55 9.76;
9.16 27.98 9.01 9.32 15.99 9.10 1.82 11.35;
10.06 28.64 10.52 10.05 16.18 8.39 1.96 10.81];
%X1=xlsread('d:\test1.xls');X=X';
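BX=zscore(X); % standardize each column of the data matrix
Y=pdist(BX) % pairwise Euclidean distances between the standardized rows
D=squareform(Y) % full symmetric Euclidean distance matrix
Z=linkage(Y,'single') % single linkage (shortest distance method)
T=cluster(Z,'maxclust',3) % at most 3 clusters; equivalent to T=clusterdata(BX,3)
%find(T==3) % members of cluster 3
[H,leafIdx]=dendrogram(Z) % draw the dendrogram; leafIdx avoids overwriting T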
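%--------------------------------------------------------------------------
% A minimal sketch, not part of the original script, that checks two of the
% output formats described in the header comments: the PDIST index formula
% Y((I-1)*(M-I/2)+J-I) against SQUAREFORM, and the shape and ordering of the
% LINKAGE output. The names Ychk, Dchk, Zchk, Mobs, and idx are illustrative
% assumptions. The check recomputes its own distances from the raw matrix X,
% so it does not depend on the variables Y, D, or Z above.
Ychk = pdist(X); % condensed distance vector, M*(M-1)/2 entries
Dchk = squareform(Ychk); % full M-by-M symmetric distance matrix
Mobs = size(X,1); % number of observations (rows of X)
for I = 1:Mobs-1
    for J = I+1:Mobs
        idx = (I-1)*(Mobs-I/2)+J-I; % position of the pair (I,J) in Ychk
        assert(abs(Ychk(idx)-Dchk(I,J)) < 1e-12);
    end
end
Zchk = linkage(Ychk,'single'); % single-linkage cluster tree
assert(isequal(size(Zchk),[Mobs-1 3])); % m-1 merges, 3 columns
assert(all(diff(Zchk(:,3)) >= 0)); % single-linkage heights never decrease
for k = 1:Mobs-1
    % Row k may only refer to leaves (1..m) or to clusters formed in
    % earlier rows (m+1..m+k-1).
    assert(all(Zchk(k,1:2) < Mobs+k));
end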