⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 kmeanscluster.m

📁 这个程序主要讲述了数据挖掘中K均值算法的应用
💻 M
字号:
function y=kMeansCluster(m,k,isRand)%%%%%%%%%%%%%%%%%                                                        % kMeansCluster - Simple k means clustering algorithm                                                              % Author: Kardi Teknomo, Ph.D.                                                                  %                                                                                                                    % Purpose: classify the objects in data matrix based on the attributes    % Criteria: minimize Euclidean distance between centroids and object points                    % For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html    
% Output: matrix data plus an additional column represent the group of each object               %                                                                                                                % Example: m = [ 1 1; 2 1; 4 3; 5 4]  or in a nice form                         %          m = [ 1 1;                                                                                     %                2 1;                                                                                         %                4 3;                                                                                         %                5 4]                                                                                         %          k = 2                                                                                             % kMeansCluster(m,k) produces m = [ 1 1 1;                                        %                                   2 1 1;                                                                   %                                   4 3 2;                                                                   %                                   5 4 2]                                                                   % Input:%   m      - required, matrix data: objects in rows and attributes in columns                                                 %   k      - optional, number of groups (default = 1)%   isRand - optional, if using random initialization isRand=1, otherwise input any number (default)%            it will assign the first k data as initial centroids%% Local Variables%   f      - row number of data that belong to group i%   c      - centroid coordinate size (1:k, 1:maxCol)%   g      - current iteration group matrix size (1:maxRow)%   i      - scalar iterator %   maxCol - scalar number of rows in the data matrix m = number of attributes%   maxRow - scalar number of columns in the data matrix m = number of objects%   temp   - previous iteration group matrix size (1:maxRow)%   z      - minimum value (not needed)%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<3,        isRand=0;   endif nargin<2,        k=1;        end    [maxRow, maxCol]=size(m)if maxRow<=k,     y=[m, 1:maxRow]else  % initial value of centroid    if isRand,        p = randperm(size(m,1));      % random initialization        for i=1:k            c(i,:)=m(p(i),:)       end    else        for i=1:k           c(i,:)=m(i,:)        % sequential initialization     end    end     temp=zeros(maxRow,1);   % initialize as zero vector     while 1,        d=DistMatrix(m,c);  % calculate objcets-centroid distances        [z,g]=min(d,[],2);  % find group matrix g        if g==temp,            break;          % stop the iteration        else            temp=g;         % copy group matrix to temporary variable        end        for i=1:k            f=find(g==i);            if f            % only compute centroid if f is not empty                c(i,:)=mean(m(find(g==i),:),1);            end        end end     y=[m,g];    end
The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here.

function d=DistMatrix(A,B)
             %%%%%%%%%%%%%%%%%%%%%%%%%
             % DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
             % Copyright (c) 2005 by Kardi Teknomo,  http://people.revoledu.com/kardi/
             %
             % Numbers of rows (represent points) in A and B are not necessarily the same.
             % It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway),
             %
             % A and B must contain the same number of columns (represent variables of n dimensions),
             % first column is the X coordinates, second column is the Y coordinates, and so on.
             % The distance matrix is distance between points in A as rows
             % and points in B as columns.
             % example: Spacing= dist(A,A)
             % Headway = dist(A,B), with hA ~= hB or hA=hB
             %          A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]             %          dist(A,B)= [ 4.69   5.83;             %                       5.00   7.00;             %                       5.48   7.48;             %                       4.69   5.83]             %             %          dist(B,A)= [ 4.69   5.00     5.48    4.69;             %                       5.83   7.00     7.48    5.83]             %%%%%%%%%%%%%%%%%%%%%%%%%%%
             [hA,wA]=size(A);             [hB,wB]=size(B);             if wA ~= wB,  error(' second dimension of A and B must be the same'); end             for k=1:wA                  C{k}= repmat(A(:,k),1,hB);                  D{k}= repmat(B(:,k),1,hA);             end             S=zeros(hA,hB);             for k=1:wA                  S=S+(C{k}-D{k}').^2;             end             d=sqrt(S);

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -