kmeans.asv
function [cluster,mu,kseed] = kmeans(data,k,kseed)
% KMEANS: [cluster,mu,kseed] = kmeans(data,k,kseed)
%
% Cluster the N x d matrix data into k clusters using the
% k-means algorithm with Euclidean distance.
%
% INPUTS:
%  data: N x d matrix, with d-dimensional vectors as rows
%  k: number of clusters for k-means
%  kseed (optional): seed for the "rand" function for initialization
%
% OUTPUTS:
%  cluster: N x 1 matrix, where element i is the cluster label
%           assigned to row i in "data", and labels range from 1 to k
%  mu: k x d matrix of cluster means, where row j is the
%      mean for cluster j, j = 1, ..., k
%  kseed: seed used by "rand" in initialization
%
%
% Demo code for CS 175, May 2005, Professor Smyth, UC Irvine.
%

[N, d] = size(data);

% initialize the vectors containing the cluster assignments
oldcluster = zeros(N,1);
cluster = ones(N,1);

if k == 1   % special case where k = 1
    mu = mean(data, 1);   % dimension argument keeps this correct when N = 1
    if nargin < 3
        kseed = rand('seed');   % make sure the kseed output is defined
    end
end

if k > 1    % general case where k > 1

    % select k initial random seeds
    if nargin < 3
        % if no seed is supplied
        kseed = rand('seed');
        index = randperm(N);
    else
        % use the seed that is supplied
        rand('seed', kseed);
        index = randperm(N);
    end
    index = index(1:k);
    mu = data(index,:);

    % continue to loop until the cluster assignments do not change
    % (labels are compared element-wise, because a sum of label
    % differences can cancel to zero even when assignments changed)
    while any(oldcluster ~= cluster)

        % calculate the Euclidean distance between each point and each cluster mean
        dist = zeros(N,k);
        for i = 1:k
            dist(:,i) = euclid(data, mu(i,:));
        end

        % find the closest cluster mean for each point and assign it to that cluster
        oldcluster = cluster;
        [mindist, cluster] = min(dist, [], 2);   % cluster is N x 1

        % Based on the new assignment, recalculate the cluster means.
        for i = 1:k
            members = (cluster == i);
            if any(members)
                % the dimension argument makes mean() average over rows
                % even when the cluster contains a single vector
                mu(i,:) = mean(data(members,:), 1);
            end
        end

        % If we have cluster(s) with no points then randomly start
        % over and regenerate new means. (This can occasionally happen
        % in the algorithm if k is relatively large, e.g., 8 or 10 or larger).
        flag = 0;
        for i = 1:k
            if sum(cluster == i) == 0
                flag = 1;
            end
        end
        if flag == 1
            kseed = rand('seed');
            index = randperm(N);
            index = index(1:k);
            mu = data(index,:);
            fprintf('Cluster of size 0: k = %d\n', k);
            cluster = zeros(N,1);
        end
    end
end


function [z] = euclid(A,x)
% EUCLID: [z] = euclid(A,x)
%
% Return a vector containing the Euclidean distance between
% the 1 x d vector x, and each row in the N x d matrix A.
%
%

n = size(A, 1);
tmp = ones(n,1) * x;               % replicate x once per row of A
z = sqrt(sum((A - tmp).^2, 2));    % n x 1 vector of distances
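
% ------------------------------------------------------------------
% Usage sketch (not part of the original demo code). A minimal example,
% assuming this file is saved as kmeans.m on the MATLAB path; the
% synthetic data and the seed value 0 are made up purely for
% illustration. Note that a function named kmeans may shadow the
% Statistics Toolbox function of the same name.
%
%   data = [randn(50,2); randn(50,2) + 5];   % two hypothetical 2-D blobs
%   k = 2;
%   [cluster, mu, kseed] = kmeans(data, k);  % random initialization
%   [cluster2, mu2] = kmeans(data, k, 0);    % initialization seeded with 0
%   disp(mu);                                % k x d matrix of cluster means
%
% The calls are left as comments so that they do not become part of the
% euclid function body above.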