📄 kmeans.asv

📁 kmeans clustering in matlab
function [cluster,mu,kseed] = kmeans(data,k,kseed)
% KMEANS: [cluster,mu,kseed] = kmeans(data,k,kseed)
%
% Cluster the N x d matrix data into k clusters using the
% k-means algorithm with Euclidean distance.
%
% INPUTS:
%   data: N x d matrix, with d-dimensional vectors as rows
%   k: number of clusters for k-means
%   kseed (optional): seed for the "rand" function for initialization
%
% OUTPUTS:
%   cluster: N x 1 matrix, where element i is the cluster label
%            assigned to row i in "data", and labels range from 1 to k
%   mu:  k x d matrix of cluster means, where row j is the
%        mean for cluster j, j = 1, ..., k
%   kseed: seed used by "rand" in initialization
%
%  Demo code for CS 175, May 2005, Professor Smyth, UC Irvine.

N = size(data,1);

% initialize the vectors containing the cluster assignments
oldcluster = zeros(N,1);
cluster = ones(N,1);

if k==1
    % special case where k = 1: the single mean is the mean of all rows
    mu = mean(data,1);
    if nargin<3
        % report the current seed so the kseed output is always defined
        kseed = rand('seed');
    end
end

if k>1
    % general case where k > 1
    % select k initial random seeds
    % (rand('seed',...) is legacy syntax; newer MATLAB releases recommend rng)
    if nargin<3
        % if no seed is supplied, record the current one
        kseed = rand('seed');
    else
        % use the seed that is supplied
        rand('seed',kseed);
    end
    index = randperm(N);
    mu = data(index(1:k),:);

    % continue to loop until the cluster assignments do not change
    % (compare element-wise: a sum of differences can be zero even
    % when individual assignments have changed)
    while any(oldcluster ~= cluster)
        % calculate the Euclidean distance between each point and
        % each cluster mean
        dist = zeros(N,k);
        for i=1:k
            dist(:,i) = euclid(data,mu(i,:));
        end

        % find the closest cluster mean for each point and assign
        % the point to that cluster
        oldcluster = cluster;
        [~, cluster] = min(dist,[],2);

        % based on the new assignment, recalculate the cluster means;
        % mean(...,1) also handles a cluster that contains a single row
        for i=1:k
            if any(cluster==i)
                mu(i,:) = mean(data(cluster==i,:),1);
            end
        end

        % If we have cluster(s) with no points then randomly start
        % over and regenerate new means. (This can occasionally happen
        % in the algorithm if k is relatively large, e.g., 8 or 10 or larger.)
        if numel(unique(cluster)) < k
            kseed = rand('seed');
            index = randperm(N);
            mu = data(index(1:k),:);
            fprintf('Cluster of size 0: k = %d\n',k);
            cluster = zeros(N,1);
        end
    end
end

function z = euclid(A,x)
% EUCLID: z = euclid(A,x)
%
% Return an N x 1 vector containing the Euclidean distance between
% the 1 x d vector x and each row of the N x d matrix A.

N = size(A,1);
tmp = ones(N,1)*x;              % replicate x into an N x d matrix
z = sqrt(sum((A-tmp).^2, 2));   % row-wise distance; also valid when d = 1