📄 som_linkage.m

📁 很全的som工具箱四个demo可为初学者提够帮助
💻 M
字号:
function [Z,order,Md] = som_linkage(sM,varargin)%SOM_LINKAGE Make a hierarchical linkage of the SOM map units.%% [Z,order,Dist] = som_linkage(sM, [[argID,] value, ...])%  %  Z = som_linkage(sM);%  Z = som_linkage(D,'complete');%  Z = som_linkage(sM,'single','ignore',find(~som_hits(sM,D)));%  Z = som_linkage(sM,pdist(sM.codebook,'mahal'));%  som_dendrogram(Z); %%  Input and output arguments ([]'s are optional):%   sM       (struct) map or data struct to be clustered%            (matrix) size dlen x dim, a data set: the matrix must not%                     contain any NaN's!%   [argID,  (string) See below. The values which are unambiguous can %    value]  (varies) be given without the preceeding argID.%%   Z        (matrix) size dlen-1 x 3, the linkage info%                     Z(i,1) and Z(i,2) hold the indeces of clusters %                     combined on level i (starting from bottom). The new%                     cluster has index dlen+i. The initial cluster %                     index of each unit is its linear index in the %                     original data matrix. Z(i,3) is the distance%                     between the combined clusters. See LINKAGE%                     function in the Statistics Toolbox.%                     The ignored samples are listed at the %                     end of the Z-matrix and have Z(*,3) == Inf%   Dist     (matrix) size dlen x dlen, pairwise distance matrix%% Here are the valid argument IDs and corresponding values. The values % which are unambiguous (marked with '*') can be given without the% preceeding argID.%   'linkage' *(string) the linkage criteria to use: 'single' (the%                       default), 'average' or 'complete' %   'topol'   *(struct) topology struct%   'connect' *(string) 'neighbors' or 'any' (default), whether the%                       connections should be allowed only between %                       neighbors or between any vectors%              (matrix) size dlen x dlen indicating the connections%                       between vectors%              (scalar) the N-neighborhood upto which the connections%                       should be formed (implies 'neighbors')%   'ignore'   (vector) the units/vectors which should be ignored %   'dist'     (matrix) size dlen x dlen, pairwise distance matrix to %                       be used instead of euclidian distances%              (vector) as the output of PDIST function%              (scalar) distance norm to use (euclidian = 2)%   'mask'     (vector) size dim x 1, the search mask used to %                       weight distance calculation. By default %                       sM.mask or a vector of ones is used.%% Note that together 'connect'='neighbors' and 'ignore' may form% areas on the map which will never be connected: connections% across the ignored map units simply do not exist.%% See also KMEANS_CLUSTERS, LINKAGE, PDIST, DENDROGRAM. % Copyright (c) 2000 by Juha Vesanto% Contributed to SOM Toolbox on June 16th, 2000 by Juha Vesanto% http://www.cis.hut.fi/projects/somtoolbox/ % Version 2.0beta juuso 160600%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% input arguments% the dataif isstruct(sM),   if isfield(sM,'data'), D = sM.data; sT = []; mask = [];   else D = sM.codebook; sT = sM.topol; mask = sM.mask;  endelse  D = sM; sT = []; mask = []; end[dlen dim] = size(D);if isempty(mask), mask = ones(dim,1); endif any(isnan(D(:))), error('Data matrix must not have any NaNs.'); end% vararginMd = 2; linkage = 'single';ignore_units = []; constrained = 0;i=1; while i<=length(varargin),   argok = 1;   if ischar(varargin{i}),     switch varargin{i},       % argument IDs     case {'topol','som_topol','sTopol'}, i=i+1; sT = varargin{i};     case 'connect', i=i+1;        if ischar(varargin{i}), constrained = ~strcmp(varargin{i},'any');       else constrained = varargin{i}; end     case 'ignore',  i=i+1; ignore_units = varargin{i};      case 'dist',    i=i+1; Md = varargin{i};     case 'linkage', i=i+1; linkage = varargin{i};     case 'mask',    i=i+1; mask = varargin{i};     case 'tracking',i=i+1; tracking = varargin{i};       % unambiguous values     case 'neighbors', constrained = 1;      case 'any',       constrained = 0;      case {'single','average','complete'}, linkage = varargin{i};     otherwise argok=0;     end  elseif isstruct(varargin{i}) & isfield(varargin{i},'type'),     switch varargin{i}(1).type,      case 'som_topol', sTopol = varargin{i};      otherwise argok=0;     end  else    argok = 0;   end  if ~argok,     disp(['(som_linkage) Ignoring invalid argument #' num2str(i+1)]);   end  i = i+1; end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% distance matrix% given distance matrix % jh corrected this place totally 27.3. 03if (prod(size(Md))==1), % no explicit distance matrix, set flag  q=2; % 17.2.03 kr added some bracketselse  if (prod(size(Md))<dlen^2), % check pdist form    Md = squareform(Md);   % transform to ordinary square diastance matrix  end  % jh: 27.3. 03 "calculate pairwise dist. matrix" see approx. 20 lines below  % sets self-distance to Inf! This must be set here also,  % otherwise clustering fails for user defined dist. matrix!   Md(eye(dlen)==1)=Inf; end% neighborhood constraintif length(constrained)==1 & constrained>0,   Ne1 = som_unit_neighs(sT);   Conn = som_neighborhood(Ne1,constrained);   Conn(~isfinite(Conn(:))) = 0; else Conn = constrained; endif ~isempty(Conn), for i=1:dlen, C(i,i) = 1; end, end% pairwise distance matrix across connected unitsn = size(D,1);if prod(size(Md))>1,     % remove distances between non-neighbors  if constrained, for i = 1:n, Md(i,find(Conn(i,:)==0)) = Inf; end, endelse      % calculate pairwise distance matrix  q = Md;   Md = zeros(n,n)+Inf;  if ~constrained & q==2, % fast for the default case     for i = 1:n-1,      x = D(i,:);      inds = [(i+1):n];       Diff = D(inds,:) - x(ones(n-i,1),:);      Md(inds,i) = sqrt((Diff.^2)*mask);      Md(i,inds) = Md(inds,i)';    end    else    for i = 1:n-1,       inds = find(Conn(i,:)==1);       inds = inds(find(inds>i));       Diff = abs(D(inds,:) - D(i*ones(length(inds),1),:));      switch q,        case 1,    dist = Diff*mask;       case 2,    dist = sqrt((Diff.^2)*mask);       case Inf,  dist = max(Diff,[],2);       otherwise, dist = ((Diff.^q)*mask).^(1/q);      end      Md(inds,i) = dist;       Md(i,inds) = dist';     end  endend% set distances to ignored units to Infif ~isempty(ignore_units),   Md(ignore_units,:) = Inf;  Md(:,ignore_units) = Inf;end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% construct dendrogramZ = zeros(n-1,3)+NaN;  % merged clusters and distance for each stepclusters = 1:dlen;     % each vector is at first in its own clusterCd = Md;               % distances between clustersh = waitbar(0,'Constructing hierarchical clustering');   for i=1:n-1,     % tracking  waitbar(i/(n-1),h);   %% combine two closest clusters    % find the clusters which are closest to each other (c1 and c2)  [d,ind] = min(min(Cd));    if ~isfinite(d), break; end   % no more connected clusters  [d,c1] = min(Cd(:,ind));      % cluster1  c2 = clusters(ind);           % cluster2      % combine the two clusters  c1_inds = find(clusters==c1); % vectors belonging to c1  c2_inds = find(clusters==c2); % vectors belonging to c2  c_inds = [c1_inds, c2_inds];  % members of the new cluster    % new cluster index = bigger cluster  if length(c2_inds)>length(c1_inds), c=c2; k=c1; else c=c1; k=c2; end  clusters(c_inds) = c;         % update cluster info  Z(i,:) = [c, k, d];           % save info into Z    %% update cluster distances    % remove the subclusters from the Cd table    Cd(c_inds,c_inds) = Inf;      % distance of the cluster to its members = Inf  k_inds = c_inds(c_inds ~= c); % vectors of the smaller cluster  Cd(k_inds,:) = Inf;           % distance of the subclusters to   Cd(:,k_inds) = Inf;           % other clusters = Inf  % update the distance of this cluster to the other clusters  cl = unique(clusters(clusters ~= c)); % indeces of all other clusters  if ~isempty(cl), % added since v6.5 works differently than 6.1    for l=cl,         o_inds = find(clusters==l); % indeces belonging to cluster k      vd = Md(c_inds,o_inds);     % distances between vectors in c and k      vd = vd(isfinite(vd(:)));   % remove infinite distances (no connection)      len = length(vd);      if ~len, % if the two clusters are not connected, their distance in Inf	sd = Inf;      else   % otherwise, calculate the distance between them	switch linkage,	 case 'single',   sd = min(vd);	 case 'average',  sd = sum(vd)/len; 	 case 'complete', sd = max(vd);	 otherwise, error(['Unknown set distance: ' linkage]);	end      end      Cd(c,l) = sd; Cd(l,c) = sd;    end    endendclose(h); last = Z(i,1); if isnan(last),   last = Z(i-1,1);   rest = setdiff(unique(clusters),last);   Z(i:n-1,1) = rest';   Z(i:n-1,2) = last;   Z(i:n-1,3) = Inf;   i = i - 1; else   rest = []; end%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% return values% calculate the order of samplesorder = last; % go through the combination matrix from top to downfor k=i:-1:1,   c = Z(k,1); k = Z(k,2);                     % what (k) change to what (c)  j = find(order==c);                         % the occurance of c in order      if j == length(order), order = [order k];   % put k behind c  else order = [order(1:j) k order(j+1:end)];   endend  order = [rest, order]; % to maintain compatibility with Statistics Toolbox, the values in % Z must be yet transformed so that they are similar to the output% of LINKAGE functionZs = Z;current_cluster = 1:dlen;for i=1:size(Z,1),  Zs(i,1) = current_cluster(Z(i,1));  Zs(i,2) = current_cluster(Z(i,2));  current_cluster(Z(i,[1 2])) = dlen+i;  endZ = Zs;return;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -