📄 mds.m
字号:
%MDS - Multidimensional Scaling - a variant of Sammon mapping %% [W,J,stress] = MDS(D,Y,OPTIONS)% [W,J,stress] = MDS(D,N,OPTIONS)% % INPUT% D Square (M x M) dissimilarity matrix% Y M x N matrix containing starting configuration, or% N Desired output dimensionality% OPTIONS Various parameters of the minimization procedure put into % a structure consisting of the following fields: 'q', 'optim',% 'init','etol','maxiter', 'isratio', 'itmap', 'st' and 'inspect'% (default:% OPTIONS.q = 0 % OPTIONS.optim = 'pn' % OPTIONS.init = 'cs'% OPTIONS.etol = 1e-6 (the precise value depends on q)% OPTIONS.maxiter = inf % OPTIONS.isratio = 0 % OPTIONS.itmap = 'yes' % OPTIONS.st = 1 % OPTIONS.inspect = 2).% % OUTPUT% W Multidimensional scaling mapping% J Index of points removed before optimization% stress Vector of stress values%% DESCRIPTION % Finds a nonlinear MDS map (a variant of the Sammon map) of objects% represented by a symmetric distance matrix D with zero diagonal, given% either the dimensionality N or the initial configuration Y. This is done% in an iterative manner by minimizing the Sammon stress:%% e = 1/(sum_{i<j} D_{ij}^(Q+2)) sum_{i<j} (D_{ij} - DY_{ij})^2 * D_{ij}^Q%% where DY is the distance matrix of Y, which should approximate D. If D(i,j) % = 0 for any different points i and j, then one of them is superfluous. The % indices of these points are returned in J.%% OPTIONS is an optional variable, using which the parameters for the mapping% can be controlled. It may contain the following fields: % % Q Stress measure to use (see above): -2,-1,0,1 or 2. % INIT Initialisation method for Y: 'randp', 'randnp', 'maxv', 'cs' % or 'kl'. See MDS_INIT for an explanation.% OPTIM Minimization procedure to use: 'pn' for Pseudo-Newton or% 'scg' for Scaled Conjugate Gradients.% ETOL Tolerance of the minimization procedure. Usually, it should be % MAXITER in the order of 1e-6. If MAXITER is given (see below), then the % optimization is stopped either when the error drops below ETOL or % MAXITER iterations are reached.% ISRATIO Indicates whether a ratio MDS should be performed (1) or not (0).% If ISRATIO is 1, then instead of fitting the dissimilarities % D_{ij}, A*D_{ij} is fitted in the stress function. The value A % is estimated analytically in each iteration.% ITMAP Determines the way new points are mapped, either in an iterative % manner ('yes') by minimizing the stress; or by a linear projection % ('no').% ST Status, determines whether after each iteration the stress should % INSPECT be printed on the screen (1) or not (0). When INSPECT > 0, % ST = 1 and the mapping is onto 2D or larger, then the progress % is plotted during the minimization every INSPECT iterations.% % Important:% 1. It is assumed that D either is or approximates a Euclidean distance % matrix, i.e. D_{ij} = sqrt (sum_k(x_i - x_j)^2). % 2. Missing values can be handled; they should be marked by 'NaN' in D. %% EXAMPLES:% opt.optim = 'scg';% opt.init = 'cs'; % D = sqrt(distm(a)); % Compute the Euclidean distance dataset of A% w1 = mds(D,2,opt); % An MDS map onto 2D initialized by Classical Scaling,% % optimized by a Scaled Conjugate Gradients algorithm% n = size(D,1);% y = rand(n,2);% w2 = mds(D,y,opt); % An MDS map onto 2D initialized by random vectors%% z = rand(n,n); % Set around 40% of the random distances to NaN, i.e. % z = (z+z')/2; % not used in the MDS mapping% z = find(z <= 0.6);% D(z) = NaN;% D(1:n+1:n^2) = 0; % Set the diagonal to zero% opt.optim = 'pn';% opt.init = 'randnp'; % opt.etol = 1e-8; % Should be high, as only some distances are used% w3 = mds(D,2,opt); % An MDS map onto 2D initialized by a random projection%% REFERENCES% 1. M.F. Moler, A Scaled Conjugate Gradient Algorithm for Fast Supervised% Learning', Neural Networks, vol. 6, 525-533, 1993.% 2. W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery,% Numerical Recipes in C, Cambridge University Press, Cambridge, 1992. % 3. I. Borg and P. Groenen, Modern Multidimensional Scaling, Springer% Verlag, Berlin, 1997. % 4. T.F. Cox and M.A.A. Cox, Multidimensional Scaling, Chapman and Hall, % London, 1994.%% SEE ALSO% MAPPINGS, MDS_STRESS, MDS_INIT, MDS_CS%% Copyright: Elzbieta Pekalska, Robert P.W. Duin, ela@ph.tn.tudelft.nl, 2000-2003% Faculty of Applied Sciences, Delft University of Technology%function [w,J,err,opt,y] = mds(D,y,options) if (nargin < 3) options = []; % Will be filled by MDS_SETOPT, below. end opt = mds_setopt(options); if (nargin < 2) | (isempty(y)) prwarning(2,'no output dimensionality given, assuming N = 2.'); y = 2; end % No arguments given: return an untrained mapping. if (nargin < 1) | (isempty(D)) w = mapping(mfilename,{y,opt,[],[],[],[]}); w = setname(w,'MDS'); return end % YREP contains representative objects in the projected MDS space, i.e. % for which the mapping exists. YREP is empty for the original MDS, since % no projection is available yet. yrep = []; if (isdataset(D) | isa(D,'double')) [m,mm] = size(D); % Convert D to double, but retain labels in LAB. if (isdataset(D)), lab = getlab(D); D = +D; else, lab = ones(m,1); end; if (ismapping(y)) % The MDS mapping exists; this means that YREP has already been stored. pars = getdata(y); [k,c] = size(y); y = []; % Empty, should now be found. yrep = pars{1}; % There exists an MDS map, hence YREP is stored. opt = pars{2}; % Options used for the mapping. II = pars{3}; % Index of non-repeated points in YREP. winit = pars{4}; % The Classical Scaling map, if INIT = 'cs'. v = pars{5}; % Weights used for adding new points if ITMAP = 'no'. n = c; % Number of dimensions of the projected space. % Initialization by 'cs' is not possible when there is no winit % (i.e. the CS map) and new points should be added. if (strcmp(opt.init,'cs')) & (isempty(winit)) prwarning(2,'OPTIONS.init = cs is not possible when adding points; using kl.'); opt.init = 'kl'; end % If YREP is a scalar, we have an empty mapping. if (max(size(yrep)) == 1) y = yrep; yrep = []; end % Check whether D is a matrix with the zero diagonal for the existing map. if (m == mm) & (length(intersect(find(D(:)<eps),1:m+1:(m*mm))) >= m) w = yrep; % D is the same matrix as in the projection process; return % YREP is then the solution end if (length(pars) < 6) | (isempty(pars{6})) yinit = []; else yinit = pars{6}; % Possible initial configuration for points to % be added to an existing map if (size(yinit,1) ~= size(D,1)) prwarning(2,'the size of the initial configuration does not match that of the dissimilarity matrix, using random initialization.') yinit =[]; end end else % No MDS mapping available yet; perform the checks. if (~issym(D,1e-12)) prwarning(2,'D is not a symmetric matrix; will average.'); D = (D+D')/2; end % Check the number of zeros on the diagonal if (any(abs(diag(D)) > 0)) error('D should have a zero diagonal'); end end else % D is neither a dataset nor a matrix of doubles error('D should be a dataset or a matrix of doubles.'); end if (~isempty(y)) % Y is the initial configuration or N, no MDS map exists yet; % D is a square matrix. % Remove identical points, i.e. points for which D(i,j) = 0 for i ~= j. % I contains the indices of points left for the MDS mapping, J those % of the removed points and P those of the points left in I which were % identical to those removed. [I,J,P] = mds_reppoints(D); D = D(I,I); [ni,nc] = size(D); % NANID is an extra field in the OPTIONS structure, containing the indices % of NaN values (= missing values) in distance matrix D. opt.nanid = find(isnan(D(:)) == 1); % Initialise Y. [m2,n] = size(y); if (max(m2,n) == 1) % Y is a scalar, hence really is a dimensionality N. n = y; [y,winit] = mds_init(D,n,opt.init); else if (mm ~= m2) error('The matrix D and the starting configuration Y should have the same number of columns.'); end winit = []; y = +y(I,:); end % The final number of true distances is: no_dist = (ni*(ni-1) - length(opt.nanid))/2; else % This happens only if we add extra points to an existing MDS map. % Remove identical points, i.e. points for which D(i,j) = 0 for i ~= j. % I contains the indices of points left for the MDS mapping, J those % of the removed points and P those of the points left in I which were % identical to those removed. [I,J,P] = mds_reppoints(D(:,II)); D = D(I,II); [ni,nc] = size(D); yrep = yrep(II,:); n = size(yrep,2); % NANID is an extra field in the OPTIONS structure, containing the indices % of NaN values (= missing values) in distance matrix D. opt.nanid = find(isnan(D(:))); % Initialise Y. if the new points should be added in an iterative manner. [m2,n] = size(yrep); if (~isempty(yinit)) % An initial configuration exists. y = yinit; elseif (strcmp(opt.init, 'cs')) & (~isempty(winit)) y = D*winit; else y = mds_init(D,n,opt.init); end if (~isempty(opt.nanid)) % Rescale. scale = (max(yrep)-min(yrep))./(max(y)-min(y)); y = y .* repmat(scale,ni,1); end % The final number of true distances is: no_dist = (ni*nc - length(opt.nanid)); end % Check whether there is enough data left. if (~isempty(opt.nanid)) if (n*ni+2 > no_dist), error('There are too many missing distances: it is not possible to determine the MDS map.'); end if (strcmp (opt.itmap,'no')) opt.itmap = 'yes'; prwarning(1,'Due to the missing values, the projection can only be iterative. OPTIONS are changed appropriately.') end end if (opt.inspect > 0) opt.plotlab = lab(I,:); % Labels to be used for plotting in MDS_SAMMAP. end if (isempty(yrep)) | (~isempty(yrep) & strcmp(opt.itmap, 'yes')) % Either no MDS exists yet OR new points should be mapped in an iterative manner. printinfo(opt); [yy,err] = mds_sammap(D,y,yrep,opt); % Define the linear projection of distances. v = []; if (isempty(yrep)) & (isempty(opt.nanid)) if (rank(D) < m) v = pinv(D)*yy; else v = D \ yy; end end else % New points should be added by a linear projection of dissimilarity data. yy = D*v; end % Establish the projected configuration including the removed points. y = zeros(m,n); y(I,:) = +yy; if (~isempty(J)) if (~isempty(yrep)) y(J,:) = +yrep(II(P),:); else for k=length(J):-1:1 % J: indices of removed points. y(J(k),:) = y(P(k),:); % P: indices of points equal to points in J. end end end % In the definition step: shift the obtained configuration such that the % mean lies at the origin. if (isempty(yrep)) y = y - ones(length(y),1)*mean(y); y = dataset(y,lab); else w = dataset(y,lab); return; end % In the definition step: the mapping should be stored. opt = rmfield(opt,'nanid'); % These fields are to be used internally only; opt = rmfield(opt,'plotlab'); % not to be set up from outside w = mapping(mfilename,'trained',{y,opt,I,winit,v,[]},[],m,n); w = setname(w,'MDS');return% **********************************************************************************% Extra functions% **********************************************************************************% PRINTINFO(OPT)%% Prints progress information to the file handle given by OPT.ST.function printinfo(opt)
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -