⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 readme

📁 基于测地距离不变性的非线性降维算法源码
💻
字号:
Instructions for Isomap code, Release 1---------------------------------------Author: Josh Tenenbaum (jbt@psych.stanford.edu)        [Dijkstra code by Mark Steyvers (msteyver@psych.stanford.edu)]Date: December 22, 2000Website for updates: http://isomap.stanford.edu0. Copyright notice    Isomap code -- (c) 1998-2000 Josh Tenenbaum    This code is provided as is, with no guarantees except that     bugs are almost surely present.  Published reports of research     using this code (or a modified version) should cite the     article that describes the algorithm:       J. B. Tenenbaum, V. de Silva, J. C. Langford (2000).  A global      geometric framework for nonlinear dimensionality reduction.        Science 290 (5500): 2319-2323, 22 December 2000.      Comments and bug reports are welcome.  Email to jbt@psych.stanford.edu.     I would also appreciate hearing about how you used this code,     improvements that you have made to it, or translations into other    languages.        You are free to modify, extend or distribute this code, as long     as this copyright notice is included whole and unchanged.  1. Contents This package of Matlab (version 5.3) code implements the Isomapalgorithm of Tenenbaum, de Silva, and Langford (2000) [TdSL].  Thecontents of the file "IsomapR1.tar" are:Isomap.mIsomapII.mL2_distance.m Readmedfun.mdijk.m dijkstra.cpp dijkstra.dll (Windows binary produced by "mex -O dijkstra")dijkstra.m swiss_roll_data.matThe data file "swiss_roll_data" contains the input-space coordinates("X_data") and the known low-dimensional manifold coordinates("Y_data") for 20,000 data points on the Swiss roll.Separately available at isomap.stanford.edu is a large data file of synthetic face images, "face_data.mat.Z".  The face data file containsimages of 698 faces ("images"), the projections of those facesonto the first 240 (scaled) principal components ("image_pcs"), and the known pose and lighting parameters for each face ("poses", "lights").  2. Getting started  (Isomap.m)Isomap.m implements the basic version of the algorithm described in[TdSL].  The algorithm takes as input the distances between pointsin a high-dimensional observation space, and returns as output theircoordinates in a low-dimensional embedding that best preserves theirintrinsic geodesic distances. Try it out on N=1000 points from the "Swiss roll" data set:>> load swiss_roll_data>> D = L2_distance(X_data(:,1:1000), X_data(:,1:1000), 1); To run Isomap with K = 7 neighbors/point, and produce embeddingsin dimensions 1, 2, ..., 10, type these commands: >> options.dims = 1:10;>> [Y, R, E] = Isomap(D, 'k', 7, options); The other options for the code are explained in the header of Isomap.m.  The only subtle option is "option.comp", which specifies which connectedcomponent to embed in the final step of the algorithm when more thanone component has been detected in the neighborhood graph.  The defaultis to embed the largest component. This code should work reasonably well for data sets with 1000 or fewerpoints.  For larger data sets, consider using the advanced code describedbelow. 3. Advanced code  (IsomapII.m)IsomapII.m implements a more advanced version of the algorithm thatexploits the sparsity of the neighborhood graph in computing shortest-pathdistances, and can also exploit the redundancy of the distances in constructing a low-dimensional embedding.  IsomapII uses Dijkstra's algorithm to compute graph distances.IsomapII works optimally when the file "dijkstra.cpp" (which usesFibonacci heap data structures) has been compiled (with the command"mex -O dijkstra.cpp") to produce "dijkstra.dll".  If IsomapII can'tfind a file called dijkstra.dll, it will default to a much slowerMatlab implementation of Dijkstra's algorithm in dijk.m.Alternatively, setting "option.dijkstra = 0" tells IsomapII to use aMatlab implementation of Floyd's algorithm (also used in Isomap.m),which is generally faster than dijk.m for small-to-medium-size datasets but does not exploit sparsity to achieve better time and spaceefficiency for large data sets.  Both dijk.m and the Floyd algorithmare much much slower than dijkstra.dll, so the latter should be usedif at all possible.  This code package includes a version ofdijkstra.dll suitable for running on Windows platforms.For very large data sets, it is impractical to store in memory a fullN x N distance matrix, as Isomap produces after step 2 (computingshortest-path distances), or to calculate its eigenvectors, as Isomapdoes in step 3 (constructing a low-dimensional embedding).  However,in many cases where the data lie on a low-dimensional manifold, thedistances computed in step 2 are heavily redundant and most of themcan be ignored with little effect on the final embedding.  IsomapIIconstructs embeddings that, rather than trying to preserve distancesbetween all pairs of points, preserve only the distances between allpoints and a subset of "landmark" points.The user specifies which points to use as landmarks (in theoptions.landmarks field).  Setting "options.landmarks = 1:N" (i.e.the entire data set) makes step 3 of IsomapII equivalent to classicalmultidimensional scaling (MDS); this is just as in Isomap.m, and it isthe default mode for IsomapII.  Choosing a much smaller set oflandmarks (e.g. by sampling randomly, or by using some subset of thedata that is known a priori to be representative) can often be quite agood approximation.  Try this version of the Swiss roll example withN=1000 data points but only 50 (random) landmark points:>> load swiss_roll_data>> D = L2_distance(X_data(:,1:1000), X_data(:,1:1000), 1); >> options.dims = 1:10;>> options.landmarks = 1:50; >> [Y, R, E] = IsomapII(D, 'k', 7, options); We do not know of any prior studies of the use of landmark points withclassical MDS, and we have only just begun to explore this approach asan extension to Isomap.  A more detailed discussion of the use oflandmark points in Isomap and classical MDS will be presented inSteyvers, de Silva, and Tenenbaum (in preparation, athttp://isomap.stanford.edu).  The use of landmark points in nonmetricMDS for data visualization is discussed briefly in a paper by Buja,Swayne, Littman, and Dean ("XGvis: Interactive Data Visualization withMultidimensional scaling", J. Comp. & Graph. Statistics, http://www.research.att.com/areas/stat/xgobi/#xgvis-paper).To facilitate working with large data sets, IsomapII can take as inputdistances in one of three formats:Mode 1: A full N x N matrixMode 2: A sparse N x N matrixMode 3: A distance function (see sample: "dfun.m")Mode 1 is equivalent to Isomap.m (except for the use of dijkstra.dll, which will usually be much faster than Isomap.m).  Mode 2 is designed for cases where the distances between nearby pointsare known, but the differences between faraway points are not known.The sparse input matrix is assumed to contain distances between eachpoint and a set of neighboring points, from which the graph neighborswill be chosen using the standard K or epsilon methods.  Any missingentries in the sparse input matrix are effectively assumed to beinfinite (NOT zero) -- these represents non-neighboring pairs of points.  Except for the sparsity of the input matrix, Mode 2 does not differin any noticeable way from Mode 1. Mode 3 is designed for cases where the input-space distances have notalready been computed explicitly.  The user provides a distancefunction, which takes as input one argument, i, and returns as outputa row vector containing the input-space distances from all N points topoint i.  A sample distance function, "dfun.m", is provided.  Notethat the distance function may have to use a global variable in orderto encode the information necessary to compute the appropriatedistances.  For example, dfun.m assumes that the coordinates of pointsin the high-dimensional input space are encoded in the global variableX.  It then uses these coordinates to compute Euclidean distances ininput space.  Try this example: >> load swiss_roll_data>> global X>> X = X_data(:,1:1000); >> options.dims = 1:10;>> options.landmarks = 1:size(X,2); >> [Y, R, E] = IsomapII('dfun', 'k', 7, options); This should give exactly the same results (subject to numerical errorand sign changes) as the first example in Section 2, but much morequickly!  Using a distance function rather than a distance matrixallows IsomapII to handle much larger data sets (we have tested it upto N=20,000) than the simple Isomap code can handle.  Also, note thatthe distance function need not be Euclidean.  Depending on theapplication, domain-specific knowledge may be useful for designing amore sophisticated distance function (as in Fig. 1B of [TdSL]).

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -