📄 tp5.html
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><link rel=File-List href="index_files/filelist.xml"><title>Lecture 5 - Graph-Based data Processing</title><link href="styles.css" rel="stylesheet" type="text/css"></head><body bgcolor="#FFFFFF" lang=FR><div > <h1>Lecture 5 - Graph-based <br> data processing</h1></div><blockquote> <blockquote> <blockquote> <blockquote> <blockquote> <p align="justify"><strong>Abstract : </strong>The goal of this lecture is to manipulate data in arbitrary dimensions using graph-based method. The points is the data are linked together in a graph structure. Geodesic computations can be performed to compute distance on the dataset.</p> </blockquote> </blockquote> </blockquote> </blockquote></blockquote><h2> Setting up Matlab. </h2><ul> <li>First download the Matlab toolbox <a href="matlab/toolbox_dimreduc.zip"><font face="Courier New, Courier, mono">toolbox_dimreduc.zip</font></a> and <a href="matlab/toolbox_graph.zip"><font face="Courier New, Courier, mono">toolbox_graph.zip</font></a>. Unzip it into your working directory. You should have directories <font face="Courier New, Courier, mono">toolbox_dimreduc/</font> and <font face="Courier New, Courier, mono">toolbox_graph</font>/ in your path. </li> <li>The first thing to do is to install this toolbox in your path.</li> <blockquote> <p><font face="Courier New, Courier, mono">path(path, 'toolbox_dimreduc');<br> path(path, 'toolbox_dimreduc/toolbox/');<br> path(path, 'toolbox_dimreduc/data/');<br> path(path, 'toolbox_graph');</font></p> </blockquote> <li>Recompile the mex file for your machine (this can produce some warning). If it does not work, either use the already compiled mex file (they should be available for MacOs and/or Unix) or try to set up matlab with a C compiler (e.g. gcc) using '<font face="Courier New, Courier, mono">mex -setup</font>'. </li> <blockquote> <p><font face="Courier New, Courier, mono">cd toolbox_graph<br> compile_mex;<br> cd ..</font></p> </blockquote></ul><h2> Distance Computation on Graphs. </h2><ul> <li>You can load synthetic and real datasets (only frey_rawface is included in the distribution however)</li> <blockquote> <p> <font face="Courier New, Courier, mono">name = 'swissroll'; % you can also try with 'scurve'<br> n = 800; % number of points<br> [X,col] = load_points_set( name, n );<br> clf; plot_scattered(X,col);<br> axis equal; axis off;</font></p> </blockquote> <li>From points in arbitrary dimension, you can create a graph data structure using the nearest-neighbor rule (you can also try the alternative epsilon rule). </li> <blockquote> <p> <font face="Courier New, Courier, mono">options.use_nntools = 0; % set to 1 if you have compile TStool for your machine, this will speed up computations<br> options.nn_nbr = 7; % number of nearest neighbor<br> % compute NN graph and the distance between nodes<br> [D,A] = compute_nn_graph(X,options);<br> % display the graph<br> clf; plot_graph(A,X, col);<br> axis tight; axis off; </font></p> </blockquote> <li>You can now compute the geodesic distance on this graph using the Dijkstra algorithm. </li> <blockquote> <p> <font face="Courier New, Courier, mono">% set up the length of the edges (0 for no edges)<br> D(D==Inf) = 0; % weight on the graph<br> % find some cool location for starting point<br> [tmp,start_point] = min( abs(col(:)-mean(col(:)))); % starting point<br> % compute the geodesic distance<br> [d,S] = perform_dijkstra(D, start_point);<br> % display the graph with distance colors<br> clf; hold on;<br> plot_scattered(X, d);<br> h = plot3(X(1,start_point), X(2,start_point), X(3,start_point), 'k.');<br> set(h, 'MarkerSize', 25);<br> axis tight; axis off; hold off;</font></p> </blockquote> <table width="1000" border="0" cellspacing="0" cellpadding="0" align="center"> <tr align="center"> <td> <img src="images/tp4/scurve-display.jpg" width="300"/> <img src="images/tp4/scurve-graph.jpg" width="300"/> <img src="images/tp4/scurve-dist.jpg" width="300"/> <br/> <img src="images/tp4/swissroll-display.jpg" width="300"/> <img src="images/tp4/swissroll-graph.jpg" width="300"/> <img src="images/tp4/swissroll-dist.jpg" width="300"/> </td> </tr> <tr> <td align="center">Two examples of 3D point clouds (left) ; corresponding NN-graph (center) ;<br> geodesic distance to the point in black (right), blue means close.<br></td> </tr> </table></ul><h2> PCA and Isomap Flattening. </h2><ul> <li>The principal component analysis realize a linear dimensionality reduction by projecting the data set on the axes of largest variance.</li> <blockquote> <p><font face="Courier New, Courier, mono">% dimension reduction using PCA<br> [Y,xy] = pca(X,2);<br> % display<br> clf; plot_scattered(xy,col);<br> axis equal; axis off;</font></p> </blockquote> <li>In order to better capture the manifold geometry of the data, Isomap compute the geodesic distance between pair of points, and find a low-dimensional layout that best respect these geodesic distance.</li> <blockquote> <p><font face="Courier New, Courier, mono">% dimension reduction using Isomap<br> options.nn_nbr = 7; % number of NN for the graph<br> xy = isomap(X,2, options);<br> % display<br> clf; plot_scattered(xy,col);<br> axis equal; axis off;</font></p> </blockquote> <table width="1100" border="0" cellspacing="0" cellpadding="0" align="center"> <tr align="center"> <td> <img src="images/tp4/scurve-display.jpg" width="300"/> <img src="images/tp4/scurve-dred-pca.jpg" width="300"> <img src="images/tp4/scurve-dred-isomap.jpg" width="300"><br/> <img src="images/tp4/swissroll-display.jpg" width="300"/> <img src="images/tp4/swissroll-dred-pca.jpg" width="300"> <img src="images/tp4/swissroll-dred-isomap.jpg" width="300"> </td> </tr> <tr> <td align="center">Original 3D data set (left) ; 2D flattening using PCA (center) and using Isomap (right).<br> Note how the PCA does not recover the simple 2D parameterization of the <br> manifold since it is a linear process. In contrast, Isomap is able <br> to "unfold" the curvy surface.</td> </tr> </table></p> </ul><h2> Dimension Reduction for Image Libraries. </h2><ul> <li>We can use the same process of flattening for data of arbitrary dimension. We first use a simple library of translating disks.</li> <blockquote> <p><font face="Courier New, Courier, mono">name = 'disks';<br> options.nbr = 1000;<br> % Read database<br> M = load_images_dataset(name, options);<br> % turn it into a set of points<br> a = size(M,1);b = size(M,2);n = size(M,3);<br> X = reshape(M, a*b, n);<br> % perform isomap<br> options.nn_nbr = 7;<br> options.use_nntools = 0;<br> xy = isomap(X,2, options);<br> k = 30;<br> clf; plot_flattened_dataset(xy,M,k);<br> % perform pca<br> [tmp,xy] = pca(X,2);<br> clf; plot_flattened_dataset(xy,M,k); </font></blockquote> <li>We can do the same computation on a more complex library of faces.</li> <blockquote> <p><font face="Courier New, Courier, mono">name = 'frey_rawface';<br> options.nbr = 2000;<br> % Read database<br> M = load_images_dataset(name, options);<br> % subsample at random<br> n = 1000;<br> sel = randperm(size(M,3));<br> M = M(:,:,sel(1:n)); <br> M = permute(M,[2 1 3]); % fix x/y orientation of the faces<br> % turn it into a set of points<br> a = size(M,1);b = size(M,2);n = size(M,3);<br> X = reshape(M, a*b, n);<br> % perform isomap<br> options.nn_nbr = 7;<br> options.use_nntools = 0;<br> xy = isomap(X,2, options);<br> k = 30;<br> clf; plot_flattened_dataset(xy,M,k);<br> % perform pca<br> [tmp,xy] = pca(X,2);<br> clf; plot_flattened_dataset(xy,M,k); </font></blockquote> <blockquote> <table width="920" border="0" cellspacing="0" cellpadding="0" align="center"> <tr align="center"> <td> <img src="images/tp4/disks-dataset.jpg" width="400"> <img src="images/tp4/frey_rawface-dataset.jpg" width="400"> <br/> <img src="images/tp4/disks-pca.jpg" width="400"> <img src="images/tp4/frey_rawface-pca.jpg" width="400"> <br/> <img src="images/tp4/disks-isomap.jpg" width="400"> <img src="images/tp4/frey_rawface-isomap.jpg" width="400"> </td> </tr> <tr> <td align="center"><p>Dimension reduction for two different libraries of images.<br> Left: translating disks, right: face images.<br> Although the disks data set depends on 2D translation, this is not<br> a flat (euclidean) manifold (it is a bit curvy due to the disk shape).<br> This is why the PCA mapping does not recover exactly a 2D square.<br> The face database exhibits a more complex embedding.</p></td> </tr> </table> <br/> Copyright © 2006 <a href="http://www.ceremade.dauphine.fr/%7Epeyre/">Gabriel Peyré</a> </blockquote></ul></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -