http://www.cs.cornell.edu/info/people/jshapiro/cs664/664paper.html


<HTML><HEAD><TITLE>Parallel Object Recognition and Applications to Facial Recognition</TITLE></HEAD><BODY bgcolor="#ffffff" text="#000000"><H1><CENTER><FONT SIZE=10>Parallel Object Recognition and Applications to Facial Recognition</FONT><BR>Matt Androski, Chris Paradis, &amp; Jody Shapiro</CENTER></H1><HR><H1>Introduction</H1><P>This project was an attempt to parallelize a computer vision object recognition algorithm. Given a database of edge-detected object models and a previously unseen edge-detected image, this algorithm attempts to find any and all models which appear in the new image. Matching is performed using an approximation of the Hausdorff Fraction; see Figure 1 for a simple example. The image search is performed in a hierarchical scheme which attempts to minimize comparisons by quickly eliminating regions of the image from consideration. <A HREF="#references">[1]</A><P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/scene.gif"><BR>Figure 1 - A previously unseen image containing a model</CENTER><P>In this paper we will discuss the pros and cons of different methods of parallelization and describe the factors that motivated our design decisions. We will then describe our parallel implementation using <A HREF="http://http.cs.berkeley.edu/projects/parallel/castle/split-c/">Split-C</A> in detail; this will be followed by a performance analysis. We also discuss the suitability of this algorithm for use in facial recognition and present our results. We conclude with a discussion of some possible extensions to both the parallel implementation and the facial recognition aspects.<BR><H1>Approximation to the Hausdorff Fraction</H1><P>This method uses an approximation to the Hausdorff Fraction to measure the similarity between two edge images of the same size. This approximation is computed using a subspace representation of the two edge images.
The distance for which the fraction is being computed is factored in by dilating one of the images by the distance. For this application, one of the images in the match will be a model from a stored database while the other will be a region of a new image. To create the subspace, we first convert each of the model images into a column vector by reading the pixels in column-major order. We then construct a matrix from these vectors. We compute the eigenvectors and eigenvalues of this matrix, select the <I>k</I> largest eigenvalues, and create a matrix from their corresponding eigenvectors. This matrix is used to project each of the model images and any region of the new image into the subspace. The subspace representation of each of the two images will be a <I>k</I>-length vector. The primary step in computing the approximation to the Hausdorff Fraction is a dot product of these two <I>k</I>-length vectors. This method is described in detail in <A HREF="#references">[1]</A>.<BR><P>The subspace representations of the database models are precomputed and stored along with all of the matrices and data needed to perform the projection operation. Regions of the image we are searching need to be projected into the subspace at run time. This projection operation is computationally very expensive. Therefore, both the search algorithm and the parallelization method attempt to minimize the number of projections performed. Note that the approximation is only accurate to within 0.05 of the true Hausdorff Fraction, so final decisions about the relative quality of a match are made using the true bi-directional Hausdorff Fraction.<BR><H1>Search Algorithm</H1><P>The search algorithm presented in <A HREF="#references">[1]</A> attempts to minimize the number of projections and comparisons by performing a coarse-to-fine examination of the new image.
The new image is subdivided into cells where each cell represents a set of translations of the model with respect to the image. Each model is translated to the center of the cell and compared with that region of the image dilated by the radius of the cell (see Figure 2). This amounts to a rough comparison of this model with every possible translation in the cell. If the resulting fraction is less than some threshold, then the set of translations represented by that cell can be ruled out as a match with that model. If the resulting fraction is not below the threshold, then the algorithm subdivides the cell into four quadrants and considers each of those. This is repeated until the model is ruled out or the algorithm has recursed down to the point where each cell is one translation (see Figure 3). If a model cannot be ruled out by the approximation at the lowest level, then the true bi-directional Hausdorff Fraction is calculated. If the model still cannot be ruled out, it is marked as a possible match and the fraction and position are stored for later consideration.<P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/cells.gif"><BR>Figure 2 - Image divided up into 24x24 cells with a model translation shown</CENTER><P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/levels.gif"><BR>Figure 3 - 24x24 cell is searched depth-first with a coarse-to-fine strategy</CENTER>Thus the search is depth-first and each cell of the image is an independent entity. Projections of regions of the new image are computed for each cell, at each level. For efficiency, all models which passed the test at a higher level are compared to the projection, and a list of the survivors is passed to the next lower level. The search algorithm is described in detail in <A HREF="#references">[1]</A>.
<BR><H1>Parallelization Issues</H1><P>The first major task in parallelizing this object recognition algorithm was deciding how to distribute the processing amongst the processors. One option was to distribute the search cells across the processors. Thus each processor would be assigned a group of cells and would search each of these cells for all models in the database. This method had some attractive features. Cells are independent entities, as are all image projections associated with them. The projections could be computed and used locally for each of the models, and no communication between processors would be necessary. However, this method requires each processor to have access to the entire database of models. Our test database contained 30 views each of 20 models for a total of 600 models. For a database of that size, having a complete set of models at each processor is not unreasonable. However, any real application would require a much larger database, which would preclude the use of this method.<P>Our second option was to distribute the models across the processors. Thus each processor would have a subset of the models in the database and would search every cell in the image for its set of models. This method is much more tolerant of a large database than the first, but it introduces some significant problems. Each processor is searching through all of the cells in the image. As such, every processor will need most of the image projections calculated. Image projections are very expensive, so having each processor compute all of the projections locally is unacceptable. Instead it is necessary to have processors store projections in a global data structure as they are calculated so other processors can access them. This introduces significant network traffic into the parallel version.
However, since this method can handle a large database much more effectively, it became the only choice for implementation.<P>Once this implementation was chosen, we were faced with two primary issues in parallelization. One issue was how to efficiently compute and share the image projections. To prevent processors from needing to access the same projections simultaneously, we stagger the starting points of each processor across the image we are searching. In an ideal case, the processors will move from cell to cell, in lock step, without any conflicts. If two processors happen to need the same projection simultaneously, and the projection has not yet been calculated, then a simple locking mechanism ensures that only one does the calculation while the other waits. So when a given processor needs a projection, it first checks to see if the projection has been calculated. If so, it gets the projection and uses it. If it has not been calculated, it locks the projection, calculates it, and stores it in the global structure. If it fails to lock the projection, then another processor must be computing it, so it waits until that computation completes.<P>The second issue was that of load balance. This issue appeared deceptively simple, since a large quantity of models could easily be distributed across many processors. The problem with this simple solution arose because of the organization of our database. The 30 views of each of the 20 models were stored sequentially in the database. A given processor would be assigned most of the views of a few objects. If one of those objects appeared in the image, that processor would have many more close matches than the other processors and would recurse more deeply into the image. This created a distinct load imbalance. We solved this problem by spreading the views of each object across the processors. In a real application it would be sensible to store the views of an object out of order to solve this problem.
With this change in place we achieved a much more acceptable load balance. Figure 4 shows the load balance of the algorithm before and after this correction. It is important to note, however, that it is impossible to predict which models will cause the deepest searches, so there will always be a certain amount of load imbalance.<BR><P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/loadbal_off.gif" BORDER=1><BR><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/loadbal_on.gif" BORDER=1><BR>Figure 4 - Load Balancing</CENTER><H1>Performance</H1><P>Once our Split-C implementation was complete, we measured its performance on the <A HREF="http://www.cs.cornell.edu/Info/Projects/SP-2/">SP-2</A> massively parallel processor. We also compiled the sequential version for the SP-2 and measured its performance. The graph in Figure 5 shows the performance results in terms of speedup over the sequential version.<P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/speedup.gif"></CENTER><P><CENTER>Figure 5 - Speedup vs. Number of Processors</CENTER><P>The graph clearly shows that our parallel implementation achieves less than ideal performance. Ideal performance would be a speedup of N for a test on N processors. These results were not as good as we had hoped, so we attempted to identify the source of overhead which was limiting our performance. We separately measured the time spent in each phase of the search.
The breakdown of time for different numbers of processors is shown in Figure 6.<P><CENTER><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/proc1.gif"><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/proc2.gif"><BR><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/proc4.gif"><IMG SRC="http://www.cs.cornell.edu/Info/People/jshapiro/cs664/proc8.gif"></CENTER><P><CENTER>Figure 6 - Breakdown of Processing Time</CENTER><P>The single-processor graph shows that the parallel version, when run on one processor, performs nearly as well as the sequential version. However, when more processors are added, we see that the time spent fetching projections from the global structure is a significant source of overhead. As the number of processors increases, fewer projections are calculated locally, so more global accesses are required. The graphs show that the overhead associated with fetching projections grows with the number of processors.<P>To solve this problem, we would like to fetch the projections using a split-phase access and do some other calculations while waiting for the data to arrive. However, the nature of the algorithm prevents us from knowing in advance which projections we need. Thus, by the time we know we need a projection, we have no other calculations to perform while waiting for the data.<BR><H1>Experiments With Facial Recognition</H1><P>With our parallel implementation complete, we began to investigate the usefulness of this method for facial recognition. Our goal was to see if the algorithm could successfully find faces in a scene using the approximation to the Hausdorff fraction. One test method we considered was inserting face models from our database directly into
