<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">
<TITLE>Release Notes: version 3</TITLE>
<META NAME="Template" CONTENT="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
</HEAD>
<BODY LINK="#0000ff" VLINK="#800080">
<FONT FACE="Arial" SIZE=5><P>Release Notes: version 3.0</P>
</FONT><P> </P>
<B><I><FONT FACE="Arial"><P>Revision History:</P>
</B></I></FONT><FONT SIZE=2><P>Quadrature Disambiguation Feature Detector</P>
<P>All code by Tyler C. Folsom</P>
</FONT><B><P>Version 1.x</P>
</B><FONT SIZE=2><P>C language code for doctoral dissertation at University of Washington, June 1994</P>
<P>DOS version with no graphics displays. Output is text only.</P>
<P>Works on monochrome single images.</P>
</FONT><B><P>Version 2.0</P>
</B><FONT SIZE=2><P>C++ version for UW short course in C++, Autumn 1998</P>
<P>Latest update: December 7, 1998</P>
<P>Windows version with graphics overlays to show where features were found.</P>
<P>Uses Microsoft Vision SDK version 1.0</P>
</FONT><B><P>Version 2.1</P>
</B><FONT SIZE=2><P>Latest update: Feb 27, 2000</P>
<P>Added the capability of generating test images to verify performance on known cases.</P>
<P>Expanded the output file.</P>
<P>Research plan was only partially completed.</P>
<P>Uses Microsoft Vision SDK version 1.2</P>
</FONT><B><P>Version 3.0</P>
</B><FONT SIZE=2><P>July 9, 2003</P>
<P>Faster routines. Can handle color images. First cut at corner detection. Uses lateral inhibition / facilitation to build extended edges. See below for details.</P>
</FONT><B><P>Version 3.1 (Sept 2003)</P>
</B><FONT SIZE=2><P>Objectives: Link the features to produce a segmentation of the image. Output should be Hermite curves that outline objects. Be able to go from an image to a cartoon.</P>
<P>Expand to handle a color stereo video image stream.</P>
<P>Instead of a fixed sampling grid, adapt the sampling grid based on features found in the previous frame.</P>
<P> </P>
</FONT><B><I><FONT FACE="Arial"><P>Overview</P>
</B></I></FONT><FONT SIZE=2><P>For an introduction, see the Software Design Document for Version 2.0 and http://home.earthlink.net/~tylerfolsom/</P>
<P>The code is written using the Microsoft Foundation Classes (MFC) document/view architecture. There are several wizard-generated files (MainFrm.cpp, ChildFrm.cpp, StdAfx.cpp, ImageFeatures.cpp, and their header files) which do not need to be disturbed and are of little interest. These MFC classes are derived from window classes. The view class handles the graphical user interface (GUI); the document class handles the real work. The header files carry usage comments for class methods and variables. The files are organized as follows (a sketch of the overall flow appears after the list):</P>
<P>ImageFeatureView: Handles the GUI</P>
<P>Options: Gets items requested from the Options menu.</P>
<P>ImageFeaturesDoc: Handler for trivial tasks requested from the GUI.</P>
<P>ProcessFeatures: The main routine. It sets up the grid that determines the locations at which the image is sampled.</P>
<P>IFFilter: The filter class. It takes as input a patch of image and correlates it with cortical filters.</P>
<P>IFLocation: Holds the result of the correlation at each location. Finds the orientations.</P>
<P>IFFeature: The feature class. It takes as inputs the results from applying cortical filters to a patch of image. It does 1D processing based on knowing the orientation. As outputs, it produces the interpretation of what features are at that location.</P>
<P>TestImage: Routines to generate synthetic images for testing (incomplete). These are on the Process | Test cases menu.</P>
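<FONT SIZE=2><P>As a rough illustration of how these classes fit together, here is a hypothetical sketch of the processing flow. The names parallel the classes above, but the types and stub bodies are placeholders, not the actual declarations; the real classes carry MFC and Vision SDK dependencies that are omitted here.</P></FONT>
<PRE>
// Hypothetical sketch of the processing flow, not the actual class
// declarations. Stub bodies stand in for the real correlation and
// interpretation code.
#include &lt;vector&gt;
using std::vector;

struct Patch    { vector&lt;double&gt; pixels; };          // one sampled image region
struct Response { double even[3]; double odd[2]; };  // cortical filter outputs
struct Feature  { double position; double strength; int type; };

Response Correlate(const Patch&amp;)            { return Response(); } // cf. IFFilter
double   FindOrientation(const Response&amp;)   { return 0.0; }        // cf. IFLocation
Feature  Interpret(const Response&amp;, double) { return Feature(); }  // cf. IFFeature

// cf. ProcessFeatures: visit each grid location, correlate with the
// filters, find the orientation, then interpret the 1-D response.
vector&lt;Feature&gt; Process(const vector&lt;Patch&gt;&amp; grid)
{
    vector&lt;Feature&gt; found;
    for (const Patch&amp; p : grid) {
        Response r = Correlate(p);
        found.push_back(Interpret(r, FindOrientation(r)));
    }
    return found;
}
</PRE>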
<P> </P>
</FONT><B><I><FONT FACE="Arial"><P>Version 3.0 Changes from version 2.1: (May 16 - July 9, 2003)</P>
</B></I></FONT><FONT SIZE=2><P>Changed the orientation of filters. Previously, both even and odd filters shared a horizontal orientation. Now the common orientation is vertical. This is done to facilitate stereo vision.</P>
<P>Changed from using four odd filter orientations to two. Both versions use three orientations of even filters. The even filter has a "bump" window applied to the equation -502.48 x<SUP>2</SUP> + 7.8287. The old odd filter used the equation -925.81 x<SUP>3</SUP> + 97.7913 x. The new odd filter uses 72.0232 x. The theory of steerable filters says that the response at any angle can be interpolated from a small number of sampled orientations; for polynomials, the number of samples needed is one more than the degree of the polynomial. See the dissertation for details. The equations used in the code have a multiplier of 15.66 times the numbers in the dissertation.</P>
<P>Code changes also involved new look-up tables to determine the edge or bar position and strength based on the response of the odd/even filters. This data was generated on a spreadsheet. New bar width data was also produced, but it needs more work. However, the bar width code was changed to get the bar width from the edge position when possible.</P>
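<FONT SIZE=2><P>As an illustration only, the filter profiles can be tabulated directly from the equations above. The raised-cosine "bump" window below is an assumption made for the sketch; the dissertation defines the actual window.</P></FONT>
<PRE>
// Tabulate the even and odd 1-D filter profiles from the equations
// quoted above. The raised-cosine "bump" window is an assumption.
#include &lt;cmath&gt;
#include &lt;cstdio&gt;

int main()
{
    const double kPi = 3.14159265358979;
    const int N = 21;                                        // number of taps
    for (int i = 0; i &lt; N; ++i) {
        double x = -0.5 + i / double(N - 1);                 // normalized position
        double bump = 0.5 * (1.0 + std::cos(2.0 * kPi * x)); // assumed window
        double even = (-502.48 * x * x + 7.8287) * bump;     // even filter
        double odd  = 72.0232 * x;                           // new odd filter
        std::printf("x=%6.3f  even=%9.3f  odd=%8.3f\n", x, even, odd);
    }
    return 0;
}
</PRE>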
<P>Updated the rules used to determine whether a feature is an edge or a bar. The basic idea is to see where the large and small filters predict the feature will be. There are sets of predictions for edges, dark bars, and light bars. Only one of the three pairs of predictions can be consistent, and that one is chosen. A supplementary rule is based on the phase from the even and odd filters: if the absolute value of the phase is less than 0.42 radians, the feature is a dark bar; if it is greater than 2.7 radians, it is a light bar. Phases in those ranges should produce no response for the other two filters.</P>
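<FONT SIZE=2><P>A minimal sketch of the supplementary phase rule follows. The atan2 sign convention for the (even, odd) pair is an assumption, and the primary prediction-consistency test is not shown.</P></FONT>
<PRE>
// Supplementary phase rule from the text: phase is the angle of the
// (even, odd) response pair, in radians. The convention is an assumption.
#include &lt;cmath&gt;

enum PhaseVerdict { UNDECIDED, DARK_BAR, LIGHT_BAR };

PhaseVerdict ClassifyByPhase(double evenResp, double oddResp)
{
    double phase = std::atan2(oddResp, evenResp);
    if (std::fabs(phase) &lt; 0.42) return DARK_BAR;   // near zero phase
    if (std::fabs(phase) &gt; 2.7)  return LIGHT_BAR;  // near +/- pi
    return UNDECIDED;  // fall through to the prediction-consistency test
}
</PRE>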
<P>Wrote a new routine for finding the angle of maximum response. The previous code used a "solve" routine to find the angle that would give the maximum filter response (based on three even and four odd filters). It tried multiple starting points to find where the derivative of the total response is zero. This was replaced with a "solve_max" routine based on three even and two odd filters, which should need far fewer iterations. It starts at four approximate solutions and finds the maximum, using the value of the response as well as its first and second derivatives.</P>
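<FONT SIZE=2><P>The idea can be sketched as follows. The stand-in response function below is arbitrary; the real response is steered from the three even and two odd filter outputs.</P></FONT>
<PRE>
// Sketch of a solve_max-style search: refine a few approximate starting
// angles with Newton steps on the derivative, keeping the best maximum.
// Resp/Resp1/Resp2 are a stand-in response and its derivatives.
#include &lt;cmath&gt;

double Resp (double a) { return std::cos(2*a) + 0.3*std::sin(a); }
double Resp1(double a) { return -2*std::sin(2*a) + 0.3*std::cos(a); }
double Resp2(double a) { return -4*std::cos(2*a) - 0.3*std::sin(a); }

double SolveMax()
{
    const double starts[4] = { 0.0, 0.785, 1.571, 2.356 }; // 4 approximate seeds
    double bestA = 0.0, bestR = -1e30;
    for (int s = 0; s &lt; 4; ++s) {
        double a = starts[s];
        for (int it = 0; it &lt; 5; ++it) {      // a few Newton iterations
            double d2 = Resp2(a);
            if (std::fabs(d2) &lt; 1e-9) break;  // avoid dividing by ~0
            a -= Resp1(a) / d2;               // Newton step on the derivative
        }
        if (Resp2(a) &lt; 0.0 &amp;&amp; Resp(a) &gt; bestR) {  // keep true maxima only
            bestR = Resp(a);
            bestA = a;
        }
    }
    return bestA;  // angle of (approximately) maximum response
}
</PRE>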
<P>Added the ability to handle color images. If more than one band is present, the analysis is done on the band that produces the strongest response. This should work best on an image with a luminance band and two chrominance bands. Defining DO_COLORS in stdAfx.h determines whether images are processed in gray or in color.</P>
<P>Added the ability to subsample when doing a correlation using large filters. This has not been adequately tested.</P>
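<FONT SIZE=2><P>The subsampling idea, shown in 1-D for clarity; the stride handling and rescaling are assumptions, and the actual code operates on 2-D image patches.</P></FONT>
<PRE>
// Subsampled correlation: for a large kernel, visit only every step-th
// tap and rescale to compensate. The caller must keep center +/- half
// inside the signal bounds.
#include &lt;vector&gt;

double CorrelateSub(const std::vector&lt;double&gt;&amp; signal, int center,
                    const std::vector&lt;double&gt;&amp; kernel, int step)
{
    int half = (int)kernel.size() / 2;
    double sum = 0.0;
    for (int k = -half; k &lt;= half; k += step)     // skip taps
        sum += kernel[k + half] * signal[center + k];
    return sum * step;                            // compensate for skipped taps
}
</PRE>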
<P>The above changes were archived as version 3.0.0. Changes below are in version 3.0.1</P>
<P>The main reason for most of the above changes was to make the code run faster. To find out how fast it runs, profiling code was added in the files profile.h and profile.cpp. These can be deleted from the project with no ill effect; equivalently, undefine PROFILE_ME in profile.h. Profiling causes information to be written to a profile.txt file. The profiling code is adapted from Greg Hjelstrom and Byon Garrabrant, "Real-Time Hierarchical Profiling" in <I>Game Programming Gems 3</I>, Charles River Media, 2002.</P>
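<FONT SIZE=2><P>The core mechanism behind such profilers is a scoped timer. Below is a minimal, modernized sketch of the idea; the actual Gems 3 code additionally nests named nodes into a call tree and writes the report to profile.txt.</P></FONT>
<PRE>
// Minimal scoped-timer sketch of the hierarchical-profiling idea: an
// object on the stack times its enclosing block and reports on exit.
#include &lt;chrono&gt;
#include &lt;cstdio&gt;

class ScopedTimer {
public:
    explicit ScopedTimer(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        long long us = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(
            std::chrono::steady_clock::now() - start_).count();
        std::printf("%s: %lld us\n", name_, us);
    }
private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

void ProcessExample() {
    ScopedTimer timer("Process");  // reports when the scope exits
    // ... filtering and interpretation would go here ...
}
</PRE>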
<P>Also added is a Canny edge detector, in the files canny.cpp, canny.h, cannyDlg.cpp and cannyDlg.h. These are not central to the software and may be omitted. Their purpose is to compare the time required by the Quadrature Disambiguation method against a standard method. The Canny code comes from J.R. Parker, <I>Algorithms for Image Processing and Computer Vision</I>. It was modified by Travis Udd to work with the Microsoft Vision SDK in November 2000, and further modified by Tyler Folsom in 2003.</P>
<P>Profiling shows that the time required for processing depends strongly on the size of the smallest filter used. The following times are for the CProcessFeatures::Process method, which is the heart of Quadrature disambiguation. Times should be in ms, but the timing has not been independently calibrated. The times are for a release version of the code processing a gray level elephant image, with no subsampling.</P></FONT>
<TABLE BORDER CELLSPACING=1 CELLPADDING=7 WIDTH=590>
<TR><TD WIDTH="20%" VALIGN="TOP">
<P><FONT SIZE=2>Filter size</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>20</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>12</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>8</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>5</FONT></TD>
</TR>
<TR><TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>Time</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>19.2</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>33.4</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>61.7</FONT></TD>
<TD WIDTH="20%" VALIGN="TOP">
<FONT SIZE=2><P>149.1</FONT></TD>
</TR>
</TABLE>
<FONT SIZE=2><P>The time to do Canny edge detection on this image is 41.2 on the same scale. Thus, for filter sizes of 8 and larger, Quadrature Disambiguation runs at a speed comparable to the Canny edge detector. It remains to be shown that its results are as good or better.</P>
<P>The time to process features is mostly split into two subtasks: Filter, which performs the correlations at selected areas of the image, and Interpret, which finds the angle of maximum response, steers to that angle, and determines feature position, type, and strength. In addition, the code spends time displaying graphics of the found edges and writing the answers to a file.</P></FONT>
</BODY>
</HTML>