src="CS585 Fall 1998 Project One by Stanislav Rost.files/mahalanobis.gif"></CENTER><BR><I><B>X</B>
is the identifying vector for the letter which we are attempting to
recognize.<BR><B>Mu</B> is the mean and <B>C</B> is the covariance matrix for
the vowel to which we are comparing the letter, both obtained in training.</I>
<LI>Pick the vowel that is closest to the letter we are considering
(the one whose shifted Mahalanobis distance is the smallest).
<LI>Calculate the non-shifted Mahalanobis distance (without the log term) to
find out the actual closeness of the vowel and the letter in question.
<LI>Find the range that thresholds the distance between letters and
lets the program decide whether the letters match. The variances are
taken from the diagonal of the covariance matrix, and their square
roots give the standard deviations. To express the standard deviation
in the space of the vowel's data distribution, the program applies the
Mahalanobis distance to it. The resulting normalized standard deviation
is then scaled by an empirically derived constant so that the range
covers most of the distribution.
<LI>If the distance is greater than the allowed range, mark the letter
as non-(O, E, A, U) and proceed to the next letter. <A name=checks></A>
<LI>If the distance is within the allowed range, perform a series of
secondary checks using the Euler number (the number of image components
minus the number of holes in the image). For instance, an O must have an
Euler number of 0 (1 component, 1 hole). If an Euler check fails, the
letter is marked as non-(O, E, A, U) and the program proceeds to the
next letter.<BR><BR>In particular, this is where <B>u</B>'s are
distinguished from <B>n</B>'s. The letter in question is sliced
horizontally into two halves, and the halves of a <B>u</B> have different
Euler numbers than the corresponding halves of an <B>n</B>. <BR>
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/n2.gif"> <IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/u2.gif"></CENTER><BR>Also,
<B>a</B>'s, which resemble <B>s</B>'s, go through an additional
check here. In an <B>a</B>, the top half of the image has a smaller area
than the bottom half. In an <B>s</B>, both halves have roughly the same area.
<LI>If the letter which we want to recognize passes all the checks and
is within the allowed range, then paint the letter as the appropriate
vowel and proceed to the next letter in the paragraph. </LI></UL><A
name=results></A>
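<P>The distance and threshold steps above can be sketched as follows. This is a minimal illustration in Python, not the code from the project's MATLAB files: the function names, the default <B>scale</B> of 3, the use of the squared distance, and the form of the log term (the natural log of the determinant of <B>C</B>) are all assumptions, as is the reading of the threshold step as "the Mahalanobis distance of a point one standard deviation away from the mean along every axis".</P>

```python
import math


def solve_with_det(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination; also return det(matrix)."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        det *= aug[col][col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)], det


def mahalanobis_sq(x, mean, cov):
    """Return the squared Mahalanobis distance (x-Mu)^T C^-1 (x-Mu) and det(C)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y, det = solve_with_det(cov, diff)
    return sum(d * yi for d, yi in zip(diff, y)), det


def classify(x, models, scale=3.0):
    """Return the closest vowel, or None if x lies outside that vowel's range.

    models maps a vowel name to (mean vector, covariance matrix).
    """
    def shifted(name):
        mean, cov = models[name]
        dist, det = mahalanobis_sq(x, mean, cov)
        return dist + math.log(det)  # assumed form of the log term

    best = min(models, key=shifted)
    mean, cov = models[best]
    dist, _ = mahalanobis_sq(x, mean, cov)  # non-shifted distance
    # A point one standard deviation from the mean along every axis
    # (assumed interpretation of the threshold step).
    sigma_point = [m + math.sqrt(cov[i][i]) for i, m in enumerate(mean)]
    sigma_dist, _ = mahalanobis_sq(sigma_point, mean, cov)
    return best if dist <= scale * sigma_dist else None


# Made-up two-feature models, for illustration only.
MODELS = {
    "o": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    "e": ([5.0, 5.0], [[2.0, 1.0], [1.0, 2.0]]),
}
```

<P>With these made-up models, <B>classify([0.5, 0.0], MODELS)</B> picks "o", while a point far from both means, such as [20.0, 20.0], is rejected and would be marked as non-(O, E, A, U).</P>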
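<P>The Euler number checks can be illustrated with a small sketch. It follows the definition given above (components minus holes), counting letter pixels with 8-connectivity and background pixels with 4-connectivity. The function names, the tiny test glyphs, and the expected per-half values for a <B>u</B> (2 for the top half, 1 for the bottom half) are assumptions made for illustration, not values taken from the project code.</P>

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(cells, neighbours):
    """Count connected groups in a set of (row, col) cells by flood fill."""
    remaining = set(cells)
    count = 0
    while remaining:
        stack = [remaining.pop()]
        count += 1
        while stack:
            r, c = stack.pop()
            for dr, dc in neighbours:
                nxt = (r + dr, c + dc)
                if nxt in remaining:
                    remaining.remove(nxt)
                    stack.append(nxt)
    return count


def euler_number(rows):
    """Euler number of a binary image given as strings ('1' = letter pixel)."""
    h, w = len(rows), len(rows[0])
    fg = {(r, c) for r in range(h) for c in range(w) if rows[r][c] == "1"}
    # Pad by one pixel so the outside background forms a single region.
    bg = {(r, c) for r in range(-1, h + 1) for c in range(-1, w + 1)} - fg
    holes = count_components(bg, N4) - 1  # subtract the outer background
    return count_components(fg, N8) - holes


def looks_like_u(rows):
    """Assumed half-split check: a 'u' is open at the top, an 'n' at the bottom."""
    mid = len(rows) // 2
    top, bottom = rows[:mid], rows[mid:]
    return euler_number(top) == 2 and euler_number(bottom) == 1


# Tiny hand-drawn glyphs, for illustration only.
O = ["111", "1.1", "111"]
U = ["1.1", "1.1", "1.1", "111"]
N = ["111", "1.1", "1.1", "1.1"]
```

<P>Here <B>euler_number(O)</B> is 0 (1 component, 1 hole), and the half-split check accepts the <B>U</B> glyph but rejects the <B>N</B> glyph, even though both have an Euler number of 1 as whole images.</P>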
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/results.jpg"
border=0></CENTER>
<CENTER>Paragraph 1<BR><A
href="CS585 Fall 1998 Project One by Stanislav Rost.files/par1result.gif"><IMG
height=70
src="CS585 Fall 1998 Project One by Stanislav Rost.files/par1result.gif"
width=497 border=0></A><BR><BR>
<TABLE cellSpacing=1 border=1>
<TBODY>
<TR>
<TD><B>Number Right</B> </TD>
<TD><B>Number Wrong</B> </TD>
<TD><B>Correctness</B> </TD></TR>
<TR>
<TD>16 </TD>
<TD>1 </TD>
<TD>94.11 % </TD></TR></TBODY></TABLE>
<P>Paragraph 2<BR><A
href="CS585 Fall 1998 Project One by Stanislav Rost.files/par2result.gif"><IMG
height=137
src="CS585 Fall 1998 Project One by Stanislav Rost.files/par2result.gif"
width=488 border=0></A><BR><BR>
<TABLE cellSpacing=1 border=1>
<TBODY>
<TR>
<TD><B>Number Right</B> </TD>
<TD><B>Number Wrong</B> </TD>
<TD><B>Correctness</B> </TD></TR>
<TR>
<TD>82 </TD>
<TD>4 wrong, 4 missed </TD>
<TD>91.11 % </TD></TR></TBODY></TABLE>
<P>Scaled Paragraph<BR><A
href="CS585 Fall 1998 Project One by Stanislav Rost.files/scaledresult.gif"><IMG
height=194
src="CS585 Fall 1998 Project One by Stanislav Rost.files/scaledresult.gif"
width=229 border=0></A><BR><BR>
<TABLE cellSpacing=1 border=1>
<TBODY>
<TR>
<TD><B>Number Right</B> </TD>
<TD><B>Number Wrong</B> </TD>
<TD><B>Correctness</B> </TD></TR>
<TR>
<TD>11 </TD>
<TD>0 </TD>
<TD>100.00 % </TD></TR></TBODY></TABLE>
<P>Tilted Paragraph<BR><BR>Original<BR><A
href="CS585 Fall 1998 Project One by Stanislav Rost.files/tilted.gif"><IMG
height=20
src="CS585 Fall 1998 Project One by Stanislav Rost.files/tilted.gif"
width=498 border=0></A><BR><BR>Result<BR><A
href="CS585 Fall 1998 Project One by Stanislav Rost.files/tiltedresult.gif"><IMG
height=62
src="CS585 Fall 1998 Project One by Stanislav Rost.files/tiltedresult.gif"
width=498 border=0></A><BR><BR>
<TABLE cellSpacing=1 border=1>
<TBODY>
<TR>
<TD><B>Number Right</B> </TD>
<TD><B>Number Wrong</B> </TD>
<TD><B>Correctness</B> </TD></TR>
<TR>
<TD>10 </TD>
<TD>6 </TD>
<TD>62.50 % </TD></TR></TBODY></TABLE></CENTER><BR><FONT size=+1>What Went
Wrong and How To Fix It</FONT><BR><BR>The recognition performed poorly on
the tilted paragraph for several reasons.
<UL>
<LI>When the program performed reverse rotation on the tilted paragraph
to orient it properly, the rotation algorithm might have distorted the
data or added noise to it. There is really no way to avoid this: because
the data is discrete, some of it will be corrupted during rotation.
<LI>The training paragraph did not have any rotated letters. Although
the moments are rotation-invariant, the mean that the training routine
calculated for each vowel's set of instances was probably far from the
mean that would also account for rotated letters. To correct for this,
I should have also included rotated letters in training.
<LI>The tilted paragraph was not simply rotated but, as you can see from
the image, deformed in other ways; for example, some letters are rotated
differently than the whole paragraph. These deformations, in conjunction
with my use of orientation-variant parameters, may produce errors. To
correct for this, I should have refrained from using rotation-variant
parameters. </LI></UL>The splitting did not quite work in some cases. To
improve the splitting technique, I probably should have considered a
technique that uses image morphology, such as erosion, removal of spurs,
or shrinking. <A name=code></A>
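<P>As an illustration of the morphology idea, the sketch below applies a single binary erosion with a 3x3 square structuring element to two blobs joined by a thin bridge. This is a hypothetical example, not part of the project code: the function names and the test image are made up, and a real splitter would still need to recover the full letter shapes afterwards, since erosion shrinks the letters as well as removing the bridge.</P>

```python
def to_pixels(rows):
    """Convert strings ('#' = letter pixel) to a set of (row, col) coordinates."""
    return {(r, c) for r, row in enumerate(rows)
            for c, ch in enumerate(row) if ch == "#"}


def erode(pixels):
    """Binary erosion with a 3x3 square structuring element.

    A pixel survives only if it and all eight of its neighbours are set;
    anything outside the image counts as background.
    """
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in offsets)}


def count_components(pixels):
    """Count 8-connected components by flood fill."""
    remaining = set(pixels)
    count = 0
    while remaining:
        stack = [remaining.pop()]
        count += 1
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nxt = (r + dr, c + dc)
                    if nxt in remaining:
                        remaining.remove(nxt)
                        stack.append(nxt)
    return count


# Two 3x3 blobs connected by a one-pixel-thick bridge.
JOINED = to_pixels([
    "###...###",
    "#########",
    "###...###",
])
SPLIT = erode(JOINED)
```

<P>Before erosion the image is a single component; afterwards only the centre pixel of each blob remains, so the two touching shapes are counted separately.</P>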
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/code.jpg"
border=0></CENTER><A
href="http://web.mit.edu/stanrost/www/cs585p1/train.m">train.m</A> -
Training program <BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/vowel.m">vowel.m</A> - main
recognition program <BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/letter_compare.m">letter_compare.m</A>
- compares a letter to all vowels <BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/invmoments.m">invmoments.m</A>
- calculates Hu moments <BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/ncmoment.m">ncmoment.m</A> -
calculates normalized central moments<BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/centralmoment.m">centralmoment.m</A>
- calculates central moments<BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/gravcenters.m">gravcenters.m</A>
- calculates gravity centers for an image<BR><BR><B>Training
data</B><BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/a.mat">a.mat</A> - training
data for vowel a<BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/e.mat">e.mat</A> - training
data for vowel e<BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/o.mat">o.mat</A> - training
data for vowel o<BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/u.mat">u.mat</A> - training
data for vowel u<BR></TD></TR></TBODY></TABLE></P></BODY></HTML>