<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0047)http://web.mit.edu/stanrost/www/cs585p1/p1.html -->
<HTML><HEAD><TITLE>CS585 Fall 1998 Project One by Stanislav Rost</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY text=white vLink=#8000ff link=#00ffff
background="CS585 Fall 1998 Project One by Stanislav Rost.files/background.jpg">
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/title.jpg"><BR>Image
and Video Processing Project One by Stanislav Rost </CENTER>
<P>
<TABLE cellSpacing=1 cellPadding=1 border=0>
<TBODY>
<TR>
<TD vAlign=top><A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#goals"><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/goals.jpg"
border=0></A><BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#method"><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/method.jpg"
border=0></A><BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#program"><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/program.jpg"
border=0></A><BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#results"><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/results.jpg"
border=0></A><BR><A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#code"><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/code.jpg"
border=0></A><BR></TD>
<TD vAlign=top><A name=goals></A><BR>
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/goals.jpg"
border=0></CENTER>The basic goal of the program is to recognize the
vowels O, E, A, and U in paragraphs provided in the form of
black-and-white images. We are allowed to train the program on a training
paragraph image so that it learns to distinguish among letters. The
difficulties that need to be resolved include noise, joined letters, scaled
or tilted letters, and letters that closely resemble each other.
<P>Some of the assumptions are:
<UL>
<LI>All letters are set in the same fixed-width font.<BR> <I>The
letter-splitting algorithm takes advantage of this.</I>
<LI>If the paragraph is tilted, then it is shaped like a tilted wide
rectangle. <I>This is used to rotate the paragraph into its proper
orientation.</I> </LI></UL><A name=method></A>
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/method.jpg"
border=0></CENTER>
<UL>
<LI><I>How to distinguish one letter from another?</I><BR><BR>To
distinguish an image of one letter from an image of another letter, we
need to compute a set of <A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#parameters">parameters</A>
(an <I>identifying vector</I>) for each letter. Ideally, the vector
differs between images of different letters but is the same for all
images in one class of letters (i.e. all <B>a</B>'s, all <B>e</B>'s, etc.).
<BR><BR>Since the different parameters, i.e. the elements of the
identifying vector, vary at different and possibly non-linear rates for
each letter, we cannot use Euclidean distance to measure how close two
images of letters are. Instead, we calculate the mean of the identifying
vectors of all the letters in the same class, and the covariance matrix
that characterizes the distribution of those vectors. Once that
information is known, we can use the Mahalanobis distance to find out how
close a given image of a letter is to the whole class of letters, which is
represented as a distribution of vector values by its mean vector and
covariance matrix.
<BR><BR>This is what <B>The Training</B> accomplishes: multiple
instances of the same vowels are identified by the user, and the
computer "learns" the means and covariances for each of the vowels which
need to be recognized using the identifying vectors for all instances of
vowels encountered in the training paragraph.
<P></P>
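As a rough illustration of the training statistics described above, the following minimal sketch computes a mean vector and covariance matrix for one class and then measures the squared Mahalanobis distance of a new vector to that class. It is not the project's code: the names <I>mean_and_covariance</I>, <I>solve</I>, and <I>mahalanobis_sq</I> are hypothetical, and the use of the unbiased (n - 1) covariance estimate is an assumption.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(samples: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Mean vector and covariance matrix of one class of identifying vectors.

    Assumption: the unbiased estimate (division by n - 1) is used.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def mahalanobis_sq(x: Sequence[float], mean: Vector, cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)  # y = cov^-1 (x - mean), without forming the inverse
    return sum(di * yi for di, yi in zip(diff, y))
```

Solving the linear system instead of inverting the covariance matrix keeps the sketch short. A class needs at least as many linearly independent training samples as the vector has elements, otherwise the covariance matrix is singular and the distance is undefined.
<P></P>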
<LI><A name=parameters></A><I>What are the image parameters included in
the identifying vectors?</I><BR><BR>It was originally suggested that we
distinguish among letters using the <I>Hu moments</I>:
statistical image characteristics that are invariant under rotation and
scaling of the image. We were given formulas for 7 moments, but I
found that only the first three moments change sufficiently when
different letters are compared. The other moments were a) of higher
order, which meant that noise would introduce big deviations in their
values, and b) smaller than 0.0001, which meant that floating-point
errors would be introduced when doing arithmetic or comparisons with
them.<BR><BR>However, the first three moments alone were not enough:
there were still errors in identifying the letters. I added another
rotation-invariant parameter to the identifying vectors: <I>the
compactness</I>, defined as the perimeter of the image squared over its
area. That improved recognition to some extent. <BR><BR>A significant
problem arose: the letter <B>n</B> in most cases looks exactly like the
letter <B>u</B>, just flipped upside down, and the rotation-invariant
moments produced very similar vectors for both letters. So I had to
abandon the idea of relying on rotation-invariant methods, since telling
an "n" from a "u" requires knowing the correct orientation of the
paragraph. If I had <A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#rotate">a way of
finding the angle by which the paragraph was rotated</A>, then I could
rotate the paragraph the opposite way and use rotation-variant
parameters. One such parameter is the ratio of the area of the top half
of the image of a letter to the area of the bottom half. Using that
parameter further improved recognition, but did not completely eliminate
the confusion between <B>u</B>'s and <B>n</B>'s. So I had to put in <A
href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#checks">secondary
checks</A> in the recognition routine to reject all <B>n</B>'s.
<P></P>
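The parameters discussed above can be sketched for a small binary image as follows. This is only an illustration under stated assumptions, not the project's code: the function names are hypothetical, the perimeter is taken to be the number of exposed pixel edges, and for letters with an odd number of rows the middle row is left out of the top/bottom ratio. The Hu moment formulas are the standard ones for the first three moments.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # rows of 0/1 pixels, 1 = ink


def central_moment(img: Image, p: int, q: int) -> float:
    """Central moment mu_pq of the ink pixels (x = column, y = row)."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in pts)


def hu_first_three(img: Image) -> List[float]:
    """First three Hu moments, built from scale-normalised central moments."""
    m00 = central_moment(img, 0, 0)  # equals the ink area

    def eta(p: int, q: int) -> float:
        return central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    phi3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2
    return [phi1, phi2, phi3]


def compactness(img: Image) -> float:
    """Perimeter squared over area.

    Assumption: perimeter = number of ink-pixel edges that touch background
    or the image border (4-connectivity).
    """
    h, w = len(img), len(img[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            area += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < h and 0 <= nx < w
                if not inside or not img[ny][nx]:
                    perimeter += 1
    return perimeter ** 2 / area


def top_bottom_ratio(img: Image) -> float:
    """Ink area above the vertical centre of the letter over ink area below it.

    Assumptions: the letter spans at least two rows, and for an odd number
    of rows the middle row is ignored.
    """
    rows = [y for y, row in enumerate(img) if any(row)]
    centre = (rows[0] + rows[-1]) / 2
    upper = sum(sum(row) for y, row in enumerate(img) if y < centre)
    lower = sum(sum(row) for y, row in enumerate(img) if y > centre)
    return upper / lower
```

For a toy "u" shape and its upside-down copy, the Hu moments and the compactness come out the same, while the top/bottom ratio is inverted, which is the behaviour the paragraph above relies on.
<P></P>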
<LI><A name=rotate></A><I>How does the program rotate the paragraph so
that it is correctly oriented on the page?</I><BR><BR>Finding the
paragraph's orientation is a non-trivial task in the general
case. However, if the paragraph was rectangular before it was tilted,
we can find its orientation by computing the angle (in degrees) between
the x-axis and the major axis of the ellipse that has the same
second moments as the paragraph, and then rotating the paragraph by the
negative of that angle. </LI></UL><A name=program></A>
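<P>The orientation estimate described in the last Method item can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, the ink pixels are assumed to be given as (x, y) coordinates, and the sign of the angle depends on whether the y-axis points up or down in the chosen coordinate frame. The formula used is the standard one for the major axis of the ellipse with matching second central moments, and it returns angles in the range (-90, 90] degrees.</P>

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def orientation_degrees(points: Iterable[Point]) -> float:
    """Angle in degrees between the x-axis and the major axis of the
    ellipse with the same second central moments as the point set."""
    pts = list(points)
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


def rotate(points: Iterable[Point], degrees: float) -> List[Point]:
    """Rotate points about the origin by the given angle in degrees."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Usage: a wide rectangle of points tilted by 20 degrees is straightened
# by rotating it through the negative of the estimated angle.
rect = [(float(x), float(y)) for x in range(40) for y in range(10)]
tilted = rotate(rect, 20.0)
straightened = rotate(tilted, -orientation_degrees(tilted))
```

<P>The estimate only identifies the axis, not which end is up, so a paragraph rotated by 180 degrees yields the same angle as an upright one.</P>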
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/program.jpg"
border=0></CENTER>Basic algorithm for the <B>Training</B> program:
<UL>
<LI>Load the training paragraph.
<LI>Split the paragraph image into separate images of letters.
<LI>For each image of a letter, compute the parameters of its identifying
vector.
<LI>Show each image of a letter to the user and ask which letter
it is.
<LI>For each class of letters (O, E, A, U), calculate the mean vector
and covariance matrix and write them to disk. </LI></UL>Basic algorithm for the
<B>Recognition</B> program:
<UL>
<LI>Load the image with letters to be recognized.
<LI>Calculate the average height of all letters.
<LI>Determine whether the image is tilted by slicing it into two
halves and checking whether the upper boundaries of the bounding
boxes of the two halves are more than 0.5 times the average letter height
apart. <BR><BR><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/istilted.gif">
<BR><BR>
<LI>If the image is tilted, find the angle between its x-axis and the
major axis of the ellipse with the same second-order moments as the image,
and rotate the image by the negative of that angle.
<LI>Split the image into separate letters (joined letters are still
considered one letter).
<LI>Calculate the average width of all letters.
<LI>Examine each letter image: if it is wider than 1.5 * the average
image width, then it is a group of letters joined together. Split such
an image into pieces, each as wide as the average
letter. <BR>
<CENTER><IMG
src="CS585 Fall 1998 Project One by Stanislav Rost.files/split.gif"></CENTER><BR>
<LI>Relabel and split the paragraph's image into separate letters again.
<LI>For each image of a letter, determine the parameters of its
identifying vector.
<LI>Calculate the Mahalanobis distance from the letter's identifying vector
to each vowel class, shifted by the log term so
that the distances can be compared with each other. <BR>
<CENTER><IMG