<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0047)http://web.mit.edu/stanrost/www/cs585p1/p1.html -->
<HTML><HEAD><TITLE>CS585 Fall 1998 Project One by Stanislav Rost</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY text=white vLink=#8000ff link=#00ffff 
background="CS585 Fall 1998 Project One by Stanislav Rost.files/background.jpg">
<CENTER><IMG 
src="CS585 Fall 1998 Project One by Stanislav Rost.files/title.jpg"><BR>Image 
and Video Processing Project One by Stanislav Rost </CENTER>
<P>
<TABLE cellSpacing=1 cellPadding=1 border=0>
  <TBODY>
  <TR>
    <TD vAlign=top><A 
      href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#goals"><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/goals.jpg" 
      border=0></A><BR><A 
      href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#method"><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/method.jpg" 
      border=0></A><BR><A 
      href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#program"><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/program.jpg" 
      border=0></A><BR><A 
      href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#results"><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/results.jpg" 
      border=0></A><BR><A 
      href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#code"><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/code.jpg" 
      border=0></A><BR></TD>
    <TD vAlign=top><A name=goals></A><BR>
      <CENTER><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/goals.jpg" 
      border=0></CENTER>The basic goal of the program is to recognize the 
      vowels O, E, A, and U in paragraphs provided as black-and-white images. 
      A training paragraph image is used to train the program to distinguish 
      among letters. The difficulties to be resolved include noise, joined 
      letters, scaled or tilted letters, and letters that closely resemble 
      each other. 
      <P>Some of the assumptions are: 
      <UL>
        <LI>All letters are typed in the same fixed-width 
        font.<BR>&nbsp;<I>Exploited by the letter-splitting algorithm.</I> 
        <LI>If the paragraph is tilted, it is shaped like a tilted wide 
        rectangle.<BR>&nbsp;<I>Used to rotate the paragraph into the proper 
        orientation.</I> </LI></UL><A name=method></A>
      <CENTER><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/method.jpg" 
      border=0></CENTER>
      <UL>
        <LI><I>How to distinguish one letter from another?</I><BR><BR>To 
        distinguish an image of one letter from an image of another letter, we 
        need to compute a set of <A 
        href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#parameters">parameters</A> 
        (an <I>identifying vector</I>) for each letter image. Ideally the 
        vector differs between images of different letters but is the same for 
        every image in a class of letters (i.e. all <B>a</B>'s, all <B>e</B>'s, etc.). 
        <BR><BR>Since the different parameters, i.e. the elements of the 
        identifying vector, vary at different and possibly non-linear rates for 
        each letter, we cannot use Euclidean distance to measure how close two 
        letter images are. We need to calculate the mean of the identifying 
        vectors of all the letters in the same class, and the covariance matrix 
        that characterizes the distribution of their values. Once that 
        information is known, we can use the Mahalanobis distance to find out 
        how close a given letter image is to the whole class of letters, which 
        is represented as a distribution of vector values by its mean vector 
        and covariance matrix. 
        <BR><BR>This is what <B>The Training</B> accomplishes: the user 
        identifies multiple instances of each vowel, and the computer "learns" 
        the mean and covariance for each vowel to be recognized from the 
        identifying vectors of all instances of that vowel encountered in the 
        training paragraph. 
        <P></P>
        <LI><A name=parameters></A><I>What are the image parameters included in 
        the identifying vectors?</I><BR><BR>It was originally suggested that we 
        distinguish among letters using the <I>Hu moments</I>: 
        statistical image characteristics that are invariant under rotation and 
        scaling of the image. We were given formulas for 7 moments, but I 
        found that only the first three change sufficiently when 
        different letters are compared. The other moments were a) of higher 
        order, which meant that noise would introduce large deviations in their 
        values, and b) smaller than 0.0001, which meant that floating-point 
        errors would be introduced when doing arithmetic or comparisons with 
        them.<BR><BR>However, the first three moments alone were not enough: 
        there were still errors in identifying the letters. I added another 
        rotation-invariant parameter to the identifying vectors: <I>the 
        compactness</I>, defined as the square of the image's perimeter divided 
        by its area. That improved recognition to some extent. <BR><BR>A significant 
        problem arose: the letter <B>n</B> in most cases looks exactly like the 
        letter <B>u</B> turned upside down, and the rotation-invariant 
        moments produced very similar vectors for both letters. So I could not 
        rely on rotation-invariant parameters alone: I had to 
        determine the correct orientation of the paragraph to tell whether 
        a letter was an "n" or a "u." If I had <A 
        href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#rotate">a way of 
        finding the angle by which the paragraph was rotated</A>, then I could 
        rotate the paragraph the opposite way and use rotation-variant 
        parameters. One such parameter is the ratio of the area of the top half 
        of the letter image to the area of the bottom half. Using that 
        parameter further improved recognition, but did not completely eliminate 
        the confusion between <B>u</B>'s and <B>n</B>'s. So I had to put <A 
        href="http://web.mit.edu/stanrost/www/cs585p1/p1.html#checks">secondary 
        checks</A> into the recognition routine to reject all <B>n</B>'s. 
        <P></P>
        <LI><A name=rotate></A><I>How does the program rotate the paragraph so 
        that it is correctly oriented on the page?</I><BR><BR>Finding the 
        paragraph's orientation is difficult to do in general. 
        However, if the paragraph was rectangular before it was tilted, 
        we can find its orientation by computing the angle (in degrees) between 
        the x-axis and the major axis of the ellipse that has the same 
        second-order moments as the paragraph, and then rotating by the 
        negative of that angle. </LI></UL><A name=program></A>
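      <P>A minimal Python sketch of two steps of the method above: estimating the paragraph's tilt from its second-order central moments, and comparing an identifying vector against a class of letters by Mahalanobis distance. It is an illustration only, not the project's code. All function names are hypothetical, the pixel list is assumed to hold foreground coordinates with y increasing upward, and the "log term" is assumed to be the natural log of the covariance determinant.</P>
<PRE>
```python
import math

# Hypothetical helper names; a sketch under stated assumptions,
# not the project's original code.

def orientation_degrees(pixels):
    """Angle in degrees from the x-axis to the major axis of the ellipse
    with the same second-order central moments as the foreground pixels.
    pixels: list of (x, y) points, y increasing upward (assumption); with
    image rows increasing downward the sign of the angle flips."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))


def mean_and_covariance(vectors):
    """Sample mean vector and covariance matrix of one class of vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve_and_logdet(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting.
    Returns (x, ln|det a|)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x, logdet


def shifted_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance from x to the class, plus ln|cov|.
    Assumption: the "log term" is ln|cov|, as in a Gaussian discriminant."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    sol, logdet = solve_and_logdet(cov, diff)
    return sum(d * s for d, s in zip(diff, sol)) + logdet


def classify(x, classes):
    """classes maps a label to its (mean, cov); returns the nearest label."""
    return min(classes, key=lambda label: shifted_mahalanobis(x, *classes[label]))
```
</PRE>
      <P>Under the rectangular-paragraph assumption, rotating the image by the negative of <CODE>orientation_degrees(...)</CODE> would undo the tilt. For recognition, <CODE>classify</CODE> would be called with one <CODE>mean_and_covariance(...)</CODE> result per vowel; the secondary checks that reject <B>n</B>'s are not modeled here.</P>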
      <CENTER><IMG 
      src="CS585 Fall 1998 Project One by Stanislav Rost.files/program.jpg" 
      border=0></CENTER>Basic algorithm for the <B>Training</B> program: 
      <UL>
        <LI>Load the training paragraph. 
        <LI>Split the paragraph image into separate images of letters. 
        <LI>For each image of a letter, compute the parameters of its 
        identifying vector. 
        <LI>Show each letter image to the user and ask which letter 
        it is. 
        <LI>For each class of letters (O, E, A, U), calculate the mean vector 
        and covariance matrix and write them to disk. </LI></UL>Basic algorithm for the 
      <B>Recognition</B> program: 
      <UL>
        <LI>Load the image with letters to be recognized. 
        <LI>Calculate the average height of all letters. 
        <LI>Determine whether the image is tilted by slicing the image into two 
        halves and checking whether the upper boundaries of the bounding 
        boxes of the two halves are more than 0.5 times the average letter 
        height apart. <BR><BR><IMG 
        src="CS585 Fall 1998 Project One by Stanislav Rost.files/istilted.gif"> 
        <BR><BR>
        <LI>If the image is tilted, find the angle from its x-axis to the 
        major axis of the ellipse with the same second-order moments as the 
        image, and rotate the image by the negative of that angle. 
        <LI>Split the image into separate letters (joined letters are still 
        considered one letter). 
        <LI>Calculate the average width of all letters. 
        <LI>Consider each letter image: if it is wider than 1.5 times the 
        average letter width, it is a group of letters joined together. Split 
        it into pieces, each as wide as the average 
        letter. <BR>
        <CENTER><IMG 
        src="CS585 Fall 1998 Project One by Stanislav Rost.files/split.gif"></CENTER><BR>
        <LI>Relabel and split the paragraph's image into separate letters again. 

        <LI>For each image of a letter, compute the parameters of its 
        identifying vector. 
        <LI>Calculate the Mahalanobis distance from the letter's identifying 
        vector to each vowel class, shifted by the log term so 
        that the distances can be compared with each other. <BR>
        <CENTER><IMG