<!--------------------------------------------------------------------------->
<!-- INTRODUCTION
The Code Project article submission template (HTML version)
Using this template will help us post your article sooner. To use, just
follow the 3 easy steps below:
1. Fill in the article description details
2. Add links to your images and downloads
3. Include the main article text
That's all there is to it! All formatting will be done by our submission
scripts and style sheets.
-->
<!--------------------------------------------------------------------------->
<!-- IGNORE THIS SECTION -->
<html>
<head>
<title>The Code Project</title>
<Style>
BODY, P, TD { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt }
H2,H3,H4,H5 { color: #ff9900; font-weight: bold; }
H2 { font-size: 13pt; }
H3 { font-size: 12pt; }
H4 { font-size: 10pt; color: black; }
PRE { BACKGROUND-COLOR: #FBEDBB; FONT-FAMILY: "Courier New", Courier, mono; WHITE-SPACE: pre; }
CODE { COLOR: #990000; FONT-FAMILY: "Courier New", Courier, mono; }
</style>
<link rel="stylesheet" type=text/css href="http://www.codeproject.com/styles/global.css">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<!--------------------------------------------------------------------------->
<!------------------------------- STEP 1 --------------------------->
<!-- Fill in the details (CodeProject will reformat this section for you) -->
<pre>
Title: Neural Network for Recognition of Handwritten Digits
Author: Mike O'Neill
Email: MikeAThon2000@hotmail.com
Environment: VC++ 5.0-6.0, NT 4.0, Win95/98, WinXP/2K
Keywords: Neural Network, Artificial Neural Network, ANN, Backpropagation, Hessian, OCR, MNIST
Level: Intermediate
Description: A Convolutional Neural Network Achieves 99.26% Accuracy on Modified NIST Database of Handwritten Digits
Section General C++
SubSection Internet & Network programming
</pre>
<!------------------------------- STEP 2 --------------------------->
<!-- Include download and sample image information. -->
<P><ul class=download>
<li><a href="Demo-Mnist.zip">Download the Neural Network demo project - 202 Kb</a> (includes a release-build executable that you can run without the need to compile)</li>
<li><a href="10September-PolishedWithUndistorted-7epochs-dot026MSE-74Errors.zip">Download a sample neuron weight file - 2,785 Kb</a> (achieves the 99.26% accuracy mentioned above)</li>
<li><a href="http://yann.lecun.com/exdb/mnist/index.html" target=_newwin>Download the MNIST database - 11,594 Kb total for all four files <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A> (external link to four files which are all required for this project)</li>
</UL>
<BR><IMG SRC="Images/Screenshot-GraphicalView.gif" WIDTH="600" HEIGHT="372" BORDER="0" ALT="Graphical view of the neural network">
<!------------------------------- STEP 3 --------------------------->
<!-- Add the article text. Please use simple formatting (<h2>, <p> etc) -->
<A name="topmost"/>
<H2>Contents</H2>
<UL type="disc">
<LI><A HREF="#Introduction">Introduction</A></LI>
<LI><A HREF="#Theory">Some Neural Network Theory</A></LI>
<UL type="disc">
<LI><A HREF="#ForwardPropagation">Forward Propagation</A></LI>
<LI><A HREF="#ActivationFunction">The Activation Function (or, "Sigmoid" or "Squashing" Function)</A></LI>
<LI><A HREF="#Backpropagation">Backpropagation</A></LI>
<LI><A HREF="#SecondOrder">Second Order Methods</A></LI>
</UL>
<LI><A HREF="#ConvolutionalStructure">Structure of the Convolutional Neural Network</A></LI>
<UL type="disc">
<LI><A HREF="#Illustration">Illustration and General Description</A></LI>
<LI><A HREF="#CodeToBuild">Code For Building the Neural Network</A></LI>
</UL>
<LI><A HREF="#AboutMNist">MNIST Database of Handwritten Digits</A></LI>
<LI><A HREF="#Architecture">Overall Architecture of the Test/Demo Program</A></LI>
<UL type="disc">
<LI><A HREF="#Using">Using The Demo Program</A></LI>
<LI><A HREF="#GraphicalView">Graphical View of the Neural Network</A></LI>
<LI><A HREF="#TrainingView">Training View and Control Over the Neural Network</A></LI>
<LI><A HREF="#TestingView">Testing View of the Neural Network</A></LI>
</UL>
<LI><A HREF="#Training">Training the Neural Network</A></LI>
<LI><A HREF="#Tricks">Tricks That Make Training Faster</A></LI>
<UL type="disc">
<LI><A HREF="#Hessian">Second Order Backpropagation Using Pseudo-Hessian</A></LI>
<LI><A HREF="#SimultaneousBackprop">Simultaneous Backpropagation and Forward Propagation</A></LI>
<LI><A HREF="#SkipBackprop">Skip Backpropagation for Small Errors</A></LI>
</UL>
<LI><A HREF="#Experiences">Experiences In Training the Neural Network</A></LI>
<LI><A HREF="#Results">Results</A></LI>
<LI><A HREF="#Bibliography">Bibliography</A></LI>
<LI><A HREF="#Version">License and Version Information</A></LI>
</UL>
<BR><BR>
<A name="Introduction"/>
<h2>Introduction</h2>
<P>This article chronicles the development of an artificial neural network designed to recognize handwritten digits. Although some theory of neural networks is given here, it would be better if you already understood some neural network concepts, like neurons, layers, weights and backpropagation.</P>
<P>The neural network described here is <I>not</I> a general-purpose neural network, and it's not some kind of a neural network workbench. Rather, we will focus on one very specific neural network (a five-layer convolutional neural network) built for one very specific purpose (to recognize handwritten digits).</P>
<P>The idea of using neural networks to recognize handwritten digits is not a new one. The inspiration for the architecture described here comes from articles written by two separate authors. The first is Dr. Yann LeCun, who was an independent discoverer of the basic backpropagation algorithm. Dr. LeCun hosts an excellent site on his research into neural networks, at <A HREF="http://yann.lecun.com/" target=_newwin>http://yann.lecun.com/ <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT=""></A>. In particular, you should view his <A HREF="http://yann.lecun.com/exdb/lenet/index.html" target=_newwin>"Learning and Visual Perception"</A> section, which uses animated GIFs to show the results of his research. The MNIST database of handwritten digits, which this project uses, was also developed by him. I used two of his publications as primary source materials for much of my work, and I highly recommend reading his other publications too (they're posted at his site). Unlike many other publications on neural networks, Dr. LeCun's are not inordinately theoretical and math-intensive; rather, they are extremely readable and provide practical insights and explanations. His articles and publications can be found <A HREF="http://yann.lecun.com/exdb/publis/index.html" target=_newwin>here <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A>. Here are the two publications that I relied on:</P>
<br>
<UL>
<LI>Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, <A HREF="http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf" target=_newwin>"Gradient-Based Learning Applied to Document Recognition," <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A> Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998. [46 pages] </LI>
<LI>Y. LeCun, L. Bottou, G. Orr, and K. Muller, <A HREF="http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf" target=_newwin>"Efficient BackProp," <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A> in Neural Networks: Tricks of the trade, (G. Orr and Muller K., eds.), 1998. [44 pages]</LI>
</UL>
<BR>
<P>The second author is Dr. Patrice Simard, a former collaborator with Dr. LeCun when they both worked at AT&T Laboratories. Dr. Simard is now a researcher at Microsoft's <A HREF="http://www.research.microsoft.com/dpu/" target=_newwin>"Document Processing and Understanding"</A> group. His articles and publications can be found <A HREF="http://research.microsoft.com/~patrice/publi.html" target=_newwin>here</A>, and the publication that I relied on is:</P>
<BR>
<UL>
<LI>Patrice Y. Simard, Dave Steinkraus, John Platt, <A HREF="http://research.microsoft.com/~patrice/PDF/fugu9.pdf" target=_newwin>"Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A> International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Los Alamitos, pp. 958-962, 2003. </LI>
</UL>
<P>One of my goals here was to reproduce the accuracy achieved by Dr. LeCun, who was able to train his neural network to achieve 99.18% accuracy (i.e., an error rate of only 0.82%). This error rate served as a type of "benchmark", guiding my work.</P>
<P>As a final introductory note, I'm not overly proud of the source code, which is most definitely an engineering work-in-progress. I started out with good intentions, to make source code that was flexible and easy to understand and to change. As things progressed, the code started to turn ugly. I began to write code simply to get the job done, sometimes at the expense of clean code and comprehensibility. To add to the mix, I was also experimenting with different ideas, some of which worked and some of which did not. As I removed the failed ideas, I did not always back out all the changes and there are therefore some dangling stubs and dead ends. I contemplated the possibility of not releasing the code. But that was one of my criticisms of the articles I read: none of them included code. So, with trepidation and the recognition that the code is easy to criticize and could really use a re-write, here it is.</P>
<BR><A HREF="#topmost"><FONT SIZE="-6" COLOR="">go back to top</FONT></A>
<BR><BR>
<A name="Theory"/>
<h2>Some Neural Network Theory</h2>
<P>This is not a neural network tutorial, but to understand the code and the names of the variables used in it, it helps to see some neural network basics. </P>
<P>The following discussion is not completely general. It considers only feed-forward neural networks, that is, neural networks composed of multiple layers, in which each layer of neurons feeds only the very next layer of neurons, and receives input only from the immediately preceding layer of neurons. In other words, the neurons don't skip layers.</P>
<P>Consider a neural network that is composed of multiple layers, with multiple neurons in each layer. Focus on one neuron in layer <I>n</I>, namely the <I>i</I>-th neuron. This neuron gets its inputs from the outputs of neurons in the previous layer, plus a bias whose value is one ("1"). I use the variable "<I>x</I>" to refer to outputs of neurons. The <I>i</I>-th neuron applies a weight to each of its inputs and then adds the weighted inputs together to obtain something called the "activation value". I use the variable "<I>y</I>" to refer to activation values. The <I>i</I>-th neuron then calculates its output value "<I>x</I>" by applying an "activation function" to the activation value. I use "<I>F()</I>" to refer to the activation function. The activation function is sometimes called a "Sigmoid" or "Squashing" function, among other names, since its primary purpose is to limit the output of the neuron to some reasonable range, such as -1 to +1, and thereby inject some degree of non-linearity into the network. Here's a diagram of a small part of the neural network; remember to focus on the <I>i</I>-th neuron in layer <I>n</I>:</P>
<BR>
<IMG SRC="Images/NeuralNetDiagram.gif" WIDTH="573" HEIGHT="543" BORDER="0" ALT="General diagram of a neuron in a neural network">
<BR>
<P>This is what each variable means:</P>
<TABLE>
<TR>
<TD><IMG SRC="Images/xNI.gif" WIDTH="21" HEIGHT="25" BORDER="0" ALT="Output of the i-th neuron in layer n"></TD>
<TD>is the output of the <I>i</I>-th neuron in layer <I>n</I></TD>
</TR>
<TR>
<TD><IMG SRC="Images/xNm1J.gif" WIDTH="32" HEIGHT="25" BORDER="0" ALT="Output of the j-th neuron in layer n-1"></TD>
<TD>is the output of the <I>j</I>-th neuron in layer <I>n-1</I></TD>
</TR>
<TR>
<TD><IMG SRC="Images/xNm1K.gif" WIDTH="30" HEIGHT="25" BORDER="0" ALT="Output of the k-th neuron in layer n-1"></TD>
<TD>is the output of the <I>k</I>-th neuron in layer <I>n-1</I></TD>
</TR>
<TR>
<TD VALIGN="top"><IMG SRC="Images/wNIJ.gif" WIDTH="27" HEIGHT="25" BORDER="0" ALT="Weight that the i-th neuron in layer n applies to the output of the j-th neuron from layer n-1"></TD>
<TD>is the weight that the <I>i</I>-th neuron in layer <I>n</I> applies to the output of the <I>j</I>-th neuron from layer <I>n-1</I> (i.e., the previous layer). In other words, it's the weight from the output of the <I>j</I>-th neuron in the previous layer to the <I>i</I>-th neuron in the current (<I>n</I>-th) layer.</TD>
</TR>
<TR>
<TD><IMG SRC="Images/wNIK.gif" WIDTH="27" HEIGHT="25" BORDER="0" ALT="Weight that the i-th neuron in layer n applies to the output of the k-th neuron in layer n-1"></TD>
<TD>is the weight that the <I>i</I>-th neuron in layer <I>n</I> applies to the output of the <I>k</I>-th neuron in layer <I>n-1</I></TD>
</TR>
</TABLE>
<TABLE>
<TR>
<TD><IMG SRC="Images/ForwardPropagationEquation.gif" WIDTH="227" HEIGHT="53" BORDER="0" ALT="General feed-forward equation"></TD>
<TD>is the general feed-forward equation, where <I>F()</I> is the activation function. We will discuss the activation function in more detail in a moment.</TD>
</TR>
</TABLE>
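<P>To make the feed-forward equation concrete, here is a minimal C++ sketch of the calculation for a single neuron. It is purely illustrative and is not taken from the article's classes (which are shown below); the names <CODE>NeuronOutput</CODE>, <CODE>weights</CODE> and <CODE>prevOutputs</CODE> are assumptions, and tanh is used as one common choice for the activation function <I>F()</I>:</P>
<PRE>// illustrative sketch only -- not the article's actual code
#include <vector>
#include <cmath>

// computes x = F(y), where y is the weighted sum of the previous
// layer's outputs plus a bias whose input value is one ("1");
// weights[ 0 ] is taken to be the bias weight
double NeuronOutput( const std::vector< double >& weights,
                     const std::vector< double >& prevOutputs )
{
    double y = weights[ 0 ];  // bias weight times the constant input "1"

    for ( size_t j = 0; j < prevOutputs.size(); ++j )
    {
        y += weights[ j + 1 ] * prevOutputs[ j ];
    }

    // the activation ("squashing") function limits the output to
    // the range -1 to +1; tanh is one common choice of F()
    return std::tanh( y );
}</PRE>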
<P>How does this translate into code and C++ classes? The way I saw it, the above diagram suggested that a neural network is composed of objects of four different classes: layers, neurons in the layers, connections from neurons in one layer to those in another layer, and weights that are applied to connections. Those four classes are reflected in the code, together with a fifth class -- the neural network itself -- which acts as a container for all other objects and which serves as the main interface with the outside world. Here's a simplified view of the classes. Note that the code makes heavy use of <CODE>std::vector</CODE>, particularly <CODE>std::vector< double ></CODE>:</P>
<PRE>// simplified view: some members have been omitted, and some signatures have been altered

// helpful typedef's
typedef std::vector< NNLayer* >     VectorLayers;
typedef std::vector< NNWeight* >    VectorWeights;
typedef std::vector< NNNeuron* >    VectorNeurons;
typedef std::vector< NNConnection > VectorConnections;

// Neural Network class
class NeuralNetwork
{
public:
    NeuralNetwork();
    virtual ~NeuralNetwork();

    void Calculate( double* inputVector, UINT iCount, 
        double* outputVector = NULL, UINT oCount = 0 );
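    void Backpropagate( double *actualOutput, 
        double *desiredOutput, UINT count );

    // sketch: a backpropagation entry point and the layer container
    // implied by the description above; the complete class is in the
    // demo project download
    VectorLayers m_Layers;
};
</PRE>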