int iNumWeight;
int fm;  // "fm" stands for "feature map"

for ( fm=0; fm<6; ++fm)
{
    for ( ii=0; ii<13; ++ii )
    {
        for ( jj=0; jj<13; ++jj )
        {
            iNumWeight = fm * 26;  // 26 is the number of weights per feature map
            NNNeuron& n = *( pLayer->m_Neurons[ jj + ii*13 + fm*169 ] );

            n.AddConnection( ULONG_MAX, iNumWeight++ );  // bias weight

            for ( kk=0; kk<25; ++kk )
            {
                // note: max val of index == 840, corresponding to 841 neurons in prev layer
                n.AddConnection( 2*jj + 58*ii + kernelTemplate[kk], iNumWeight++ );
            }
        }
    }
}
// layer two:
// This layer is a convolutional layer that has 50 feature maps. Each feature
// map is 5x5, and each unit in the feature maps is a 5x5 convolutional kernel
// of corresponding areas of all 6 of the previous layers, each of which is a 13x13 feature map
// So, there are 5x5x50 = 1250 neurons, (5x5+1)x6x50 = 7800 weights
pLayer = new NNLayer( _T("Layer02"), pLayer );
NN.m_Layers.push_back( pLayer );
for ( ii=0; ii<1250; ++ii )
{
    pLayer->m_Neurons.push_back( new NNNeuron( (LPCTSTR)label ) );
}

for ( ii=0; ii<7800; ++ii )
{
    initWeight = 0.05 * UNIFORM_PLUS_MINUS_ONE;
    pLayer->m_Weights.push_back( new NNWeight( initWeight ) );
}
// Interconnections with previous layer: this is difficult
// Each feature map in the previous layer is a top-down bitmap image whose size
// is 13x13, and there are 6 such feature maps. Each neuron in one 5x5 feature map of this
// layer is connected to a 5x5 kernel positioned correspondingly in all 6 parent
// feature maps, and there are individual weights for the six different 5x5 kernels. As
// before, we move the kernel by TWO pixels, i.e., we
// skip every other pixel in the input image. The result is 50 different 5x5 top-down bitmap
// feature maps
int kernelTemplate2[25] = {
0, 1, 2, 3, 4,
13, 14, 15, 16, 17,
26, 27, 28, 29, 30,
39, 40, 41, 42, 43,
52, 53, 54, 55, 56 };
for ( fm=0; fm<50; ++fm)
{
    for ( ii=0; ii<5; ++ii )
    {
        for ( jj=0; jj<5; ++jj )
        {
            iNumWeight = fm * 156;  // 156 is the number of weights per feature map: (5x5+1)x6
            NNNeuron& n = *( pLayer->m_Neurons[ jj + ii*5 + fm*25 ] );

            n.AddConnection( ULONG_MAX, iNumWeight++ );  // bias weight

            for ( kk=0; kk<25; ++kk )
            {
                // note: max val of index == 1013, corresponding to 1014 neurons in prev layer
                n.AddConnection(       2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 169 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 338 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 507 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 676 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
                n.AddConnection( 845 + 2*jj + 26*ii + kernelTemplate2[kk], iNumWeight++ );
            }
        }
    }
}
// layer three:
// This layer is a fully-connected layer with 100 units. Since it is fully-connected,
// each of the 100 neurons in the layer is connected to all 1250 neurons in
// the previous layer.
// So, there are 100 neurons and 100*(1250+1)=125100 weights
pLayer = new NNLayer( _T("Layer03"), pLayer );
NN.m_Layers.push_back( pLayer );
for ( ii=0; ii<100; ++ii )
{
    pLayer->m_Neurons.push_back( new NNNeuron( (LPCTSTR)label ) );
}

for ( ii=0; ii<125100; ++ii )
{
    initWeight = 0.05 * UNIFORM_PLUS_MINUS_ONE;
    pLayer->m_Weights.push_back( new NNWeight( initWeight ) );
}

// Interconnections with previous layer: fully-connected

iNumWeight = 0;  // weights are not shared in this layer

for ( fm=0; fm<100; ++fm )
{
    NNNeuron& n = *( pLayer->m_Neurons[ fm ] );

    n.AddConnection( ULONG_MAX, iNumWeight++ );  // bias weight

    for ( ii=0; ii<1250; ++ii )
    {
        n.AddConnection( ii, iNumWeight++ );
    }
}
// layer four, the final (output) layer:
// This layer is a fully-connected layer with 10 units. Since it is fully-connected,
// each of the 10 neurons in the layer is connected to all 100 neurons in
// the previous layer.
// So, there are 10 neurons and 10*(100+1)=1010 weights
pLayer = new NNLayer( _T("Layer04"), pLayer );
NN.m_Layers.push_back( pLayer );
for ( ii=0; ii<10; ++ii )
{
    pLayer->m_Neurons.push_back( new NNNeuron( (LPCTSTR)label ) );
}

for ( ii=0; ii<1010; ++ii )
{
    initWeight = 0.05 * UNIFORM_PLUS_MINUS_ONE;
    pLayer->m_Weights.push_back( new NNWeight( initWeight ) );
}

// Interconnections with previous layer: fully-connected

iNumWeight = 0;  // weights are not shared in this layer

for ( fm=0; fm<10; ++fm )
{
    NNNeuron& n = *( pLayer->m_Neurons[ fm ] );

    n.AddConnection( ULONG_MAX, iNumWeight++ );  // bias weight

    for ( ii=0; ii<100; ++ii )
    {
        n.AddConnection( ii, iNumWeight++ );
    }
}
SetModifiedFlag( TRUE );
return TRUE;
}</PRE>
<P>This code builds the illustrated neural network in stages, one stage for each layer. In each stage, an <CODE>NNLayer</CODE> is <CODE>new</CODE>'d and then added to the <CODE>NeuralNetwork</CODE>'s vector of layers. The needed numbers of <CODE>NNNeuron</CODE>s and <CODE>NNWeight</CODE>s are <CODE>new</CODE>'d and added to the layer's vector of neurons and vector of weights, respectively. Finally, for each neuron in the layer, <CODE>NNConnection</CODE>s are added (using the <CODE>NNNeuron::AddConnection()</CODE> function), passing in appropriate indices for weights and neurons.</P>
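<P>As an aid to reading the index arithmetic in the loops above, the following sketch shows one way to compute, for the first convolutional layer, which previous-layer neuron a given kernel element lands on. It is purely illustrative (the helper name is made up and is not part of the program's source); it assumes the 29x29 input layer, the 5x5 kernel and the two-pixel step used above, and that <CODE>kernelTemplate[kk]</CODE> holds the offset <CODE>kernelRow*29 + kernelCol</CODE> of kernel element <CODE>kk</CODE> (analogous to <CODE>kernelTemplate2</CODE> shown above, but with a row stride of 29). Under those assumptions it reproduces the expression <CODE>2*jj + 58*ii + kernelTemplate[kk]</CODE>.</P>
<PRE>// Illustrative sketch only -- not part of the program's source code.
// For layer one, the neuron at (ii, jj) in a feature map looks at a 5x5 patch of the
// 29x29 input image, stepped by two pixels. This helper returns the row-major index of
// the input neuron that kernel element (kernelRow, kernelCol) falls on, which equals
// 2*jj + 58*ii + kernelTemplate[kk] when kernelTemplate[kk] == kernelRow*29 + kernelCol.
inline int InputNeuronIndex( int ii, int jj, int kernelRow, int kernelCol )
{
    const int inputWidth = 29;  // the input layer is a 29x29 bitmap
    const int step       = 2;   // the kernel is moved by two pixels at a time

    int row = step * ii + kernelRow;   // vertical position in the input image
    int col = step * jj + kernelCol;   // horizontal position in the input image

    return row * inputWidth + col;     // 0 through 840, i.e., 841 input neurons
}</PRE>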
<BR><A HREF="#topmost"><FONT SIZE="-6" COLOR="">go back to top</FONT></A>
<BR><BR>
<A name="AboutMNist"/>
<h2>MNIST Database of Handwritten Digits</h2>
<P>The MNIST database is modified (hence the "M") from a database of handwritten patterns offered by the National Institute of Standards and Technology ("NIST") at <A HREF="http://www.nist.gov/srd/nistsd19.htm" target=_newwin>http://www.nist.gov/srd/nistsd19.htm <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A>. As explained by Dr. LeCun at the <A HREF="http://yann.lecun.com/exdb/mnist/index.html" target=_newwin>MNIST section of his web site <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A>, the database has been broken down into two distinct sets, a first set of 60,000 images of handwritten digits that is used as a training set for the neural network, and a second set of 10,000 images that is used as a testing set. The training set is composed of digits written by around 250 different writers, of which approximately half are high school students and the other half are U.S. census workers. The number of writers in the testing set is unclear, but special precautions were taken to ensure that no writer in the testing set was also in the training set. This makes for a strong test, since the neural network is tested on images from writers that it has never seen before. It is thus a good test of the ability of the neural network to generalize, i.e., to extract intrinsically important features from the patterns in the training set that are also applicable to patterns that it has not seen before.</P>
<P>To use the neural network, you must download the MNIST database. Besides the two files that contain the patterns for the training and testing sets, there are also two companion files that give the "answers", i.e., the digit represented by each corresponding handwritten pattern. These two files are called "label" files. As indicated at the beginning of this article, the four files can be downloaded from <a href="http://yann.lecun.com/exdb/mnist/index.html" target=_newwin>here (11,594 Kb total) <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A>.</P>
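<P>For readers who want to parse the downloaded files themselves, here is a minimal sketch of reading the training-image file. It assumes only the IDX layout documented on the MNIST page (a header of big-endian 32-bit integers followed by raw unsigned bytes); the file name is simply the unzipped name of the download, and the function names are made up for illustration.</P>
<PRE>// Minimal, illustrative sketch of reading the MNIST training-image file.
// The header is four big-endian 32-bit integers (magic number, image count, rows,
// columns), followed by every image's pixels as raw unsigned bytes.
#include <cstdint>
#include <fstream>
#include <vector>

static uint32_t ReadBigEndian32( std::ifstream& f )
{
    unsigned char b[4];
    f.read( reinterpret_cast<char*>( b ), 4 );
    return ( uint32_t(b[0]) << 24 ) | ( uint32_t(b[1]) << 16 ) |
           ( uint32_t(b[2]) << 8 )  |   uint32_t(b[3]);
}

std::vector<unsigned char> LoadFirstMnistImage()
{
    std::ifstream f( "train-images-idx3-ubyte", std::ios::binary );

    uint32_t magic = ReadBigEndian32( f );   // 2051 for image files, 2049 for label files
    uint32_t count = ReadBigEndian32( f );   // 60000 images in the training set
    uint32_t rows  = ReadBigEndian32( f );   // 28
    uint32_t cols  = ReadBigEndian32( f );   // 28

    if ( magic != 2051 || count == 0 )
        return std::vector<unsigned char>(); // not an MNIST image file

    std::vector<unsigned char> image( rows * cols );
    f.read( reinterpret_cast<char*>( image.data() ), image.size() );
    return image;   // one 28x28 pattern; 0 is background (white), 255 is foreground (black)
}</PRE>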
<P>Incidentally, it has already been mentioned that Dr. LeCun's error rate of 0.82% is used as a benchmark. If you read Dr. Simard's article, you will see that he claims an even better error rate of 0.40%. Why not use Dr. Simard's 0.40% as the benchmark?</P>
<P>The reason is that Dr. Simard did not respect the boundary between the training set and the testing set. In particular, he did not respect the fact that the writers in the training set were distinct from the writers in the testing set. In fact, Dr. Simard did not use the testing set at all. Instead, he trained with the first 50,000 patterns in the training set, and then tested with the remaining 10,000 patterns. This raises the possibility that, during testing, Dr. Simard's network was fed patterns from writers whose handwriting had already been seen before, which would give the network an unfair advantage. Dr. LeCun, on the other hand, took pains to ensure that his network was tested with patterns from writers it had never seen before. Thus, as compared with Dr. Simard's testing, Dr. LeCun's testing was more representative of real-world results since he did not give his network any unfair advantages. That's why I used Dr. LeCun's error rate of 0.82% as the benchmark.</P>
<P>Finally, you should note that the MNIST database is still widely used for study and testing, despite the fact that it was created back in 1998. As one recent example, published in February 2006, see:</P>
<UL>
<LI>Fabien Lauer, Ching Y. Suen and Gerard Bloch, <A HREF="http://hal.archives-ouvertes.fr/docs/00/05/75/61/PDF/LauerSuenBlochPR.pdf" target=_newwin>"A Trainable Feature Extractor for Handwritten Digit Recognition" <IMG SRC="Images/ExternalLink.gif" WIDTH="14" HEIGHT="14" BORDER="0" ALT="External Link"></A>, Elsevier Science, February 2006</LI>
</UL>
<P>In the Lauer et al. article, the authors used a convolutional neural network for everything except the actual classification/recognition step. Instead, they used the convolutional neural network for black-box extraction of feature vectors, which they then fed to a different type of classification engine, namely a support vector machine ("SVM") engine. With this architecture, they were able to obtain the excellent error rate of just 0.54%. Good stuff.</P>
<BR><A HREF="#topmost"><FONT SIZE="-6" COLOR="">go back to top</FONT></A>
<BR><BR>
<A name="Architecture"/>
<h2>Overall Architecture of the Test/Demo Program</h2>
<P>The test program is an MFC SDI doc/view application, using worker threads for the various neural network tasks.</P>
<P>The document owns the neural network as a protected member variable. Weights for the neural network are saved to and loaded from an <CODE>.nnt</CODE> file in the <CODE>CMNistDoc::Serialize()</CODE> function. Storage/retrieval of the weights occurs in response to the menu items "<CODE>File->Open</CODE>" and "<CODE>File->Save</CODE>" or "<CODE>Save As</CODE>". For this purpose, the neural network also has a <CODE>Serialize()</CODE> function, which was not shown in the simplified code above, and it is the neural network that does the heavy lifting of storing its weights to a disk file, and extracting them later.</P>
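<P>The neural network's <CODE>Serialize()</CODE> was not shown in the simplified code above, but a minimal sketch of the usual MFC pattern for such a function follows. The sketch assumes the vectors of layers and weights shown earlier and that each <CODE>NNWeight</CODE> exposes its <CODE>double</CODE> through a member named <CODE>value</CODE>; the real code may differ in details such as versioning or also serializing the layer structure.</P>
<PRE>// Illustrative sketch only -- not the program's actual Serialize() code.
// It assumes each NNWeight stores its double in a member named "value".
void NeuralNetwork::Serialize( CArchive& ar )
{
    // The same traversal works for storing and loading: visit every layer and
    // stream each weight value through the archive in a fixed order.
    for ( size_t ly = 0; ly < m_Layers.size(); ++ly )
    {
        NNLayer* pLayer = m_Layers[ ly ];

        for ( size_t w = 0; w < pLayer->m_Weights.size(); ++w )
        {
            if ( ar.IsStoring() )
                ar << pLayer->m_Weights[ w ]->value;   // write to the .nnt file
            else
                ar >> pLayer->m_Weights[ w ]->value;   // read back when loading
        }
    }
}</PRE>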
<P>The document further holds two static functions that the worker threads use to run backpropagation and testing on the neural network. These functions are unimaginatively named <CODE>CMNistDoc::BackpropagationThread()</CODE> and <CODE>CMNistDoc::TestingThread()</CODE>.</P>
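<P>Starting these worker threads follows the standard MFC pattern: <CODE>AfxBeginThread()</CODE> is given the static thread function and a pointer to the document, so that the thread can reach the neural network. The helper below is a made-up illustration of that shape; the actual program may pass additional parameters or synchronization objects.</P>
<PRE>// Illustrative sketch; the helper name StartBackpropagation() is hypothetical.
void CMNistDoc::StartBackpropagation()
{
    // BackpropagationThread() must match the AFX_THREADPROC signature, i.e., a
    // static member declared as:  static UINT BackpropagationThread( LPVOID pParam );
    // The document pointer is handed to the thread so it can reach the neural network.
    AfxBeginThread( BackpropagationThread, this );
}</PRE>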