
<P>Then <IMG SRC="Images/DerivativeEquation2.gif" ALIGN="center" WIDTH="324" HEIGHT="55" BORDER="0" ALT="Derivative of tanh function"></P>

<P>Which simplifies to <IMG SRC="Images/DerivativeEquation3.gif" ALIGN="center" WIDTH="140" HEIGHT="47" BORDER="0" ALT="Simplified derivative"> </P>

<P>Or, since <IMG SRC="Images/DerivativeEquation4.gif" ALIGN="center" WIDTH="86" HEIGHT="21" BORDER="0" ALT="Given values">, the result is <IMG SRC="Images/DerivativeEquation5.gif" ALIGN="center" WIDTH="90" HEIGHT="47" BORDER="0" ALT="Result">.  This result means that we can calculate the value of the derivative of <I>F()</I> given only the output of the function, without any knowledge of the input.  In this article, we will refer to the derivative of the activation function as <I>G(x)</I>.</P>

<P>The activation function used in the code is a scaled version of the hyperbolic tangent.  It was chosen based on a recommendation in one of Dr. LeCun's articles.  Scaling causes the function to vary between &plusmn;1.7159, and permits us to train the network to values of &plusmn;1.0.</P>
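
<P>As a concrete illustration, the scaled activation and its derivative might be written as macros like the following.  This is only a sketch: the constants follow Dr. LeCun's recommendation of <I>F(x)</I> = 1.7159 * tanh( 0.66666667 * <I>x</I> ), and while the name <CODE>DSIGMOID</CODE> mirrors the one used later in the code, the exact definitions here are assumptions rather than excerpts from the full source:</P>

<PRE>// illustrative sketch only
#include &lt;cmath&gt;

// scaled hyperbolic tangent: F(x) = 1.7159 * tanh( (2/3) * x )
#define SIGMOID(x)  ( 1.7159 * tanh( 0.66666667 * (x) ) )

// its derivative G, expressed purely in terms of the OUTPUT S = F(x):
// F'(x) = (2/3) * ( 1.7159 - S*S/1.7159 )
//       = ( 0.66666667 / 1.7159 ) * ( 1.7159 - S ) * ( 1.7159 + S )
#define DSIGMOID(S) ( 0.66666667 / 1.7159 * ( 1.7159 - (S) ) * ( 1.7159 + (S) ) )
</PRE>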


<BR><A HREF="#topmost"><FONT SIZE="-6">go back to top</FONT></A>

<BR><BR>
<A NAME="Backpropagation"></A>
<h3>Backpropagation</h3>

<P>Backpropagation is an iterative process that starts with the last layer and moves backwards through the layers until the first layer is reached.  Assume that for each layer, we know the error in the output of the layer.  If we know the error of the output, then it is not hard to calculate changes for the weights, so as to reduce that error.  The problem is that we can only observe the error in the output of the very last layer.</P>

<P>Backpropagation gives us a way to determine the error in the output of a prior layer given the output of a current layer.  The process is therefore iterative: start at the last layer and calculate the change in the weights for the last layer.  Then calculate the error in the output of the prior layer.  Repeat.</P>

<P>The backpropagation equations are given below.  My purpose in showing you the equations is so that you can find them in the code and understand them.  For example, equation (3) below shows how to calculate the partial derivative of the error <I>E<sup>P</sup></I> with respect to the activation value <I>y<sub>i</sub></I> of each neuron at the <I>n</I>-th layer.  In the code, you will see a corresponding variable named <CODE>dErr_wrt_dYn[ ii ]</CODE>.</P>

<P>Start the process off by computing the partial derivative of the error due to a single input image pattern with respect to the outputs of the neurons on the last layer.  The error due to a single pattern is calculated as follows:</P>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn1-ErrNP.gif" WIDTH="157" HEIGHT="46" BORDER="0" ALT="Equation (1): Error due to a single pattern"></TD>
	<TD>(equation 1)</TD>
</TR>
</TABLE>
<P>where:</P>
<TABLE>
<TR>
	<TD><IMG SRC="Images/ErrNP.gif" WIDTH="25" HEIGHT="25" BORDER="0" ALT="Error due to a single pattern P at the last layer n"></TD>
	<TD>is the error due to a single pattern <I>P</I> at the last layer <I>n</I>;</TD>
</TR>
<TR>
	<TD><IMG SRC="Images/TNI.gif" WIDTH="21" HEIGHT="25" BORDER="0" ALT="Target output at the last layer (i.e., the desired output at the last layer)"></TD>
	<TD>is the target output at the last layer (i.e., the desired output at the last layer); and</TD>
</TR>
<TR>
	<TD><IMG SRC="Images/xNI.gif" WIDTH="21" HEIGHT="25" BORDER="0" ALT="Actual value of the output at the last layer"></TD>
	<TD>is the actual value of the output at the last layer.</TD>
</TR>
</TABLE>
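
<P>In code, equation (1) is just a sum of squared differences over the output neurons of the last layer.  The following is a minimal sketch with illustrative names; it is not a function from the actual source:</P>

<PRE>// sketch: error due to one pattern P, per equation (1)
// E_P = 0.5 * sum over i of ( x_i - T_i )^2
// where x is the actual output and T is the target output at the last layer

double PatternError( const double* actualOutput, const double* targetOutput, int count )
{
	double errP = 0.0;
	
	for ( int ii = 0; ii &lt; count; ++ii )
	{
		double diff = actualOutput[ ii ] - targetOutput[ ii ];
		errP += 0.5 * diff * diff;
	}
	
	return errP;
}
</PRE>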

<P>Given equation (1), taking the partial derivative with respect to the actual outputs of the last layer yields:</P>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn2-dErrNP-wrt-dxNI.gif" WIDTH="97" HEIGHT="54" BORDER="0" ALT="Equation (2): Partial derivative of the output error for one pattern with respect to the neuron output values"></TD>
	<TD>(equation 2)</TD>
</TR>
</TABLE>					

<P>Equation (2) gives us a starting value for the backpropagation process.  We use the numeric values for the quantities on the right side of equation (2) in order to calculate numeric values for the derivative.  Using the numeric values of the derivative, we calculate the numeric values for the changes in the weights, by applying the following two equations (3) and then (4):</P>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn3-dErrNP-wrt-dyNI.gif" WIDTH="148" HEIGHT="57" BORDER="0" ALT="Equation (3): Partial derivative of the output error for one pattern with respect to the activation value of each neuron"></TD>
	<TD>(equation 3)</TD>
</TR>
</TABLE>
 			
<TABLE>
<TR>
	<TD>where</TD>
	<TD><IMG SRC="Images/G-of-xNI.gif" WIDTH="48" HEIGHT="25" BORDER="0" ALT="Derivative of the activation function"></TD>
	<TD>is the derivative of the activation function.</TD>
</TR>
</TABLE>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn4-dErrNP-wrt-dWNij.gif" WIDTH="133" HEIGHT="57" BORDER="0" ALT="Equation (4): Partial derivative of the output error for one pattern with respect to each weight feeding the neuron"></TD>
	<TD>(equation 4)</TD>
</TR>
</TABLE>
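
<P>Stripped of the class machinery, equations (2) through (4) reduce to a few simple loops.  The sketch below is written for the last layer (where equation (2) supplies the starting derivative) and assumes a fully connected layer with illustrative names (<CODE>numNeurons</CODE>, <CODE>xPrev</CODE>, <CODE>dErr_wrt_dWn</CODE>, etc.); it is not the code from the actual classes, which is shown further below:</P>

<PRE>// sketch of equations (2) through (4) for one fully connected layer

for ( int ii = 0; ii &lt; numNeurons; ++ii )
{
	// equation (2), meaningful as a starting point at the LAST layer only:
	// dErr_wrt_dXn = actual output minus target output
	dErr_wrt_dXn[ ii ] = actualOutput[ ii ] - targetOutput[ ii ];
	
	// equation (3): dErr_wrt_dYn = G( output ) * dErr_wrt_dXn
	dErr_wrt_dYn[ ii ] = DSIGMOID( actualOutput[ ii ] ) * dErr_wrt_dXn[ ii ];
	
	// equation (4): dErr_wrt_dWn[ii][jj] = (output jj of the previous layer) * dErr_wrt_dYn[ii]
	for ( int jj = 0; jj &lt; numNeuronsPrev; ++jj )
	{
		dErr_wrt_dWn[ ii ][ jj ] = xPrev[ jj ] * dErr_wrt_dYn[ ii ];
	}
}
</PRE>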

<P>Then, using the values obtained from equation (3) together with the weights of the current layer, we calculate the error in the output of the previous layer, using the following equation (5):</P>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn5-dErrNm1P-wrt-dxNm1K.gif" WIDTH="161" HEIGHT="55" BORDER="0" ALT="Equation (5): Partial derivative of the error for the previous layer"></TD>
	<TD>(equation 5)</TD>
</TR>
</TABLE>

<P>The values we obtain from equation (5) are used as starting values for the calculations on the immediately preceding layer.  <I><B>This is the single most important point in understanding backpropagation.</B></I>  In other words, we take the numeric values obtained from equation (5), and use them in a repetition of equations (3), (4) and (5) for the immediately preceding layer.</P>
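
<P>In loop form, equation (5) accumulates each neuron's contribution back onto the outputs of the previous layer through the connecting weights.  Again, this is an illustrative sketch with the same assumed names as above, not the source code:</P>

<PRE>// sketch of equation (5): error with respect to the previous layer's outputs
// dErr_wrt_dXnm1[ kk ] = sum over ii of ( weight[ii][kk] * dErr_wrt_dYn[ii] )

for ( int kk = 0; kk &lt; numNeuronsPrev; ++kk )
{
	dErr_wrt_dXnm1[ kk ] = 0.0;
	
	for ( int ii = 0; ii &lt; numNeurons; ++ii )
	{
		dErr_wrt_dXnm1[ kk ] += weight[ ii ][ kk ] * dErr_wrt_dYn[ ii ];
	}
}

// dErr_wrt_dXnm1 now plays the role of dErr_wrt_dXn when equations (3) through (5)
// are repeated for the immediately preceding layer
</PRE>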

<P>Meanwhile, the values from equation (4) tell us how much to change the weights in the current layer <I>n</I>, which was the whole purpose of this gigantic exercise.  In particular, we update the value of each weight according to the formula:</P>

<TABLE>
<TR>
	<TD><IMG SRC="Images/eqtn6-UpdateWeights.gif" WIDTH="245" HEIGHT="55" BORDER="0" ALT="Equation (6): Updating the weights"></TD>
	<TD>(equation 6)</TD>
</TR>
</TABLE>
 		
<P>where <I>eta</I> is the "learning rate", typically a small number like 0.0005 that is gradually decreased during training.</P>
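
<P>The weight update of equation (6) is then a single subtraction per weight.  A sketch, with the same assumed names as the sketches above:</P>

<PRE>// sketch of equation (6): gradient-descent update of each weight
// etaLearningRate is the learning rate, e.g., 0.0005

for ( int ii = 0; ii &lt; numNeurons; ++ii )
{
	for ( int jj = 0; jj &lt; numNeuronsPrev; ++jj )
	{
		weight[ ii ][ jj ] -= etaLearningRate * dErr_wrt_dWn[ ii ][ jj ];
	}
}
</PRE>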

<P>In the code, these equations are implemented by calling <CODE>NeuralNetwork::Backpropagate()</CODE>.  The inputs to the <CODE>NeuralNetwork::Backpropagate()</CODE> function are the actual output of the neural network and the desired output.  Using these two inputs, the <CODE>NeuralNetwork::Backpropagate()</CODE> function calculates the value of equation (2).  It then iterates through all layers in the network, starting from the last layer and proceeding backwards toward the first layer.  For each layer, the layer's <CODE>NNLayer::Backpropagate()</CODE> function is called.  The input to <CODE>NNLayer::Backpropagate()</CODE> is the derivative of the error with respect to that layer's outputs, and its output is the corresponding derivative for the preceding layer, i.e., equation (5).  These derivatives are all stored in a <CODE>vector</CODE> of <CODE>vector</CODE>s of <CODE>double</CODE>s named <CODE>differentials</CODE>.  The output from one layer is then fed as the input to the next preceding layer:</P>

<PRE>// simplified code
void NeuralNetwork::Backpropagate(double *actualOutput, double *desiredOutput, UINT count)
{
	// Backpropagates through the neural net
	// Proceed from the last layer to the first, iteratively
	// We calculate the last layer separately, and first, since it provides the needed derivative
	// (i.e., dErr_wrt_dXnm1) for the previous layers
	
	// nomenclature:
	//
	// Err is output error of the entire neural net
	// Xn is the output vector on the n-th layer
	// Xnm1 is the output vector of the previous layer
	// Wn is the vector of weights of the n-th layer
	// Yn is the activation value of the n-th layer, i.e., the weighted sum of inputs BEFORE 
	//    the squashing function is applied
	// F is the squashing function: Xn = F(Yn)
	// F' is the derivative of the squashing function
	//   Conveniently, for F = tanh, then F'(Yn) = 1 - Xn^2, i.e., the derivative can be 
	//   calculated from the output, without knowledge of the input
	
	
	VectorLayers::iterator lit = m_Layers.end() - 1;
	
	std::vector&lt; double &gt; dErr_wrt_dXlast( (*lit)-&gt;m_Neurons.size() );
	std::vector&lt; std::vector&lt; double &gt; &gt; differentials;
	
	int iSize = m_Layers.size();
	
	differentials.resize( iSize );
	
	int ii;
	
	// start the process by calculating dErr_wrt_dXn for the last layer.
	// for the standard MSE Err function (i.e., 0.5*sumof( (actual-target)^2 ) ), this differential is simply
	// the difference between the actual and the target
	
	for ( ii=0; ii&lt;(*lit)-&gt;m_Neurons.size(); ++ii )
	{
		dErr_wrt_dXlast[ ii ] = actualOutput[ ii ] - desiredOutput[ ii ];
	}
	
	
	// store Xlast and reserve memory for the remaining vectors stored in differentials
	
	differentials[ iSize-1 ] = dErr_wrt_dXlast;  // last one
	
	for ( ii=0; ii&lt;iSize-1; ++ii )
	{
		differentials[ ii ].resize( m_Layers[ii]-&gt;m_Neurons.size(), 0.0 );
	}
	
	// now iterate through all layers including the last but excluding the first, and ask each of
	// them to backpropagate error and adjust their weights, and to return the differential
	// dErr_wrt_dXnm1 for use as the input value of dErr_wrt_dXn for the next iterated layer
	
	ii = iSize - 1;
	for ( ; lit&gt;m_Layers.begin(); lit--)
	{
		(*lit)-&gt;Backpropagate( differentials[ ii ], differentials[ ii - 1 ], m_etaLearningRate );
		--ii;
	}
	
	
	differentials.clear();
	
}
</PRE>
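
<P>As a usage note, a training loop would call <CODE>NeuralNetwork::Backpropagate()</CODE> once per pattern, immediately after the forward pass for that pattern.  The snippet below is only a sketch of that call; the surrounding names (<CODE>NN</CODE>, the ten-output assumption, the &plusmn;1.0 target convention) are assumptions here rather than excerpts from the source:</P>

<PRE>// illustrative usage only
double actualOutput[ 10 ];   // assuming one output neuron per digit
double targetOutput[ 10 ];   // e.g., +1.0 for the correct digit, -1.0 for the others

// ... run the forward pass for one input pattern to fill actualOutput,
//     and set targetOutput from the pattern's label ...

NN.Backpropagate( actualOutput, targetOutput, 10 );
</PRE>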

<P>Inside the <CODE>NNLayer::Backpropagate()</CODE> function, the layer implements equations (3) through (5) in order to determine the derivative for use by the immediately preceding layer.  It then implements equation (6) in order to update the weights in its layer.  In the following code, the derivative of the activation function <I>G(x)</I> is <CODE>#define</CODE>d as <CODE>DSIGMOID</CODE>:</P>

<PRE>// simplified code

void NNLayer::Backpropagate( std::vector&lt; double &gt;& dErr_wrt_dXn /* in */, 
							std::vector&lt; double &gt;& dErr_wrt_dXnm1 /* out */, 
							double etaLearningRate )
{
	double output;

	// calculate equation (3): dErr_wrt_dYn = F'(Yn) * dErr_wrt_Xn
	
	for ( ii=0; ii&lt;m_Neurons.size(); ++ii )
	{
		output = m_Neurons[ ii ]-&gt;output;
	
		dErr_wrt_dYn[ ii ] = DSIGMOID( output ) * dErr_wrt_dXn[ ii ];
	}
	
	// calculate equation (4): dErr_wrt_Wn = Xnm1 * dErr_wrt_Yn
	// For each neuron in this layer, go through the list of connections from the prior layer, and
	// update the differential for the corresponding weight
	
	ii = 0;
	for ( nit=m_Neurons.begin(); nit&lt;m_Neurons.end(); nit++ )
	{
		NNNeuron& n = *(*nit);  // for simplifying the terminology
		
		for ( cit=n.m_Connections.begin(); cit&lt;n.m_Connections.end(); cit++ )
		{
			kk = (*cit).NeuronIndex;
			if ( kk == ULONG_MAX )
			{
