<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR"
content="Mozilla/4.04 [en] (X11; I; Linux 2.0.30 i686) [Netscape]">
<title>Perceptron</title>
</head>
<body alink="#ff0000" bgcolor="#ffffff" link="#0000ee" text="#000000"
vlink="#551a8b">
<h2>
Supervised learning in a single-layer neural network</h2>
Let's consider a single-layer neural network with <i>b</i> inputs and <i>c</i>
outputs:
<center><img src="theory_dateien/layer.gif" nosave="" height="141"
width="361"></center>
<center> </center>
<ul>
<li>
<i>W</i><sub>ij</sub> = weight from input i to unit j in output
layer;
W<i><sub>j </sub></i>is the vector of all the weights of the j-th
neuron
in the output layer.</li>
<li>
<i>I</i><sup>p</sup> = input vector (pattern p) = (<i>I</i><sub>1</sub><sup>p</sup>,
<i>I</i><sub>2</sub><sup>p</sup>, ..., <i>I</i><sub>b</sub><sup>p</sup>).</li>
<li>
<i>T</i><sup>p</sup> = target output vector (pattern p) = (<i>T</i><sub>1</sub><sup>p</sup>,
<i>T</i><sub>2</sub><sup>p</sup>, ..., <i>T</i><sub>c</sub><sup>p</sup>).</li>
<li>
<i>A</i><sup>p</sup> = Actual output vector (pattern p) = (<i>A</i><sub>1</sub><sup>p</sup>,
<i>A</i><sub>2</sub><sup>p</sup>, ..., <i>A</i><sub>c</sub><sup>p</sup>).</li>
<li>
<i>g()</i> = sigmoid activation function: <i>g(a )</i> = [1 + exp
(-<i>a</i>)]<sup>-1</sup></li>
</ul><hr width="100%"><h3> <a name="Supervised_learning"></a>Supervised learning</h3> We have seen that different weights of a neural network produce
different functions of the input. To train a network, we can present some sample inputs and compare the actual output to the desired results. The
difference is called the <b>error</b>.
<center><img src="theory_dateien/learning.gif"
alt="[an error term is computed and fed back]" nosave="" height="164"
width="251"></center>
The different learning rules tell us which way to adjust the weights to
reduce this error. We say that training has converged when this
error
reaches some small, acceptable level.
<p>Often the learning rule takes the following form:
<br>
<i>W<sub>ij </sub></i> <i>(t+1)</i> = <i>W<sub>ij </sub></i>
<i>(t) + eta . err (p)</i>
<br>
where 0<i> <= eta < </i>1 is a parameter that controls the
learning
rate, and <i>err(p)</i> is the error when input pattern <i>p</i> is
presented.
</p>
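<p>As a rough illustration (not part of the original applet), the following Python
sketch shows this generic on-line training loop. The function <i>err</i> is a
placeholder for the rule-specific per-pattern correction; all names, shapes and
the convergence test are assumptions.</p>
<pre>
import numpy as np

def train(W, patterns, targets, err, eta=0.1, tol=1e-3, max_epochs=1000):
    """Generic on-line training loop.

    W   : weight matrix of shape (b, c); W[i, j] is the weight from input i to output j
    err : placeholder; err(W, I, T) returns the per-pattern correction of shape (b, c)
    """
    for epoch in range(max_epochs):
        total = 0.0
        for I, T in zip(patterns, targets):
            correction = err(W, I, T)      # rule-specific (Adaline, perceptron, ...)
            W = W + eta * correction       # W(t+1) = W(t) + eta * err(p)
            total += np.sum(correction ** 2)
        if total &lt; tol:                    # stop when the error is acceptably small
            break
    return W
</pre>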
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a>
</p>
<hr width="100%"><h3><a name="Adaline"></a>Adaline learning</h3>
ADALINE is an acronym for ADAptive LINear Element (or ADAptive LInear
NEuron).
It was developed by Bernard Widrow and Marcian Hoff (1960).
<p>The adaline learning rule (also known as the least-mean-squares
rule,
the delta rule, and the Widrow-Hoff rule) is a training rule that
minimises
the output error using (approximate) gradient descent. After each
training
pattern <i>I<sup>p</sup></i> is presented, the correction to
apply
to the weights is proportional to the error. The correction is
calculated
<i>before</i> the thresholding step, using <i>err<sub>j</sub></i><i>(p)</i> = <i>T<sub>j</sub><sup>p</sup></i> - <i>W<sub>j</sub></i> <sup>.</sup> <i>I<sup>p</sup></i>:
</p>
<center><img src="theory_dateien/adaline.gif"
alt="error=(inner product - target value)" nosave="" height="113"
width="198"></center>
<p>Thus, the weights are adjusted by
</p>
<p> <i>W<sub>ij</sub> (t+1) = W<sub>ij</sub> (t) + eta (T<sub>j</sub><sup>p</sup> - W<sub>j</sub></i> <sup>.</sup> <i>I<sup>p</sup>) I<sub>i</sub><sup>p</sup></i>
<br>
This corresponds to gradient descent on the quadratic error surface,
<i>E<sub>j</sub></i> = Sum<i><sub>p</sub></i> [<i>T<sub>j</sub><sup>p</sup></i> - <i>W<sub>j</sub> <sup>.</sup> I<sup>p</sup></i>]<sup>2</sup>
</p>
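<p>As a hedged illustration only, here is a minimal Python sketch of one pass of
this rule for a single linear output unit; the array shapes and names are
assumptions, not part of the applet.</p>
<pre>
import numpy as np

def adaline_epoch(W, patterns, targets, eta=0.05):
    """One pass of the Widrow-Hoff (LMS / delta) rule for one output unit.

    W        : weight vector of shape (b,)
    patterns : array of shape (n, b); row p is the input I^p
    targets  : array of shape (n,);   entry p is the target T^p
    """
    for I, T in zip(patterns, targets):
        err = T - np.dot(W, I)     # error computed BEFORE thresholding
        W = W + eta * err * I      # W(t+1) = W(t) + eta (T^p - W . I^p) I^p
    return W
</pre>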
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a>
</p>
<hr width="100%"><h3><a name="Perceptron"></a>Perceptron learning</h3>
In perceptron learning, the weights are adjusted <b>only when a
pattern
is misclassified</b>. The correction to the weights
after
applying the training pattern <i>p</i> is
<br>
<i>W<sub>ij</sub> (t+1) = W<sub>ij</sub> (t) + eta (T<sub>j</sub><sup>p</sup> - A<sub>j</sub><sup>p</sup>) I<sub>i</sub><sup>p</sup></i>
<br>
This corresponds to gradient descent on the perceptron error surface E(<i>W<sub>j</sub></i>) = Sum<sub>misclassified p</sub> <i>A<sub>j</sub><sup>p</sup></i> (<i>W<sub>j</sub></i> <sup>.</sup> <i>I<sup>p</sup></i>).
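<p>For illustration, a minimal Python sketch of one pass of perceptron learning
for a single output unit with a +1/-1 threshold; the names and the choice of
targets are assumptions.</p>
<pre>
import numpy as np

def perceptron_epoch(W, patterns, targets, eta=0.1):
    """One pass of perceptron learning for a single output unit.

    The weights change only when a pattern is misclassified, i.e. when the
    thresholded output A^p differs from the target T^p (targets are +1 or -1).
    """
    for I, T in zip(patterns, targets):
        A = 1.0 if np.dot(W, I) >= 0 else -1.0   # thresholded actual output
        if A != T:                               # update only on a mistake
            W = W + eta * (T - A) * I            # W(t+1) = W(t) + eta (T^p - A^p) I^p
    return W
</pre>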
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a></p>
<p></p>
<hr width="100%"><h3><a name="Pocket"></a>Pocket algorithm</h3>
The perceptron learning algorithm does not terminate if the learning
set
is not linearly separable. In many real-world cases,
however,
we want to find the "best" linear separation even when the learning
sets
are not ideal. The pocket algorithm is a modification of the perceptron
rule proposed by S. I. Gallant (1990). It stores the best weight vector found
so far in a "pocket" while the perceptron continues to learn; the pocketed
weights are replaced only when a better weight vector is found.
<br>
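<p>A minimal Python sketch of the pocket idea, assuming +1/-1 targets and a
plain misclassification count as the quality measure (this simplified measure
is an assumption, not necessarily the criterion used in the applet).</p>
<pre>
import numpy as np

def pocket_train(W, patterns, targets, eta=0.1, max_epochs=100):
    """Perceptron learning, keeping the best weight vector seen so far in a "pocket"."""

    def n_errors(w):
        predictions = np.where(patterns @ w >= 0, 1.0, -1.0)
        return int(np.sum(predictions != targets))

    pocket_W, pocket_errors = W.copy(), n_errors(W)
    for _ in range(max_epochs):
        for I, T in zip(patterns, targets):
            A = 1.0 if np.dot(W, I) >= 0 else -1.0
            if A != T:
                W = W + eta * (T - A) * I        # ordinary perceptron step
        e = n_errors(W)
        if e &lt; pocket_errors:                   # the pocket changes only if W is better
            pocket_W, pocket_errors = W.copy(), e
        if pocket_errors == 0:
            break
    return pocket_W
</pre>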
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a>
</p><hr width="100%"><h3><a name="Backpropagation"></a>Backpropagation</h3>
The backpropagation algorithm was developed for training multilayer
perceptron
networks. In this applet, we will study how it works for a single-layer
network. It was popularized by Rumelhart, Hinton and Williams
(1986),
although similar ideas had been developed previously by others (Werbos,
1974; Parker, 1985). The idea is to train a network by
propagating
the output errors backward through the layers. The errors serve to
evaluate
the derivatives of the error function with respect to the weights,
which
can then be adjusted.
<p>The backpropagation algorithm for a single-layer network using the
sum-of-squares error function consists of two phases (both are sketched in
code after the list below):
</p>
<ol>
<li>
<b>Feedforward</b> - apply an input; evaluate the activations <i>a<sub>j</sub>
</i>and store the error <i>delta<sub>j </sub></i>at each node <i>j</i></li>
<br>
<i>a<sub>j</sub> </i>= <i>Sum <sub>i</sub>(W<sub>ij</sub>
(t) I<sup>p</sup><sub>i</sub>)</i>
<br>
<i> A<sup>p</sup><sub>j </sub> = g (a<sub>j</sub>
</i>)
<br>
<i>delta<sub>j</sub> = A<sup>p</sup><sub>j</sub> - T<sup>p</sup><sub>j</sub></i>
<br>
<li><b>Backpropagation</b> - compute the adjustments and update the
weights.
Since there is just one layer, the output layer, we compute</li>
<br>
<i>W<sub>ij</sub> (t+1) = W<sub>ij</sub> (t) - eta delta<sub>j</sub> I<sup>p</sup><sub>i</sub></i>
<br>
(This is called "on-line" learning, because the weights are adjusted
each time a new input is presented. In "batch" learning, the
weights
are adjusted after summing over all the patterns in the training set.)
</ol>
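<p>As an illustration only, here is a minimal Python sketch of one on-line step
following the two phases above; the array shapes are assumptions.</p>
<pre>
import numpy as np

def g(a):
    """Sigmoid activation: g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(W, I, T, eta=0.1):
    """One on-line step for the single-layer network described above.

    W : weight matrix of shape (b, c); W[i, j] is the weight from input i to output j
    I : input pattern I^p of shape (b,)
    T : target pattern T^p of shape (c,)
    """
    # 1. Feedforward: activations, outputs and stored errors
    a = I @ W                         # a_j = Sum_i W_ij I^p_i
    A = g(a)                          # A^p_j = g(a_j)
    delta = A - T                     # delta_j = A^p_j - T^p_j
    # 2. Backpropagation: adjust the (single) layer of weights
    W = W - eta * np.outer(I, delta)  # W_ij(t+1) = W_ij(t) - eta delta_j I^p_i
    return W, A
</pre>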
<a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a><hr width="100%"><h3><a name="optimal"></a>Optimal Perceptron learning</h3>
In the case of linearly separable problems, a perceptron can find many
different solutions:<br>
<img style="width: 403px; height: 315px;" alt=""
src="theory_dateien/solutions.png"><br>
<br>
It would now be interesting to find the hyperplane that provides the
maximal safety margin:<br>
<img style="width: 403px; height: 315px;" alt=""
src="theory_dateien/optimal.png"><br>
<br>
The margins of that hyperplane touch a limited number of special points,
which define the hyperplane and are called the <span
style="font-style: italic;">Support Vectors</span>.<br>
<p style="margin-bottom: 0cm;" align="center"><img
src="temp_html_16539589.gif" name="Objekt2" align="middle" height="36"
hspace="8" width="383"></p>
<p style="margin-bottom: 0cm;">The perceptron has to determine the
samples for which <img src="temp_html_m32b2ad78.gif" name="Objekt1"
align="middle" height="20" hspace="8" width="65">. The remaining
sam<span lang="de-DE">ples with <img src="temp_html_m411eef90.gif"
name="Objekt3" align="middle" height="20" hspace="8" width="39">are
the Support Vectors <i>sv</i><span style="font-style: normal;">. </span></span></p>
<p style="margin-bottom: 0cm;"><span lang="de-DE"><span
style="font-style: normal;"><img style="width: 403px; height: 315px;"
alt="" src="theory_dateien/optimal0.png"></span></span></p>
<p style="margin-bottom: 0cm;"><img src="temp_html_m4e014a7b.gif"
name="Objekt4" align="middle" height="20" hspace="8" width="40">Represents
the distance between a sample and<img src="temp_html_7bb41a07.gif"
name="Objekt5" align="middle" height="20" hspace="8" width="17">.
<i>z-</i><sub> </sub>and <i>z+</i><sub> </sub>represent the
projection of the critical points on the axis defined by<img
src="temp_html_13dea929.gif" name="Objekt6" align="middle" height="18"
hspace="8" width="19">.</p>
<p style="margin-bottom: 0cm;">Algorithm of the Optimal Perceptron:</p>
<br>
<img style="width: 287px; height: 522px;" alt=""
src="theory_dateien/optimal_algorithm.png"><br>
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a>
</p>
<p></p>
<hr width="100%"><h3><a name="reading"></a><i>Further reading</i></h3>
<ul>
<li>
C. M. Bishop.<i> Neural Networks for Pattern Recognition.</i> Clarendon
Press, Oxford, 1995. pp 95-103 (adaline and perceptron); pp 140-148
(backprop)</li>
<li>
J. Hertz, A. Krogh, and R.G. Palmer. <i>Introduction to the
Theory
of Neural Computation</i>. Addison-Wesley, Redwood City CA, 1991. pp
89-111</li>
<li>
R. Rojas. <i>Neural Networks: A Systematic Introduction</i>.
Springer-Verlag,
Berlin 1996. pp 84-91 (perceptron learning); pp 159-162 (backprop)</li>
</ul>
<hr width="100%">
<p><a
href="http://diwww.epfl.ch/mantra/tutorial/english/apb/html/index.html">[Back
to the Adaline/Perceptron/Backprop applet page]</a>
</p>
</body>
</html>