In the student version, every time you make a network all the training
and testing patterns are thrown out because they are attached to the
network. (This is not true in the pro version.)
To make a recurrent network with 25 regular input units, 20 hidden
layer units (that are copied down to the input layer) and 25 output
units, use:
m 25+20 20 25
This means that the first layer will have 45 units: the first 25 are
regular input values and the next 20 come from the first hidden layer.
These 20 units are called the short-term memory units. Then there are
20 units in the hidden layer; this value should match the number of
short-term memory units, but at the moment there is no check that it
does. Finally, there are 25 units in the output layer. This recurrent
network notation also requires a change in the way training and testing
patterns are written down for input to the program; for more on this,
see the next section.
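To make the copy-down idea concrete, here is a minimal sketch in Python
of this geometry (an illustration of the idea only, not the program's own
code; the sigmoid activation and the lack of bias units are assumptions
made to keep it short):

    import numpy as np

    N_IN, N_STM, N_HID, N_OUT = 25, 20, 20, 25    # the "m 25+20 20 25" geometry

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W_hid = rng.uniform(-1, 1, (N_HID, N_IN + N_STM))  # each hidden unit sees 45 inputs
    W_out = rng.uniform(-1, 1, (N_OUT, N_HID))

    def run_sequence(patterns):
        """Run a list of 25-value input patterns through the net, copying the
        hidden layer down to the short-term memory units after every step."""
        stm = np.zeros(N_STM)                       # short-term memory starts at zero
        for x in patterns:
            first_layer = np.concatenate([x, stm])  # 25 regular inputs + 20 copied values
            hidden = sigmoid(W_hid @ first_layer)
            output = sigmoid(W_out @ hidden)
            stm = hidden                            # saved for the next input
            yield output

    outs = list(run_sequence([np.zeros(N_IN)] * 3))  # e.g. three all-zero input patterns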
14. Recurrent Networks
----------------------
Recurrent back-propagation networks take values from hidden layer
and/or output layer units and copy them down to the input layer for use
with the next input. The values that are copied down are a coded record
of what the recent inputs to the network have been, and this gives the
network a simple kind of short-term memory, possibly a little like human
short-term memory. For instance, suppose you want a network to memorize
the two short sequences, "acb" and "bcd". In the middle of both of these
sequences is the letter `c'. In the first case you want
a network to take in `a' and output `c', then take in `c' and output
`b'. In the second case you want a network to take in `b' and output
`c', then take in `c' and output `d'. To do this a network needs a
simple memory of what came before the `c'.
Let the network be a 7-3-4 network where input units 1-4 and output
units 1-4 stand for the letters a-d and `h' stands for the value of
a hidden layer unit. So the codes are:
a: 1000
b: 0100
c: 0010
d: 0001
In action, the network needs to do the following. When `a' is input,
`c' must be output:
0010 <- output layer
hhh <- hidden layer
1000 stm <- input layer
In this context, when `c' is input, `b' should be output:
0100
hhh
0010 stm
For the other string, when `b' is input, `c' is output:
0010
hhh
0100 stm
and when `c' is input, `d' is output:
0001
hhh
0010 stm
This is easy to do if the network keeps a short-term memory of what its
most recent inputs have been. Suppose we input `a' and the output is `c':
0010 <- output layer
hhh <- hidden layer
1000 stm <- input layer
Placing `a' on the input layer generates some kind of code (like a hash
code) on the 3 units in the hidden layer. On the other hand, placing
`b' on the input units will generate a different code on the hidden
units. All we need to do is save these hidden unit codes and input them
with a `c'. In one case the network will output `b' and in the other
case it will output `d'. In one particular run inputting `a' produced:
0 0 1 0
0.993 0.973 0.020
1 0 0 0 0 0 0
When `c' is input the hidden layer units are copied down to the input
layer to give:
0 1 0 0
0.006 0.999 0.461
0 0 1 0 0.993 0.973 0.020
For the other pattern, inputting `b' gave:
0 0 1 0
0.986 0.870 0.020
0 1 0 0 0 0 0
Then the input of `c' gave:
0 0 0 1
0.005 0.999 0.264
0 0 1 0 0.986 0.870 0.020
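In other words, the 7-value input line is just the code for `c' followed
by the hidden values saved from the previous step. A trivial sketch of
that bookkeeping (illustration only, using the run values printed above):

    codes = {"a": [1, 0, 0, 0], "b": [0, 1, 0, 0],
             "c": [0, 0, 1, 0], "d": [0, 0, 0, 1]}

    saved_hidden = [0.993, 0.973, 0.020]     # hidden values produced by inputting `a'
    next_input = codes["c"] + saved_hidden   # [0, 0, 1, 0, 0.993, 0.973, 0.020]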
This particular problem can be set up as follows:
m 7 3 4
s 7
ci
t 0.2
rt {
1000 H 0010
0010 H 0100
0100 H 0010
0010 H 0001
}
where the first four values on each line are the normal input. The `H'
stands for however many hidden layer units there are. The last four
values are the desired outputs.
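For longer sequences it may be easier to generate these lines than to
type them. A small sketch of such a helper (hypothetical, not part of the
program) that produces the rt block above from the two sequences:

    codes = {"a": "1000", "b": "0100", "c": "0010", "d": "0001"}

    def rt_lines(sequence):
        """One training line per consecutive pair: input pattern, `H', target pattern."""
        for cur, nxt in zip(sequence, sequence[1:]):
            yield f"{codes[cur]} H {codes[nxt]}"

    for seq in ("acb", "bcd"):
        for line in rt_lines(seq):
            print(line)          # prints the four lines of the rt block above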
By the way, this simple problem does not converge particularly fast
and you may need to do a number of runs before you hit on initial values
that will work quickly. It will work more reliably with more hidden
units.
Rather than using recurrent networks to memorize sequences of letters,
they are probably more useful for predicting the value of some variable
at time t+1 given its value at t, t-1, t-2, ... . A very simple example
of this is to predict the value of sin(t+1) given a recent history of inputs to the
net. Given a value of sin(t) the curve may be going up or down and the
net needs to keep track of this in order to correctly predict the next
value. The following setup will do this:
m 1+5 5 1
f ir
a aol dd uq
qp e 0.02
ci
rt {
0.00000 H 0.15636
0.15636 H 0.30887
0.30887 H 0.45378
. . .
-0.15950 H -0.00319
-0.00319 H 0.15321
}
and in fact it converges rather rapidly. The complete set of data can
be found in the example file rsin.bp.
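The values above are consistent with sin(k * 0.157) for k = 0, 1, 2, ...;
assuming that step size (read off the numbers, not stated anywhere in the
file), the training lines could be generated with a sketch like this:

    import math

    STEP = 0.157       # assumed time step: sin(0.157) = 0.15636, sin(0.314) = 0.30887
    N = 40             # roughly one full period, since 2*pi / 0.157 is about 40

    for k in range(N):
        x, target = math.sin(k * STEP), math.sin((k + 1) * STEP)
        print(f"{x:.5f} H {target:.5f}")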
Another recurrent network included in the examples is one designed to
memorize two lines of poetry. The two lines were:
I the heir of all the ages in the foremost files of time
For I doubt not through the ages one increasing purpose runs
but for the sake of making the problem simpler each word was shortened
to at most 5 characters, giving:
i the heir of all the ages in the frmst files of
time for i doubt not thru the ages one incre purpo runs
The letters were coded by taking the last 5 bits of their ASCII codes.
See the file poetry.bp.
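Taking the last 5 bits of a character's ASCII code maps `a' to 00001,
`b' to 00010, and so on; a minimal sketch of that coding (the exact
layout used in poetry.bp is not reproduced here, so treat this only as an
illustration of the coding itself):

    def code5(word):
        """Last 5 bits of each letter's ASCII code, as 5-character 0/1 strings."""
        return [format(ord(ch) & 0x1F, "05b") for ch in word]

    print(code5("heir"))     # ['01000', '00101', '01001', '10010']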
Once upon a time I was wondering what would happen if the poetry
network learned its verses and then the program was given several words
in the middle of the verses. Would it pick up the sequence and be able
to complete it given 1 or 2 or 3 or n words? For example, given the
short sequence "for i doubt", will it be able to "get on track" and
finish the verse? To test for this there is an extra pair of commands,
tr and trp. Given a test set (which should be the training set) they
start at every possible place in the test set, input n words and then
check to see if the net produces the right answer. For this example I
tried n = 3, 4, 5, 6 and 7 with the following results:
[ACDFGMNPQTW?!acdefhlmopqrstw]? tr 3
TOL: 81.82 % ERROR: 0.022967
[ACDFGMNPQTW?!acdefhlmopqrstw]? tr 4
TOL: 90.48 % ERROR: 0.005672
[ACDFGMNPQTW?!acdefhlmopqrstw]? tr 5
TOL: 90.00 % ERROR: 0.005974
[ACDFGMNPQTW?!acdefhlmopqrstw]? tr 6
TOL: 100.00 % ERROR: 0.004256
[ACDFGMNPQTW?!acdefhlmopqrstw]? tr 7
TOL: 100.00 % ERROR: 0.004513
So after getting just 3 words the program was 81.82% right in predicting
the next word to within the desired tolerance. Given 6 or 7 words it
was getting them all right. The trp command does the same thing except
it also prints the final output value for each of the tests made.
15. Miscellaneous Commands
--------------------------
Below is a list of some miscellaneous commands, a short example of
each and a short description of the command.
! Example: ! ls
Anything after `!' will be passed on to the OS as a command to execute.
An ! followed immediately by a carriage-return will repeat the last
command sent to the OS.
l Example: l 2
Entering "l 2" will print the values of the units on layer 2, or
whatever layer is specified.
sb Example: sb -3
Entering "sb -3" will set the bias unit weight to -3. In the symmetric
versions the weight will be frozen at this value while in the regular
versions it will only be the initial value and should be set after the
other weights are initialized.
16. Limitations
---------------
Weights in the ibp and sibp programs are 16-bit integers, where the
real value of the weight has been multiplied by 1024. The integer
versions cannot handle weights less than -32 or greater than 31.999.
The weight changes are all checked for overflow, but there are other
places in these programs where calculations can possibly overflow as
well and none of these places are checked. Input values for the integer
versions can run from -31.992 to 31.999. Due to the method used to
implement recurrent connections, input values in the real version are
limited to -31992.0 and above.
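The -32 to 31.999 weight range follows directly from storing the weight
times 1024 in a signed 16-bit integer: 32767/1024 is about 31.999 and
-32768/1024 is exactly -32. A small sketch of the representation (an
illustration of the idea, not the programs' actual code):

    SCALE = 1024                   # stored value = real weight * 1024

    def to_fixed(w):
        """Round a real weight to 16-bit fixed point, checking the representable range."""
        i = int(round(w * SCALE))
        if not -32768 <= i <= 32767:
            raise OverflowError(f"weight {w} cannot be represented")
        return i

    def to_real(i):
        return i / SCALE

    print(to_real(32767), to_real(to_fixed(-32.0)))   # 31.9990234375 -32.0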
17. The Pro Version Additions
-----------------------------
This section lists the additions to the pro version at this time.
For a more detailed and more up-to-date description see the online pro
version manual at:
http://www.mcs.com/~drt/probp.html
The additional commands are:
ac <units> add a weight connection between the units
ah <layer> add a hidden unit to <layer>
b benchmarking
i <filename> read input from the file
k <numbers> give the network a kick
n <options> dynamic network building parameters
ofu <unit> turn off a unit
onu <unit> turn on a unit
ofw <weight> turn off a weight
onw <weight> turn on a weight
pw <number> prune weights
rp set rprop parameters
s <seeds> set multiple seed values
ss <options> set SuperSAB parameters
swem <option> save weights every minimum flag
sw+ increment the weight file suffix
to overall tolerance to be met (not per pattern, as with t)
u the same as p but for recurrent classification problems
v the same as t but for recurrent classification problems
Benchmarking allows you to make multiple runs of a problem and find the
mean, standard deviation and average CPU time to converge. You can also
use it to average the outputs of multiple runs and thereby possibly get
a better overall answer.
You can make networks in a cascade type of architecture. You can make
a new network with a different number of hidden layer units without
losing the training and testing patterns. You can add hidden layer
units as the network is trained. You can turn on and off individual
units.
The additional options:
a bh <value> set the hidden layer bias unit value
a bo <value> set the output layer bias unit value
a Dh <value> set the hidden layer sharpness/gain
a Do <value> set the output layer sharpness/gain
a wd <value> weight decay
f t <reals> set target values for classification