CHAPTER IV
THE EMERGENCE OF TASK-SPECIFIC STRUCTURES

4.0 The Emergence of Task-Specific Structures

The induction of appropriate connectionist network architectures is a topic of current research. Most of the work by connectionists favors assuming an architecture for the network and simply learning the weights. Analytical solutions for specifying an appropriate network architecture have thus far eluded connectionist researchers.

The heuristics used in the induction of network architectures are usually overly simplistic in their determination of when and how to modify the architecture. This is because the implications of manipulating the network's architecture are not well understood. The properties of a network are sensitive to its architecture, especially in recurrent networks. This level of sensitivity is not adequately expressed in a simple heuristic. As a result, the associated heuristics are of poor quality and only narrowly applicable.

This chapter demonstrates that task-specific architectures for neural networks can emerge directly from interaction between an evolutionary weak method and the task being solved. An evolutionary program (Fogel, Owens and Walsh 1966; Fogel 1992) is the basis of the described algorithm. Operators specific to the manipulation of network architectures are employed. The chapter ends with several experiments that demonstrate that this method is comparable in speed to a connectionist method which induces only the weights of a network.

4.1 Connectionist Architectures

Models and simulations of networks of neurons have been investigated since the conception of the computer (McCulloch and Pitts 1943; Hebb 1949). Connectionism, the investigation of artificial networks of neurons, is a descendant of perceptron theory (Rosenblatt 1958; 1962), an early mathematical abstraction from networks of neurons. A perceptron (Figure 9) consists of an input retina, a set of binary masks, weights on those masks, a summation device and a threshold device. A perceptron recognizes a feature of the input retina in the following manner. First, each mask takes as input a number of retinal positions and returns a one only if all of its inputs are one. Next, each mask result is multiplied by its associated weight and summed. The feature represented by the perceptron is present in the retina if this sum is greater than the given threshold θ. Rosenblatt (1958; 1962), following Hebb (1949), devised a method of training the weights in a perceptron, called the perceptron learning rule, to perform a particular task. Minsky and Papert (1969) showed that perceptrons were limited in the concepts they could identify. For instance, a perceptron cannot perform the Boolean operation exclusive-or.
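As an illustration of the recognition procedure just described, the following minimal sketch (in Python) applies binary masks to a retina, forms the weighted sum of the mask outputs, and compares it against θ. The mask positions, weights, and threshold are made-up values for illustration only, not taken from the text.

    def apply_mask(retina, positions):
        """A binary mask returns 1 only if every retinal position it reads is 1."""
        return 1 if all(retina[p] == 1 for p in positions) else 0

    def recognize(retina, masks, weights, theta):
        """The feature is present when the weighted sum of mask outputs exceeds theta."""
        total = sum(w * apply_mask(retina, positions)
                    for positions, w in zip(masks, weights))
        return total > theta

    # Illustrative example: a 4-pixel retina with two masks.
    retina = [1, 0, 1, 1]
    masks = [(0, 2), (1, 3)]      # each mask reads a subset of retinal positions
    weights = [2.0, -3.0]
    print(recognize(retina, masks, weights, theta=1.0))   # True: 2.0 > 1.0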
[Figure 9: Structure of a perceptron. The binary retina feeds into a set of masks that essentially multiply their inputs. These binary variables are then multiplied by the weights and summed. If the sum is larger than the stored threshold, then the concept represented by the perceptron appears in the retina. (Diagram labels: Retina, Masks, Weights, Sum, Threshold.)]

Modern connectionism uses a variety of network architectures (e.g. Hopfield 1982; Ackley, Hinton and Sejnowski 1985; Rumelhart and McClelland 1986; McClelland and Rumelhart 1986a; Pollack 1990; Pollack 1991; and many others). All assume a collection of simple computational units with a set of weighted interconnections. As in perceptrons, the weights of the interconnections are modified through training to represent a concept. Modern training techniques for the weights on the interconnections take as many forms as there are architectures: a form of simulated annealing is used in Boltzmann machines (Ackley, Hinton and Sejnowski 1985; Hinton and Sejnowski 1986); a generalization of the perceptron learning rule, called the delta rule, is used in standard PDP nets (Rumelhart, Hinton and Williams 1986); Pollack (1990; 1991) uses a truncated version of the delta rule to train his recursive auto-associative memories (Pollack 1990) and sequential cascaded networks (Pollack 1991).
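For concreteness, the sketch below shows one delta-rule update for a single sigmoid unit: the error is scaled by the sigmoid derivative and by each input, and the weights are nudged in that direction. The learning rate, initial weights, and training pair are illustrative assumptions, not values or code from the text.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def delta_rule_step(weights, x, target, eta=0.5):
        """One delta-rule update for a single sigmoid unit."""
        y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        delta = (target - y) * y * (1.0 - y)          # error scaled by sigmoid slope
        return [w + eta * delta * xi for w, xi in zip(weights, x)]

    # Illustrative use: push the unit's output toward 1 for the input (1, 0).
    w = [0.1, -0.2]
    for _ in range(100):
        w = delta_rule_step(w, x=[1.0, 0.0], target=1.0)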
The typical connectionist architecture is shown in Figure 10. Input nodes receive information from the environment in the form of real numbers. The input nodes propagate their values along their weighted connections, also called links, to other nodes in the network. As in a perceptron, the output of a node is multiplied by the weight of the associated connection. All other nodes receive these signals as inputs and sum them together. Hidden and output nodes then apply an activation function, the standard activation function being the sigmoid function given by:

    activation = \frac{1}{1 + e^{-\left(\sum_i x_i + \theta\right)}}        (EQ 9)

where x_i is the ith input to the node and θ is the threshold associated with the node. Thus, each node of a modern connectionist architecture has a structure comparable to a single perceptron (see Figure 9). When the directed connections in the architecture form an acyclic graph, the architecture is called feed-forward. When a cycle is present, the architecture is said to be recurrent. The rise in popularity of connectionist structures as models of cognition and computation coincides with the discovery of simple training methods for discovering appropriate weights for feed-forward networks with fixed architectures.
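The node computation of EQ 9 can be stated directly in code. The sketch below is only an illustration of the formula: the node sums its incoming (already weighted) signals, adds its threshold θ, and passes the result through the sigmoid. The example inputs are arbitrary.

    import math

    def node_activation(inputs, theta):
        """EQ 9: sigmoid of the sum of the node's weighted inputs plus theta."""
        return 1.0 / (1.0 + math.exp(-(sum(inputs) + theta)))

    # Illustrative call: two incoming weighted signals and a threshold of 2.2.
    print(node_activation([-6.4, 0.0], theta=2.2))   # about 0.015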
[Figure 10: Standard connectionist architecture. x1 and x2 are inputs to the network and are set by the environment. Their activation is multiplied by the weight on the associated connection. A hidden or output node sums all inputs, subtracts the threshold, which is shown as a value inside the node, and applies a sigmoid activation function to determine its output. The state of the output node is the output of the network. The above network computes the exclusive-or of the inputs. (Diagram labels: Input Nodes x1, x2; Hidden Node with value 2.2; Output Node with value 6.3; connection weights −6.4, −6.4 from the inputs to the hidden node, −4.2, −4.2 from the inputs to the output node, and −9.4 from the hidden node to the output node.)]
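Reading the weights and node values off Figure 10, and assuming the value inside each node is the θ of EQ 9 (added inside the sigmoid), a short check confirms that this network behaves like exclusive-or. The reconstruction of the figure's wiring is an assumption for illustration, not code from the dissertation.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def xor_network(x1, x2):
        """Forward pass of the Figure 10 network under EQ 9 (theta added inside)."""
        hidden = sigmoid(-6.4 * x1 + -6.4 * x2 + 2.2)                   # hidden node, theta = 2.2
        return sigmoid(-4.2 * x1 + -4.2 * x2 + -9.4 * hidden + 6.3)     # output node, theta = 6.3

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_network(a, b), 2))
    # Output is near 0.1 when the inputs match and near 0.9 when they differ.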
4.2 The Complete Network Induction Problem

In its complete form, network induction entails both parametric and structural learning (Barto 1990), i.e., learning both weight values and an appropriate topology of nodes and links. Current connectionist methods that solve this task fall into two broad categories: constructive and destructive. Constructive