Constructive algorithms initially assume a simple network and add nodes and links (Ash 1989; Frean 1990; Hanson 1990; Fahlman and Lebiere 1990; Fahlman 1991; Chen et al. 1993; Azimi-Sadjadi, Sheedvash and Trujillo 1993), while destructive methods start with a large network and prune off components (Mozer and Smolensky 1989; Le Cun, Denker and Solla 1990; Hassibi and Stork 1993; Omlin and Giles 1993). In all cases a simple heuristic, often based on statistics of the individual nodes, determines when to manipulate the architecture.

Though these algorithms address the problem of topology acquisition, they do so in a highly constrained manner. First, because they monotonically modify network structure, constructive and destructive methods limit the traversal of the available architectures: once an architecture has been explored and determined to be insufficient, a new architecture is adopted, and the old becomes topologically unreachable. Also, these methods often use only a single predefined structural modification, such as "add a fully connected hidden unit," to generate successive topologies. This is a form of structural hill climbing, which is susceptible to becoming trapped at structural local minima, a failure typical of such "greedy" methods. In addition, constructive and destructive algorithms make simplifying architectural assumptions to facilitate network induction. For example, Ash (1989) allows only feedforward networks, Fahlman (1991) assumes a restricted form of recurrence, and Chen et al. (1993) explores only fully connected topologies. This creates a situation in which the task is forced into the architecture rather than the architecture being fit to the task.
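To make the flavor of these methods concrete, the sketch below shows the skeleton of a constructive loop in the spirit of the algorithms cited above: training runs until it stalls, a single predefined modification ("add a fully connected hidden unit") is applied, and the smaller topology is never revisited. This is a minimal illustration, not any one author's algorithm; the network interface (train_to_convergence, add_fully_connected_hidden_unit, num_hidden_units) is hypothetical.

```python
# Hypothetical skeleton of a constructive algorithm: structural hill
# climbing with one predefined modification. The `net` object and its
# methods are assumed for illustration, not a real library API.

def grow_network(net, data, max_hidden=20, tolerance=1e-3):
    error = net.train_to_convergence(data)   # parametric training
    while error > tolerance and net.num_hidden_units() < max_hidden:
        # The single structural step; the previous, smaller topology
        # becomes unreachable once this is taken.
        net.add_fully_connected_hidden_unit()
        error = net.train_to_convergence(data)
    return net
```

If an added unit steers training into a poor region, the loop has no way to retract it or to try a different structural change, which is precisely the structural local minimum described above.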
These deficiencies of connectionist methods for the induction of architectures stem from inadequate models of assigning credit to the components of a network's architecture. As a result, the heuristics used are overly constrained to increase the likelihood of finding any topology to solve the problem. Ideally, the constraints for determining an appropriate architecture should come from the task rather than be hardcoded into the bias of the algorithm.

4.3 Evolving Connectionist Networks

As with any artificial intelligence program, selecting the correct technique is imperative to a successful solution. Given the three variations of the evolutionary weak method outlined in the last chapter, a determination needs to be made as to which one is most appropriate for solving the problem at hand. Because the interest here is the acquisition of both the structure and the function of the recurrent neural network, evolution strategies, with their reliance on fixed-length real-valued parameter strings, seem an obvious mismatch. The choice then is between a genetic algorithm and an evolutionary program.

Various combinations of genetic algorithms and connectionist networks have been investigated. Much of the research concentrates only on the acquisition of parameters for a fixed network architecture (e.g., Wieland 1990; Montana and Davis 1989; Whitley, Starkweather and Davis 1990; Beer and Gallagher 1992; Collins and Jefferson 1991). Other work allows a variable topology, but disassociates structure acquisition from acquisition of weight values, either by interweaving a GA search for network topology with a traditional parametric training algorithm (e.g., backpropagation) over weights (e.g., Miller, Todd and Hegde 1989; Belew, McInerney and Schraudolf 1992; Polani and Uthmann 1993) or by ignoring the acquisition of parametric values altogether and assuming weights of 1 and -1 for each link (Gruau 1993; Zhang and Mühlenbein 1993). Some studies attempt to coevolve both the topology and the weight values within the GA framework, but as in the connectionist systems described above, the network architectures are severely restricted (e.g., Torreele 1991; Potter 1992; Karunanithi, Das and Whitley 1992; Romaniuk 1993). In spite of this collection of studies, current theory from both genetic algorithms and connectionism suggests GAs are not appropriate for evolving networks. In the following section, the reasons for this mismatch are explored.

4.3.1 Evolving Networks with Genetic Algorithms

As stated above, genetic algorithms create new individuals by recombining the representational components of two population members. Because of this commitment to structural recombination, genetic algorithms rely on two distinct representational spaces (Figure 11). Recombination space, typically a set of fixed-length binary strings, is the set of structures to which the genetic operators are applied; it is here that the search actually occurs. Evaluation space, typically involving a problem-dependent representation, is the set of structures whose ability to perform a task is evaluated. In the case of using GAs to evolve networks, evaluation space would be a set of networks. An interpretation function maps between these two representational spaces. Note that because any set of finite-length bit strings cannot represent all possible networks, the evaluation space is restricted to a predetermined set of networks. By design, this dual representation scheme allows the genetic algorithm to manipulate the bit string representations using crossover without any knowledge of their interpretation as networks. The implicit assumption is that the interpretation function is defined so that the bit strings created by the dynamics of the genetic algorithm map to better and better networks.

[Figure 11: The dual representation scheme used in genetic algorithms. The interpretation function maps between the elements in recombination space on which the search is performed and the subset of structures that can be evaluated as potential task solutions.]
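The following sketch makes the dual representation concrete for a toy case. It assumes a fixed 2-2-1 feedforward topology encoded as 4 bits per weight; both choices are illustrative, not a scheme from the literature. The interpretation function decodes a bit string into a weight vector, while crossover operates on the strings with no knowledge of that decoding.

```python
import random

# Hypothetical encoding: a fixed 2-2-1 feedforward topology with
# 9 weights/biases, each coded as 4 bits. Only the 2**36 strings in
# recombination space map to networks; every other network is outside
# evaluation space by construction.
BITS_PER_WEIGHT = 4
NUM_WEIGHTS = 9
STRING_LENGTH = BITS_PER_WEIGHT * NUM_WEIGHTS

def interpret(bits):
    """Interpretation function: map a bit string (recombination space)
    to a weight vector (evaluation space), scaling each 4-bit group
    into [-1, 1]."""
    scale = 2 ** BITS_PER_WEIGHT - 1
    return [2.0 * int(bits[i:i + BITS_PER_WEIGHT], 2) / scale - 1.0
            for i in range(0, STRING_LENGTH, BITS_PER_WEIGHT)]

def crossover(parent_a, parent_b):
    """One-point crossover, applied with no knowledge of how the
    strings decode into networks."""
    point = random.randrange(1, STRING_LENGTH)
    return parent_a[:point] + parent_b[point:]

a = "".join(random.choice("01") for _ in range(STRING_LENGTH))
b = "".join(random.choice("01") for _ in range(STRING_LENGTH))
print(interpret(crossover(a, b)))   # the offspring's weight vector
```

The search is well behaved only if this decoding tends to turn good parents' strings into good offspring networks, which is exactly the implicit assumption noted above.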
The dual representation of genetic algorithms is an important feature for searching in certain environments. For instance, when it is unclear how to search the evaluation space directly, and when there exists an interpretation function such that searching the space of bit strings by crossover leads to good points in evaluation space, then the dual representation is ideal. It is unclear, however, that there exists an interpretation function that makes a dual representation beneficial for evolving connectionist networks. Clearly, the choice of interpretation function introduces a strong bias into the search, typically by excluding many potentially interesting and useful networks (another example of forcing the task into an architecture). Moreover, the benefits of having a dual representation hinge on crossover being an appropriate evolutionary operator for the task under some particular interpretation function; otherwise, the need to translate between dual representations is an unnecessary complication.

Characterizing tasks for which crossover is a beneficial operator is an open question. The building block hypothesis (Goldberg 1989a) suggests that crossover tends to recombine short, connected substrings, i.e., building blocks, of the bit string representation that correspond to generally good task solutions when evaluated. This hypothesis makes explicit the intuition that larger structures with high fitness are built out of smaller structures with moderate fitness. Crossover tends to be most effective in environments where the fitness of a member of the population is reasonably correlated with the expected ability of its representational components (Goldberg 1989c). Environments where this is not true are called deceptive (Goldberg 1989b). However, there is some debate about the appropriateness of this concept (Grefenstette 1992).
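Why short, connected substrings are the natural survivors of crossover follows from a standard schema-theorem argument: under one-point crossover, a block can be disrupted only when the cut point falls inside it. The sketch below computes this upper bound on disruption for blocks of different defining lengths; the 36-bit string length is an arbitrary choice carried over from the previous example.

```python
# Schema-theorem disruption bound for one-point crossover: a block
# spanning bit positions start..end (inclusive) can be broken only if
# the cut lands at one of its (end - start) interior points, out of
# (length - 1) possible cut points.

def disruption_bound(start, end, length):
    return (end - start) / (length - 1)

print(disruption_bound(0, 2, 36))    # short block:  2/35 ~ 0.057
print(disruption_bound(0, 30, 36))   # long block:  30/35 ~ 0.857
```

Short blocks therefore persist across generations and can accumulate; deception arises when the fitness of whole strings is poorly correlated with the quality of such components.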
There are three forms of deception when using crossover to evolve connectionist networks. The first involves identical networks, i.e., ones that share both a common topology and common weights. Because the interpretation function may be many-to-one, two such networks need not have the same bit string representation, as illustrated in Figure 12. In such a situation, crossover creates offspring that contain repeated components, thus losing the computational ability of some of the parents' hidden units. Consequently, the resulting networks perform worse than their parents since they are not in possession of key computational components for the task. Schaffer, Whitley and Eshelman (1992) term this the competing conventions problem, and point out that the number of competing conventions grows exponentially with the number of hidden units.
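The problem is easy to reproduce in miniature. In the sketch below (illustrative values only), each hidden unit is represented by the tuple of weights attached to it; listing the same two units in opposite orders yields two encodings of one and the same network, and crossing those encodings duplicates one unit while losing the other.

```python
# Competing conventions in miniature: two encodings of one network.
# Each hidden unit is represented by its attached weight vector;
# the weight values are arbitrary illustrative numbers.

unit_a = (0.7, -1.2, 0.3)
unit_b = (-0.4, 0.9, 1.1)

parent_1 = [unit_a, unit_b]       # the same network, encoded
parent_2 = [unit_b, unit_a]       # under two different conventions

# One-point crossover between the unit lists at the midpoint:
child = parent_1[:1] + parent_2[1:]
print(child)                      # [unit_a, unit_a]: unit_b is lost
```

With h hidden units there are h! equivalent orderings of the same network, so the chance that two parents happen to use the same convention shrinks rapidly as networks grow.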
A second form of deception involves two networks with identical topologies but different weights. It is well known that, for a given task, a single connectionist topology affords multiple solutions, each implemented by a unique distributed representation spread across the hidden units (Hinton, McClelland and Rumelhart 1986; Sejnowski and Rosenberg 1987; Smolensky 1988). While the removal of a small number of nodes has been shown to effect only minor alterations in the performance of a trained network (Hinton, McClelland and Rumelhart 1986; Sejnowski and Rosenberg 1987; Smolensky 1988), the computational role each node plays in the overall representation of the solution is determined purely by the presence and strengths of its interconnections. Furthermore, there need be no correlation between distinct distributed representations over a particular network architecture for a given task. This seriously reduces the chance that an arbitrary crossover operation between distinct distributed representations can construct viable offspring, regardless of the interpretation function used.

[Figure 12: Crossover in recombination space and the interpretation function mapping into network space.]