CHAPTER VII
EMERGENT GOAL-DIRECTED BEHAVIOR

7.0  Emergent Goal-Directed Behavior

The task environment is an important component of emergent intelligent systems. Often in computational problem solving, the feedback to a method comes from an objective function. Typically, these objective functions include an omnipotent teacher or exemplar for the task against which the progress of the problem solver is compared. Objective functions are also routinely used in the fitness functions of evolutionary algorithms. While global objectivity is easily provided in a fitness function when evolving solutions for simple numerical optimization problems, it is often impractical for problems with higher complexity. The difficulty stems directly from the objectivity of the fitness function, since objective accuracy often comes only at the cost of significant knowledge about the search space.

This chapter demonstrates that a population of learners in a competitive task environment provides benefits equal to or better than those of a task environment that uses an objective measure. This is demonstrated empirically using the GLiB program to discover Lisp programs that play Tic Tac Toe. However, instead of including an expert strategy in the fitness function to represent the task environment, as was done in the last chapter, the task environment uses only a metric that determines which of two population members is the better player. As a result, the population moves towards more complex strategies for the task. This apparent goal-directed behavior emerges from the direct interaction of population members in the task environment. Generalization in the resulting programs occurs due to the diversity of strategies in the population. The chapter begins with an in-depth look at the benefits and problems of competitive learning, describes why competition often biases learners toward sub-optimal solutions, termed cooperative stable states, and demonstrates how a tournament run over the population removes this difficulty.
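To make the relative metric concrete, here is a minimal sketch of how such a pairwise comparison could be organized: two candidate programs simply play each other, and the environment reports only which of the two played better. The names used (better_player, play_game) are illustrative assumptions for this sketch, not identifiers from the GLiB implementation.

```python
# Illustrative sketch only (not the GLiB implementation): a relative metric
# that reports which of two population members is the better player.
# `play_game(p1, p2)` is an assumed helper returning +1 if p1 wins,
# 0 for a draw, and -1 if p1 loses.

def better_player(a, b, play_game):
    """Decide which of two candidate players is better by direct competition."""
    # Let each candidate move first once so neither gains a first-move advantage.
    score_a = play_game(a, b) - play_game(b, a)
    return a if score_a >= 0 else b
```

Because the measure is purely relative, the value it yields for a program depends on the opponent it happens to meet, which is precisely the property the rest of the chapter exploits.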
7.1  Competitive Learning

Competitive learning is a long-standing topic in machine learning (Samuel 1959; Tesauro 1992). A competitive learning process encourages an incremental development so that as new strategies are developed by one learner, its opponent adjusts its abilities and discovers new strategies of its own. This strategic "arms race" ideally increases the overall ability of the learners until they reach near-optimal abilities. Interest in using competition in machine learning tasks stems from a desire for a program to discover the strategic nuances of a complex task directly from the first principles of interaction. Appropriate complex structures arising purely from the "physics" of the task environment would be the ultimate validation of machine learning capability. Such is the essence of emergent computation (Forrest 1991).

A natural collection of tasks on which to study competitive learning is the induction of strategies to play games, since these tasks typically require at least two competitors. Samuel (1956; 1966) was the first to use a self-practicing model in his checkers learning experiments. His program induced a linear function that predicted the utility of particular checker board configurations. He reports that a naive version of competitive learning in this system was unsuccessful. Samuel (1966) created a more complicated competitive method with two competitors that updated their respective evaluation functions at differing rates. Even though this more complicated method uses two separate learners, it is still considered competitive since the two learners comprise a single composite learner that is teaching itself.

Backgammon is another game investigated with competitive learning. Tesauro (1992) describes a connectionist network, called TD-Gammon, that acquires an evaluation function that plays a very good game of backgammon. TD-Gammon uses its developing network as both opponents to play a game. Once the game is completed, the weights of the network are updated using a temporal difference method (Sutton 1988). The game provides both positive and negative feedback to the network since it plays both sides. Tesauro reports that TD-Gammon plays better than his previous best network backgammon player, Neurogammon (Tesauro 1990).
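For readers unfamiliar with temporal difference learning, the fragment below sketches the simplest form of the idea, an undiscounted TD(0) update of a differentiable position evaluator. It is an assumed simplification for illustration; TD-Gammon's actual training used the more general TD(lambda) procedure with eligibility traces (Sutton 1988; Tesauro 1992).

```python
# Minimal, undiscounted TD(0)-style update in the spirit of Sutton (1988).
# A simplified stand-in for TD-Gammon's TD(lambda) training, shown only to
# make the idea concrete. `value(w, pos)` is an assumed position evaluator
# and `grad(w, pos)` its gradient with respect to the weights w; both w and
# the gradient are taken to be NumPy-style arrays.

def td0_update(w, value, grad, pos, next_pos, reward, alpha=0.1):
    # Temporal-difference error: disagreement between the current estimate
    # of this position and the bootstrapped target (reward plus the value
    # of the position that actually followed).
    delta = reward + value(w, next_pos) - value(w, pos)
    # Shift the weights so the estimate for `pos` moves toward the target.
    return w + alpha * delta * grad(w, pos)
```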
The crux of competitive learning is the assumption that a learner improves itself through practice with itself, identifying flaws in its inductions that even it can exploit in the task. Once repairs are made, the learner moves closer to an adequate concept without a priori knowledge signaling in which direction to move. Repeating competitive learning, it is thought, incrementally improves performance and progresses the learner through improving strategic phases. In essence, competition is expected to ratchet the performance of the learner toward more and more sound strategies. As each new strategic induction is made, it is employed both to its own benefit and demise, fueling further refinements to the concept.

While the thesis of competitive learning appears sound, the assumption that the learner steadily improves toward the goal is overly simplistic. Epstein (1992) argues that a self-competing learner alone does not provide sufficient feedback to acquire adequate concepts in some learning environments. Her experiments on two-player deterministic games, such as Tic Tac Toe, show that competition acquires concepts consistent in ability with those induced using a teacher playing optimally about half of the time and randomly otherwise. She concludes that competition is a bit myopic about the identification of incorrect inductions in the learner. This creates the potential to underexplore the space unless there are specific attributes in the task to encourage otherwise.

In any learning system, the implicit "goal" is to locate a strategy such that no feedback received forces the training algorithm to change it. In other words, the concept is stable with respect to the teacher and method of credit assignment. Thus, the ratcheting of performance to a good strategy is only one possible limit situation for a competitive learner. Another training sequence could lead to limit performance that is optimal for the player playing itself, but fails drastically when forced to play other players. For instance, in a deterministic game such as Tic Tac Toe, a learner could draw itself consistently without being even remotely proficient at the game, in effect "cooperating" with itself to maximize its score and minimize its own modification. Such a situation, where the learner identifies a sub-optimal strategy that is stable with respect to the task environment and the method of credit assignment, is termed a cooperative stable state.
In order for competition to be an effective general method, its bias toward cooperative stable states must be curbed. But to curb it, we must understand exactly what causes these sub-optimal states. Axelrod has studied cooperation in a diverse collection of competitive environments (Axelrod 1984). He argues that for cooperation to arise, an environment must provide a high probability for repeated meetings and allow both players to reciprocate for past defeats. In the standard, naive definition of competitive learning, the learner is always its own opponent. This creates an environment where the learner is assured of seeing the same opponent, itself, again each competition. Self-competition unwittingly creates a preference for any concept in the space that minimizes the changes in itself after competition on the task.

Thus the reflexive environment of Tesauro's (1992) TD-Gammon should maximize the potential for the network to fall into poor strategic minima. Tesauro (1992) and Epstein (1994) argue that the success of TD-Gammon is due in part to the non-determinism inherent in the task. Because backgammon requires a dice roll on every move, the learner is forced into a wider variety of strategic situations than if the task was deterministic. While this conclusion is basically correct, it is incomplete. The dice in backgammon discourage TD-Gammon from settling into cooperative stable states by creating unpredictable advantages for one player, reducing the competitor's ability to reciprocate. The roll of the dice occasionally forces play into board configurations that have never been visited in any previous game and consequently provides feedback to refine the network. Even if TD-Gammon discovers a suboptimal cooperative solution, the non-determinism in the task elicits feedback that moves the network away from a cooperative stable state by forcing a change in its structure.

However, the non-determinism of the dice is a weak remedy, especially in light of the network being assured of playing itself again. The attraction of cooperative stable states in some tasks is strong enough to counter the learning gradient supplied by non-determinism. One reason Tesauro's network may have taken so long to train is that the gradient toward cooperative stable states and the feedback provided by competition on the non-deterministic task worked to cancel each other's effects.

7.2  Competitive Learning in Evolutionary Algorithms

In evolutionary algorithms, the population represents a natural source of diversity that, while not entirely random, can be recruited to create virtually non-deterministic competitive environments. Competitive populations have been under-exploited in evolutionary algorithms; exactly why is unclear. One possibility is that competitive environments are thought to be too unstructured to guide a population toward a particular goal unless the goal is suitably vague or non-existent. Such an attitude could explain the relegation of competitive evolutionary environments to the Artificial Life community (e.g., Ray 1992; Lindgren 1992). Another possibility is that competition is considered too expensive for practical problems; that it requires too many evaluations of population members to determine an accurate ranking. Without an accurate enough ranking, the natural dynamics of the evolutionary process might be compromised.

7.2.1  Standard Fitness Functions and Competitive Selection

The standard fitness functions used in genetic algorithms are exemplified by the functions studied in De Jong (1975). Such functions return the same fitness for an individual regardless of what other members are present in the population. Their independence from the population's composition allows these functions to provide an accurate and consistent fitness measure throughout the evolutionary process.
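As a concrete example of such a population-independent measure, the first of De Jong's test functions is simply the sum of an individual's squared parameters, so the value assigned to an individual is fixed regardless of who else is in the population. The sketch below assumes an individual is just a sequence of real numbers.

```python
# De Jong's (1975) first test function, the sphere function: an objective,
# population-independent fitness. An "individual" here is assumed to be a
# sequence of real-valued parameters.

def sphere_fitness(individual):
    """Sum of squared parameters; the optimum is 0 at the origin."""
    return sum(x * x for x in individual)

# The value depends only on the individual itself, never on the rest of the
# population, e.g. sphere_fitness([0.5, -1.0, 2.0]) == 5.25.
```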
While global accuracy is easily computed when evolving solutions for many simple optimization problems, it is often impractical for problems with higher complexity. The difficulty stems directly from the objectivity of the fitness function, since objective accuracy often comes only at the cost of significant knowledge about the search space. For instance, consider the expense of a standard fitness function for evolving an optimal strategy for a particular game. Such a function would need to test members of the population against all possible strategic situations to garner an objectively accurate measure. For anything but a trivial game such a computation is immense. If a suitable "expert" strategy were available, an independent fitness function could still be constructed; however, the evolved solutions would only be "optimal" with respect to this "expert" rather than the original task.

Tournament selection, a competitive selection method, is currently popular in genetic algorithms (Goldberg 1989a) and evolutionary programming (Fogel 1992a). Once the fitness of each individual is discovered, the parents of the next generation are selected through comparison. In a k-competition in genetic algorithms (Goldberg 1989a), the fitness values of k randomly chosen members of the population are compared. The competitor with the highest fitness value is declared the winner and copied into the next population. This process is repeated with replacement until the next population is filled. In the tournament selection of evolutionary programming (Fogel 1992a), a competition score for each population member is created by comparing its fitness with k-1 other population members and awarding a point for each competitor with a lower fitness. After each population member is scored, those members with the highest competition scores become the parents of the next population.
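The following sketch illustrates the two selection schemes just described. The routines are reconstructions from the descriptions above, with assumed names and data representations; they are not code taken from Goldberg (1989a) or Fogel (1992a).

```python
import random

def ga_tournament_selection(population, fitness, k, n_parents):
    """k-competition (Goldberg 1989a): repeatedly draw k members at random and
    copy the one with the highest fitness into the next population."""
    parents = []
    while len(parents) < n_parents:
        competitors = random.sample(population, k)
        parents.append(max(competitors, key=fitness))  # winner of this group
    return parents

def ep_tournament_selection(population, fitness, k, n_parents):
    """EP tournament (Fogel 1992a): score each member by how many of k-1
    randomly chosen opponents have a lower fitness, then keep the top scorers."""
    scored = []
    for member in population:
        opponents = random.sample(population, k - 1)
        score = sum(1 for opp in opponents if fitness(member) > fitness(opp))
        scored.append((score, member))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [member for _, member in scored[:n_parents]]
```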
Note that both competitive selection methods still require an objective fitness function and thus are still susceptible to the associated difficulties described above. In order to remove the reliance on objective fitness functions, the competition must be used to determine the fitness of the individuals in the population.
