303-306.html

遗传算法经典书籍-英文原版是研究遗传算法的很好的资料
HTML
字号:
<HTML>
<HEAD>
<META name=vsisbn content="0849398010">
<META name=vstitle content="Industrial Applications of Genetic Algorithms">
<META name=vsauthor content="Charles Karr; L. Michael Freeman">
<META name=vsimprint content="CRC Press">
<META name=vspublisher content="CRC Press LLC">
<META name=vspubdate content="12/01/98">
<META name=vscategory content="Web and Software Development: Artificial Intelligence: Other">




<TITLE>Industrial Applications of Genetic Algorithms:What Can I Do with a Learning Classifier System?</TITLE>

<!-- HEADER -->

<STYLE type="text/css"> 
 <!--
 A:hover  {
 	color : Red;
 }
 -->
</STYLE>

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

<!--ISBN=0849398010//-->
<!--TITLE=Industrial Applications of Genetic Algorithms//-->
<!--AUTHOR=Charles Karr//-->
<!--AUTHOR=L. Michael Freeman//-->
<!--PUBLISHER=CRC Press LLC//-->
<!--IMPRINT=CRC Press//-->
<!--CHAPTER=16//-->
<!--PAGES=303-306//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->

<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="301-303.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="306-308.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<P><FONT SIZE="+1"><B><I>Credit Assignment</I></B></FONT></P>
<P>Credit assignment refers to the tuning of the existing rules&#146; merit. Originally, this metric was termed the <I>strength</I> of the rule. Strength denotes a rule&#146;s predicted payoff for performing its action. Newer measures, based in statistical variance, suggest that direct payoff-based methods may lead to erroneous behavior, since the LCS may need to take a short-term sub-optimal action in order to obtain long-term optimality. The strength-based approaches sometimes become &#147;greedy,&#148; i.e., the rule opts for the highest predicted payoff at the next time step.</P>
<P>Rule updates are commonly done via a <I>reinforcement learning</I> (RL) algorithm. RL is a class of problems that provides an update based on feedback from the environment, but not optimal (target) actions. Update algorithms such as Holland&#146;s <I>Bucket-Brigade</I> (Holland et al., 1986) and Watkins&#146; <I>Q-Learning</I> (Watkins, 1989; Watkins &amp; Dayan, 1992) are common methods for updating a rule&#146;s merit. Rummery and Niranjan (1994) provide an analysis of several of RL algorithms, including Q-learning and Bucket-Brigade algorithms. Their work suggests that in the long run (infinite horizon), both algorithms effectively perform the same update.</P>
<P><FONT SIZE="+1"><B><I>How does the GA come into play?</I></B></FONT></P>
<P>Because the LCS has a variable architecture, a modification method is needed to augment its rule-set. Genetic algorithms (GAs) may be used as global search mechanisms (Goldberg, 1989). In the context of the LCS, the GA is a <I>rule-discovery mechanism</I>.</P>
<P>The type of GA a LCS uses varies according the philosophy of rule search. There are &#147;two camps&#148; of thought for rule discovery in the LCS. These &#147;camps&#148; define the methodology for applying the GA in the LCS. The two approaches are named after the institutions that played a key role in their development. The <I>Michigan approach</I> was developed at the University of Michigan (Holland, 1978; Holland et al., 1986). The <I>Pitt approach</I> emerged from the University of Pittsburgh (Smith, 1980; Smith, 1983).</P>
<P><B>The Michigan Approach</B></P>
<P>The basic philosophy of the Michigan approach is to treat each rule in the rule-set as an individual in a population. Individuals compete via fitness for reproductive rights. Highly fit individuals have the opportunity to mate with other highly fit individuals thereby increasing the chance that their progeny will have better survival characteristics. This method of rule-discovery can be said to be a &#147;population of rules approach,&#148; i.e., only one rule-set exists.
</P>
<P>The type of GA operating in a Michigan-style LCS is typically a <I>steady-state GA</I> (DeJong, 1975; DeJong &amp; Sarma, 1993; Whitley 1989). This form of GA selects a subset of individuals from the population. This subset is then subjected to recombination and mutation. Finally, the offspring of the subset replace existing individuals in the population.</P>
<P><B>The Pitt Approach</B></P>
<P>The Pitt approach chooses to view an <I>entire rule-set</I> as an individual with the population made up of multiple rule-sets. Using this approach, rule-sets compete to mate with other rule-sets; thereby, exchanging genes (rules). This method hopes to combine rules (genes) over many generations to form an effective rule-set. The Pitt approach can be thought of as a &#147;population of rule-sets approach,&#148; i.e., many competing rule-sets exist. The type of GA used for the Pitt Approach may be either a <I>steady-state GA</I> or a simple GA.</P>
<P><B>Which Approach is Better?</B></P>
<P>There is no clear evidence that either method is superior to the other. The deciding factor relies on the environment (problem) that the LCS is immersed. A rule of thumb might be that for computationally intense environments, where the rules and rule-set may be quite complex, a Michigan approach might be appropriate since it is the one rule-set approach. For environments, where a large number of simple rules is expected, a Pitt approach might provide better search properties.
</P>
<P><FONT SIZE="+1"><B><I>A Word on Message Lists</I></B></FONT></P>
<P>The <I>message list</I> component is sometimes found in LCSs. Message lists provide a way to temporally delay messages for internal message passing, i.e., rules signal other rules. This means that rule conditions do not necessarily match <I>environmental conditions</I> (as in the previous discussion), instead the rules may match the <I>internal state</I> as well. The message list acts as an interface between the detectors, effectors, and rules. Some problems involve <I>temporal delay</I>, which mandates some form of input queuing system (memory) that stores past inputs, or rule firings, in order to utilize that information in determining future actions.</P>
<P>The message list represents one of the most complex structures in the LCS. This complexity stems from a desire to have the system form <I>chains</I>, or sequences of rule firings, that lead to an eventual reward. The complexity of chain formation is quite daunting. Smith (1991) analyzed memory exploitation in LCSs with message lists. Smith studied the effects of internal message passing. His basic setup was to discover rules that trigger other rules in a learning automata framework. He found that <I>parasitic rules</I> inherently formed, which disrupted proper (effective) chain formation.</P>
<P>Disruption is best described by example. Consider a learning system, like an LCS, that discovers rules that lead to some form of reward. These rules by the original, economics-based strengths would grow in prominence. This is due to the system acting appropriately and receiving reward. To continually refine the rule-base, the GA is called to discover better rules. In the process of discovery, parasite rules are formed, which call (activate) the good rules, thereby receiving partial reward for participating in a chain of rule activations leading to a reward. The system could learn the problem, but with subsequent refinements, the GA would form internal messaging rules that, in effect, lie about the current state of the system just to get rewarded. The reason for this is the way reward (credit assignment) was performed. Formation of chains relies on the propagation of rewards received back through the message list. Rules that lead to the activation of other rules, which subsequently lead to reward, received credit for participation via the bucket brigade (or similar) algorithms. The parasitic rules eventually become so greedy, and aggressive that they trigger rules at inappropriate times. The parasite effectively lies about the state of the environment, or system, in hopes of receiving reward through the bucket-brigade. This improper triggering of actions eventually occurs at inappropriate times thus producing counterproductive behaviors.</P><P><BR></P>
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="301-303.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="306-308.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>

<hr width="90%" size="1" noshade>
<div align="center">
<font face="Verdana,sans-serif" size="1">Copyright &copy; <a href="/reference/crc00001.html">CRC Press LLC</a></font>
</div>
<!-- all of the reference materials (books) have the footer and subfoot reveresed -->
<!-- reference_subfoot = footer -->
<!-- reference_footer = subfoot -->

</BODY>
</HTML>

<!-- END FOOTER -->
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -