<HTML>
<HEAD>
<META name=vsisbn content="0849398010">
<META name=vstitle content="Industrial Applications of Genetic Algorithms">
<META name=vsauthor content="Charles Karr; L. Michael Freeman">
<META name=vsimprint content="CRC Press">
<META name=vspublisher content="CRC Press LLC">
<META name=vspubdate content="12/01/98">
<META name=vscategory content="Web and Software Development: Artificial Intelligence: Other">




<TITLE>Industrial Applications of Genetic Algorithms:Data Mining Using Genetic Algorithms</TITLE>

<!-- HEADER -->

<STYLE type="text/css"> 
 <!--
 A:hover  {
 	color : Red;
 }
 -->
</STYLE>

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

<!--ISBN=0849398010//-->
<!--TITLE=Industrial Applications of Genetic Algorithms//-->
<!--AUTHOR=Charles Karr//-->
<!--AUTHOR=L. Michael Freeman//-->
<!--PUBLISHER=CRC Press LLC//-->
<!--IMPRINT=CRC Press//-->
<!--CHAPTER=9//-->
<!--PAGES=192-196//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->

<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="187-192.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="../ch10/197-198.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<P><B>Approximate Function Evaluation Simulation</B></P>
<P>The approximate function evaluation simulation was run with the fitness function, crossover operator, mutation operator, genetic algorithm parameters, and database configuration as noted in the simulation test matrix.
</P>
<P>The approximate function evaluation simulation used the same database configuration as that in the full function simulation; however, only one tenth (500) of the database transactions were sampled to generate the fitness for a given item combination. The total running time for this simulation was 5.1 minutes.</P>
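<P>The sampling idea can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the fitness used here is a plain co-occurrence count rather than F1, F2, or F3, and the names <I>full_fitness</I>, <I>approximate_fitness</I>, and <I>toy_database</I>, the 100-item catalogue, and the item numbers are all hypothetical. The sampled count is scaled back up so the two values are comparable; whether the original simulation scaled its counts is not stated, and scaling does not change how candidates rank.</P>

```python
import random

def full_fitness(items, transactions):
    """Count the transactions that contain every item in the combination."""
    wanted = set(items)
    return sum(1 for basket in transactions if wanted.issubset(basket))

def approximate_fitness(items, transactions, sample_size, rng):
    """Estimate the full count from a random sample of the transactions."""
    sample = rng.sample(transactions, sample_size)
    scale = len(transactions) / sample_size
    return full_fitness(items, sample) * scale

def toy_database(n_transactions, pattern, rng):
    """Hypothetical data: every fifth basket also holds the whole pattern."""
    database = []
    for i in range(n_transactions):
        basket = set(rng.sample(range(100), 6))
        if i % 5 == 0:
            basket.update(pattern)
        database.append(basket)
    return database

if __name__ == "__main__":
    rng = random.Random(0)
    pattern = (3, 17, 42, 99)
    database = toy_database(5000, pattern, rng)
    # Full evaluation scans all 5000 baskets; the estimate scans only 500.
    print("full:", full_fitness(pattern, database))
    print("sampled:", approximate_fitness(pattern, database, 500, rng))
```

<P>Each sampled evaluation touches one tenth of the transactions, which is where the reduction in running time comes from; the estimate is noisier than the full count, so a strongly embedded combination is easier to rank correctly than a marginal one.</P>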
<P>As can be seen from Figure 9.19, the item combination of interest entered the population around generation 7, and the simulation converged on the solution around generation 14. Compared with the corresponding run that used full function evaluation, the approximate function evaluation simulation converged on the solution equally well. In terms of execution speed, it ran 8.53 times faster than the full function evaluation simulation.</P>
<P><A NAME="Fig32"></A><A HREF="javascript:displayWindow('images/09-32.jpg',450,216)"><IMG SRC="images/09-32t.jpg"></A>
<BR><A HREF="javascript:displayWindow('images/09-32.jpg',450,216)"><FONT COLOR="#000077"><B>Figure 9.19</B></FONT></A>&nbsp;&nbsp;Simulation 18 results.</P>
<P><B>Performance Simulation Summary</B></P>
<P>For the data set examined in this test, the approximate function evaluation simulation converged on the solution in the same number of generations as the full function evaluation simulation. In terms of execution speed, however, it ran 8.53 times faster. Thus, it appears that if the data set is fairly large, an approximate function evaluation method can find a solution much more efficiently than a full function evaluation method.
</P>
<P><FONT SIZE="+1"><B>SUMMARY</B></FONT></P>
<P>This chapter presented the results of a genetic algorithm implementation for a data mining application. The goal of this implementation was to determine, from a synthetically generated database, the four items that most often appeared together in a transaction. The simulation results just presented, obtained by experimenting with a variety of fitness functions, crossover operators, mutation operators, and genetic algorithm parameters, provide evidence that this goal was achieved.
</P>
<P>The development of the data mining genetic algorithm involved, first, the determination of a base-10 coding scheme to represent four potentially related item numbers. Then, in an effort to better understand the operation of the genetic algorithm and to identify the best fitness and genetic operators, multiple fitness functions (F1, F2, and F3), crossover operators (ASPX and UXSCO), and mutation operators (random and window) were developed and targeted for investigation through simulation. In addition to fitness and genetic operators, genetic algorithm parameters and performance were also targeted for investigation.</P>
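<P>As a rough illustration of these pieces, the sketch below represents a chromosome as a list of four base-10 item numbers and applies a single-point crossover whose cut may fall only between whole item numbers, followed by a random mutation. This is one assumed reading of &ldquo;aligned single-point crossover&rdquo; and &ldquo;random mutation&rdquo;, not the definitions given earlier in the chapter; the function names and the catalogue size <I>N_ITEMS</I> are hypothetical.</P>

```python
import random

N_ITEMS = 100  # assumed catalogue size (hypothetical)
GENES = 4      # four item numbers per chromosome, as in the chapter

def random_chromosome(rng):
    """Four distinct item numbers, each stored as a base-10 integer."""
    return rng.sample(range(N_ITEMS), GENES)

def aligned_single_point_crossover(parent_a, parent_b, rng):
    """Swap tails at a cut that never splits an item number (assumed reading)."""
    cut = rng.randint(1, GENES - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def random_mutation(chromosome, p_mut, rng):
    """With probability p_mut per gene, replace it with a random item number."""
    return [rng.randrange(N_ITEMS) if p_mut > rng.random() else gene
            for gene in chromosome]

if __name__ == "__main__":
    rng = random.Random(1)
    a, b = random_chromosome(rng), random_chromosome(rng)
    child_a, child_b = aligned_single_point_crossover(a, b, rng)
    print(a, b, child_a, random_mutation(child_b, 0.001, rng))
```

<P>Note that this sketch does not prevent the same item number from appearing twice in a child; a fuller implementation would need to decide how to handle such duplicates.</P>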
<P>Also, a synthetic data generation tool was developed to enable production of data sets containing embedded relationships between specified item numbers. This capability greatly facilitated verification and analysis of the genetic algorithm implementation and simulations.</P>
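<P>A generator of this kind might look like the sketch below, which plants one four-item relationship in a chosen fraction of otherwise random transactions and then confirms by brute force that the planted combination is the most frequent. The chapter does not describe the tool's interface, so the function names, parameters, and item numbers here are assumptions made for illustration.</P>

```python
import random
from collections import Counter
from itertools import combinations

def generate_transactions(n, n_items, basket_size, embedded, rate, rng):
    """Build n transactions; roughly `rate` of them contain all embedded items."""
    database = []
    for _ in range(n):
        basket = set(rng.sample(range(n_items), basket_size))
        if rate > rng.random():
            basket.update(embedded)
        database.append(frozenset(basket))
    return database

def most_common_combination(database, size=4):
    """Brute-force check: count every `size`-item combination in every basket."""
    counts = Counter()
    for basket in database:
        counts.update(combinations(sorted(basket), size))
    return counts.most_common(1)[0]

if __name__ == "__main__":
    rng = random.Random(1)
    planted = (3, 17, 42, 99)
    database = generate_transactions(1000, 200, 6, planted, 0.2, rng)
    print(most_common_combination(database))
```

<P>Because the planted combination is known in advance, any search method run against such a database can be checked against a known answer, which is the verification benefit the text describes.</P>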
<P>After determining the areas that were to be investigated, a simulation test matrix was created to define the tests and environments that were to be run. The execution of the defined tests led to the determination of a specific fitness function and set of genetic operators that provided the best results. The final fitness function and set of genetic operators that exhibited the best overall performance and determination of correct solutions included the following (each of which has been previously described):</P>
<TABLE WIDTH="100%"><TR>
<TD WIDTH="35%"><B><I>Fitness function:</I></B>
<TD WIDTH="65%">F3
<TR>
<TD><B><I>Reproduction Operator:</I></B>
<TD>Roulette Wheel
<TR>
<TD><B><I>Crossover Operator:</I></B>
<TD>Aligned Single-Point Crossover (ASPX)
<TR>
<TD><B><I>Mutation Operator:</I></B>
<TD>Random
</TABLE>
<P>
</P>
<TABLE WIDTH="100%"><TR>
<TD WIDTH="70%" COLSPAN="2"><B><I>Genetic Algorithm Parameters</I></B><SUP><SMALL><B>2</B></SMALL></SUP><TD WIDTH="30%">
<TR>
<TD><I>Population Size:</I>
<TD>70 (200)
<TR>
<TD><I>Crossover Probability:</I>
<TD>0.9
<TR>
<TD><I>Mutation Probability:</I>
<TD>0.001 (0.010)
</TABLE>

<BLOCKQUOTE>
<HR>
<SUP><SMALL><B>2</B></SMALL></SUP><FONT SIZE="-1">These genetic algorithm parameters provided the best &ldquo;overall&rdquo; performance. A higher population size (200) and mutation probability (0.010) did provide better performance in specific test scenarios.
</FONT>
<HR>
</BLOCKQUOTE>
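<P>For reference, the parameter values in the table and the roulette wheel reproduction operator can be written down as in the sketch below. The selection routine is the textbook fitness-proportionate scheme, not code taken from the chapter, and the names <I>GA_PARAMS</I> and <I>roulette_wheel_select</I> are hypothetical.</P>

```python
import random
from itertools import accumulate

# Best "overall" values from the table above; 200 and 0.010 were the
# alternatives that did better in specific test scenarios.
GA_PARAMS = {
    "population_size": 70,
    "crossover_probability": 0.9,
    "mutation_probability": 0.001,
}

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        # No individual has any fitness yet, so fall back to a uniform pick.
        return rng.choice(population)
    spin = rng.uniform(0, total)
    for individual, cumulative in zip(population, accumulate(fitnesses)):
        if cumulative > spin:
            return individual
    return population[-1]

if __name__ == "__main__":
    rng = random.Random(0)
    picks = [roulette_wheel_select(["a", "b", "c"], [0, 1, 3], rng)
             for _ in range(4000)]
    print({name: picks.count(name) for name in "abc"})
```

<P>Individuals with larger fitness occupy a larger slice of the wheel and are therefore copied into the next generation more often, while an individual with zero fitness is effectively never chosen.</P>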

<P>From the results achieved in this project, it is apparent that the use of genetic algorithms in data mining applications shows considerable promise and potential for further research and application. The genetic algorithm displayed its ability to quickly and efficiently locate and converge on correct solutions in a minimal number of generations. Future research might involve the implementation and analysis of this genetic algorithm approach in a &ldquo;real&rdquo; database environment. In addition, this project has further emphasized the fact that genetic algorithms are useful across a wide range of applications.
</P>
<P><FONT SIZE="+1"><B>REFERENCES</B></FONT></P>
<DL>
<DD><B>1</B>&nbsp;&nbsp;Dilly, R. (1996, February). <I>Data mining - An introduction</I>. The Queen's University of Belfast web site <A HREF="http://www-pcc.qub.ac.uk/tec/courses/datamining/ohp/dm-ohp-final_1.html">http://www-pcc.qub.ac.uk/tec/courses/datamining/ohp/dm-OHP-final_1.html</A>.
<DD><B>2</B>&nbsp;&nbsp;Hedberg, S. R. (Winter 1995). <I>Parallelism speeds data mining</I>. IEEE Parallel &amp; Distributed Technology.
<DD><B>3</B>&nbsp;&nbsp;Author unknown (1997). <I>Data mining - An IBM overview</I>. IBM web site <A HREF="http://www.almaden.ibm.com/cs">http://www.almaden.ibm.com/cs</A>.
<DD><B>4</B>&nbsp;&nbsp;Elmasri, R., &amp; Navathe, S. (1994). <I>Fundamentals of database systems</I>. Benjamin Cummings Publishers.
<DD><B>5</B>&nbsp;&nbsp;Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., &amp; Uthurusamy, R. (1996). <I>Advances in knowledge discovery and data mining</I>. AAAI Press/The MIT Press.
<DD><B>6</B>&nbsp;&nbsp;Chung-Sheng Li, Yu, P. S., &amp; Castelli, V. (1996, 26 February through 1 March). HierarchyScan: A hierarchical similarity search algorithm for databases of long sequences. <I>Twelfth International Conference on Data Engineering</I>. New Orleans, LA, USA.
<DD><B>7</B>&nbsp;&nbsp;Goldberg, D. E. (1989). <I>Genetic algorithms in search, optimization, and machine learning</I>. Reading, MA: Addison-Wesley.
<DD><B>8</B>&nbsp;&nbsp;Potter, W., Pitts, R., Gillis, P., Young, J., &amp; Caramadre, J. (1992). <I>IDA-NET: An intelligent decision aid for battlefield communications network configuration</I>. New York: IEEE.
</DL>

<hr width="90%" size="1" noshade>
<div align="center">
<font face="Verdana,sans-serif" size="1">Copyright &copy; <a href="/reference/crc00001.html">CRC Press LLC</a></font>
</div>
<!-- all of the reference materials (books) have the footer and subfoot reveresed -->
<!-- reference_subfoot = footer -->
<!-- reference_footer = subfoot -->

</BODY>
</HTML>

<!-- END FOOTER -->