📄 187-192.html
字号:
<HTML>
<HEAD>
<META name=vsisbn content="0849398010">
<META name=vstitle content="Industrial Applications of Genetic Algorithms">
<META name=vsauthor content="Charles Karr; L. Michael Freeman">
<META name=vsimprint content="CRC Press">
<META name=vspublisher content="CRC Press LLC">
<META name=vspubdate content="12/01/98">
<META name=vscategory content="Web and Software Development: Artificial Intelligence: Other">
<TITLE>Industrial Applications of Genetic Algorithms:Data Mining Using Genetic Algorithms</TITLE>
<!-- HEADER -->
<STYLE type="text/css">
<!--
A:hover {
color : Red;
}
-->
</STYLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<!--ISBN=0849398010//-->
<!--TITLE=Industrial Applications of Genetic Algorithms//-->
<!--AUTHOR=Charles Karr//-->
<!--AUTHOR=L. Michael Freeman//-->
<!--PUBLISHER=CRC Press LLC//-->
<!--IMPRINT=CRC Press//-->
<!--CHAPTER=9//-->
<!--PAGES=187-192//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="183-187.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="192-196.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<P><B>Mutation Operator Simulation Summary</B></P>
<P>As suspected, no obvious advantage of one mutation operator over the other was found. Each corresponding random mutation and window mutation simulation converged (or did not converge) on the solution within the same number of generations. Thus, due to the equal performance of these two operators, the random mutation operator was used in the remaining simulations due to its simplicity.
</P>
<BLOCKQUOTE>
<P><FONT SIZE="-1"><HR><B>Note: </B>Mutation is further addressed, with respect to mutation probability, in the next section, “Genetic Algorithm Parameters Simulation Results.”<HR></FONT>
</BLOCKQUOTE>
<P><B>Genetic Algorithm Parameters Simulation Results</B></P>
<P>Several combinations of genetic algorithm parameters were simulated, as specified in the simulation test matrix. Using the same database configuration and random number generator seed as in simulation 11A, two genetic algorithm parameters (population size and mutation probability) were modified in an attempt to see if those simulation results could be improved.
</P>
<P><B>Population Size Simulations</B></P>
<P>Recall that in simulation 11A, the genetic algorithm converged on a false plateau and never recovered. Postulating that this could have been caused by an initial population that was not diverse enough, I ran a simulation in which the population size was increased from 70 to 200. As can be seen from Figure 9.15, the item combination of interest entered the population around generation 4 and the simulation converged on the solution around generation 13. Thus, the fact that this simulation converged on the solution suggests that a larger population may help insure that a correct solution is determined. Of course, it must be considered that a larger population has the disadvantage of contributing to more computation and thus a reduced overall execution time.
</P>
<P><A NAME="Fig28"></A><A HREF="javascript:displayWindow('images/09-28.jpg',450,218)"><IMG SRC="images/09-28t.jpg"></A>
<BR><A HREF="javascript:displayWindow('images/09-28.jpg',450,218)"><FONT COLOR="#000077"><B>Figure 9.15</B></FONT></A> Simulation 14 results</P>
<P><B>Mutation Probability Simulations (Random Mutation)</B></P>
<P>Again using simulation 11A as the test configuration, and using the random mutation operator, I attempted to recover the lost allele by increasing the mutation probability from 0.001 to 0.010. As can be seen from Figure 9.16, the item combination of interest entered the population around generation 190. Then the simulation, after having been stuck on a false plateau for most of the preceding simulation, quickly converged on the solution around generation 200. Thus, the fact that the simulation eventually converged on the solution suggests that an increased mutation probability may help insure that lost alleles are recovered and that a correct solution is determined.
</P>
<P><A NAME="Fig29"></A><A HREF="javascript:displayWindow('images/09-29.jpg',450,232)"><IMG SRC="images/09-29t.jpg"></A>
<BR><A HREF="javascript:displayWindow('images/09-29.jpg',450,232)"><FONT COLOR="#000077"><B>Figure 9.16</B></FONT></A> Simulation 15 results</P>
<P><B>Mutation Probability Simulations (Window Mutation)</B></P>
<P>This simulation was identical to the previous simulation (simulation 15) except that the window mutation operator was used instead of the random mutation operator. As can be seen from Figure 9.17, the lost allele was never recovered and the simulation remained on a false plateau through 400 generations.
</P>
<P><A NAME="Fig30"></A><A HREF="javascript:displayWindow('images/09-30.jpg',450,224)"><IMG SRC="images/09-30t.jpg"></A>
<BR><A HREF="javascript:displayWindow('images/09-30.jpg',450,224)"><FONT COLOR="#000077"><B>Figure 9.17</B></FONT></A> Simulation 16 results</P>
<P><B>Genetic Algorithm Parameters Simulation Summary</B></P>
<P>In the environment in which the population size and mutation probability parameters were tested, it appears that increasing the population size can greatly increase the chances of converging on the correct solution, but at the cost of more computation. In addition, an increased mutation probability may help recover lost alleles and enable the simulation to locate the item combination of interest, as it did in the random mutation operator case.
</P>
<P><FONT SIZE="+1"><B><I>Genetic Algorithm Performance and Performance Simulation Results</I></B></FONT></P>
<P>Given the fact that the data sets used in this project were synthetic, and also considering the limited scope of this project, the performance of the genetic algorithm-based approach could not be analyzed in a “real” database environment. One performance-related area that was investigated, however, involved improving the genetic algorithm execution speed. As is the case in most genetic algorithms, the bulk of the execution time is spent performing fitness function calculations, and the situation was no different for the genetic-based search examined in this project.
</P>
<P>Depending on the size of the database that is being investigated by the genetic algorithm, the time to scan the database in order to generate the fitness for a particular item combination can vary significantly. If a relatively <I>small</I> database is being used, all transactions can be examined and an exact fitness value can be determined within a reasonable amount of time<SUP><SMALL><B>1</B></SMALL></SUP>. On the other hand, if the database is <I>very large</I>, complete examination of every transaction in order to calculate an exact fitness value, may not be feasible. In this case, we can perform an <I>approximate function evaluation</I> to help speed up the execution time of the genetic algorithm. In the case of the data mining genetic algorithm, approximate function evaluation is fairly straightforward in that a subset of the transactions in the database can be sampled to determine a fitness value for an item combination.</P>
<BLOCKQUOTE>
<HR>
<SUP><SMALL><B>1</B></SMALL></SUP><FONT SIZE="-1">Of course, database size and fitness calculation times are relative, and the computer system on which a simulation is run will greatly impact overall execution time.
</FONT>
<HR>
</BLOCKQUOTE>
<P>An approximate function evaluation method was implemented and compared to the full function evaluation method, and the simulation results that were obtained are discussed next.</P>
<P><B>Full Function Evaluation Simulation</B></P>
<P>The full function evaluation simulation was run with the fitness function, crossover operator, mutation operator, genetic algorithm parameters, and database configuration as noted in the simulation test matrix.
</P>
<P>Due to the large number of transactions in the database (5000), the total running time for this simulation was much longer than all previous simulations, at 43.5 minutes.</P>
<P>As can be seen from Figure 9.18, the item combination of interest entered the population around generation 5 and the simulation converged on the solution around generation 14.</P>
<P><A NAME="Fig31"></A><A HREF="javascript:displayWindow('images/09-31.jpg',450,217)"><IMG SRC="images/09-31t.jpg"></A>
<BR><A HREF="javascript:displayWindow('images/09-31.jpg',450,217)"><FONT COLOR="#000077"><B>Figure 9.18</B></FONT></A> Simulation 17 results.<P><BR></P>
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="183-187.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="192-196.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<hr width="90%" size="1" noshade>
<div align="center">
<font face="Verdana,sans-serif" size="1">Copyright © <a href="/reference/crc00001.html">CRC Press LLC</a></font>
</div>
<!-- all of the reference materials (books) have the footer and subfoot reveresed -->
<!-- reference_subfoot = footer -->
<!-- reference_footer = subfoot -->
</BODY>
</HTML>
<!-- END FOOTER -->
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -