📄 052-054.html
字号:
<HTML>
<HEAD>
<META name=vsisbn content="0849398010">
<META name=vstitle content="Industrial Applications of Genetic Algorithms">
<META name=vsauthor content="Charles Karr; L. Michael Freeman">
<META name=vsimprint content="CRC Press">
<META name=vspublisher content="CRC Press LLC">
<META name=vspubdate content="12/01/98">
<META name=vscategory content="Web and Software Development: Artificial Intelligence: Other">
<TITLE>Industrial Applications of Genetic Algorithms:Software Test Data Generation from a Genetic Algorithm</TITLE>
<!-- HEADER -->
<STYLE type="text/css">
<!--
A:hover {
color : Red;
}
-->
</STYLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<!--ISBN=0849398010//-->
<!--TITLE=Industrial Applications of Genetic Algorithms//-->
<!--AUTHOR=Charles Karr//-->
<!--AUTHOR=L. Michael Freeman//-->
<!--PUBLISHER=CRC Press LLC//-->
<!--IMPRINT=CRC Press//-->
<!--CHAPTER=4//-->
<!--PAGES=052-054//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="050-052.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="054-059.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<P><FONT SIZE="+1"><B>PRIOR ART</B></FONT></P>
<P>Genetic algorithms (GAs) have previously been used to generate both hardware and software test data. Several authors have recently begun to apply genetic algorithms to the unit test process.
</P>
<P>Sthamer et al. [1] researched the use of genetic algorithms to produce test data for programs written in the ADA computer language. Their work focused on evaluating a GA’s performance on three sample programs. Various codings and fitness functions were evaluated for these sample programs. The goal of this research was to identify coding and fitness functions which could be used to select test data more likely to identify program faults. This was accomplished by configuring the genetic algorithm to reward those test cases which come closest to making the module’s decision logic change the thread of execution. If the set of all test data which causes a specific block of code within the module to be executed is thought of as the solution space, the sets of test data rewarded by Sthamer’s fitness function are boundary data and are thus more likely to detect faults in the program’s control logic.</P>
<P>Xanthakis et al. [2] investigated the automatic generation of unit test data for Pascal programs by analyzing source code and producing appropriate test data. The tool they produced analyzes a source module for constraints and then sets up a genetic algorithm to target a specific path through the module. This process is repeated for each path requiring verification.</P>
<P>Watkins [3] based her work on the earlier work by Xanthakis [2]. Watkins’ paper compares the efficiency of test data generation from a genetic algorithm to randomly generated data, showing the GA based technique to be much more effective. Two test programs were instrumented: trityp-easy and trityp-hard. These programs take three integer parameters and (1) determine if they form a triangle, and (2) identify the type of triangle formed. The trityp-hard program also checks for a right triangle. Test data was obtained from two sources, a random number generator and a genetic algorithm. For the trityp-easy program the GA only needed to examine 0.8% of the search space to find parameter values traversing the ten possible paths of execution through the trityp-easy program. Multiple GA runs were made. The results of each run (the paths traversed) were recorded and used to determine when 100% coverage was obtained. Random number generation covered 5% of the search space before traversing all possible program paths.</P>
<P>Watkins’ genetic algorithm used a small population size (twenty) and a fitness function which rewarded test data inversely proportional to the number of times that the resulting path through a module was executed. Here, path is defined as the set of linear code sequence and jumps, or code blocks, which were traversed. This is the same as mapping the nodes traversed in a control graph of the module under test. This fitness function rewarded data sets which traversed new code segments, but it did not actively seek them out.</P>
<P>This chapter most closely builds on Watkins’ work [3], though there are several major differences. Watkins has shown that a genetic algorithm is much more efficient than random selection at producing test data. Total coverage was achieved only with multiple runs of the genetic algorithm. The work described in this chapter seeks to produce a stable population of test cases in one run of a genetic algorithm.</P>
<P>Genetic algorithms are generally used to find <I>one</I> near optimal solution to some problem. The desired solution in this application is not one set of test data (consisting of parameters to the module under test), but rather all sets of test data which, as a whole, provide 100% module line coverage. This type of problem can be thought of as the formation of niches occupied by sets of test data executing a common path through the module under test. The concept of niche formation will be investigated in future work. All of the papers referenced above configured their genetic algorithms to use small population sizes, usually less than 50. The work presented here studied the effect of varying population size as a function of module complexity. The rationale is this: given the fact that there are usually multiple paths through a software module, test data formation is hindered when additional offspring for one highly fit path come at the expense of those traversing some other path. The solution is simply to increase the population size such that the probability of eliminating all test data for any path is remote.</P>
<P>Another major difference is the fitness calculation. Instead of basing fitness on the inverse of the number of times a path through the module was traversed (“inverse path probability”, termed IPP in this chapter), fitness is based on the probability of traversing the last block of code in the path through the module. This new fitness function emphasizes or rewards the traversal of unique terminal code blocks, producing a lower fitness when a path executes a block shared with other paths. Another difference is a fitness“bonus”given to test data that execute nodes on the control graph having untraversed children. The bonus helps the GA search for hard to reach code fragments.</P>
<P>Several of the authors reviewed the creation of a utility to generate unit test data with genetic algorithms. However, none of these authors proposed a large population, single run approach. Likewise, none of the authors disclosed how their fitness function operated.</P>
<P>Finally, Watkins’ work does a very good job of demonstrating that genetic algorithms are much better than random selection at producing test data. The fitness function examined in this chapter is similar to Watkins’, but is unique enough, and provides sufficient performance gains, to stand on its own. This being said, the author believes the work described in this chapter to be novel and worthy of investigation.</P><P><BR></P>
<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="050-052.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="054-059.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<hr width="90%" size="1" noshade>
<div align="center">
<font face="Verdana,sans-serif" size="1">Copyright © <a href="/reference/crc00001.html">CRC Press LLC</a></font>
</div>
<!-- all of the reference materials (books) have the footer and subfoot reveresed -->
<!-- reference_subfoot = footer -->
<!-- reference_footer = subfoot -->
</BODY>
</HTML>
<!-- END FOOTER -->
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -