<HTML>
<HEAD>
<META name=vsisbn content="0849398010">
<META name=vstitle content="Industrial Applications of Genetic Algorithms">
<META name=vsauthor content="Charles Karr; L. Michael Freeman">
<META name=vsimprint content="CRC Press">
<META name=vspublisher content="CRC Press LLC">
<META name=vspubdate content="12/01/98">
<META name=vscategory content="Web and Software Development: Artificial Intelligence: Other">




<TITLE>Industrial Applications of Genetic Algorithms:Data Mining Using Genetic Algorithms</TITLE>

<!-- HEADER -->

<STYLE type="text/css"> 
 <!--
 A:hover  {
 	color : Red;
 }
 -->
</STYLE>

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

<!--ISBN=0849398010//-->
<!--TITLE=Industrial Applications of Genetic Algorithms//-->
<!--AUTHOR=Charles Karr//-->
<!--AUTHOR=L. Michael Freeman//-->
<!--PUBLISHER=CRC Press LLC//-->
<!--IMPRINT=CRC Press//-->
<!--CHAPTER=9//-->
<!--PAGES=159-161//-->
<!--UNASSIGNED1//-->
<!--UNASSIGNED2//-->
</HEAD>
<BODY>

<CENTER>
<TABLE BORDER>
<TR>
<TD><A HREF="../ch08/152-157.html">Previous</A></TD>
<TD><A HREF="../ewtoc.html">Table of Contents</A></TD>
<TD><A HREF="161-163.html">Next</A></TD>
</TR>
</TABLE>
</CENTER>
<P><BR></P>
<H2><A NAME="Heading1"></A><FONT COLOR="#000077">Chapter 9<BR>Data Mining Using Genetic Algorithms
</FONT></H2>
<P><I>Darrin J. Marshall</I></P>
<P>NCR Corporation<BR>17095 Via Del Campo<BR>San Diego, CA 92127<BR>email: darrin.marshall@sandiegoca.ncr.com</P>
<P><FONT SIZE="+1"><B>ABSTRACT</B></FONT></P>
<P>Data mining involves sifting through very large databases in search of frequently occurring patterns in order to detect trends and produce generalizations about the data content. This chapter describes an approach that uses genetic algorithms to search through, or &#8220;mine,&#8221; a large set of database transactions in order to determine relationships among the transactions. Consider an example database consisting of transactions representing products purchased at a retail store. This genetic algorithm implementation focuses on determining, out of 100 possible items, the four items that are most often purchased together. Information of this sort can be used by businesses to aid in inventory, sales, and marketing strategy planning. For example, such information can allow a business to focus on marketing opportunities that encompass sets or groups of products and thus get the most from its advertising dollars.
</P>
<P><FONT SIZE="+1"><B>INTRODUCTION</B></FONT></P>
<P>With the widespread proliferation of powerful and affordable computing and information gathering devices, we have seen a dramatic increase in the amount of electronic data that is being obtained and stored. It has been estimated that the amount of electronic information in the world doubles every 20 months, and the size and number of databases that store this information are increasing even faster [1].
</P>
<P>Given that vast amounts of data are already stored and available to us, and that new mountains of information are gathered each day with every credit card and point-of-sale bar code transaction, businesses are now faced with the task of making use of all this information. For instance, one of the country's largest retailers, Wal-Mart, as of 1995, had 65 weeks of point-of-sale transaction data on-line, taking up more than 3.5 terabytes of storage [2]. Considering that one terabyte of data is equivalent to approximately two million books, it is obvious that new and powerful techniques will be required to make effective use of this much information.</P>
<P>It is generally recognized within the business community that there is untapped value in large databases [3]. It is also recognized that this information is vital to business operations, and that decision-makers need to make use of the stored data in order to be competitive. To accomplish this, the large businesses that own these terabytes of data have turned to what is being called &#8220;data mining.&#8221; Data mining employs methodologies to sift through huge databases in search of frequently occurring patterns in order to detect trends and produce generalizations about the data content. As one might suspect, this type of analysis requires not only very sophisticated and powerful computers, designed specifically for managing large amounts of data, but also data mining application software that can efficiently and effectively search through vast amounts of data in order to recognize the relationships that may exist.</P>
<P>Data mining encompasses many technologies and techniques that are used to identify &#8220;hidden&#8221; pieces of information within a database. Current technologies and techniques employed in many data mining environments include parallel computer architectures, relational database management systems (RDBMS), data warehouses (which organize data in such a way that retrieval and analysis are greatly facilitated), and often, artificial intelligence and neural networks. These technologies, although important in the overall implementation of a data mining environment, are beyond the scope of the application examined in this chapter.</P>
<P>This chapter focuses solely on a genetic algorithm-based search technique that can be used in data mining to aid in determining relationships that may exist within a data set. In order to determine what relationships exist, many combinations of transaction comparisons and database queries must take place. Depending on the nature of the data and the relationships that are being sought, millions of data attribute combinations may have to be investigated.</P>
<P>For example, consider a database that contains N transactions, where each transaction consists of a list of items purchased. Also, assume that there are 100 different items that can appear in a transaction. Now, consider an effort to determine the four items that are most often purchased together. This implies that there are 100 items chosen 4 at a time, or 3,921,225 combinations of items available for investigation. If any useful output from the data mining tool is expected within a reasonable amount of time, an exhaustive comparison of all possible combinations is probably not feasible. Instead, an efficient and effective search technique is required for relationship discovery. This is where a genetic algorithm comes into play, and it is this problem that is addressed in this chapter.</P>
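<P>As a rough illustration of the search-space size quoted above, the following Python sketch checks the count of four-item groups and then runs an exhaustive search on a toy database. It is not taken from the chapter: the five-entry <CODE>transactions</CODE> list and the helper names <CODE>support</CODE> and <CODE>best_group_exhaustive</CODE> are hypothetical, chosen only to show the brute-force baseline that a genetic algorithm is meant to avoid on realistic data.</P>

```python
import math
from itertools import combinations

# Number of ways to choose 4 items out of 100 (the figure quoted in the text).
N_ITEMS = 100
GROUP_SIZE = 4
n_combinations = math.comb(N_ITEMS, GROUP_SIZE)

# Hypothetical toy database: each transaction is the set of item IDs purchased.
transactions = [
    {1, 2, 3, 4, 9},
    {1, 2, 3, 4},
    {2, 3, 4, 7},
    {1, 2, 3, 4, 5},
    {5, 6, 7, 8},
]


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted.issubset(t))


def best_group_exhaustive(transactions, group_size):
    """Brute-force baseline: score every group of items that occur in the data."""
    items = sorted(set().union(*transactions))
    return max(combinations(items, group_size),
               key=lambda group: support(group, transactions))


if __name__ == "__main__":
    print(n_combinations)
    print(best_group_exhaustive(transactions, GROUP_SIZE))
```

<P>On this toy data only nine distinct items occur, so the exhaustive search scores just 126 groups and finds items 1, 2, 3, and 4, which appear together in three transactions. With 100 items the same loop would have to score 3,921,225 groups against every transaction, which is the cost that motivates a guided search.</P>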
<P>The remainder of this chapter will review current data mining search techniques, describe the problem that was solved using a genetic algorithm, present the specifics of the genetic algorithm that was developed, and present the results of simulations that were run to test and analyze different aspects of the genetic algorithm in the data mining problem domain.</P><P><BR></P>

<hr width="90%" size="1" noshade>
<div align="center">
<font face="Verdana,sans-serif" size="1">Copyright &copy; <a href="/reference/crc00001.html">CRC Press LLC</a></font>
</div>
<!-- all of the reference materials (books) have the footer and subfoot reversed -->
<!-- reference_subfoot = footer -->
<!-- reference_footer = subfoot -->

</BODY>
</HTML>

<!-- END FOOTER -->