📄 geqo.sgml
字号:
<!--$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.24 2003/09/29 18:18:35 momjian Exp $Genetic Optimizer--> <chapter id="geqo"> <docinfo> <author> <firstname>Martin</firstname> <surname>Utesch</surname> <affiliation> <orgname> University of Mining and Technology </orgname> <orgdiv> Institute of Automatic Control </orgdiv> <address> <city> Freiberg </city> <country> Germany </country> </address> </affiliation> </author> <date>1997-10-02</date> </docinfo> <title id="geqo-title">Genetic Query Optimizer</title> <para> <note> <title>Author</title> <para> Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>) for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany. </para> </note> </para> <sect1 id="geqo-intro"> <title>Query Handling as a Complex Optimization Problem</title> <para> Among all relational operators the most difficult one to process and optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query grows exponentially with the number of joins included in it. Further optimization effort is caused by the support of a variety of <firstterm>join methods</firstterm> (e.g., nested loop, hash join, merge join in <productname>PostgreSQL</productname>) to process individual joins and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree, B-tree, hash in <productname>PostgreSQL</productname>) as access paths for relations. </para> <para> The current <productname>PostgreSQL</productname> optimizer implementation performs a <firstterm>near-exhaustive search</firstterm> over the space of alternative strategies. This algorithm, first introduced in the <quote>System R</quote> database, produces a near-optimal join order, but can take an enormous amount of time and memory space when the number of joins in the query grows large. This makes the ordinary <productname>PostgreSQL</productname> query optimizer inappropriate for database application domains that involve the need for extensive queries, such as artificial intelligence. </para> <para> The Institute of Automatic Control at the University of Mining and Technology, in Freiberg, Germany, encountered the described problems as its folks wanted to take the <productname>PostgreSQL</productname> DBMS as the backend for a decision support knowledge based system for the maintenance of an electrical power grid. The DBMS needed to handle large join queries for the inference machine of the knowledge based system. </para> <para> Performance difficulties in exploring the space of possible query plans created the demand for a new optimization technique to be developed. </para> <para> In the following we describe the implementation of a <firstterm>Genetic Algorithm</firstterm> to solve the join ordering problem in a manner that is efficient for queries involving large numbers of joins. </para> </sect1> <sect1 id="geqo-intro2"> <title>Genetic Algorithms</title> <para> The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which operates through determined, randomized search. The set of possible solutions for the optimization problem is considered as a <firstterm>population</firstterm> of <firstterm>individuals</firstterm>. The degree of adaptation of an individual to its environment is specified by its <firstterm>fitness</firstterm>. </para> <para> The coordinates of an individual in the search space are represented by <firstterm>chromosomes</firstterm>, in essence a set of character strings. A <firstterm>gene</firstterm> is a subsection of a chromosome which encodes the value of a single parameter being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or <firstterm>integer</firstterm>. </para> <para> Through simulation of the evolutionary operations <firstterm>recombination</firstterm>, <firstterm>mutation</firstterm>, and <firstterm>selection</firstterm> new generations of search points are found that show a higher average fitness than their ancestors. </para> <para> According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly non-random (better than random). </para> <figure id="geqo-diagram"> <title>Structured Diagram of a Genetic Algorithm</title> <informaltable frame="none"> <tgroup cols="2"> <tbody> <row> <entry>P(t)</entry> <entry>generation of ancestors at a time t</entry> </row> <row> <entry>P''(t)</entry> <entry>generation of descendants at a time t</entry> </row> </tbody> </tgroup> </informaltable><literallayout class="monospaced">+=========================================+|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|+=========================================+| INITIALIZE t := 0 |+=========================================+| INITIALIZE P(t) |+=========================================+| evaluate FITNESS of P(t) |+=========================================+| while not STOPPING CRITERION do || +-------------------------------------+| | P'(t) := RECOMBINATION{P(t)} || +-------------------------------------+| | P''(t) := MUTATION{P'(t)} || +-------------------------------------+| | P(t+1) := SELECTION{P''(t) + P(t)} || +-------------------------------------+| | evaluate FITNESS of P''(t) || +-------------------------------------+| | t := t + 1 |+===+=====================================+</literallayout> </figure> </sect1> <sect1 id="geqo-pg-intro"> <title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title> <para> The <acronym>GEQO</acronym> module is intended for the solution of the query optimization problem similar to a traveling salesman problem (<acronym>TSP</acronym>). Possible query plans are encoded as integer strings. Each string represents the join order from one relation of the query to the next. E. g., the query tree<literallayout class="monospaced"> /\ /\ 2 /\ 34 1</literallayout> is encoded by the integer string '4-1-3-2', which means, first join relation '4' and '1', then '3', and then '2', where 1, 2, 3, 4 are relation IDs within the <productname>PostgreSQL</productname> optimizer. </para> <para> Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor algorithm. </para> <para> Specific characteristics of the <acronym>GEQO</acronym> implementation in <productname>PostgreSQL</productname> are: <itemizedlist spacing="compact" mark="bullet"> <listitem> <para> Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit individuals in a population, not whole-generational replacement) allows fast convergence towards improved query plans. This is essential for query handling with reasonable time; </para> </listitem> <listitem> <para> Usage of <firstterm>edge recombination crossover</firstterm> which is especially suited to keep edge losses low for the solution of the <acronym>TSP</acronym> by means of a <acronym>GA</acronym>; </para> </listitem> <listitem> <para> Mutation as genetic operator is deprecated so that no repair mechanisms are needed to generate legal <acronym>TSP</acronym> tours. </para> </listitem> </itemizedlist> </para> <para> The <acronym>GEQO</acronym> module allows the <productname>PostgreSQL</productname> query optimizer to support large join queries effectively through non-exhaustive search. </para> <sect2 id="geqo-future"> <title>Future Implementation Tasks for <productname>PostgreSQL</> <acronym>GEQO</acronym></title> <para> Work is still needed to improve the genetic algorithm parameter settings. In file <filename>backend/optimizer/geqo/geqo_params.c</filename>, routines <function>gimme_pool_size</function> and <function>gimme_number_generations</function>, we have to find a compromise for the parameter settings to satisfy two competing demands: <itemizedlist spacing="compact"> <listitem> <para> Optimality of the query plan </para> </listitem> <listitem> <para> Computing time </para> </listitem> </itemizedlist> </para> </sect2> </sect1> <sect1 id="geqo-biblio"> <title>Further Readings</title> <para> The following resources contain additional information about genetic algorithms: <itemizedlist> <listitem> <para> <ulink url="http://surf.de.uu.net/encore/www/">The Hitch-Hiker's Guide to Evolutionary Computation</ulink> (FAQ for <ulink url="news://comp.ai.genetic">comp.ai.genetic</ulink>) </para> </listitem> <listitem> <para> <ulink url="http://www.red3d.com/cwr/evolve.html">Evolutionary Computation and its application to art and design</ulink> by Craig Reynolds </para> </listitem> <listitem> <para> <xref linkend="ELMA99"> </para> </listitem> <listitem> <para> <xref linkend="FONG"> </para> </listitem> </itemizedlist> </para> </sect1></chapter><!-- Keep this comment at the end of the fileLocal variables:mode:sgmlsgml-omittag:nilsgml-shorttag:tsgml-minimize-attributes:nilsgml-always-quote-attributes:tsgml-indent-step:1sgml-indent-data:tsgml-parent-document:nilsgml-default-dtd-file:"./reference.ced"sgml-exposed-tags:nilsgml-local-catalogs:("/usr/lib/sgml/catalog")sgml-local-ecat-files:nilEnd:-->
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -