📄 geqo.sgml
字号:
<!--$Header: /usr/local/cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.8 1999/03/30 15:25:56 thomas Exp $Genetic Optimizer$Log: geqo.sgml,v $Revision 1.8 1999/03/30 15:25:56 thomasFix up small markup problems. Force omit-tags to nil so we have tag completion as required by the newest DocBook conventions.Revision 1.7 1999/02/19 01:57:08 thomasFix SGML markup from last content changes.Revision 1.6 1999/02/18 05:26:17 momjianEnable bushy plans by default.Revision 1.5 1998/12/29 02:24:15 thomasClean up to ensure tag completion as required by the newest versions of Norm's Modular Style Sheets and jade/docbook.From Vince Vielhaber <vev@michvhf.com>.Revision 1.4 1998/08/15 06:55:05 thomasChange Id field in chapter tag to change html output file name.--><Chapter Id="geqo"><DocInfo><Author><FirstName>Martin</FirstName><SurName>Utesch</SurName><Affiliation><Orgname>University of Mining and Technology</Orgname><Orgdiv>Institute of Automatic Control</Orgdiv><Address><City>Freiberg</City><Country>Germany</Country></Address></Affiliation></Author><Date>1997-10-02</Date></DocInfo><Title>Genetic Query Optimization in Database Systems</Title><Para><Note><Title>Author</Title><Para>Written by <ULink url="utesch@aut.tu-freiberg.de">Martin Utesch</ULink>for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.</Para></Note></para><Sect1><Title>Query Handling as a Complex Optimization Problem</Title><Para> Among all relational operators the most difficult one to process andoptimize is the <FirstTerm>join</FirstTerm>. The number of alternative plans to answer a querygrows exponentially with the number of <Command>join</Command>s included in it. Furtheroptimization effort is caused by the support of a variety of <FirstTerm>join methods</FirstTerm> (e.g., nested loop, index scan, merge join in <ProductName>Postgres</ProductName>) toprocess individual <Command>join</Command>s and a diversity of <FirstTerm>indices</FirstTerm> (e.g., r-tree,b-tree, hash in <ProductName>Postgres</ProductName>) as access paths for relations.</para><Para> The current <ProductName>Postgres</ProductName> optimizer implementation performs a <FirstTerm>near-exhaustive search</FirstTerm> over the space of alternative strategies. This queryoptimization technique is inadequate to support database applicationdomains that involve the need for extensive queries, such as artificialintelligence.</para><Para> The Institute of Automatic Control at the University of Mining andTechnology, in Freiberg, Germany, encountered the described problems as itsfolks wanted to take the <ProductName>Postgres</ProductName> DBMS as the backend for a decisionsupport knowledge based system for the maintenance of an electricalpower grid. The DBMS needed to handle large <Command>join</Command> queries for theinference machine of the knowledge based system.</para><Para> Performance difficulties within exploring the space of possible queryplans arose the demand for a new optimization technique being developed.</para><Para> In the following we propose the implementation of a <FirstTerm>Genetic Algorithm</FirstTerm> as an option for the database query optimization problem.</para></sect1><Sect1><Title>Genetic Algorithms (<Acronym>GA</Acronym>)</Title><Para> The <Acronym>GA</Acronym> is a heuristic optimization method which operates through determined, randomized search. The set of possible solutions for theoptimization problem is considered as a <FirstTerm>population</FirstTerm> of <FirstTerm>individuals</FirstTerm>.The degree of adaption of an individual to its environment is specifiedby its <FirstTerm>fitness</FirstTerm>.</para><Para> The coordinates of an individual in the search space are representedby <FirstTerm>chromosomes</FirstTerm>, in essence a set of character strings. A <FirstTerm>gene</FirstTerm> is asubsection of a chromosome which encodes the value of a single parameterbeing optimized. Typical encodings for a gene could be <FirstTerm>binary</FirstTerm> or<FirstTerm>integer</FirstTerm>.</para><Para> Through simulation of the evolutionary operations <FirstTerm>recombination</FirstTerm>,<FirstTerm>mutation</FirstTerm>, and <FirstTerm>selection</FirstTerm> new generations of search points are foundthat show a higher average fitness than their ancestors.</para><Para> According to the "comp.ai.genetic" <Acronym>FAQ</Acronym> it cannot be stressed toostrongly that a <Acronym>GA</Acronym> is not a pure random search for a solution to aproblem. A <Acronym>GA</Acronym> uses stochastic processes, but the result is distinctlynon-random (better than random). <ProgramListing>Structured Diagram of a <Acronym>GA</Acronym>:---------------------------P(t) generation of ancestors at a time tP''(t) generation of descendants at a time t+=========================================+|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|+=========================================+| INITIALIZE t := 0 |+=========================================+| INITIALIZE P(t) |+=========================================+| evalute FITNESS of P(t) |+=========================================+| while not STOPPING CRITERION do || +-------------------------------------+| | P'(t) := RECOMBINATION{P(t)} || +-------------------------------------+| | P''(t) := MUTATION{P'(t)} || +-------------------------------------+| | P(t+1) := SELECTION{P''(t) + P(t)} || +-------------------------------------+| | evalute FITNESS of P''(t) || +-------------------------------------+| | t := t + 1 |+===+=====================================+</ProgramListing></para></sect1><Sect1><Title>Genetic Query Optimization (<Acronym>GEQO</Acronym>) in Postgres</Title><Para> The <Acronym>GEQO</Acronym> module is intended for the solution of the queryoptimization problem similar to a traveling salesman problem (<Acronym>TSP</Acronym>).Possible query plans are encoded as integer strings. Each stringrepresents the <Command>join</Command> order from one relation of the query to the next.E. g., the query tree<ProgramListing> /\ /\ 2 /\ 3 4 1</ProgramListing>is encoded by the integer string '4-1-3-2',which means, first join relation '4' and '1', then '3', andthen '2', where 1, 2, 3, 4 are relids in <ProductName>Postgres</ProductName>.</para><Para> Parts of the <Acronym>GEQO</Acronym> module are adapted from D. Whitley's Genitoralgorithm.</para><Para> Specific characteristics of the <Acronym>GEQO</Acronym> implementation in <ProductName>Postgres</ProductName>are:<ItemizedList Mark="bullet" Spacing="compact"><ListItem><Para>Usage of a <FirstTerm>steady state</FirstTerm> <Acronym>GA</Acronym> (replacement of the least fit individuals in a population, not whole-generational replacement) allows fast convergence towards improved query plans. This is essential for query handling with reasonable time;</Para></ListItem><ListItem><Para>Usage of <FirstTerm>edge recombination crossover</FirstTerm> which is especially suited to keep edge losses low for the solution of the <Acronym>TSP</Acronym> by means of a <Acronym>GA</Acronym>;</Para></ListItem><ListItem><Para>Mutation as genetic operator is deprecated so that no repair mechanisms are needed to generate legal <Acronym>TSP</Acronym> tours.</Para></ListItem></ItemizedList></para><Para> The <Acronym>GEQO</Acronym> module gives the following benefits to the <ProductName>Postgres</ProductName> DBMScompared to the <ProductName>Postgres</ProductName> query optimizer implementation:<ItemizedList Mark="bullet" Spacing="compact"><ListItem><Para>Handling of large <Command>join</Command> queries through non-exhaustive search;</Para></ListItem><ListItem><Para>Improved cost size approximation of query plans since no longer plan merging is needed (the <Acronym>GEQO</Acronym> module evaluates the cost for a query plan as an individual).</Para></ListItem></ItemizedList></para></Sect1><Sect1><Title>Future Implementation Tasks for <ProductName>Postgres</ProductName> <Acronym>GEQO</Acronym></Title><Sect2><Title>Basic Improvements</Title><Sect3><Title>Improve freeing of memory when query is already processed</Title><Para>With large <Command>join</Command> queries the computing time spent for the genetic queryoptimization seems to be a mere <Emphasis>fraction</Emphasis> of the time <ProductName>Postgres</ProductName>needs for freeing memory via routine <Function>MemoryContextFree</Function>,file <FileName>backend/utils/mmgr/mcxt.c</FileName>.Debugging showed that it get stucked in a loop of routine<Function>OrderedElemPop</Function>, file <FileName>backend/utils/mmgr/oset.c</FileName>.The same problems arise with long queries when using the normal<ProductName>Postgres</ProductName> query optimization algorithm.</para></sect3><Sect3><Title>Improve genetic algorithm parameter settings</Title><Para>In file <FileName>backend/optimizer/geqo/geqo_params.c</FileName>, routines<Function>gimme_pool_size</Function> and <Function>gimme_number_generations</Function>,we have to find a compromise for the parameter settingsto satisfy two competing demands:<ItemizedList Spacing="compact"><ListItem><Para>Optimality of the query plan</Para></ListItem><ListItem><Para>Computing time</Para></ListItem></ItemizedList></para></sect3><Sect3><Title>Find better solution for integer overflow</Title><Para>In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine<Function>geqo_joinrel_size</Function>,the present hack for MAXINT overflow is to set the <ProductName>Postgres</ProductName> integervalue of <StructField>rel->size</StructField> to its logarithm.Modifications of <StructName>Rel</StructName> in <FileName>backend/nodes/relation.h</FileName> willsurely have severe impacts on the whole <ProductName>Postgres</ProductName> implementation.</para></sect3><Sect3><Title>Find solution for exhausted memory</Title><Para>Memory exhaustion may occur with more than 10 relations involved in a query.In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine<Function>gimme_tree</Function> is recursively called.Maybe I forgot something to be freed correctly, but I dunno what.Of course the <StructName>rel</StructName> data structure of the <Command>join</Command> keeps growing andgrowing the more relations are packed into it.Suggestions are welcome :-(</para></sect3></sect2><BIBLIOGRAPHY Id="geqo-biblio"><TITLE>References</TITLE><PARA>Reference information for <Acronym>GEQ</Acronym> algorithms.</PARA><BIBLIOENTRY><BOOKBIBLIO><TITLE>The Hitch-Hiker's Guide to Evolutionary Computation</TITLE><AUTHORGROUP><AUTHOR><FIRSTNAME>Jörg</FIRSTNAME><SURNAME>Heitkötter</SURNAME></AUTHOR><AUTHOR><FIRSTNAME>David</FIRSTNAME><SURNAME>Beasley</SURNAME></AUTHOR></AUTHORGROUP><PUBLISHER><PUBLISHERNAME>InterNet resource</PUBLISHERNAME></PUBLISHER><ABSTRACT><Para>FAQ in <ULink url="news://comp.ai.genetic">comp.ai.genetic</ULink>is available at <ULink url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ULink>.</Para></ABSTRACT></BOOKBIBLIO><BOOKBIBLIO><TITLE>The Design and Implementation of the Postgres Query Optimizer</TITLE><AUTHORGROUP><AUTHOR><FIRSTNAME>Z.</FIRSTNAME><SURNAME>Fong</SURNAME></AUTHOR></AUTHORGROUP><PUBLISHER><PUBLISHERNAME>University of California, Berkeley Computer Science Department</PUBLISHERNAME></PUBLISHER><ABSTRACT><Para>File <FileName>planner/Report.ps</FileName> in the 'postgres-papers' distribution.</Para></ABSTRACT></BOOKBIBLIO><BOOKBIBLIO><TITLE>Fundamentals of Database Systems</TITLE><AUTHORGROUP><AUTHOR><FIRSTNAME>R.</FIRSTNAME><SURNAME>Elmasri</SURNAME></AUTHOR><AUTHOR><FIRSTNAME>S.</FIRSTNAME><SURNAME>Navathe</SURNAME></AUTHOR></AUTHORGROUP><PUBLISHER><PUBLISHERNAME>The Benjamin/Cummings Pub., Inc.</PUBLISHERNAME></PUBLISHER></BOOKBIBLIO></BIBLIOENTRY></BIBLIOGRAPHY></sect1></Chapter><!-- Keep this comment at the end of the fileLocal variables:mode: sgmlsgml-omittag:nilsgml-shorttag:tsgml-minimize-attributes:nilsgml-always-quote-attributes:tsgml-indent-step:1sgml-indent-data:tsgml-parent-document:nilsgml-default-dtd-file:"./reference.ced"sgml-exposed-tags:nilsgml-local-catalogs:"/usr/lib/sgml/catalog"sgml-local-ecat-files:nilEnd:-->
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -