📄 readme

📁 PostgreSQL 8.1.4的源码适用于Linux下的开源数据库系统
💻
📖 第 1 页 / 共 2 页
字号:
12 下一页
Summary-------These directories take the Query structure returned by the parser, andgenerate a plan used by the executor.  The /plan directory generates theactual output plan, the /path code generates all possible ways to join thetables, and /prep handles various preprocessing steps for special cases./util is utility stuff.  /geqo is the separate "genetic optimization" planner--- it does a semi-random search through the join tree space, rather thanexhaustively considering all possible join trees.  (But each join consideredby /geqo is given to /path to create paths for, so we consider all possibleimplementation paths for each specific join pair even in GEQO mode.)Paths and Join Pairs--------------------During the planning/optimizing process, we build "Path" trees representingthe different ways of doing a query.  We select the cheapest Path thatgenerates the desired relation and turn it into a Plan to pass to theexecutor.  (There is pretty much a one-to-one correspondence between thePath and Plan trees, but Path nodes omit info that won't be needed duringplanning, and include info needed for planning that won't be needed by theexecutor.)The optimizer builds a RelOptInfo structure for each base relation used inthe query.  Base rels are either primitive tables, or subquery subselectsthat are planned via a separate recursive invocation of the planner.  ARelOptInfo is also built for each join relation that is considered duringplanning.  A join rel is simply a combination of base rels.  There is onlyone join RelOptInfo for any given set of baserels --- for example, the join{A B C} is represented by the same RelOptInfo no matter whether we build itby joining A and B first and then adding C, or joining B and C first andthen adding A, etc.  These different means of building the joinrel arerepresented as Paths.  For each RelOptInfo we build a list of Paths thatrepresent plausible ways to implement the scan or join of that relation.Once we've considered all the plausible Paths for a rel, we select the onethat is cheapest according to the planner's cost estimates.  The final planis derived from the cheapest Path for the RelOptInfo that includes all thebase rels of the query.Possible Paths for a primitive table relation include plain old sequentialscan, plus index scans for any indexes that exist on the table.  A subquerybase relation just has one Path, a "SubqueryScan" path (which links to thesubplan that was built by a recursive invocation of the planner).  Likewisea function-RTE base relation has only one possible Path.Joins always occur using two RelOptInfos.  One is outer, the other inner.Outers drive lookups of values in the inner.  In a nested loop, lookups ofvalues in the inner occur by scanning the inner path once per outer tupleto find each matching inner row.  In a mergejoin, inner and outer rows areordered, and are accessed in order, so only one scan is required to performthe entire join: both inner and outer paths are scanned in-sync.  (There'snot a lot of difference between inner and outer in a mergejoin...)  In ahashjoin, the inner is scanned first and all its rows are entered in ahashtable, then the outer is scanned and for each row we lookup the joinkey in the hashtable.A Path for a join relation is actually a tree structure, with the topPath node representing the join method.  It has left and right subpathsthat represent the scan or join methods used for the two input relations.Join Tree Construction----------------------The optimizer generates optimal query plans by doing a more-or-lessexhaustive search through the ways of executing the query.  The best Pathtree is found by a recursive process:1) Take each base relation in the query, and make a RelOptInfo structurefor it.  Find each potentially useful way of accessing the relation,including sequential and index scans, and make a Path representing thatway.  All the Paths made for a given relation are placed in itsRelOptInfo.pathlist.  (Actually, we discard Paths that are obviouslyinferior alternatives before they ever get into the pathlist --- whatends up in the pathlist is the cheapest way of generating each potentiallyuseful sort ordering of the relation.)  Also create a RelOptInfo.joininfolist including all the join clauses that involve this relation.  Forexample, the WHERE clause "tab1.col1 = tab2.col1" generates entries inboth tab1 and tab2's joininfo lists.If we have only a single base relation in the query, we are done.Otherwise we have to figure out how to join the base relations into asingle join relation.2) If the query's FROM clause contains explicit JOIN clauses, we jointhose pairs of relations in exactly the tree structure indicated by theJOIN clauses.  (This is absolutely necessary when dealing with outer JOINs.For inner JOINs we have more flexibility in theory, but don't currentlyexploit it in practice.)  For each such join pair, we generate a Pathfor each feasible join method, and select the cheapest Path.  Note thatthe JOIN clause structure determines the join Path structure, but itdoesn't constrain the join implementation method at each join (nestloop,merge, hash), nor does it say which rel is considered outer or inner ateach join.  We consider all these possibilities in building Paths.3) At the top level of the FROM clause we will have a list of relationsthat are either base rels or joinrels constructed per JOIN directives.We can join these rels together in any order the planner sees fit.The standard (non-GEQO) planner does this as follows:Consider joining each RelOptInfo to each other RelOptInfo specified in itsRelOptInfo.joininfo, and generate a Path for each possible join method foreach such pair.  (If we have a RelOptInfo with no join clauses, we have nochoice but to generate a clauseless Cartesian-product join; so we considerjoining that rel to each other available rel.  But in the presence of joinclauses we will only consider joins that use available join clauses.)If we only had two relations in the FROM list, we are done: we just pickthe cheapest path for the join RelOptInfo.  If we had more than two, we nowneed to consider ways of joining join RelOptInfos to each other to makejoin RelOptInfos that represent more than two FROM items.The join tree is constructed using a "dynamic programming" algorithm:in the first pass (already described) we consider ways to create join relsrepresenting exactly two FROM items.  The second pass considers waysto make join rels that represent exactly three FROM items; the next pass,four items, etc.  The last pass considers how to make the final joinrelation that includes all FROM items --- obviously there can be only onejoin rel at this top level, whereas there can be more than one join relat lower levels.  At each level we use joins that follow available joinclauses, if possible, just as described for the first level.For example:    SELECT  *    FROM    tab1, tab2, tab3, tab4    WHERE   tab1.col = tab2.col AND        tab2.col = tab3.col AND        tab3.col = tab4.col    Tables 1, 2, 3, and 4 are joined as:    {1 2},{2 3},{3 4}    {1 2 3},{2 3 4}    {1 2 3 4}    (other possibilities will be excluded for lack of join clauses)    SELECT  *    FROM    tab1, tab2, tab3, tab4    WHERE   tab1.col = tab2.col AND        tab1.col = tab3.col AND        tab1.col = tab4.col    Tables 1, 2, 3, and 4 are joined as:    {1 2},{1 3},{1 4}    {1 2 3},{1 3 4},{1 2 4}    {1 2 3 4}We consider left-handed plans (the outer rel of an upper join is a joinrel,but the inner is always a single FROM item); right-handed plans (outer relis always a single item); and bushy plans (both inner and outer can bejoins themselves).  For example, when building {1 2 3 4} we considerjoining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and{1 2} to {3 4} (bushy), among other choices.  Although the jointreescanning code produces these potential join combinations one at a time,all the ways to produce the same set of joined base rels will share thesame RelOptInfo, so the paths produced from different join combinationsthat produce equivalent joinrels will compete in add_path.Once we have built the final join rel, we use either the cheapest pathfor it or the cheapest path with the desired ordering (if that's cheaperthan applying a sort to the cheapest other path).Pulling up subqueries---------------------As we described above, a subquery appearing in the range table is plannedindependently and treated as a "black box" during planning of the outerquery.  This is necessary when the subquery uses features such asaggregates, GROUP, or DISTINCT.  But if the subquery is just a simplescan or join, treating the subquery as a black box may produce a poor plancompared to considering it as part of the entire plan search space.Therefore, at the start of the planning process the planner looks forsimple subqueries and pulls them up into the main query's jointree.Pulling up a subquery may result in FROM-list joins appearing below the topof the join tree.  Each FROM-list is planned using the dynamic-programmingsearch method described above.If pulling up a subquery produces a FROM-list as a direct child of anotherFROM-list (with no explicit JOIN directives between), then we can merge thetwo FROM-lists together.  Once that's done, the subquery is an absolutelyintegral part of the outer query and will not constrain the join treesearch space at all.  However, that could result in unpleasant growth ofplanning time, since the dynamic-programming search has runtime exponentialin the number of FROM-items considered.  Therefore, we don't mergeFROM-lists if the result would have too many FROM-items in one list.Optimizer Functions-------------------The primary entry point is planner().planner() set up for recursive handling of subqueries do final cleanup after planning.-subquery_planner() pull up subqueries from rangetable, if possible canonicalize qual     Attempt to simplify WHERE clause to the most useful form; this includes     flattening nested AND/ORs and detecting clauses that are duplicated in     different branches of an OR. simplify constant expressions process sublinks convert Vars of outer query levels into Params--grouping_planner()  preprocess target list for non-SELECT queries  handle UNION/INTERSECT/EXCEPT, GROUP BY, HAVING, aggregates,	ORDER BY, DISTINCT, LIMIT--query_planner()   pull out constant quals, which can be used to gate execution of the	whole plan (if any are found, we make a top-level Result node	to do the gating)   make list of base relations used in query   split up the qual into restrictions (a=1) and joins (b=c)   find qual clauses that enable merge and hash joins----make_one_rel()     set_base_rel_pathlist()      find scan and all index paths for each base relation      find selectivity of columns used in joins-----make_one_rel_by_joins()      jump to geqo if needed      else call make_rels_by_joins() for each level of join tree needed      make_rels_by_joins():        For each joinrel of the prior level, do make_rels_by_clause_joins()        if it has join clauses, or make_rels_by_clauseless_joins() if not.        Also generate "bushy plan" joins between joinrels of lower levels.      Back at make_one_rel_by_joins(), apply set_cheapest() to extract the      cheapest path for each newly constructed joinrel.      Loop back if this wasn't the top join level.   Back at query_planner:    put back any constant quals by adding a Result node Back at grouping_planner: do grouping(GROUP) do aggregates make unique(DISTINCT) make sort(ORDER BY) make limit(LIMIT/OFFSET)Optimizer Data Structures-------------------------PlannerInfo     - global information for planning a particular QueryRelOptInfo      - a relation or joined relations RestrictInfo   - WHERE clauses, like "x = 3" or "y = z"                  (note the same structure is used for restriction and                   join clauses) Path           - every way to generate a RelOptInfo(sequential,index,joins)
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -