📄 arch-dev.sgml

📁 PostgreSQL 8.1.4的源码适用于Linux下的开源数据库系统
💻 SGML
📖 第 1 页 / 共 2 页
字号:
上一页 12
     call.  This may be transformed to either a <structname>FuncExpr</>     or <structname>Aggref</> node depending on whether the referenced     name turns out to be an ordinary function or an aggregate function.     Also, information about the actual data types of columns and expression     results is added to the query tree.    </para>   </sect2>  </sect1>  <sect1 id="rule-system">   <title>The <productname>PostgreSQL</productname> Rule System</title>   <para>    <productname>PostgreSQL</productname> supports a powerful    <firstterm>rule system</firstterm> for the specification    of <firstterm>views</firstterm> and ambiguous <firstterm>view updates</firstterm>.    Originally the <productname>PostgreSQL</productname>    rule system consisted of two implementations:    <itemizedlist>     <listitem>      <para>       The first one worked using <firstterm>row level</firstterm> processing and was       implemented deep in the <firstterm>executor</firstterm>. The rule system was       called whenever an individual row had been accessed. This       implementation was removed in 1995 when the last official release       of the <productname>Berkeley Postgres</productname> project was       transformed into <productname>Postgres95</productname>.       </para>     </listitem>     <listitem>      <para>       The second implementation of the rule system is a technique       called <firstterm>query rewriting</firstterm>.       The <firstterm>rewrite system</firstterm> is a module       that exists between the <firstterm>parser stage</firstterm> and the       <firstterm>planner/optimizer</firstterm>. This technique is still implemented.      </para>     </listitem>    </itemizedlist>   </para>   <para>    The query rewriter is discussed in some detail in    <xref linkend="rules">, so there is no need to cover it here.    We will only point out that both the input and the output of the    rewriter are query trees, that is, there is no change in the    representation or level of semantic detail in the trees.  Rewriting    can be thought of as a form of macro expansion.   </para>  </sect1>  <sect1 id="planner-optimizer">   <title>Planner/Optimizer</title>   <para>    The task of the <firstterm>planner/optimizer</firstterm> is to    create an optimal execution plan. A given SQL query (and hence, a    query tree) can be actually executed in a wide variety of    different ways, each of which will produce the same set of    results.  If it is computationally feasible, the query optimizer    will examine each of these possible execution plans, ultimately    selecting the execution plan that is expected to run the fastest.   </para>   <note>    <para>     In some situations, examining each possible way in which a query     may be executed would take an excessive amount of time and memory     space. In particular, this occurs when executing queries     involving large numbers of join operations. In order to determine     a reasonable (not optimal) query plan in a reasonable amount of     time, <productname>PostgreSQL</productname> uses a <xref     linkend="geqo" endterm="geqo-title">.    </para>   </note>   <para>    The planner's search procedure actually works with data structures    called <firstterm>paths</>, which are simply cut-down representations of    plans containing only as much information as the planner needs to make    its decisions. After the cheapest path is determined, a full-fledged    <firstterm>plan tree</> is built to pass to the executor.  This represents    the desired execution plan in sufficient detail for the executor to run it.    In the rest of this section we'll ignore the distinction between paths    and plans.   </para>   <sect2>    <title>Generating Possible Plans</title>    <para>     The planner/optimizer starts by generating plans for scanning each     individual relation (table) used in the query.  The possible plans     are determined by the available indexes on each relation.     There is always the possibility of performing a     sequential scan on a relation, so a sequential scan plan is always     created. Assume an index is defined on a     relation (for example a B-tree index) and a query contains the     restriction     <literal>relation.attribute OPR constant</literal>. If     <literal>relation.attribute</literal> happens to match the key of the B-tree     index and <literal>OPR</literal> is one of the operators listed in     the index's <firstterm>operator class</>, another plan is created using     the B-tree index to scan the relation. If there are further indexes     present and the restrictions in the query happen to match a key of an     index further plans will be considered.    </para>    <para>     After all feasible plans have been found for scanning single relations,     plans for joining relations are created. The planner/optimizer     preferentially considers joins between any two relations for which there     exist a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for     which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>     exists). Join pairs with no join clause are considered only when there     is no other choice, that is, a particular relation has no available     join clauses to any other relation. All possible plans are generated for     every join pair considered     by the planner/optimizer. The three possible join strategies are:     <itemizedlist>      <listitem>       <para>        <firstterm>nested loop join</firstterm>: The right relation is scanned        once for every row found in the left relation. This strategy        is easy to implement but can be very time consuming.  (However,        if the right relation can be scanned with an index scan, this can        be a good strategy.  It is possible to use values from the current        row of the left relation as keys for the index scan of the right.)       </para>      </listitem>      <listitem>       <para>        <firstterm>merge sort join</firstterm>: Each relation is sorted on the join        attributes before the join starts. Then the two relations are        scanned in parallel, and matching rows are combined to form        join rows. This kind of join is more        attractive because each relation has to be scanned only once.        The required sorting may be achieved either by an explicit sort        step, or by scanning the relation in the proper order using an        index on the join key.       </para>      </listitem>      <listitem>       <para>        <firstterm>hash join</firstterm>: the right relation is first scanned        and loaded into a hash table, using its join attributes as hash keys.        Next the left relation is scanned and the        appropriate values of every row found are used as hash keys to        locate the matching rows in the table.       </para>      </listitem>     </itemizedlist>    </para>    <para>     When the query involves more than two relations, the final result     must be built up by a tree of join steps, each with two inputs.     The planner examines different possible join sequences to find the     cheapest one.    </para>    <para>     The finished plan tree consists of sequential or index scans of     the base relations, plus nested-loop, merge, or hash join nodes as     needed, plus any auxiliary steps needed, such as sort nodes or     aggregate-function calculation nodes.  Most of these plan node     types have the additional ability to do <firstterm>selection</>     (discarding rows that do not meet a specified boolean condition)     and <firstterm>projection</> (computation of a derived column set     based on given column values, that is, evaluation of scalar     expressions where needed).  One of the responsibilities of the     planner is to attach selection conditions from the     <literal>WHERE</literal> clause and computation of required     output expressions to the most appropriate nodes of the plan     tree.    </para>   </sect2>  </sect1>  <sect1 id="executor">   <title>Executor</title>   <para>    The <firstterm>executor</firstterm> takes the plan handed back by the    planner/optimizer and recursively processes it to extract the required set    of rows.  This is essentially a demand-pull pipeline mechanism.    Each time a plan node is called, it must deliver one more row, or    report that it is done delivering rows.   </para>   <para>    To provide a concrete example, assume that the top    node is a <literal>MergeJoin</literal> node.     Before any merge can be done two rows have to be fetched (one from    each subplan). So the executor recursively calls itself to    process the subplans (it starts with the subplan attached to    <literal>lefttree</literal>). The new top node (the top node of the left    subplan) is, let's say, a    <literal>Sort</literal> node and again recursion is needed to obtain    an input row.  The child node of the <literal>Sort</literal> might    be a <literal>SeqScan</> node, representing actual reading of a table.    Execution of this node causes the executor to fetch a row from the    table and return it up to the calling node.  The <literal>Sort</literal>    node will repeatedly call its child to obtain all the rows to be sorted.    When the input is exhausted (as indicated by the child node returning    a NULL instead of a row), the <literal>Sort</literal> code performs    the sort, and finally is able to return its first output row, namely    the first one in sorted order.  It keeps the remaining rows stored so    that it can deliver them in sorted order in response to later demands.   </para>   <para>    The <literal>MergeJoin</literal> node similarly demands the first row    from its right subplan.  Then it compares the two rows to see if they    can be joined; if so, it returns a join row to its caller.  On the next    call, or immediately if it cannot join the current pair of inputs,    it advances to the next row of one table    or the other (depending on how the comparison came out), and again    checks for a match.  Eventually, one subplan or the other is exhausted,    and the <literal>MergeJoin</literal> node returns NULL to indicate that    no more join rows can be formed.   </para>   <para>    Complex queries may involve many levels of plan nodes, but the general    approach is the same: each node computes and returns its next output    row each time it is called.  Each node is also responsible for applying    any selection or projection expressions that were assigned to it by    the planner.   </para>   <para>    The executor mechanism is used to evaluate all four basic SQL query types:    <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, and    <command>DELETE</>.  For <command>SELECT</>, the top-level executor    code only needs to send each row returned by the query plan tree off    to the client.  For <command>INSERT</>, each returned row is inserted    into the target table specified for the <command>INSERT</>.  (A simple    <command>INSERT ... VALUES</> command creates a trivial plan tree    consisting of a single <literal>Result</> node, which computes just one    result row.  But <command>INSERT ... SELECT</> may demand the full power    of the executor mechanism.)  For <command>UPDATE</>, the planner arranges    that each computed row includes all the updated column values, plus    the <firstterm>TID</> (tuple ID, or row ID) of the original target row;    the executor top level uses this information to create a new updated row    and mark the old row deleted.  For <command>DELETE</>, the only column    that is actually returned by the plan is the TID, and the executor top    level simply uses the TID to visit each target row and mark it deleted.   </para>  </sect1> </chapter><!-- Keep this comment at the end of the fileLocal variables:mode:sgmlsgml-omittag:nilsgml-shorttag:tsgml-minimize-attributes:nilsgml-always-quote-attributes:tsgml-indent-step:1sgml-indent-data:tsgml-parent-document:nilsgml-default-dtd-file:"./reference.ced"sgml-exposed-tags:nilsgml-local-catalogs:("/usr/lib/sgml/catalog")sgml-local-ecat-files:nilEnd:-->
上一页 12
💿 文件大小 14179 K
👤 上传用户 babydog00
📂 所属分类其他数据库
🏷️ 相关标签

#PostgreSQL #Linux #源码 #开源
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -