📄 indexam.sgml

📁 PostgreSQL 8.1.4 source code: an open-source database system for Linux
  <para>
   Because of MVCC, it is always necessary to allow duplicate entries to
   exist physically in an index: the entries might refer to successive
   versions of a single logical row.  The behavior we actually want to
   enforce is that no MVCC snapshot could include two rows with equal
   index keys.  This breaks down into the following cases that must be
   checked when inserting a new row into a unique index:

    <itemizedlist>
     <listitem>
      <para>
       If a conflicting valid row has been deleted by the current transaction,
       it's okay.  (In particular, since an UPDATE always deletes the old row
       version before inserting the new version, this will allow an UPDATE on
       a row without changing the key.)
      </para>
     </listitem>
     <listitem>
      <para>
       If a conflicting row has been inserted by an as-yet-uncommitted
       transaction, the would-be inserter must wait to see if that transaction
       commits.  If it rolls back then there is no conflict.  If it commits
       without deleting the conflicting row again, there is a uniqueness
       violation.  (In practice we just wait for the other transaction to
       end and then redo the visibility check in toto.)
      </para>
     </listitem>
     <listitem>
      <para>
       Similarly, if a conflicting valid row has been deleted by an
       as-yet-uncommitted transaction, the would-be inserter must wait
       for that transaction to commit or abort, and then repeat the test.
      </para>
     </listitem>
    </itemizedlist>
  </para>

  <para>
   We require the index access method to apply these tests itself, which
   means that it must reach into the heap to check the commit status of
   any row that is shown to have a duplicate key according to the index
   contents.
   This is without a doubt ugly and non-modular, but it saves
   redundant work: if we did a separate probe then the index lookup for
   a conflicting row would be essentially repeated while finding the place to
   insert the new row's index entry.  What's more, there is no obvious way
   to avoid race conditions unless the conflict check is an integral part
   of insertion of the new index entry.
  </para>

  <para>
   The main limitation of this scheme is that it has no convenient way
   to support deferred uniqueness checks.
  </para>
 </sect1>

 <sect1 id="index-cost-estimation">
  <title>Index Cost Estimation Functions</title>

  <para>
   The amcostestimate function is given a list of WHERE clauses that have
   been determined to be usable with the index.  It must return estimates
   of the cost of accessing the index and the selectivity of the WHERE
   clauses (that is, the fraction of parent-table rows that will be
   retrieved during the index scan).  For simple cases, nearly all the
   work of the cost estimator can be done by calling standard routines
   in the optimizer; the point of having an amcostestimate function is
   to allow index access methods to provide index-type-specific knowledge,
   in case it is possible to improve on the standard estimates.
  </para>

  <para>
   Each amcostestimate function must have the signature:

<programlisting>
void
amcostestimate (PlannerInfo *root,
                IndexOptInfo *index,
                List *indexQuals,
                Cost *indexStartupCost,
                Cost *indexTotalCost,
                Selectivity *indexSelectivity,
                double *indexCorrelation);
</programlisting>

   The first three parameters are inputs:

   <variablelist>
    <varlistentry>
     <term>root</term>
     <listitem>
      <para>
       The planner's information about the query being processed.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>index</term>
     <listitem>
      <para>
       The index being considered.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>indexQuals</term>
     <listitem>
      <para>
       List of index qual clauses (implicitly ANDed);
       a NIL list indicates no qualifiers are available.
       Note that the list contains expression trees, not ScanKeys.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>

  <para>
   The last four parameters are pass-by-reference outputs:

   <variablelist>
    <varlistentry>
     <term>*indexStartupCost</term>
     <listitem>
      <para>
       Set to cost of index start-up processing
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexTotalCost</term>
     <listitem>
      <para>
       Set to total cost of index processing
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexSelectivity</term>
     <listitem>
      <para>
       Set to index selectivity
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexCorrelation</term>
     <listitem>
      <para>
       Set to correlation coefficient between index scan order and
       underlying table's order
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>

  <para>
   Note that cost estimate functions must be written in C, not in SQL or
   any available procedural language, because they must access internal
   data structures of the planner/optimizer.
  </para>

  <para>
   The index access costs should be computed in the units used by
   <filename>src/backend/optimizer/path/costsize.c</filename>: a sequential
   disk block fetch has cost 1.0, a nonsequential fetch has cost
   <varname>random_page_cost</>, and the cost of processing one index row
   should usually be taken as <varname>cpu_index_tuple_cost</>.  In addition,
   an appropriate multiple of <varname>cpu_operator_cost</> should be charged
   for any comparison operators invoked during index processing (especially
   evaluation of the indexQuals themselves).
  </para>

  <para>
   The access costs should include all disk and CPU costs associated with
   scanning the index itself, but <emphasis>not</> the costs of retrieving or
   processing the parent-table rows that are identified by the index.
  </para>

  <para>
   The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
   before we can begin to fetch the first row.  For most indexes this can
   be taken as zero, but an index type with a high start-up cost might want
   to set it nonzero.
  </para>

  <para>
   The indexSelectivity should be set to the estimated fraction of the parent
   table rows that will be retrieved during the index scan.  In the case
   of a lossy index, this will typically be higher than the fraction of
   rows that actually pass the given qual conditions.
  </para>

  <para>
   The indexCorrelation should be set to the correlation (ranging between
   -1.0 and 1.0) between the index order and the table order.  This is used
   to adjust the estimate for the cost of fetching rows from the parent
   table.
  </para>

  <procedure>
   <title>Cost Estimation</title>
   <para>
    A typical cost estimator will proceed as follows:
   </para>

   <step>
    <para>
     Estimate and return the fraction of parent-table rows that will be visited
     based on the given qual conditions.  In the absence of any index-type-specific
     knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:

<programlisting>
*indexSelectivity = clauselist_selectivity(root, indexQuals,
                                           index-&gt;rel-&gt;relid, JOIN_INNER);
</programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index rows that will be visited during the
     scan.  For many index types this is the same as indexSelectivity times
     the number of rows in the index, but it might be more.  (Note that the
     index's size in pages and rows is available from the IndexOptInfo struct.)
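     As a minimal sketch of the simple case only, the row estimate could be
     computed as shown here.  The SketchIndexSize struct and the
     sketch_num_index_tuples function are hypothetical stand-ins introduced
     for illustration; they are not part of the planner, and a real estimator
     would read the sizes from the IndexOptInfo struct instead.

```c
/*
 * Hypothetical stand-in for the size information carried by IndexOptInfo.
 * The field names are assumptions made for this sketch.
 */
typedef struct SketchIndexSize
{
    double      pages;      /* index size in disk pages */
    double      tuples;     /* number of rows in the index */
} SketchIndexSize;

/*
 * Simple case described in the text: the number of index rows visited
 * is indexSelectivity times the number of rows in the index.  An index
 * type that visits more rows than this would have to add its own
 * correction on top.
 */
static double
sketch_num_index_tuples(double indexSelectivity, SketchIndexSize size)
{
    return indexSelectivity * size.tuples;
}
```

     With a selectivity of 0.02 and an index holding 100000 rows, this sketch
     estimates 2000 visited index rows.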
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index pages that will be retrieved during the scan.
     This might be just indexSelectivity times the index's size in pages.
    </para>
   </step>

   <step>
    <para>
     Compute the index access cost.  A generic estimator might do this:

<programlisting>
    /*
     * Our generic assumption is that the index pages will be read
     * sequentially, so they have cost 1.0 each, not random_page_cost.
     * Also, we charge for evaluation of the indexquals at each index row.
     * All the costs are assumed to be paid incrementally during the scan.
     */
    cost_qual_eval(&amp;index_qual_cost, indexQuals);
    *indexStartupCost = index_qual_cost.startup;
    *indexTotalCost = numIndexPages +
        (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
</programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the index correlation.  For a simple ordered index on a single
     field, this can be retrieved from pg_statistic.  If the correlation
     is not known, the conservative estimate is zero (no correlation).
    </para>
   </step>
  </procedure>

  <para>
   Examples of cost estimator functions can be found in
   <filename>src/backend/utils/adt/selfuncs.c</filename>.
  </para>
 </sect1>

</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->