    also <xref linkend="explicit-joins">.)

<programlisting>
SET enable_nestloop = off;
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 50 AND t1.unique2 = t2.unique2;

                               QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=179.45..563.06 rows=49 width=296)
   Hash Cond: ("outer".unique2 = "inner".unique2)
   -&gt;  Seq Scan on tenk2 t2  (cost=0.00..333.00 rows=10000 width=148)
   -&gt;  Hash  (cost=179.33..179.33 rows=49 width=148)
         -&gt;  Index Scan using tenk1_unique1 on tenk1 t1
                                    (cost=0.00..179.33 rows=49 width=148)
               Index Cond: (unique1 &lt; 50)
</programlisting>

    This plan proposes to extract the 50 interesting rows of <classname>tenk1</classname>
    using the same index scan as before, stash them into an in-memory hash table,
    and then do a sequential scan of <classname>tenk2</classname>, probing into the hash table
    for possible matches of <literal>t1.unique2 = t2.unique2</literal> at each <classname>tenk2</classname> row.
    The cost to read <classname>tenk1</classname> and set up the hash table is entirely start-up
    cost for the hash join, since we won't get any rows out until we can
    start reading <classname>tenk2</classname>.  The total time estimate for the join also
    includes a hefty charge for the CPU time to probe the hash table
    10000 times.  Note, however, that we are <emphasis>not</emphasis> charging 10000 times 179.33;
    the hash table setup is done only once in this plan type.
   </para>

   <para>
    It is possible to check on the accuracy of the planner's estimated costs
    by using <command>EXPLAIN ANALYZE</>.  This command actually executes the query,
    and then displays the true run time accumulated within each plan node
    along with the same estimated costs that a plain <command>EXPLAIN</command> shows.
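    Keep in mind that because the statement really is executed,
    <command>EXPLAIN ANALYZE</command> on an <command>INSERT</>,
    <command>UPDATE</>, or <command>DELETE</> makes the same changes the
    plain command would.  A minimal sketch of one way to time such a
    command without keeping its effects is to run it inside a transaction
    block and roll it back.  The <command>DELETE</> shown is only an
    illustration, reusing the <literal>unique1 &lt; 50</literal> condition
    from the examples in this section:

<programlisting>
BEGIN;
-- The DELETE really runs, so the reported times reflect actual work.
EXPLAIN ANALYZE DELETE FROM tenk1 WHERE unique1 &lt; 50;
-- Discard the deleted rows; the EXPLAIN ANALYZE output has already been returned.
ROLLBACK;
</programlisting>
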
    For example, we might get a result like this:

<screen>
EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 50 AND t1.unique2 = t2.unique2;

                                   QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..327.02 rows=49 width=296)
                                 (actual time=1.181..29.822 rows=50 loops=1)
   -&gt;  Index Scan using tenk1_unique1 on tenk1 t1
                (cost=0.00..179.33 rows=49 width=148)
                                 (actual time=0.630..8.917 rows=50 loops=1)
         Index Cond: (unique1 &lt; 50)
   -&gt;  Index Scan using tenk2_unique2 on tenk2 t2
                (cost=0.00..3.01 rows=1 width=148)
                                 (actual time=0.295..0.324 rows=1 loops=50)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 31.604 ms
</screen>

    Note that the <quote>actual time</quote> values are in milliseconds of
    real time, whereas the <quote>cost</quote> estimates are expressed in
    arbitrary units of disk fetches, so the two are unlikely to match up.
    The thing to pay attention to is the ratios, not the absolute numbers.
   </para>

   <para>
    In some query plans, it is possible for a subplan node to be executed more
    than once.  For example, the inner index scan is executed once per outer
    row in the above nested-loop plan.  In such cases, the
    <quote>loops</quote> value reports the
    total number of executions of the node, and the actual time and rows
    values shown are averages per execution.  This is done to make the numbers
    comparable with the way that the cost estimates are shown.  Multiply by
    the <quote>loops</quote> value to get the total time actually spent in
    the node.  In the plan above, for example, the inner index scan on
    <classname>tenk2</classname> took 0.324 milliseconds per execution on
    average and ran 50 times, for a total of about 16.2 milliseconds.
   </para>

   <para>
    The <literal>Total runtime</literal> shown by <command>EXPLAIN ANALYZE</command> includes
    executor start-up and shut-down time, as well as time spent processing
    the result rows.
    It does not include parsing, rewriting, or planning
    time.  For a <command>SELECT</> query, the total run time will normally be just a
    little larger than the total time reported for the top-level plan node.
    For <command>INSERT</>, <command>UPDATE</>, and <command>DELETE</> commands, the total run time may be
    considerably larger, because it includes the time spent processing the
    result rows.  In these commands, the time for the top plan node
    is essentially the time spent computing the new rows and/or locating
    the old ones, but it doesn't include the time spent making the changes.
   </para>

   <para>
    It is worth noting that <command>EXPLAIN</> results should not be extrapolated
    to situations other than the one you are actually testing; for example,
    results on a toy-sized table can't be assumed to apply to large tables.
    The planner's cost estimates are not linear and so it may well choose
    a different plan for a larger or smaller table.  An extreme example
    is that on a table that only occupies one disk page, you'll nearly
    always get a sequential scan plan whether indexes are available or not.
    The planner realizes that it's going to take one disk page read to
    process the table in any case, so there's no value in expending additional
    page reads to look at an index.
   </para>
  </sect1>

 <sect1 id="planner-stats">
  <title>Statistics Used by the Planner</title>

  <indexterm zone="planner-stats">
   <primary>statistics</primary>
   <secondary>of the planner</secondary>
  </indexterm>

  <para>
   As we saw in the previous section, the query planner needs to estimate
   the number of rows retrieved by a query in order to make good choices
   of query plans.  This section provides a quick look at the statistics
   that the system uses for these estimates.
  </para>

  <para>
   One component of the statistics is the total number of entries in each
   table and index, as well as the number of disk blocks occupied by each
   table and index.
   This information is kept in the table
   <structname>pg_class</structname> in the columns <structfield>reltuples</structfield>
   and <structfield>relpages</structfield>.  We can look at it
   with queries similar to this one:

<screen>
SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%';

    relname    | relkind | reltuples | relpages
---------------+---------+-----------+----------
 tenk1         | r       |     10000 |      233
 tenk1_hundred | i       |     10000 |       30
 tenk1_unique1 | i       |     10000 |       30
 tenk1_unique2 | i       |     10000 |       30
(4 rows)
</screen>

   Here we can see that <structname>tenk1</structname> contains 10000
   rows, as do its indexes, but the indexes are (unsurprisingly) much
   smaller than the table.
  </para>

  <para>
   For efficiency reasons, <structfield>reltuples</structfield>
   and <structfield>relpages</structfield> are not updated on the fly,
   and so they usually contain only approximate values (which is good
   enough for the planner's purposes).  They are initialized with dummy
   values (presently 1000 and 10 respectively) when a table is created.
   They are updated by certain commands, presently <command>VACUUM</>,
   <command>ANALYZE</>, and <command>CREATE INDEX</>.  A stand-alone
   <command>ANALYZE</>, that is, one not part of <command>VACUUM</>,
   generates an approximate <structfield>reltuples</structfield> value
   since it does not read every row of the table.
  </para>

  <indexterm>
   <primary>pg_statistic</primary>
  </indexterm>

  <para>
   Most queries retrieve only a fraction of the rows in a table, because
   their <literal>WHERE</> clauses restrict the rows to be examined.
   The planner thus needs to make an estimate of the
   <firstterm>selectivity</> of <literal>WHERE</> clauses, that is, the fraction of
   rows that match each condition in the <literal>WHERE</> clause.
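   As a rough illustration, the row count that <command>EXPLAIN</command>
   prints for a scan with a single restriction is approximately that
   condition's selectivity multiplied by the table's
   <structfield>reltuples</structfield>.  Working backwards from the plans
   shown earlier, the estimate of 49 rows for
   <literal>unique1 &lt; 50</literal> against 10000 rows in
   <structname>tenk1</structname> implies a selectivity of about 0.0049.
   The queries below sketch how one might make this comparison for a
   single-table condition:

<programlisting>
-- The rows= figure in this plan is the planner's row estimate.
EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 50;
-- The table size that the selectivity is (approximately) scaled by.
SELECT reltuples FROM pg_class WHERE relname = 'tenk1';
-- Implied selectivity: estimated rows divided by reltuples.
SELECT 49.0 / 10000 AS selectivity;
</programlisting>
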
   The information
   used for this task is stored in the <structname>pg_statistic</structname>
   system catalog.  Entries in <structname>pg_statistic</structname> are
   updated by <command>ANALYZE</> and <command>VACUUM ANALYZE</> commands
   and are always approximate even when freshly updated.
  </para>

  <indexterm>
   <primary>pg_stats</primary>
  </indexterm>

  <para>
   Rather than look at <structname>pg_statistic</structname> directly,
   it's better to look at its view <structname>pg_stats</structname>
   when examining the statistics manually.  <structname>pg_stats</structname>
   is designed to be more easily readable.  Furthermore,
   <structname>pg_stats</structname> is readable by all, whereas
   <structname>pg_statistic</structname> is only readable by a superuser.
   (This prevents unprivileged users from learning something about
   the contents of other people's tables from the statistics.  The
   <structname>pg_stats</structname> view is restricted to show only
   rows about tables that the current user can read.)
For example, we might do:<screen>SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road'; attname | n_distinct |                                                                                                                                                                                  most_common_vals                                                                                                                                                                                   ---------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- name    |  -0.467008 | {"I- 580                        Ramp","I- 880                        Ramp","Sp Railroad                       ","I- 580                            ","I- 680                        Ramp","I- 80                         Ramp","14th                          St  ","5th                           St  ","Mission                       Blvd","I- 880                            "} thepath |         20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}(2 rows)</screen>  </para>  <para>   <structname>pg_stats</structname> is described in detail in   <xref linkend="view-pg-stats">.  </para>  <para>   The amount of information stored in <structname>pg_statistic</structname>,   in particular the maximum number of entries in the   <structfield>most_common_vals</> and <structfield>histogram_bounds</>   arrays for each column, can be set on a   column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>   command, or globally by setting the   <varname>default_statistics_target</varname> runtime parameter.   The default limit is presently 10 entries.  
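   For instance, raising the target for the <structfield>name</> column
   of the <structname>road</> table shown above, and changing the default
   for the current session, might look like the following sketch.  The
   values 100 and 50 are arbitrary illustrations, not recommendations.
   Because <structname>pg_statistic</structname> is refreshed only when
   statistics are gathered (by <command>ANALYZE</> or
   <command>VACUUM ANALYZE</>), the new targets take effect at the next
   such run:

<programlisting>
-- Per-column target for road.name (100 is an arbitrary example value).
ALTER TABLE road ALTER COLUMN name SET STATISTICS 100;

-- Default target for columns without their own setting; this affects
-- only ANALYZE runs made later in the same session.
SET default_statistics_target = 50;

-- Recompute the statistics so that both settings take effect.
ANALYZE road;
</programlisting>
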
   Raising the limit
   may allow more accurate planner estimates to be made, particularly for
   columns with irregular data distributions, at the price of consuming
   more space in <structname>pg_statistic</structname> and slightly more
   time to compute the estimates.  Conversely, a lower limit may be
   appropriate for columns with simple data distributions.
  </para>
 </sect1>

 <sect1 id="explicit-joins">
  <title>Controlling the Planner with Explicit <literal>JOIN</> Clauses</title>

  <indexterm zone="explicit-joins">
   <primary>join</primary>
   <secondary>controlling the order</secondary>
  </indexterm>

  <para>
   It is possible
   to control the query planner to some extent by using the explicit <literal>JOIN</>
   syntax.  To see why this matters, we first need some background.
  </para>

  <para>
   In a simple join query, such as
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</programlisting>
   the planner is free to join the given tables in any order.  For
   example, it could generate a query plan that joins A to B, using
   the <literal>WHERE</> condition <literal>a.id = b.id</>, and then
   joins C to this joined table, using the other <literal>WHERE</>
   condition.  Or it could join B to C and then join A to that result.
   Or it could join A to C and then join them with B, but that