<para> In this nested-loop join, the outer scan is the same bitmap index scan we saw earlier, and so its cost and row count are the same, because we are applying the <literal>WHERE</> clause <literal>unique1 < 100</literal> at that node. The <literal>t1.unique2 = t2.unique2</literal> clause is not relevant yet, so it doesn't affect the row count of the outer scan. For the inner scan, the <literal>unique2</> value of the current outer-scan row is plugged into the inner index scan to produce an index condition like <literal>t2.unique2 = <replaceable>constant</replaceable></literal>. So we get the same inner-scan plan and costs that we'd get from, say, <literal>EXPLAIN SELECT * FROM tenk2 WHERE unique2 = 42</literal>. The costs of the loop node are then set on the basis of the cost of the outer scan, plus one repetition of the inner scan for each outer row (106 * 3.01, here), plus a little CPU time for join processing. </para>
<para> In this example the join's output row count is the same as the product of the two scans' row counts, but that's not true in general, because <literal>WHERE</> clauses that mention both tables can only be applied at the join point, not to either input scan. For example, if we added <literal>WHERE ... AND t1.hundred < t2.hundred</literal>, that would decrease the output row count of the join node, but not change either input scan. </para>
<para> One way to look at variant plans is to force the planner to disregard whatever strategy it thought was the winner, using the enable/disable flags described in <xref linkend="runtime-config-query-enable">. (This is a crude tool, but useful.
See also <xref linkend="explicit-joins">.)
<programlisting>
SET enable_nestloop = off;
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=232.61..741.67 rows=106 width=488)
   Hash Cond: ("outer".unique2 = "inner".unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..458.00 rows=10000 width=244)
   ->  Hash  (cost=232.35..232.35 rows=106 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0)
                     Index Cond: (unique1 < 100)
</programlisting>
This plan proposes to extract the 100 interesting rows of <classname>tenk1</classname> using that same old index scan, stash them into an in-memory hash table, and then do a sequential scan of <classname>tenk2</classname>, probing into the hash table for possible matches of <literal>t1.unique2 = t2.unique2</literal> at each <classname>tenk2</classname> row. The cost to read <classname>tenk1</classname> and set up the hash table is entirely start-up cost for the hash join, since we won't get any rows out until we can start reading <classname>tenk2</classname>. The total time estimate for the join also includes a hefty charge for the CPU time to probe the hash table 10000 times. Note, however, that we are <emphasis>not</emphasis> charging 10000 times 232.35; the hash table setup is only done once in this plan type. </para>
<para> It is possible to check on the accuracy of the planner's estimated costs by using <command>EXPLAIN ANALYZE</>. This command actually executes the query, and then displays the true run time accumulated within each plan node along with the same estimated costs that a plain <command>EXPLAIN</command> shows.
For example, we might get a result like this:
<screen>
EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.37..553.11 rows=106 width=488) (actual time=1.392..12.700 rows=100 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244) (actual time=0.878..2.367 rows=100 loops=1)
         Recheck Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0) (actual time=0.546..0.546 rows=100 loops=1)
               Index Cond: (unique1 < 100)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=244) (actual time=0.067..0.078 rows=1 loops=100)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 14.452 ms
</screen>
Note that the <quote>actual time</quote> values are in milliseconds of real time, whereas the <quote>cost</quote> estimates are expressed in arbitrary units of disk fetches; so they are unlikely to match up. The thing to pay attention to is the ratios. </para>
<para> In some query plans, it is possible for a subplan node to be executed more than once. For example, the inner index scan is executed once per outer row in the above nested-loop plan. In such cases, the <quote>loops</quote> value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown. Multiply by the <quote>loops</quote> value to get the total time actually spent in the node. </para>
<para> The <literal>Total runtime</literal> shown by <command>EXPLAIN ANALYZE</command> includes executor start-up and shut-down time, as well as time spent processing the result rows. It does not include parsing, rewriting, or planning time.
For a <command>SELECT</> query, the total run time will normally be just a little larger than the total time reported for the top-level plan node. For <command>INSERT</>, <command>UPDATE</>, and <command>DELETE</> commands, the total run time may be considerably larger, because it includes the time spent processing the result rows. In these commands, the time for the top plan node essentially is the time spent computing the new rows and/or locating the old ones, but it doesn't include the time spent making the changes. Time spent firing triggers, if any, is also outside the top plan node, and is shown separately for each trigger. </para> <para> It is worth noting that <command>EXPLAIN</> results should not be extrapolated to situations other than the one you are actually testing; for example, results on a toy-sized table can't be assumed to apply to large tables. The planner's cost estimates are not linear and so it may well choose a different plan for a larger or smaller table. An extreme example is that on a table that only occupies one disk page, you'll nearly always get a sequential scan plan whether indexes are available or not. The planner realizes that it's going to take one disk page read to process the table in any case, so there's no value in expending additional page reads to look at an index. </para> </sect1> <sect1 id="planner-stats"> <title>Statistics Used by the Planner</title> <indexterm zone="planner-stats"> <primary>statistics</primary> <secondary>of the planner</secondary> </indexterm> <para> As we saw in the previous section, the query planner needs to estimate the number of rows retrieved by a query in order to make good choices of query plans. This section provides a quick look at the statistics that the system uses for these estimates. </para> <para> One component of the statistics is the total number of entries in each table and index, as well as the number of disk blocks occupied by each table and index. 
This information is kept in the table <link linkend="catalog-pg-class"><structname>pg_class</structname></link>, in the columns <structfield>reltuples</structfield> and <structfield>relpages</structfield>. We can look at it with queries similar to this one:
<screen>
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'tenk1%';

       relname        | relkind | reltuples | relpages
----------------------+---------+-----------+----------
 tenk1                | r       |     10000 |      358
 tenk1_hundred        | i       |     10000 |       30
 tenk1_thous_tenthous | i       |     10000 |       30
 tenk1_unique1        | i       |     10000 |       30
 tenk1_unique2        | i       |     10000 |       30
(5 rows)
</screen>
Here we can see that <structname>tenk1</structname> contains 10000 rows, as do its indexes, but the indexes are (unsurprisingly) much smaller than the table. </para>
<para> For efficiency reasons, <structfield>reltuples</structfield> and <structfield>relpages</structfield> are not updated on-the-fly, and so they usually contain somewhat out-of-date values. They are updated by <command>VACUUM</>, <command>ANALYZE</>, and a few DDL commands such as <command>CREATE INDEX</>. A stand-alone <command>ANALYZE</>, that is, one not part of <command>VACUUM</>, generates an approximate <structfield>reltuples</structfield> value since it does not read every row of the table. The planner will scale the values it finds in <structname>pg_class</structname> to match the current physical table size, thus obtaining a closer approximation. </para>
<indexterm> <primary>pg_statistic</primary> </indexterm>
<para> Most queries retrieve only a fraction of the rows in a table, due to having <literal>WHERE</> clauses that restrict the rows to be examined. The planner thus needs to make an estimate of the <firstterm>selectivity</> of <literal>WHERE</> clauses, that is, the fraction of rows that match each condition in the <literal>WHERE</> clause.
The information used for this task is stored in the <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link> system catalog. Entries in <structname>pg_statistic</structname> are updated by the <command>ANALYZE</> and <command>VACUUM ANALYZE</> commands, and are always approximate even when freshly updated. </para> <indexterm> <primary>pg_stats</primary> </indexterm> <para> Rather than look at <structname>pg_statistic</structname> directly, it's better to look at its view <structname>pg_stats</structname> when examining the statistics manually. <structname>pg_stats</structname> is designed to be more easily readable. Furthermore, <structname>pg_stats</structname> is readable by all, whereas <structname>pg_statistic</structname> is only readable by a superuser. (This prevents unprivileged users from learning something about the contents of other people's tables from the statistics. The <structname>pg_stats</structname> view is restricted to show only rows about tables that the current user can read.) For example, we might do:<screen>SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road'; attname | n_distinct | most_common_vals ---------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- name | -0.467008 | {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St ","5th St ","Mission Blvd","I- 880 "} thepath | 20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}(2 rows)</screen> </para> <para> <structname>pg_stats</structname> is described in detail in <xref linkend="view-pg-stats">. 
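As a minimal sketch, the same view can be queried for the <structname>tenk1</structname> table used in the earlier examples. The choice of columns here is only illustrative, and no output is shown because the values depend on the table's contents and on when <command>ANALYZE</> was last run:
<programlisting>
SELECT attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'tenk1';
</programlisting>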
</para>
<para> The amount of information stored in <structname>pg_statistic</structname>, in particular the maximum number of entries in the <structfield>most_common_vals</> and <structfield>histogram_bounds</> arrays for each column, can be set on a column-by-column basis using the <command>ALTER TABLE SET STATISTICS</> command, or globally by setting the <xref linkend="guc-default-statistics-target"> configuration variable. The default limit is presently 10 entries. Raising the limit may allow more accurate planner estimates to be made, particularly for columns with irregular data distributions, at the price of consuming more space in <structname>pg_statistic</structname> and slightly more time to compute the estimates. Conversely, a lower limit may be appropriate for columns with simple data distributions. </para>
</sect1>
<sect1 id="explicit-joins"> <title>Controlling the Planner with Explicit <literal>JOIN</> Clauses</title>
<indexterm zone="explicit-joins"> <primary>join</primary> <secondary>controlling the order</secondary> </indexterm>
<para> It is possible to control the query planner to some extent by using the explicit <literal>JOIN</> syntax. To see why this matters, we first need some background. </para>
<para> In a simple join query, such as
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</programlisting>
the planner is free to join the given tables in any order. For example, it could generate a query plan that joins A to B, using the <literal>WHERE</> condition <literal>a.id = b.id</>, and then joins C to this joined table, using the other <literal>WHERE</> condition. Or it could join B to C and then join A to that result. Or it could join A to C and then join them with B, but that would be inefficient, since the full Cartesian product of A and C would have to be formed, there being no applicable condition in the <literal>WHERE</> clause to allow optimization of the join.
(All joins in the <productname>PostgreSQL</productname> executor happen between two input tables, so it's necessary to build up the result in one or another of these fashions.) The important point is that these different join possibilities give semantically equivalent results but may have hugely different execution costs. Therefore, the planner will explore all of them to try to find the most efficient query plan. </para> <para> When a query only involves two or three tables, there aren't many join orders to worry about. But the number of possible join orders grows exponentially as the number of tables expands. Beyond ten or so input tables it's no longer practical to do an exhaustive search of all the possibilities, and even for six or seven tables planning may take an annoyingly long time. When there are too many input tables, the <productname>PostgreSQL</productname> planner will switch from exhaustive search to a <firstterm>genetic</firstterm> probabilistic search through a limited number of possibilities. (The switch-over threshold is set by the <xref linkend="guc-geqo-threshold"> run-time parameter.) The genetic search takes less time, but it won't necessarily find the best possible plan. </para> <para> When the query involves outer joins, the planner has much less freedom than it does for plain (inner) joins. For example, consider<programlisting>SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);</programlisting>