would be inefficient, since the full Cartesian product of A and C would
have to be formed, there being no applicable condition in the
<literal>WHERE</> clause to allow optimization of the join.  (All joins
in the <productname>PostgreSQL</productname> executor happen between two
input tables, so it's necessary to build up the result in one or another
of these fashions.)  The important point is that these different join
possibilities give semantically equivalent results but may have hugely
different execution costs.  Therefore, the planner will explore all of
them to try to find the most efficient query plan.
</para>

<para>
When a query only involves two or three tables, there aren't many join
orders to worry about.  But the number of possible join orders grows
exponentially as the number of tables expands.  Beyond ten or so input
tables it's no longer practical to do an exhaustive search of all the
possibilities, and even for six or seven tables planning may take an
annoyingly long time.  When there are too many input tables, the
<productname>PostgreSQL</productname> planner will switch from exhaustive
search to a <firstterm>genetic</firstterm> probabilistic search through a
limited number of possibilities.  (The switch-over threshold is set by
the <varname>geqo_threshold</varname> run-time parameter.)  The genetic
search takes less time, but it won't necessarily find the best possible
plan.
</para>
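<para>
For example, one might raise the switch-over threshold for the current
session, accepting longer planning time in exchange for exhaustive
search on somewhat larger queries (the value 15 is purely illustrative,
not a recommendation):
<programlisting>
SET geqo_threshold = 15;
</programlisting>
</para>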
<para>
When the query involves outer joins, the planner has much less freedom
than it does for plain (inner) joins.  For example, consider
<programlisting>
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
Although this query's restrictions are superficially similar to the
previous example, the semantics are different because a row must be
emitted for each row of A that has no matching row in the join of B and
C.  Therefore the planner has no choice of join order here: it must join
B to C and then join A to that result.  Accordingly, this query takes
less time to plan than the previous query.
</para>

<para>
Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
listing the input relations in <literal>FROM</>, so it does not need to
constrain the join order.  But it is possible to instruct the
<productname>PostgreSQL</productname> query planner to treat explicit
inner <literal>JOIN</>s as constraining the join order anyway.  For
example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
But if we tell the planner to honor the <literal>JOIN</> order, the
second and third take less time to plan than the first.  This effect is
not worth worrying about for only three tables, but it can be a
lifesaver with many tables.
</para>

<para>
To force the planner to follow the <literal>JOIN</> order for inner
joins, set the <varname>join_collapse_limit</> run-time parameter to 1.
(Other possible values are discussed below.)
</para>
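<para>
A minimal sketch of this technique, reusing the tables above: after
lowering the limit, the explicitly parenthesized join order is the one
the planner will use:
<programlisting>
SET join_collapse_limit = 1;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
</para>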
<para>
You do not need to constrain the join order completely in order to cut
search time, because it's OK to use <literal>JOIN</> operators within
items of a plain <literal>FROM</> list.  For example, consider
<programlisting>
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
</programlisting>
With <varname>join_collapse_limit</> = 1, this forces the planner to
join A to B before joining them to other tables, but doesn't constrain
its choices otherwise.  In this example, the number of possible join
orders is reduced by a factor of 5.
</para>

<para>
Constraining the planner's search in this way is a useful technique both
for reducing planning time and for directing the planner to a good query
plan.  If the planner chooses a bad join order by default, you can force
it to choose a better order via <literal>JOIN</> syntax --- assuming
that you know of a better order, that is.  Experimentation is
recommended.
</para>

<para>
A closely related issue that affects planning time is collapsing of
subqueries into their parent query.  For example, consider
<programlisting>
SELECT *
FROM x, y,
     (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;
</programlisting>
This situation might arise from use of a view that contains a join; the
view's <literal>SELECT</> rule will be inserted in place of the view
reference, yielding a query much like the above.  Normally, the planner
will try to collapse the subquery into the parent, yielding
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
</programlisting>
This usually results in a better plan than planning the subquery
separately.  (For example, the outer <literal>WHERE</> conditions might
be such that joining X to A first eliminates many rows of A, thus
avoiding the need to form the full logical output of the subquery.)  But
at the same time, we have increased the planning time; here, we have a
five-way join problem replacing two separate three-way join problems.
Because of the exponential growth of the number of possibilities, this
makes a big difference.  The planner tries to avoid getting stuck in
huge join search problems by not collapsing a subquery if more than
<varname>from_collapse_limit</> <literal>FROM</> items would result in
the parent query.  You can trade off planning time against quality of
plan by adjusting this run-time parameter up or down.
</para>

<para>
<varname>from_collapse_limit</> and <varname>join_collapse_limit</> are
similarly named because they do almost the same thing: one controls when
the planner will <quote>flatten out</> subselects, and the other
controls when it will flatten out explicit inner joins.  Typically you
would either set <varname>join_collapse_limit</> equal to
<varname>from_collapse_limit</> (so that explicit joins and subselects
act similarly) or set <varname>join_collapse_limit</> to 1 (if you want
to control join order with explicit joins).  But you might set them
differently if you are trying to fine-tune the trade-off between
planning time and run time.
</para>
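<para>
For instance, a session willing to spend extra planning time in hopes of
a better plan might raise both limits together (the value 12 is purely
illustrative):
<programlisting>
SET from_collapse_limit = 12;
SET join_collapse_limit = 12;
</programlisting>
</para>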
</sect1>

<sect1 id="populate">
<title>Populating a Database</title>

<para>
One may need to do a large number of table insertions when first
populating a database.  Here are some tips and techniques for making
that as efficient as possible.
</para>

<sect2 id="disable-autocommit">
<title>Disable Autocommit</title>

<indexterm zone="disable-autocommit">
<primary>autocommit</primary>
</indexterm>

<para>
Turn off autocommit and just do one commit at the end.  (In plain SQL,
this means issuing <command>BEGIN</command> at the start and
<command>COMMIT</command> at the end.  Some client libraries may do this
behind your back, in which case you need to make sure the library does
it when you want it done.)  If you allow each insertion to be committed
separately, <productname>PostgreSQL</productname> is doing a lot of work
for each row added.  An additional benefit of doing all insertions in
one transaction is that if the insertion of one row were to fail then
the insertion of all rows inserted up to that point would be rolled
back, so you won't be stuck with partially loaded data.
</para>
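<para>
A minimal sketch in plain SQL, assuming a hypothetical table
<literal>foo</literal>:
<programlisting>
BEGIN;
INSERT INTO foo VALUES (1, 'one');
INSERT INTO foo VALUES (2, 'two');
-- ... many more insertions ...
COMMIT;
</programlisting>
</para>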
</sect2>

<sect2 id="populate-copy-from">
<title>Use <command>COPY FROM</command></title>

<para>
Use <command>COPY FROM STDIN</command> to load all the rows in one
command, instead of using a series of <command>INSERT</command>
commands.  This reduces parsing, planning, etc. overhead a great deal.
If you do this then it is not necessary to turn off autocommit, since it
is only one command anyway.
</para>
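<para>
For example, again with the hypothetical table <literal>foo</literal>
(the <literal>\.</literal> line terminates the data stream, and columns
are tab-separated):
<programlisting>
COPY foo FROM STDIN;
1	one
2	two
\.
</programlisting>
</para>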
</sect2>

<sect2 id="populate-rm-indexes">
<title>Remove Indexes</title>

<para>
If you are loading a freshly created table, the fastest way is to create
the table, bulk load the table's data using <command>COPY</command>,
then create any indexes needed for the table.  Creating an index on
pre-existing data is quicker than updating it incrementally as each row
is loaded.
</para>

<para>
If you are augmenting an existing table, you can drop the index, load
the table, then recreate the index.  Of course, the database performance
for other users may be adversely affected during the time that the index
is missing.  One should also think twice before dropping unique indexes,
since the error checking afforded by the unique constraint will be lost
while the index is missing.
</para>
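<para>
A sketch of the drop-and-recreate approach (the table and index names
are hypothetical, and the <command>COPY</command> data itself is
elided):
<programlisting>
DROP INDEX foo_id_idx;
COPY foo FROM STDIN;
...
\.
CREATE INDEX foo_id_idx ON foo (id);
</programlisting>
</para>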
</sect2>

<sect2 id="populate-sort-mem">
<title>Increase <varname>sort_mem</varname></title>

<para>
Temporarily increasing the <varname>sort_mem</varname> configuration
variable when restoring large amounts of data can lead to improved
performance.  This is because when a B-tree index is created from
scratch, the existing content of the table needs to be sorted.  Allowing
the merge sort to use more buffer pages means that fewer merge passes
will be required.
</para>
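<para>
For example, to allow sorts to use up to about 64 megabytes for the
current session (<varname>sort_mem</varname> is measured in kilobytes;
the figure is illustrative only):
<programlisting>
SET sort_mem = 65536;
</programlisting>
</para>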
</sect2>

<sect2 id="populate-analyze">
<title>Run <command>ANALYZE</command> Afterwards</title>

<para>
It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
ANALYZE</command> anytime you've added or updated a lot of data,
including just after initially populating a table.  This ensures that
the planner has up-to-date statistics about the table.  With no
statistics or obsolete statistics, the planner may make poor choices of
query plans, leading to bad performance on queries that use your table.
</para>
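<para>
For example, after bulk-loading the hypothetical table
<literal>foo</literal>:
<programlisting>
ANALYZE foo;
</programlisting>
</para>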
</sect2>
</sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->