
perform.sgml

Source file from PostgreSQL 8.1.4, an open-source database system for Linux.
   Although this query's restrictions are superficially similar to the
   previous example, the semantics are different because a row must be
   emitted for each row of A that has no matching row in the join of B and C.
   Therefore the planner has no choice of join order here: it must join
   B to C and then join A to that result.  Accordingly, this query takes
   less time to plan than the previous query.
  </para>

  <para>
   Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
   JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
   listing the input relations in <literal>FROM</>, so it does not need to
   constrain the join order.  But it is possible to instruct the
   <productname>PostgreSQL</productname> query planner to treat
   explicit inner <literal>JOIN</>s as constraining the join order anyway.
   For example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   But if we tell the planner to honor the <literal>JOIN</> order,
   the second and third take less time to plan than the first.  This effect
   is not worth worrying about for only three tables, but it can be a
   lifesaver with many tables.
  </para>

  <para>
   To force the planner to follow the <literal>JOIN</> order for inner joins,
   set the <xref linkend="guc-join-collapse-limit"> run-time parameter to 1.
   (Other possible values are discussed below.)
  </para>

  <para>
   You do not need to constrain the join order completely in order to
   cut search time, because it's OK to use <literal>JOIN</> operators
   within items of a plain <literal>FROM</> list.  For example, consider
<programlisting>
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
</programlisting>
   With <varname>join_collapse_limit</> = 1, this
   forces the planner to join A to B before joining them to other tables,
   but doesn't constrain its choices otherwise.  In this example, the
   number of possible join orders is reduced by a factor of 5.
  </para>

  <para>
   Constraining the planner's search in this way is a useful technique
   both for reducing planning time and for directing the planner to a
   good query plan.  If the planner chooses a bad join order by default,
   you can force it to choose a better order via <literal>JOIN</> syntax
   &mdash; assuming that you know of a better order, that is.  Experimentation
   is recommended.
  </para>
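  <para>
   As a minimal illustration of the technique, one might pin the join
   order for a single session like this, using the same tables as above
   (the <command>EXPLAIN</> step simply displays the resulting plan):
<programlisting>
-- constrain the planner to follow explicit JOIN syntax
SET join_collapse_limit = 1;

-- the planner must now join b to c first, then join a to that result
EXPLAIN SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

-- restore the default behavior afterwards
RESET join_collapse_limit;
</programlisting>
  </para>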
  <para>
   A closely related issue that affects planning time is collapsing of
   subqueries into their parent query.  For example, consider
<programlisting>
SELECT *
FROM x, y,
    (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;
</programlisting>
   This situation might arise from use of a view that contains a join;
   the view's <literal>SELECT</> rule will be inserted in place of the view reference,
   yielding a query much like the above.  Normally, the planner will try
   to collapse the subquery into the parent, yielding
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
</programlisting>
   This usually results in a better plan than planning the subquery
   separately.  (For example, the outer <literal>WHERE</> conditions might be such that
   joining X to A first eliminates many rows of A, thus avoiding the need to
   form the full logical output of the subquery.)  But at the same time,
   we have increased the planning time; here, we have a five-way join
   problem replacing two separate three-way join problems.  Because of the
   exponential growth of the number of possibilities, this makes a big
   difference.  The planner tries to avoid getting stuck in huge join search
   problems by not collapsing a subquery if doing so would result in more
   than <varname>from_collapse_limit</> <literal>FROM</> items in the parent
   query.  You can trade off planning time against quality of plan by
   adjusting this run-time parameter up or down.
  </para>

  <para>
   <xref linkend="guc-from-collapse-limit"> and <xref
   linkend="guc-join-collapse-limit">
   are similarly named because they do almost the same thing: one controls
   when the planner will <quote>flatten out</> subselects, and the
   other controls when it will flatten out explicit inner joins.  Typically
   you would either set <varname>join_collapse_limit</> equal to
   <varname>from_collapse_limit</> (so that explicit joins and subselects
   act similarly) or set <varname>join_collapse_limit</> to 1 (if you want
   to control join order with explicit joins).  But you might set them
   differently if you are trying to fine-tune the trade-off between planning
   time and run time.
  </para>
 </sect1>

 <sect1 id="populate">
  <title>Populating a Database</title>

  <para>
   One may need to insert a large amount of data when first populating
   a database. This section contains some suggestions on how to make
   this process as efficient as possible.
  </para>

  <sect2 id="disable-autocommit">
   <title>Disable Autocommit</title>

   <indexterm>
    <primary>autocommit</primary>
    <secondary>bulk-loading data</secondary>
   </indexterm>

   <para>
    Turn off autocommit and just do one commit at the end.  (In plain
    SQL, this means issuing <command>BEGIN</command> at the start and
    <command>COMMIT</command> at the end.  Some client libraries may
    do this behind your back, in which case you need to make sure the
    library does it when you want it done.)  If you allow each
    insertion to be committed separately,
    <productname>PostgreSQL</productname> is doing a lot of work for
    each row that is added.  An additional benefit of doing all
    insertions in one transaction is that if the insertion of one row
    were to fail then the insertion of all rows inserted up to that
    point would be rolled back, so you won't be stuck with partially
    loaded data.
   </para>
  </sect2>

  <sect2 id="populate-copy-from">
   <title>Use <command>COPY</command></title>

   <para>
    Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
    all the rows in one command, instead of using a series of
    <command>INSERT</command> commands.  The <command>COPY</command>
    command is optimized for loading large numbers of rows; it is less
    flexible than <command>INSERT</command>, but incurs significantly
    less overhead for large data loads. Since <command>COPY</command>
    is a single command, there is no need to disable autocommit if you
    use this method to populate a table.
   </para>

   <para>
    If you cannot use <command>COPY</command>, it may help to use <xref
    linkend="sql-prepare" endterm="sql-prepare-title"> to create a
    prepared <command>INSERT</command> statement, and then use
    <command>EXECUTE</command> as many times as required.  This avoids
    some of the overhead of repeatedly parsing and planning
    <command>INSERT</command>.
   </para>
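   <para>
    A minimal sketch of this approach, assuming a hypothetical table
    <literal>items(id integer, name text)</> (the table, values, and
    file path are illustrative only), might look like this:
<programlisting>
BEGIN;

-- parse and plan the INSERT only once
PREPARE items_insert (integer, text) AS
    INSERT INTO items VALUES ($1, $2);

-- run the prepared statement for each row
EXECUTE items_insert(1, 'first row');
EXECUTE items_insert(2, 'second row');

-- commit all insertions in one transaction
COMMIT;
</programlisting>
    With <command>COPY</> the same load would be a single command, for
    example <literal>COPY items FROM '/path/to/items.dat';</>, reading
    the data from a server-side file.
   </para>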
   <para>
    Note that loading a large number of rows using
    <command>COPY</command> is almost always faster than using
    <command>INSERT</command>, even if <command>PREPARE</> is used and
    multiple insertions are batched into a single transaction.
   </para>
  </sect2>

  <sect2 id="populate-rm-indexes">
   <title>Remove Indexes</title>

   <para>
    If you are loading a freshly created table, the fastest way is to
    create the table, bulk load the table's data using
    <command>COPY</command>, then create any indexes needed for the
    table.  Creating an index on pre-existing data is quicker than
    updating it incrementally as each row is loaded.
   </para>

   <para>
    If you are adding large amounts of data to an existing table,
    it may be a win to drop the index,
    load the table, and then recreate the index.  Of course, the
    database performance for other users may be adversely affected
    during the time that the index is missing.  One should also think
    twice before dropping unique indexes, since the error checking
    afforded by the unique constraint will be lost while the index is
    missing.
   </para>
  </sect2>

  <sect2 id="populate-rm-fkeys">
   <title>Remove Foreign Key Constraints</title>

   <para>
    Just as with indexes, a foreign key constraint can be checked
    <quote>in bulk</> more efficiently than row-by-row.  So it may be
    useful to drop foreign key constraints, load data, and re-create
    the constraints.  Again, there is a trade-off between data load
    speed and loss of error checking while the constraint is missing.
   </para>
  </sect2>

  <sect2 id="populate-work-mem">
   <title>Increase <varname>maintenance_work_mem</varname></title>

   <para>
    Temporarily increasing the <xref linkend="guc-maintenance-work-mem">
    configuration variable when loading large amounts of data can
    lead to improved performance.  This will help to speed up <command>CREATE
    INDEX</> commands and <command>ALTER TABLE ADD FOREIGN KEY</> commands.
    It won't do much for <command>COPY</> itself, so this advice is
    only useful when you are using one or both of the above techniques.
   </para>
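   <para>
    For instance, a bulk load that combines the index-removal technique
    with a larger per-session <varname>maintenance_work_mem</varname>
    might look roughly like this (the table, index, file path, and
    memory value are illustrative; the value is given in kilobytes here):
<programlisting>
-- drop the index before loading
DROP INDEX items_name_idx;

COPY items FROM '/path/to/items.dat';

-- give index creation more memory for this session only
SET maintenance_work_mem = 524288;

-- recreate the index once the data is in place
CREATE INDEX items_name_idx ON items (name);
</programlisting>
   </para>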
  </sect2>

  <sect2 id="populate-checkpoint-segments">
   <title>Increase <varname>checkpoint_segments</varname></title>

   <para>
    Temporarily increasing the <xref
    linkend="guc-checkpoint-segments"> configuration variable can also
    make large data loads faster.  This is because loading a large
    amount of data into <productname>PostgreSQL</productname> will
    cause checkpoints to occur more often than the normal checkpoint
    frequency (specified by the <varname>checkpoint_timeout</varname>
    configuration variable). Whenever a checkpoint occurs, all dirty
    pages must be flushed to disk. By increasing
    <varname>checkpoint_segments</varname> temporarily during bulk
    data loads, the number of checkpoints that are required can be
    reduced.
   </para>
  </sect2>

  <sect2 id="populate-analyze">
   <title>Run <command>ANALYZE</command> Afterwards</title>

   <para>
    Whenever you have significantly altered the distribution of data
    within a table, running <xref linkend="sql-analyze"
    endterm="sql-analyze-title"> is strongly recommended. This
    includes bulk loading large amounts of data into the table.  Running
    <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
    ensures that the planner has up-to-date statistics about the
    table.  With no statistics or obsolete statistics, the planner may
    make poor decisions during query planning, leading to poor
    performance on any tables with inaccurate or nonexistent
    statistics.
   </para>
  </sect2>

  <sect2 id="populate-pg-dump">
   <title>Some Notes About <application>pg_dump</></title>

   <para>
    Dump scripts generated by <application>pg_dump</> automatically apply
    several, but not all, of the above guidelines.  To reload a
    <application>pg_dump</> dump as quickly as possible, you need to
    do a few extra things manually.  (Note that these points apply while
    <emphasis>restoring</> a dump, not while <emphasis>creating</> it.
    The same points apply when using <application>pg_restore</> to load
    from a <application>pg_dump</> archive file.)
   </para>

   <para>
    By default, <application>pg_dump</> uses <command>COPY</>, and when
    it is generating a complete schema-and-data dump, it is careful to
    load data before creating indexes and foreign keys.  So in this case
    the first several guidelines are handled automatically.  What is left
    for you to do is to set appropriate (i.e., larger than normal) values
    for <varname>maintenance_work_mem</varname> and
    <varname>checkpoint_segments</varname> before loading the dump script,
    and then to run <command>ANALYZE</> afterwards.
   </para>

   <para>
    A data-only dump will still use <command>COPY</>, but it does not
    drop or recreate indexes, and it does not normally touch foreign
    keys.
     <footnote>
      <para>
       You can get the effect of disabling foreign keys by using
       the <option>-X disable-triggers</> option &mdash; but realize that
       that eliminates, rather than just postponing, foreign key
       validation, and so it is possible to insert bad data if you use it.
      </para>
     </footnote>
    So when loading a data-only dump, it is up to you to drop and recreate
    indexes and foreign keys if you wish to use those techniques.
    It's still useful to increase <varname>checkpoint_segments</varname>
    while loading the data, but don't bother increasing
    <varname>maintenance_work_mem</varname>; rather, you'd do that while
    manually recreating indexes and foreign keys afterwards.
    And don't forget to <command>ANALYZE</> when you're done.
   </para>
  </sect2>
 </sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
