📄 indices.sgml

📁 PostgreSQL 8.1.4的源码适用于Linux下的开源数据库系统
💻 SGML
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.54.2.1 2006/01/18 22:26:01 momjian Exp $ --><chapter id="indexes"> <title id="indexes-title">Indexes</title> <indexterm zone="indexes">  <primary>index</primary> </indexterm> <para>  Indexes are a common way to enhance database performance.  An index  allows the database server to find and retrieve specific rows much  faster than it could do without an index.  But indexes also add  overhead to the database system as a whole, so they should be used  sensibly. </para> <sect1 id="indexes-intro">  <title>Introduction</title>  <para>   Suppose we have a table similar to this:<programlisting>CREATE TABLE test1 (    id integer,    content varchar);</programlisting>   and the application requires a lot of queries of the form<programlisting>SELECT content FROM test1 WHERE id = <replaceable>constant</replaceable>;</programlisting>   With no advance preparation, the system would have to scan the entire   <structname>test1</structname> table, row by row, to find all   matching entries.  If there are a lot of rows in   <structname>test1</structname> and only a few rows (perhaps only zero   or one) that would be returned by such a query, then this is clearly an   inefficient method.  But if the system has been instructed to maintain an   index on the <structfield>id</structfield> column, then it can use a more   efficient method for locating matching rows.  For instance, it   might only have to walk a few levels deep into a search tree.  </para>  <para>   A similar approach is used in most books of non-fiction:  terms and   concepts that are frequently looked up by readers are collected in   an alphabetic index at the end of the book.  The interested reader   can scan the index relatively quickly and flip to the appropriate   page(s), rather than having to read the entire book to find the   material of interest.  Just as it is the task of the author to   anticipate the items that the readers are likely to look up,   it is the task of the database programmer to foresee which indexes   will be of advantage.  </para>  <para>   The following command would be used to create the index on the   <structfield>id</structfield> column, as discussed:<programlisting>CREATE INDEX test1_id_index ON test1 (id);</programlisting>   The name <structname>test1_id_index</structname> can be chosen   freely, but you should pick something that enables you to remember   later what the index was for.  </para>  <para>   To remove an index, use the <command>DROP INDEX</command> command.   Indexes can be added to and removed from tables at any time.  </para>  <para>   Once an index is created, no further intervention is required: the   system will update the index when the table is modified, and it will   use the index in queries when it thinks this would be more efficient   than a sequential table scan.  But you may have to run the   <command>ANALYZE</command> command regularly to update   statistics to allow the query planner to make educated decisions.   See <xref linkend="performance-tips"> for information about   how to find out whether an index is used and when and why the   planner may choose <emphasis>not</emphasis> to use an index.  </para>  <para>   Indexes can also benefit <command>UPDATE</command> and   <command>DELETE</command> commands with search conditions.   Indexes can moreover be used in join searches.  Thus,   an index defined on a column that is part of a join condition can   significantly speed up queries with joins.  </para>  <para>   After an index is created, the system has to keep it synchronized with the   table.  This adds overhead to data manipulation operations.   Therefore indexes that are seldom or never used in queries   should be removed.  </para> </sect1> <sect1 id="indexes-types">  <title>Index Types</title>  <para>   <productname>PostgreSQL</productname> provides several index types:   B-tree, R-tree, Hash, and GiST.  Each index type uses a different   algorithm that is best suited to different types of queries.   By default, the <command>CREATE INDEX</command> command will create a   B-tree index, which fits the most common situations.  </para>  <para>   <indexterm>    <primary>index</primary>    <secondary>B-tree</secondary>   </indexterm>   <indexterm>    <primary>B-tree</primary>    <see>index</see>   </indexterm>   B-trees can handle equality and range queries on data that can be sorted   into some ordering.   In particular, the <productname>PostgreSQL</productname> query planner   will consider using a B-tree index whenever an indexed column is   involved in a comparison using one of these operators:   <simplelist>    <member><literal>&lt;</literal></member>    <member><literal>&lt;=</literal></member>    <member><literal>=</literal></member>    <member><literal>&gt;=</literal></member>    <member><literal>&gt;</literal></member>   </simplelist>   Constructs equivalent to combinations of these operators, such as   <literal>BETWEEN</> and <literal>IN</>, can also be implemented with   a B-tree index search.  (But note that <literal>IS NULL</> is not   equivalent to <literal>=</> and is not indexable.)  </para>  <para>   The optimizer can also use a B-tree index for queries involving the   pattern matching operators <literal>LIKE</> and <literal>~</literal>   <emphasis>if</emphasis> the pattern is a constant and is anchored to   the beginning of the string &mdash; for example, <literal>col LIKE   'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not   <literal>col LIKE '%bar'</literal>. However, if your server does not   use the C locale you will need to create the index with a special   operator class to support indexing of pattern-matching queries. See   <xref linkend="indexes-opclass"> below. It is also possible to use   B-tree indexes for <literal>ILIKE</literal> and   <literal>~*</literal>, but only if the pattern starts with   non-alphabetic characters, i.e. characters that are not affected by   upper/lower case conversion.  </para>  <para>   <indexterm>    <primary>index</primary>    <secondary>R-tree</secondary>   </indexterm>   <indexterm>    <primary>R-tree</primary>    <see>index</see>   </indexterm>   R-tree indexes are suited for queries on two-dimensional spatial data.   To create an R-tree index, use a command of the form<synopsis>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING rtree (<replaceable>column</replaceable>);</synopsis>   The <productname>PostgreSQL</productname> query planner will   consider using an R-tree index whenever an indexed column is   involved in a comparison using one of these operators:   <simplelist>    <member><literal>&lt;&lt;</literal></member>    <member><literal>&amp;&lt;</literal></member>    <member><literal>&amp;&gt;</literal></member>    <member><literal>&gt;&gt;</literal></member>    <member><literal>&lt;&lt;|</literal></member>    <member><literal>&amp;&lt;|</literal></member>    <member><literal>|&amp;&gt;</literal></member>    <member><literal>|&gt;&gt;</literal></member>    <member><literal>~</literal></member>    <member><literal>@</literal></member>    <member><literal>~=</literal></member>    <member><literal>&amp;&amp;</literal></member>   </simplelist>   (See <xref linkend="functions-geometry"> for the meaning of   these operators.)  </para>  <para>   <indexterm>    <primary>index</primary>    <secondary>hash</secondary>   </indexterm>   <indexterm>    <primary>hash</primary>    <see>index</see>   </indexterm>   Hash indexes can only handle simple equality comparisons.   The query planner will consider using a hash index whenever an   indexed column is involved in a comparison using the   <literal>=</literal> operator.  The following command is used to   create a hash index:<synopsis>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING hash (<replaceable>column</replaceable>);</synopsis>  </para>  <para>   GiST indexes are not a single kind of index, but rather an infrastructure   within which many different indexing strategies can be implemented.   Accordingly, the particular operators with which a GiST index can be   used vary depending on the indexing strategy (the <firstterm>operator   class</>).  The standard distribution of   <productname>PostgreSQL</productname> includes GiST operator classes   equivalent to the R-tree operator classes, and many other GiST operator   classes are available in the <literal>contrib</> collection or as separate   projects.  For more information see <xref linkend="GiST">.  </para>  <note>   <para>    Testing has shown <productname>PostgreSQL</productname>'s hash    indexes to perform no better than B-tree indexes, and the    index size and build time for hash indexes is much worse.    Furthermore, hash index operations are not presently WAL-logged,    so hash indexes may need to be rebuilt with <command>REINDEX</>    after a database crash.    For these reasons, hash index use is presently discouraged.   </para>   <para>    Similarly, R-tree indexes do not seem to have any performance    advantages compared to the equivalent operations of GiST indexes.    Like hash indexes, they are not WAL-logged and may need    reindexing after a database crash.   </para>   <para>    While the problems with hash indexes may be fixed eventually,    it is likely that the R-tree index type will be retired in a future    release.  Users are encouraged to migrate applications that use R-tree    indexes to GiST indexes.   </para>  </note>   </sect1> <sect1 id="indexes-multicolumn">  <title>Multicolumn Indexes</title>  <indexterm zone="indexes-multicolumn">   <primary>index</primary>   <secondary>multicolumn</secondary>  </indexterm>  <para>   An index can be defined on more than one column of a table.  For example, if   you have a table of this form:<programlisting>CREATE TABLE test2 (  major int,  minor int,  name varchar);</programlisting>   (say, you keep your <filename class="directory">/dev</filename>   directory in a database...) and you frequently make queries like<programlisting>SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> AND minor = <replaceable>constant</replaceable>;</programlisting>   then it may be appropriate to define an index on the columns   <structfield>major</structfield> and   <structfield>minor</structfield> together, e.g.,<programlisting>CREATE INDEX test2_mm_idx ON test2 (major, minor);</programlisting>  </para>  <para>   Currently, only the B-tree and GiST index types support multicolumn   indexes.  Up to 32 columns may be specified.  (This limit can be   altered when building <productname>PostgreSQL</productname>; see the   file <filename>pg_config_manual.h</filename>.)  </para>  <para>   A multicolumn B-tree index can be used with query conditions that   involve any subset of the index's columns, but the index is most   efficient when there are constraints on the leading (leftmost) columns.   The exact rule is that equality constraints on leading columns, plus   any inequality constraints on the first column that does not have an   equality constraint, will be used to limit the portion of the index   that is scanned.  Constraints on columns to the right of these columns   are checked in the index, so they save visits to the table proper, but   they do not reduce the portion of the index that has to be scanned.   For example, given an index on <literal>(a, b, c)</literal> and a   query condition <literal>WHERE a = 5 AND b &gt;= 42 AND c &lt; 77</>,   the index would have to be scanned from the first entry with   <literal>a</> = 5 and <literal>b</> = 42 up through the last entry with   <literal>a</> = 5.  Index entries with <literal>c</> &gt;= 77 would be   skipped, but they'd still have to be scanned through.   This index could in principle be used for queries that have constraints   on <literal>b</> and/or <literal>c</> with no constraint on <literal>a</>   &mdash; but the entire index would have to be scanned, so in most cases   the planner would prefer a sequential table scan over using the index.  </para>  <para>   A multicolumn GiST index can only be used when there is a query condition   on its leading column.  Conditions on additional columns restrict the   entries returned by the index, but the condition on the first column is the   most important one for determining how much of the index needs to be   scanned.  A GiST index will be relatively ineffective if its first column   has only a few distinct values, even if there are many distinct values in   additional columns.  </para>  <para>   Of course, each column must be used with operators appropriate to the index   type; clauses that involve other operators will not be considered.  </para>
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -