📄 indices.sgml

📁 PostgreSQL 8.1.4的源码适用于Linux下的开源数据库系统
💻 SGML
📖 第 1 页 / 共 3 页
字号:
  <para>   Multicolumn indexes should be used sparingly.  In most situations,   an index on a single column is sufficient and saves space and time.   Indexes with more than three columns are unlikely to be helpful   unless the usage of the table is extremely stylized.  See also   <xref linkend="indexes-bitmap-scans"> for some discussion of the   merits of different index setups.  </para> </sect1> <sect1 id="indexes-bitmap-scans">  <title>Combining Multiple Indexes</title>  <indexterm zone="indexes-bitmap-scans">   <primary>index</primary>   <secondary>combining multiple indexes</secondary>  </indexterm>  <indexterm zone="indexes-bitmap-scans">   <primary>bitmap scan</primary>  </indexterm>  <para>   A single index scan can only use query clauses that use the index's   columns with operators of its operator class and are joined with   <literal>AND</>.  For example, given an index on <literal>(a, b)</literal>   a query condition like <literal>WHERE a = 5 AND b = 6</> could   use the index, but a query like <literal>WHERE a = 5 OR b = 6</> could not   directly use the index.  </para>  <para>   Beginning in release 8.1,   <productname>PostgreSQL</> has the ability to combine multiple indexes   (including multiple uses of the same index) to handle cases that cannot   be implemented by single index scans.  The system can form <literal>AND</>   and <literal>OR</> conditions across several index scans.  For example,   a query like <literal>WHERE x = 42 OR x = 47 OR x = 53 OR x = 99</>   could be broken down into four separate scans of an index on <literal>x</>,   each scan using one of the query clauses.  The results of these scans are   then ORed together to produce the result.  Another example is that if we   have separate indexes on <literal>x</> and <literal>y</>, one possible   implementation of a query like <literal>WHERE x = 5 AND y = 6</> is to   use each index with the appropriate query clause and then AND together   the index results to identify the result rows.  </para>  <para>   To combine multiple indexes, the system scans each needed index and   prepares a <firstterm>bitmap</> in memory giving the locations of   table rows that are reported as matching that index's conditions.   The bitmaps are then ANDed and ORed together as needed by the query.   Finally, the actual table rows are visited and returned.  The table rows   are visited in physical order, because that is how the bitmap is laid   out; this means that any ordering of the original indexes is lost, and   so a separate sort step will be needed if the query has an <literal>ORDER   BY</> clause.  For this reason, and because each additional index scan   adds extra time, the planner will sometimes choose to use a simple index   scan even though additional indexes are available that could have been   used as well.  </para>  <para>   In all but the simplest applications, there are various combinations of   indexes that may be useful, and the database developer must make   trade-offs to decide which indexes to provide.  Sometimes multicolumn   indexes are best, but sometimes it's better to create separate indexes   and rely on the index-combination feature.  For example, if your   workload includes a mix of queries that sometimes involve only column   <literal>x</>, sometimes only column <literal>y</>, and sometimes both   columns, you might choose to create two separate indexes on   <literal>x</> and <literal>y</>, relying on index combination to   process the queries that use both columns.  You could also create a   multicolumn index on <literal>(x, y)</>.  This index would typically be   more efficient than index combination for queries involving both   columns, but as discussed in <xref linkend="indexes-multicolumn">, it   would be almost useless for queries involving only <literal>y</>, so it   could not be the only index.  A combination of the multicolumn index   and a separate index on <literal>y</> would serve reasonably well.  For   queries involving only <literal>x</>, the multicolumn index could be   used, though it would be larger and hence slower than an index on   <literal>x</> alone.  The last alternative is to create all three   indexes, but this is probably only reasonable if the table is searched   much more often than it is updated and all three types of query are   common.  If one of the types of query is much less common than the   others, you'd probably settle for creating just the two indexes that   best match the common types.  </para> </sect1> <sect1 id="indexes-unique">  <title>Unique Indexes</title>  <indexterm zone="indexes-unique">   <primary>index</primary>   <secondary>unique</secondary>  </indexterm>  <para>   Indexes may also be used to enforce uniqueness of a column's value,   or the uniqueness of the combined values of more than one column.<synopsis>CREATE UNIQUE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> (<replaceable>column</replaceable> <optional>, ...</optional>);</synopsis>   Currently, only B-tree indexes can be declared unique.  </para>  <para>   When an index is declared unique, multiple table rows with equal   indexed values will not be allowed.  Null values are not considered   equal.  A multicolumn unique index will only reject cases where all   of the indexed columns are equal in two rows.  </para>  <para>   <productname>PostgreSQL</productname> automatically creates a unique   index when a unique constraint or a primary key is defined for a table.   The index covers the columns that make up the primary key or unique    columns (a multicolumn index, if appropriate), and is the mechanism   that enforces the constraint.  </para>  <note>   <para>    The preferred way to add a unique constraint to a table is    <literal>ALTER TABLE ... ADD CONSTRAINT</literal>.  The use of    indexes to enforce unique constraints could be considered an    implementation detail that should not be accessed directly.    One should, however, be aware that there's no need to manually    create indexes on unique columns; doing so would just duplicate    the automatically-created index.   </para>  </note> </sect1> <sect1 id="indexes-expressional">  <title>Indexes on Expressions</title>  <indexterm zone="indexes-expressional">   <primary>index</primary>   <secondary sortas="expressions">on expressions</secondary>  </indexterm>  <para>   An index column need not be just a column of the underlying table,   but can be a function or scalar expression computed from one or   more columns of the table.  This feature is useful to obtain fast   access to tables based on the results of computations.  </para>  <para>   For example, a common way to do case-insensitive comparisons is to   use the <function>lower</function> function:<programlisting>SELECT * FROM test1 WHERE lower(col1) = 'value';</programlisting>   This query can use an index, if one has been   defined on the result of the <literal>lower(col1)</literal>   operation:<programlisting>CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));</programlisting>  </para>  <para>   If we were to declare this index <literal>UNIQUE</>, it would prevent   creation of rows whose <literal>col1</> values differ only in case,   as well as rows whose <literal>col1</> values are actually identical.   Thus, indexes on expressions can be used to enforce constraints that   are not definable as simple unique constraints.  </para>  <para>   As another example, if one often does queries like this:<programlisting>SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';</programlisting>   then it might be worth creating an index like this:<programlisting>CREATE INDEX people_names ON people ((first_name || ' ' || last_name));</programlisting>  </para>  <para>   The syntax of the <command>CREATE INDEX</> command normally requires   writing parentheses around index expressions, as shown in the second   example.  The parentheses may be omitted when the expression is just   a function call, as in the first example.  </para>  <para>   Index expressions are relatively expensive to maintain, because the   derived expression(s) must be computed for each row upon insertion   and whenever it is updated.  However, the index expressions are   <emphasis>not</> recomputed during an indexed search, since they are   already stored in the index.  In both examples above, the system   sees the query as just <literal>WHERE indexedcolumn = 'constant'</>   and so the speed of the search is equivalent to any other simple index   query.  Thus, indexes on expressions are useful when retrieval speed   is more important than insertion and update speed.  </para> </sect1> <sect1 id="indexes-partial">  <title>Partial Indexes</title>  <indexterm zone="indexes-partial">   <primary>index</primary>   <secondary>partial</secondary>  </indexterm>  <para>   A <firstterm>partial index</firstterm> is an index built over a   subset of a table; the subset is defined by a conditional   expression (called the <firstterm>predicate</firstterm> of the   partial index).  The index contains entries for only those table   rows that satisfy the predicate.  Partial indexes are a specialized   feature, but there are several situations in which they are useful.  </para>  <para>   One major reason for using a partial index is to avoid indexing common   values.  Since a query searching for a common value (one that   accounts for more than a few percent of all the table rows) will not   use the index anyway, there is no point in keeping those rows in the   index at all.  This reduces the size of the index, which will speed   up queries that do use the index.  It will also speed up many table   update operations because the index does not need to be   updated in all cases.  <xref linkend="indexes-partial-ex1"> shows a   possible application of this idea.  </para>  <example id="indexes-partial-ex1">   <title>Setting up a Partial Index to Exclude Common Values</title>   <para>    Suppose you are storing web server access logs in a database.    Most accesses originate from the IP address range of your organization but    some are from elsewhere (say, employees on dial-up connections).    If your searches by IP are primarily for outside accesses,    you probably do not need to index the IP range that corresponds to your    organization's subnet.   </para>   <para>    Assume a table like this:<programlisting>CREATE TABLE access_log (    url varchar,    client_ip inet,    ...);</programlisting>   </para>   <para>    To create a partial index that suits our example, use a command    such as this:<programlisting>CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)    WHERE NOT (client_ip &gt; inet '192.168.100.0' AND client_ip &lt; inet '192.168.100.255');</programlisting>   </para>   <para>    A typical query that can use this index would be:<programlisting>SELECT * FROM access_log WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';</programlisting>    A query that cannot use this index is:<programlisting>SELECT * FROM access_log WHERE client_ip = inet '192.168.100.23';</programlisting>   </para>   <para>    Observe that this kind of partial index requires that the common    values be predetermined.  If the distribution of values is    inherent (due to the nature of the application) and static (not    changing over time), this is not difficult, but if the common values are    merely due to the coincidental data load this can require a lot of    maintenance work to change the index definition from time to time.   </para>  </example>  <para>   Another possible use for a partial index is to exclude values from the   index that the   typical query workload is not interested in; this is shown in <xref   linkend="indexes-partial-ex2">.  This results in the same   advantages as listed above, but it prevents the   <quote>uninteresting</quote> values from being accessed via that   index at all, even if an index scan might be profitable in that   case.  Obviously, setting up partial indexes for this kind of   scenario will require a lot of care and experimentation.  </para>  <example id="indexes-partial-ex2">   <title>Setting up a Partial Index to Exclude Uninteresting Values</title>   <para>    If you have a table that contains both billed and unbilled orders,    where the unbilled orders take up a small fraction of the total    table and yet those are the most-accessed rows, you can improve    performance by creating an index on just the unbilled rows.  The    command to create the index would look like this:<programlisting>CREATE INDEX orders_unbilled_index ON orders (order_nr)    WHERE billed is not true;</programlisting>   </para>   <para>    A possible query to use this index would be<programlisting>SELECT * FROM orders WHERE billed is not true AND order_nr &lt; 10000;</programlisting>    However, the index can also be used in queries that do not involve    <structfield>order_nr</> at all, e.g.,<programlisting>SELECT * FROM orders WHERE billed is not true AND amount &gt; 5000.00;</programlisting>    This is not as efficient as a partial index on the    <structfield>amount</> column would be, since the system has to
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -