📄 indexam.sgml

📁 PostgreSQL 8.1.4的源码适用于Linux下的开源数据库系统
💻 SGML
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<!--$PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.7 2005/11/04 23:14:00 petere Exp $--><chapter id="indexam"> <title>Index Access Method Interface Definition</title>  <para>   This chapter defines the interface between the core   <productname>PostgreSQL</productname> system and <firstterm>index access   methods</>, which manage individual index types.  The core system   knows nothing about indexes beyond what is specified here, so it is   possible to develop entirely new index types by writing add-on code.  </para>  <para>   All indexes in <productname>PostgreSQL</productname> are what are known   technically as <firstterm>secondary indexes</>; that is, the index is   physically separate from the table file that it describes.  Each index   is stored as its own physical <firstterm>relation</> and so is described   by an entry in the <structname>pg_class</> catalog.  The contents of an   index are entirely under the control of its index access method.  In   practice, all index access methods divide indexes into standard-size   pages so that they can use the regular storage manager and buffer manager   to access the index contents.  (All the existing index access methods   furthermore use the standard page layout described in <xref   linkend="storage-page-layout">, and they all use the same format for index   tuple headers; but these decisions are not forced on an access method.)  </para>  <para>   An index is effectively a mapping from some data key values to   <firstterm>tuple identifiers</>, or <acronym>TIDs</>, of row versions   (tuples) in the index's parent table.  A TID consists of a   block number and an item number within that block (see <xref   linkend="storage-page-layout">).  This is sufficient   information to fetch a particular row version from the table.   Indexes are not directly aware that under MVCC, there may be multiple   extant versions of the same logical row; to an index, each tuple is   an independent object that needs its own index entry.  Thus, an   update of a row always creates all-new index entries for the row, even if   the key values did not change.  Index entries for dead tuples are   reclaimed (by vacuuming) when the dead tuples themselves are reclaimed.  </para> <sect1 id="index-catalog">  <title>Catalog Entries for Indexes</title>  <para>   Each index access method is described by a row in the   <structname>pg_am</structname> system catalog (see   <xref linkend="catalog-pg-am">).  The principal contents of a   <structname>pg_am</structname> row are references to   <link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>   entries that identify the index access   functions supplied by the access method.  The APIs for these functions   are defined later in this chapter.  In addition, the   <structname>pg_am</structname> row specifies a few fixed properties of   the access method, such as whether it can support multicolumn indexes.   There is not currently any special support   for creating or deleting <structname>pg_am</structname> entries;   anyone able to write a new access method is expected to be competent   to insert an appropriate row for themselves.  </para>  <para>   To be useful, an index access method must also have one or more   <firstterm>operator classes</> defined in   <link linkend="catalog-pg-opclass"><structname>pg_opclass</structname></link>,   <link linkend="catalog-pg-amop"><structname>pg_amop</structname></link>, and   <link linkend="catalog-pg-amproc"><structname>pg_amproc</structname></link>.   These entries allow the planner   to determine what kinds of query qualifications can be used with   indexes of this access method.  Operator classes are described   in <xref linkend="xindex">, which is prerequisite material for reading   this chapter.  </para>  <para>   An individual index is defined by a    <link linkend="catalog-pg-class"><structname>pg_class</structname></link>   entry that describes it as a physical relation, plus a   <link linkend="catalog-pg-index"><structname>pg_index</structname></link>   entry that shows the logical content of the index &mdash; that is, the set   of index columns it has and the semantics of those columns, as captured by   the associated operator classes.  The index columns (key values) can be   either simple columns of the underlying table or expressions over the table   rows.  The index access method normally has no interest in where the index   key values come from (it is always handed precomputed key values) but it   will be very interested in the operator class information in   <structname>pg_index</structname>.  Both of these catalog entries can be   accessed as part of the <structname>Relation</> data structure that is   passed to all operations on the index.  </para>  <para>   Some of the flag columns of <structname>pg_am</structname> have nonobvious   implications.  The requirements of <structfield>amcanunique</structfield>   are discussed in <xref linkend="index-unique-checks">, and those of   <structfield>amconcurrent</structfield> in <xref linkend="index-locking">.   The <structfield>amcanmulticol</structfield> flag asserts that the   access method supports multicolumn indexes, while   <structfield>amoptionalkey</structfield> asserts that it allows scans   where no indexable restriction clause is given for the first index column.   When <structfield>amcanmulticol</structfield> is false,   <structfield>amoptionalkey</structfield> essentially says whether the   access method allows full-index scans without any restriction clause.   Access methods that support multiple index columns <emphasis>must</>   support scans that omit restrictions on any or all of the columns after   the first; however they are permitted to require some restriction to   appear for the first index column, and this is signaled by setting   <structfield>amoptionalkey</structfield> false.   <structfield>amindexnulls</structfield> asserts that index entries are   created for NULL key values.  Since most indexable operators are   strict and hence cannot return TRUE for NULL inputs,   it is at first sight attractive to not store index entries for null values:   they could never be returned by an index scan anyway.  However, this   argument fails when an index scan has no restriction clause for a given   index column.  In practice this means that   indexes that have <structfield>amoptionalkey</structfield> true must   index nulls, since the planner might decide to use such an index   with no scan keys at all.  A related restriction is that an index   access method that supports multiple index columns <emphasis>must</>   support indexing null values in columns after the first, because the planner   will assume the index can be used for queries that do not restrict   these columns.  For example, consider an index on (a,b) and a query with   <literal>WHERE a = 4</literal>.  The system will assume the index can be   used to scan for rows with <literal>a = 4</literal>, which is wrong if the   index omits rows where <literal>b</> is null.   It is, however, OK to omit rows where the first indexed column is null.   (GiST currently does so.)  Thus,   <structfield>amindexnulls</structfield> should be set true only if the   index access method indexes all rows, including arbitrary combinations of   null values.  </para> </sect1> <sect1 id="index-functions">  <title>Index Access Method Functions</title>  <para>   The index construction and maintenance functions that an index access   method must provide are:  </para>  <para><programlisting>voidambuild (Relation heapRelation,         Relation indexRelation,         IndexInfo *indexInfo);</programlisting>   Build a new index.  The index relation has been physically created,   but is empty.  It must be filled in with whatever fixed data the   access method requires, plus entries for all tuples already existing   in the table.  Ordinarily the <function>ambuild</> function will call   <function>IndexBuildHeapScan()</> to scan the table for existing tuples   and compute the keys that need to be inserted into the index.  </para>  <para><programlisting>boolaminsert (Relation indexRelation,          Datum *values,          bool *isnull,          ItemPointer heap_tid,          Relation heapRelation,          bool check_uniqueness);</programlisting>   Insert a new tuple into an existing index.  The <literal>values</> and   <literal>isnull</> arrays give the key values to be indexed, and   <literal>heap_tid</> is the TID to be indexed.   If the access method supports unique indexes (its   <structname>pg_am</>.<structfield>amcanunique</> flag is true) then   <literal>check_uniqueness</> may be true, in which case the access method   must verify that there is no conflicting row; this is the only situation in   which the access method normally needs the <literal>heapRelation</>   parameter.  See <xref linkend="index-unique-checks"> for details.   The result is TRUE if an index entry was inserted, FALSE if not. (A FALSE   result does not denote an error condition, but is used for cases such   as an index AM refusing to index a NULL.)  </para>  <para><programlisting>IndexBulkDeleteResult *ambulkdelete (Relation indexRelation,              IndexBulkDeleteCallback callback,              void *callback_state);</programlisting>   Delete tuple(s) from the index.  This is a <quote>bulk delete</> operation   that is intended to be implemented by scanning the whole index and checking   each entry to see if it should be deleted.   The passed-in <literal>callback</> function may be called, in the style   <literal>callback(<replaceable>TID</>, callback_state) returns bool</literal>,   to determine whether any particular index entry, as identified by its   referenced TID, is to be deleted.  Must return either NULL or a palloc'd   struct containing statistics about the effects of the deletion operation.  </para>  <para><programlisting>IndexBulkDeleteResult *amvacuumcleanup (Relation indexRelation,                 IndexVacuumCleanupInfo *info,                 IndexBulkDeleteResult *stats);</programlisting>   Clean up after a <command>VACUUM</command> operation (one or more   <function>ambulkdelete</> calls).  An index access method does not have   to provide this function (if so, the entry in <structname>pg_am</> must   be zero).  If it is provided, it is typically used for bulk cleanup   such as reclaiming empty index pages.  <literal>info</>   provides some additional arguments such as a message level for statistical   reports, and <literal>stats</> is whatever the last   <function>ambulkdelete</> call returned.  <function>amvacuumcleanup</>   may replace or modify this struct before returning it.  If the result   is not NULL it must be a palloc'd struct.  The statistics it contains   will be reported by <command>VACUUM</> if <literal>VERBOSE</> is given.  </para>  <para>   The purpose of an index, of course, is to support scans for tuples matching   an indexable <literal>WHERE</> condition, often called a   <firstterm>qualifier</> or <firstterm>scan key</>.  The semantics of   index scanning are described more fully in <xref linkend="index-scanning">,   below.  The scan-related functions that an index access method must provide   are:  </para>  <para><programlisting>IndexScanDescambeginscan (Relation indexRelation,             int nkeys,             ScanKey key);</programlisting>   Begin a new scan.  The <literal>key</> array (of length <literal>nkeys</>)   describes the scan key(s) for the index scan.  The result must be a   palloc'd struct. For implementation reasons the index access method   <emphasis>must</> create this struct by calling   <function>RelationGetIndexScan()</>.  In most cases   <function>ambeginscan</> itself does little beyond making that call;   the interesting parts of index-scan startup are in <function>amrescan</>.  </para>  <para><programlisting>booleanamgettuple (IndexScanDesc scan,            ScanDirection direction);</programlisting>   Fetch the next tuple in the given scan, moving in the given   direction (forward or backward in the index).  Returns TRUE if a tuple was   obtained, FALSE if no matching tuples remain.  In the TRUE case the tuple   TID is stored into the <literal>scan</> structure.  Note that   <quote>success</> means only that the index contains an entry that matches   the scan keys, not that the tuple necessarily still exists in the heap or   will pass the caller's snapshot test.  </para>  <para><programlisting>booleanamgetmulti (IndexScanDesc scan,            ItemPointer tids,            int32 max_tids,            int32 *returned_tids);</programlisting>   Fetch multiple tuples in the given scan.  Returns TRUE if the scan should   continue, FALSE if no matching tuples remain.  <literal>tids</> points to   a caller-supplied array of <literal>max_tids</>   <structname>ItemPointerData</> records, which the call fills with TIDs of   matching tuples.  <literal>*returned_tids</> is set to the number of TIDs   actually returned.  This can be less than <literal>max_tids</>, or even   zero, even when the return value is TRUE.  (This provision allows the   access method to choose the most efficient stopping points in its scan,   for example index page boundaries.)  <function>amgetmulti</> and   <function>amgettuple</> cannot be used in the same index scan; there   are other restrictions too when using <function>amgetmulti</>, as explained   in <xref linkend="index-scanning">.  </para>  <para><programlisting>voidamrescan (IndexScanDesc scan,          ScanKey key);</programlisting>   Restart the given scan, possibly with new scan keys (to continue using   the old keys, NULL is passed for <literal>key</>).  Note that it is not   possible for the number of keys to be changed.  In practice the restart   feature is used when a new outer tuple is selected by a nested-loop join   and so a new key comparison value is needed, but the scan key structure
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -