📄 maintenance.sgml
字号:
<!--$PostgreSQL: pgsql/doc/src/sgml/maintenance.sgml,v 1.51.2.2 2005/12/07 14:36:08 momjian Exp $--><chapter id="maintenance"> <title>Routine Database Maintenance Tasks</title> <indexterm zone="maintenance"> <primary>maintenance</primary> </indexterm> <para> There are a few routine maintenance chores that must be performed on a regular basis to keep a <productname>PostgreSQL</productname> server running smoothly. The tasks discussed here are repetitive in nature and can easily be automated using standard Unix tools such as <application>cron</application> scripts. But it is the database administrator's responsibility to set up appropriate scripts, and to check that they execute successfully. </para> <para> One obvious maintenance task is creation of backup copies of the data on a regular schedule. Without a recent backup, you have no chance of recovery after a catastrophe (disk failure, fire, mistakenly dropping a critical table, etc.). The backup and recovery mechanisms available in <productname>PostgreSQL</productname> are discussed at length in <xref linkend="backup">. </para> <para> The other main category of maintenance task is periodic <quote>vacuuming</> of the database. This activity is discussed in <xref linkend="routine-vacuuming">. </para> <para> Something else that might need periodic attention is log file management. This is discussed in <xref linkend="logfile-maintenance">. </para> <para> <productname>PostgreSQL</productname> is low-maintenance compared to some other database management systems. Nonetheless, appropriate attention to these tasks will go far towards ensuring a pleasant and productive experience with the system. </para> <sect1 id="routine-vacuuming"> <title>Routine Vacuuming</title> <indexterm zone="routine-vacuuming"> <primary>vacuum</primary> </indexterm> <para> <productname>PostgreSQL</productname>'s <command>VACUUM</> command must be run on a regular basis for several reasons: <orderedlist> <listitem> <simpara>To recover disk space occupied by updated or deleted rows.</simpara> </listitem> <listitem> <simpara>To update data statistics used by the <productname>PostgreSQL</productname> query planner.</simpara> </listitem> <listitem> <simpara>To protect against loss of very old data due to <firstterm>transaction ID wraparound</>.</simpara> </listitem> </orderedlist> The frequency and scope of the <command>VACUUM</> operations performed for each of these reasons will vary depending on the needs of each site. Therefore, database administrators must understand these issues and develop an appropriate maintenance strategy. This section concentrates on explaining the high-level issues; for details about command syntax and so on, see the <xref linkend="sql-vacuum" endterm="sql-vacuum-title"> reference page. </para> <para> Beginning in <productname>PostgreSQL</productname> 7.2, the standard form of <command>VACUUM</> can run in parallel with normal database operations (selects, inserts, updates, deletes, but not changes to table definitions). Routine vacuuming is therefore not nearly as intrusive as it was in prior releases, and it is not as critical to try to schedule it at low-usage times of day. </para> <para> Beginning in <productname>PostgreSQL</productname> 8.0, there are configuration parameters that can be adjusted to further reduce the performance impact of background vacuuming. See <xref linkend="runtime-config-resource-vacuum-cost">. </para> <para> An automated mechanism for performing the necessary <command>VACUUM</> operations has been added in <productname>PostgreSQL</productname> 8.1. See <xref linkend="autovacuum">. </para> <sect2 id="vacuum-for-space-recovery"> <title>Recovering disk space</title> <indexterm zone="vacuum-for-space-recovery"> <primary>disk space</primary> </indexterm> <para> In normal <productname>PostgreSQL</productname> operation, an <command>UPDATE</> or <command>DELETE</> of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (see <xref linkend="mvcc">): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must be reclaimed for reuse by new rows, to avoid infinite growth of disk space requirements. This is done by running <command>VACUUM</>. </para> <para> Clearly, a table that receives frequent updates or deletes will need to be vacuumed more often than tables that are seldom updated. It may be useful to set up periodic <application>cron</> tasks that <command>VACUUM</command> only selected tables, skipping tables that are known not to change often. This is only likely to be helpful if you have both large heavily-updated tables and large seldom-updated tables — the extra cost of vacuuming a small table isn't enough to be worth worrying about. </para> <para> There are two variants of the <command>VACUUM</command> command. The first form, known as <quote>lazy vacuum</quote> or just <command>VACUUM</command>, marks expired data in tables and indexes for future reuse; it does <emphasis>not</emphasis> attempt to reclaim the space used by this expired data unless the space is at the end of the table and an exclusive table lock can be easily obtained. Unused space at the start or middle of the file does not result in the file being shortened and space returned to the operating system. This variant of <command>VACUUM</command> can be run concurrently with normal database operations. </para> <para> The second form is the <command>VACUUM FULL</command> command. This uses a more aggressive algorithm for reclaiming the space consumed by expired row versions. Any space that is freed by <command>VACUUM FULL</command> is immediately returned to the operating system. Unfortunately, this variant of the <command>VACUUM</command> command acquires an exclusive lock on each table while <command>VACUUM FULL</command> is processing it. Therefore, frequently using <command>VACUUM FULL</command> can have an extremely negative effect on the performance of concurrent database queries. </para> <para> The standard form of <command>VACUUM</> is best used with the goal of maintaining a fairly level steady-state usage of disk space. If you need to return disk space to the operating system you can use <command>VACUUM FULL</> — but what's the point of releasing disk space that will only have to be allocated again soon? Moderately frequent standard <command>VACUUM</> runs are a better approach than infrequent <command>VACUUM FULL</> runs for maintaining heavily-updated tables. </para> <para> Recommended practice for most sites is to schedule a database-wide <command>VACUUM</> once a day at a low-usage time of day, supplemented by more frequent vacuuming of heavily-updated tables if necessary. (Some installations with an extremely high rate of data modification <command>VACUUM</command> busy tables as often as once every few minutes.) If you have multiple databases in a cluster, don't forget to <command>VACUUM</command> each one; the program <xref linkend="app-vacuumdb" endterm="app-vacuumdb-title"> may be helpful. </para> <para> <command>VACUUM FULL</> is recommended for cases where you know you have deleted the majority of rows in a table, so that the steady-state size of the table can be shrunk substantially with <command>VACUUM FULL</>'s more aggressive approach. Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine vacuuming for space recovery. </para> <para> If you have a table whose contents are deleted on a periodic basis, consider doing it with <command>TRUNCATE</command> rather than using <command>DELETE</command> followed by <command>VACUUM</command>. <command>TRUNCATE</command> removes the entire content of the table immediately, without requiring a subsequent <command>VACUUM</command> or <command>VACUUM FULL</command> to reclaim the now-unused disk space. </para> </sect2> <sect2 id="vacuum-for-statistics"> <title>Updating planner statistics</title> <indexterm zone="vacuum-for-statistics"> <primary>statistics</primary> <secondary>of the planner</secondary> </indexterm> <indexterm zone="vacuum-for-statistics"> <primary>ANALYZE</primary> </indexterm> <para> The <productname>PostgreSQL</productname> query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the <command>ANALYZE</> command, which can be invoked by itself or as an optional step in <command>VACUUM</>. It is important to have reasonably accurate statistics, otherwise poor choices of plans may degrade database performance. </para> <para> As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there may be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a <type>timestamp</type> column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column may receive changes just as often, but the statistical distribution of its values probably changes relatively slowly. </para> <para> It is possible to run <command>ANALYZE</> on specific tables and even
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -