📄 maintenance.sgml
字号:
<!--$Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.27.2.1 2003/11/04 09:45:29 petere Exp $--><chapter id="maintenance"> <title>Routine Database Maintenance Tasks</title> <indexterm zone="maintenance"> <primary>maintenance</primary> </indexterm> <para> There are a few routine maintenance chores that must be performed on a regular basis to keep a <productname>PostgreSQL</productname> server running smoothly. The tasks discussed here are repetitive in nature and can easily be automated using standard Unix tools such as <application>cron</application> scripts. But it is the database administrator's responsibility to set up appropriate scripts, and to check that they execute successfully. </para> <para> One obvious maintenance task is creation of backup copies of the data on a regular schedule. Without a recent backup, you have no chance of recovery after a catastrophe (disk failure, fire, mistakenly dropping a critical table, etc.). The backup and recovery mechanisms available in <productname>PostgreSQL</productname> are discussed at length in <xref linkend="backup">. </para> <para> The other main category of maintenance task is periodic <quote>vacuuming</> of the database. This activity is discussed in <xref linkend="routine-vacuuming">. </para> <para> Something else that might need periodic attention is log file management. This is discussed in <xref linkend="logfile-maintenance">. </para> <para> <productname>PostgreSQL</productname> is low-maintenance compared to some other database management systems. Nonetheless, appropriate attention to these tasks will go far towards ensuring a pleasant and productive experience with the system. </para> <sect1 id="routine-vacuuming"> <title>Routine Vacuuming</title> <indexterm zone="routine-vacuuming"> <primary>vacuum</primary> </indexterm> <para> <productname>PostgreSQL</productname>'s <command>VACUUM</> command must be run on a regular basis for several reasons: <orderedlist> <listitem> <simpara>To recover disk space occupied by updated or deleted rows.</simpara> </listitem> <listitem> <simpara>To update data statistics used by the <productname>PostgreSQL</productname> query planner.</simpara> </listitem> <listitem> <simpara>To protect against loss of very old data due to <firstterm>transaction ID wraparound</>.</simpara> </listitem> </orderedlist> The frequency and scope of the <command>VACUUM</> operations performed for each of these reasons will vary depending on the needs of each site. Therefore, database administrators must understand these issues and develop an appropriate maintenance strategy. This section concentrates on explaining the high-level issues; for details about command syntax and so on, see the <command>VACUUM</> command reference page. </para> <para> Beginning in <productname>PostgreSQL</productname> 7.2, the standard form of <command>VACUUM</> can run in parallel with normal database operations (selects, inserts, updates, deletes, but not changes to table definitions). Routine vacuuming is therefore not nearly as intrusive as it was in prior releases, and it's not as critical to try to schedule it at low-usage times of day. </para> <sect2 id="vacuum-for-space-recovery"> <title>Recovering disk space</title> <indexterm zone="vacuum-for-space-recovery"> <primary>disk space</primary> </indexterm> <para> In normal <productname>PostgreSQL</productname> operation, an <command>UPDATE</> or <command>DELETE</> of a row does not immediately remove the old version of the row. This approach is necessary to gain the benefits of multiversion concurrency control (see <xref linkend="mvcc">): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no longer of interest to any transaction. The space it occupies must be reclaimed for reuse by new rows, to avoid infinite growth of disk space requirements. This is done by running <command>VACUUM</>. </para> <para> Clearly, a table that receives frequent updates or deletes will need to be vacuumed more often than tables that are seldom updated. It may be useful to set up periodic <application>cron</> tasks that vacuum only selected tables, skipping tables that are known not to change often. This is only likely to be helpful if you have both large heavily-updated tables and large seldom-updated tables --- the extra cost of vacuuming a small table isn't enough to be worth worrying about. </para> <para> The standard form of <command>VACUUM</> is best used with the goal of maintaining a fairly level steady-state usage of disk space. The standard form finds old row versions and makes their space available for re-use within the table, but it does not try very hard to shorten the table file and return disk space to the operating system. If you need to return disk space to the operating system you can use <command>VACUUM FULL</> --- but what's the point of releasing disk space that will only have to be allocated again soon? Moderately frequent standard <command>VACUUM</> runs are a better approach than infrequent <command>VACUUM FULL</> runs for maintaining heavily-updated tables. </para> <para> Recommended practice for most sites is to schedule a database-wide <command>VACUUM</> once a day at a low-usage time of day, supplemented by more frequent vacuuming of heavily-updated tables if necessary. (If you have multiple databases in a cluster, don't forget to vacuum each one; the program <filename>vacuumdb</> may be helpful.) Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine vacuuming for space recovery. </para> <para> <command>VACUUM FULL</> is recommended for cases where you know you have deleted the majority of rows in a table, so that the steady-state size of the table can be shrunk substantially with <command>VACUUM FULL</>'s more aggressive approach. </para> <para> If you have a table whose contents are deleted completely every so often, consider doing it with <command>TRUNCATE</> rather than using <command>DELETE</> followed by <command>VACUUM</>. </para> </sect2> <sect2 id="vacuum-for-statistics"> <title>Updating planner statistics</title> <indexterm zone="vacuum-for-statistics"> <primary>statistics</primary> <secondary>of the planner</secondary> </indexterm> <indexterm zone="vacuum-for-statistics"> <primary>ANALYZE</primary> </indexterm> <para> The <productname>PostgreSQL</productname> query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the <command>ANALYZE</> command, which can be invoked by itself or as an optional step in <command>VACUUM</>. It is important to have reasonably accurate statistics, otherwise poor choices of plans may degrade database performance. </para> <para> As with vacuuming for space recovery, frequent updates of statistics are more useful for heavily-updated tables than for seldom-updated ones. But even for a heavily-updated table, there may be no need for statistics updates if the statistical distribution of the data is not changing much. A simple rule of thumb is to think about how much the minimum and maximum values of the columns in the table change. For example, a <type>timestamp</type> column that contains the time of row update will have a constantly-increasing maximum value as rows are added and updated; such a column will probably need more frequent statistics updates than, say, a column containing URLs for pages accessed on a website. The URL column may receive changes just as often, but the statistical distribution of its values probably changes relatively slowly. </para> <para> It is possible to run <command>ANALYZE</> on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, the usefulness of this feature is doubtful. Beginning in <productname>PostgreSQL</productname> 7.2, <command>ANALYZE</> is a fairly fast operation even on large tables, because it uses a statistical random sampling of the rows of a table rather than reading every single row. So it's probably much simpler to just run it over the whole database every so often. </para> <tip> <para> Although per-column tweaking of <command>ANALYZE</> frequency may not be very productive, you may well find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by <command>ANALYZE</>. Columns that are heavily used in <literal>WHERE</> clauses and have highly irregular data distributions may require a finer-grain data histogram than other columns. See <command>ALTER TABLE SET STATISTICS</>. </para> </tip> <para> Recommended practice for most sites is to schedule a database-wide <command>ANALYZE</> once a day at a low-usage time of day; this can usefully be combined with a nightly <command>VACUUM</>. However, sites with relatively slowly changing table statistics may find that this is overkill, and that less-frequent <command>ANALYZE</> runs are sufficient. </para> </sect2> <sect2 id="vacuum-for-wraparound"> <title>Preventing transaction ID wraparound failures</title> <indexterm zone="vacuum-for-wraparound"> <primary>transaction ID</primary>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -