📄 atomiccommit.html
字号:
in the header of the rollback journal. The -1 page count valuetells any process attempting to rollback the journal that thenumber of pages in the journal should be computed from the journalsize. This -1 value is never changed. So that when a commitoccurs, we save a single flush operation and a sector write ofthe first page of the journal file. Furthermore, when a cachespill occurs we no longer need to append a new journal headerto the end of the journal; we can simply continue appendingnew pages to the end of the existing journal.</p><a name="section_7_6"></a><h3>7.6 Persistent Rollback Journals</h3><p>Deleting a file is an expensive operation on many systems.So as an optimization, SQLite can be configured to avoid thedelete operation of <a href="#section_3_11">section 3.11</a>.Instead of deleting the journal file in order to commit a transaction,the file is either truncated to zero bytes in length or itsheader is overwritten with zeros. Truncating the file to zerolength saves having to make modifications to the directory containingthe file since the file is not removed from the directory. Overwriting the header has the additional savings of not havingto update the length of the file (in the "inode" on many systems)and not having to deal with newly freed disk sectors. Furthermore,at the next transaction the journal will be created by overwritingexisting content rather than appending new content onto the endof a file, and overwriting is often much faster than appending.</p><p>SQLite can be configured to commit transactions by overwritingthe journal header with zeros instead of deleting the journal fileby setting the "PERSIST" journaling mode using the <a href="pragma.html#pragma_journal_mode">journal_mode</a> PRAGMA.For example:</p><blockquote><pre>PRAGMA journal_mode=PERSIST;</per></blockquote><p>The use of persistent journal mode provide a noticeable performanceimprovement on many systems. Of course, the drawback is that the journal files remain on the disk, using disk space and clutteringdirectories, long after the transaction commits. The only safe wayto delete a persistent journal file is to commit a transactionwith journaling mode set to DELETE:</p><blockquote><pre>PRAGMA journal_mode=DELETE;BEGIN EXCLUSIVE;COMMIT;</per></blockquote><p>Beware of deleting persistent journal files by any other meanssince the journal file might be hot, in which case deleting it willcorrupt the corresponding database file.</p><p>Beginning in SQLite version 3.6.4, the TRUNCATE journal mode isalso supported:</p><blockquote><pre>PRAGMA journal_mode=TRUNCATE;</pre></blockquote><p>In truncate journal mode, the transaction is committed by truncatingthe journal file to zero length rather than deleting the journal file(as in DELETE mode) or by zeroing the header (as in PERSIST mode).TRUNCATE mode shares the advantage of PERSIST mode that the directorythat contains the journal file and database does not need to be updated.Hence truncating a file is often faster than deleting it. TRUNCATE hasthe additional advantage that it is not followed by asystem call (ex: fsync()) to synchronize the change to disk. It mightbe safer if it did.But on many modern filesystems, a truncate is an atomic andsynchronous operation and so we think that TRUNCATE will usually be safein the face of power failures. If you are uncertain about whether ornot TRUNCATE will be synchronous and atomic on your filesystem and it isimportant to you that your database survive a power loss or operatingsystem crash that occurs during the truncation operation, then you mightconsider using a different journaling mode.</p><p>On embedded systems with synchronous filesystems, TRUNCATE is resultsin slower behavior than PERSIST. The commit operation is the same speed.But subsequent transactions are slower following a TRUNCATE because it isfaster to overwrite existing content than to append to the end of a file.New journal file entries will always be appended following a TRUNCATE butwill usually overwrite with PERSIST.</p><h2>8.0 Testing Atomic Commit Behavior</h2><p>The developers of SQLite are confident that it is robustin the face of power failures and system crashes because theautomatic test procedures do extensive checks onthe ability of SQLite to recover from simulated power loss.We call these the "crash tests".</p><p>Crash tests in SQLite use a modified VFS that can simulatethe kinds of filesystem damage that occur during a powerloss or operating system crash. The crash-test VFS can simulateincomplete sector writes, pages filled with garbage data becausea write has not completed, and out of order writes, all occurringat varying points during a test scenario. Crash tests executetransactions over and over, varying the time at which a simulatedpower loss occurs and the properties of the damage inflicted.Each test then reopens the database after the simulated crash andverifies that the transaction either occurred completelyor not at all and that the database is in a completelyconsistent state.</p><p>The crash tests in SQLite have discovered a number of verysubtle bugs (now fixed) in the recovery mechanism. Some of these bugs were very obscure and unlikely to have been foundusing only code inspection and analysis techniques. From thisexperience, the developers of SQLite feel confident that any otherdatabase system that does not use a similar crash test systemlikely contains undetected bugs that will lead to databasecorruption following a system crash or power failure.</p><h2>9.0 Things That Can Go Wrong</h2><p>The atomic commit mechanism in SQLite has proven to be robust,but it can be circumvented by a sufficiently creativeadversary or a sufficiently broken operating system implementation.This section describes a few of the ways in which an SQLite databasemight be corrupted by a power failure or system crash.</p><h3>9.1 Broken Locking Implementations</h3><p>SQLite uses filesystem locks to make sure that only oneprocess and database connection is trying to modify the databaseat a time. The filesystem locking mechanism is implementedin the VFS layer and is different for every operating system.SQLite depends on this implementation being correct. If somethinggoes wrong and two or more processes are able to write the samedatabase file at the same time, severe damage can result.</p><p>We have received reports of implementations of bothWindows network filesystems and NFS in which locking wassubtly broken. We can not verify these reports, but aslocking is difficult to get right on a network filesystemwe have no reason to doubt them. You are advised to avoid using SQLite on a network filesystem in the first place,since performance will be slow. But if you must use a network filesystem to store SQLite database files, considerusing a secondary locking mechanism to prevent simultaneouswrites to the same database even if the native filesystemlocking mechanism malfunctions.</p><p>The versions of SQLite that come preinstalled on AppleMac OS X computers contain a version of SQLite that has beenextended to use alternative locking strategies that work onall network filesystems that Apple supports. These extensionsused by Apple work great as long as all processes are accessingthe database file in the same way. Unfortunately, the lockingmechanisms do not exclude one another, so if one process isaccessing a file using (for example) AFP locking and anotherprocess (perhaps on a different machine) is using dot-file locks,the two processes might collide because AFP locks do not excludedot-file locks or vice versa.</p><h3>9.2 Incomplete Disk Flushes</h3><p>SQLite uses the fsync() system call on Unix and the FlushFileBuffers()system call on w32 in order to sync the file system buffers onto diskoxide as shown in <a href="#section_3_7">step 3.7</a> and<a href="#section_3_10">step 3.10</a>. Unfortunately, we have receivedreports that neither of these interfaces works as advertised on manysystems. We hear that FlushFileBuffers() can be completely disabledusing registry settings on some Windows versions. Some historicalversions of Linux contain versions of fsync() which are no-ops onsome filesystems, we are told. Even on systems where FlushFileBuffers() and fsync() are said to be working, oftenthe IDE disk control lies and says that data has reached oxidewhile it is still held only in the volatile control cache.</p><p>On the Mac, you can set this pragma:</p><blockquote>PRAGMA fullfsync=ON;</blockquote><p>Setting fullfsync on a Mac will guarantee that data really doesget pushed out to the disk platter on a flush. But the implementationof fullfsync involves resetting the disk controller. And so not onlyis it profoundly slow, it also slows down other unrelated disk I/O.So its use is not recommended.</p><h3>9.3 Partial File Deletions</h3><p>SQLite assumes that file deletion is an atomic operation from thepoint of view of a user process. If power fails in the middle ofa file deletion, then after power is restored SQLite expects to seeeither the entire file with all of its original data intact, or itexpects not to find the file at all. Transactions may not be atomicon systems that do not work this way.</p><h3>9.4 Garbage Written Into Files</h3><p>SQLite database files are ordinary disk files that can beopened and written by ordinary user processes. A rogue processcan open an SQLite database and fill it with corrupt data. Corrupt data might also be introduced into an SQLite databaseby bugs in the operating system or disk controller; especiallybugs triggered by a power failure. There is nothing SQLite cando to defend against these kinds of problems.</p><h3>9.5 Deleting Or Renaming A Hot Journal</h3><p>If a crash or power loss does occur and a hot journal is left onthe disk, it is essential that the original database file and the hotjournal remain on disk with their original names until the databasefile is opened by another SQLite process and rolled back. During recovery at <a href="section_4_2">step 4.2</a> SQLite locatesthe hot journal by looking for a file in the same directory as thedatabase being opened and whose name is derived from the name of thefile being opened. If either the original database file or thehot journal have been moved or renamed, then the hot journal willnot be seen and the database will not be rolled back.</p><p>We suspect that a common failure mode for SQLite recovery happenslike this: A power failure occurs. After power is restored, a well-meaninguser or system administrator begins looking around on the disk fordamage. They see their database file named "important.data". This fileis perhaps familiar to them. But after the crash, there is also ahot journal named "important.data-journal". The user then deletesthe hot journal, thinking that they are helping to cleanup the system.We know of no way to prevent this other than user education.</p><p>If there are multiple (hard or symbolic) links to a database file,the journal will be created using the name of the link through whichthe file was opened. If a crash occurs and the database is opened againusing a different link, the hot journal will not be located and norollback will occur.</p><p>Sometimes a power failure will cause a filesystem to be corruptedsuch that recently changed filenames are forgotten and the file ismoved into a "/lost+found" directory. When that happens, the hotjournal will not be found and recovery will not occur.SQLite tries to prevent thisby opening and syncing the directory containing the rollback journalat the same time it syncs the journal file itself. However, themovement of files into /lost+found can be caused by unrelated processescreating unrelated files in the same directory as the main database file.And since this is out from under the control of SQLite, there is nothingthat SQLite can do to prevent it. If you are running on a system thatis vulnerable to this kind of filesystem namespace corruption (mostmodern journalling filesystems are immune, we believe) then you mightwant to consider putting each SQLite database file in its own privatesubdirectory.</p><h2>10.0 Future Directions And Conclusion</h2><p>Every now and then someone discovers a new failure mode forthe atomic commit mechanism in SQLite and the developers have toput in a patch. This is happening less and less and thefailure modes are becoming more and more obscure. But it wouldstill be foolish to suppose that the atomic commit logic ofSQLite is entirely bug-free. The developers are committed to fixingthese bugs as quickly as they might be found.</p><p>The developers are also on the lookout for new ways tooptimize the commit mechanism. The current VFS implementationsfor Unix (Linux and Mac OS X) and Windows make pessimistic assumptions aboutthe behavior of those systems. After consultation with expertson how these systems work, we might be able to relax some of theassumptions on these systems and allow them to run faster. Inparticular, we suspect that most modern filesystems exhibit thesafe append property and that many of them might support atomicsector writes. But until this is known for certain, SQLite willtake the conservative approac
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -