📄 atomiccommit.html
字号:
lock allows two or more database connections to read from thedatabase file at the same time. But a shared lock preventsanother database connection from writing to the database filewhile we are reading it. This is necessary because if anotherdatabase connection were writing to the database file at thesame time we are reading from the database file, we might readsome data before the change and other data after the change.This would make it appears as if the change made by the otherprocess is not atomic.</p><p>Notice that the shared lock is on the operating systemdisk cache, not on the disk itself. File locksreally are just flags within the operating system kernel,usually. (The details depend on the specific OS layerinterface.) Hence, the lock will instantly vanish if theoperating system crashes or if there is a power loss. Itis usually also the case that the lock will vanish if theprocess that created the lock exits.</p><br clear="both"><a name="section_3_3"></a><h3>3.3 Reading Information Out Of The Database</h3><img src="images/ac/commit-2.gif" align="right" hspace="15"><p>After the shared lock is acquired, we can begin readinginformation from the database file. In this scenario, weare assuming a cold cache, so information must first beread from mass storage into the operating system cache thentransferred from operating system cache into user space.On subsequent reads, some or all of the information mightalready be found in the operating system cache and so onlythe transfer to user space would be required.</p><p>Usually only a subset of the pages in the database fileare read. In this example we are showing threepages out of eight being read. In a typical application, adatabase will have thousands of pages and a query will normallyonly touch a small percentage of those pages.</p><br clear="both"><h3>3.4 Obtaining A Reserved Lock</h3><img src="images/ac/commit-3.gif" align="right" hspace="15"><p>Before making changes to the database, SQLite firstobtains a "reserved" lock on the database file. A reservedlock is similar to a shared lock in that both a reserved lockand shared lock allow other processes to read from the databasefile. A single reserve lock can coexist with multiple sharedlocks from other processes. However, there can only be asingle reserved lock on the database file. Hence only asingle process can be attempting to write to the databaseat one time.</p><p>The idea behind a reserved locks is that it signals thata process intends to modify the database file in the nearfuture but has not yet started to make the modifications.And because the modifications have not yet started, otherprocesses can continue to read from the database. However,no other process should also begin trying to write to thedatabase.</p><br clear="both"><a name="section_3_5"></a><h3>3.5 Creating A Rollback Journal File</h3><img src="images/ac/commit-4.gif" align="right" hspace="15"><p>Prior to making any changes to the database file, SQLite firstcreates a separate rollback journal file and writes into the rollback journal the originalcontent of the database pages that are to be altered.The idea behind the rollback journal is that it containsall information needed to restore the database back to its original state.</p><p>The rollback journal contains a small header (shown in greenin the diagram) that records the original size of the databasefile. So if a change causes the database file to grow, wewill still know the original size of the database. The pagenumber is stored together with each database page that is written into the rollback journal.</p><p>When a new file is created, most desktop operating systems(windows, linux, macOSX) will not actually write anything todisk. The new file is created in the operating systems diskcache only. The file is not created on mass storage until sometimelater, when the operating system has a spare moment. This createsthe impression to users that I/O is happening much faster thanis possible when doing real disk I/O. We illustrate this idea inthe diagram to the right by showing that the new rollback journalappears in the operating system disk cache only and not on thedisk itself.</p><br clear="both"><a name="section_3_6"></a><h3>3.6 Changing Database Pages In User Space</h3><img src="images/ac/commit-5.gif" align="right" hspace="15"><p>After the original page content has been saved in the rollbackjournal, the pages can be modified in user memory. Each databaseconnection has its own private copy of user space, so the changesthat are made in user space are only visible to the database connectionthat is making the changes. Other database connections still seethe information in operating system disk cache buffers which havenot yet been changed. And so even though one process is busymodifying the database, other processes can continue to read theirown copies of the original database content.</p><br clear="both"><a name="section_3_7"></a><h3>3.7 Flushing The Rollback Journal File To Mass Storage</h3><img src="images/ac/commit-6.gif" align="right" hspace="15"><p>The next step is to flush the content of the rollback journalfile to nonvolatile storage.As we will see later, this is a critical step in insuring that the database can survivean unexpected power loss.This step also takes a lot of time, since writing to nonvolatilestorage is normally a slow operation.</p><p>This step is usually more complicated than simply flushingthe rollback journal to the disk. On most platforms two separateflush (or fsync()) operations are required. The first flush writesout the base rollback journal content. Then the header of therollback journal is modified to show the number of pages in the rollback journal. Then the header is flushed to disk. The detailson why we do this header modification and extra flush are providedin a later section of this paper.</p><br clear="both"><a name="section_3_8"></a><h3>3.8 Obtaining An Exclusive Lock</h3><img src="images/ac/commit-7.gif" align="right" hspace="15"><p>Prior to making changes to the database file itself, we mustobtain an exclusive lock on the database file. Obtaining anexclusive lock is really a two-step process. First SQLite obtainsa "pending" lock. Then it escalates the pending lock to anexclusive lock.</p><p>A pending lock allows other processes that already have ashared lock to continue reading the database file. But itprevents new shared locks from being established. The ideabehind a pending lock is to prevent writer starvation causedby a large pool of readers. There might be dozens, even hundreds,of other processes trying to read the database file. Each processacquires a shared lock before it starts reading, reads what itneeds, then releases the shared lock. If, however, there aremany different processes all reading from the same database, itmight happen that a new process always acquires its shared lock beforethe previous process releases its shared lock. And so there isnever an instant when there are no shared locks on the databasefile and hence there is never an opportunity for the writer toseize the exclusive lock. A pending lock is designed to preventthat cycle by allowing existing shared locks to proceed butblocking new shared locks from being established. Eventuallyall shared locks will clear and the pending lock will then beable to escalate into an exclusive lock.</p><br clear="both"><a name="section_3_9"></a><h3>3.9 Writing Changes To The Database File</h3><img src="images/ac/commit-8.gif" align="right" hspace="15"><p>Once an exclusive lock is held, we know that no otherprocesses are reading from the database file and it issafe to write changes into the database file. Usuallythose changes only go as far as the operating systems diskcache and do not make it all the way to mass storage.</p><br clear="both"><a name="section_3_10"></a><h3>3.10 Flushing Changes To Mass Storage</h3><img src="images/ac/commit-9.gif" align="right" hspace="15"><p>Another flush must occur to make sure that all thedatabase changes are written into nonvolatile storage.This is a critical step to insure that the database willsurvive a power loss without damage. However, becauseof the inherent slowness of writing to disk or flash memory, this step together with the rollback journal file flush in section3.7 above takes up most the time required to complete atransaction commit in SQLite.</p><br clear="both"><a name="section_3_11"></a><h3>3.11 Deleting The Rollback Journal</h3><img src="images/ac/commit-A.gif" align="right" hspace="15"><p>After the database changes are all safely on the massstorage device, the rollback journal file is deleted.This is the instant where the transaction commits.If a power failure or system crash occurs prior to thispoint, then recovery processes to be described later makeit appears as if no changes were ever made to the databasefile. If a power failure or system crash occurs afterthe rollback journal is deleted, then it appears as ifall changes have been written to disk. Thus, SQLite givesthe appearance of having made no changes to the databasefile or having made the complete set of changes to thedatabase file depending on whether or not the rollbackjournal file exists.</p><p>Deleting a file is not really an atomic operation, butit appears to be from the point of view of a user process.A process is always able to ask the operating system "doesthis file exist?" and the process will get back a yes or noanswer. After a power failure that occurs during a transaction commit, SQLite will ask the operating systemwhether or not the rollback journal file exists. If theanswer is "yes" then the transaction is incomplete and isrolled back. If the answer is "no" then it means the transactiondid commit.</p><p>The existence of a transaction depends on whether ornot the rollback journal file exists and the deletionof a file appears to be an atomic operation from the point ofview of a user-space process. Therefore, a transaction appears to be an atomic operation.</p><p>The act of deleting a file is expensive on many systems.As an optimization, SQLite can be configured to truncatethe journal file to zero bytes in lengthor overwrite the journal file header with zeros. In eithercase, the resulting journal file is no longer capable of rollingback and so the transaction still commits. Truncating a fileto zero length, like deleting a file, is assumed to be an atomicoperation from the point of view of a user process. Overwritingthe header of the journal with zeros is not atomic, but if anypart of the header is malformed the journal will not roll back.Hence, one can say that the commit occurs as soon as the headeris sufficiently changed to make it invalid. Typically this happensas soon as the first byte of the header is zeroed.</p><br clear="both"><a name="section_3_12"></a><h3>3.12 Releasing The Lock</h3><img src="images/ac/commit-B.gif" align="right" hspace="15"><p>The last step in the commit process is to release theexclusive lock so that other processes can once againstart accessing the database file.</p><p>In the diagram at the right, we show that the informationthat was held in user space is cleared when the lock is released.This used to be literally true for older versions of SQLite. Butmore recent versions of SQLite keep the user space informationin memory in case it might be needed again at the start of thenext transaction. It is cheaper to reuse information that isalready in local memory than to transfer the information backfrom the operating system disk cache or to read it off of thedisk drive again. Prior to reusing the information in user space,we must first reacquire the shared lock and then we have to checkto make sure that no other process modified the database file whilewe were not holding a lock. There is a counter in the first pageof the database that is incremented every time the database fileis modified. We can find out if another process has modified thedatabase by checking that counter. If the database was modified,then the user space cache must be cleared and reread. But it iscommonly the case that no changes have been made and the userspace cache can be reused for a significant performance savings.</p><br clear="both">
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -