$PostgreSQL: pgsql/src/backend/storage/lmgr/README,v 1.22 2007/10/26 20:45:10 alvherre Exp $

LOCKING OVERVIEW

Postgres uses three types of interprocess locks:

* Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock.  Spinlocks are primarily used as infrastructure for lightweight
locks.  They are implemented using a hardware atomic-test-and-set
instruction, if available.  Waiting processes busy-loop until they can
get the lock.  There is no provision for deadlock detection, automatic
release on error, or any other nicety.  There is a timeout if the lock
cannot be gotten after a minute or so (which is approximately forever in
comparison to the intended lock hold time, so this is certainly an error
condition).

* Lightweight locks (LWLocks).  These locks are typically used to
interlock access to datastructures in shared memory.  LWLocks support
both exclusive and shared lock modes (for read/write and read-only
access to a shared object).  There is no provision for deadlock
detection, but the LWLock manager will automatically release held
LWLocks during elog() recovery, so it is safe to raise an error while
holding LWLocks.  Obtaining or releasing an LWLock is quite fast (a few
dozen instructions) when there is no contention for the lock.  When a
process has to wait for an LWLock, it blocks on a SysV semaphore so as
not to consume CPU time.  Waiting processes will be granted the lock in
arrival order.  There is no timeout.

* Regular locks (a/k/a heavyweight locks).  The regular lock manager
supports a variety of lock modes with table-driven semantics, and it has
full deadlock detection and automatic release at transaction end.
Regular locks should be used for all user-driven lock requests.

Acquisition of either a spinlock or a lightweight lock causes query
cancel and die() interrupts to be held off until all such locks are
released.
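The spinlock busy-loop described above can be sketched with C11 atomics.  This
is an illustration only, not PostgreSQL's implementation: the real code (in
s_lock.h/s_lock.c) uses platform-specific test-and-set instructions, backs off
between attempts, and reports a stuck spinlock after the minute-scale timeout.

```c
#include <stdatomic.h>

/*
 * Minimal test-and-set spinlock sketch (illustration only; the real
 * implementation lives in src/backend/storage/lmgr/s_lock.c).
 */
typedef struct
{
    atomic_flag held;
} SpinLockSketch;

static void
spin_init(SpinLockSketch *lock)
{
    atomic_flag_clear(&lock->held);
}

static void
spin_acquire(SpinLockSketch *lock)
{
    /* Busy-loop until the flag was previously clear. */
    while (atomic_flag_test_and_set(&lock->held))
        ;                       /* no deadlock detection, no release on error */
}

static void
spin_release(SpinLockSketch *lock)
{
    atomic_flag_clear(&lock->held);
}
```

Because a waiter burns CPU the whole time it spins, the protected critical
section must stay at a few dozen instructions, which is why spinlocks serve
only as infrastructure for LWLocks.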
No such restriction exists for regular locks, however.  Also
note that we can accept query cancel and die() interrupts while waiting
for a regular lock, but we will not accept them while waiting for
spinlocks or LW locks.  It is therefore not a good idea to use LW locks
when the wait time might exceed a few seconds.

The rest of this README file discusses the regular lock manager in detail.

LOCK DATA STRUCTURES

Lock methods describe the overall locking behavior.  Currently there are
two lock methods: DEFAULT and USER.

Lock modes describe the type of the lock (read/write or shared/exclusive).
In principle, each lock method can have its own set of lock modes with
different conflict rules, but currently DEFAULT and USER methods use
identical lock mode sets.  See src/tools/backend/index.html and
src/include/storage/lock.h for more details.  (Lock modes are also called
lock types in some places in the code and documentation.)

There are two fundamental lock structures in shared memory: the
per-lockable-object LOCK struct, and the per-lock-and-requestor PROCLOCK
struct.  A LOCK object exists for each lockable object that currently has
locks held or requested on it.  A PROCLOCK struct exists for each backend
that is holding or requesting lock(s) on each LOCK object.

In addition to these, each backend maintains an unshared LOCALLOCK structure
for each lockable object and lock mode that it is currently holding or
requesting.  The shared lock structures only allow a single lock grant to
be made per lockable object/lock mode/backend.  Internally to a backend,
however, the same lock may be requested and perhaps released multiple times
in a transaction, and it can also be held both transactionally and
session-wide.
The internal request counts are held in LOCALLOCK so that the shared
data structures need not be accessed to alter them.

---------------------------------------------------------------------------

The lock manager's LOCK objects contain:

tag -
    The key fields that are used for hashing locks in the shared memory
    lock hash table.  The contents of the tag essentially define an
    individual lockable object.  See include/storage/lock.h for details
    about the supported types of lockable objects.  This is declared as a
    separate struct to ensure that we always zero out the correct number
    of bytes.  It is critical that any alignment-padding bytes the
    compiler might insert in the struct be zeroed out, else the hash
    computation will be random.  (Currently, we are careful to define
    struct LOCKTAG so that there are no padding bytes.)

grantMask -
    This bitmask indicates what types of locks are currently held on the
    given lockable object.  It is used (against the lock table's conflict
    table) to determine if a new lock request will conflict with existing
    lock types held.  Conflicts are determined by bitwise AND operations
    between the grantMask and the conflict table entry for the requested
    lock type.  Bit i of grantMask is 1 if and only if granted[i] > 0.

waitMask -
    This bitmask shows the types of locks being waited for.  Bit i of
    waitMask is 1 if and only if requested[i] > granted[i].

procLocks -
    This is a shared memory queue of all the PROCLOCK structs associated
    with the lock object.  Note that both granted and waiting PROCLOCKs
    are in this list (indeed, the same PROCLOCK might have some
    already-granted locks and be waiting for more!).

waitProcs -
    This is a shared memory queue of all PGPROC structures corresponding
    to backends that are waiting (sleeping) until another backend releases
    this lock.  The process structure holds the information needed to
    determine if it should be woken up when the lock is released.

nRequested -
    Keeps a count of how many times this lock has been attempted to be
    acquired.
    The count includes attempts by processes which were put to sleep due
    to conflicts.  It also counts the same backend twice if, for example,
    a backend process first acquires a read and then acquires a write.
    (But multiple acquisitions of the same lock/lock mode within a
    backend are not multiply counted here; they are recorded only in the
    backend's LOCALLOCK structure.)

requested -
    Keeps a count of how many locks of each type have been attempted.
    Only elements 1 through MAX_LOCKMODES-1 are used as they correspond
    to the lock type defined constants.  Summing the values of
    requested[] should come out equal to nRequested.

nGranted -
    Keeps count of how many times this lock has been successfully
    acquired.  This count does not include attempts that are waiting due
    to conflicts.  Otherwise the counting rules are the same as for
    nRequested.

granted -
    Keeps count of how many locks of each type are currently held.  Once
    again only elements 1 through MAX_LOCKMODES-1 are used (0 is not).
    Also, like requested[], summing the values of granted[] should total
    to the value of nGranted.

We should always have 0 <= nGranted <= nRequested, and
0 <= granted[i] <= requested[i] for each i.  When all the request counts
go to zero, the LOCK object is no longer needed and can be freed.

---------------------------------------------------------------------------

The lock manager's PROCLOCK objects contain:

tag -
    The key fields that are used for hashing entries in the shared memory
    PROCLOCK hash table.  This is declared as a separate struct to ensure
    that we always zero out the correct number of bytes.  It is critical
    that any alignment-padding bytes the compiler might insert in the
    struct be zeroed out, else the hash computation will be random.
    (Currently, we are careful to define struct PROCLOCKTAG so that there
    are no padding bytes.)

    tag.myLock
        Pointer to the shared LOCK object this PROCLOCK is for.

    tag.myProc
        Pointer to the PGPROC of backend process that owns this PROCLOCK.
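The grantMask conflict test and the count invariants described for LOCK
objects above can be sketched as follows.  The two-mode conflict table here is
invented for illustration; the real table, covering all the lock modes, is
defined in src/backend/storage/lmgr/lock.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-mode lock method, for illustration only. */
enum { NoLock, ShareLock, ExclusiveLock, MAX_MODES };

#define LOCKBIT_ON(mode) ((uint32_t) 1 << (mode))

/* conflictTab[m] = set of granted modes that conflict with a request for m. */
static const uint32_t conflictTab[MAX_MODES] = {
    [ShareLock]     = LOCKBIT_ON(ExclusiveLock),
    [ExclusiveLock] = LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ExclusiveLock),
};

typedef struct
{
    int      requested[MAX_MODES];  /* requests per mode, incl. waiters */
    int      granted[MAX_MODES];    /* grants per mode, currently held  */
    uint32_t grantMask;             /* bit i set iff granted[i] > 0 */
    uint32_t waitMask;              /* bit i set iff requested[i] > granted[i] */
} LockSketch;

/* The conflict test: AND the request's conflict set against grantMask. */
static bool
conflicts(const LockSketch *lock, int mode)
{
    return (conflictTab[mode] & lock->grantMask) != 0;
}

/* Record a request; grant it only if no already-granted mode conflicts. */
static bool
try_grant(LockSketch *lock, int mode)
{
    lock->requested[mode]++;
    if (conflicts(lock, mode))
    {
        lock->waitMask |= LOCKBIT_ON(mode);   /* requested[i] > granted[i] */
        return false;                         /* caller would sleep */
    }
    lock->granted[mode]++;
    lock->grantMask |= LOCKBIT_ON(mode);
    return true;
}
```

Two ShareLock requests are both granted; a subsequent ExclusiveLock request
fails the AND test against a grantMask containing ShareLock and merely sets
its waitMask bit, preserving 0 <= granted[i] <= requested[i] throughout.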
    Note: it's OK to use pointers here because a PROCLOCK never outlives
    either its lock or its proc.  The tag is therefore unique for as long
    as it needs to be, even though the same tag values might mean
    something else at other times.

holdMask -
    A bitmask for the lock modes successfully acquired by this PROCLOCK.
    This should be a subset of the LOCK object's grantMask, and also a
    subset of the PGPROC object's heldLocks mask (if the PGPROC is
    currently waiting for another lock mode on this lock).

releaseMask -
    A bitmask for the lock modes due to be released during LockReleaseAll.
    This must be a subset of the holdMask.  Note that it is modified
    without taking the partition LWLock, and therefore it is unsafe for
    any backend except the one owning the PROCLOCK to examine/change it.

lockLink -
    List link for shared memory queue of all the PROCLOCK objects for
    the same LOCK.

procLink -
    List link for shared memory queue of all the PROCLOCK objects for
    the same backend.

---------------------------------------------------------------------------

LOCK MANAGER INTERNAL LOCKING

Before PostgreSQL 8.2, all of the shared-memory data structures used by
the lock manager were protected by a single LWLock, the LockMgrLock;
any operation involving these data structures had to exclusively lock
LockMgrLock.  Not too surprisingly, this became a contention bottleneck.
To reduce contention, the lock manager's data structures have been split
into multiple "partitions", each protected by an independent LWLock.
Most operations only need to lock the single partition they are working in.
Here are the details:

* Each possible lock is assigned to one partition according to a hash of
The partition's LWLock is considered to protect all theLOCK objects of that partition as well as their subsidiary PROCLOCKs.* The shared-memory hash tables for LOCKs and PROCLOCKs are organizedso that different partitions use different hash chains, and thus thereis no conflict in working with objects in different partitions. Thisis supported directly by dynahash.c's "partitioned table" mechanismfor the LOCK table: we need only ensure that the partition number istaken from the low-order bits of the dynahash hash value for the LOCKTAG.To make it work for PROCLOCKs, we have to ensure that a PROCLOCK's hashvalue has the same low-order bits as its associated LOCK. This requiresa specialized hash function (see proclock_hash).* Formerly, each PGPROC had a single list of PROCLOCKs belonging to it.This has now been split into per-partition lists, so that access to aparticular PROCLOCK list can be protected by the associated partition'sLWLock. (This is not strictly necessary at the moment, because at thiswriting a PGPROC's PROCLOCK list is only accessed by the owning backendanyway. But it seems forward-looking to maintain a convention for howother backends could access it. In any case LockReleaseAll needs to beable to quickly determine which partition each LOCK belongs to, andfor the currently contemplated number of partitions, this way takes lessshared memory than explicitly storing a partition number in LOCK structswould require.)* The other lock-related fields of a PGPROC are only interesting whenthe PGPROC is waiting for a lock, so we consider that they are protectedby the partition LWLock of the awaited lock.For normal lock acquisition and release, it is sufficient to lock thepartition containing the desired lock. Deadlock checking needs to touchmultiple partitions in general; for simplicity, we just make it lock allthe partitions in partition-number order. 
(To prevent LWLock deadlock,
we establish the rule that any backend needing to lock more than one
partition at once must lock them in partition-number order.)  It's
possible that deadlock checking could be done without touching every
partition in typical cases, but since in a properly functioning system
deadlock checking should not occur often enough to be performance-critical,
trying to make this work does not seem a productive use of effort.

A backend's internal LOCALLOCK hash table is not partitioned.  We do store
a copy of the locktag hash code in LOCALLOCK table entries, from which the
partition number can be computed, but this is a straight speed-for-space
tradeoff: we could instead recalculate the partition number from the LOCKTAG
when needed.

THE DEADLOCK DETECTION ALGORITHM

Since we allow user transactions to request locks in any order, deadlock
is possible.  We use a deadlock detection/breaking algorithm that is
fairly standard in essence, but there are many special considerations
needed to deal with Postgres' generalized locking model.

A key design consideration is that we want to make routine operations
(lock grant and release) run quickly when there is no deadlock, and
avoid the overhead of deadlock handling as much as possible.  We do this