$Header: /cvsroot/pgsql/src/backend/storage/lmgr/README,v 1.13 2003/02/18 03:33:50 momjian Exp $

LOCKING OVERVIEW

Postgres uses three types of interprocess locks:

* Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock.  Spinlocks are primarily used as infrastructure for lightweight
locks.  They are implemented using a hardware atomic-test-and-set
instruction, if available.  Waiting processes busy-loop until they can
get the lock.  There is no provision for deadlock detection, automatic
release on error, or any other nicety.  There is a timeout if the lock
cannot be gotten after a minute or so (which is approximately forever in
comparison to the intended lock hold time, so this is certainly an error
condition).

* Lightweight locks (LWLocks).  These locks are typically used to
interlock access to data structures in shared memory.  LWLocks support
both exclusive and shared lock modes (for read/write and read-only
access to a shared object).  There is no provision for deadlock
detection, but the LWLock manager will automatically release held
LWLocks during elog() recovery, so it is safe to raise an error while
holding LWLocks.  Obtaining or releasing an LWLock is quite fast (a few
dozen instructions) when there is no contention for the lock.  When a
process has to wait for an LWLock, it blocks on a SysV semaphore so as
to not consume CPU time.  Waiting processes will be granted the lock in
arrival order.  There is no timeout.

* Regular locks (a/k/a heavyweight locks).  The regular lock manager
supports a variety of lock modes with table-driven semantics, and it has
full deadlock detection and automatic release at transaction end.
Regular locks should be used for all user-driven lock requests.

Acquisition of either a spinlock or a lightweight lock causes query
cancel and die() interrupts to be held off until all such locks are
released.
No such restriction exists for regular locks, however.  Also
note that we can accept query cancel and die() interrupts while waiting
for a regular lock, but we will not accept them while waiting for
spinlocks or LW locks.  It is therefore not a good idea to use LW locks
when the wait time might exceed a few seconds.

The rest of this README file discusses the regular lock manager in detail.

LOCK DATA STRUCTURES

There are two fundamental lock structures: the per-lockable-object LOCK
struct, and the per-lock PROCLOCK struct.  A LOCK object exists
for each lockable object that currently has locks held or requested on it.
A PROCLOCK struct exists for each transaction that is holding or requesting
lock(s) on each LOCK object.

Lock methods describe the overall locking behavior.  Currently there are
two lock methods: DEFAULT and USER.  (USER locks are non-blocking.)

Lock modes describe the type of the lock (read/write or shared/exclusive).
See src/tools/backend/index.html and src/include/storage/lock.h for more
details.

---------------------------------------------------------------------------

The lock manager's LOCK objects contain:

tag -
    The key fields that are used for hashing locks in the shared memory
    lock hash table.  This is declared as a separate struct to ensure that
    we always zero out the correct number of bytes.  It is critical that
    any alignment-padding bytes the compiler might insert in the struct
    be zeroed out, else the hash computation will be random.

    tag.relId -
        Uniquely identifies the relation that the lock corresponds to.

    tag.dbId -
        Uniquely identifies the database in which the relation lives.  If
        this is a shared system relation (e.g. pg_database) the dbId must
        be set to 0.

    tag.objId -
        Uniquely identifies the block/page within the relation and the
        tuple within the block.
        If we are setting a table level lock, both the blockId and
        tupleId (in an item pointer this is called the position) are set
        to invalid; if it is a page level lock, the blockId is valid,
        while the tupleId is still invalid.  Finally, if this is a tuple
        level lock (we currently never do this) then both the blockId and
        tupleId are set to valid specifications.  This is how we get the
        appearance of a multi-level lock table while using only a single
        table (see Gray's paper on two-phase locking if you are puzzled
        about how multi-level lock tables work).

grantMask -
    This bitmask indicates what types of locks are currently held on the
    given lockable object.  It is used (against the lock table's conflict
    table) to determine if a new lock request will conflict with existing
    lock types held.  Conflicts are determined by bitwise AND operations
    between the grantMask and the conflict table entry for the requested
    lock type.  Bit i of grantMask is 1 if and only if granted[i] > 0.

waitMask -
    This bitmask shows the types of locks being waited for.  Bit i of
    waitMask is 1 if and only if requested[i] > granted[i].

lockHolders -
    This is a shared memory queue of all the PROCLOCK structs associated
    with the lock object.  Note that both granted and waiting PROCLOCKs
    are in this list (indeed, the same PROCLOCK might have some
    already-granted locks and be waiting for more!).

waitProcs -
    This is a shared memory queue of all process structures corresponding
    to a backend that is waiting (sleeping) until another backend releases
    this lock.  The process structure holds the information needed to
    determine if it should be woken up when this lock is released.

nRequested -
    Keeps a count of how many times this lock has been attempted to be
    acquired.  The count includes attempts by processes which were put to
    sleep due to conflicts.
    It also counts the same backend twice if, for example, a
    backend process first acquires a read and then acquires a write, or
    acquires a read lock twice.

requested -
    Keeps a count of how many locks of each type have been attempted.
    Only elements 1 through MAX_LOCKMODES-1 are used, as they correspond
    to the lock type defined constants.  Summing the values of
    requested[] should come out equal to nRequested.

nGranted -
    Keeps count of how many times this lock has been successfully
    acquired.  This count does not include attempts that are waiting due
    to conflicts, but can count the same backend twice (e.g. a read then
    a write -- since it's the same transaction this won't cause a
    conflict).

granted -
    Keeps count of how many locks of each type are currently held.  Once
    again, only elements 1 through MAX_LOCKMODES-1 are used (0 is not).
    Also, like requested[], summing the values of granted[] should total
    to the value of nGranted.

We should always have 0 <= nGranted <= nRequested, and
0 <= granted[i] <= requested[i] for each i.  If the request counts go to
zero, the lock object is no longer needed and can be freed.

---------------------------------------------------------------------------

The lock manager's PROCLOCK objects contain:

tag -
    The key fields that are used for hashing entries in the shared memory
    PROCLOCK hash table.  This is declared as a separate struct to ensure
    that we always zero out the correct number of bytes.

    tag.lock -
        SHMEM offset of the LOCK object this PROCLOCK is for.

    tag.proc -
        SHMEM offset of the PROC of the backend process that owns this
        PROCLOCK.

    tag.xid -
        XID of the transaction this PROCLOCK is for, or
        InvalidTransactionId if the PROCLOCK is for session-level locking.

Note that this structure will support multiple transactions running
concurrently in one backend, which may be handy if we someday decide to
support nested transactions.  Currently, the XID field is only needed to
distinguish per-transaction locks from session locks.
User locks are always session locks, and we also use session locks for
multi-transaction operations like VACUUM.

holding -
    The number of successfully acquired locks of each type for this
    PROCLOCK.  This should be <= the corresponding granted[] value of the
    lock object!

nHolding -
    Sum of the holding[] array.

lockLink -
    List link for the shared memory queue of all the PROCLOCK objects for
    the same LOCK.

procLink -
    List link for the shared memory queue of all the PROCLOCK objects for
    the same backend.

---------------------------------------------------------------------------

The deadlock detection algorithm:

Since we allow user transactions to request locks in any order, deadlock
is possible.  We use a deadlock detection/breaking algorithm that is
fairly standard in essence, but there are many special considerations
needed to deal with Postgres' generalized locking model.

A key design consideration is that we want to make routine operations
(lock grant and release) run quickly when there is no deadlock, and
avoid the overhead of deadlock handling as much as possible.  We do this
using an "optimistic waiting" approach: if a process cannot acquire the
lock it wants immediately, it goes to sleep without any deadlock check.
But it also sets a delay timer, with a delay of DeadlockTimeout
milliseconds (typically set to one second).  If the delay expires before
the process is granted the lock it wants, it runs the deadlock
detection/breaking code.  Normally this code will determine that there is
no deadlock condition, and then the process will go back to sleep and
wait quietly until it is granted the lock.  But if a deadlock condition
does exist, it will be resolved, usually by aborting the detecting
process' transaction.  In this way, we avoid deadlock handling overhead
whenever the wait time for a lock is less than DeadlockTimeout, while
not imposing an unreasonable delay of detection when there is an error.

Lock acquisition (routines LockAcquire and ProcSleep) follows these rules:

1.
A lock request is granted immediately if it does not conflict with
any existing or waiting lock request, or if the process already holds an