
readme

From the PostgreSQL 8.1.4 source code, an open-source database system for Linux
It is clearly impossible for readers and inserters to deadlock, and in
fact this algorithm allows them a very high degree of concurrency.  (The
exclusive metapage lock taken to update the tuple count is stronger than
necessary, since readers do not care about the tuple count, but the lock
is held for such a short time that this is probably not an issue.)

When an inserter cannot find space in any existing page of a bucket, it
must obtain an overflow page and add that page to the bucket's chain.
Details of that part of the algorithm appear later.

The page split algorithm is entered whenever an inserter observes that
the index is overfull (has a higher-than-wanted ratio of tuples to
buckets).  The algorithm attempts, but does not necessarily succeed, to
split one existing bucket in two, thereby lowering the fill ratio:

	exclusive-lock page 0 (assert the right to begin a split)
	read/exclusive-lock meta page
	check split still needed
	if split not needed anymore, drop locks and exit
	decide which bucket to split
	Attempt to X-lock old bucket number (definitely could fail)
	Attempt to X-lock new bucket number (shouldn't fail, but...)
	if above fail, drop locks and exit
	update meta page to reflect new number of buckets
	write/release meta page
	release X-lock on page 0
	-- now, accesses to all other buckets can proceed.
	Perform actual split of bucket, moving tuples as needed
	>> see below about acquiring needed extra space
	Release X-locks of old and new buckets

Note the page zero and metapage locks are not held while the actual
tuple rearrangement is performed, so accesses to other buckets can
proceed in parallel; in fact, it's possible for multiple bucket splits
to proceed in parallel.

Split's attempt to X-lock the old bucket number could fail if another
process holds S-lock on it.  We do not want to wait if that happens,
first because we don't want to wait while holding the metapage
exclusive-lock, and second because it could very easily result in
deadlock.
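The "try the lock, and punt rather than wait" behaviour can be sketched
in Python as a toy model.  (The real code is C using the lock manager's
conditional LockAcquire; the names here are hypothetical, and Python's
Lock has no shared mode, so this models only the non-blocking aspect.)

```python
import threading

def try_split(old_bucket_lock, new_bucket_lock, move_tuples):
    """Attempt to X-lock both buckets; give up rather than wait.

    blocking=False makes acquire() fail immediately instead of
    blocking, so a conflicting holder simply aborts this split
    attempt -- the index stays overfull but perfectly functional,
    and a later inserter will retry.
    """
    if not old_bucket_lock.acquire(blocking=False):
        return False  # another process holds the old bucket: punt
    if not new_bucket_lock.acquire(blocking=False):
        old_bucket_lock.release()
        return False  # "shouldn't fail, but..."
    try:
        move_tuples()  # actual tuple rearrangement happens here
    finally:
        new_bucket_lock.release()
        old_bucket_lock.release()
    return True
```

Because nothing ever waits while holding another lock in this path, the
deadlock scenario described above simply cannot arise.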
(The other process might be out of the hash AM altogether, and could do
something that blocks on another lock this process holds; so even if the
hash algorithm itself is deadlock-free, a user-induced deadlock could
occur.)  So, this is a conditional LockAcquire operation, and if it
fails we just abandon the attempt to split.  This is all right since the
index is overfull but perfectly functional.  Every subsequent inserter
will try to split, and eventually one will succeed.  If multiple
inserters failed to split, the index might still be overfull, but
eventually, the index will not be overfull and split attempts will stop.
(We could make a successful splitter loop to see if the index is still
overfull, but it seems better to distribute the split overhead across
successive insertions.)

A problem is that if a split fails partway through (eg due to
insufficient disk space) the index is left corrupt.  The probability of
that could be made quite low if we grab a free page or two before we
update the meta page, but the only real solution is to treat a split as
a WAL-loggable, must-complete action.  I'm not planning to teach hash
about WAL in this go-round.

The fourth operation is garbage collection (bulk deletion):

	next bucket := 0
	read/sharelock meta page
	fetch current max bucket number
	release meta page
	while next bucket <= max bucket do
		Acquire X lock on target bucket
		Scan and remove tuples, compact free space as needed
		Release X lock
		next bucket ++
	end loop
	exclusive-lock meta page
	check if number of buckets changed
	if so, release lock and return to for-each-bucket loop
	else update metapage tuple count
	write/release meta page

Note that this is designed to allow concurrent splits.  If a split
occurs, tuples relocated into the new bucket will be visited twice by
the scan, but that does no harm.  (We must however be careful about the
statistics reported by the VACUUM operation.
What we can do is count the number of tuples scanned, and believe this
in preference to the stored tuple count if the stored tuple count and
number of buckets did *not* change at any time during the scan.  This
provides a way of correcting the stored tuple count if it gets out of
sync for some reason.  But if a split or insertion does occur
concurrently, the scan count is untrustworthy; instead, subtract the
number of tuples deleted from the stored tuple count and use that.)

The exclusive lock request could deadlock in some strange scenarios, but
we can just error out without any great harm being done.


Free space management
---------------------

Free space management consists of two sub-algorithms, one for reserving
an overflow page to add to a bucket chain, and one for returning an
empty overflow page to the free pool.

Obtaining an overflow page:

	read/exclusive-lock meta page
	determine next bitmap page number; if none, exit loop
	release meta page lock
	read/exclusive-lock bitmap page
	search for a free page (zero bit in bitmap)
	if found:
		set bit in bitmap
		write/release bitmap page
		read/exclusive-lock meta page
		if first-free-bit value did not change,
			update it and write meta page
		release meta page
		return page number
	else (not found):
	release bitmap page
	loop back to try next bitmap page, if any
-- here when we have checked all bitmap pages; we hold meta excl. lock
	extend index to add another overflow page; update meta information
	write/release meta page
	return page number

It is slightly annoying to release and reacquire the metapage lock
multiple times, but it seems best to do it that way to minimize loss of
concurrency against processes just entering the index.  We don't want to
hold the metapage exclusive lock while reading in a bitmap page.  (We
can at least avoid repeated buffer pin/unpin here.)

The normal path for extending the index does not require doing I/O while
holding the metapage lock.
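The bitmap search inside the free-page-acquire loop above can be
sketched as follows.  This is an illustrative model only (function and
variable names are hypothetical, not PostgreSQL's): each bitmap page is
modelled as a Python int, where a 0 bit means the corresponding overflow
page is free, matching the "zero bit in bitmap" convention above.

```python
def find_free_overflow_page(bitmap_pages, bits_per_page):
    """Scan bitmap pages for a zero bit and claim that overflow page.

    bitmap_pages: mutable list of ints, one per bitmap page; a 0 bit
    means the corresponding overflow page is free.  Returns the overall
    overflow-page number claimed, or None if every bit is set (the real
    code would then extend the index to add a new overflow page).
    """
    for page_no, bits in enumerate(bitmap_pages):
        for bit in range(bits_per_page):
            if not (bits >> bit) & 1:
                # set the bit, i.e. mark the page as in use
                bitmap_pages[page_no] = bits | (1 << bit)
                return page_no * bits_per_page + bit
    return None
```

In the real algorithm each bitmap page is locked, searched, and released
before moving to the next, which is what forces the release/reacquire
dance on the metapage lock described above.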
We do have to do I/O when the extension requires adding a new bitmap
page as well as the required overflow page ... but that is an infrequent
case, so the loss of concurrency seems acceptable.

The portion of tuple insertion that calls the above subroutine looks
like this:

	-- having determined that no space is free in the target bucket:
	remember last page of bucket, drop write lock on it
	call free-page-acquire routine
	re-write-lock last page of bucket
	if it is not last anymore, step to the last page
	update (former) last page to point to new page
	write-lock and initialize new page, with back link to former last page
	write and release former last page
	insert tuple into new page
	-- etc.

Notice this handles the case where two concurrent inserters try to
extend the same bucket.  They will end up with a valid, though perhaps
space-inefficient, configuration: two overflow pages will be added to
the bucket, each containing one tuple.

The last part of this violates the rule about holding write lock on two
pages concurrently, but it should be okay to write-lock the previously
free page; there can be no other process holding lock on it.

Bucket splitting uses a similar algorithm if it has to extend the new
bucket, but it need not worry about concurrent extension since it has
exclusive lock on the new bucket.

Freeing an overflow page is done by garbage collection and by bucket
splitting (the old bucket may contain no-longer-needed overflow pages).
In both cases, the process holds exclusive lock on the containing
bucket, so need not worry about other accessors of pages in the bucket.
The algorithm is:

	delink overflow page from bucket chain
	(this requires read/update/write/release of fore and aft siblings)
	read/share-lock meta page
	determine which bitmap page contains the free space bit for page
	release meta page
	read/exclusive-lock bitmap page
	update bitmap bit
	write/release bitmap page
	if page number is less than what we saw as first-free-bit in meta:
	read/exclusive-lock meta page
	if page number is still less than first-free-bit,
		update first-free-bit field and write meta page
	release meta page

We have to do it this way because we must clear the bitmap bit before
changing the first-free-bit field (hashm_firstfree).  It is possible
that we set first-free-bit too small (because someone has already reused
the page we just freed), but that is okay; the only cost is the next
overflow page acquirer will scan more bitmap bits than he needs to.
What must be avoided is having first-free-bit greater than the actual
first free bit, because then that free page would never be found by
searchers.

All the freespace operations should be called while holding no buffer
locks.  Since they need no lmgr locks, deadlock is not possible.


Other notes
-----------

All the shenanigans with locking prevent a split occurring while
*another* process is stopped in a given bucket.  They do not ensure that
one of our *own* backend's scans is not stopped in the bucket, because
lmgr doesn't consider a process's own locks to conflict.  So the Split
algorithm must check for that case separately before deciding it can go
ahead with the split.  VACUUM does not have this problem since nothing
else can be happening within the vacuuming backend.

Should we instead try to fix the state of any conflicting local scan?
Seems mighty ugly --- got to move the held bucket S-lock as well as lots
of other messiness.  For now, just punt and don't split.
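The first-free-bit maintenance rule from the freeing algorithm -- too
small is harmless, too large loses pages, hence the unlocked pre-check
followed by a re-check -- can be sketched in Python.  (A toy model with
hypothetical names; the dict stands in for the metapage, and the
comments mark where the real code would hold the metapage lock.)

```python
def maybe_lower_firstfree(meta, freed_page_no):
    """Update the first-free-bit hint after freeing an overflow page.

    first-free-bit may legitimately be too *low* (the next acquirer
    merely scans a few extra bitmap bits) but must never end up
    *higher* than a genuinely free page, or that page would never be
    found.  So we only ever lower it, and only after re-checking the
    current value, standing in for re-reading the metapage under
    exclusive lock.
    """
    if freed_page_no < meta["firstfree"]:      # cheap unlocked check
        # ... read/exclusive-lock meta page here ...
        if freed_page_no < meta["firstfree"]:  # re-check under the lock
            meta["firstfree"] = freed_page_no
        # ... write/release meta page ...
```

Because the bitmap bit was already cleared before this runs, a
concurrent acquirer that reuses the page between the two checks simply
leaves the hint alone, which is safe.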
