
nbtinsert.c
PostgreSQL 7.4.6 for Linux
Language: C
Page 1 of 4
/*-------------------------------------------------------------------------
 *
 * nbtinsert.c
 *	  Item insertion in Lehman and Yao btrees for Postgres.
 *
 * Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.106.2.1 2004/08/17 23:16:07 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/heapam.h"
#include "access/nbtree.h"
#include "miscadmin.h"


typedef struct
{
    /* context data for _bt_checksplitloc */
    Size        newitemsz;      /* size of new item to be inserted */
    bool        is_leaf;        /* T if splitting a leaf page */
    bool        is_rightmost;   /* T if splitting a rightmost page */

    bool        have_split;     /* found a valid split? */

    /* these fields valid only if have_split is true */
    bool        newitemonleft;  /* new item on left or right of best split */
    OffsetNumber firstright;    /* best split point */
    int         best_delta;     /* best size delta so far */
} FindSplitData;


static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);

static TransactionId _bt_check_unique(Relation rel, BTItem btitem,
                 Relation heapRel, Buffer buf,
                 ScanKey itup_scankey);
static InsertIndexResult _bt_insertonpg(Relation rel, Buffer buf,
               BTStack stack,
               int keysz, ScanKey scankey,
               BTItem btitem,
               OffsetNumber afteritem,
               bool split_only_page);
static Buffer _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
          OffsetNumber newitemoff, Size newitemsz,
          BTItem newitem, bool newitemonleft,
          OffsetNumber *itup_off, BlockNumber *itup_blkno);
static OffsetNumber _bt_findsplitloc(Relation rel, Page page,
                 OffsetNumber newitemoff,
                 Size newitemsz,
                 bool *newitemonleft);
static void _bt_checksplitloc(FindSplitData *state, OffsetNumber firstright,
                  int leftfree, int rightfree,
                  bool newitemonleft, Size firstrightitemsz);
static void _bt_pgaddtup(Relation rel, Page page,
             Size itemsize, BTItem btitem,
             OffsetNumber itup_off, const char *where);
static bool _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum,
            int keysz, ScanKey scankey);


/*
 *	_bt_doinsert() -- Handle insertion of a single btitem in the tree.
 *
 *		This routine is called by the public interface routines, btbuild
 *		and btinsert.  By here, btitem is filled in, including the TID.
 */
InsertIndexResult
_bt_doinsert(Relation rel, BTItem btitem,
             bool index_is_unique, Relation heapRel)
{
    IndexTuple  itup = &(btitem->bti_itup);
    int         natts = rel->rd_rel->relnatts;
    ScanKey     itup_scankey;
    BTStack     stack;
    Buffer      buf;
    InsertIndexResult res;

    /* we need a scan key to do our search, so build one */
    itup_scankey = _bt_mkscankey(rel, itup);

top:
    /* find the page containing this key */
    stack = _bt_search(rel, natts, itup_scankey, &buf, BT_WRITE);

    /* trade in our read lock for a write lock */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    LockBuffer(buf, BT_WRITE);

    /*
     * If the page was split between the time that we surrendered our read
     * lock and acquired our write lock, then this page may no longer be
     * the right place for the key we want to insert.  In this case, we
     * need to move right in the tree.  See Lehman and Yao for an
     * excruciatingly precise description.
     */
    buf = _bt_moveright(rel, buf, natts, itup_scankey, BT_WRITE);

    /*
     * If we're not allowing duplicates, make sure the key isn't already
     * in the index.
     *
     * NOTE: obviously, _bt_check_unique can only detect keys that are
     * already in the index; so it cannot defend against concurrent
     * insertions of the same key.  We protect against that by means of
     * holding a write lock on the target page.  Any other would-be
     * inserter of the same key must acquire a write lock on the same
     * target page, so only one would-be inserter can be making the check
     * at one time.  Furthermore, once we are past the check we hold write
     * locks continuously until we have performed our insertion, so no
     * later inserter can fail to see our insertion.  (This requires some
     * care in _bt_insertonpg.)
     *
     * If we must wait for another xact, we release the lock while waiting,
     * and then must start over completely.
     */
    if (index_is_unique)
    {
        TransactionId xwait;

        xwait = _bt_check_unique(rel, btitem, heapRel, buf, itup_scankey);

        if (TransactionIdIsValid(xwait))
        {
            /* Have to wait for the other guy ... */
            _bt_relbuf(rel, buf);
            XactLockTableWait(xwait);
            /* start over... */
            _bt_freestack(stack);
            goto top;
        }
    }

    /* do the insertion */
    res = _bt_insertonpg(rel, buf, stack, natts, itup_scankey, btitem,
                         0, false);

    /* be tidy */
    _bt_freestack(stack);
    _bt_freeskey(itup_scankey);

    return res;
}

/*
 *	_bt_check_unique() -- Check for violation of unique index constraint
 *
 * Returns InvalidTransactionId if there is no conflict, else an xact ID
 * we must wait for to see if it commits a conflicting tuple.  If an actual
 * conflict is detected, no return --- just ereport().
 */
static TransactionId
_bt_check_unique(Relation rel, BTItem btitem, Relation heapRel,
                 Buffer buf, ScanKey itup_scankey)
{
    TupleDesc   itupdesc = RelationGetDescr(rel);
    int         natts = rel->rd_rel->relnatts;
    OffsetNumber offset,
                maxoff;
    Page        page;
    BTPageOpaque opaque;
    Buffer      nbuf = InvalidBuffer;

    page = BufferGetPage(buf);
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    maxoff = PageGetMaxOffsetNumber(page);

    /*
     * Find first item >= proposed new item.  Note we could also get a
     * pointer to end-of-page here.
     */
    offset = _bt_binsrch(rel, buf, natts, itup_scankey);

    /*
     * Scan over all equal tuples, looking for live conflicts.
     */
    for (;;)
    {
        HeapTupleData htup;
        Buffer      hbuffer;
        ItemId      curitemid;
        BTItem      cbti;
        BlockNumber nblkno;

        /*
         * make sure the offset points to an actual item before trying to
         * examine it...
         */
        if (offset <= maxoff)
        {
            curitemid = PageGetItemId(page, offset);

            /*
             * We can skip items that are marked killed.
             *
             * Formerly, we applied _bt_isequal() before checking the kill
             * flag, so as to fall out of the item loop as soon as possible.
             * However, in the presence of heavy update activity an index
             * may contain many killed items with the same key; running
             * _bt_isequal() on each killed item gets expensive.  Furthermore
             * it is likely that the non-killed version of each key appears
             * first, so that we didn't actually get to exit any sooner anyway.
             * So now we just advance over killed items as quickly as we can.
             * We only apply _bt_isequal() when we get to a non-killed item or
             * the end of the page.
             */
            if (!ItemIdDeleted(curitemid))
            {
                /*
                 * _bt_compare returns 0 for (1,NULL) and (1,NULL) - this's
                 * how we handling NULLs - and so we must not use _bt_compare
                 * in real comparison, but only for ordering/finding items on
                 * pages. - vadim 03/24/97
                 */
                if (!_bt_isequal(itupdesc, page, offset, natts, itup_scankey))
                    break;      /* we're past all the equal tuples */

                /* okay, we gotta fetch the heap tuple ... */
                cbti = (BTItem) PageGetItem(page, curitemid);
                htup.t_self = cbti->bti_itup.t_tid;
                if (heap_fetch(heapRel, SnapshotDirty, &htup, &hbuffer,
                               true, NULL))
                {
                    /* it is a duplicate */
                    TransactionId xwait =
                    (TransactionIdIsValid(SnapshotDirty->xmin)) ?
                    SnapshotDirty->xmin : SnapshotDirty->xmax;

                    ReleaseBuffer(hbuffer);

                    /*
                     * If this tuple is being updated by other transaction
                     * then we have to wait for its commit/abort.
                     */
                    if (TransactionIdIsValid(xwait))
                    {
                        if (nbuf != InvalidBuffer)
                            _bt_relbuf(rel, nbuf);
                        /* Tell _bt_doinsert to wait... */
                        return xwait;
                    }

                    /*
                     * Otherwise we have a definite conflict.
                     */
                    ereport(ERROR,
                            (errcode(ERRCODE_UNIQUE_VIOLATION),
                             errmsg("duplicate key violates unique constraint \"%s\"",
                                    RelationGetRelationName(rel))));
                }
                else if (htup.t_data != NULL)
                {
                    /*
                     * Hmm, if we can't see the tuple, maybe it can be
                     * marked killed.  This logic should match
                     * index_getnext and btgettuple.
                     */
                    uint16      sv_infomask;

                    LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
                    sv_infomask = htup.t_data->t_infomask;
                    if (HeapTupleSatisfiesVacuum(htup.t_data,
                                                 RecentGlobalXmin) ==
                        HEAPTUPLE_DEAD)
                    {
                        curitemid->lp_flags |= LP_DELETE;
                        SetBufferCommitInfoNeedsSave(buf);
                    }
                    if (sv_infomask != htup.t_data->t_infomask)
                        SetBufferCommitInfoNeedsSave(hbuffer);
                    LockBuffer(hbuffer, BUFFER_LOCK_UNLOCK);
                    ReleaseBuffer(hbuffer);
                }
            }
        }

        /*
         * Advance to next tuple to continue checking.
         */
        if (offset < maxoff)
            offset = OffsetNumberNext(offset);
        else
        {
            /* If scankey == hikey we gotta check the next page too */
            if (P_RIGHTMOST(opaque))
                break;
            if (!_bt_isequal(itupdesc, page, P_HIKEY,
                             natts, itup_scankey))
                break;
            /* Advance to next non-dead page --- there must be one */
            for (;;)
            {
                nblkno = opaque->btpo_next;
                if (nbuf != InvalidBuffer)
                    _bt_relbuf(rel, nbuf);
                nbuf = _bt_getbuf(rel, nblkno, BT_READ);
                page = BufferGetPage(nbuf);
                opaque = (BTPageOpaque) PageGetSpecialPointer(page);
                if (!P_IGNORE(opaque))
                    break;
                if (P_RIGHTMOST(opaque))
                    elog(ERROR, "fell off the end of \"%s\"",
                         RelationGetRelationName(rel));
            }
            maxoff = PageGetMaxOffsetNumber(page);
            offset = P_FIRSTDATAKEY(opaque);
        }
    }

    if (nbuf != InvalidBuffer)
        _bt_relbuf(rel, nbuf);

    return InvalidTransactionId;
}

/*----------
 *	_bt_insertonpg() -- Insert a tuple on a particular page in the index.
 *
 *		This recursive procedure does the following things:
 *
 *			+  finds the right place to insert the tuple.
 *			+  if necessary, splits the target page (making sure that the
 *			   split is equitable as far as post-insert free space goes).
 *			+  inserts the tuple.
 *			+  if the page was split, pops the parent stack, and finds the
 *			   right place to insert the new child pointer (by walking
 *			   right using information stored in the parent stack).
 *			+  invokes itself with the appropriate tuple for the right
 *			   child page on the parent.
 *			+  updates the metapage if a true root or fast root is split.
 *
 *		On entry, we must have the right buffer on which to do the
 *		insertion, and the buffer must be pinned and locked.  On return,
 *		we will have dropped both the pin and the write lock on the buffer.
 *
 *		If 'afteritem' is >0 then the new tuple must be inserted after the
 *		existing item of that number, noplace else.  If 'afteritem' is 0
 *		then the procedure finds the exact spot to insert it by searching.
 *		(keysz and scankey parameters are used ONLY if afteritem == 0.)
 *
 *		NOTE: if the new key is equal to one or more existing keys, we can
 *		legitimately place it anywhere in the series of equal keys --- in fact,
 *		if the new key is equal to the page's "high key" we can place it on
 *		the next page.  If it is equal to the high key, and there's not room
 *		to insert the new tuple on the current page without splitting, then
 *		we can move right hoping to find more free space and avoid a split.
 *		(We should not move right indefinitely, however, since that leads to
 *		O(N^2) insertion behavior in the presence of many equal keys.)
 *		Once we have chosen the page to put the key on, we'll insert it before
 *		any existing equal keys because of the way _bt_binsrch() works.
 *
 *		The locking interactions in this code are critical.  You should
 *		grok Lehman and Yao's paper before making any changes.  In addition,
 *		you need to understand how we disambiguate duplicate keys in this
 *		implementation, in order to be able to find our location using
 *		L&Y "move right" operations.  Since we may insert duplicate user
 *		keys, and since these dups may propagate up the tree, we use the
 *		'afteritem' parameter to position ourselves correctly for the
 *		insertion on internal pages.
 *----------
 */
static InsertIndexResult
_bt_insertonpg(Relation rel,
               Buffer buf,
               BTStack stack,
               int keysz,
               ScanKey scankey,
               BTItem btitem,
               OffsetNumber afteritem,
               bool split_only_page)
{
    InsertIndexResult res;
    Page        page;
    BTPageOpaque lpageop;
    OffsetNumber itup_off;
    BlockNumber itup_blkno;
    OffsetNumber newitemoff;
    OffsetNumber firstright = InvalidOffsetNumber;
    Size        itemsz;

    page = BufferGetPage(buf);
    lpageop = (BTPageOpaque) PageGetSpecialPointer(page);

    itemsz = IndexTupleDSize(btitem->bti_itup)
        + (sizeof(BTItemData) - sizeof(IndexTupleData));

    itemsz = MAXALIGN(itemsz);  /* be safe, PageAddItem will do this but
                                 * we need to be consistent */

    /*
     * Check whether the item can fit on a btree page at all. (Eventually,
     * we ought to try to apply TOAST methods if not.) We actually need to
     * be able to fit three items on every page, so restrict any one item
     * to 1/3 the per-page available space. Note that at this point,
     * itemsz doesn't include the ItemId.
     */
    if (itemsz > BTMaxItemSize(page))
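
The FindSplitData struct declared near the top of this page is the bookkeeping used by _bt_checksplitloc() and _bt_findsplitloc(), which are only prototyped here; their bodies fall on the later pages of this listing. As a rough, self-contained illustration of the underlying idea (scan the candidate split boundaries and remember the one with the smallest post-split free-space imbalance), here is a minimal sketch. The toy types, the PAGE_CAPACITY constant, and the toy_* helpers are assumptions made for illustration only, not PostgreSQL code.

/*
 * Schematic sketch only: pick the split offset that leaves the left and
 * right "pages" with the most nearly equal free space, tracking the best
 * candidate the way FindSplitData does (have_split, firstright, best_delta).
 * PAGE_CAPACITY and all toy_* names are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define PAGE_CAPACITY 8192      /* hypothetical usable bytes per page */

typedef struct
{
    bool    have_split;         /* found a valid split yet? */
    int     firstright;         /* first item to move to the right page */
    int     best_delta;         /* smallest free-space imbalance so far */
} ToySplitState;

static void
toy_checksplitloc(ToySplitState *state, int firstright,
                  int leftfree, int rightfree)
{
    int     delta = abs(leftfree - rightfree);

    /* reject candidate boundaries that overflow either side */
    if (leftfree < 0 || rightfree < 0)
        return;

    if (!state->have_split || delta < state->best_delta)
    {
        state->have_split = true;
        state->firstright = firstright;
        state->best_delta = delta;
    }
}

static int
toy_findsplitloc(const int *itemsz, int nitems)
{
    ToySplitState state = {false, -1, 0};
    int     total = 0;
    int     leftsz = 0;
    int     i;

    for (i = 0; i < nitems; i++)
        total += itemsz[i];

    /* items [0 .. firstright-1] stay on the left page, the rest move right */
    for (i = 1; i < nitems; i++)
    {
        leftsz += itemsz[i - 1];
        toy_checksplitloc(&state, i,
                          PAGE_CAPACITY - leftsz,
                          PAGE_CAPACITY - (total - leftsz));
    }
    return state.have_split ? state.firstright : -1;
}

int
main(void)
{
    int     sizes[] = {400, 700, 1200, 300, 900, 2500, 100, 800};

    printf("first item to move to the right page: %d\n",
           toy_findsplitloc(sizes, 8));
    return 0;
}

The real _bt_findsplitloc() also folds in the size of the incoming new item and the leaf/rightmost-page cases that FindSplitData's newitemsz, is_leaf, and is_rightmost fields record; this sketch deliberately omits those refinements.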
