📄 ibuf0ibuf.c
字号:
/******************************************************Insert buffer(c) 1997 Innobase OyCreated 7/19/1997 Heikki Tuuri*******************************************************/#include "ibuf0ibuf.h"#ifdef UNIV_NONINL#include "ibuf0ibuf.ic"#endif#include "buf0buf.h"#include "buf0rea.h"#include "fsp0fsp.h"#include "trx0sys.h"#include "fil0fil.h"#include "thr0loc.h"#include "rem0rec.h"#include "btr0cur.h"#include "btr0pcur.h"#include "btr0btr.h"#include "sync0sync.h"#include "dict0boot.h"#include "fut0lst.h"#include "lock0lock.h"#include "log0recv.h"#include "que0que.h"/* STRUCTURE OF AN INSERT BUFFER RECORDIn versions < 4.1.x:1. The first field is the page number.2. The second field is an array which stores type info for each subsequent field. We store the information which affects the ordering of records, and also the physical storage size of an SQL NULL value. E.g., for CHAR(10) it is 10 bytes.3. Next we have the fields of the actual index record.In versions >= 4.1.x:Note that contary to what we planned in the 1990's, there will only be oneinsert buffer tree, and that is in the system tablespace of InnoDB.1. The first field is the space id.2. The second field is a one-byte marker (0) which differentiates records from the < 4.1.x storage format.3. The third field is the page number.4. The fourth field contains the type info, where we have also added 2 bytes to store the charset. In the compressed table format of 5.0.x we must add more information here so that we can build a dummy 'index' struct which 5.0.x can use in the binary search on the index page in the ibuf merge phase.5. The rest of the fields contain the fields of the actual index record.In versions >= 5.0.3:The first byte of the fourth field is an additional marker (0) if the recordis in the compact format. The presence of this marker can be detected bylooking at the length of the field modulo DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE.The high-order bit of the character set field in the type info is the"nullable" flag for the field. *//* PREVENTING DEADLOCKS IN THE INSERT BUFFER SYSTEMIf an OS thread performs any operation that brings in disk pages fromnon-system tablespaces into the buffer pool, or creates such a page there,then the operation may have as a side effect an insert buffer index treecompression. Thus, the tree latch of the insert buffer tree may be acquiredin the x-mode, and also the file space latch of the system tablespace maybe acquired in the x-mode.Also, an insert to an index in a non-system tablespace can have the sameeffect. How do we know this cannot lead to a deadlock of OS threads? Thereis a problem with the i\o-handler threads: they break the latching orderbecause they own x-latches to pages which are on a lower level than theinsert buffer tree latch, its page latches, and the tablespace latch aninsert buffer operation can reserve.The solution is the following: Let all the tree and page latches connectedwith the insert buffer be later in the latching order than the fsp latch andfsp page latches.Insert buffer pages must be such that the insert buffer is never invokedwhen these pages are accessed as this would result in a recursion violatingthe latching order. We let a special i/o-handler thread take care of i/o tothe insert buffer pages and the ibuf bitmap pages, as well as the fsp bitmappages and the first inode page, which contains the inode of the ibuf tree: letus call all these ibuf pages. To prevent deadlocks, we do not let a read-aheadaccess both non-ibuf and ibuf pages.Then an i/o-handler for the insert buffer never needs to access recursively theinsert buffer tree and thus obeys the latching order. On the other hand, otheri/o-handlers for other tablespaces may require access to the insert buffer,but because all kinds of latches they need to access there are later in thelatching order, no violation of the latching order occurs in this case,either.A problem is how to grow and contract an insert buffer tree. As it is laterin the latching order than the fsp management, we have to reserve the fsplatch first, before adding or removing pages from the insert buffer tree.We let the insert buffer tree have its own file space management: a freelist of pages linked to the tree root. To prevent recursive using of theinsert buffer when adding pages to the tree, we must first load these pagesto memory, obtaining a latch on them, and only after that add them to thefree list of the insert buffer tree. More difficult is removing of pagesfrom the free list. If there is an excess of pages in the free list of theibuf tree, they might be needed if some thread reserves the fsp latch,intending to allocate more file space. So we do the following: if a threadreserves the fsp latch, we check the writer count field of the latch. Ifthis field has value 1, it means that the thread did not own the latchbefore entering the fsp system, and the mtr of the thread contains nomodifications to the fsp pages. Now we are free to reserve the ibuf latch,and check if there is an excess of pages in the free list. We can then, in aseparate mini-transaction, take them out of the free list and free them tothe fsp system.To avoid deadlocks in the ibuf system, we divide file pages into three levels:(1) non-ibuf pages,(2) ibuf tree pages and the pages in the ibuf tree free list, and(3) ibuf bitmap pages.No OS thread is allowed to access higher level pages if it has latches tolower level pages; even if the thread owns a B-tree latch it must not accessthe B-tree non-leaf pages if it has latches on lower level pages. Read-aheadis only allowed for level 1 and 2 pages. Dedicated i/o-handler threads handleexclusively level 1 i/o. A dedicated i/o handler thread handles exclusivelylevel 2 i/o. However, if an OS thread does the i/o handling for itself, i.e.,it uses synchronous aio, it can access any pages, as long as it obeys theaccess order rules. *//* Buffer pool size per the maximum insert buffer size */#define IBUF_POOL_SIZE_PER_MAX_SIZE 2/* The insert buffer control structure */ibuf_t* ibuf = NULL;staticulint ibuf_rnd = 986058871;ulint ibuf_flush_count = 0;/* Dimensions for the ibuf_count array */#define IBUF_COUNT_N_SPACES 500#define IBUF_COUNT_N_PAGES 2000/* Buffered entry counts for file pages, used in debugging */static ulint* ibuf_counts[IBUF_COUNT_N_SPACES];static ibool ibuf_counts_inited = FALSE;/* The start address for an insert buffer bitmap page bitmap */#define IBUF_BITMAP PAGE_DATA/* Offsets in bits for the bits describing a single page in the bitmap */#define IBUF_BITMAP_FREE 0#define IBUF_BITMAP_BUFFERED 2#define IBUF_BITMAP_IBUF 3 /* TRUE if page is a part of the ibuf tree, excluding the root page, or is in the free list of the ibuf *//* Number of bits describing a single page */#define IBUF_BITS_PER_PAGE 4#if IBUF_BITS_PER_PAGE % 2# error "IBUF_BITS_PER_PAGE must be an even number!"#endif/* The mutex used to block pessimistic inserts to ibuf trees */static mutex_t ibuf_pessimistic_insert_mutex;/* The mutex protecting the insert buffer structs */static mutex_t ibuf_mutex;/* The mutex protecting the insert buffer bitmaps */static mutex_t ibuf_bitmap_mutex;/* The area in pages from which contract looks for page numbers for merge */#define IBUF_MERGE_AREA 8/* Inside the merge area, pages which have at most 1 per this number lessbuffered entries compared to maximum volume that can buffered for a singlepage are merged along with the page whose buffer became full */#define IBUF_MERGE_THRESHOLD 4/* In ibuf_contract at most this number of pages is read to memory in onebatch, in order to merge the entries for them in the insert buffer */#define IBUF_MAX_N_PAGES_MERGED IBUF_MERGE_AREA/* If the combined size of the ibuf trees exceeds ibuf->max_size by thismany pages, we start to contract it in connection to inserts there, usingnon-synchronous contract */#define IBUF_CONTRACT_ON_INSERT_NON_SYNC 0/* Same as above, but use synchronous contract */#define IBUF_CONTRACT_ON_INSERT_SYNC 5/* Same as above, but no insert is done, only contract is called */#define IBUF_CONTRACT_DO_NOT_INSERT 10/* TODO: how to cope with drop table if there are records in the insertbuffer for the indexes of the table? Is there actually any problem,because ibuf merge is done to a page when it is read in, and it isstill physically like the index page even if the index would have beendropped! So, there seems to be no problem. *//**********************************************************************Validates the ibuf data structures when the caller owns ibuf_mutex. */iboolibuf_validate_low(void);/*===================*/ /* out: TRUE if ok *//**********************************************************************Sets the flag in the current OS thread local storage denoting that it isinside an insert buffer routine. */UNIV_INLINEvoidibuf_enter(void)/*============*/{ ibool* ptr; ptr = thr_local_get_in_ibuf_field(); ut_ad(*ptr == FALSE); *ptr = TRUE;}/**********************************************************************Sets the flag in the current OS thread local storage denoting that it isexiting an insert buffer routine. */UNIV_INLINEvoidibuf_exit(void)/*===========*/{ ibool* ptr; ptr = thr_local_get_in_ibuf_field(); ut_ad(*ptr == TRUE); *ptr = FALSE;}/**********************************************************************Returns TRUE if the current OS thread is performing an insert bufferroutine. */iboolibuf_inside(void)/*=============*/ /* out: TRUE if inside an insert buffer routine: for instance, a read-ahead of non-ibuf pages is then forbidden */{ return(*thr_local_get_in_ibuf_field());}/**********************************************************************Gets the ibuf header page and x-latches it. */staticpage_t*ibuf_header_page_get(/*=================*/ /* out: insert buffer header page */ ulint space, /* in: space id */ mtr_t* mtr) /* in: mtr */{ page_t* page; ut_a(space == 0); ut_ad(!ibuf_inside()); page = buf_page_get(space, FSP_IBUF_HEADER_PAGE_NO, RW_X_LATCH, mtr);#ifdef UNIV_SYNC_DEBUG buf_page_dbg_add_level(page, SYNC_IBUF_HEADER);#endif /* UNIV_SYNC_DEBUG */ return(page);}/**********************************************************************Gets the root page and x-latches it. */staticpage_t*ibuf_tree_root_get(/*===============*/ /* out: insert buffer tree root page */ ibuf_data_t* data, /* in: ibuf data */ ulint space, /* in: space id */ mtr_t* mtr) /* in: mtr */{ page_t* page; ut_a(space == 0); ut_ad(ibuf_inside()); mtr_x_lock(dict_tree_get_lock((data->index)->tree), mtr); page = buf_page_get(space, FSP_IBUF_TREE_ROOT_PAGE_NO, RW_X_LATCH, mtr);#ifdef UNIV_SYNC_DEBUG buf_page_dbg_add_level(page, SYNC_TREE_NODE);#endif /* UNIV_SYNC_DEBUG */ return(page);}/**********************************************************************Gets the ibuf count for a given page. */ulintibuf_count_get(/*===========*/ /* out: number of entries in the insert buffer currently buffered for this page */ ulint space, /* in: space id */ ulint page_no)/* in: page number */{ ut_ad(space < IBUF_COUNT_N_SPACES); ut_ad(page_no < IBUF_COUNT_N_PAGES); if (!ibuf_counts_inited) { return(0); } return(*(ibuf_counts[space] + page_no));}/**********************************************************************Sets the ibuf count for a given page. */#ifdef UNIV_IBUF_DEBUGstaticvoidibuf_count_set(/*===========*/ ulint space, /* in: space id */ ulint page_no,/* in: page number */ ulint val) /* in: value to set */{ ut_a(space < IBUF_COUNT_N_SPACES); ut_a(page_no < IBUF_COUNT_N_PAGES); ut_a(val < UNIV_PAGE_SIZE); *(ibuf_counts[space] + page_no) = val;}#endif/**********************************************************************Creates the insert buffer data structure at a database startup and initializesthe data structures for the insert buffer. */voidibuf_init_at_db_start(void)/*=======================*/{ ibuf = mem_alloc(sizeof(ibuf_t)); /* Note that also a pessimistic delete can sometimes make a B-tree grow in size, as the references on the upper levels of the tree can change */ ibuf->max_size = buf_pool_get_curr_size() / UNIV_PAGE_SIZE / IBUF_POOL_SIZE_PER_MAX_SIZE; ibuf->meter = IBUF_THRESHOLD + 1; UT_LIST_INIT(ibuf->data_list); ibuf->size = 0; #ifdef UNIV_IBUF_DEBUG { ulint i, j; for (i = 0; i < IBUF_COUNT_N_SPACES; i++) { ibuf_counts[i] = mem_alloc(sizeof(ulint) * IBUF_COUNT_N_PAGES); for (j = 0; j < IBUF_COUNT_N_PAGES; j++) { ibuf_count_set(i, j, 0); } } } #endif mutex_create(&ibuf_pessimistic_insert_mutex); mutex_set_level(&ibuf_pessimistic_insert_mutex, SYNC_IBUF_PESS_INSERT_MUTEX); mutex_create(&ibuf_mutex); mutex_set_level(&ibuf_mutex, SYNC_IBUF_MUTEX); mutex_create(&ibuf_bitmap_mutex); mutex_set_level(&ibuf_bitmap_mutex, SYNC_IBUF_BITMAP_MUTEX); fil_ibuf_init_at_db_start(); ibuf_counts_inited = TRUE;}/**********************************************************************Updates the size information in an ibuf data, assuming the segment size hasnot changed. */staticvoidibuf_data_sizes_update(/*===================*/ ibuf_data_t* data, /* in: ibuf data struct */ page_t* root, /* in: ibuf tree root */ mtr_t* mtr) /* in: mtr */{ ulint old_size;#ifdef UNIV_SYNC_DEBUG ut_ad(mutex_own(&ibuf_mutex));#endif /* UNIV_SYNC_DEBUG */ old_size = data->size; data->free_list_len = flst_get_len(root + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST, mtr); data->height = 1 + btr_page_get_level(root, mtr); data->size = data->seg_size - (1 + data->free_list_len); /* the '1 +' is the ibuf header page */ ut_ad(data->size < data->seg_size); if (page_get_n_recs(root) == 0) { data->empty = TRUE; } else { data->empty = FALSE; } ut_ad(ibuf->size + data->size >= old_size); ibuf->size = ibuf->size + data->size - old_size;/* fprintf(stderr, "ibuf size %lu, space ibuf size %lu\n", ibuf->size, data->size); */}/**********************************************************************Creates the insert buffer data struct for a single tablespace. Reads theroot page of the insert buffer tree in the tablespace. This function canbe called only after the dictionary system has been initialized, as thiscreates also the insert buffer table and index into this tablespace. */ibuf_data_t*ibuf_data_init_for_space(/*=====================*/ /* out, own: ibuf data struct, linked to the list in ibuf control structure */ ulint space) /* in: space id */{ ibuf_data_t* data; page_t* root; page_t* header_page; mtr_t mtr; char buf[50]; dict_table_t* table; dict_index_t* index; ulint n_used; ut_a(space == 0);#ifdef UNIV_LOG_DEBUG if (space % 2 == 1) { fputs("No ibuf op in replicate space\n", stderr); return(NULL); }#endif data = mem_alloc(sizeof(ibuf_data_t)); data->space = space; mtr_start(&mtr); mutex_enter(&ibuf_mutex); mtr_x_lock(fil_space_get_latch(space), &mtr); header_page = ibuf_header_page_get(space, &mtr); fseg_n_reserved_pages(header_page + IBUF_HEADER + IBUF_TREE_SEG_HEADER, &n_used, &mtr); ibuf_enter();
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -