buf0buf.c

From "a MySQL software package that runs on Linux; can be used on Linux to install php + m" · C source code · 2,379 lines in total · page 1 of 5
/*   Innobase relational database engine; Copyright (C) 2001 Innobase Oy

     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License 2
     as published by the Free Software Foundation in June 1991.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU General Public License for more details.

     You should have received a copy of the GNU General Public License 2
     along with this program (in file COPYING); if not, write to the Free
     Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
/******************************************************
The database buffer buf_pool

(c) 1995 Innobase Oy

Created 11/5/1995 Heikki Tuuri
*******************************************************/

#include "buf0buf.h"

#ifdef UNIV_NONINL
#include "buf0buf.ic"
#endif

#include "mem0mem.h"
#include "btr0btr.h"
#include "fil0fil.h"
#include "lock0lock.h"
#include "btr0sea.h"
#include "ibuf0ibuf.h"
#include "dict0dict.h"
#include "log0recv.h"
#include "log0log.h"
#include "trx0undo.h"
#include "srv0srv.h"

/*
		IMPLEMENTATION OF THE BUFFER POOL
		=================================

Performance improvement:
------------------------
Thread scheduling in NT may be so slow that the OS wait mechanism should
not be used even in waiting for disk reads to complete.
Rather, we should put waiting query threads to the queue of
waiting jobs, and let the OS thread do something useful while the i/o
is processed. In this way we could remove most OS thread switches in
an i/o-intensive benchmark like TPC-C.

A possibility is to put a user space thread library between the database
and NT. User space thread libraries might be very fast.

SQL Server 7.0 can be configured to use 'fibers' which are lightweight
threads in NT. These should be studied.

		Buffer frames and blocks
		------------------------
Following the terminology of Gray and Reuter, we call the memory
blocks where file pages are loaded buffer frames. For each buffer
frame there is a control block, or, for short, a block, in the buffer
control array. The control info which does not need to be stored
in the file along with the file page resides in the control block.

		Buffer pool struct
		------------------
The buffer buf_pool contains a single mutex which protects all the
control data structures of the buf_pool. The content of a buffer frame is
protected by a separate read-write lock in its control block, though.
These locks can be locked and unlocked without owning the buf_pool mutex.
The OS events in the buf_pool struct can be waited for without owning the
buf_pool mutex.

The buf_pool mutex is a hot-spot in main memory, causing a lot of
memory bus traffic on multiprocessor systems when processors
alternately access the mutex. On our Pentium, the mutex is accessed
maybe every 10 microseconds. We gave up the solution to have mutexes
for each control block, for instance, because it seemed to be
complicated.

A solution to reduce mutex contention of the buf_pool mutex is to
create a separate mutex for the page hash table. On Pentium,
accessing the hash table takes 2 microseconds, about half
of the total buf_pool mutex hold time.

		Control blocks
		--------------

The control block contains, for instance, the bufferfix count
which is incremented when a thread wants a file page to be fixed
in a buffer frame. The bufferfix operation does not lock the
contents of the frame, however.
For this purpose, the control
block contains a read-write lock.

The buffer frames have to be aligned so that the start memory
address of a frame is divisible by the universal page size, which
is a power of two.

We intend to make the buffer buf_pool size on-line reconfigurable,
that is, the buf_pool size can be changed without closing the database.
Then the database administrator may adjust it to be bigger
at night, for example. The control block array must
contain enough control blocks for the maximum buffer buf_pool size
which is used in the particular database.
If the buf_pool size is cut, we exploit the virtual memory mechanism of
the OS, and just refrain from using frames at high addresses. Then the OS
can swap them to disk.

The control blocks containing file pages are put to a hash table
according to the file address of the page.
We could speed up the access to an individual page by using
"pointer swizzling": we could replace the page references on
non-leaf index pages by direct pointers to the page, if it exists
in the buf_pool. We could make a separate hash table where we could
chain all the page references in non-leaf pages residing in the buf_pool,
using the page reference as the hash key,
and at the time of reading of a page update the pointers accordingly.
Drawbacks of this solution are added complexity and,
possibly, extra space required on non-leaf pages for memory pointers.
A simpler solution is just to speed up the hash table mechanism
in the database, using tables whose size is a power of 2.

		Lists of blocks
		---------------

There are several lists of control blocks. The free list contains
blocks which are currently not used.

The LRU-list contains all the blocks holding a file page
except those for which the bufferfix count is non-zero.
The pages are in the LRU list roughly in the order of the last
access to the page, so that the oldest pages are at the end of the
list.
We also keep a pointer to near the end of the LRU list,
which we can use when we want to artificially age a page in the
buf_pool. This is used if we know that some page is not needed
again for some time: we insert the block right after the pointer,
causing it to be replaced sooner than would normally be the case.
Currently this aging mechanism is used for read-ahead mechanism
of pages, and it can also be used when there is a scan of a full
table which cannot fit in the memory. Putting the pages near the
end of the LRU list, we make sure that most of the buf_pool stays in the
main memory, undisturbed.

The chain of modified blocks contains the blocks
holding file pages that have been modified in the memory
but not written to disk yet. The block with the oldest modification
which has not yet been written to disk is at the end of the chain.

		Loading a file page
		-------------------

First, a victim block for replacement has to be found in the
buf_pool. It is taken from the free list or searched for from the
end of the LRU-list. An exclusive lock is reserved for the frame,
the io_fix field is set in the block fixing the block in buf_pool,
and the io-operation for loading the page is queued. The io-handler thread
releases the X-lock on the frame and resets the io_fix field
when the io operation completes.

A thread may request the above operation using the buf_page_get-
function. It may then continue to request a lock on the frame.
The lock is granted when the io-handler releases the x-lock.

		Read-ahead
		----------

The read-ahead mechanism is intended to be intelligent and
isolated from the semantically higher levels of the database
index management. From the higher level we only need the
information if a file page has a natural successor or
predecessor page.
On the leaf level of a B-tree index,
these are the next and previous pages in the natural
order of the pages.

Let us first explain the read-ahead mechanism when the leaves
of a B-tree are scanned in an ascending or descending order.
When a read page is referenced in the buf_pool for the first time,
the buffer manager checks if it is at the border of a so-called
linear read-ahead area. The tablespace is divided into these
areas of size 64 blocks, for example. So if the page is at the
border of such an area, the read-ahead mechanism checks if
all the other blocks in the area have been accessed in an
ascending or descending order. If this is the case, the system
looks at the natural successor or predecessor of the page,
checks if that is at the border of another area, and in this case
issues read-requests for all the pages in that area. Maybe
we could relax the condition that all the pages in the area
have to be accessed: if data is deleted from a table, there may
appear holes of unused pages in the area.

A different read-ahead mechanism is used when there appears
to be a random access pattern to a file.
If a new page is referenced in the buf_pool, and several pages
of its random access area (for instance, 32 consecutive pages
in a tablespace) have recently been referenced, we may predict
that the whole area may be needed in the near future, and issue
the read requests for the whole area.

		AWE implementation
		------------------

By a 'block' we mean the buffer header of type buf_block_t.
By a 'page'
we mean the physical 16 kB memory area allocated from RAM for that block.
By a 'frame' we mean a 16 kB area in the virtual address space of the
process, in the frame_mem of buf_pool.

We can map pages to the frames of the buffer pool.

1) A buffer block allocated to use as a non-data page, e.g., to the lock
table, is always mapped to a frame.
2) A bufferfixed or io-fixed data page is always mapped to a frame.
3) When we need to map a block to a frame, we look from the list
awe_LRU_free_mapped and try to unmap its last block, but note that
bufferfixed or io-fixed pages cannot be unmapped.
4) For every frame in the buffer pool there is always a block whose page is
mapped to it. When we create the buffer pool, we map the first elements
in the free list to the frames.
5) When we have AWE enabled, we disable adaptive hash indexes.
*/

buf_pool_t*	buf_pool = NULL; /* The buffer buf_pool of the database */

#ifdef UNIV_DEBUG
ulint		buf_dbg_counter	= 0; /* This is used to insert validation
					operations in execution in the
					debug version */
ibool		buf_debug_prints = FALSE; /* If this is set TRUE,
					the program prints info whenever
					read-ahead or flush occurs */
#endif /* UNIV_DEBUG */

/************************************************************************
Calculates a page checksum which is stored to the page when it is written
to a file. Note that we must be careful to calculate the same value on
32-bit and 64-bit architectures. */

ulint
buf_calc_page_new_checksum(
/*=======================*/
			/* out: checksum */
	byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	/* Since the field FIL_PAGE_FILE_FLUSH_LSN, and in versions <= 4.1.x
	..._ARCH_LOG_NO, are written outside the buffer pool to the first
	pages of data files, we have to skip them in the page checksum
	calculation.
	We must also skip the field FIL_PAGE_SPACE_OR_CHKSUM where the
	checksum is stored, and also the last 8 bytes of page because
	there we store the old formula checksum.
	*/

	checksum = ut_fold_binary(page + FIL_PAGE_OFFSET,
				FIL_PAGE_FILE_FLUSH_LSN - FIL_PAGE_OFFSET)
		   + ut_fold_binary(page + FIL_PAGE_DATA,
				UNIV_PAGE_SIZE - FIL_PAGE_DATA
				- FIL_PAGE_END_LSN_OLD_CHKSUM);
	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}

/************************************************************************
In versions < 4.0.14 and < 4.1.1 there was a bug that the checksum only
looked at the first few bytes of the page. This calculates that old
checksum. NOTE: we must first store the new formula checksum to
FIL_PAGE_SPACE_OR_CHKSUM before calculating and storing this old checksum
because this takes that field as an input! */

ulint
buf_calc_page_old_checksum(
/*=======================*/
			/* out: checksum */
	byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	checksum = ut_fold_binary(page, FIL_PAGE_FILE_FLUSH_LSN);

	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}

/************************************************************************
Checks if a page is corrupt. */

ibool
buf_page_is_corrupted(
/*==================*/
				/* out: TRUE if corrupted */
	byte*	read_buf)	/* in: a database page */
{
	ulint	checksum;
	ulint	old_checksum;
	ulint	checksum_field;
	ulint	old_checksum_field;
#ifndef UNIV_HOTBACKUP
	dulint	current_lsn;
#endif
	if (mach_read_from_4(read_buf + FIL_PAGE_LSN + 4)
	     != mach_read_from_4(read_buf + UNIV_PAGE_SIZE
				- FIL_PAGE_END_LSN_OLD_CHKSUM + 4)) {

		/* Stored log sequence numbers at the start and the end
		of page do not match */

		return(TRUE);
	}

#ifndef UNIV_HOTBACKUP
	if (recv_lsn_checks_on && log_peek_lsn(&current_lsn)) {
		if (ut_dulint_cmp(current_lsn,
				  mach_read_from_8(read_buf + FIL_PAGE_LSN))
				 < 0) {
			ut_print_timestamp(stderr);

			fprintf(stderr,
"  InnoDB: Error: page %lu log sequence number %lu %lu\n"
"InnoDB: is in the future! Current system log sequence number %lu %lu.\n"
"InnoDB: Your database may be corrupt or you may have copied the InnoDB\n"
"InnoDB: tablespace but not the InnoDB log files. See\n"
"http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.\n",
			(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
			(ulong) ut_dulint_get_high(
				mach_read_from_8(read_buf + FIL_PAGE_LSN)),
			(ulong) ut_dulint_get_low(
				mach_read_from_8(read_buf + FIL_PAGE_LSN)),
			(ulong) ut_dulint_get_high(current_lsn),
			(ulong) ut_dulint_get_low(current_lsn));
		}
	}
#endif

	/* If we use checksums validation, make additional check before
	returning TRUE to ensure that the checksum is not equal to
	BUF_NO_CHECKSUM_MAGIC which might be stored by InnoDB with checksums
	disabled. Otherwise, skip checksum calculation and return FALSE */

	if (srv_use_checksums) {
		old_checksum = buf_calc_page_old_checksum(read_buf);

		old_checksum_field = mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM);

		/* There are 2 valid formulas for old_checksum_field:
		1. Very old versions of InnoDB only stored 8 byte lsn to the
		start and the end of the page.
		2. Newer InnoDB versions store the old formula checksum
		there. */

		if (old_checksum_field != mach_read_from_4(read_buf
							+ FIL_PAGE_LSN)
		    && old_checksum_field != old_checksum
		    && old_checksum_field != BUF_NO_CHECKSUM_MAGIC) {

			return(TRUE);
		}

		checksum = buf_calc_page_new_checksum(read_buf);
		checksum_field = mach_read_from_4(read_buf
						+ FIL_PAGE_SPACE_OR_CHKSUM);

		/* InnoDB versions < 4.0.14 and < 4.1.1 stored the space id
		(always equal to 0), to FIL_PAGE_SPACE_OR_CHKSUM */

		if (checksum_field != 0 && checksum_field != checksum
		    && checksum_field != BUF_NO_CHECKSUM_MAGIC) {

			return(TRUE);
		}
	}

	return(FALSE);
}

/************************************************************************
Prints a page to stderr.
*/

void
buf_page_print(
/*===========*/
	byte*	read_buf)	/* in: a database page */
{
	dict_index_t*	index;
	ulint		checksum;
	ulint		old_checksum;

	ut_print_timestamp(stderr);
	fprintf(stderr, "  InnoDB: Page dump in ascii and hex (%lu bytes):\n",
		(ulong) UNIV_PAGE_SIZE);
	ut_print_buf(stderr, read_buf, UNIV_PAGE_SIZE);
	fputs("InnoDB: End of page dump\n", stderr);

	checksum = srv_use_checksums ?
		buf_calc_page_new_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;
	old_checksum = srv_use_checksums ?
		buf_calc_page_old_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;

	ut_print_timestamp(stderr);
	fprintf(stderr,
"  InnoDB: Page checksum %lu, prior-to-4.0.14-form checksum %lu\n"
"InnoDB: stored checksum %lu, prior-to-4.0.14-form stored checksum %lu\n",
			(ulong) checksum, (ulong) old_checksum,
			(ulong) mach_read_from_4(read_buf
						+ FIL_PAGE_SPACE_OR_CHKSUM),
			(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM));
	fprintf(stderr,
"InnoDB: Page lsn %lu %lu, low 4 bytes of lsn at page end %lu\n"
"InnoDB: Page number (if stored to page already) %lu,\n"
"InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) %lu\n",
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					- FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
		(ulong) mach_read_from_4(read_buf
					+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));

	if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE)
	    == TRX_UNDO_INSERT) {
		fprintf(stderr,
			"InnoDB: Page may be an insert undo log page\n");
	} else if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR
						+ TRX_UNDO_PAGE_TYPE)
		   == TRX_UNDO_UPDATE) {
		fprintf(stderr,
			"InnoDB: Page may be an update undo log page\n");
	}

	if (fil_page_get_type(read_buf) == FIL_PAGE_INDEX) {
		fprintf(stderr,
"InnoDB: Page may be an index page where index id is %lu %lu\n",
			(ulong) ut_dulint_get_high(
					btr_page_get_index_id(read_buf)),
			(ulong) ut_dulint_get_low(
					btr_page_get_index_id(read_buf)));

		/* If the code is in ibbackup, dict_sys may be uninitialized,
		i.e., NULL */

		if (dict_sys != NULL) {

			index = dict_index_find_on_id_low(
					btr_page_get_index_id(read_buf));
			if (index) {
				fputs("InnoDB: (", stderr);
				dict_index_name_print(stderr, NULL, index);
				fputs(")\n", stderr);
			}
		}
	} else if (fil_page_get_type(read_buf) == FIL_PAGE_INODE) {
		fputs("InnoDB: Page may be an 'inode' page\n", stderr);
	} else if (fil_page_get_type(read_buf) == FIL_PAGE_IBUF_FREE_LIST) {
		fputs("InnoDB: Page may be an insert buffer free list page\n",
			stderr);
	}
}

/************************************************************************
Initializes a buffer control block when the buf_pool is created. */
static
void
buf_block_init(
/*===========*/
	buf_block_t*	block,	/* in: pointer to control block */
	byte*		frame)	/* in: pointer to buffer frame, or NULL if in
				the case of AWE there is no frame */
{
	block->magic_n = 0;

	block->state = BUF_BLOCK_NOT_USED;

	block->frame = frame;

	block->awe_info = NULL;

	block->buf_fix_count = 0;
	block->io_fix = 0;
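
The listing breaks off here (page 1 of 5). As a side note on buf_calc_page_new_checksum() earlier on this page, the following is a minimal standalone sketch of the same idea: fold two byte ranges of a page and add the results, skipping the stored checksum, the fields written outside the buffer pool, and the 8-byte trailer. Everything named sketch_* or SKETCH_* is hypothetical. The offsets are assumed values chosen for illustration, since the real constants are defined in headers not shown here, and sketch_fold() is a plain 32-bit FNV-1a hash standing in for ut_fold_binary(), whose implementation is also not part of this listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout constants for this sketch only; the real values live in
   InnoDB headers that are not part of this listing. */
enum {
	SKETCH_PAGE_SIZE	= 16384, /* 16 kB page, as in the AWE notes */
	SKETCH_OFF_CHECKSUM	= 0,	 /* where the checksum itself is kept */
	SKETCH_OFF_PAGE_NO	= 4,	 /* first checksummed header byte */
	SKETCH_OFF_FLUSH_LSN	= 26,	 /* start of the skipped header fields */
	SKETCH_OFF_DATA		= 38,	 /* first byte after the page header */
	SKETCH_TRAILER_LEN	= 8	 /* skipped bytes at the page end */
};

/* Stand-in for ut_fold_binary(): 32-bit FNV-1a, NOT InnoDB's fold. */
static uint32_t
sketch_fold(const unsigned char* buf, size_t len)
{
	uint32_t	h = 2166136261u;
	size_t		i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}

	return(h);
}

/* Folds the header range and the data range and adds the two results.
   The stored checksum, the skipped header fields and the trailer are
   never read. uint32_t arithmetic wraps modulo 2^32, which plays the
   role of the "& 0xFFFFFFFFUL" mask in the original function. */
uint32_t
sketch_page_checksum(const unsigned char* page)
{
	assert(page != NULL);

	return(sketch_fold(page + SKETCH_OFF_PAGE_NO,
			   SKETCH_OFF_FLUSH_LSN - SKETCH_OFF_PAGE_NO)
	       + sketch_fold(page + SKETCH_OFF_DATA,
			     SKETCH_PAGE_SIZE - SKETCH_OFF_DATA
			     - SKETCH_TRAILER_LEN));
}
```

With this layout, overwriting the stored checksum, any byte of the skipped header fields, or the trailer leaves sketch_page_checksum() unchanged, while changing a byte in the data range changes it. Using a fixed-width uint32_t is one way to get the same value on 32-bit and 64-bit builds, which the original achieves by masking a ulint.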