@page { @top { content: flow(header); } @bottom { content: flow(footer); } } /* end default print css */ /* custom css *//* end custom css */ /* ui edited css */ body { font-family: Verdana; font-size: 10.0pt; line-height: normal; background-color: #ffffff; } .documentBG { background-color: #ffffff; } /* end ui edited css */</style> </head> <body revision="dcbsxfpf_61f9cmnzdf:7"> <div align=center>
<table align=center border=0 cellpadding=0 cellspacing=0 height=5716 width=768>
<tbody>
<tr>
<td height=5716 valign=top width=100%>
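<pre>2007-3-6 <br>mm/slab.c<br><br> This is the most discussed part of memory management. We cover it in several parts.<br> <br> <br> Part 1 Overview <br> <br> slab takes pages from the zone-buddy system and splits them into small blocks of memory for the kernel to use; it plays the role that<br>malloc/free plays for a program. <br> Every kind of object the kernel uses (inode, dentry, sock, ...) has a dedicated cache, which is<br>the slab mechanism described here. If a particular type of memory is required, such as DMA or<br>HIGH mem, a separate cache has to be created for it.<br> The main job of slab is small-block allocation; it is also optimized for SMP and for the CPU cache. The kernel uses buddy<br>to control fragmentation at large granularity and slab to control fragmentation of small blocks: each slab is allocated as one or several<br>physically contiguous pages.<br> <br> A schematic figure illustrates how slab optimizes for the CPU cache:<br> <br> 1) Assume there are only two cache lines, 0 and 1.<br> 2) Each page is two cache lines in size. <br> 3) The first slab has color 0: its slab_t is stored from the start of the page.<br> 4) The second slab has color 1: its slab_t is stored one cache line into the page. <br> <br>      +------------------+      +------------------+<br>      |   cache line 0   |      |   cache line 1   |<br>      +------------------+      +------------------+<br>               ^                          ^<br>               |                          +---------------------------------+<br>               |                                                            |<br>      +------------------+------------------+------------------+------------------+<br>      |   cache line 0   |   cache line 1   |   cache line 0   |   cache line 1   |<br>      +------------------+------------------+------------------+------------------+<br>      | page0                               | page1<br>      | slab_t color=0                      |                    slab_t color=1<br> <br> If you walk all slabs of the cache, touching slab1, then slab2, then slab1 again, the accesses complete<br>without cache line 0 or cache line 1 having to be evicted.<br> If both slabs had the same color, every access would evict the cache line and hurt performance.<br> <br> The details of the CPU cache itself are not discussed here; the figure is for reference only. (fix me)<br> <br><br> <br> Part 2 Data structures and diagram<br> <br> Again we start with a figure and then go through the related data structures. The figure shows the on-slab case<br>(not off-slab, where off-slab means slab_t is kept outside the slab). The purpose of each key field is marked in the figure.<br> 1) kmem_cache_s <br> slabs: the list of all slabs (each a few contiguous pages) that belong to this cache.<br> firstnotfull: the slabs are ordered full, partially used, empty; this pointer points to the<br> first slab that is not full.<br> objsize: size in bytes of the objects the cache manages.<br> flags: e.g. the off-slab bit<br> color: the maximum color value; the space left over in a slab is divided into 'color' pieces of coloroff bytes each<br> coloroff: the color granularity, normally a multiple of the cache line size <br> <br> 2) slab_t: the structure used to manage one slab<br> list: links the slab into the slabs list of kmem_cache_s<br> colouroff: offset from the first page of the slab to the first obj (s_mem), i.e. cache align(color, slab_t, bufctl)<br> inuse: number of objs already allocated<br> free: 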
bufctl is a linked list built on an array; starting at index free, it chains together the free objs of the slab.<br> <br>  kmem_cache_s<br>  +---------------+<br>  | slabs         |---> list of the first slab <--> list of the next slab <--> ...<br>  | *firstnotfull |---> list of the first slab that is not full<br>  | objsize       |<br>  | flags         |<br>  | num           |<br>  | color(max)    | = left/cache.coloroff<br>  | coloroff      | L1 cache line size<br>  +---------------+<br> <br>  one slab with an on-slab slab_t (struct slab_s), starting at the first page obtained from buddy:<br> <br>  page start --> +-------------+ <-----+<br>                 | color       |       |   = cache.colornext*cache.coloroff<br>                 +-------------+       |<br>         slab_t  | list        |       |<br>                 | colouroff   |       |<br>                 | *s_mem -----|---+   |   colouroff<br>                 | inuse       |   |   |<br>                 | free -------|-+ |   |<br>                 +-------------+ | |   |<br>                 | bufctl      | | |   |<br>                 |  next  <----|-+ |   |<br>                 |  ...        |   |   |<br>                 +-------------+   |   |<br>                 | cache align |   |   |<br>                 +-------------+ <-+---+<br>                 | obj         |<br>                 +-------------+<br>                 | obj         |<br>                 +-------------+<br>                 | ....        |<br>                 +-------------+<br> <br> Note that once a page has been handed from buddy to a slab, the prev field in its page descriptor points to the owning slab and next<br>points to the owning cache (see kmem_cache_grow). So for an obj, rounddowntopage(objp)->next<br>is its cache and rounddowntopage(objp)->prev is its slab.<br><br> <br> The definitions follow:<br>struct kmem_cache_s {<br>/* 1) each alloc & free */<br>/* full, partial first, then free */<br>    struct list_head slabs; //head of the list of slabs managed by this cache<br>    struct list_head *firstnotfull; <br>    unsigned int objsize;<br>    unsigned int flags; /* constant flags */<br>    unsigned int num; /* # of objs per slab */<br>    spinlock_t spinlock;<br>#ifdef CONFIG_SMP<br>    unsigned int batchcount; /* on SMP every CPU has a fast allocation array; this is how many objs are taken from the slabs at a time */<br>#endif<br><br>/* 2) slab additions /removals */<br>    /* order of pgs per slab (2^n) */<br>    unsigned int gfporder;<br><br>    /* force GFP flags, e.g. GFP_DMA */<br>    unsigned int gfpflags;<br><br>    size_t colour; /* cache colouring range */<br>    unsigned int colour_off; /* colour granularity; a slab's colour offset is colour_next*colour_off */<br>    unsigned int colour_next; /* cache colouring */<br>    kmem_cache_t *slabp_cache; /* when the management part of this cache's slabs<br>                                  (the slab descriptor and the kmem_bufctl_t array) is kept<br>                                  outside the slab, this points to the general cache<br>                                  it is allocated from */<br>    unsigned int growing;<br>    unsigned int dflags; /* dynamic flags */<br><br>    /* constructor func, de-constructor func: omitted */<br> <br>/* 3) cache creation/removal */<br>    char name[CACHE_NAMELEN];<br>    struct list_head next; /* links this cache with all the others into one chain; a clock-style algorithm<br>                              periodically walks the chain and reaps some slabs from a cache */<br>#ifdef CONFIG_SMP<br>/* 4) per-cpu data */<br>    cpucache_t *cpudata[NR_CPUS];<br>#endif<br>#if STATS //omitted<br>#endif<br>};<br><br>/*<br> * slab_t<br> *<br> * Manages the objects in one slab. 
Placed either at the beginning of a slab,<br> * or allocated from a general cache.<br> * Slabs are chained into one ordered list: fully used, partially used, then empty slabs.<br> */<br>typedef struct slab_s {<br>    struct list_head list; //list linkage; the list head is kmem_cache_s::slabs<br>    unsigned long colouroff; //s_mem = SlabBase(buddypage)+colouroff<br>    void *s_mem; /* location of the first object */<br>    unsigned int inuse; /* number of objects in use in this slab */<br>    kmem_bufctl_t free; /* head of the free-object list */<br>} slab_t;<br><br><br> Another piece that is essential to how slab works is cache_cache:<br>/* internal cache of cache description objs */<br>static kmem_cache_t cache_cache = {<br>    slabs: LIST_HEAD_INIT(cache_cache.slabs),<br>    firstnotfull: &cache_cache.slabs,<br>    objsize: sizeof(kmem_cache_t),<br>    flags: SLAB_NO_REAP,<br>    spinlock: SPIN_LOCK_UNLOCKED,<br>    colour_off: L1_CACHE_BYTES,<br>    name: "kmem_cache",<br>};<br><br> This cache is set up by hand. The objects it manages are kmem_cache_t, and it is used to allocate kmem_cache_t, so<br>it is the cache of caches.<br><br> When the kernel initializes slab, it first initializes cache_cache (kmem_cache_init) and then the general<br>caches (kmem_cache_sizes_init). One issue is worth raising here:<br><br>kmem_cache_sizes_init->kmem_cache_create(name, sizes->cs_size,0,..):<br><br>kmem_cache_t * kmem_cache_create (...)<br>{<br>    const char *func_nm = KERN_ERR "kmem_create: ";<br>    size_t left_over, align, slab_size;<br>    kmem_cache_t *cachep = NULL;<br><br>    /* Sanity checks... debug */<br>    ....<br> <br>    /* Allocate the cache descriptor. No problem here: cache_cache is already initialized */<br>    cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);<br>    ....<br>    /* Decide how the objects are managed ( 'on' or 'off' slab.) 
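*/<br>    if (size >= (PAGE_SIZE>>3))//512 bytes or more goes off-slab (typical case, 4k pages)<br>        flags |= CFLGS_OFF_SLAB;<br> <br>    ..........<br>    if (flags & CFLGS_OFF_SLAB)<br>        cachep->slabp_cache = kmem_find_general_cachep(slab_size,0);//allocate from a general cache<br>    ........<br>}<br><br> When a general cache is initialized as off-slab, its slab_t has to be allocated from a general cache. The question is whether the<br>general cache that gets chosen has been initialized yet.<br> It works like this: the small general caches (those below the PAGE_SIZE>>3 threshold, i.e. 32-256 bytes with 4K pages) are initialized first,<br>and they keep slab_t on-slab without any problem. When the larger sizes are initialized, it is enough that<br>slab_size=cachealign(sizeof(slab_t)+num*sizeof(bufctl)) fits into one of the on-slab general caches that already exist.<br>static unsigned long offslab_limit;/* upper bound on the number of objects in a slab that is managed off-slab */<br>This limit is computed here as well; it is updated each time a non-off-slab general cache is initialized:<br>kmem_cache_sizes_init:<br>if (!(OFF_SLAB(sizes->cs_cachep))) {<br>    offslab_limit = sizes->cs_size-sizeof(slab_t);<br>    offslab_limit /= 2;<br>}<br>With 4K pages the largest on-slab general cache is 256 (a 512-byte object already meets size >= PAGE_SIZE>>3), so<br>offslab_limit=(256-24)/2=116, taking sizeof(slab_t) as 24 on a 32-bit build. With this limit in place the problem is<br>solved.<br><br>static cache_sizes_t cache_sizes[] = {<br>#if PAGE_SIZE == 4096<br>    { 32, NULL, NULL},<br>#endif<br>    { 64, NULL, NULL},<br>    { 128, NULL, NULL},<br>    { 256, NULL, NULL},<br>    { 512, NULL, NULL},//off-slab starts here (size >= PAGE_SIZE>>3 with 4K pages)<br>    { 1024, NULL, NULL},<br>    { 2048, NULL, NULL},<br>    { 4096, NULL, NULL},<br>    { 8192, NULL, NULL},<br>    { 16384, NULL, NULL},<br>    { 32768, NULL, NULL},<br>    { 65536, NULL, NULL},<br>    {131072, NULL, NULL},<br>    { 0, NULL, NULL}<br>};<br><br><br><br><br> Part 3 Core algorithms<br><br> Allocation algorithms<br> <br> This part covers the few functions most closely tied to slab; they are the key to understanding its main operations. The first is<br>kmem_cache_estimate, which shows the relative positions of slab_t, bufctl and the objs.<br> Given gfporder, it computes how many objs the slab can hold and returns the space left over, which is what can be used for colouring. 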
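 As a worked example of that calculation, here is a minimal user-space sketch, not the kernel code. The constants are
assumptions for a 32-bit build with 4K pages (32-byte L1 lines, a 24-byte slab_t, a 4-byte kmem_bufctl_t), every sketch_*
name is made up for this illustration, and the SLAB_LIMIT cap is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; the real ones come from the kernel headers. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_L1_BYTES  32u
#define SKETCH_SLAB_T    24u   /* assumed sizeof(slab_t) */
#define SKETCH_BUFCTL    4u    /* assumed sizeof(kmem_bufctl_t) */

struct sketch_layout {
    unsigned int num;   /* objects per slab */
    size_t mgmt;        /* aligned bytes used by slab_t + bufctl array */
    size_t left_over;   /* bytes available for colouring */
    size_t colour;      /* number of distinct colour offsets */
};

/* Round x up to a multiple of the assumed L1 line size. */
static size_t sketch_l1_align(size_t x)
{
    return (x + SKETCH_L1_BYTES - 1) & ~(size_t)(SKETCH_L1_BYTES - 1);
}

/* Same search as the loop in the text: grow i until it no longer fits. */
static struct sketch_layout sketch_estimate(unsigned int gfporder,
                                            size_t objsize, int off_slab)
{
    size_t total = (size_t)SKETCH_PAGE_SIZE << gfporder;
    size_t base = off_slab ? 0 : SKETCH_SLAB_T;
    size_t extra = off_slab ? 0 : SKETCH_BUFCTL;
    struct sketch_layout r;
    unsigned int i = 0;

    assert(objsize > 0);
    while (i * objsize + sketch_l1_align(base + i * extra) <= total)
        i++;
    if (i > 0)
        i--;                      /* the loop stopped one past the fit */

    r.num = i;
    r.mgmt = sketch_l1_align(base + i * extra);
    r.left_over = total - i * objsize - r.mgmt;
    r.colour = r.left_over / SKETCH_L1_BYTES;
    return r;
}

/* Colour offset in bytes of the n-th slab (0-based) created for a cache. */
static size_t sketch_colour_offset(const struct sketch_layout *l,
                                   unsigned int n)
{
    return l->colour ? (n % l->colour) * SKETCH_L1_BYTES : 0;
}
```

 Under these assumptions, 128-byte objects in a one-page on-slab slab give 30 objects, 160 bytes of management data and
96 bytes left over, i.e. three colours, so successive slabs start at offsets 0, 32, 64, 0, ...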
<br> <br>/* For a given slab, compute the number of objects it can hold (num) and <br> * the bytes left over (what remains after the objects and the management structures).<br> * gfporder: slab size is 2^gfporder*PAGE_SIZE<br> * size: object size, flags: may be CFLGS_OFF_SLAB<br> */<br>static void kmem_cache_estimate (unsigned long gfporder, size_t size,<br>        int flags, size_t *left_over, unsigned int *num)<br>{<br>    int i;<br>    size_t wastage = PAGE_SIZE<<gfporder; /* total space */<br>    size_t extra = 0; /* space taken by one bufctl entry */<br>    size_t base = 0; /* size of slab_t */<br><br>    /* with an off-slab slab_t only the objs themselves need to be counted */<br> <br>    if (!(flags & CFLGS_OFF_SLAB)) {<br>        base = sizeof(slab_t);<br>        extra = sizeof(kmem_bufctl_t);<br>    }<br>    i = 0;<br><br>    /* keep increasing the object count for as long as everything still fits */<br>    while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)<br>        i++;<br>    /*<br>     * Note on the calculation: base+i*extra is where the first object would start.<br>     * L1_CACHE_ALIGN rounds it up to the aligned offset.<br>     */<br> <br>    if (i > 0)<br>        i--;/* the while loop went one past what fits, so subtract 1 */<br><br>    if (i > SLAB_LIMIT)/* do not allow too many objects */<br>        i = SLAB_LIMIT; <br><br>    *num = i;<br>    wastage -= i*size; /* total space minus the objects */<br>    wastage -= L1_CACHE_ALIGN(base+i*extra);/* minus the management structures */<br>    *left_over = wastage; /* the rest is used for colouring */<br>}<br><br> Next is the algorithm that allocates the slab_t for a slab. There are two cases, depending on whether slab_t is off-slab or on-slab.<br><br>/*<br> * Allocate memory for the slab_t of a slab.<br> * In the non-off-slab case the top of the slab itself is used.<br> */<br>static inline slab_t * kmem_cache_slabmgmt (kmem_cache_t *cachep,<br>        void *objp, int colour_off, int local_flags)<br>{<br>    slab_t *slabp;<br> <br>    if (OFF_SLAB(cachep)) {/* off-slab: allocate slab_t from the designated general cache */<br>        /* Slab management obj is off-slab. 
*/<br>        slabp = kmem_cache_alloc(cachep->slabp_cache, local_flags);<br>        if (!slabp)<br>            return NULL;<br>    } else {/* on-slab slab_t */<br>        /* FIXME: change to<br>            slabp = objp<br>         * if you enable OPTIMIZE<br>         */<br>        slabp = objp+colour_off; /* see the figure: the colour area sits at the top of the slab */<br>        colour_off += L1_CACHE_ALIGN(cachep->num *<br>                sizeof(kmem_bufctl_t) + sizeof(slab_t));/* see the figure: how colour_off relates to s_mem */<br>    }<br>    slabp->inuse = 0;<br>    slabp->colouroff = colour_off;<br>    slabp->s_mem = objp+colour_off;<br><br>    return slabp;<br>}<br><br><br>Next comes static inline void kmem_cache_init_objs (kmem_cache_t * cachep,<br>        slab_t * slabp, unsigned long ctor_flags). The code is omitted (listing it would not help ^_^).<br>Its job is to initialize every object in the given slab of a cache and to build the free list on top of the bufctl array.<br><br><br> Then the core of allocating an obj from a slab:<br><br>/*<br> * Allocate one object from a slab<br> */<br>static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep,<br>        slab_t *slabp)<br>{<br>    void *objp;<br><br>    STATS_INC_ALLOCED(cachep);<br>    STATS_INC_ACTIVE(cachep);<br>    STATS_SET_HIGH(cachep);<br><br>    /* get obj pointer */<br>    slabp->inuse++;<br>    objp = slabp->s_mem + slabp->free*cachep->objsize; /* this is how the object pointer is computed! */<br>    slabp->free=slab_bufctl(slabp)[slabp->free];/*next=obj->next*/<br><br>    if (slabp->free == BUFCTL_END) /* this slab is used up: firstnotfull = next slab */<br>        /* slab now full: move to next slab for next alloc */<br>        cachep->firstnotfull = slabp->list.next;<br>#if DEBUG<br>    ...<br>#endif<br>    return objp;<br>}<br><br> The last key allocation function is kmem_cache_grow, which allocates a new slab for the slab cache. (It is already<br>heavily commented, so nothing more should be needed.) (In analysing many of these functions, synchronization and mutual exclusion were not the focus;<br>that work is left for later.)<br>/*<br> * Allocate one slab for the given cache. 
<br> */<br>static int kmem_cache_grow (kmem_cache_t * cachep, int flags)<br>{<br>    slab_t *slabp;<br>    struct page *page;<br>    void *objp;<br>    size_t offset;<br>    unsigned int i, local_flags;<br>    unsigned long ctor_flags;<br>    unsigned long save_flags;<br><br>    /* Be lazy and only check for valid flags here,<br>     * keeping it out of the critical path in kmem_cache_alloc().<br>     */<br>    if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW))<br>        BUG();/* any other flag is invalid */<br>    if (flags & SLAB_NO_GROW)<br>        return 0;<br><br>    /*<br>     * The test for missing atomic flag is performed here, rather than<br>     * the more obvious place, simply to reduce the critical path length<br>     * in kmem_cache_alloc(). If a caller is seriously mis-behaving they<br>     * will eventually be caught here (where it matters).<br>     */<br>    if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)<br>        BUG();/* in interrupt context only SLAB_ATOMIC (non-sleeping) allocations are allowed */<br><br>    ctor_flags = SLAB_CTOR_CONSTRUCTOR;<br>    local_flags = (flags & SLAB_LEVEL_MASK);<br>    if (local_flags == SLAB_ATOMIC)/* the allocation must not sleep: tell the constructor */<br>        /*<br>         * Not allowed to sleep. Need to tell a constructor about<br>         * this - it might need to know...<br>         */<br>        ctor_flags |= SLAB_CTOR_ATOMIC;<br><br>    /* About to mess with non-constant members - lock. */<br>    spin_lock_irqsave(&cachep->spinlock, save_flags);<br><br>    /* Get colour for the slab, and cal the next value. 
*/<br>    offset = cachep->colour_next; /* the colour this slab should get */<br>    cachep->colour_next++;<br>    if (cachep->colour_next >= cachep->colour)<br>        cachep->colour_next = 0;<br>    offset *= cachep->colour_off;/* reserve this many colour bytes at the head of the slab */<br>    cachep->dflags |= DFLGS_GROWN;<br><br>    cachep->growing++;<br>    spin_unlock_irqrestore(&cachep->spinlock, save_flags);<br><br>    /* The series of memory allocations needed for a new slab.<br>     * Neither the cache-chain semaphore, or cache-lock, are<br>     * held, but the incrementing c_growing prevents this<br>     * cache from being reaped or shrunk.<br>     * Note: The cache could be selected in for reaping in<br>     * kmem_cache_reap(), but when the final test is made the<br>     * growing value will be seen.<br>     */<br><br>    /* Allocate the slab: a group of contiguous buddy pages */<br>    if (!(objp = kmem_getpages(cachep, flags)))<br>        goto failed;<br><br>    /* Allocate the slab management structure: an on-slab or off-slab slab_t */<br>    if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))<br>        goto opps1;<br><br>    /* !!!!!! I hope this is OK. */<br>    i = 1 << cachep->gfporder;<br>    page = virt_to_page(objp);<br>    do {<br>        /* For every page given to the slab, prev in its page descriptor <br>         * points to the owning slab and next points to the owning cache<br>         */<br>        SET_PAGE_CACHE(page, cachep);<br>        SET_PAGE_SLAB(page, slabp); <br><br> <br>        PageSetSlab(page);<br>        page++;<br>    } while (--i);/* every page of the slab is set up this way */<br><br>    kmem_cache_init_objs(cachep, slabp, ctor_flags); /* initialize the objects */<br><br>    spin_lock_irqsave(&cachep->spinlock, save_flags);<br>    cachep->growing--;<br><br>    /* Make slab active. 
*/<br>    list_add_tail(&slabp->list,&cachep->slabs); /* append to the tail of the cache's slab list */<br>    if (cachep->firstnotfull == &cachep->slabs)/* adjust the firstnotfull pointer */<br>        cachep->firstnotfull = &slabp->list;<br>    STATS_INC_GROWN(cachep);<br>    cachep->failures = 0;<br><br>    spin_unlock_irqrestore(&cachep->spinlock, save_flags);<br>    return 1;<br>opps1:<br>    kmem_freepages(cachep, objp);<br>failed:<br>    spin_lock_irqsave(&cachep->spinlock, save_flags);<br>    cachep->growing--;<br>    spin_unlock_irqrestore(&cachep->spinlock, save_flags);<br>    return 0;<br>}<br><br><br><br><br> Freeing algorithms<br> <br>kmem_cache_free_one: find the slab that objp belongs to (its cache is already known: cachep), put<br>the obj back on the free-object list, and adjust the slab's position in the cache's slab list according to the slab's new state.<br><br>static inline void kmem_cache_free_one(kmem_cache_t *cachep, void *objp)<br>{<br>    slab_t* slabp;<br><br>    CHECK_PAGE(virt_to_page(objp));<br>    /* reduces memory footprint<br>     *<br>    if (OPTIMIZE(cachep))<br>        slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1)));<br>    else<br>     */<br>    slabp = GET_PAGE_SLAB(virt_to_page(objp));<br>    /* page->prev records the slab that holds the obj, see GET_PAGE_SLAB */<br> <br>#if DEBUG<br>    ....<br>#endif<br>    {<br>        /* put this obj back on the free-object list */<br>        unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;<br>        slab_bufctl(slabp)[objnr] = slabp->free;<br>        slabp->free = objnr;<br>    }<br>    STATS_DEC_ACTIVE(cachep);/* statistics */ <br> <br>    /* fixup slab chain */<br>    if (slabp->inuse-- == cachep->num) /* full -> partially full */<br>        goto moveslab_partial;<br>    if (!slabp->inuse)/* partial -> empty */<br>        goto moveslab_free;<br>    return;<br><br>    /* move the slab to its new position, simple.. */<br>moveslab_partial:<br>    /* was full.<br>     * Even if the page is now empty, we can set c_firstnotfull to<br>     * slabp: there are no partial slabs in this case<br>     */<br>    {<br>        ....<br>    }<br>moveslab_free:<br>    /*<br>     * was partial, now empty.<br>     * c_firstnotfull might point to slabp<br>     * FIXME: optimize<br>     */<br>    {<br>        ....<br>    }<br>}<br> <br>kmem_slab_destroy: runs cachep->dtor on every obj in the slab, then frees the pages used by the slab, and<br>the slab_t as well. 
The code is omitted.<br><br><br><br> Part 4 Main interfaces<br> The exported interfaces fall into several groups: initialization, cache creation and destruction, obj allocation and freeing, and proc support.<br> <br> I) slab initialization<br> The first function called is void __init kmem_cache_init(void), which sets up cache_cache (the global<br>management structure for caches).<br>void __init kmem_cache_init(void)<br>{/* much has already been initialized statically, so this is fairly simple */<br>    size_t left_over;<br><br>    init_MUTEX(&cache_chain_sem);<br>    INIT_LIST_HEAD(&cache_chain);<br><br>    kmem_cache_estimate(0, cache_cache.objsize, 0,<br>            &left_over, &cache_cache.num);<br>    if (!cache_cache.num)<br>        BUG();<br><br>    cache_cache.colour = left_over/cache_cache.colour_off;<br>    cache_cache.colour_next = 0;<br>}<br> Next the general caches (cache_sizes) are initialized by void __init kmem_cache_sizes_init(void). Its code is not<br>listed; the special issue around initializing the cache sizes was covered in Part 2, and nothing more is analysed<br>here.<br> The last initialization step is __initcall(kmem_cpucache_init), called from the init process. Most of the work is done by<br>static void enable_all_cpucaches (void)<br>{<br>    struct list_head* p;<br><br>    down(&cache_chain_sem);<br><br>    p = &cache_cache.next;<br>    do { //walk all caches and initialize the SMP part<br>        kmem_cache_t* cachep = list_entry(p, kmem_cache_t, next);<br><br>        enable_cpucache(cachep);/* set the batch alloc limit, then use an IPI to have every CPU<br>                                   initialize its private data */<br>        p = cachep->next.next;<br>    } while (p != &cache_cache.next);<br><br>    up(&cache_chain_sem);<br>}<br> <br> II) cache creation and destruction<br> <br> kmem_cache_t *kmem_cache_create (const char *name, size_t size, size_t offset,<br>     unsigned long flags, void (*ctor)(void*, kmem_cache_t *, unsigned long),<br>     void (*dtor)(void*, kmem_cache_t *, unsigned long))<br> int kmem_cache_destroy (kmem_cache_t * cachep)<br> int kmem_cache_shrink(kmem_cache_t *cachep)<br> <br> void kmem_cache_reap (int gfp_mask)<br><br>/**<br> * kmem_cache_create - Create a cache.<br> * @name: A string which is used in /proc/slabinfo to identify this cache.<br> * @size: The size of objects to be created in this cache.<br> * @offset: The offset at which use of the slab starts.<br> * @flags: SLAB flags<br> * @ctor: A constructor for the objects.<br> * @dtor: A destructor for the objects.<br> *<br> * Returns a ptr to the cache on success, NULL on failure.<br> * Cannot be called within a int, but can be interrupted.<br> * The @ctor is run when new pages are allocated by the cache<br> 
* and the @dtor is run before the pages are handed back.<br> * The flags are<br> <br> * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)<br> * to catch references to uninitialised memory.<br> *<br> * %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check<br> * for buffer overruns.<br> *<br> * %SLAB_NO_REAP - Don't automatically reap this cache when we're under<br> * memory pressure.<br> *<br> * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware<br> * cacheline. This can be beneficial if you're counting cycles as closely<br> * as davem.<br> */<br>kmem_cache_t *<br>kmem_cache_create (const char *name, size_t size, size_t offset,<br>    unsigned long flags, void (*ctor)(void*, kmem_cache_t *, unsigned long),<br>    void (*dtor)(void*, kmem_cache_t *, unsigned long))<br>{<br>    const char *func_nm = KERN_ERR "kmem_create: ";<br>    size_t left_over, align, slab_size;<br>    kmem_cache_t *cachep = NULL;<br><br>    /*<br>     * Sanity checks... these are all serious usage bugs.<br>     */<br>    ... //omitted<br>#if DEBUG<br>    .......... //omitted<br>#endif<br><br>    /*<br>     * Always checks flags, a caller might be expecting debug<br>     * support which isn't available.<br>     */<br>    if (flags & ~CREATE_MASK) //no flags other than these may be given<br>        BUG();<br><br>    /* Allocate the kmem_cache_s */<br>    cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);<br>    if (!cachep)<br>        goto opps;<br>    memset(cachep, 0, sizeof(kmem_cache_t));<br><br>    /* Check that size is in terms of words. This is needed to avoid<br>     * unaligned accesses for some archs when redzoning is used, and makes<br>     * sure any on-slab bufctl's are also correctly aligned.<br>     */<br>    if (size & (BYTES_PER_WORD-1)) { //force the obj size to be WORD aligned<br>        size += (BYTES_PER_WORD-1);<br>        size &= ~(BYTES_PER_WORD-1);<br>        printk("%sForcing size word alignment - %s\n", func_nm, name);<br>    }<br> <br>#if DEBUG<br>.... 
//omitted<br>#endif<br>    align = BYTES_PER_WORD;<br>    if (flags & SLAB_HWCACHE_ALIGN)<br>        align = L1_CACHE_BYTES;<br><br>    /* Decide how the objects are managed ( 'on' or 'off' slab.) */<br>    if (size >= (PAGE_SIZE>>3)) //the obj is large, so use an off-slab slab_t<br>        /*<br>         * Size is large, assume best to place the slab management obj<br>         * off-slab (should allow better packing of objs).<br>         */<br>        flags |= CFLGS_OFF_SLAB;<br><br> <br>    if (flags & SLAB_HWCACHE_ALIGN) {/* adjust the obj size so that objs are cache line aligned */<br>        /* Need to adjust size so that objs are cache aligned. */<br>        /* Small obj size, can get at least two per cache line. */<br>        /* FIXME: only power of 2 supported, was better */<br>        while (size < align/2)<br>            align /= 2;<br>        size = (size+align-1)&(~(align-1));<br>    }<br><br>    /* Compute the size of a slab (in units of page_size) and the number of objects per slab.<br>     * This could be made much more intelligent. For now, try to avoid<br>     * using high page-orders for slabs. When the gfp() funcs are more<br>     * friendly towards high-order requests, this should be changed.<br>     */<br>    do {<br>        unsigned int break_flag = 0;<br>cal_wastage:<br>        kmem_cache_estimate(cachep->gfporder, size, flags,<br>                &left_over, &cachep->num);<br>        if (break_flag)<br>            break;<br>        if (cachep->gfporder >= MAX_GFP_ORDER) //a slab never exceeds this size<br>            break;<br>        if (!cachep->num) //not even one obj fits?<br>            goto next; //the only option is a bigger slab<br> <br>        if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit) {//this check is<br>                                                                    //explained in detail in Part 2<br>            /* Oops, this num of objs will cause problems. */<br>            cachep->gfporder--;<br>            break_flag++;<br>            goto cal_wastage;<br>        }<br><br>        /*<br>         * Large num of objs is good, but v. large slabs are currently<br>         * bad for the gfp()s.<br>         */<br>        if (cachep->gfporder >= slab_break_gfp_order)//once the slab can hold objs,<br>            break;//stop here: normally at most 2 pages are given to a slab.<br><br>        if ((left_over*8) <= (PAGE_SIZE<<cachep->gfporder))//no more than 1/8 wasted: acceptable<br>            break; /* Acceptable internal fragmentation. 
*/ <br>next:<br>        cachep->gfporder++;<br>    } while (1);<br><br>    if (!cachep->num) {<br>        ...//omitted<br>        goto opps;<br>    }<br> <br>    /* size of slab_t plus the bufctl array */<br>    slab_size = L1_CACHE_ALIGN(cachep->num*sizeof(kmem_bufctl_t)+sizeof(slab_t));<br> <br> <br>    /*<br>     * If the slab has been placed off-slab, and we have enough space then<br>     * move it on-slab. This is at the expense of any extra colouring.<br>     */<br>    if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {//the left-over space can hold slab_t<br>        flags &= ~CFLGS_OFF_SLAB ; //switch to an on-slab slab_t<br>        left_over -= slab_size; <br>    }<br><br>    /* Offset must be a multiple of the alignment. */<br>    offset += (align-1);<br>    offset &= ~(align-1);<br>    if (!offset)<br>        offset = L1_CACHE_BYTES;<br>    cachep->colour_off = offset;<br>    cachep->colour = left_over/offset;<br><br>    /* init remaining fields */<br>    ........//simple, omitted<br><br>#ifdef CONFIG_SMP<br>    if (g_cpucache_up)<br>        enable_cpucache(cachep);<br>#endif<br>    /* Need the semaphore to access the chain. */<br>    ....//check whether a cache with the same name already exists. omitted<br>    /* There is no reason to lock our new cache before we<br>     * link it in - no one knows about it yet...<br>     */<br>    list_add(&cachep->next, &cache_chain);<br>    up(&cache_chain_sem);<br>opps:<br>    return cachep;<br>}<br><br> These two functions:<br> int kmem_cache_destroy (kmem_cache_t * cachep)<br> int kmem_cache_shrink(kmem_cache_t *cachep)<br> are, how to put it, fairly simple. destroy releases everything, and the cache ceases to exist. 
shrink frees the slabs that are<br>completely empty. (Keep an eye on the value of cachep->growing.)<br><br><br><br><br> <br> III) obj allocation and freeing<br> void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)<br> void kmem_cache_free (kmem_cache_t *cachep, void *objp)<br> void * kmalloc (size_t size, int flags)<br> void kfree (const void *objp)<br><br>1) Allocation. The core functions were discussed above; what remains here is the SMP logic. The main SMP optimization is to avoid the spin lock<br>through per-CPU fast allocation.<br><br>void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)------------------------><br>static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, int flags)<br>{<br>    unsigned long save_flags;<br>    void* objp;<br><br>    kmem_cache_alloc_head(cachep, flags); /* Debug only */<br>try_again:<br>    local_irq_save(save_flags);<br>#ifdef CONFIG_SMP<br>    { //on SMP, allocate from the per-CPU fast allocation array; the point is that no lock is needed<br>        cpucache_t *cc = cc_data(cachep);<br><br>        if (cc) {<br>            if (cc->avail) {<br>                STATS_INC_ALLOCHIT(cachep);<br>                objp = cc_entry(cc)[--cc->avail];<br>            } else {<br>                STATS_INC_ALLOCMISS(cachep);<br>                objp = kmem_cache_alloc_batch(cachep,flags); //empty: fetch a batch<br>                if (!objp)<br>                    goto alloc_new_slab_nolock;<br>            }<br>        } else {<br>            spin_lock(&cachep->spinlock);<br>            objp = kmem_cache_alloc_one(cachep); //otherwise allocate from the slabs directly<br>            spin_unlock(&cachep->spinlock);<br>        }<br>    }<br>#else<br>    objp = kmem_cache_alloc_one(cachep);<br>#endif<br>    local_irq_restore(save_flags);<br>    return objp;<br>alloc_new_slab:<br>#ifdef CONFIG_SMP<br>    spin_unlock(&cachep->spinlock);<br>alloc_new_slab_nolock:<br>#endif<br>    local_irq_restore(save_flags);<br>    if (kmem_cache_grow(cachep, flags))<br>        /* Someone may have stolen our objs. Doesn't matter, we'll<br>         * just come back here again.<br>         */<br>        goto try_again;<br>    return NULL;<br>}<br><br>2) Freeing: kmem_cache_free mirrors allocation and is not discussed further. 
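 Since the free path is not listed, the following user-space sketch shows how allocation and freeing mirror each other on
the bufctl index list of a single slab. It is an illustration under assumptions, not the kernel code: the sizes and every
sketch_* name are made up, and a full slab simply returns NULL here, whereas the kernel moves firstnotfull to the next slab.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM        4            /* objects per slab (hypothetical) */
#define SKETCH_OBJSIZE    32           /* bytes per object (hypothetical) */
#define SKETCH_BUFCTL_END 0xffffffffu  /* end-of-list marker */

struct sketch_slab {
    unsigned char mem[SKETCH_NUM * SKETCH_OBJSIZE]; /* plays the role of s_mem */
    unsigned int bufctl[SKETCH_NUM]; /* bufctl[i] = index of the next free obj */
    unsigned int free;               /* index of the first free obj */
    unsigned int inuse;              /* number of allocated objs */
};

/* Chain all objects into the free list: 0 -> 1 -> ... -> END. */
static void sketch_slab_init(struct sketch_slab *s)
{
    unsigned int i;

    for (i = 0; i < SKETCH_NUM; i++)
        s->bufctl[i] = i + 1;
    s->bufctl[SKETCH_NUM - 1] = SKETCH_BUFCTL_END;
    s->free = 0;
    s->inuse = 0;
}

/* Pop the head of the free list; the index is turned into a pointer. */
static void *sketch_slab_alloc(struct sketch_slab *s)
{
    void *objp;

    if (s->free == SKETCH_BUFCTL_END)
        return NULL;                          /* slab is full */
    objp = s->mem + s->free * SKETCH_OBJSIZE;
    s->free = s->bufctl[s->free];
    s->inuse++;
    return objp;
}

/* Push the object back: the pointer is turned into an index again. */
static void sketch_slab_free(struct sketch_slab *s, void *objp)
{
    unsigned int objnr =
        (unsigned int)(((unsigned char *)objp - s->mem) / SKETCH_OBJSIZE);

    assert(objnr < SKETCH_NUM && s->inuse > 0);
    s->bufctl[objnr] = s->free;
    s->free = objnr;
    s->inuse--;
}
```

 The list is LIFO: the object freed last is the one handed out next, which keeps recently used memory warm in the CPU cache.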
<br>3) void * kmalloc (size_t size, int flags)<br> void kfree (const void *objp)<br> These share the same core as 1) and 2); the only difference is that a suitable general cache is looked up first.<br><br> <br> IV) proc support<br> <br>Only the meaning of the parameters is listed here; the implementation is simple.<br>proc read:<br>/**<br> * slabinfo_read_proc - generates /proc/slabinfo<br> * @page: scratch area, one page long (data is written into this buffer)<br> * @start: pointer to the pointer to the output buffer<br> * @off: offset within /proc/slabinfo the caller is interested in<br> * @count: requested len in bytes (the length the user wants to read)<br> * @eof: eof marker<br> * @data: unused<br> *<br> * The contents of the buffer are<br> * cache-name<br> * num-active-objs<br> * total-objs<br> * object size<br> * num-active-slabs<br> * total-slabs<br> * num-pages-per-slab<br> * + further values on SMP and with statistics enabled<br> */<br>int slabinfo_read_proc (char *page, char **start, off_t off,<br>        int count, int *eof, void *data)<br>------------------------------><br>/*<br> * page: buffer to write<br> * start: pointer to the start of the valid data<br> * off: user read from 'off' ,in this 'file'<br> * count: quantity user want read <br> */<br>static int proc_getdata (char*page, char**start, off_t off, int count)<br><br><br>proc write:<br><br>/**<br> * slabinfo_write_proc - SMP tuning for the slab allocator<br> * @file: unused<br> * @buffer: user buffer<br> * @count: data len<br> * @data: unused<br> */<br>int slabinfo_write_proc (struct file *file, const char *buffer,<br>        unsigned long count, void *data)<br>The write operation gives the user a chance to configure the SMP per-CPU parameters.<br><br><br><br><br><br><br><br> Part 5 A brief note on SMP support<br> SMP sounds intimidating, but slab's implementation of it is simple, clear and, presumably, effective. It uses per-CPU<br>fast allocation queues to avoid some locking and improve response time.<br> <br> In short: on SMP, allocation and freeing go through the per-CPU data, with objs allocated and freed in batches. There is also proc support for statistics and<br>configuration, plus some initialization.<br> <br> That is all for now.<br> <br> It is hard to strike a balance between listing code and giving a summary. The most important thing, I think, is to read the code yourself.<br> <br> 2006.8.4 22:02<br> <br> <br></pre>
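<pre>
 The per-CPU fast path summarized in Part 5 can be sketched in user space as follows. This is a simplified model
under assumptions, not the kernel code: it has a single "CPU", a plain array stands in for the slab lists, there is no
real locking, and the sizes and all sketch_* names are hypothetical. The slow_paths counter marks the places where the
kernel would have to take the cache spinlock.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LIMIT  8    /* capacity of the per-CPU array (hypothetical) */
#define SKETCH_BATCH  4    /* objs moved per refill or flush (hypothetical) */
#define SKETCH_SHARED 64   /* capacity of the shared pool (hypothetical) */

struct sketch_cache {
    unsigned int avail;              /* objs in the per-CPU array */
    void *entry[SKETCH_LIMIT];       /* per-CPU array, used as a stack */
    unsigned int shared_avail;       /* objs in the shared pool */
    void *shared[SKETCH_SHARED];     /* stands in for the slab lists */
    unsigned int slow_paths;         /* how often the "locked" path ran */
};

/* Put n objects of objsize bytes, carved from base, into the shared pool. */
static void sketch_cache_init(struct sketch_cache *c, char *base,
                              size_t objsize, unsigned int n)
{
    unsigned int i;

    assert(n <= SKETCH_SHARED);
    c->avail = 0;
    c->shared_avail = 0;
    c->slow_paths = 0;
    for (i = 0; i < n; i++)
        c->shared[c->shared_avail++] = base + i * objsize;
}

/* Fast path: pop from the per-CPU array. Miss: refill a batch first. */
static void *sketch_alloc(struct sketch_cache *c)
{
    if (c->avail == 0) {
        unsigned int n;

        c->slow_paths++;             /* the kernel would lock here */
        for (n = 0; n < SKETCH_BATCH && c->shared_avail > 0; n++)
            c->entry[c->avail++] = c->shared[--c->shared_avail];
        if (c->avail == 0)
            return NULL;             /* the kernel would grow the cache */
    }
    return c->entry[--c->avail];
}

/* Fast path: push onto the per-CPU array. Full: flush a batch first. */
static void sketch_free(struct sketch_cache *c, void *objp)
{
    if (c->avail == SKETCH_LIMIT) {
        unsigned int n;

        c->slow_paths++;             /* the kernel would lock here */
        assert(c->shared_avail + SKETCH_BATCH <= SKETCH_SHARED);
        for (n = 0; n < SKETCH_BATCH; n++)
            c->shared[c->shared_avail++] = c->entry[--c->avail];
    }
    c->entry[c->avail++] = objp;
}
```

 With a batch of 4, four consecutive allocations cost only one trip through the slow path; that amortization is the
point of the design.
</pre>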
</td>
</tr>
</tbody>
</table>
</div></body></html>