<html lang="zh-CN" xmlns:gdoc=""> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> <style type="text/css">/* default css */table { font-size: 1em; line-height: inherit;}div, address, ol, ul, li, option, select { margin-top: 0px; margin-bottom: 0px;}p { margin: 0px;}body { margin: 0px; padding: 0px; font-family: Verdana, sans-serif; font-size: 10pt; background-color: #ffffff;}h6 { font-size: 10pt }h5 { font-size: 11pt }h4 { font-size: 12pt }h3 { font-size: 13pt }h2 { font-size: 14pt }h1 { font-size: 16pt }blockquote {padding: 10px; border: 1px #DDD dashed }a img {border: 0}div.google_header, div.google_footer { position: relative; margin-top: 1em; margin-bottom: 1em;}/* end default css */ /* default print css */ @media print { body { padding: 0; margin: 0; } div.google_header, div.google_footer { display: block; min-height: 0; border: none; } div.google_header { flow: static(header); } /* used to insert page numbers */ div.google_header::before, div.google_footer::before { position: absolute; top: 0; } div.google_footer { flow: static(footer); } /* always consider this element at the start of the doc */ div#google_footer { flow: static(footer, start); } span.google_pagenumber { content: counter(page); } span.google_pagecount { content: counter(pages); } } @page { @top { content: flow(header); } @bottom { content: flow(footer); } } /* end default print css */ /* custom css *//* end custom css */ /* ui edited css */ body { font-family: Verdana; font-size: 10.0pt; line-height: normal; background-color: #ffffff; } .documentBG { background-color: #ffffff; } /* end ui edited css */</style> </head> <body revision="dcbsxfpf_51d28cgjdb:5"> <table align=center cellpadding=0 cellspacing=0 height=5716 width=768>
<tbody>
<tr>
<td height=5716 valign=top width=100%>
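<pre>2006-5-16 <br>mm/mmap.c<br><br>(I) Interface overview<br> This module exports other interfaces as well, but the focus here is on do_mmap_pgoff and sys_brk.<br>As usual, start with the man pages:<br><br> 1)mmap<br>/* arch/i386/kernel/sys_i386.c */<br>asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,<br> unsigned long prot, unsigned long flags,<br> unsigned long fd, unsigned long pgoff)<br>{<br> return do_mmap2(addr, len, prot, flags, fd, pgoff);<br>}<br><br> mmap maps len bytes of fd (a file or other object), starting at the given offset, into memory. addr<br>is only a hint from the user for the start address. The system call returns the actual mapping address, which is never 0.<br> prot gives the protection of the mapped memory:<br> PROT_EXEC, PROT_READ, PROT_WRITE, PROT_NONE (pages may not be accessed).<br> flags gives the mapping attributes:<br> MAP_FIXED: the mapping must be placed exactly at the user-supplied address addr (discouraged).<br> MAP_SHARED: share this mapping with every other process that maps the same fd. Storing to the memory is equivalent to writing the file, but<br>the file may not actually be updated until msync or munmap is called.<br> MAP_PRIVATE: create a private copy-on-write (COW) mapping; stores to the memory do not affect the original file. It is unspecified<br>what happens if the file contents change after the mapping is created.<br> MAP_EXECUTABLE, MAP_DENYWRITE: no longer used; ignored.<br> MAP_NORESERVE: used together with MAP_PRIVATE. No swap space is reserved for this mapping, so<br>a write to the memory may raise SIGSEGV when memory is short.<br> MAP_LOCKED: since Linux 2.5.37; the mapping is mlock'ed at the same time.<br> MAP_GROWSDOWN: used for stacks.<br> MAP_ANONYMOUS: (same as MAP_ANON) the mapping is not backed by any file. Combining this flag<br>with MAP_SHARED has been implemented since Linux 2.4.<br> MAP_FILE: compatibility flag; ignored.<br> MAP_32BIT: x86-64 only; place the mapping in the first 2 GB of the process address space.<br> <br> If the mapped data does not fill the last page, the rest of that page is zero-filled, but stores to<br>that remainder are not written out to the file. The effect of changing the size of an already mapped file is unspecified.<br> <br> <br> 2)brk<br> asmlinkage unsigned long sys_brk(unsigned long brk)<br> Sets the end address (upper bound) of the process's data segment to the given value.<br> <br>II) do_mmap_pgoff<br><br> Outline: build a vma and set vma->vm_ops so that the page-fault handler can read the required<br>contents from the given file through vma->vm_ops->nopage.<br> Because memory allocation may sleep, any existing mapping in the range has to be removed with do_munmap before the new vma goes in!<br> Along the way it handles mlock and performs a large number of sanity checks.<br><br>unsigned long do_mmap_pgoff(struct file * file, unsigned long addr, unsigned long len,<br> unsigned long prot, unsigned long flags, unsigned long pgoff)<br>{<br> struct mm_struct * mm = current->mm;<br> struct vm_area_struct * vma;<br> int correct_wcount = 0;<br> int error;<br><br> //routine checks of parameters and limits<br> .....//code omitted<br> /*<br> * some checks on file access permissions<br> */<br> if (file != NULL) {<br> switch (flags & MAP_TYPE) {<br> case MAP_SHARED:<br> .......//some lines omitted<br> /* make sure there are no mandatory locks on the file. 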
*/<br> /*mandatory lock 就是内核实现的文件互斥锁*/<br> if (locks_verify_locked(file->f_dentry->d_inode))<br> return -EAGAIN;<br> case ....://省略若干<br> default:<br> return -EINVAL;<br> }<br> }<br><br> /* Obtain the address to map to. we verify (or select) it and ensure<br> * that it represents a valid section of the address space.<br> */<br> if (flags & MAP_FIXED) {<br> if (addr & ~PAGE_MASK)<br> return -EINVAL;<br> } else {<br> addr = get_unmapped_area(addr, len); /*先圈地,寻找未映射区域*/<br> if (!addr)<br> return -ENOMEM;<br> }<br><br> /* 分配vma并简单设置,由于内存分配可能睡眠,需要unmap先.<br> */<br> vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);<br> ....<br> //根据file的情况和系统调用的参数设置vma的属性<br> //注意 VM_SHARED的设置依据<br> if (file) {<br> ......<br> if (flags & MAP_SHARED) {<br> vma->vm_flags |= VM_SHARED | VM_MAYSHARE;<br><br> ....<br> }<br> } else {<br> vma->vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;<br> if (flags & MAP_SHARED)<br> vma->vm_flags |= VM_SHARED | VM_MAYSHARE;<br> }<br> //vma配置<br> ....<br> <br> /*清除已有映射,我们可能被suspend过*/<br> error = -ENOMEM;<br> if (do_munmap(mm, addr, len))<br> goto free_vma;<br> <br> //.... 
some code omitted<br><br> //the key code that makes mmap work.<br> if (file) {<br> if (vma->vm_flags & VM_DENYWRITE) {<br> error = deny_write_access(file);/*forbid writing to it as a regular file*/<br> if (error)<br> goto free_vma;<br> correct_wcount = 1;<br> }<br> vma->vm_file = file;<br> get_file(file);<br> error = file->f_op->mmap(file, vma);/*for ext2 this is generic_file_mmap,<br> *which sets vma->vm_ops so that<br> *vma->vm_ops->nopage is<br> *filemap_nopage<br> */<br> <br><br> .....<br><br> //the new vma takes effect<br> insert_vm_struct(mm, vma); <br> <br> ...<br>}<br><br> Once this is done, the rest of mmap's work is left to the page-fault handler. See the analysis of the related files.<br> <br> <br>III)asmlinkage long sys_munmap(unsigned long addr, size_t len)<br> <br> The core function is do_munmap. Its comments briefly describe the approach: first collect the affected vmas,<br>then process them one by one.<br>int do_munmap(struct mm_struct *mm, unsigned long addr, size_t len)<br>{<br> struct vm_area_struct *mpnt, *prev, **npp, *free, *extra;<br> ......<br> //parameter checks<br> <br> //find the first vma whose vm_end is greater than addr<br> mpnt = find_vma_prev(mm, addr, &prev);<br> if (!mpnt) //addr lies beyond every vma, nothing is mapped there, return at once<br> return 0;<br> /* we have addr < mpnt->vm_end */<br><br> if (mpnt->vm_start >= addr+len)//the range falls in a hole, return at once<br> return 0;<br><br> .....<br><br> /*<br> * We may need one additional vma to fix up the mappings ... <br> * and this is the last chance for an easy error exit.<br> */<br> extra = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);<br> if (!extra)<br> return -ENOMEM;<br><br> /* <br> * This part finds the vmas (mpnt) that need processing and moves them onto the free list<br> */<br> npp = (prev ? &prev->vm_next : &mm->mmap);<br> free = NULL;<br> spin_lock(&mm->page_table_lock);<br> //every vma with vm_start < addr+len is affected (addr < vm_end already holds from the lookup above)<br> for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {<br> *npp = mpnt->vm_next; //unlink from the list<br> mpnt->vm_next = free; //push onto the list of vmas awaiting processing<br> free = mpnt;<br> if (mm->mmap_avl) //remove from the vma AVL lookup tree<br> avl_remove(mpnt, &mm->mmap_avl);<br> }<br> mm->mmap_cache = NULL; /* Kill the cache. 
 */<br> spin_unlock(&mm->page_table_lock);<br><br> /*process the affected vmas one by one*/<br> while ((mpnt = free) != NULL) {<br> unsigned long st, end, size;<br> struct file *file = NULL;<br><br> free = free->vm_next;<br><br> st = addr < mpnt->vm_start ? mpnt->vm_start : addr;<br> end = addr+len;<br> end = end > mpnt->vm_end ? mpnt->vm_end : end;<br> size = end - st;<br> /*If a VM_DENYWRITE vma is only partly unmapped (split in two),<br> d_inode->i_writecount has to be adjusted one extra time, because a<br> VM_DENYWRITE vma is kept. Note that file is non-NULL only in this<br> VM_DENYWRITE case*/<br> if (mpnt->vm_flags & VM_DENYWRITE &&<br> (st != mpnt->vm_start || end != mpnt->vm_end) &&<br> (file = mpnt->vm_file) != NULL) {<br> atomic_dec(&file->f_dentry->d_inode->i_writecount);<br> }<br> remove_shared_vm_struct(mpnt);//unlink from the inode's i_mapping->i_mmap_shared list<br> mm->map_count--;<br><br> flush_cache_range(mm, st, end); //a no-op on i386<br> zap_page_range(mm, st, size); //free the user data pages in use, see the analysis of memory.c<br> flush_tlb_range(mm, st, end); //drop stale TLB entries for the range (not empty on i386)<br><br> /*<br> * Fix the mapping, and free the old area if it wasn't reused.<br> */<br> extra = unmap_fixup(mm, mpnt, st, size, extra);<br> if (file)//see the condition above: if (mpnt->vm_flags & VM_DENYWRITE &&...<br> atomic_inc(&file->f_dentry->d_inode->i_writecount);<br> }<br><br> /* Release the extra vma struct if it wasn't used */<br> if (extra)<br> kmem_cache_free(vm_area_cachep, extra);<br> /*within the given interval [start,end], find a sub-range that has no mapping left<br> and free the page tables and pmds covering that sub-range<br> */<br> free_pgtables(mm, prev, addr, addr+len);<br><br> return 0;<br>}<br><br> Note this line from the code above:<br> ...<br> remove_shared_vm_struct(mpnt);//unlink from the inode's i_mapping->i_mmap_shared list<br> ...<br> <br> So far I have found only one source for the mapping: inode->i_mapping. At inode initialization this pointer is set<br>to point at inode->i_data inside the same inode. See clean_inode in fs/inode.c.<br> do_munmap is used widely and is one of this module's important external interfaces.<br> <br>IV)sys_brk(unsigned long brk)<br><br> Supports both growing and shrinking brk.<br>/*<br> * sys_brk() for the most part doesn't need the global kernel<br> * lock, except when an application is doing something nasty<br> * like trying to un-brk an area that has already been mapped<br> * to a regular 
file. in this case, the unmapping will need<br> * to invoke file system routines that need the global lock.<br> */<br>asmlinkage unsigned long sys_brk(unsigned long brk)<br>{<br> unsigned long rlim, retval;<br> unsigned long newbrk, oldbrk;<br> struct mm_struct *mm = current->mm;<br><br> down(&mm->mmap_sem);<br><br> if (brk < mm->end_code)<br> goto out;<br> newbrk = PAGE_ALIGN(brk);<br> oldbrk = PAGE_ALIGN(mm->brk);<br> if (oldbrk == newbrk)<br> goto set_brk;<br><br> /* Always allow shrinking brk. */<br> if (brk <= mm->brk) {<br> if (!do_munmap(mm, newbrk, oldbrk-newbrk)) /*shrinking brk is simply a munmap<br> of an anonymous mapping*/<br> goto set_brk;<br> goto out;<br> }<br><br> ...//limit checks<br><br> /* Check against existing mmap mappings. */<br> /* the extended range must not overlap any existing vma */<br> if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))<br> goto out;<br><br> ...//free memory check<br> <br> /* Ok, looks good - let it rip. */<br> if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)<br> goto out;<br>set_brk:<br> mm->brk = brk;<br>out:<br> retval = mm->brk;<br> up(&mm->mmap_sem);<br> return retval;<br>}<br> The core function do_brk is another major external interface of this module. brk is really a special case of mmap:<br>it is an anonymous mmap, so its logic is the same as that of do_mmap_pgoff and is not listed again.<br><br>V) Other important interface functions<br> lock_vma_mappings/unlock_vma_mappings: take and release the spinlock inode->i_mapping->i_shared_lock<br>for the mappings a vma is involved in; used in many places.<br> <br> find_vma/find_vma_prev/find_extend_vma: look up a vma in the process's vma list or AVL tree.<br>Note the property of the returned vma: the queried address is not guaranteed to lie inside it (find_vma only guarantees addr < vm_end).<br><br> __insert_vm_struct/insert_vm_struct: set up the two links of a vma: a) put it into the process's vma<br>list or AVL lookup tree; b) if it is associated with a file, link it into inode->i_mapping->i_mmap(_shared).<br></pre>
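<pre>
The clipping that the do_munmap loop in section III applies to each affected vma can be tried out in user space.
The following is a minimal sketch, not kernel code: struct range, enum unmap_kind and clip_unmap are names
invented here, and the four outcomes are assumed to mirror the cases unmap_fixup has to handle (whole vma,
head, tail, or a hole in the middle that needs the extra vma). It also assumes addr+len does not overflow.

```c
#include <assert.h>

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct range {
    unsigned long start; /* inclusive, plays the role of vm_start */
    unsigned long end;   /* exclusive, plays the role of vm_end */
};

/* Names invented for this sketch; the kernel has no such enum. */
enum unmap_kind {
    UNMAP_NONE,  /* the request does not touch this vma */
    UNMAP_WHOLE, /* the whole vma goes away */
    UNMAP_HEAD,  /* front part removed, the vma keeps its tail */
    UNMAP_TAIL,  /* back part removed, the vma keeps its head */
    UNMAP_HOLE   /* middle removed: the vma splits, the extra vma is needed */
};

/*
 * Clip the request [addr, addr+len) to one vma, using the same st/end
 * computation as the do_munmap loop, and classify the result.
 * Assumes addr+len does not overflow.
 */
static enum unmap_kind clip_unmap(const struct range *vma,
                                  unsigned long addr, unsigned long len,
                                  unsigned long *st, unsigned long *end)
{
    assert(vma->start < vma->end);

    /* No overlap: do_munmap never puts such a vma on its free list. */
    if (len == 0 || addr >= vma->end || addr + len <= vma->start)
        return UNMAP_NONE;

    *st = addr < vma->start ? vma->start : addr;
    *end = addr + len;
    *end = *end > vma->end ? vma->end : *end;

    if (*st == vma->start && *end == vma->end)
        return UNMAP_WHOLE;
    if (*st == vma->start)
        return UNMAP_HEAD;
    if (*end == vma->end)
        return UNMAP_TAIL;
    return UNMAP_HOLE;
}
```

For a vma covering [0x1000, 0x5000), a request with addr 0x2000 and len 0x1000 is classified as UNMAP_HOLE
with st = 0x2000 and end = 0x3000. That is the one case in which a single vma turns into two, which is why
do_munmap allocates extra before it starts taking vmas off the list.
</pre>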
</td>
</tr>
</tbody>
</table></body></html>