<html lang="zh-CN" xmlns:gdoc=""> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> <style type="text/css">/* default css */table { font-size: 1em; line-height: inherit;}div, address, ol, ul, li, option, select { margin-top: 0px; margin-bottom: 0px;}p { margin: 0px;}body { margin: 0px; padding: 0px; font-family: Verdana, sans-serif; font-size: 10pt; background-color: #ffffff;}h6 { font-size: 10pt }h5 { font-size: 11pt }h4 { font-size: 12pt }h3 { font-size: 13pt }h2 { font-size: 14pt }h1 { font-size: 16pt }blockquote {padding: 10px; border: 1px #DDD dashed }a img {border: 0}div.google_header, div.google_footer { position: relative; margin-top: 1em; margin-bottom: 1em;}/* end default css */ /* default print css */ @media print { body { padding: 0; margin: 0; } div.google_header, div.google_footer { display: block; min-height: 0; border: none; } div.google_header { flow: static(header); } /* used to insert page numbers */ div.google_header::before, div.google_footer::before { position: absolute; top: 0; } div.google_footer { flow: static(footer); } /* always consider this element at the start of the doc */ div#google_footer { flow: static(footer, start); } span.google_pagenumber { content: counter(page); } span.google_pagecount { content: counter(pages); } } @page { @top { content: flow(header); } @bottom { content: flow(footer); } } /* end default print css */ /* custom css *//* end custom css */ /* ui edited css */ body { font-family: Verdana; font-size: 10.0pt; line-height: normal; background-color: #ffffff; } .documentBG { background-color: #ffffff; } /* end ui edited css */</style> </head> <body revision="dcbsxfpf_58fjxjrdhc:5"> <table align=center cellpadding=0 cellspacing=0 height=5716 width=768>
<tbody>
<tr>
<td height=5716 valign=top width=100%>
<pre>2006-7-14<br>mm/page_io.c<br><br> This is our first encounter with swap devices and swap files, so here is a<br>brief introduction.<br><br> First, the layout of a swap entry (swp_entry_t):<br>/* Encode and de-code a swap entry */<br>/* |......24bits offset ......|1bits|6bit type| present bit==0| */<br>typedef struct {<br>    unsigned long val;<br>} swp_entry_t;<br><br> When a page is swapped out to disk, its page-table entry (pte_t) is replaced<br>by a swap entry with the layout above. The present bit is 0, so the CPU treats<br>the page as not resident in memory, and the OS interprets the remaining bits of<br>the pte_t according to the swap-entry layout.<br><br> Next come the data structures closely tied to swap devices. A swap device can<br>be a dedicated swap partition or an ordinary file. The array in mm/swapfile.c<br>    struct swap_info_struct swap_info[MAX_SWAPFILES];<br>and the function get_swaphandle_info confirm this:<br>void get_swaphandle_info(swp_entry_t entry, unsigned long *offset,<br>                         kdev_t *dev, struct inode **swapf)<br>{<br>    unsigned long type;<br>    struct swap_info_struct *p;<br><br>    /* |......24bits offset ......|1bits|6bit type| present bit==0| */<br>    type = SWP_TYPE(entry);<br>    if (type >= nr_swapfiles) {<br>        printk("Internal error: bad swap-device\n");<br>        return;<br>    }<br><br>    p = &swap_info[type];<br>    *offset = SWP_OFFSET(entry);<br>    if (*offset >= p->max) {<br>        printk("rw_swap_page: weirdness\n");<br>        return;<br>    }<br>    if (p->swap_map && !p->swap_map[*offset]) { /* reference count is 0: slot not in use */<br>        printk("VM: Bad swap entry %08lx\n", entry.val);<br>        return;<br>    }<br>    if (!(p->flags & SWP_USED)) {<br>        printk(KERN_ERR "rw_swap_page: "<br>               "Trying to swap to unused swap-device\n");<br>        return;<br>    }<br><br>    if (p->swap_device) {          /* swap partition: swap_device is set */<br>        *dev = p->swap_device;<br>    } else if (p->swap_file) {     /* otherwise it is a swap file */<br>        *swapf = p->swap_file->d_inode;<br>    } else {<br>        printk(KERN_ERR "rw_swap_page: no swap file or device\n");<br>    }<br>    return;<br>}<br><br> Leaving swap_info aside for a moment, consider the layout of a swap device.<br>Its space is used as backing store for pages, divided into slots the size of a<br>CPU page. The first page on the device holds the swap header:<br><br>/* Swap device header format; exactly one page in size */<br>union swap_header {<br>    struct<br>    {<br>        char reserved[PAGE_SIZE - 10];<br>        char magic[10];<br>    } magic;<br>    struct<br>    {<br>        char bootbits[1024];      /* Space for disklabel etc. */<br>        unsigned int version;<br>        unsigned int last_page;   /* number of the last page on the swap device */<br>        unsigned int nr_badpages; /* how many pages are bad */<br>        unsigned int padding[125];<br>        unsigned int badpages[1]; /* array of bad-page indices */<br>    } info;<br>};<br><br> The header records the device's size, version, bad pages, magic signature and<br>so on. sys_swapon converts this information into the corresponding swap_info<br>entry, which was briefly mentioned above. Here is a complete overview of<br>swap_info:<br>struct swap_info_struct {<br>    unsigned int flags;<br>    kdev_t swap_device;<br>    spinlock_t sdev_lock;<br>    struct dentry * swap_file;<br>    struct vfsmount *swap_vfsmnt;<br>    unsigned short * swap_map; /* per-page reference counts on the swap device (up to SWAP_MAP_MAX) */<br>                               /* array size is this->max */<br><br>    /* cluster allocation state; a cluster holds SWAPFILE_CLUSTER pages */<br>    unsigned int lowest_bit;   /* with highest_bit, bounds the index range that may hold free pages */<br>    unsigned int highest_bit;<br>    unsigned int cluster_next; /* next page to allocate in the current swap cluster */<br>    unsigned int cluster_nr;   /* pages left in the current cluster */<br><br>    int prio;                  /* swap priority */<br>    int pages;                 /* nr_good_pages */<br>    unsigned long max;         /* derived from last_page in swap_header; see sys_swapon */<br>    int next;                  /* next entry on swap list */<br>};<br> These fields come from the swap header and the swap device itself, plus the<br>state needed to allocate swap pages by cluster. Look carefully at what each<br>field means; after that, the code itself is easy to follow. That is all for<br>now.<br><br> Now look at the interfaces provided by mm/page_io.c: rw_swap_page and<br>rw_swap_page_nolock. They form the interface between the swap code and the<br>device drivers, providing reads and writes of pages on a swap device or file.<br> The two functions differ little. Within the scope we are looking at,<br>rw_swap_page_nolock is used only by sys_swapon, to read the swap_header<br>described above into a temporary page (see sys_swapon). What makes this page<br>special is that it does not belong to the swap cache (a special kind of page<br>cache), so it cannot be read with rw_swap_page directly and needs some special<br>handling:<br><br>/*<br> * The swap lock map insists that pages be in the page cache!<br> * Therefore we can't use it.  Later when we can remove the need for the<br> * lock map and we can reduce the number of functions exported.<br> */<br>void rw_swap_page_nolock(int rw, swp_entry_t entry, char *buf, int wait)<br>{<br>    struct page *page = virt_to_page(buf);<br><br>    if (!PageLocked(page))<br>        PAGE_BUG(page);<br>    if (PageSwapCache(page))   // must not be in the swap cache<br>        PAGE_BUG(page);<br>    if (page->mapping)         // must not belong to any mapping<br>        PAGE_BUG(page);<br>    /* needs sync_page to wait I/O completation */<br>    page->mapping = &swapper_space;   // temporarily borrow swapper_space<br>    if (!rw_swap_page_base(rw, entry, page, wait))<br>        UnlockPage(page);<br>    page->mapping = NULL;<br>}<br> swapper_space is borrowed because rw_swap_page_base calls wait_on_page for<br>synchronous I/O, and wait_on_page relies on sync_page, which finds the<br>sync_page operation through page->mapping:<br>static inline int sync_page(struct page *page)<br>{<br>    struct address_space *mapping = page->mapping;<br><br>    if (mapping && mapping->a_ops && mapping->a_ops->sync_page)<br>        return mapping->a_ops->sync_page(page);<br>    return 0;<br>}<br> With page->mapping left NULL, sync_page would return without calling any<br>sync_page operation, hence this roundabout step. Now let's look at the logic<br>the two interfaces share:<br>/*<br> * Reads or writes a swap page.<br> * wait=1: start I/O and wait for completion. wait=0: start asynchronous I/O.<br> *<br> * Important prevention of race condition: the caller *must* atomically<br> * create a unique swap cache entry for this swap page before calling<br> * rw_swap_page, and must lock that page. By ensuring that there is a<br> * single page of memory reserved for the swap entry, the normal VM page<br> * lock on that page also doubles as a lock on swap entries.  Having only
Having only<br> * one lock to deal with per swap entry (rather than locking swap and memory<br> * independently) also makes it easier to make certain swapping operations<br> * atomic, which is particularly important when we are trying to ensure <br> * that shared pages stay shared while being swapped.<br> */<br><br>static int rw_swap_page_base(int rw, swp_entry_t entry, struct page *page, int wait)<br>{<br> unsigned long offset;<br> int zones[PAGE_SIZE/512];<br> int zones_used;<br> kdev_t dev = 0;<br> int block_size;<br> struct inode *swapf = 0;<br><br> /* Don't allow too many pending pages in flight.. */<br> if ((rw == WRITE) && atomic_read(&nr_async_pages) ><br> pager_daemon.swap_cluster * (1 << page_cluster))<br> wait = 1;<br><br> if (rw == READ) {<br> ClearPageUptodate(page);<br> kstat.pswpin++;<br> } else<br> kstat.pswpout++;<br><br> /*从swap entry找到交换设备或者交换文件<br> *以及back page 在交换介质上的偏移<br> */<br> /* |......24bits offset ......|1bits|6bit type| present bit==0| */<br> get_swaphandle_info(entry, &offset, &dev, &swapf);<br> if (dev) {<br> zones[0] = offset;<br> zones_used = 1;<br> block_size = PAGE_SIZE;<br> } else if (swapf) {<br> int i, j;<br> unsigned int block = offset<br> << (PAGE_SHIFT - swapf->i_sb->s_blocksize_bits);<br><br> block_size = swapf->i_sb->s_blocksize;<br> for (i=0, j=0; j< PAGE_SIZE ; i++, j += block_size)<br> if (!(zones[i] = bmap(swapf,block++))) {<br> printk("rw_swap_page: bad swap file\n");<br> return 0;<br> }<br> zones_used = i;<br> dev = swapf->i_dev;<br> } else {<br> return 0;<br> }<br> if (!wait) {<br> SetPageDecrAfter(page);<br> atomic_inc(&nr_async_pages);<br> }<br><br> /* block_size == PAGE_SIZE/zones_used */<br> brw_page(rw, page, dev, zones, block_size);<br><br> /* Note! 
For consistency we do all of the logic,<br> * decrementing the page count, and unlocking the page in the<br> * swap lock map - in the IO completion handler.<br> */<br> if (!wait)<br> return 1;<br><br> wait_on_page(page);<br> /* This shouldn't happen, but check to be sure. */<br> if (page_count(page) == 0)<br> printk(KERN_ERR "rw_swap_page: page unused while waiting!\n");<br><br> return 1;<br>}<br> <br> 对于这个函数首先要说的是全局变量nr_async_pages:<br>/* rw_swap_page_base: inc nr_async_pages end_buffer_io_async:dec this one*/<br>/*异步io 状态就是提交读/写后不等待页面get unlocked*/<br>atomic_t nr_async_pages = ATOMIC_INIT(0); /*所有在异步io状态的页面总数*/<br> <br> 还有pager_daemon.swap_cluster * (1 << page_cluster)):<br>其中pager_daemon.swap_cluster是最多容许预读的页面cluster个数,而page_cluster<br>则是每个cluster拥有的页面个数:2^page_cluster。所以rw_swap_page_base中相关的<br>判断就是不让在异步io状态的页面超过所容许的总的预读页面数量。<br><br> 说到这里,page_cluster是某个cluster的大小,上面还提到了SWAPFILE_CLUSTER也是<br>某个cluster的大小,这连个cluster的区别在于:<br> page_cluster所指的是的簇用于filemap相关的预读,以及swap in的时候所做的预读。<br>而普通文件的预读情况参考do_generic_file_read的相关分析。<br> SWAPFILE_CLUSTER则是用于swap设备上的swap页面分配时的分配策略使用的簇,相关<br>代码参考:mm/swapfile.c 函数scan_swap_map.到时再论.<br><br> 其次要说的是bmap这个函数,简单看,如果swap设备是一个普通文件,则以ext2文件为例<br>bmap-> {inode->i_mapping->a_ops->bmap(inode->i_mapping, block)}->ext2_bmap-><br>generic_block_bmap->ext2_get_block. 核心函数是ext2_get_block,实际上是文件内的<br>连续的block nr到文件所在的设备的block号的一个 查找/分配 的过程.具体的这里就不<br>再细述,等到分析ext2文件系统再论不迟.<br> 看rw_swap_page_base相关部分,如果是一个交换分区,这个复杂过程就可以省略了,所以<br>交换分区的效率还是值得肯定的.<br><br> 另外简单提一下这个调用:<br> brw_page(rw, page, dev, zones, block_size);<br> 如果仔细看看rw_swap_page_base,可能会担心block size是否会大于page size?据我所<br>知对于ext2文件系统,blocksize不会大于4k,见ext2_read_super.<br> brw_page在这个page上创建buffers,设置buffer对应的设备block nr,提交设备驱动读取<br>相应block的内容.细节部分等到分析设备驱动的时候再续.<br><br> The end. <br> <br><br><br></pre>
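<pre> The swap-entry layout at the start of these notes can be modelled in user
space. The sketch below is a hypothetical model, not kernel code. The helper
names make_swp_entry, swp_type and swp_offset are made up, and the shift
amounts are read off the layout comment: present bit 0, type in bits 1-6, one
reserved bit 7, offset from bit 8 up. On i386, unsigned long is 32 bits, so the
offset field is 24 bits wide.

```c
/* Hypothetical user-space model of the swap-entry layout
 * |24-bit offset|1 reserved bit|6-bit type|present bit == 0|
 * The helper names are illustrative, not the kernel's macros. */
typedef struct {
    unsigned long val;
} swp_entry_t;

/* Pack a swap-device type and a page offset; bit 0 (present) stays 0. */
static swp_entry_t make_swp_entry(unsigned long type, unsigned long offset)
{
    swp_entry_t e;
    e.val = ((type & 0x3f) << 1) | (offset << 8);
    return e;
}

/* Bits 1..6: index into swap_info[]. */
static unsigned long swp_type(swp_entry_t e)
{
    return (e.val >> 1) & 0x3f;
}

/* Bits 8 and up: page-sized slot number on the swap device. */
static unsigned long swp_offset(swp_entry_t e)
{
    return e.val >> 8;
}
```

For example, type 3 and offset 1000 pack into val 256006 (3 << 1 | 1000 << 8).
Decoding gives back 3 and 1000, and bit 0 stays clear, so the CPU would treat
such a pte as "not present".</pre>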
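<pre> The swap_header union can be checked the same way. This hypothetical sketch
copies the union into user space, assuming a 4 KiB page (the i386 value) and a
4-byte unsigned int. The names swap_header_model, badpages_offset and
max_badpages are made up. The sketch shows that the info view and the magic view
share one page, and how many bad-page entries fit before the 10-byte magic
signature.

```c
/* Hypothetical user-space copy of union swap_header.
 * Assumes a 4 KiB page and a 4-byte unsigned int. */
#define HDR_PAGE_SIZE 4096

union swap_header_model {
    struct {
        char reserved[HDR_PAGE_SIZE - 10];
        char magic[10];            /* signature in the last 10 bytes */
    } magic;
    struct {
        char bootbits[1024];       /* space for disklabel etc. */
        unsigned int version;
        unsigned int last_page;    /* number of the last page on the device */
        unsigned int nr_badpages;  /* how many entries badpages[] holds */
        unsigned int padding[125];
        unsigned int badpages[1];  /* first slot of the bad-page list */
    } info;
};

/* Byte offset of badpages[] within the header page. */
static unsigned long badpages_offset(void)
{
    union swap_header_model h;   /* only addresses are used, never values */
    return (unsigned long)((char *)h.info.badpages - (char *)&h);
}

/* Number of unsigned ints that fit between badpages[] and the magic field. */
static unsigned long max_badpages(void)
{
    union swap_header_model h;
    return (unsigned long)((char *)h.magic.magic - (char *)h.info.badpages)
           / sizeof(unsigned int);
}
```

Under these assumptions badpages[] starts at byte 1536 (1024 + 3 * 4 + 125 * 4).
That leaves room for (4086 - 1536) / 4 = 637 bad-page entries.</pre>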
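<pre> The write-throttling test at the top of rw_swap_page_base reduces to a small
predicate. The sketch below is a hypothetical stand-alone version. must_wait and
its parameters are made-up names that receive the values of rw == WRITE, wait,
nr_async_pages, pager_daemon.swap_cluster and page_cluster as plain ints.

```c
/* Hypothetical stand-alone version of the check
 *   if ((rw == WRITE) && atomic_read(&nr_async_pages) >
 *       pager_daemon.swap_cluster * (1 << page_cluster))
 *       wait = 1;
 * Returns the effective value of wait. */
static int must_wait(int is_write, int wait, int nr_async_pages,
                     int swap_cluster, int page_cluster)
{
    /* each cluster holds 2^page_cluster pages */
    int limit = swap_cluster * (1 << page_cluster);

    if (is_write && nr_async_pages > limit)
        return 1;   /* too many pages in flight: make this write synchronous */
    return wait;    /* otherwise keep the caller's choice */
}
```

With example values swap_cluster = 8 and page_cluster = 4, the limit is 128
pages. A write issued while 129 pages are in async I/O is forced to wait, while
reads are never throttled by this check.</pre>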
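<pre> The swap-file branch of rw_swap_page_base turns one page-sized slot into a list
of device blocks. In this hypothetical sketch, map_swap_slot and linear_bmap are
made-up names, and MODEL_PAGE_SHIFT assumes a 4 KiB page. linear_bmap stands in
for bmap(). It pretends the file occupies 16 contiguous device blocks starting
at block 1000, and returns 0 (like a hole) past that.

```c
/* Hypothetical model of the swap-file branch of rw_swap_page_base.
 * Assumes a 4 KiB page (PAGE_SHIFT == 12). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_FILE_BLOCKS 16UL  /* made-up file length, in blocks */

/* Stand-in for bmap(): file block -> device block, 0 if unmapped. */
static unsigned long linear_bmap(unsigned long file_block)
{
    if (file_block >= MODEL_FILE_BLOCKS)
        return 0;
    return 1000 + file_block;
}

/* Fill zones[] with the device blocks backing swap slot `offset`.
 * Returns the number of blocks used, or 0 if any block is unmapped. */
static int map_swap_slot(unsigned long offset, int blocksize_bits,
                         unsigned long (*file_bmap)(unsigned long),
                         unsigned long *zones)
{
    /* first file block of this slot: offset * (PAGE_SIZE / block_size) */
    unsigned long block = offset << (MODEL_PAGE_SHIFT - blocksize_bits);
    unsigned long block_size = 1UL << blocksize_bits;
    unsigned long j;
    int i = 0;

    for (j = 0; j < MODEL_PAGE_SIZE; j += block_size, i++) {
        zones[i] = file_bmap(block++);
        if (zones[i] == 0)
            return 0;   /* the kernel prints "bad swap file" here */
    }
    return i;
}
```

With 1 KiB blocks (blocksize_bits = 10), slot 2 maps to file blocks 8..11 and
thus to device blocks 1008..1011, so four zones are used. With 4 KiB blocks the
same slot needs a single zone. A swap partition skips all of this and uses the
slot offset directly as the block number.</pre>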
</td>
</tr>
</tbody>
</table></body></html>