<HTML><HEAD> <TITLE>SMTH BBS (水木清华站): Digest Area</TITLE></HEAD><BODY><CENTER><H1>SMTH BBS (水木清华站): Digest Area</H1></CENTER>From: flamingo (Flamingo), Board: Linux <BR>Subject: Chapter 4 (Linux Kernel Internals) <BR>Posted: SMTH BBS (Wed Dec 20 19:35:10 2000) <BR> <BR>4. Linux Page Cache <BR>In this chapter we describe the Linux 2.4 pagecache. The pagecache is, as the name suggests, a cache of physical pages. In the UNIX world the concept of a pagecache became popular with the introduction of SVR4 UNIX, where it replaced the buffercache for data IO operations. <BR>While the SVR4 pagecache is only used for filesystem data caching, and thus uses the struct vnode and an offset into the file as hash parameters, the Linux pagecache is designed to be more generic and therefore uses a struct address_space (explained below) as its first parameter. Because the Linux pagecache is tightly coupled to the notion of address spaces, you will need at least a basic understanding of address_spaces to understand the way the pagecache works. An address_space is a kind of software MMU that maps all pages of one object (e.g. an inode) to another representation (typically physical disk blocks).
The struct address_space is defined in include/linux/fs.h as: <BR>-------------------------------------------------------------------------------- <BR>struct address_space { <BR>        struct list_head pages; <BR>        unsigned long nrpages; <BR>        struct address_space_operations *a_ops; <BR>        void *host; <BR>        struct vm_area_struct *i_mmap; <BR>        struct vm_area_struct *i_mmap_shared; <BR>        spinlock_t i_shared_lock; <BR>}; <BR>-------------------------------------------------------------------------------- <BR>To understand the way address_spaces work, we only need to look at a few of these fields: pages is a doubly linked list of all pages that belong to this address_space, nrpages is the number of pages in pages, a_ops defines the methods of this address_space, and host is an opaque pointer to the object this address_space belongs to. The usage of pages and nrpages is obvious, so we will take a closer look at the address_space_operations structure, defined in the same header: <BR>-------------------------------------------------------------------------------- <BR>struct address_space_operations { <BR>        int (*writepage)(struct page *); <BR>        int (*readpage)(struct file *, struct page *); <BR>        int (*sync_page)(struct page *); <BR>        int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); <BR>        int (*commit_write)(struct file *, struct page *, unsigned, unsigned); <BR>        int (*bmap)(struct address_space *, long); <BR>}; <BR>-------------------------------------------------------------------------------- <BR>For a basic view of the principle of address_spaces (and the pagecache) we need to take a look at ->writepage and ->readpage, but in practice we also need to take a look at ->prepare_write and ->commit_write. <BR>You can probably guess what the address_space_operations methods do by virtue of their names alone; nevertheless, they do require some explanation.
Their use in the course of filesystem data I/O, by far the most common path through the pagecache, provides a good way of understanding them. Unlike most other UNIX-like operating systems, Linux has generic file operations (a subset of the SYSVish vnode operations) for data IO through the pagecache. This means that the data will not directly interact with the filesystem on read/write/mmap, but will be read/written from/to the pagecache whenever possible. The pagecache has to get data from the actual low-level filesystem in case the user wants to read from a page not yet in memory, or write data to disk in case memory gets low. <BR>In the read path the generic methods will first compute the hash bucket for the wanted inode/index tuple: <BR>hash = page_hash(inode->i_mapping, index); <BR>Then we test whether the page actually exists: <BR>page = __find_page_nolock(inode->i_mapping, index, *hash); <BR>When it does not exist, we allocate a new free page and add it to the pagecache hash: <BR>page = page_cache_alloc(); <BR>__add_to_page_cache(page, mapping, index, hash); <BR>After the page is hashed we use the ->readpage address_space operation to actually fill the page with data (file is an open instance of the inode): <BR>error = mapping->a_ops->readpage(file, page); <BR>Finally we can copy the data to userspace. <BR>For writing to the filesystem two paths exist: one for writable mappings (mmap) and one for the write(2) family of syscalls. The mmap case is very simple, so it will be discussed first. When a user modifies a mapping, the VM subsystem marks the page dirty: <BR>SetPageDirty(page); <BR>The bdflush kernel thread, which tries to free pages either as background activity or because memory gets low, will try to call ->writepage on the pages that are explicitly marked dirty.
The ->writepage method now has to write the page's contents back to disk and free the page. <BR>The second write path is _much_ more complicated. For each page the user writes to, we basically do the following (for the full code see mm/filemap.c:generic_file_write()): <BR>page = __grab_cache_page(mapping, index, &cached_page); <BR>mapping->a_ops->prepare_write(file, page, offset, offset+bytes); <BR>copy_from_user(kaddr+offset, buf, bytes); <BR>mapping->a_ops->commit_write(file, page, offset, offset+bytes); <BR>So first we try to find the hashed page or allocate a new one, then we call the ->prepare_write address_space method, copy the user buffer to kernel memory, and finally call the ->commit_write method. As you have probably noticed, ->prepare_write and ->commit_write are fundamentally different from ->readpage and ->writepage, because they are called not only when physical IO is actually wanted but every time the user modifies the file. There are two (or more?) ways to handle this. The first one uses the Linux buffercache to delay the physical IO by filling a page->buffers pointer with buffer_heads, which will be used in try_to_free_buffers (fs/buffer.c) to request IO once memory gets low; this approach is used very widely in the current kernel. The other way just sets the page dirty and relies on ->writepage to do all the work. Due to the lack of a validity bitmap in struct page, this does not work for filesystems with a granularity smaller than PAGE_SIZE. <BR> <BR>-- <BR>WORLD IS NOT YOURS <BR> <BR> <BR>※ Source: SMTH BBS (水木清华站) smth.org [FROM: 162.105.53.152] <BR></BODY></HTML>