<HTML><HEAD> <TITLE>BBS Shuimu Tsinghua Station: Digest Area</TITLE></HEAD><BODY><CENTER><H1>BBS Shuimu Tsinghua Station: Digest Area</H1></CENTER>From: clamor (clamor), Board: Linux <BR>Subject: Linux Kernel Internals-3(VFS) <BR>Origin: BBS Shuimu Tsinghua Station (Tue Dec 19 21:35:14 2000) <BR> <BR>3. Virtual Filesystem (VFS) <BR>3.1 Inode Caches and Interaction with Dcache <BR>
In order to support multiple filesystems, Linux contains a special kernel interface level called VFS (Virtual Filesystem Switch). This is similar to the vnode/vfs interface found in SVR4 derivatives (it originally came from the BSD and Sun implementations). <BR>
The Linux inode cache is implemented in a single file, fs/inode.c, which consists of 977 lines of code. Interestingly, not many changes have been made to it over the last 5-7 years: one can still recognize some of the code when comparing the latest version with, say, 1.3.42. <BR>
The structure of the Linux inode cache is as follows: <BR>
1. A global hashtable, inode_hashtable, in which each inode is hashed by the value of the superblock pointer and the 32-bit inode number. Inodes without a superblock (inode-&gt;i_sb == NULL) are instead added to a doubly linked list headed by anon_hash_chain. Examples of anonymous inodes are sockets created by net/socket.c:sock_alloc(), which calls fs/inode.c:get_empty_inode(). <BR>
2. A global type in_use list (inode_in_use), which contains valid inodes with i_count&gt;0 and i_nlink&gt;0. Inodes newly allocated by get_empty_inode() and get_new_inode() are added to the inode_in_use list. <BR>
3. A global type unused list (inode_unused), which contains valid inodes with i_count = 0. <BR>
4. A per-superblock type dirty list (sb-&gt;s_dirty), which contains valid inodes with i_count&gt;0, i_nlink&gt;0 and i_state & I_DIRTY. When an inode is marked dirty, it is added to the sb-&gt;s_dirty list if it is also hashed. Maintaining a per-superblock dirty list makes it possible to sync inodes quickly. <BR>
5. The inode cache proper: a SLAB cache called inode_cachep. 
As inode objects are allocated and freed, they are taken from and returned to this SLAB cache. <BR>
The type lists are anchored from inode-&gt;i_list, the hashtable from inode-&gt;i_hash. Each inode can be on a hashtable and on one and only one type list (in_use, unused or dirty). <BR>
All these lists are protected by a single spinlock: inode_lock. <BR>
The inode cache subsystem is initialised when the inode_init() function is called from init/main.c:start_kernel(). The function is marked as __init, which means its code is thrown away later on. It is passed a single argument: the number of physical pages on the system. This lets the inode cache configure itself depending on how much memory is available, i.e. create a larger hashtable if there is enough memory. <BR>
The only statistic kept about the inode cache is the number of unused inodes, stored in inodes_stat.nr_unused and accessible to user programs via the files /proc/sys/fs/inode-nr and /proc/sys/fs/inode-state. <BR>
We can examine one of the lists from gdb running on a live kernel thus: <BR>
(gdb) printf "%d\n", (unsigned long)(&((struct inode *)0)-&gt;i_list) <BR>
8 <BR>
(gdb) p inode_unused <BR>
$34 = 0xdfa992a8 <BR>
(gdb) p (struct list_head)inode_unused <BR>
$35 = {next = 0xdfa992a8, prev = 0xdfcdd5a8} <BR>
(gdb) p ((struct list_head)inode_unused).prev <BR>
$36 = (struct list_head *) 0xdfcdd5a8 <BR>
(gdb) p (((struct list_head)inode_unused).prev)-&gt;prev <BR>
$37 = (struct list_head *) 0xdfb5a2e8 <BR>
(gdb) set $i = (struct inode *)0xdfb5a2e0 <BR>
(gdb) p $i-&gt;i_ino <BR>
$38 = 0x3bec7 <BR>
(gdb) p $i-&gt;i_count <BR>
$39 = {counter = 0x0} <BR>
Note that we deducted 8 from the address 0xdfb5a2e8 to obtain the address of the 'struct inode' (0xdfb5a2e0), according to the definition of the list_entry() macro from include/linux/list.h. 
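<BR>The subtraction performed in the gdb session above can be sketched in userspace C. This is a minimal sketch, not kernel code: struct toy_inode and the helper inode_from_list() are hypothetical names, the list helpers are simplified re-implementations written for this example, and the layout (i_hash followed by i_list) is an assumption chosen so that i_list sits two pointers into the structure, which is 8 bytes on a 32-bit machine. The macro here uses offsetof() rather than the kernel's own spelling. <BR>

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Same idea as list_entry(): step back from the embedded member
 * to the start of the structure that contains it. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced inode (assumed layout for illustration). */
struct toy_inode {
    struct list_head i_hash;  /* link on the hash chain */
    struct list_head i_list;  /* link on exactly one type list */
    unsigned long i_ino;
    int i_count;
};

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry from whatever list it is currently on. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Recover the containing toy_inode from a pointer to its i_list member. */
static struct toy_inode *inode_from_list(struct list_head *p)
{
    assert(p != NULL);
    return list_entry(p, struct toy_inode, i_list);
}
```

Given a pointer p taken from a type list, inode_from_list(p) returns the enclosing toy_inode, which is the same step as turning 0xdfb5a2e8 into 0xdfb5a2e0 by hand. Moving an inode from one type list to another is then a list_del() followed by a list_add() on its i_list member.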
<BR>To understand how the inode cache works, let us trace the lifetime of an inode of a regular file on an ext2 filesystem as it is opened and closed: <BR>
fd = open("file", O_RDONLY); <BR>
close(fd); <BR>
The open(2) system call is implemented in the fs/open.c:sys_open() function, and the real work is done by the fs/open.c:filp_open() function, which is split into two parts: <BR>
1. open_namei(): fills in the nameidata structure containing the dentry and vfsmount structures. <BR>
2. dentry_open(): given a dentry and vfsmount, allocates a new 'struct file' and links them together. It also invokes the filesystem-specific f_op-&gt;open() method, which was set in inode-&gt;i_fop when the inode was read in open_namei() (which provided the inode via dentry-&gt;d_inode). <BR>
The open_namei() function interacts with the dentry cache via path_walk(), which in turn calls real_lookup(), which invokes the filesystem-specific inode_operations-&gt;lookup() method. The job of this method is to find the entry in the parent directory with the matching name and then do iget(sb, ino) to get the corresponding inode, which brings us to the inode cache. When the inode is read in, the dentry is instantiated by means of d_add(dentry, inode). While we are at it, note that for UNIX-style filesystems, which have the concept of an on-disk inode number, it is the lookup method's job to map its endianness to the current CPU format. For example, if the inode number in the raw (fs-specific) directory entry is in little-endian 32-bit format, one could do: <BR>
unsigned long ino = le32_to_cpu(de-&gt;inode); <BR>
inode = iget(sb, ino); <BR>
d_add(dentry, inode); <BR>
So, when we open a file we hit iget(sb, ino), which is really iget4(sb, ino, NULL, NULL), which does the following: <BR>
1. Attempts to find an inode with matching superblock and inode number in the hashtable under protection of inode_lock. 
If the inode is found, its reference count (i_count) is incremented; if the count was 0 before the increment and the inode is not dirty, the inode is removed from whatever type list (inode-&gt;i_list) it is currently on (it has to be the inode_unused list, of course) and inserted into the inode_in_use type list, and inodes_stat.nr_unused is decremented. <BR>
2. If the inode is currently locked, we wait until it is unlocked, so that iget4() is guaranteed to return an unlocked inode. <BR>
3. If the inode was not found in the hashtable, then this is the first time we have encountered it, so we call get_new_inode(), passing it a pointer to the place in the hashtable where the inode should be inserted. <BR>
4. get_new_inode() allocates a new inode from the inode_cachep SLAB cache bu <BR>