t this operation can block (GFP_KERNEL allocation) so it must drop the inode_lock spinlock which guards the hashtable. Since it dropped the spinlock, it must search the hashtable for the inode again, and if it is found this time, it returns the one found in the hashtable (after incrementing its reference count by __iget) and destroys the newly allocated one. If it is still not found in the hashtable then the new inode we have just allocated is the one to be used, so it is initialised to the required values and the fs-specific sb->s_op->read_inode() method is invoked to populate the rest of the inode. This brings us from the inode cache back to the filesystem code - remember that we came to the inode cache when the filesystem-specific lookup() method invoked iget(). While the s_op->read_inode() method is reading the inode from disk the inode is locked (i_state = I_LOCK); after it returns, the inode is unlocked and all the waiters for it are woken up.

Now, let's see what happens when we close this file descriptor. The close(2) system call is implemented in the fs/open.c:sys_close() function, which calls do_close(fd, 1), which rips the descriptor out of the process' file descriptor table (replaces it with NULL) and invokes the filp_close() function, which does most of the work. The interesting things happen in fput(), which checks if this was the last reference to the file and if so calls fs/file_table.c:_fput(), which calls __fput(), which is where interaction with dcache (and therefore with inode cache - remember dcache is a Master of inode cache!) happens. The fs/dcache.c:dput() does dentry_iput(), which brings us back to the inode cache via iput(inode), so let us understand fs/inode.c:iput(inode):

1. If the parameter passed to us is NULL, we do absolutely nothing and return.
2. If there is a fs-specific sb->s_op->put_inode() method, it is invoked now with no spinlocks held (so it can block).
3. The inode_lock spinlock is taken and i_count is decremented. If this was NOT the last reference to this inode then we simply check whether there are so many references to it that i_count could wrap around the 32 bits allocated to it, and if so we print a warning and return. Note that we call printk() while holding the inode_lock spinlock - this is fine because printk() can never block so it may be called in absolutely any context (even from interrupt handlers!).
4. If this was the last active reference then some work needs to be done.

The work performed by iput() on the last inode reference is rather complex so we separate it into a list of its own:

1. If i_nlink == 0 (e.g. the file was unlinked while we held it open) then the inode is removed from the hashtable and from its type list and, if there are any data pages held in the page cache for this inode, they are removed by means of truncate_all_inode_pages(&inode->i_data). Then the filesystem-specific s_op->delete_inode() method is invoked, which typically deletes the on-disk copy of the inode. If there is no s_op->delete_inode() method registered by the filesystem (e.g. ramfs) then we call clear_inode(inode), which invokes s_op->clear_inode() if registered and, if the inode corresponds to a block device, drops the device's reference count by bdput(inode->i_bdev).
2. If i_nlink != 0 then we check if there are other inodes in the same hash bucket and if there are none, then if the inode is not dirty we delete it from its type list and add it to the inode_unused list, incrementing inodes_stat.nr_unused. If there are inodes in the same hash bucket then we delete it from the type list and add it to the inode_unused list.
If this was an anonymous inode (NetApp .snapshot) then we delete it from the type list and clear/destroy it completely.

3.2 Filesystem Registration/Unregistration

The Linux kernel provides a mechanism for new filesystems to be written with minimum effort. The historical reasons for this are:

1. In a world where people still use non-Linux operating systems to protect their investment in legacy software, Linux had to provide interoperability by supporting a great multitude of different filesystems - most of which would not deserve to exist on their own but only for compatibility with existing non-Linux operating systems.
2. The interface for filesystem writers had to be very simple so that people could try to reverse engineer existing proprietary filesystems by writing read-only versions of them. Therefore the Linux VFS makes it very easy to implement read-only filesystems - 95% of the work is to finish them by adding full write support. As a concrete example, I wrote a read-only BFS filesystem for Linux in about 10 hours but it took several weeks to complete it to have full write support (and even today some purists claim that it is not complete because "it doesn't have compactification support").
3. All Linux filesystems can be implemented as modules so the VFS interface is exported.

Let us consider the steps required to implement a filesystem under Linux. The code implementing a filesystem can be either a dynamically loadable module or statically linked into the kernel and the way it is done under Linux is very transparent.
All that is needed is to fill in a 'struct file_system_type' structure and register it with the VFS using the register_filesystem() function, as in the following example from fs/bfs/inode.c:

    #include <linux/module.h>
    #include <linux/init.h>

    static struct super_block *bfs_read_super(struct super_block *, void *, int);

    static DECLARE_FSTYPE_DEV(bfs_fs_type, "bfs", bfs_read_super);

    static int __init init_bfs_fs(void)
    {
            return register_filesystem(&bfs_fs_type);
    }

    static void __exit exit_bfs_fs(void)
    {
            unregister_filesystem(&bfs_fs_type);
    }

    module_init(init_bfs_fs)
    module_exit(exit_bfs_fs)

The module_init()/module_exit() macros ensure that for modules the functions init_bfs_fs() and exit_bfs_fs() turn into init_module() and cleanup_module() respectively, and for statically linked objects the exit_bfs_fs() code vanishes as it is unnecessary.

The 'struct file_system_type' is declared in include/linux/fs.h:

    struct file_system_type {
            const char *name;
            int fs_flags;
            struct super_block *(*read_super) (struct super_block *, void *, int);
            struct module *owner;
            struct vfsmount *kern_mnt; /* For kernel mount, if it's FS_SINGLE fs */
            struct file_system_type * next;
    };

The fields thereof are explained thus:

· name - human readable name, appears in the /proc/filesystems file and is used as a key to find a filesystem by name (the filesystem type passed to mount(2)) and to refuse to register a different filesystem under the name of one already registered - so there can (obviously) be only one filesystem with a given name. For modules, name points into the module's address space and is not copied - this means cat /proc/filesystems can oops if the module was unloaded but filesystem is still
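
The role of name as the registration key can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's code: every toy_* identifier is hypothetical, the error values are chosen for illustration, and the locking a real kernel needs around the list is left out. It shows a singly linked registry that refuses a second entry with the same name, and it stores the caller's name pointer without copying it, which is why the string must outlive the registration.

```c
/* Userspace sketch only: all toy_* names are hypothetical, not kernel symbols. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;          /* lookup key; the pointer is stored, not copied */
    struct toy_fs_type *next;  /* link in the singly linked registry */
};

struct toy_fs_type *toy_file_systems;  /* head of the registry */

/* Return the address of the link pointing at the entry called `name`,
 * or the address of the terminating NULL link if no such entry exists. */
struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Register `fs`; a second registration under the same name is refused. */
int toy_register(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    if (!fs || !fs->name)
        return -EINVAL;
    if (fs->next)
        return -EBUSY;         /* already linked into the registry */
    p = toy_find(fs->name);
    if (*p)
        return -EBUSY;         /* the name is already taken */
    *p = fs;                   /* append at the tail */
    return 0;
}

/* Unlink `fs` from the registry; -EINVAL if it was not registered. */
int toy_unregister(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}

/* Walk through the behaviour; aborts if an expectation fails. */
void toy_demo(void)
{
    struct toy_fs_type bfs = { "bfs", NULL };
    struct toy_fs_type impostor = { "bfs", NULL };

    assert(toy_register(&bfs) == 0);            /* first "bfs" is accepted */
    assert(toy_register(&impostor) == -EBUSY);  /* same name is refused */
    assert(toy_unregister(&bfs) == 0);
    assert(toy_register(&impostor) == 0);       /* the name is free again */
    assert(toy_unregister(&impostor) == 0);     /* leave the registry empty */
}
```

Calling toy_demo() from a main() exercises the model. Because the registry keeps only the pointer to name, an entry whose storage disappears while it is still linked would leave a dangling string behind, which is the hazard the name field description warns about for unloaded modules.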