              Overview of the Linux Virtual File System

        Original author: Richard Gooch <rgooch@atnf.csiro.au>

                  Last updated on June 24, 2007.

  Copyright (C) 1999 Richard Gooch
  Copyright (C) 2005 Pekka Enberg

  This file is released under the GPLv2.


Introduction
============

The Virtual File System (also known as the Virtual Filesystem Switch)
is the software layer in the kernel that provides the filesystem
interface to userspace programs. It also provides an abstraction
within the kernel which allows different filesystem implementations to
coexist.

VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
on are called from a process context. Filesystem locking is described
in the document Documentation/filesystems/Locking.


Directory Entry Cache (dcache)
------------------------------

The VFS implements the open(2), stat(2), chmod(2), and similar system
calls. The pathname argument that is passed to them is used by the VFS
to search through the directory entry cache (also known as the dentry
cache or dcache). This provides a very fast look-up mechanism to
translate a pathname (filename) into a specific dentry. Dentries live
in RAM and are never saved to disc: they exist only for performance.

The dentry cache is meant to be a view into your entire filespace. As
most computers cannot fit all dentries in the RAM at the same time,
some bits of the cache are missing. In order to resolve your pathname
into a dentry, the VFS may have to resort to creating dentries along
the way, and then loading the inode. This is done by looking up the
inode.


The Inode Object
----------------

An individual dentry usually has a pointer to an inode. Inodes are
filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems). Inodes that live on the
disc are copied into the memory when required and changes to the inode
are written back to disc.
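
The step-by-step resolution of a pathname can be loosely imitated from
userspace. The sketch below is not how the kernel walks the dcache; it
only calls stat(2) on each prefix of an absolute path and records the
inode number (st_ino) that each component resolves to. The names
walk_path() and WALK_MAX_PATH are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define WALK_MAX_PATH 4096      /* hypothetical limit for this sketch */

/*
 * Hypothetical helper (not a kernel API): stat(2) every prefix of an
 * absolute path and store the inode number of each component in inos[].
 * Returns the number of components resolved (the root counts as one),
 * or -1 on any error.
 */
int walk_path(const char *path, ino_t *inos, int max)
{
        char prefix[WALK_MAX_PATH];
        struct stat st;
        size_t len, i;
        int n = 0;

        assert(path != NULL && inos != NULL);
        len = strlen(path);
        if (path[0] != '/' || len >= sizeof(prefix) || max < 1)
                return -1;

        /* The walk starts at the root directory. */
        if (stat("/", &st) != 0)
                return -1;
        inos[n++] = st.st_ino;

        for (i = 1; i <= len; i++) {
                /* A component ends at a '/' or at the end of the string. */
                if (path[i] != '/' && path[i] != '\0')
                        continue;
                /* Skip empty components such as "//" or a trailing '/'. */
                if (path[i - 1] == '/')
                        continue;
                if (n == max)
                        return -1;
                memcpy(prefix, path, i);
                prefix[i] = '\0';
                if (stat(prefix, &st) != 0)
                        return -1;
                inos[n++] = st.st_ino;
        }
        return n;
}
```

Calling walk_path() on a path with two components fills inos[] with
three entries: the root, the first directory and the final object.
Note that stat(2) follows symbolic links, so the numbers reported are
those of the link targets.
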
A single inode can be pointed to by multiple
dentries (hard links, for example, do this).

To look up an inode requires that the VFS calls the lookup() method of
the parent directory inode. This method is installed by the specific
filesystem implementation that the inode lives in. Once the VFS has
the required dentry (and hence the inode), we can do all those boring
things like open(2) the file, or stat(2) it to peek at the inode
data. The stat(2) operation is fairly simple: once the VFS has the
dentry, it peeks at the inode data and passes some of it back to
userspace.


The File Object
---------------

Opening a file requires another operation: allocation of a file
structure (this is the kernel-side implementation of file
descriptors). The freshly allocated file structure is initialized with
a pointer to the dentry and a set of file operation member functions.
These are taken from the inode data. The open() file method is then
called so the specific filesystem implementation can do its work. You
can see that this is another switch performed by the VFS. The file
structure is placed into the file descriptor table for the process.

Reading, writing and closing files (and other assorted VFS operations)
is done by using the userspace file descriptor to grab the appropriate
file structure, and then calling the required file structure method to
do whatever is required. For as long as the file is open, it keeps the
dentry in use, which in turn means that the VFS inode is still in use.


Registering and Mounting a Filesystem
=====================================

To register and unregister a filesystem, use the following API
functions:

   #include <linux/fs.h>

   extern int register_filesystem(struct file_system_type *);
   extern int unregister_filesystem(struct file_system_type *);

The passed struct file_system_type describes your filesystem. When a
request is made to mount a device onto a directory in your filespace,
the VFS will call the appropriate get_sb() method for the specific
filesystem.
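
The register-then-dispatch pattern can be modelled in plain userspace
C. The following is a toy sketch, not kernel code: every toy_* and
demo_* name is hypothetical, the get_sb() signature is simplified to a
single argument, and the error codes are illustrative choices. It only
shows how a list of named descriptors with function pointers lets a
generic layer call into the right implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace stand-in for struct file_system_type (hypothetical). */
struct toy_fs_type {
        const char *name;
        int (*get_sb)(const char *dev_name);    /* called at "mount" time */
        struct toy_fs_type *next;               /* list linkage, starts NULL */
};

static struct toy_fs_type *toy_fs_list;         /* head of registered types */

/* Return the link that points at the entry called "name", or the tail link. */
struct toy_fs_type **toy_find(const char *name)
{
        struct toy_fs_type **p;

        for (p = &toy_fs_list; *p != NULL; p = &(*p)->next)
                if (strcmp((*p)->name, name) == 0)
                        break;
        return p;
}

int toy_register_filesystem(struct toy_fs_type *fs)
{
        struct toy_fs_type **p;

        assert(fs != NULL && fs->name != NULL);
        if (fs->next != NULL)
                return -EBUSY;          /* already linked into the list */
        p = toy_find(fs->name);
        if (*p != NULL)
                return -EBUSY;          /* name already taken */
        *p = fs;                        /* append at the tail */
        return 0;
}

int toy_unregister_filesystem(struct toy_fs_type *fs)
{
        struct toy_fs_type **p;

        for (p = &toy_fs_list; *p != NULL; p = &(*p)->next) {
                if (*p == fs) {
                        *p = fs->next;  /* unlink the entry */
                        fs->next = NULL;
                        return 0;
                }
        }
        return -EINVAL;                 /* was never registered */
}

/* The "switch": pick the implementation by name and call its method. */
int toy_mount(const char *type, const char *dev_name)
{
        struct toy_fs_type *fs = *toy_find(type);

        if (fs == NULL)
                return -ENODEV;
        return fs->get_sb(dev_name);
}

/* Example implementation: "mounting" needs a non-empty device name. */
int demo_get_sb(const char *dev_name)
{
        return (dev_name != NULL && dev_name[0] != '\0') ? 0 : -EINVAL;
}

struct toy_fs_type demo_fs = { .name = "demofs", .get_sb = demo_get_sb };
```

After toy_register_filesystem(&demo_fs), a call such as
toy_mount("demofs", "/dev/demo0") reaches demo_get_sb() through the
function pointer; before registration, or after unregistering, the
same call fails because no descriptor with that name is on the list.
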
The dentry for the mount point will then be updated to
point to the root inode for the new filesystem.

You can see all filesystems that are registered to the kernel in the
file /proc/filesystems.


struct file_system_type
-----------------------

This describes the filesystem. As of kernel 2.6.22, the following
members are defined:

struct file_system_type {
        const char *name;
        int fs_flags;
        int (*get_sb) (struct file_system_type *, int,
                       const char *, void *, struct vfsmount *);
        void (*kill_sb) (struct super_block *);
        struct module *owner;
        struct file_system_type * next;
        struct list_head fs_supers;
        struct lock_class_key s_lock_key;
        struct lock_class_key s_umount_key;
};

  name: the name of the filesystem type, such as "ext2", "iso9660",
        "msdos" and so on

  fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)

  get_sb: the method to call when a new instance of this
        filesystem should be mounted

  kill_sb: the method to call when an instance of this filesystem
        should be unmounted

  owner: for internal VFS use: you should initialize this to THIS_MODULE
        in most cases.

  next: for internal VFS use: you should initialize this to NULL

  s_lock_key, s_umount_key: lockdep-specific

The get_sb() method has the following arguments:

  struct file_system_type *fs_type: describes the filesystem, partly
        initialized by the specific filesystem code

  int flags: mount flags

  const char *dev_name: the device name we are mounting.

  void *data: arbitrary mount options, usually comes as an ASCII
        string

  struct vfsmount *mnt: a vfs-internal representation of a mount point

The get_sb() method must determine if the block device specified
in the dev_name and fs_type contains a filesystem of the type the method
supports. If it succeeds in opening the named block device, it initializes a
struct super_block descriptor for the filesystem contained by the block device.
On failure it returns an error.

The most interesting member of the superblock structure that the
get_sb() method fills in is the "s_op" field.
This is a pointer to
a "struct super_operations" which describes the next level of the
filesystem implementation.

Usually, a filesystem uses one of the generic get_sb() implementations
and provides a fill_super() method instead. The generic methods are:

  get_sb_bdev: mount a filesystem residing on a block device

  get_sb_nodev: mount a filesystem that is not backed by a device

  get_sb_single: mount a filesystem which shares the instance between
        all mounts

A fill_super() method implementation has the following arguments:

  struct super_block *sb: the superblock structure. The method fill_super()
        must initialize this properly.

  void *data: arbitrary mount options, usually comes as an ASCII
        string

  int silent: whether or not to be silent on error


The Superblock Object
=====================

A superblock object represents a mounted filesystem.


struct super_operations
-----------------------

This describes how the VFS can manipulate the superblock of your
filesystem. As of kernel 2.6.22, the following members are defined:

struct super_operations {
        struct inode *(*alloc_inode)(struct super_block *sb);
        void (*destroy_inode)(struct inode *);

        void (*read_inode) (struct inode *);

        void (*dirty_inode) (struct inode *);
        int (*write_inode) (struct inode *, int);
        void (*put_inode) (struct inode *);
        void (*drop_inode) (struct inode *);
        void (*delete_inode) (struct inode *);
        void (*put_super) (struct super_block *);
        void (*write_super) (struct super_block *);
        int (*sync_fs)(struct super_block *sb, int wait);
        void (*write_super_lockfs) (struct super_block *);
        void (*unlockfs) (struct super_block *);
        int (*statfs) (struct dentry *, struct kstatfs *);
        int (*remount_fs) (struct super_block *, int *, char *);
        void (*clear_inode) (struct inode *);
        void (*umount_begin) (struct super_block *);

        int (*show_options)(struct seq_file *, struct vfsmount *);

        ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
        ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
};

All
methods are called without any locks being held, unless otherwise
noted. This means that most methods can block safely. All methods are
only called from a process context (i.e. not from an interrupt handler
or bottom half).

  alloc_inode: this method is called by alloc_inode() to allocate memory
        for struct inode and initialize it. If this function is not
        defined, a simple 'struct inode' is allocated. Normally
        alloc_inode will be used to allocate a larger structure which
        contains a 'struct inode' embedded within it.

  destroy_inode: this method is called by destroy_inode() to release
        resources allocated for struct inode. It is only required if
        ->alloc_inode was defined and simply undoes anything done by
        ->alloc_inode.

  read_inode: this method is called to read a specific inode from the
        mounted filesystem. The i_ino member in the struct inode is
        initialized by the VFS to indicate which inode to read. Other
        members are filled in by this method.

        You can set this to NULL and use iget5_locked() instead of iget()
        to read inodes. This is necessary for filesystems for which the
        inode number is not sufficient to identify an inode.

  dirty_inode: this method is called by the VFS to mark an inode dirty.

  write_inode: this method is called when the VFS needs to write an
        inode to disc. The second parameter indicates whether the write
        should be synchronous or not; not all filesystems check this flag.

  put_inode: called when the VFS inode is removed from the inode
        cache.

  drop_inode: called when the last access to the inode is dropped,
        with the inode_lock spinlock held.

        This method should be either NULL (normal UNIX filesystem
        semantics) or "generic_delete_inode" (for filesystems that do not
        want to cache inodes - causing "delete_inode" to always be
        called regardless of the value of i_nlink)

        The "generic_delete_inode()" behavior is equivalent to the
        old practice of using "force_delete" in the put_inode() case,
        but does not have the races that the "force_delete()" approach
        had.

  delete_inode: called when the VFS wants to delete an inode

  put_super: called when the VFS wishes to free the superblock
        (i.e. unmount). This is called with the superblock lock held

  write_super: called when the VFS superblock needs to be written to
        disc. This method is optional

  sync_fs: called when VFS is writing out all dirty data associated with
        a superblock. The second parameter indicates whether the method
        should wait until the write out has been completed. Optional.

  write_super_lockfs: called when VFS is locking a filesystem and
        forcing it into a consistent state. This method is currently
        used by the Logical Volume Manager (LVM).

  unlockfs: called when VFS is unlocking a filesystem and making it writable
        again.

  statfs: called when the VFS needs to get filesystem statistics. This
        is called with the kernel lock held

  remount_fs: called when the filesystem is remounted. This is called
        with the kernel lock held

  clear_inode: called when the VFS clears the inode. Optional

  umount_begin: called when the VFS is unmounting a filesystem.

  show_options: called by the VFS to show mount options for /proc/<pid>/mounts.

  quota_read: called by the VFS to read from filesystem quota file.

  quota_write: called by the VFS to write to filesystem quota file.

The read_inode() method is responsible for filling in the "i_op"
field. This is a pointer to a "struct inode_operations" which
describes the methods that can be performed on individual inodes.


The Inode Object
================

An inode object represents an object within the filesystem.


struct inode_operations
-----------------------

This describes how the VFS can manipulate an inode in your
filesystem.
As of kernel 2.6.22, the following members are defined:

struct inode_operations {
        int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
        struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
        int (*link) (struct dentry *,struct inode *,struct dentry *);
        int (*unlink) (struct inode *,struct dentry *);
        int (*symlink) (struct inode *,struct dentry *,const char *);
        int (*mkdir) (struct inode *,struct dentry *,int);