quota-deadlock-on-pagelock-core.patch
From: Jan Kara <jack@suse.cz>

The four patches in this series fix deadlocks with quotas of pagelock (the
problem was lock inversion on PageLock and transaction start - quota code
needed to first start a transaction and then write the data which subsequently
needed acquisition of PageLock while the standard ordering - PageLock first
and transaction start later - was used e.g. by pdflush). They implement a
new way of quota access to disk: Every filesystem that would like to implement
quotas now has to provide quota_read() and quota_write() functions. These
functions must obey quota lock ordering (in particular they should not take
PageLock inside a transaction).

The first patch implements the changes in the quota core, the other three
patches implement needed functions in ext2, ext3 and reiserfs. The patch for
reiserfs also fixes several other lock inversion problems (similar as ext3
had) and implements the journaled quota functionality (which comes almost for
free after the locking fixes...).

The quota core patch makes quota support in other filesystems (except XFS
which implements everything on its own ;)) unfunctional (quotaon() will refuse
to turn on quotas on them). When the patches get reasonable wide testing and
it will seem that no major changes will be needed I can make fixes also for
the other filesystems (JFS, UDF, UFS).

This patch:

The patch implements the new way of quota io in the quota core. Every
filesystem wanting to support quotas has to provide functions quota_read()
and quota_write() obeying quota locking rules. As the writes and reads
bypass the pagecache there is some ugly stuff ensuring that userspace can
see all the data after quotaoff() (or Q_SYNC quotactl). In future I plan
to make quota files inaccessible from userspace (with the exception of
quotacheck(8) which will take care about the cache flushing and such stuff
itself) so that this synchronization stuff can be removed...

The rewrite of the quota core. Quota uses the filesystem read() and write()
functions no more to avoid possible deadlocks on PageLock. From now on every
filesystem supporting quotas must provide functions quota_read() and
quota_write() which obey the quota locking rules (e.g. they cannot acquire the
PageLock).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/fs/dquot.c               |  162 +++++++++++++--------------
 25-akpm/fs/quota.c               |   45 +++++++
 25-akpm/fs/quota_v1.c            |   62 ++--------
 25-akpm/fs/quota_v2.c            |  227 +++++++++++++++++----------------------
 25-akpm/include/linux/fs.h       |    3
 25-akpm/include/linux/quota.h    |    2
 25-akpm/include/linux/security.h |    8 -
 25-akpm/security/dummy.c         |    2
 25-akpm/security/selinux/hooks.c |    4
 9 files changed, 247 insertions(+), 268 deletions(-)

diff -rup RH_2_6_9_55.orig/fs/dquot.c RH_2_6_9_55/fs/dquot.c
--- RH_2_6_9_55.orig/fs/dquot.c
+++ RH_2_6_9_55/fs/dquot.c
@@ -49,7 +49,7 @@
  * New SMP locking.
  * Jan Kara, <jack@suse.cz>, 10/2002
  *
- * Added journalled quota support
+ * Added journalled quota support, fix lock inversion problems
  * Jan Kara, <jack@suse.cz>, 2003,2004
  *
  * (C) Copyright 1994 - 1997 Marco van Wieringen
@@ -75,7 +75,8 @@
 #include <linux/proc_fs.h>
 #include <linux/security.h>
 #include <linux/kmod.h>
-#include <linux/pagemap.h>
+#include <linux/namei.h>
+#include <linux/buffer_head.h>
 
 #include <asm/uaccess.h>
 
@@ -114,7 +115,7 @@
  * operations on dquots don't hold dq_lock as they copy data under dq_data_lock
  * spinlock to internal buffers before writing.
  *
- * Lock ordering (including related VFS locks) is following:
+ * Lock ordering (including related VFS locks) is the following:
  * i_sem > dqonoff_sem > journal_lock > dqptr_sem > dquot->dq_lock >
  * dqio_sem
  * i_sem on quota files is special (it's below dqio_sem)
@@ -183,8 +184,7 @@ static void put_quota_format(struct quot
  * on all three lists, depending on its current state.
  *
  * All dquots are placed to the end of inuse_list when first created, and this
- * list is used for the sync and invalidate operations, which must look
- * at every dquot.
+ * list is used for invalidate operation, which must look at every dquot.
  *
  * Unused dquots (dq_count == 0) are added to the free_dquots list when freed,
  * and this list is searched whenever we need an available dquot. Dquots are
@@ -1341,10 +1341,12 @@ int vfs_quota_off(struct super_block *sb
 {
 	int cnt;
 	struct quota_info *dqopt = sb_dqopt(sb);
+	struct inode *toput[MAXQUOTAS];
 
 	/* We need to serialize quota_off() for device */
 	down(&dqopt->dqonoff_sem);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+		toput[cnt] = NULL;
 		if (type != -1 && cnt != type)
 			continue;
 		if (!sb_has_quota_enabled(sb, cnt))
@@ -1364,7 +1366,7 @@ int vfs_quota_off(struct super_block *sb
 			dqopt->ops[cnt]->free_file_info(sb, cnt);
 		put_quota_format(dqopt->info[cnt].dqi_format);
 
-		fput(dqopt->files[cnt]);
+		toput[cnt] = dqopt->files[cnt];
 		dqopt->files[cnt] = NULL;
 		dqopt->info[cnt].dqi_flags = 0;
 		dqopt->info[cnt].dqi_igrace = 0;
@@ -1372,6 +1374,26 @@ int vfs_quota_off(struct super_block *sb
 		dqopt->ops[cnt] = NULL;
 	}
 	up(&dqopt->dqonoff_sem);
+	/* Sync the superblock so that buffers with quota data are written to
+	 * disk (and so userspace sees correct data afterwards) */
+	if (sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, 1);
+	sync_blockdev(sb->s_bdev);
+	/* Now the quota files are just ordinary files and we can set the
+	 * inode flags back. Moreover we discard the pagecache so that
+	 * userspace sees the writes we did bypassing the pagecache. We
+	 * must also discard the blockdev buffers so that we see the
+	 * changes done by userspace on the next quotaon() */
+	for (cnt = 0; cnt < MAXQUOTAS; cnt++)
+		if (toput[cnt]) {
+			down(&toput[cnt]->i_sem);
+			toput[cnt]->i_flags &= ~(S_IMMUTABLE | S_NOATIME | S_NOQUOTA);
+			truncate_inode_pages(&toput[cnt]->i_data, 0);
+			up(&toput[cnt]->i_sem);
+			mark_inode_dirty(toput[cnt]);
+			iput(toput[cnt]);
+		}
+	invalidate_bdev(sb->s_bdev, 0);
 	return 0;
 }
 
@@ -1379,68 +1401,56 @@ int vfs_quota_off(struct super_block *sb
  * Turn quotas on on a device
  */
 
-/* Helper function when we already have file open */
-static int vfs_quota_on_file(struct file *f, int type, int format_id)
+/* Helper function when we already have the inode */
+static int vfs_quota_on_inode(struct inode *inode, int type, int format_id)
 {
 	struct quota_format_type *fmt = find_quota_format(format_id);
-	struct inode *inode;
-	struct super_block *sb = f->f_dentry->d_sb;
+	struct super_block *sb = inode->i_sb;
 	struct quota_info *dqopt = sb_dqopt(sb);
-	struct dquot *to_drop[MAXQUOTAS];
-	int error, cnt;
-	unsigned int oldflags = -1;
+	int error;
+	int oldflags = -1;
 
 	if (!fmt)
 		return -ESRCH;
-	error = -EIO;
-	if (!f->f_op || !f->f_op->read || !f->f_op->write)
+	if (!S_ISREG(inode->i_mode)) {
+		error = -EACCES;
 		goto out_fmt;
-	inode = f->f_dentry->d_inode;
-	error = -EACCES;
-	if (!S_ISREG(inode->i_mode))
+	}
+	if (IS_RDONLY(inode)) {
+		error = -EROFS;
+		goto out_fmt;
+	}
+	if (!sb->s_op->quota_write || !sb->s_op->quota_read) {
+		error = -EINVAL;
 		goto out_fmt;
+	}
 
+	/* As we bypass the pagecache we must now flush the inode so that
+	 * we see all the changes from userspace... */
+	write_inode_now(inode, 1);
+	/* And now flush the block cache so that kernel sees the changes */
+	invalidate_bdev(sb->s_bdev, 0);
 	down(&inode->i_sem);
 	down(&dqopt->dqonoff_sem);
 	if (sb_has_quota_enabled(sb, type)) {
-		up(&inode->i_sem);
 		error = -EBUSY;
 		goto out_lock;
 	}
 	/* We don't want quota and atime on quota files (deadlocks possible)
-	 * We also need to set GFP mask differently because we cannot recurse
-	 * into filesystem when allocating page for quota inode */
+	 * Also nobody should write to the file - we use special IO operations
+	 * which ignore the immutable bit. */
 	down_write(&dqopt->dqptr_sem);
-	oldflags = inode->i_flags & (S_NOATIME | S_NOQUOTA);
-	inode->i_flags |= S_NOQUOTA | S_NOATIME;
+	oldflags = inode->i_flags & (S_NOATIME | S_IMMUTABLE | S_NOQUOTA);
+	inode->i_flags |= S_NOQUOTA | S_NOATIME | S_IMMUTABLE;
 	up_write(&dqopt->dqptr_sem);
-	up(&inode->i_sem);
 
-	dqopt->files[type] = f;
+	error = -EIO;
+	dqopt->files[type] = igrab(inode);
+	if (!dqopt->files[type])
+		goto out_lock;
 	error = -EINVAL;
 	if (!fmt->qf_ops->check_quota_file(sb, type))
 		goto out_file_init;
-	/*
-	 * We write to quota files deep within filesystem code. We don't want
-	 * the VFS to reenter filesystem code when it tries to allocate a
-	 * pagecache page for the quota file write. So clear __GFP_FS in
-	 * the quota file's allocation flags.
-	 */
-	mapping_set_gfp_mask(inode->i_mapping,
-		mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);
-
-	down_write(&dqopt->dqptr_sem);
-	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		to_drop[cnt] = inode->i_dquot[cnt];
-		inode->i_dquot[cnt] = NODQUOT;
-	}
-	up_write(&dqopt->dqptr_sem);
-	/* We must put dquots outside of dqptr_sem because we may need to
-	 * start transaction for dquot_release() */
-	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		if (to_drop[cnt])
-			dqput(to_drop[cnt]);
-	}
 
 	dqopt->ops[type] = fmt->qf_ops;
 	dqopt->info[type].dqi_format = fmt;
@@ -1451,6 +1461,7 @@ static int vfs_quota_on_file(struct file
 		goto out_file_init;
 	}
 	up(&dqopt->dqio_sem);
+	up(&inode->i_sem);
 	set_enable_flags(dqopt, type);
 
 	add_dquot_ref(sb, type);
@@ -1460,19 +1471,18 @@ static int vfs_quota_on_file(struct file
 
 out_file_init:
 	dqopt->files[type] = NULL;
+	iput(inode);
 out_lock:
 	up(&dqopt->dqonoff_sem);
 	if (oldflags != -1) {
-		down(&inode->i_sem);
 		down_write(&dqopt->dqptr_sem);
-		/* Reset the NOATIME flag back. I know it could change in the
-		 * mean time but playing with NOATIME flags on a quota file is
-		 * never a good idea */
-		inode->i_flags &= ~(S_NOATIME | S_NOQUOTA);
+		/* Set the flags back (in the case of accidental quotaon()
+		 * on a wrong file we don't want to mess up the flags) */
+		inode->i_flags &= ~(S_NOATIME | S_NOQUOTA | S_IMMUTABLE);
 		inode->i_flags |= oldflags;
 		up_write(&dqopt->dqptr_sem);
-		up(&inode->i_sem);
 	}
+	up(&inode->i_sem);
 out_fmt:
 	put_quota_format(fmt);
 
@@ -1482,47 +1492,37 @@ out_fmt:
 /* Actual function called from quotactl() */
 int vfs_quota_on(struct super_block *sb, int type, int format_id, char *path)
 {
-	struct file *f;
+	struct nameidata nd;
 	int error;
 
-	f = filp_open(path, O_RDWR, 0600);
-	if (IS_ERR(f))
-		return PTR_ERR(f);
-	error = security_quota_on(f);
+	error = path_lookup(path, LOOKUP_FOLLOW, &nd);
+	if (error < 0)
+		return error;
+	error = security_quota_on(nd.dentry);
 	if (error)
-		goto out_f;
-	error = vfs_quota_on_file(f, type, format_id);
-	if (!error)
-		return 0;
-out_f:
-	filp_close(f, NULL);
+		goto out_path;
+	/* Quota file not on the same filesystem? */
+	if (nd.mnt->mnt_sb != sb)
+		error = -EXDEV;
+	else
+		error = vfs_quota_on_inode(nd.dentry->d_inode, type, format_id);
+out_path:
+	path_release(&nd);
 	return error;
 }
 
 /*
- * Function used by filesystems when filp_open() would fail (filesystem is
- * being mounted now). We will use a private file structure. Caller is
- * responsible that it's IO functions won't need vfsmnt structure or
- * some dentry tricks...
+ * This function is used when filesystem needs to initialize quotas
+ * during mount time.
  */
 int vfs_quota_on_mount(int type, int format_id, struct dentry *dentry)
 {
-	struct file *f;
 	int error;
 
-	dget(dentry); /* Get a reference for struct file */
-	f = dentry_open(dentry, NULL, O_RDWR);
-	if (IS_ERR(f)) {
-		error = PTR_ERR(f);
-		goto out_dentry;
-	}
-	error = vfs_quota_on_file(f, type, format_id);
-	if (!error)
-		return 0;
-	fput(f);
-out_dentry:
-	dput(dentry);
-	return error;
+	error = security_quota_on(dentry);
+	if (error)
+		return error;
+	return vfs_quota_on_inode(dentry->d_inode, type, format_id);
 }
 
 /* Generic routine for getting common part of quota structure */
diff -rup RH_2_6_9_55.orig/fs/quota.c RH_2_6_9_55/fs/quota.c
--- RH_2_6_9_55.orig/fs/quota.c
+++ RH_2_6_9_55/fs/quota.c
@@ -13,6 +13,8 @@
 #include <linux/kernel.h>
 #include <linux/smp_lock.h>
 #include <linux/security.h>
+#include <linux/syscalls.h>
+#include <linux/buffer_head.h>
 
 /* Check validity of quotactl */
 static int check_quotactl_valid(struct super_block *sb, int type, int cmd, qid_t id)
@@ -134,16 +136,54 @@ restart:
 	return NULL;
 }
 
+void quota_sync_sb(struct super_block *sb, int type)
+{
+	int cnt;
+	struct inode *discard[MAXQUOTAS];
+
+	sb->s_qcop->quota_sync(sb, type);
+	/* This is not very clever (and fast) but currently I don't know about
+	 * any other simple way of getting quota data to disk and we must get
+	 * them there for userspace to be visible... */
+	if (sb->s_op->sync_fs)
+		sb->s_op->sync_fs(sb, 1);
+	sync_blockdev(sb->s_bdev);
+
+	/* Now when everything is written we can discard the pagecache so
+	 * that userspace sees the changes. We need i_sem and so we could
+	 * not do it inside dqonoff_sem. Moreover we need to be carefull
+	 * about races with quotaoff() (that is the reason why we have own
+	 * reference to inode). */
+	down(&sb_dqopt(sb)->dqonoff_sem);
+	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+		discard[cnt] = NULL;
+		if (type != -1 && cnt != type)
+			continue;
+		if (!sb_has_quota_enabled(sb, cnt))
+			continue;
+		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
+	}
+	up(&sb_dqopt(sb)->dqonoff_sem);
+	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
+		if (discard[cnt]) {
+			down(&discard[cnt]->i_sem);
+			truncate_inode_pages(&discard[cnt]->i_data, 0);
+			up(&discard[cnt]->i_sem);
+			iput(discard[cnt]);
+		}
+	}
+}
+
 void sync_dquots(struct super_block *sb, int type)
 {
 	if (sb) {
 		if (sb->s_qcop->quota_sync)
-			sb->s_qcop->quota_sync(sb, type);
+			quota_sync_sb(sb, type);
 	}
 	else {
-		while ((sb = get_super_to_sync(type)) != 0) {
+		while ((sb = get_super_to_sync(type)) != NULL) {
 			if (sb->s_qcop->quota_sync)
-				sb->s_qcop->quota_sync(sb, type);
+				quota_sync_sb(sb, type);
 			drop_super(sb);
 		}
 	}
diff -rup RH_2_6_9_55.orig/fs/quota_v1.c RH_2_6_9_55/fs/quota_v1.c
--- RH_2_6_9_55.orig/fs/quota_v1.c
+++ RH_2_6_9_55/fs/quota_v1.c
@@ -7,7 +7,6 @@
 #include <linux/init.h>
 #include <linux/module.h>
 
-#include <asm/uaccess.h>