|David Cecil||Nov 29, 2007 4:01 pm|
|Bill Vermillion||Nov 29, 2007 4:27 pm|
|David Cecil||Nov 29, 2007 4:30 pm|
|Julian Elischer||Nov 29, 2007 4:49 pm|
|Bruce Evans||Nov 29, 2007 5:22 pm|
|David Cecil||Nov 29, 2007 5:38 pm|
|Matthew D. Fuller||Nov 29, 2007 8:05 pm|
|Bruce Evans||Nov 29, 2007 8:58 pm|
|David Cecil||Nov 29, 2007 9:26 pm|
|Bruce Evans||Nov 29, 2007 9:44 pm|
|Bruce Evans||Nov 29, 2007 10:02 pm|
|Kostik Belousov||Nov 29, 2007 10:03 pm|
|David Cecil||Nov 29, 2007 11:14 pm|
|Bruce Evans||Nov 30, 2007 8:33 am|
|Matthew D. Fuller||Nov 30, 2007 9:44 am|
|David Cecil||Nov 30, 2007 4:42 pm|
|Bruce Evans||Nov 30, 2007 6:01 pm|
|Bruce Evans||Nov 30, 2007 6:23 pm|
|David Cecil||Nov 30, 2007 8:27 pm|
|Don Lewis||Nov 30, 2007 11:26 pm|
|Don Lewis||Nov 30, 2007 11:26 pm|
|Kostik Belousov||Dec 1, 2007 12:07 am|
|Bruce Evans||Dec 1, 2007 3:34 am|
|Bruce Evans||Dec 1, 2007 8:07 am|
|Bruce Evans||Dec 1, 2007 10:33 am|
|Don Lewis||Dec 1, 2007 2:07 pm|
|Don Lewis||Dec 1, 2007 2:14 pm|
|Bruce Evans||Dec 1, 2007 7:56 pm|
|Kostik Belousov||Dec 1, 2007 10:14 pm|
|Bruce Evans||Dec 2, 2007 12:35 am|
|Bruce Evans||Dec 2, 2007 1:07 am|
|David Cecil||Dec 2, 2007 2:10 pm|
|Don Lewis||Dec 2, 2007 2:53 pm|
|Julian Elischer||Dec 2, 2007 3:49 pm|
|Bruce Evans||Dec 2, 2007 8:03 pm|
|Don Lewis||Dec 2, 2007 9:17 pm|
|Bruce Evans||Dec 3, 2007 2:11 am|
|Subject:||File remove problem|
|From:||Don Lewis (truc...@FreeBSD.org)|
|Date:||Dec 2, 2007 9:17:32 pm|
On 3 Dec, Bruce Evans wrote:
On Sun, 2 Dec 2007, Don Lewis wrote:
On 2 Dec, Bruce Evans wrote:
So it should be safe to remove all the r/o checks in ufs_inactive() after fixing the other bugs. ffs_truncate alread panics if fs_ronly, but only in some cases. In particular, it doesn't panic for truncations that don't change the file size. Such truncations aren't quite null, since standards require [f]truncate(2) to mark the ctime and mtime for update. ffs_truncate() sets the marks, which is correct for null truncations from userland but not ones from syncer internals. Setting of the marks when fs_ronly is set should cause panics later (my patch has a vprint() for it).
I think the MNT_RDONLY check in ufs_itimes_locked() should be also be changed to look at fs_ronly and panic if any marks are set. This will require some changes to add some early MNT_RDONLY checks.
Yes, already done (except vprint() instead of panic).
In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This also looks like it should be a reasonable optimization for read-only file systems that should eliminate unnecessary work at the lower levels of the code.
But I let these happen and discard IN_ATIME marks if fs_ronly. I thought that the optimization went the other way -- unconditionally setting the marks was very efficient, and discarding them in ufs_itimes() was efficient too. I think this is still true now with larger locking overheads, and the marks should be discarded later in the MNT_NOATIME case too. It is expected that the marks are set much more often than they are looked at by ufs_itimes(), since most calls to ufs_itimes() are in close() and read() is much more common than close().
ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking MNT_RDONLY there as well is free. Setting and clearing the mark will consume a few instruction cycles, dirty a cache line, and increase main memory write-back traffic, though the expense is likely to be small.
Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set on a downgrade to read-only seems to be the right thing to do.
ufs_itimes() is also called in stat() but I think that is less common than close() (except for some tree walks). WIth non-delayed marking, ufs_itimes() would still have to check fs_ronly, and the only gain would be that it could then skip checking the marks except as an invariants check. But it can gain like that even with delayed setting -- just ignore any old marks while fs_ronly (except as an invariants check), but clear them at mount or unmount time so that there shouldn't be any.
I think that setting the marks when the file system is read-only causes the syncer to do extra work. I think that ffs_sync() still gets called if the file system is read-only, and if it encounters any inodes with marks set, it calls ffs_syncvnode() on them.
The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY check, appears to be protected by the MNT_RDONLY check in vfs_mark_atime().
Thanks, I had forgotten about that. In vfs_mark_atime(), there is much more efficiency to be gained by not setting marks that will be discarded, since it takes a VOP to set them and many file systems don't support this setting. However, it is hard for vfs_mark_atime() to know when the mark will be discarded without calling the fs:
- it already doesn't know which fs's support it - it should be checking fs_ronly for ffs
I think that MNT_RDONLY is correct here. We want to stop new atime updates as soon as the downgrade starts, just like we stop new user-initiated writes.
- it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance to adjust the marks. This is automatically safe in one direction only (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops changes), and always bad for strict invariants.
Maybe a reasonable way to handle this would be to set the flags before calling VOP_MOUNT() when they are being changed from 0 to 1, and clear them after calling VOP_MOUNT() when changing them from 1 to 0. Adding explicit locking sounds painful ...
I now use the following fixes:
% Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.293 % diff -u -2 -r1.293 ufs_vnops.c % --- ufs_vnops.c 8 Nov 2007 17:21:51 -0000 1.293 % +++ ufs_vnops.c 2 Dec 2007 04:56:58 -0000 % @@ -89,4 +89,5 @@ % #endif % % +#include <ufs/ffs/fs.h> % #include <ufs/ffs/ffs_extern.h> % % @@ -137,8 +138,38 @@ % % ip = VTOI(vp); % + /* % + * MNT_RDONLY can barely be trusted here. Full r/o mode is indicated % + * by fs_ronly, and the MNT_RDONLY setting [should] differ from the % + * fs_ronly setting only during transition from r/w mode to r/o mode. % + * We set IN_ACCESS even in full r/o mode, so we must discard it % + * unconditionally here. During the transition, we must convert % + * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect % + * to set IN_MODIFIED. We also set the timestamps indicated by % + * IN_CHANGE | IN_UPDATE normally during the transition, since the % + * update marks may have been set correctly before the transition and % + * not yet converted into timestamps. Callers that set IN_CHANGE | % + * IN_UPDATE during the transition are buggy, since userland writes % + * are supposed to be denied (by MNT_RDONLY checks) during the % + * transition, while kernel writes should should only be for syncs % + * and syncs should not touch timestamps except to convert old % + * update marks to timestamps. Callers that set any update mark or % + * modification flag except IN_ACCESS while in full r/o mode are % + * broken; we will panic for them later. % + */ % if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) % - goto out; % + ip->i_flag &= ~IN_ACCESS;
IN_ACCESS might have been set before the downgrade request. As written, this change will toss out the timestamp update. I think it would be better to use fs_ronly here, but it would be more efficient to check MNT_RDONLY in ffs_*read() and eliminate these two lines of code.