| From | Sent On | Attachments |
|---|---|---|
| David Cecil | Nov 29, 2007 4:01 pm | |
| Bill Vermillion | Nov 29, 2007 4:27 pm | |
| David Cecil | Nov 29, 2007 4:30 pm | |
| Julian Elischer | Nov 29, 2007 4:49 pm | |
| Bruce Evans | Nov 29, 2007 5:22 pm | |
| David Cecil | Nov 29, 2007 5:38 pm | |
| Matthew D. Fuller | Nov 29, 2007 8:05 pm | |
| Bruce Evans | Nov 29, 2007 8:58 pm | |
| David Cecil | Nov 29, 2007 9:26 pm | |
| Bruce Evans | Nov 29, 2007 9:44 pm | |
| Bruce Evans | Nov 29, 2007 10:02 pm | |
| Kostik Belousov | Nov 29, 2007 10:03 pm | |
| David Cecil | Nov 29, 2007 11:14 pm | |
| Bruce Evans | Nov 30, 2007 8:33 am | |
| Matthew D. Fuller | Nov 30, 2007 9:44 am | |
| David Cecil | Nov 30, 2007 4:42 pm | |
| Bruce Evans | Nov 30, 2007 6:01 pm | |
| Bruce Evans | Nov 30, 2007 6:23 pm | |
| David Cecil | Nov 30, 2007 8:27 pm | |
| Don Lewis | Nov 30, 2007 11:26 pm | |
| Don Lewis | Nov 30, 2007 11:26 pm | |
| Kostik Belousov | Dec 1, 2007 12:07 am | |
| Bruce Evans | Dec 1, 2007 3:34 am | |
| Bruce Evans | Dec 1, 2007 8:07 am | |
| Bruce Evans | Dec 1, 2007 10:33 am | |
| Don Lewis | Dec 1, 2007 2:07 pm | |
| Don Lewis | Dec 1, 2007 2:14 pm | |
| Bruce Evans | Dec 1, 2007 7:56 pm | |
| Kostik Belousov | Dec 1, 2007 10:14 pm | |
| Bruce Evans | Dec 2, 2007 12:35 am | |
| Bruce Evans | Dec 2, 2007 1:07 am | |
| David Cecil | Dec 2, 2007 2:10 pm | |
| Don Lewis | Dec 2, 2007 2:53 pm | |
| Julian Elischer | Dec 2, 2007 3:49 pm | |
| Bruce Evans | Dec 2, 2007 8:03 pm | |
| Don Lewis | Dec 2, 2007 9:17 pm | |
| Bruce Evans | Dec 3, 2007 2:11 am |
| Subject: | File remove problem | |
|---|---|---|
| From: | Bruce Evans (br...@optusnet.com.au) | |
| Date: | Dec 3, 2007 2:11:31 am | |
| List: | org.freebsd.freebsd-fs | |
On Sun, 2 Dec 2007, Don Lewis wrote:
On 3 Dec, Bruce Evans wrote:
On Sun, 2 Dec 2007, Don Lewis wrote:
In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in addition to MNT_NOATIME (as is already done in vfs_mark_atime()). This also looks like it should be a reasonable optimization for read-only file systems that should eliminate unnecessary work at the lower levels of the code.
But I let these happen and discard IN_ATIME marks if fs_ronly. I thought that the optimization went the other way -- unconditionally setting the marks was very efficient, and discarding them in ufs_itimes() was efficient too. I think this is still true now with larger locking overheads, and the marks should be discarded later in the MNT_NOATIME case too. It is expected that the marks are set much more often than they are looked at by ufs_itimes(), since most calls to ufs_itimes() are in close() and read() is much more common than close().
ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking MNT_RDONLY there as well is free. Setting and clearing the mark will consume a few instruction cycles, dirty a cache line, and increase main memory write-back traffic, though the expense is likely to be small.
The check can also avoid the new vnode locking for useless settings of IN_ATIME. But what locks the MNT flags? Nothing directly I think. Here we must not care if we read a stale value, and ufs_itimes() must not care if the value changed just after we read it.
Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set on a downgrade to read-only seems to be the right thing to do.
Either reads or ufs_itimes() must use MNT_RDONLY to prevent changes to the inode after MNT_RDONLY is set early in the r/w to r/o transition. Checking MNT_RDONLY gives the more correct behaviour of not having to discard even IN_ATIME settings that were made before the transition began.
I don't understand how unmount (apparently) works so well without setting MNT_RDONLY to prevent further writes like the transition does.
ufs_itimes() is also called in stat() but I think that is less common than close() (except for some tree walks). WIth non-delayed marking, ufs_itimes() would still have to check fs_ronly, and the only gain would be that it could then skip checking the marks except as an invariants check. But it can gain like that even with delayed setting -- just ignore any old marks while fs_ronly (except as an invariants check), but clear them at mount or unmount time so that there shouldn't be any.
I think that setting the marks when the file system is read-only causes the syncer to do extra work. I think that ffs_sync() still gets called if the file system is read-only, and if it encounters any inodes with marks set, it calls ffs_syncvnode() on them.
I think VOP_SYNC() actually isn't called on r/o file systems. Callers check MNT_RDONLY or possibly dirty block list pointers. msdosfs had a buf that would have caused panics if msdosfs_sync() were called on an fs that had ever been mounted r/w but is currently r/o.
The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY check, appears to be protected by the MNT_RDONLY check in vfs_mark_atime().
Thanks, I had forgotten about that. In vfs_mark_atime(), there is much more efficiency to be gained by not setting marks that will be discarded, since it takes a VOP to set them and many file systems don't support this setting. However, it is hard for vfs_mark_atime() to know when the mark will be discarded without calling the fs:
- it already doesn't know which fs's support it - it should be checking fs_ronly for ffs
I think that MNT_RDONLY is correct here. We want to stop new atime updates as soon as the downgrade starts, just like we stop new user-initiated writes.
Right. Same as for normal read accesses or delayed killing of IN_ACCESS in ufs_itimes().
- it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious too. Upper layers set the MNT flags before giving VOP_MOUNT() a chance to adjust the marks. This is automatically safe in one direction only (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops changes), and always bad for strict invariants.
Maybe a reasonable way to handle this would be to set the flags before calling VOP_MOUNT() when they are being changed from 0 to 1, and clear them after calling VOP_MOUNT() when changing them from 1 to 0. Adding explicit locking sounds painful ...
This already happens for MNT_RDONLY. ffs_mount() has dead code which obfuscates this and other things by setting all the generic flags again. It gets the timing of the setting of MNT_RDONLY backwards by delaying the setting until the end of the transition from r/w to r/o, but this has no effect since MNT_RDONLY is set throughout the transition. It only gets the timing of the clearing of MNT_RDONLY right by delaying it until the end of the transition from r/o to r/w.
Some flags have the wrong sense for this to be right. E.g., early clearing of MNT_ASYNC is safe, but early setting of it is not. tegge fixed some races from this.
Bruce





