| From | Sent On | Attachments |
|---|---|---|
| David Cecil | Nov 29, 2007 4:01 pm | |
| Bill Vermillion | Nov 29, 2007 4:27 pm | |
| David Cecil | Nov 29, 2007 4:30 pm | |
| Julian Elischer | Nov 29, 2007 4:49 pm | |
| Bruce Evans | Nov 29, 2007 5:22 pm | |
| David Cecil | Nov 29, 2007 5:38 pm | |
| Matthew D. Fuller | Nov 29, 2007 8:05 pm | |
| Bruce Evans | Nov 29, 2007 8:58 pm | |
| David Cecil | Nov 29, 2007 9:26 pm | |
| Bruce Evans | Nov 29, 2007 9:44 pm | |
| Bruce Evans | Nov 29, 2007 10:02 pm | |
| Kostik Belousov | Nov 29, 2007 10:03 pm | |
| David Cecil | Nov 29, 2007 11:14 pm | |
| Bruce Evans | Nov 30, 2007 8:33 am | |
| Matthew D. Fuller | Nov 30, 2007 9:44 am | |
| David Cecil | Nov 30, 2007 4:42 pm | |
| Bruce Evans | Nov 30, 2007 6:01 pm | |
| Bruce Evans | Nov 30, 2007 6:23 pm | |
| David Cecil | Nov 30, 2007 8:27 pm | |
| Don Lewis | Nov 30, 2007 11:26 pm | |
| Don Lewis | Nov 30, 2007 11:26 pm | |
| Kostik Belousov | Dec 1, 2007 12:07 am | |
| Bruce Evans | Dec 1, 2007 3:34 am | |
| Bruce Evans | Dec 1, 2007 8:07 am | |
| Bruce Evans | Dec 1, 2007 10:33 am | |
| Don Lewis | Dec 1, 2007 2:07 pm | |
| Don Lewis | Dec 1, 2007 2:14 pm | |
| Bruce Evans | Dec 1, 2007 7:56 pm | |
| Kostik Belousov | Dec 1, 2007 10:14 pm | |
| Bruce Evans | Dec 2, 2007 12:35 am | |
| Bruce Evans | Dec 2, 2007 1:07 am | |
| David Cecil | Dec 2, 2007 2:10 pm | |
| Don Lewis | Dec 2, 2007 2:53 pm | |
| Julian Elischer | Dec 2, 2007 3:49 pm | |
| Bruce Evans | Dec 2, 2007 8:03 pm | |
| Don Lewis | Dec 2, 2007 9:17 pm | |
| Bruce Evans | Dec 3, 2007 2:11 am |
| Subject: | File remove problem | |
|---|---|---|
| From: | Bruce Evans (br...@optusnet.com.au) | |
| Date: | Dec 1, 2007 7:56:59 pm | |
| List: | org.freebsd.freebsd-fs | |
On Sat, 1 Dec 2007, Don Lewis wrote:
On 2 Dec, Bruce Evans wrote:
Here is a non-hackish patch which explains why ignoring MNT_RDONLY in the above or in ffs_mount() helps. It just fixes the confusion between IN_MODIFIED and IN_CHANGE in critical places.
% Index: ffs_softdep.c
[All settings here, but not yet in ufs_inactive() and ffs_truncate(), which are critical, and in other places which might not be critical.]
Without this change, soft updates depends on IN_CHANGE being converted to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen when MNT_RDONLY is set. With soft updates, changes are often delayed until sync time, and when the sync is for mount-update it is done after setting MNT_RDONLY so the above doesn't work.
ufs_itimes() should probably also be looking at fs_ronly instead of MNT_RDONLY, *but* all the paths leading from userland to ufs_itimes() would need to be checked to verify that they check MNT_RDONLY to prevent new file system write operations from happening while the remount is in progress.
Yes, that is probably why MNT_RDONLY is (ab)used now.
I found old (Y2002) private mail from mckusick that explains a previous change in this area, a change that mostly avoided the problem but has been lost:
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% Working file: ufs_vnops.c
% ----------------------------
% revision 1.182
% date: 2002/01/15 07:17:12; author: mckusick; state: Exp; lines: +4 -5
% When downgrading a filesystem from read-write to read-only, operations
% involving file removal or file update were not always being fully
% committed to disk. The result was lost files or corrupted file data.
% This change ensures that the filesystem is properly synced to disk
% before the filesystem is down-graded.
%
% This delta also fixes a long standing bug in which a file open for
% reading has been unlinked. When the last open reference to the file
% is closed, the inode is reclaimed by the filesystem. Previously,
% if the filesystem had been down-graded to read-only, the inode could
% not be reclaimed, and thus was lost and had to be later recovered
% by fsck. With this change, such files are found at the time of the
% down-grade. Normally they will result in the filesystem down-grade
% failing with `device busy'. If a forcible down-grade is done, then
% the affected files will be revoked causing the inode to be released
% and the open file descriptors to begin failing on attempts to read.
%
% Submitted by: "Sam Leffler" <sa...@errno.com>
% ----------------------------
%
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.181
% retrieving revision 1.182
% diff -u -2 -r1.181 -r1.182
% --- ufs_vnops.c 22 Nov 2001 15:33:12 -0000 1.181
% +++ ufs_vnops.c 15 Jan 2002 07:17:12 -0000 1.182
% @@ -37,5 +37,5 @@
% *
% * @(#)ufs_vnops.c 8.27 (Berkeley) 5/27/95
% - * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.181 2001/11/22 15:33:12 guido
Exp $
% + * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.182 2002/01/15 07:17:12 mckusick
Exp $
% */
%
% @@ -159,11 +159,10 @@
% if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% return;
% + if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
% + ip->i_flag |= IN_LAZYMOD;
% + else
% + ip->i_flag |= IN_MODIFIED;
% if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% vfs_timestamp(&ts);
% - if ((vp->v_type == VBLK || vp->v_type == VCHR) &&
% - !DOINGSOFTDEP(vp))
% - ip->i_flag |= IN_LAZYMOD;
% - else
% - ip->i_flag |= IN_MODIFIED;
% if (ip->i_flag & IN_ACCESS) {
% ip->i_atime = ts.tv_sec;
This is in ufs_itimes(). Note that it moves the setting of the modified flag before the check of MNT_RDONLY, so that when the wrong or incomplete flags are set earlier and the wrong flags aren't converted to the modified flag before MNT_RDONLY is set, then we only lose the timestamps but not critical updates here.
% Index: ffs_inode.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v
% retrieving revision 1.73
% retrieving revision 1.74
% diff -u -2 -r1.73 -r1.74
% --- ffs_inode.c 13 Dec 2001 05:07:48 -0000 1.73
% +++ ffs_inode.c 15 Jan 2002 07:17:12 -0000 1.74
% @@ -32,5 +32,5 @@
% *
% * @(#)ffs_inode.c 8.13 (Berkeley) 4/21/95
% - * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.73 2001/12/13 05:07:48 mckusick
Exp $
% + * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.74 2002/01/15 07:17:12 mckusick
Exp $
% */
%
% @@ -88,7 +88,7 @@
% return (0);
% ip->i_flag &= ~(IN_LAZYMOD | IN_MODIFIED);
% - if (vp->v_mount->mnt_flag & MNT_RDONLY)
% - return (0);
% fs = ip->i_fs;
% + if (fs->fs_ronly)
% + return (0);
% /*
% * Ensure that uid and gid are correct. This is a temporary
This fixes loss of the critical updates a little later in ffs_update().
% @@ -153,4 +153,6 @@ % oip = VTOI(ovp); % fs = oip->i_fs; % + if (fs->fs_ronly) % + panic("ffs_truncate: read-only filesystem"); % if (length < 0) % return (EINVAL);
This is a sanity check in ffs_truncate(). I think all callers except the one in ufs_inactive() automatically pass the check, since they are higher level so they do a correct check of MNT_RDONLY. The call in ufs_inactive() used to be unconditional (except for the (i_nlink == 0) condition of course). In -current it is conditional on MNT_ONLY. kib's patch changes it to be conditional on fs_ronly, but I hope it can become unconditional again -- it should be an error to reach ufs_inactive() with a partially deleted file after syncing before changing fs_ronly to 0.
ffs_update() should panic instead of returning 0 when (fs->fs_ronly) too, so that bugs get noticed.
mckusick's explanation says that "[fs_ronly is the only believable flag]. Thus the killing of IN_MODIFIED has to happen a few lines later [in] ffs_update()".
It should be safe to blindly ignore all modification flags except IN_ACCESS in ufs_itimes(), since ffs_update() will kill the completely invalid ones.
4.4BSD-Lite blindly ignored all modification flags in ITIMES(), and checks the wrong read-only flag (MNT_RDONLY) in open code that duplicates ITIMES() plus adds the wrong r/o check. When I converted ITIMES() to ufs_itimes(), I centralized this wrong r/o check. This was mainly a cleanup, but I think it fixes wrong setting of atimes (IN_ATIME is set without checking any r/o flags, and IN_ATIME was sometimes converted into a timestamp before ffs_update killed it, so applications could see atime changing even on purely r/o mounted file systems).
The fix in ufs_vnops.c was lost relatively recently as part of related changes to fix IN_ACCESS for non-exclusively-locked reads:
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.279
% retrieving revision 1.280
% diff -u -2 -r1.279 -r1.280
% --- ufs_vnops.c 2 Oct 2006 02:08:31 -0000 1.279
% +++ ufs_vnops.c 10 Oct 2006 09:20:54 -0000 1.280
% @@ -36,5 +36,5 @@
%
% #include <sys/cdefs.h>
% -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.279 2006/10/02 02:08:31
tegge Exp $");
% +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.280 2006/10/10 09:20:54
kib Exp $");
%
% #include "opt_mac.h"
% @@ -129,29 +129,47 @@
% struct inode *ip;
% struct timespec ts;
% + int mnt_locked;
%
% ip = VTOI(vp);
% + mnt_locked = 0;
% + if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
% + VI_LOCK(vp);
% + goto out;
% + }
% + MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */
% + mnt_locked = 1;
% + VI_LOCK(vp);
% if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
% - return;
% + goto out_unl;
% +
%
% ip->i_flag |= IN_LAZYMOD;
% - else
% + else if (((vp->v_mount->mnt_kern_flag &
% + (MNTK_SUSPENDED | MNTK_SUSPEND)) == 0) ||
% + (ip->i_flag & (IN_CHANGE | IN_UPDATE)))
% ip->i_flag |= IN_MODIFIED;
% - if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) {
% - vfs_timestamp(&ts);
% - if (ip->i_flag & IN_ACCESS) {
% - DIP_SET(ip, i_atime, ts.tv_sec);
% - DIP_SET(ip, i_atimensec, ts.tv_nsec);
% - }
% - if (ip->i_flag & IN_UPDATE) {
% - DIP_SET(ip, i_mtime, ts.tv_sec);
% - DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% - ip->i_modrev++;
% - }
% - if (ip->i_flag & IN_CHANGE) {
% - DIP_SET(ip, i_ctime, ts.tv_sec);
% - DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% - }
% + else if (ip->i_flag & IN_ACCESS)
% + ip->i_flag |= IN_LAZYACCESS;
% + vfs_timestamp(&ts);
% + if (ip->i_flag & IN_ACCESS) {
% + DIP_SET(ip, i_atime, ts.tv_sec);
% + DIP_SET(ip, i_atimensec, ts.tv_nsec);
% + }
% + if (ip->i_flag & IN_UPDATE) {
% + DIP_SET(ip, i_mtime, ts.tv_sec);
% + DIP_SET(ip, i_mtimensec, ts.tv_nsec);
% + ip->i_modrev++;
% + }
% + if (ip->i_flag & IN_CHANGE) {
% + DIP_SET(ip, i_ctime, ts.tv_sec);
% + DIP_SET(ip, i_ctimensec, ts.tv_nsec);
% }
% +
% + out:
% ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
% + out_unl:
% + VI_UNLOCK(vp);
% + if (mnt_locked)
% + MNT_IUNLOCK(vp->v_mount);
% }
%
Now we trust MNT_RDONLY again, so that when the wrong or incomplete flags are set earlier and the wrong flags aren't converted to the modified flag before MNT_RDONLY is set, then we only lose both timestamps and critical updates here.
I think the best quick fix would be to trust fs_ronly here, except for IN_ACCESS. Then wrong IN_CHANGE | IN_UPDATE flags would just cause wrong updates of i_ctime and i_mtime, and an extra i/o to write these changes, but missing IN_MODIFIED flags would be fixed up as in rev.1.182 and associated correct IN_CHANGE | IN_UPDATE flags wouldn't be incorrectly discarded. IN_ACCESS needs special handling even in the non-snapshot cases so that read() doesn't race mount-update (at best, read()s might keep dirtying inodes, so there would be a problem setting fs_ronly atomically with completing the sync).
Most other file systems are primitive or broken in this area. ext2fs is at the level of ufs_vnops.c 1.182. msdosfs is at level before 4.4BSD-Lite (it still uses its clone of ITIMES() and looks more like the Net/2 ffs than the 4.4BSD one). But most other file systems are Giant-locked, so they don't have the IN_ACCESS races.
Bruce





