atom feed37 messages in org.freebsd.freebsd-fsFile remove problem
FromSent OnAttachments
David CecilNov 29, 2007 4:01 pm 
Bill VermillionNov 29, 2007 4:27 pm 
David CecilNov 29, 2007 4:30 pm 
Julian ElischerNov 29, 2007 4:49 pm 
Bruce EvansNov 29, 2007 5:22 pm 
David CecilNov 29, 2007 5:38 pm 
Matthew D. FullerNov 29, 2007 8:05 pm 
Bruce EvansNov 29, 2007 8:58 pm 
David CecilNov 29, 2007 9:26 pm 
Bruce EvansNov 29, 2007 9:44 pm 
Bruce EvansNov 29, 2007 10:02 pm 
Kostik BelousovNov 29, 2007 10:03 pm 
David CecilNov 29, 2007 11:14 pm 
Bruce EvansNov 30, 2007 8:33 am 
Matthew D. FullerNov 30, 2007 9:44 am 
David CecilNov 30, 2007 4:42 pm 
Bruce EvansNov 30, 2007 6:01 pm 
Bruce EvansNov 30, 2007 6:23 pm 
David CecilNov 30, 2007 8:27 pm 
Don LewisNov 30, 2007 11:26 pm 
Don LewisNov 30, 2007 11:26 pm 
Kostik BelousovDec 1, 2007 12:07 am 
Bruce EvansDec 1, 2007 3:34 am 
Bruce EvansDec 1, 2007 8:07 am 
Bruce EvansDec 1, 2007 10:33 am 
Don LewisDec 1, 2007 2:07 pm 
Don LewisDec 1, 2007 2:14 pm 
Bruce EvansDec 1, 2007 7:56 pm 
Kostik BelousovDec 1, 2007 10:14 pm 
Bruce EvansDec 2, 2007 12:35 am 
Bruce EvansDec 2, 2007 1:07 am 
David CecilDec 2, 2007 2:10 pm 
Don LewisDec 2, 2007 2:53 pm 
Julian ElischerDec 2, 2007 3:49 pm 
Bruce EvansDec 2, 2007 8:03 pm 
Don LewisDec 2, 2007 9:17 pm 
Bruce EvansDec 3, 2007 2:11 am 
Subject:File remove problem
From:Bruce Evans (br@optusnet.com.au)
Date:Dec 1, 2007 7:56:59 pm
List:org.freebsd.freebsd-fs

On Sat, 1 Dec 2007, Don Lewis wrote:

On 2 Dec, Bruce Evans wrote:

Here is a non-hackish patch which explains why ignoring MNT_RDONLY in the above or in ffs_mount() helps. It just fixes the confusion between IN_MODIFIED and IN_CHANGE in critical places.

% Index: ffs_softdep.c

[All settings here, but not yet in ufs_inactive() and ffs_truncate(), which are critical, and in other places which might not be critical.]

Without this change, soft updates depends on IN_CHANGE being converted to IN_MODIFIED by ufs_itimes(), but this conversion doesn't happen when MNT_RDONLY is set. With soft updates, changes are often delayed until sync time, and when the sync is for mount-update it is done after setting MNT_RDONLY so the above doesn't work.

ufs_itimes() should probably also be looking at fs_ronly instead of MNT_RDONLY, *but* all the paths leading from userland to ufs_itimes() would need to be checked to verify that they check MNT_RDONLY to prevent new file system write operations from happening while the remount is in progress.

Yes, that is probably why MNT_RDONLY is (ab)used now.

I found old (Y2002) private mail from mckusick that explains a previous change in this area, a change that mostly avoided the problem but has been lost:

% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % Working file: ufs_vnops.c % ---------------------------- % revision 1.182 % date: 2002/01/15 07:17:12; author: mckusick; state: Exp; lines: +4 -5 % When downgrading a filesystem from read-write to read-only, operations % involving file removal or file update were not always being fully % committed to disk. The result was lost files or corrupted file data. % This change ensures that the filesystem is properly synced to disk % before the filesystem is down-graded. % % This delta also fixes a long standing bug in which a file open for % reading has been unlinked. When the last open reference to the file % is closed, the inode is reclaimed by the filesystem. Previously, % if the filesystem had been down-graded to read-only, the inode could % not be reclaimed, and thus was lost and had to be later recovered % by fsck. With this change, such files are found at the time of the % down-grade. Normally they will result in the filesystem down-grade % failing with `device busy'. If a forcible down-grade is done, then % the affected files will be revoked causing the inode to be released % and the open file descriptors to begin failing on attempts to read. % % Submitted by: "Sam Leffler" <sa@errno.com> % ---------------------------- % % Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.181 % retrieving revision 1.182 % diff -u -2 -r1.181 -r1.182 % --- ufs_vnops.c 22 Nov 2001 15:33:12 -0000 1.181 % +++ ufs_vnops.c 15 Jan 2002 07:17:12 -0000 1.182 % @@ -37,5 +37,5 @@ % * % * @(#)ufs_vnops.c 8.27 (Berkeley) 5/27/95 % - * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.181 2001/11/22 15:33:12 guido
Exp $ % + * $FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.182 2002/01/15 07:17:12 mckusick
Exp $ % */ % % @@ -159,11 +159,10 @@ % if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) % return; % + if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp)) % + ip->i_flag |= IN_LAZYMOD; % + else % + ip->i_flag |= IN_MODIFIED; % if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { % vfs_timestamp(&ts); % - if ((vp->v_type == VBLK || vp->v_type == VCHR) && % - !DOINGSOFTDEP(vp)) % - ip->i_flag |= IN_LAZYMOD; % - else % - ip->i_flag |= IN_MODIFIED; % if (ip->i_flag & IN_ACCESS) { % ip->i_atime = ts.tv_sec;

This is in ufs_itimes(). Note that it moves the setting of the modified flag before the check of MNT_RDONLY, so that when the wrong or incomplete flags are set earlier and the wrong flags aren't converted to the modified flag before MNT_RDONLY is set, then we only lose the timestamps but not critical updates here.

% Index: ffs_inode.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_inode.c,v % retrieving revision 1.73 % retrieving revision 1.74 % diff -u -2 -r1.73 -r1.74 % --- ffs_inode.c 13 Dec 2001 05:07:48 -0000 1.73 % +++ ffs_inode.c 15 Jan 2002 07:17:12 -0000 1.74 % @@ -32,5 +32,5 @@ % * % * @(#)ffs_inode.c 8.13 (Berkeley) 4/21/95 % - * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.73 2001/12/13 05:07:48 mckusick
Exp $ % + * $FreeBSD: src/sys/ufs/ffs/ffs_inode.c,v 1.74 2002/01/15 07:17:12 mckusick
Exp $ % */ % % @@ -88,7 +88,7 @@ % return (0); % ip->i_flag &= ~(IN_LAZYMOD | IN_MODIFIED); % - if (vp->v_mount->mnt_flag & MNT_RDONLY) % - return (0); % fs = ip->i_fs; % + if (fs->fs_ronly) % + return (0); % /* % * Ensure that uid and gid are correct. This is a temporary

This fixes loss of the critical updates a little later in ffs_update().

% @@ -153,4 +153,6 @@ % oip = VTOI(ovp); % fs = oip->i_fs; % + if (fs->fs_ronly) % + panic("ffs_truncate: read-only filesystem"); % if (length < 0) % return (EINVAL);

This is a sanity check in ffs_truncate(). I think all callers except the one in ufs_inactive() automatically pass the check, since they are higher level so they do a correct check of MNT_RDONLY. The call in ufs_inactive() used to be unconditional (except for the (i_nlink == 0) condition of course). In -current it is conditional on MNT_ONLY. kib's patch changes it to be conditional on fs_ronly, but I hope it can become unconditional again -- it should be an error to reach ufs_inactive() with a partially deleted file after syncing before changing fs_ronly to 0.

ffs_update() should panic instead of returning 0 when (fs->fs_ronly) too, so that bugs get noticed.

mckusick's explanation says that "[fs_ronly is the only believable flag]. Thus the killing of IN_MODIFIED has to happen a few lines later [in] ffs_update()".

It should be safe to blindly ignore all modification flags except IN_ACCESS in ufs_itimes(), since ffs_update() will kill the completely invalid ones.

4.4BSD-Lite blindly ignored all modification flags in ITIMES(), and checks the wrong read-only flag (MNT_RDONLY) in open code that duplicates ITIMES() plus adds the wrong r/o check. When I converted ITIMES() to ufs_itimes(), I centralized this wrong r/o check. This was mainly a cleanup, but I think it fixes wrong setting of atimes (IN_ATIME is set without checking any r/o flags, and IN_ATIME was sometimes converted into a timestamp before ffs_update killed it, so applications could see atime changing even on purely r/o mounted file systems).

The fix in ufs_vnops.c was lost relatively recently as part of related changes to fix IN_ACCESS for non-exclusively-locked reads:

% Index: ufs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v % retrieving revision 1.279 % retrieving revision 1.280 % diff -u -2 -r1.279 -r1.280 % --- ufs_vnops.c 2 Oct 2006 02:08:31 -0000 1.279 % +++ ufs_vnops.c 10 Oct 2006 09:20:54 -0000 1.280 % @@ -36,5 +36,5 @@ % % #include <sys/cdefs.h> % -__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.279 2006/10/02 02:08:31
tegge Exp $"); % +__FBSDID("$FreeBSD: src/sys/ufs/ufs/ufs_vnops.c,v 1.280 2006/10/10 09:20:54
kib Exp $"); % % #include "opt_mac.h" % @@ -129,29 +129,47 @@ % struct inode *ip; % struct timespec ts; % + int mnt_locked; % % ip = VTOI(vp); % + mnt_locked = 0; % + if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) { % + VI_LOCK(vp); % + goto out; % + } % + MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */ % + mnt_locked = 1; % + VI_LOCK(vp); % if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) % - return; % + goto out_unl; % + % % ip->i_flag |= IN_LAZYMOD; % - else % + else if (((vp->v_mount->mnt_kern_flag & % + (MNTK_SUSPENDED | MNTK_SUSPEND)) == 0) || % + (ip->i_flag & (IN_CHANGE | IN_UPDATE))) % ip->i_flag |= IN_MODIFIED; % - if ((vp->v_mount->mnt_flag & MNT_RDONLY) == 0) { % - vfs_timestamp(&ts); % - if (ip->i_flag & IN_ACCESS) { % - DIP_SET(ip, i_atime, ts.tv_sec); % - DIP_SET(ip, i_atimensec, ts.tv_nsec); % - } % - if (ip->i_flag & IN_UPDATE) { % - DIP_SET(ip, i_mtime, ts.tv_sec); % - DIP_SET(ip, i_mtimensec, ts.tv_nsec); % - ip->i_modrev++; % - } % - if (ip->i_flag & IN_CHANGE) { % - DIP_SET(ip, i_ctime, ts.tv_sec); % - DIP_SET(ip, i_ctimensec, ts.tv_nsec); % - } % + else if (ip->i_flag & IN_ACCESS) % + ip->i_flag |= IN_LAZYACCESS; % + vfs_timestamp(&ts); % + if (ip->i_flag & IN_ACCESS) { % + DIP_SET(ip, i_atime, ts.tv_sec); % + DIP_SET(ip, i_atimensec, ts.tv_nsec); % + } % + if (ip->i_flag & IN_UPDATE) { % + DIP_SET(ip, i_mtime, ts.tv_sec); % + DIP_SET(ip, i_mtimensec, ts.tv_nsec); % + ip->i_modrev++; % + } % + if (ip->i_flag & IN_CHANGE) { % + DIP_SET(ip, i_ctime, ts.tv_sec); % + DIP_SET(ip, i_ctimensec, ts.tv_nsec); % } % + % + out: % ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); % + out_unl: % + VI_UNLOCK(vp); % + if (mnt_locked) % + MNT_IUNLOCK(vp->v_mount); % } %

Now we trust MNT_RDONLY again, so that when the wrong or incomplete flags are set earlier and the wrong flags aren't converted to the modified flag before MNT_RDONLY is set, then we only lose both timestamps and critical updates here.

I think the best quick fix would be to trust fs_ronly here, except for IN_ACCESS. Then wrong IN_CHANGE | IN_UPDATE flags would just cause wrong updates of i_ctime and i_mtime, and an extra i/o to write these changes, but missing IN_MODIFIED flags would be fixed up as in rev.1.182 and associated correct IN_CHANGE | IN_UPDATE flags wouldn't be incorrectly discarded. IN_ACCESS needs special handling even in the non-snapshot cases so that read() doesn't race mount-update (at best, read()s might keep dirtying inodes, so there would be a problem setting fs_ronly atomically with completing the sync).

Most other file systems are primitive or broken in this area. ext2fs is at the level of ufs_vnops.c 1.182. msdosfs is at level before 4.4BSD-Lite (it still uses its clone of ITIMES() and looks more like the Net/2 ffs than the 4.4BSD one). But most other file systems are Giant-locked, so they don't have the IN_ACCESS races.

Bruce