Subject: File remove problem
From: David Cecil (davi@nokia.com)
Date: Nov 29, 2007 11:14:28 pm
List: org.freebsd.freebsd-fs

I've determined the following for the scenario I have. These steps are executed during the boot cycle, and I reproduce the problem about 1 in 5-10 times:
1. mount -u -w /
2. rm -f /etc/myfile
3. mount -u -o ro /

1. finished Remounted R/W
2. started
   ufs_remove 786
   ffs_truncate 268
   ffs_update 87
   ffs_update 92
   ffs_update 99
   ffs_update 140
   ffs_update 87
   ffs_update 92
   ffs_update 99
   ffs_update 140
2. finished: Removed file
3. Finished Remounted R/O

Note that line 140 in ffs_update is the call to bdwrite, not bwrite.
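The distinction in the note above can be pictured with a toy user-space model. This is only a sketch, not FreeBSD kernel code: `ToyBufferCache`, its methods, and `remove_then_remount_ro` are hypothetical names made up for illustration, and the model assumes nothing about what the real remount path flushes. It shows the general difference between a synchronous write (on disk when the call returns) and a delayed write (only marked dirty), and does not claim to explain the bug under investigation.

```python
# Toy model of synchronous vs. delayed buffer writes.
# NOT kernel code: every name here is hypothetical and for illustration only.

class ToyBufferCache:
    def __init__(self):
        self.disk = {}    # block number -> contents actually on "disk"
        self.dirty = {}   # block number -> contents waiting to be written

    def bwrite(self, blkno, data):
        """Synchronous write: the data is on disk when this returns."""
        self.dirty.pop(blkno, None)
        self.disk[blkno] = data

    def bdwrite(self, blkno, data):
        """Delayed write: only mark the block dirty; the disk is unchanged."""
        self.dirty[blkno] = data

    def flush(self):
        """Write out every delayed block, as a sync pass would."""
        for blkno, data in list(self.dirty.items()):
            self.disk[blkno] = data
        self.dirty.clear()


def remove_then_remount_ro(flush_before_ro):
    """Return the on-disk link count left after a simulated remount R/O."""
    cache = ToyBufferCache()
    inode_block = 7                            # arbitrary toy block number
    cache.bwrite(inode_block, {"nlink": 1})    # the file exists on disk
    cache.bdwrite(inode_block, {"nlink": 0})   # unlink: inode update is delayed
    if flush_before_ro:
        cache.flush()
    # Once "read-only", nothing more is written; the disk state is final.
    return cache.disk[inode_block]["nlink"]


if __name__ == "__main__":
    print("flushed before R/O:    ", remove_then_remount_ro(True))
    print("not flushed before R/O:", remove_then_remount_ro(False))
```

In this model the on-disk inode only reflects the removal if the delayed block is flushed before the switch to read-only; whether something comparable happens in the real remount path is exactly what remains to be determined.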

Investigations ongoing...

Dave

ext Bruce Evans wrote:

On Fri, 30 Nov 2007, David Cecil wrote:

Thanks Bruce.

Actually, I had found the same problem, and I came up with the first line of your patch (adding IN_MODIFIED) myself, but I still saw the problem. I

Yes, it's not that. Testing reminded me that there is normally a VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long for unlink (it can only live long for open files).

Testing shows that the problem is easy to reproduce and often partially detected before it becomes fatal. I saw something like the following:

after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
after touch a; rm a; unmount -- no problem with unmount
after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
after touch a; rm a; mount -u -o ro -- worked once without soft updates but seemed to be responsible for a soft update panic later
after touch a; rm a; mount -u -o ro -- usually fails with soft updates; the error is detected in various ways:
   under ~5.2, mount -u prints "/f: update error: blocks 0 files 1" but succeeds
   under -current, mount -u fails and a subroutine prints "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"

However, mount -u apparently cannot afford to fail at this point since it has committed to succeeding -- further mount -u's and unmounts fail and it takes a reboot to reach an fsck that can fix the problem.

mount -u seems to do some things right, at least under -current:
- it calls ffs_sync() and thus ffs_update() with waitfor != 0.
- IN_MODIFIED is usually already set in ffs_update().
- softdep_update_inode_inodeblock() in ffs_update() seems to make null changes. That doesn't seem right -- shouldn't it update the link count and finish removing the file?... I just noticed that ufs_inactive() handles some of this.
- it calls softdep_flushfiles() after doing the sync. This doesn't seem to touch the inode.
- apparently, softdep_flushfiles() fails in -current, while in ~5.2 it bogusly succeeds and then code just after it is called detects a problem but doesn't handle it.

didn't pick up on the need for the second line (else if (DOINGASYNC(dvp)) {) though. It's a default mount, so I don't understand how that will help, i.e. it won't be an async mount, right?

Ignore that. It is for async mounts, to make them unconditionally async.

One more point to address Julian's question, the partition is not mounted with soft updates.

Interesting. I saw no sign of the problem without soft updates except a panic later after enabling soft updates. I was running fsck a lot but may have forgotten one since no error was detected. The problem should be easier to understand if it affects non-soft-updates.

[Context lost to top posting]