| Subject: | File remove problem | |
|---|---|---|
| From: | David Cecil (davi...@nokia.com) | |
| Date: | Nov 29, 2007 9:26:14 pm | |
| List: | org.freebsd.freebsd-fs | |
ext Bruce Evans wrote:
On Fri, 30 Nov 2007, David Cecil wrote:
Thanks Bruce.
Actually, I had found the same problem, and I came up with the first line of your patch (adding IN_MODIFIED) myself, but I still saw the problem. I
Yes, it's not that. Testing reminded me that there is normally a VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long for unlink (it can only live long for open files).
Testing shows that the problem is easy to reproduce and often partially detected before it becomes fatal. I saw something like the following:
after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
after touch a; rm a; unmount -- no problem with unmount
after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
after touch a; rm a; mount -u -o ro -- worked once without soft updates but seemed to be responsible for a soft update panic later
after touch a; rm a; mount -u -o ro -- usually fails with soft updates; the error is detected in various ways:
    under ~5.2, mount -u prints "/f: update error: blocks 0 files 1" but succeeds
    under -current, mount -u fails and a subroutine prints "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
However, mount -u apparently cannot afford to fail at this point since it has committed to succeeding -- further mount -u's and unmounts fail and it takes a reboot to reach an fsck that can fix the problem.
mount -u seems to do some things right, at least under -current:
- it calls ffs_sync() and thus ffs_update() with waitfor != 0.
Do you know it calls it for this vnode? I'm going to try and verify that.
- IN_MODIFIED is usually already set in ffs_update().
- softdep_update_inode_inodeblock() in ffs_update() seems to make null changes. That doesn't seem right -- shouldn't it update the link count and finish removing the file?... I just noticed that ufs_inactive() handles some of this.
- it calls softdep_flushfiles() after doing the sync. This doesn't seem to touch the inode.
- apparently, softdep_flushfiles() fails in -current, while in ~5.2 it bogusly succeeds and then code just after it is called detects a problem but doesn't handle it.
One more point to address Julian's question, the partition is not mounted with soft updates.
Interesting. I saw no sign of the problem without soft updates except a panic later after enabling soft updates. I was running fsck a lot but may have forgotten one since no error was detected. The problem should be easier to understand if it affects non-soft-updates.
It is not especially easy to reproduce. The only reliable mechanism I have involves mounting rw, removing a file, and remounting ro during the boot cycle. I can only guess it's timing-related and that this increases the chance of reproducing the problem.
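For anyone wanting to try the sequence described above (mount rw, create and remove a file, remount ro), here is a rough sketch as a shell script. It is not the script either poster used. The mount point /f is taken from the "update error" message quoted earlier and is assumed to be a scratch file system; the file name "a", the RUN variable, and the step and repro helpers are made up for illustration. It assumes the read-only remount is spelled `mount -u -o ro` and the read-write one `mount -u -w`. By default it only prints the commands; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints the reproduction steps instead of running them.
# Set RUN=1 to execute for real (needs root and a scratch file system).

MNT="${MNT:-/f}"    # assumed scratch mount point, not a confirmed setup
RUN="${RUN:-0}"     # hypothetical switch: 0 = dry run, 1 = execute

step() {
    # Show the command; run it only when RUN=1.
    printf '+ %s\n' "$*"
    if [ "$RUN" = "1" ]; then
        "$@"
    fi
}

repro() {
    # $1 is the mount point of the file system under test.
    step mount -u -w "$1"       # make sure it is mounted read-write
    step touch "$1/a"           # create a file with a single link
    step rm "$1/a"              # remove it, leaving no links
    step mount -u -o ro "$1"    # downgrade to read-only right away
}

repro "$MNT"
```

Running it as is prints the four commands prefixed with "+". Since the problem appears to be timing related, a single pass with RUN=1 may well not trigger it, and a failed remount may need a reboot and fsck to recover, so this should only be pointed at a file system that can be thrown away.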





