|Subject:||Re: Odd file system corruption in ZFS pool|
|From:||Andrew Reilly (arei...@bigpond.net.au)|
|Date:||Apr 25, 2012 8:32:51 pm|
On Wed, Apr 25, 2012 at 05:36:45PM +0200, Peter Maloney wrote:
On 04/25/2012 01:21 AM, Andrew Reilly wrote:
On Tue, Apr 24, 2012 at 04:37:45PM +0200, Peter Maloney wrote: rm and rm -r don't work. Even as root, rm -rf Maildir.bad returns a lot of messages of the form "foo/bar: No such file or directory". The result is that I now have a directory that contains no "good" files, just a concentrated collection of breakage.
That sucks. But there is one thing I forgot: you need to run the "rm" command immediately after the scrub (no export, reboot, etc. in between).
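A minimal sketch of that sequence, as I understand the suggestion (pool name "tank" and the Maildir.bad directory are taken from this thread; the exact path is not given, so substitute your own — and obviously only try this against your own pool):

```shell
#!/bin/sh
# Scrub the pool, wait for the scrub to finish, then attempt the rm in the
# same boot session, with no zpool export/import or reboot in between.
zpool scrub tank

# Poll until the scrub completes ("scrub in progress" disappears from status).
while zpool status tank | grep -q "scrub in progress"; do
    sleep 60
done

# Immediately try to remove the broken directory (path is hypothetical).
rm -rf Maildir.bad
```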
I believe that I've tried that, and it still didn't work. The system is behaving as though the directory has a file with an illegal or unallocated inode number. Directories don't seem to be amenable to the old-school techniques of looking at them with hexdump or whatever, either, so I can't tell more than that. The names exist in the directory, but ask for any info that would be in the inode and you get an error.
Is your broken stuff limited to a single dataset, or the whole pool? You could try making a second dataset, copying good files to it, and destroying the old one (losing all your snapshots on that dataset, of course).
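A rough sketch of that second-dataset approach, with illustrative dataset names (tank/home and tank/home2 are assumptions, not from a working system; it needs enough free space in the pool for a full second copy):

```shell
#!/bin/sh
# Create a sibling dataset to receive the good files.
zfs create tank/home2

# Copy file contents only; rsync will report errors on the broken directory
# entries and simply skip them, leaving a clean copy behind.
rsync -aHx /tank/home/ /tank/home2/

# Once satisfied the copy is complete, destroy the old dataset.
# NOTE: -r also destroys every snapshot of tank/home.
zfs destroy -r tank/home
zfs rename tank/home2 tank/home
```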
Seems to be associated only with the filesystem, rather than the pool. Well, my "tank" pool (the raidz) shows zpool scrub making 0 fixes but there being unrecoverable errors in tank/home:<0x0>, yet my backup file system (the one I send snapshot deltas to) shows exactly the same broken files with no reported pool problems. (Hmm. Hold that thought: I haven't actually tried a scrub on the backup pool; it's just zpool status that shows no errors. Running a scrub now. It will take a while: it's a fairly slow USB2-connected disk, and zpool status says to expect 10+ hours...)
Here is another thread about it: http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027902.html
That does seem to be the same situation that I'm seeing.
And this message looks interesting: "but if you search on the lists for up to a year or so, you'll find some useful commands to inspect and destroy corrupted objects." http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027926.html
I'm not sure about destroying corrupted objects at any granularity smaller than the whole filesystem. It's annoying: if I could just remove these files I'd be happy, because I've already restored them from the backup. Instead, it is starting to look as though the only way to proceed is to destroy my home filesystem, recreate it, and repopulate it from the backup (using something like rsync that doesn't also replicate the filesystem damage). That sounds like a lot of down-time on what is a fairly busy system.
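The destroy-and-repopulate route might look something like the following sketch. The dataset and mountpoint names are hypothetical (bkp2pool is the backup pool named later in this thread); the key point is that rsync copies file contents only, so it should not carry the on-disk damage across the way a zfs send/receive of the corrupted dataset could:

```shell
#!/bin/sh
# Destroy the damaged dataset and all of its snapshots, then recreate it.
zfs destroy -r tank/home
zfs create tank/home

# Repopulate from the backup at the file level rather than via zfs send,
# so only file data (not damaged filesystem metadata) is replicated.
rsync -aHx /bkp2pool/home/ /tank/home/
```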
And "I tried your suggestion and ran the command "zdb -ccv backups" to try and check the consistency of the troublesome "backups" pool. This is what I ended up with:"
But they don't say what the solution is (other than destroying the pool; I would think destroying the dataset should be enough, since the corruption is in the filesystem, but maybe not).
FYI: I've been running "zdb -ccv bkp2pool" on my backup disk, to see if it has anything to say about the dangling directory entries. The problem is that it currently has a process size of about 5G (RES 2305M) on a system with 4G of physical RAM: it's paging like crazy. Probably unhelpful.
I have another zpool scrub running at the moment. We'll see if that is able to clean it up, but it hasn't had much luck in the past.
Note that none of these broken files or directories show up in the zpool status -v error list. That just contains the one entry for the zfs root directory: tank/home:<0x0>
I doubt that scrubbing more than once (repeating the same thing and expecting different results) will fix anything. But if you scrubbed on OpenIndiana, it would at least be different, and if it worked you could file a PR about it.
Some of the (perhaps Solaris related) ZFS web pages I've been reading lately suggested that several zpool scrub passes were beneficial. Certainly I seem to have hit a local minimum on the goodness curve at the moment.
Thanks for the suggestions. Appreciated.
_______________________________________________ free...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "free...@freebsd.org"