Subject: vnode_free_list corruption [patch]
From: Dave Chapeskie (dcha...@borderware.com)
Date: Apr 14, 2000 11:20:15 am
I've been seeing a rash of "free vnode isn't" panics lately. Some machines were panicking several times a day. Along with this we saw occasional "object inconsistent state: RPC: %d, RC: %d" messages.
I was able to replicate the problem by running multiple copies (~8 on a Pentium 200 system with 32 MB of RAM) of each of the attached simple shell scripts, with all output redirected to /dev/null. The system would often panic within 10-20 minutes.
I tracked the problem down to a race between getnewvnode() recycling a vnode and vhold(). I found that vhold() was calling vbusy() for a vnode with the VDOOMED flag set.
This is bad because getnewvnode() removes the vnode from the free list before setting this flag, so vbusy() ends up calling TAILQ_REMOVE for a vnode that is not on the free list. This can easily corrupt the free list pointers, causing future getnewvnode() calls to find active vnodes that they think are on the free list.
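To illustrate why a second removal is harmful, here is a minimal user-space sketch. It is not kernel code: struct vnode, struct freelist and the fl_* helpers are hypothetical stand-ins whose pointer updates mirror the classic TAILQ_REMOVE, and the A/B/C sequence is an assumed interleaving chosen to show the effect.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures (hypothetical, user space only). */
struct vnode {
    struct vnode  *next;   /* plays the role of tqe_next */
    struct vnode **prevp;  /* plays the role of tqe_prev */
};

struct freelist {
    struct vnode  *first;
    struct vnode **lastp;
};

static void fl_init(struct freelist *h)
{
    h->first = NULL;
    h->lastp = &h->first;
}

static void fl_insert_tail(struct freelist *h, struct vnode *vp)
{
    vp->next = NULL;
    vp->prevp = h->lastp;
    *h->lastp = vp;
    h->lastp = &vp->next;
}

/* Same pointer updates as the classic TAILQ_REMOVE: the links stored in vp
 * are trusted, and nothing checks that vp is really on the list. */
static void fl_remove(struct freelist *h, struct vnode *vp)
{
    if (vp->next != NULL)
        vp->next->prevp = vp->prevp;
    else
        h->lastp = vp->prevp;
    *vp->prevp = vp->next;
}

/* Returns 1 if the list head ends up pointing at B, which is no longer free. */
static int stale_remove_demo(void)
{
    struct vnode a, b, c;
    struct freelist fl;

    fl_init(&fl);
    fl_insert_tail(&fl, &a);
    fl_insert_tail(&fl, &b);
    fl_insert_tail(&fl, &c);

    fl_remove(&fl, &a);  /* getnewvnode() takes A off the list to recycle it */
    fl_remove(&fl, &b);  /* B leaves the list legitimately and becomes active */
    fl_remove(&fl, &a);  /* vbusy() on the doomed A: A is not on the list */

    return fl.first == &b;
}
```

In this sketch the stale removal of A writes A's old links back into the list, so the head points at B again even though B is in use. A later scan of the free list would then hand out an active vnode.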
I added a panic in vbusy() that fires if VDOOMED is set, and it hit quite often during my tests. Typically the call chain looked something like:
ffs_truncate -> ffs_indirtrunc -> getblk -> bgetvp -> vhold -> vbusy -> panic
ffs_truncate() was often being called as a result of rename(2) or unlink(2).
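The panic check described above can be sketched in user space roughly as follows. This is not the attached patch: struct toy_vnode, the TOY_* flag values, toy_freevnodes and toy_vbusy() are hypothetical names for illustration, and a -1 return stands in for the place where the kernel would call panic().

```c
#include <stdio.h>

/* Toy flag bits: the names echo the kernel's VFREE and VDOOMED,
 * but the values are arbitrary and chosen only for this sketch. */
#define TOY_VFREE   0x1u  /* vnode is on the free list */
#define TOY_VDOOMED 0x2u  /* vnode is being recycled by getnewvnode() */

struct toy_vnode {
    unsigned flags;
};

static int toy_freevnodes;  /* toy count of vnodes on the free list */

/*
 * Returns 0 after taking vp off the (toy) free list, or -1 at the point
 * where the kernel check would panic instead of touching the list.
 */
static int toy_vbusy(struct toy_vnode *vp)
{
    if (vp->flags & TOY_VDOOMED) {
        /* A doomed vnode has already left the free list. */
        fprintf(stderr, "toy_vbusy: vnode is doomed, not touching free list\n");
        return -1;
    }
    vp->flags &= ~TOY_VFREE;
    toy_freevnodes--;
    return 0;
}
```

The point of the check is to stop before the list is modified, so the failure shows up at the offending caller rather than as a corrupted free list some time later.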
I managed to solve the problem here by adding a VOP_ISLOCKED(vp) check to getnewvnode() and skipping locked vnodes instead of trying to recycle them. From my searches of the mailing lists, it appears I'm not the first one to think of this, but apparently this isn't guaranteed to work for all file system types. I just know it works for the FFS problems I was seeing.
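The idea of skipping locked vnodes can be sketched as below. Again this is only an illustration under assumptions, not the attached patch: the free list is modelled as a plain array, and struct toy_vnode, toy_islocked() and toy_pick_recyclable() are hypothetical names, with toy_islocked() standing in for whatever VOP_ISLOCKED(vp) would report.

```c
#include <stddef.h>

struct toy_vnode {
    int id;
    int locked;  /* stand-in for the result of VOP_ISLOCKED(vp) */
};

static int toy_islocked(const struct toy_vnode *vp)
{
    return vp->locked != 0;
}

/*
 * Walk the candidates in free-list order and return the first one that
 * is not locked, or NULL if every candidate is currently locked.
 */
static struct toy_vnode *
toy_pick_recyclable(struct toy_vnode *candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_islocked(candidates[i]))
            continue;  /* in use by someone else: leave it alone */
        return candidates[i];
    }
    return NULL;
}
```

When every candidate is locked the sketch returns NULL, and a real caller would need some fallback for that case.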
At a minimum, I'd highly recommend that someone commit a panic to vbusy() for vnodes with VDOOMED set, since letting it continue when that flag is set can and does result in corruption of the vnode_free_list. I'd also recommend adding the VOP_ISLOCKED() check to getnewvnode(): even if it doesn't work for all file system types, it will help in some (most?) cases. A patch for CURRENT is attached.
--
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.