|Konstantin Belousov||Aug 18, 2006 4:48 pm|
|Tor Egge||Aug 18, 2006 8:19 pm|
|Peter Holm||Aug 20, 2006 5:28 pm|
|Tor Egge||Aug 21, 2006 1:21 pm|
|Eric Anderson||Aug 21, 2006 1:37 pm|
|Kostik Belousov||Aug 21, 2006 1:38 pm|
|Kostik Belousov||Aug 21, 2006 1:44 pm|
|Tor Egge||Aug 21, 2006 2:09 pm|
|Bruce Evans||Aug 22, 2006 9:15 am|
|Kostik Belousov||Aug 22, 2006 1:07 pm|
|Kostik Belousov||Aug 22, 2006 1:16 pm|
|Tor Egge||Aug 22, 2006 9:46 pm|
|Kostik Belousov||Aug 23, 2006 4:40 am|
|Kostik Belousov||Aug 23, 2006 11:07 am|
|Bruce Evans||Aug 23, 2006 2:04 pm|
|Bruce Evans||Aug 23, 2006 3:39 pm|
|Tor Egge||Aug 23, 2006 3:47 pm|
|Tor Egge||Aug 23, 2006 4:31 pm|
|Tor Egge||Aug 23, 2006 4:57 pm|
|Bruce Evans||Aug 23, 2006 6:32 pm|
|Bruce Evans||Aug 23, 2006 7:12 pm|
|Tor Egge||Aug 23, 2006 8:06 pm|
|Subject:||Deadlock between nfsd and snapshots.|
|From:||Tor Egge (Tor....@cvsup.no.freebsd.org)|
|Date:||Aug 22, 2006 9:46:57 pm|
I have a proposal.
1. Remove IN_ACCESS, IN_UPDATE, IN_CHANGE from i_flag. For each flag, introduce two new i_ fields, e.g., i_access of type timespec, and i_accessed of boolean type.
On amd64, sizeof(struct timespec) is 16 bytes and sizeof(struct boolean_t) is 4 bytes. 3 * (16 + 4) = 60 bytes extra per inode. With 100K inodes that becomes 6 MB extra memory.
I don't see why all these extra fields are needed.
2. All places that currently set IN_ACCESS, instead would increment i_accessed using the atomic ops. ufs_itimes shall update i_access under some mutex if i_accessed is greater than zero.
Protecting the existing i_flag and the timestamps with the vnode interlock when the current thread only has a shared vnode lock should be sufficient to protect against the races, removing the need for #3, #4 and #4 below.
What's left is avoiding setting IN_MODIFIED when it's unsafe, to protect against the deadlock.
3. Check the i_access instead of the IN_ACCESS.
4. ffs_update and ffs_syncvnode shall do the DIP_SET(i_atime) under the mutex from #2 before the main run and set IN_MODIFIED accordingly if i_accessed is not 0.
4. ufs_getattr shall retrieve the *time from new i_ fields under the mutex from #2 if corresponding i_ flag is set.
Basically, I want to set IN_MODIFIED i_flag (induced by IN_ACCESS and others) only under exclusive vnode lock. Moreover, i_accessed can be zeroed only under exclusive lock. This way, even shared lock on the vnode shall be enough to safely update modification times, and the times are moved to the disk often enough (at least, at the sync of the syncer vnodes).
An exclusive vnode lock isn't needed, see above. Holding an exclusive vnode lock does not make it safe to set IN_MODIFIED.
There are some constraints with regards to setting IN_MODIFIED on an inode.
If neither IN_CHANGE nor IN_UPDATE is set then it might be unsafe to set IN_MODIFIED since the file system might be suspended or in the process of being suspended with the vnode sync loop in ffs_sync() having iterated past the vnode.
If the file system is suspended then IN_MODIFIED cannot be set. If IN_MODIFIED, IN_CHANGE or IN_UPDATE is set and the file system is suspended then something is wrong.
If the file system is in the process of being suspended then IN_MODIFIED can be set at the cost of triggering a restart of the vnode sync loop in ffs_sync(). If either IN_MODIFIED, IN_CHANGE or IN_UPDATE is already set then the vnode sync loop has not reached the vnode, and a restart isn't needed.
When ufs_itimes() cannot set IN_MODIFIED then it has to either risk losing the access time update or use some mechanism to defer it (e.g. set IN_LAZYMOD or a new flag and let process_deferred_inactive() set IN_MODIFIED after the file system has been resumed).
- Tor Egge