Subject: NFS lockup when copying a "special" file
From: Oliver Fromme (ol...@lurza.secnetix.de)
Date: Aug 17, 2006 4:59:05 pm
This doesn't seem to be amd64-specific, so I copy freebsd-fs.
David O'Brien <obr...@freebsd.org> wrote:
On Thu, Jul 13, 2006 at 08:14:34PM +0200, Oliver Lehmann wrote:
nfs server www:/mnt/space/www: not responding
nfs server www:/mnt/space/www: not responding
nfs server www:/mnt/space/www: not responding
That could be an MTU problem. If you have an unusual MTU (check ifconfig(8) output), try lowering it to 1500. If it's already at 1500, try lowering it even further, e.g. to 1492.
I get this all the time now. It started sometime in 2006. The only "fix" I've found is to use NFS over TCP instead of UDP. My NFS server is i386/bge(4), and the clients are i386/bge, amd64/bge, amd64/nve, sparc64/hme.
I had exactly the same problem on an i386/bge client running RELENG_6 of last week (2006-08-09). I can provide dmesg, kernel and other info if required, just ask me. Server is a NetApp Filer. FreeBSD 4.x clients don't have any problem.
Symptoms: When trying to copy a certain file to the NFS directory, the whole share hung, the cp(1) process was in diskwait state ("D" in ps, with mwchan "bo_wwa"), and only a reboot could get rid of the hanging share. However, it was possible to mount the very same share a second time to a different mountpoint and continue working there, until you tried to copy that certain file there again.

The file size is 1349 bytes, and the contents don't matter, i.e. the problem could be reproduced even with dd:

    $ dd if=/dev/zero of=/nfsshare/foo bs=1349 count=1

After a few experiments I found out that sizes between 1349 and 1352 exhibit the problem.
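The experiments with different sizes could be scripted roughly as follows. This is only a sketch: the function name probe_sizes, its arguments and the probe.SIZE file names are made up for illustration, and only the dd invocation itself comes from the report above. Each size is printed before it is written, so on an affected mount the last size shown is the one that hung.

```shell
#!/bin/sh
# probe_sizes DIR FIRST LAST
# Write one zero-filled file per size (FIRST..LAST bytes) into DIR.
# The function name, arguments and file names are illustrative assumptions.
probe_sizes() {
    dir=$1
    first=$2
    last=$3
    size=$first
    while [ "$size" -le "$last" ]; do
        # Print the size first: if the write hangs, this is the last line shown.
        echo "writing $size bytes"
        # Same dd invocation as in the report, with the block size varied.
        dd if=/dev/zero of="$dir/probe.$size" bs="$size" count=1 2>/dev/null || return 1
        size=$((size + 1))
    done
}
```

For example, "probe_sizes /nfsshare 1340 1360" would walk through the range around the problematic sizes. Keep in mind that, as described above, a hang on an affected mount could only be cleared by a reboot, so this is best tried on a second, disposable mount of the share.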
Solution: Switching to TCP isn't an option because of our setup. However, I noticed that forcing NFSv2 (default is NFSv3) also seems to solve the problem. Since NFSv2 is limited to 2GB file size, I continued looking for other solutions. Then I noticed that the VLAN's parent interface (bge0) had an MTU of 1504 (the NFS mount is from a VLAN on that interface). I have no idea why it was set to 1504. When I lowered it to 1500, the problem was gone.
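Checking for the same condition on other machines could look something like the sketch below. The helper names mtu_of and check_mtu are hypothetical, and the parsing assumes that ifconfig(8) prints the value right after the word "mtu" on the interface's first line; adjust it if your output looks different.

```shell
#!/bin/sh
# mtu_of: read ifconfig(8) output on stdin and print the value that
# follows the word "mtu" (assumed output format, first match only).
mtu_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# check_mtu: report whether the MTU found on stdin is the usual 1500.
check_mtu() {
    mtu=$(mtu_of)
    if [ "$mtu" != "1500" ]; then
        echo "MTU is ${mtu:-unknown}, expected 1500"
    else
        echo "MTU is 1500"
    fi
}
```

It would be used as "ifconfig bge0 | check_mtu". If the parent interface reports something other than 1500, "ifconfig bge0 mtu 1500" (as root) lowers it, which is the change that made the problem go away here.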
Best regards
Oliver
-- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way.
C++: "an octopus made by nailing extra legs onto a dog" -- Steve Taylor, 1998