Subject: NFS client/buffer cache deadlock
From: Brian Fundakowski Feldman (gre@freebsd.org)
Date: Apr 20, 2005 8:54:29 am
List: org.freebsd.freebsd-hackers

On Wed, Apr 20, 2005 at 05:35:28PM +0200, Marc Olzheim wrote:

On Wed, Apr 20, 2005 at 11:20:38AM -0400, Brian Fundakowski Feldman wrote:

Reads should be totally unaffected...

The server was misbehaving. Fixed. :-)

Btw., I'm not sure write(), writev() and pwrite() are allowed to do short writes on regular files...?

Our manpage is incorrect; POSIX states that they are (see earlier e-mail). There really is no alternative: we simply can't build an NFS transaction larger than our buffer cache can accommodate. Note that short writes won't happen for normal buffer sizes, only excessively large ones. I really don't believe that writev() is meant to be used so that you can write gigantic data structures in a single transaction...

Ah, I was reading the SUSv2 page:

http://www.opengroup.org/onlinepubs/009695399/functions/write.html

instead of the POSIX version.

But from neither of those can I deduce that it can return with result < nbyte without that being a permanent condition. What phrase makes you conclude that it can?

This specific issue is not clear-cut; the best thing to do lies somewhere within the range of these scenarios:

"If a write() requests that more bytes be written than there is room for (for example, [XSI] [Option Start] the process' file size limit or [Option End] the physical end of a medium), only as many bytes as there is room for shall be written. For example, suppose there is space for 20 bytes more in a file before reaching a limit. A write of 512 bytes will return 20. The next write of a non-zero number of bytes would give a failure return (except as noted below)."

"When attempting to write to a file descriptor (other than a pipe or FIFO) that supports non-blocking writes and cannot accept the data immediately:

* If the O_NONBLOCK flag is clear, write() shall block the calling thread until the data can be accepted.

* If the O_NONBLOCK flag is set, write() shall not block the thread. If some data can be written without blocking the thread, write() shall write what it can and return the number of bytes written. Otherwise, it shall return -1 and set errno to [EAGAIN]."

"[ENOBUFS] Insufficient resources were available in the system to perform the operation."

I think the first is more useful behavior than the last. Supporting it should be exactly the same as supporting what happens if the actual filesystem fills up. In this case, the filesystem is being requested to write more "than there is room for."
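
For what it's worth, a caller that has to cope with short writes would typically just loop. The following is only a minimal sketch of that idea, not code from any patch in this thread: write_all() is a hypothetical helper name, and treating a zero-byte return as an error with EIO is an assumption made here to avoid spinning forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, not part of libc.
 * It retries write() until all len bytes have been accepted or a
 * real error occurs.  Returns 0 on success, -1 on failure with
 * errno set.
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	assert(buf != NULL || len == 0);

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written; retry */
			return (-1);		/* real error, errno set by write() */
		}
		if (n == 0) {
			/* Assumption: no progress is treated as an error. */
			errno = EIO;
			return (-1);
		}
		/* Short write: advance past what was accepted and go again. */
		p += n;
		left -= (size_t)n;
	}
	return (0);
}
```

With a loop like this, a short write caused by a temporary limit simply costs the caller one more system call, while a permanent condition (a full filesystem, for instance) shows up as an error on the following write() attempt, as in the 20-byte example quoted above.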