9 messages in ru.sysoev.nginx: Re: nginx, php-fpm and 502 errors
Subject: Re: nginx, php-fpm and 502 errors
From: Igor Sysoev (is-G@public.gmane.org)
Date: Nov 20, 2007 8:16:16 am
List: ru.sysoev.nginx

On Tue, Nov 20, 2007 at 04:17:00PM +0100, Jure Pečar wrote:

I'm trying to understand why some of our production nginx/php-fpm servers
frequently return 502 errors. Each time, "writev() failed (107: Transport
endpoint is not connected) while sending request to upstream" is logged in
the error log.

Running strace -e connect,writev on an nginx worker process frequently shows:

connect(75, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = 0
writev(75, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\6\260\0\0\17"..., 1752}], 1) = 1752
connect(759, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = 0
writev(759, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\5\377\1\0\17"..., 1576}], 1) = 1576
connect(940, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
connect(996, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
connect(391, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
connect(1120, {sa_family=AF_FILE, path="/tmp/php-fpm.sock"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
writev(996, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\6O\1\0\0172S"..., 1656}], 1) = -1 ENOTCONN (Transport endpoint is not connected)
writev(758, [{"HTTP/1.1 502 Bad Gateway\r\nServer"..., 157}], 1) = 157
writev(940, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\6\320\0\0\17"..., 1784}], 1) = -1 ENOTCONN (Transport endpoint is not connected)
writev(658, [{"HTTP/1.1 502 Bad Gateway\r\nServer"..., 157}], 1) = 157

Could these EAGAIN from connect be related to ENOTCONN from writev?

Partially. Usually a connect() to a Unix stream socket completes at once because it is local. It seems there is a shortage of some resource.

By the way, it's strange that Linux returns EAGAIN instead of EINPROGRESS. It's also strange that Linux does not return the ENOTCONN error via getsockopt(SO_ERROR).
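
The behaviour described here can be reproduced outside nginx. The following is a minimal Python sketch, assuming a Linux host: it fills the accept queue of a Unix stream listener that never calls accept(), then reports what a non-blocking connect(), getsockopt(SO_ERROR), and a subsequent writev() return. The function name probe_full_backlog, the backlog of 1, and the temporary socket path are illustrative choices, not taken from the thread, and this is not a claim about which resource ran short on the servers in question.

```python
import errno
import os
import socket
import tempfile


def probe_full_backlog(max_attempts=16):
    """Connect non-blocking clients to a listener that never accepts.

    Returns (connect_errno, so_error, writev_errno) for the first client
    whose connect() fails, or None if every attempt succeeded.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.sock")  # hypothetical socket path
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)  # tiny backlog; nothing ever calls accept()
    clients = []
    result = None
    try:
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)
            clients.append(client)
            try:
                client.connect(path)
            except OSError as exc:
                connect_errno = exc.errno
                # Ask the kernel for a pending socket error, as nginx does.
                so_error = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                try:
                    # Write to the socket that never got connected.
                    os.writev(client.fileno(), [b"\x01\x01\x00\x01"])
                    writev_errno = 0
                except OSError as write_exc:
                    writev_errno = write_exc.errno
                result = (connect_errno, so_error, writev_errno)
                break
    finally:
        for client in clients:
            client.close()
        listener.close()
        os.unlink(path)
        os.rmdir(workdir)
    return result


if __name__ == "__main__":
    outcome = probe_full_backlog()
    if outcome is None:
        print("every connect() succeeded")
    else:
        names = [errno.errorcode.get(code, str(code)) for code in outcome]
        print("connect: %s, SO_ERROR: %s, writev: %s" % tuple(names))
```

On Linux the failing connect() should report EAGAIN and the writev() ENOTCONN, matching the strace output quoted above; other systems may report different errors.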

The frequency of these errors increases if I decrease the number of php-cgi
processes, and vice versa. If I have enough php-cgi processes, they do not occur
at all. But this "enough" number is way too high for my taste (I ended up with 64).

What exactly happens that makes writev lose the connection in the middle of writing?

It's not the middle. It's the first FastCGI packet: "\1\1\0\1\0\10\0\0..." The scenario is as follows:

1. connect() returns EAGAIN, and nginx adds the socket to epoll.
2. epoll reports some condition (possibly an error) on the socket.
3. nginx writev()s the FastCGI request, and the writev() returns ENOTCONN.
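
To see why the failed writev() carries the first FastCGI packet and not a later one, the leading bytes from the strace output can be decoded by hand. The sketch below assumes the standard 8-byte FastCGI record header (version, type, request id, content length, padding length, reserved); the helper names parse_header and first_two_records are illustrative only.

```python
import struct

# FastCGI record header: version, type, requestId (big-endian 16 bit),
# contentLength (big-endian 16 bit), paddingLength, reserved.
FCGI_HEADER = struct.Struct(">BBHHBB")
FCGI_TYPES = {1: "FCGI_BEGIN_REQUEST", 4: "FCGI_PARAMS", 5: "FCGI_STDIN"}


def parse_header(data, offset=0):
    """Decode one FastCGI record header starting at the given offset."""
    version, rtype, request_id, content_length, padding, _reserved = (
        FCGI_HEADER.unpack_from(data, offset)
    )
    return {
        "version": version,
        "type": rtype,
        "type_name": FCGI_TYPES.get(rtype, "unknown"),
        "request_id": request_id,
        "content_length": content_length,
        "padding_length": padding,
    }


def first_two_records(data):
    """Return the headers of the first two records in a byte string."""
    first = parse_header(data, 0)
    # The next header follows the first record's body and padding.
    next_offset = (
        FCGI_HEADER.size + first["content_length"] + first["padding_length"]
    )
    second = parse_header(data, next_offset)
    return first, second


# First 24 bytes of the first writev() in the trace, with strace's octal
# escapes kept as they appear there.
TRACE_PREFIX = b"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\6\260\0\0"

if __name__ == "__main__":
    for record in first_two_records(TRACE_PREFIX):
        print(record)
```

The first header decodes to an FCGI_BEGIN_REQUEST record with an 8-byte body, followed by an FCGI_PARAMS record header, so the buffer passed to writev() is the very start of a request.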