17 messages in ru.sysoev.nginx: Re: Nginx + fair load balancer patch ...
Subject: Re: Nginx + fair load balancer patch looping
From: Alexander Staubo (alex@public.gmane.org)
Date: Mar 28, 2008 7:39:31 am
List: ru.sysoev.nginx

On 3/14/08, Grzegorz Nosek <grze@public.gmane.org> wrote:

> If it doesn't kill your I/O, it might be useful (as an insight into WTF the fair balancer is choosing). I think that the most useful would be capturing debug_http logs (only those with [upstream_fair] should be enough), but that generates a truly massive amount of data.
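
One way to keep the captured data manageable is to filter the debug log down to the balancer's own lines before storing or sending it. The following is a minimal sketch, assuming the marker `[upstream_fair]` appears literally in each relevant log line, as the quoted message suggests; the function name `filter_upstream_fair` is hypothetical and not part of nginx or the patch.

```python
from typing import Iterable, Iterator

# Assumption: relevant debug lines contain this literal marker.
MARKER = "[upstream_fair]"


def filter_upstream_fair(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that mention the fair balancer.

    Works lazily, so a very large log can be streamed through it
    without loading the whole file into memory.
    """
    for line in lines:
        if MARKER in line:
            yield line
```

It could be used as `sys.stdout.writelines(filter_upstream_fair(sys.stdin))` with the error log piped in on standard input.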

Well, it happened again. What is interesting -- and I am pretty sure this is what happened before -- is that the errant process is actually a worker shutting down:

root      4746   0.0  0.0   18316    1356 ? Ss Mar25    0:00 nginx: master process /usr/sbin/nginx
www-data  4301  56.6 34.4 6026976 2818648 ? R  Mar27 1127:20  \_ nginx: worker process is shutting down
www-data 17604   0.7  0.0   18984    2132 ? S  06:27    4:10  \_ nginx: worker process
www-data 17605   0.7  0.0   19048    2228 ? S  06:27    4:12  \_ nginx: worker process
www-data 17606   0.7  0.0   18824    2012 ? S  06:27    3:58  \_ nginx: worker process
www-data 17607   0.7  0.0   18788    1960 ? S  06:27    4:10  \_ nginx: worker process

I don't know why it's shutting down, though. It could be the log rotation job that has poked it.
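
To catch this condition without watching the process list by hand, a small script can scan the listing for workers that are shutting down yet still burning CPU. This is a rough sketch, assuming `ps auxf`-style output with the usual eleven columns (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); the function name `find_stuck_workers` and the 10% threshold are illustrative choices, not anything nginx provides.

```python
from typing import List, Tuple

# Process title fragment shown for a worker that is draining connections.
SHUTTING_DOWN = "nginx: worker process is shutting down"


def find_stuck_workers(ps_text: str, cpu_threshold: float = 10.0) -> List[Tuple[int, float, float]]:
    """Return (pid, %cpu, %mem) for shutting-down workers above the CPU threshold."""
    stuck = []
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or SHUTTING_DOWN not in fields[10]:
            continue
        try:
            pid, cpu, mem = int(fields[1]), float(fields[2]), float(fields[3])
        except ValueError:
            continue  # header line or unexpected format
        if cpu >= cpu_threshold:
            stuck.append((pid, cpu, mem))
    return stuck
```

Fed the listing above, this would flag only PID 4301; a healthy worker that is merely draining its last connections should sit well under the threshold.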

Running strace -e connect on this process yields an infinite sequence of the following two lines:

connect(3, {sa_family=AF_INET, sin_port=htons(11003), sin_addr=inet_addr("...")}, 16) = -1 EINPROGRESS (Operation now in progress)
connect(4, {sa_family=AF_INET, sin_port=htons(11003), sin_addr=inet_addr("...")}, 16) = -1 EINPROGRESS (Operation now in progress)

where the address being connected to is one of the back ends. I also ran a full strace, and I can send you the output privately if you like.
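
For a longer trace, tallying the connect() calls by destination makes the loop easy to confirm at a glance. Below is a minimal sketch, assuming the strace output has the AF_INET line format shown above; the regular expression and the function name `tally_connects` are illustrative, and the address is kept as whatever string appears inside inet_addr("...").

```python
import re
from collections import Counter

# Matches strace lines for connect() on an AF_INET socket, e.g.
# connect(3, {sa_family=AF_INET, sin_port=htons(11003), ...}, 16) = -1 EINPROGRESS (...)
CONNECT_RE = re.compile(
    r'connect\((?P<fd>\d+),\s*\{sa_family=AF_INET,\s*'
    r'sin_port=htons\((?P<port>\d+)\),\s*'
    r'sin_addr=inet_addr\("(?P<addr>[^"]*)"\)\},\s*\d+\)'
    r'\s*=\s*(?P<ret>-?\d+)(?:[ \t]+(?P<errno>[A-Z]+))?'
)


def tally_connects(strace_text: str) -> Counter:
    """Count connect() calls keyed by (address, port, result).

    The result is the errno name when strace printed one, else "OK".
    """
    counts = Counter()
    for m in CONNECT_RE.finditer(strace_text):
        key = (m.group("addr"), int(m.group("port")), m.group("errno") or "OK")
        counts[key] += 1
    return counts
```

A single key with a very large and steadily growing count, as in this case, points to the same back end being retried over and over.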

I have not tried the latest snapshot yet. We are still running the one from February 12th or so.

Alexander.