Subject: Re: Weird 0.8.11.1 connections spike
From: Igor Sysoev (is@rambler-co.ru)
Date: Aug 30, 2009 10:23:50 pm
List: ru.sysoev.nginx

On Sun, Aug 30, 2009 at 10:55:57PM -0400, Jim Ohlstein wrote:

Igor Sysoev wrote:

On Sun, Aug 30, 2009 at 11:52:51AM -0400, Jim Ohlstein wrote:

2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 1015
2009/08/30 10:29:00 [alert] 2042#0: aborting

Other servers seem to be running fine including ones with busy sites. For the moment I have reverted that VPS to 0.8.10.

Could you do the following:

1) enable coredumps
2) set in nginx.conf: debug_points abort;
3) reconfigure nginx, if there are open connections, then nginx creates coredump on exit

Do you want nginx reconfigured "--with-debug" or is there another option you need?

No. The coredump is enough; it just needs to have debug info (gcc -g option).

4) look in log for alerts: open socket #... left in connection NN
5) run "gdb /path/to/nginx /path/to/core", then

p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count

where NN is NN from log message.

Unfortunately I don't think it gave too much information.

I watched connections gradually rise. I have ulimit -n set to 1024, two workers, 1024 connections/worker. As connections neared 2048 the site became unresponsive and load went up dramatically.

I began to see the same errors in the log. Nginx did not abort on its own so I killed it after a few minutes. I then saw the same entries in the error log like:

2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993

nginx aborts only when you send it -HUP and it finds leaked connections.

I ran gdb on the core but this was the output from three connections:

[root@mars proc]# gdb /vz/private/101/fs/root/usr/local/sbin/nginx ./kcore
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

warning: core file may not match specified executable file.
Core was generated by `ro root=LABEL=/ console=tty0 console=ttyS1,19200n8 debug'.
#0 0x0000000000000000 in ?? ()
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->main->count
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->main->count
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->uri
Cannot access memory at address 0x130
(gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->main->count
Cannot access memory at address 0x130
(gdb) quit
[root@mars proc]#

During this time there were hundreds of connections in "CLOSE_WAIT" state. They gradually increased to just over 1000 when it crashed.
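One way to watch the "CLOSE_WAIT" count over time is to tally socket states from the kernel's TCP table. The following is only a sketch, assuming a Linux host that exposes /proc/net/tcp (IPv4 sockets only); the function name count_tcp_states is made up for illustration, and the thread does not say which tool was actually used to observe the connections.

```python
from collections import Counter

# State codes from the "st" column of Linux /proc/net/tcp (hexadecimal).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states(proc_net_tcp_text):
    """Tally sockets by state from the text of /proc/net/tcp.

    count_tcp_states is a hypothetical helper, not part of nginx.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts

if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            states = count_tcp_states(f.read())
        print("CLOSE_WAIT sockets:", states.get("CLOSE_WAIT", 0))
    except OSError:
        print("/proc/net/tcp is not available on this system")
```

Run periodically, a count that keeps growing toward the per-worker connection limit would be consistent with the leak described above.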

Sorry, I made a mistake. The commands should be:

p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->uri
p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->main->count
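When many connections are leaked, typing these two commands for every NN gets tedious. The following is a minimal sketch, not anything shipped with nginx: it scans error-log text for the "open socket #... left in connection NN" alerts and prints the corrected gdb print commands for each NN. The helper names leaked_connections and gdb_commands are made up for illustration.

```python
import re

# Matches alerts such as:
#   2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993
LEAK_RE = re.compile(r"open socket #(\d+) left in connection (\d+)")

def leaked_connections(log_text):
    """Return the sorted, de-duplicated connection slot numbers (NN)."""
    return sorted({int(m.group(2)) for m in LEAK_RE.finditer(log_text)})

def gdb_commands(slots):
    """Build the two gdb print commands for each leaked slot."""
    cmds = []
    for nn in slots:
        req = f"((ngx_http_request_t *) ngx_cycle->connections[{nn}].data)"
        cmds.append(f"p {req}->uri")
        cmds.append(f"p {req}->main->count")
    return cmds

if __name__ == "__main__":
    # Sample alerts copied from this thread.
    sample = (
        "2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993\n"
        "2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in connection 1015\n"
    )
    for line in gdb_commands(leaked_connections(sample)):
        print(line)
```

The printed lines can be pasted into a gdb session opened on the nginx binary and its core file.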