39 messages in ru.sysoev.nginxRe: Nginx stops sending file after ~1...
FromSent OnAttachments
Yo'av MosheJan 11, 2009 5:43 am 
Jure PečarJan 11, 2009 9:50 am 
Yo'av MosheJan 11, 2009 10:46 am 
Jure PečarJan 11, 2009 11:07 am 
Yo'av MosheJan 11, 2009 11:22 am 
Jure PečarJan 11, 2009 11:35 am 
Igor SysoevJan 12, 2009 2:28 am 
Yo'av MosheJan 18, 2009 3:35 pm 
Dave CheneyJan 18, 2009 4:24 pm 
Yo'av MosheJan 19, 2009 2:49 am 
Just MarcJan 19, 2009 4:25 am 
Yo'av MosheJan 19, 2009 4:59 pm 
Igor SysoevJan 19, 2009 11:23 pm.Other
Yo'av MosheJan 20, 2009 1:44 am 
Igor SysoevJan 20, 2009 2:08 am 
Yo'av MosheJan 20, 2009 10:25 am 
jeff emmingerJan 20, 2009 11:31 am 
Kiril AngovJan 20, 2009 11:38 am 
Yo'av MosheJan 20, 2009 3:52 pm 
ResicowJan 20, 2009 4:17 pm 
Igor SysoevJan 20, 2009 9:26 pm 
Yo'av MosheJan 23, 2009 4:34 am 
Igor SysoevJan 23, 2009 5:04 am 
Yo'av MosheJan 25, 2009 8:26 am 
Igor SysoevJan 25, 2009 9:33 am 
Yo'av MosheJan 25, 2009 11:04 am 
Igor SysoevJan 25, 2009 11:20 am 
Yo'av MosheJan 26, 2009 4:05 am.gz
Yo'av MosheJan 27, 2009 12:42 am 
Igor SysoevJan 27, 2009 12:50 am 
Igor SysoevJan 27, 2009 1:00 am 
Yo'av MosheJan 27, 2009 1:24 am 
Igor SysoevJan 27, 2009 1:39 am.nodelay
Yo'av MosheJan 27, 2009 1:58 am 
Igor SysoevJan 27, 2009 2:05 am 
Yo'av MosheJan 27, 2009 2:24 am 
Dave CheneyJan 27, 2009 3:03 am 
Igor SysoevJan 27, 2009 3:17 am 
Jure PečarJan 28, 2009 1:52 pm 
Actions with this message:
Paste this link in email or IM:
Paste this link in email or IM:
Atom feed for this thread
Paste this URL into your reader:
Subject:Re: Nginx stops sending file after ~1.5mb ?Actions...
From:Yo'av Moshe (bje@gmail.com)
Date:Jan 25, 2009 8:26:35 am
List:ru.sysoev.nginx

Igor, I tried using select instead of poll, but I get the exact same problem.

Check my strace: http://pastebin.com/m65056ec0

I get the same thing with rtsig too.

Hints? Again, file is at http://www.noal.org.il/static/temp/hozer.pdf. Server is using select now.

Thanks... Yo'av.

On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <is@rambler-co.ru> wrote:

On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe wrote:

Thanks Igor. I don't want to be rude, but I find it hard to believe that it's a kernel bug.

First, I'm using the default CentOS 5.2 kernel, it's seems to me pretty odd that there's such a bug in it. CentOS is being used on a lot of servers, some of them must be running nginx.

Second, I tried serving the same file with thttpd, and it runs just fine. http://195.28.180.147:40/temp/hozer.pdf Note it's the exact same file which I cannot get from nginx: http://www.noal.org.il/static/temp/hozer.pdf

How come thttpd can serve the file just fine? If it's a kernel bug, I thought it would happen with all of my servers. If it's using some different way of sending the files which does not depend on that epoll call - is there any way to configure nginx to use that as well?

I do not look modern thttpd, but according its change log, it has no epoll support. You may try different ways in nginx: you need to build it

--with-rtsig_module --with-select_module --with-poll_module

and then choose method:

events { use select; #use poll; #use rtsig; }

Also, epoll has two modes: level triggered (default) and edge triggered (EPOLLET). nginx uses more effective edge triggered mode, lighttpd uses level triggered one. The bug may be in ET mode only.

As to probabilty of kernel bugs: I saw them in FreeBSD (kqueue, sendfile), Linux (epoll), Solaris (event ports), and MacOSX (kqueue, sendfile).

Can I do anything to confirm that it's a kernel bug? If it is, I should probably contact CentOS's maintainers.

Your straces confirms this: nginx added socket to epoll and did not deleted it, nevertheless epoll does not send event.

Thank you, again. Yo'av.

On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <is@rambler-co.ru> wrote:

On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av Moshe wrote:

Igor, here's the new strace. http://pastebin.com/m5129c54b

Does the fact it works in low download rates still makes you think it's a kernel issue?

Yes, this is seems like a kernel issue. Kernel excatly said (with EAGAIN) that there is no place to output data, but never returned a new notification via epoll_wait() about free space:

21420 01:39:43 <... epoll_wait resumed> {{EPOLLOUT, {u32=153137960, u64=13811628390360331048}}}, 512, 60000) = 1 21420 01:39:43 gettimeofday({1232494783, 413764}, NULL) = 0 21420 01:39:43 sendfile64(20, 21, [1645625], 304605) = 40320 21420 01:39:43 sendfile64(20, 21, [1685945], 264285) = -1 EAGAIN (Resource temporarily unavailable) 21420 01:39:43 epoll_wait(11, <unfinished ...>

This is probably some race condition inside kernel, that does not appear on low speed download.

Thank you, really! And thanks, Kiril and Jeff, for confirming that!

Yo'av.

On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <kupo@gmail.com> wrote:

wget http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:36:00-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il|195.28.180.147|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 1950230 (1.9M) [application/pdf] Saving to: `barvazi2.pdf'

81% [=====================================> ] 1,596,665 --.-K/s eta 5s

wget -c http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:38:28-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il|195.28.180.147|:80... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 1950230 (1.9M), 305960 (299K) remaining [application/pdf] Saving to: `barvazi2.pdf'

100%[+++++++++++++++++++++++++++++++++++++++=======>] 1,950,230 295K/s in 1.0s

2009-01-20 21:38:30 (295 KB/s) - `barvazi2.pdf' saved [1950230/1950230]

On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger < jemm@gmail.com> wrote:

Could anyone confirm that this works: wget --limit-rate=3500 http://www.noal.org.il/static/temp/barvazi2.pdf while this doesn't: wget http://www.noal.org.il/static/temp/barvazi2.pdf

Confirmed