| From | Sent On | Attachments |
|---|---|---|
| Yo'av Moshe | Jan 11, 2009 5:43 am | |
| Jure Pečar | Jan 11, 2009 9:50 am | |
| Yo'av Moshe | Jan 11, 2009 10:46 am | |
| Jure Pečar | Jan 11, 2009 11:07 am | |
| Yo'av Moshe | Jan 11, 2009 11:22 am | |
| Jure Pečar | Jan 11, 2009 11:35 am | |
| Igor Sysoev | Jan 12, 2009 2:28 am | |
| Yo'av Moshe | Jan 18, 2009 3:35 pm | |
| Dave Cheney | Jan 18, 2009 4:24 pm | |
| Yo'av Moshe | Jan 19, 2009 2:49 am | |
| Just Marc | Jan 19, 2009 4:25 am | |
| Yo'av Moshe | Jan 19, 2009 4:59 pm | |
| Igor Sysoev | Jan 19, 2009 11:23 pm | .Other |
| Yo'av Moshe | Jan 20, 2009 1:44 am | |
| Igor Sysoev | Jan 20, 2009 2:08 am | |
| Yo'av Moshe | Jan 20, 2009 10:25 am | |
| jeff emminger | Jan 20, 2009 11:31 am | |
| Kiril Angov | Jan 20, 2009 11:38 am | |
| Yo'av Moshe | Jan 20, 2009 3:52 pm | |
| Resicow | Jan 20, 2009 4:17 pm | |
| Igor Sysoev | Jan 20, 2009 9:26 pm | |
| Yo'av Moshe | Jan 23, 2009 4:34 am | |
| Igor Sysoev | Jan 23, 2009 5:04 am | |
| Yo'av Moshe | Jan 25, 2009 8:26 am | |
| Igor Sysoev | Jan 25, 2009 9:33 am | |
| Yo'av Moshe | Jan 25, 2009 11:04 am | |
| Igor Sysoev | Jan 25, 2009 11:20 am | |
| Yo'av Moshe | Jan 26, 2009 4:05 am | .gz |
| Yo'av Moshe | Jan 27, 2009 12:42 am | |
| Igor Sysoev | Jan 27, 2009 12:50 am | |
| Igor Sysoev | Jan 27, 2009 1:00 am | |
| Yo'av Moshe | Jan 27, 2009 1:24 am | |
| Igor Sysoev | Jan 27, 2009 1:39 am | .nodelay |
| Yo'av Moshe | Jan 27, 2009 1:58 am | |
| Igor Sysoev | Jan 27, 2009 2:05 am | |
| Yo'av Moshe | Jan 27, 2009 2:24 am | |
| Dave Cheney | Jan 27, 2009 3:03 am | |
| Igor Sysoev | Jan 27, 2009 3:17 am | |
| Jure Pečar | Jan 28, 2009 1:52 pm |
| Subject: | Re: Nginx stops sending file after ~1.5mb ? | |
|---|---|---|
| From: | Igor Sysoev (is...@rambler-co.ru) | |
| Date: | Jan 27, 2009 12:50:47 am | |
| List: | ru.sysoev.nginx | |
On Tue, Jan 27, 2009 at 10:42:57AM +0200, Yo'av Moshe wrote:
Note I just compiled Cherokee 0.98. It is serving the file as it should using sendfile and epoll.
Have you tried this on the same host ? Could you post strace ?
On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bje...@gmail.com> wrote:
Hey, attached is my tcpdump log. I've never used tcpdump before, so I hope the log is fine. I used grep to show only the connections from my computer while running 'wget http://www.noal.org.il/static/temp/hozer.pdf'.
Hope this will lead us somewhere...
Yo'av.
On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:
Sendfile is now off, using select, and still the same... http://pastebin.com/m2c3ea25b
I have no idea, why it may be so.
I see that select() returned event ("out [19]"):
28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 1 (out [19], left {0, 380000})
However, at some stage select() did not return event, although the event is still active ("select(20, [10 19], [19], NULL ..."):
28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0 28215 20:50:18 writev(19, [{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"..., 9676}], 1) = 9676 28215 20:50:18 pread64(20, "\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"..., 32768, 1605632) = 32768 28215 20:50:18 writev(19, [{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"..., 32768}], 1) = 32768 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 0 (Timeout)
Could you run tcpdump of the session ?
Yo'av
On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe wrote:
Igor, I tried using select instead of poll, but I get the exact same problem.
Check my strace: http://pastebin.com/m65056ec0
I get the same thing with rtsig too.
Hints? Again, file is at http://www.noal.org.il/static/temp/hozer.pdf.
Server
is
using select now.
According strace select() does not return event for socket 19 after sendfile64() sent 1651005 bytes:
25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0} <unfinished ...> 25951 18:14:27 <... select resumed> ) = 1 (out [19], left {58, 790000})
25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) = 0 25951 18:14:27 sendfile64(19, 20, [1509465], 440765 <unfinished ...> 25951 18:14:27 <... sendfile64 resumed> ) = 141540 25951 18:14:27 sendfile64(19, 20, [1651005], 299225 <unfinished ...> 25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000} <unfinished ...> 25951 18:14:28 <... select resumed> ) = 0 (Timeout)
As you have got the same issue with all methods (even with level triggered select()), I believe that the bug is in sendfile. You may try to turn it off even for the single file:
location = /static/temp/hozer.pdf { sendfile off; }
In Linux 2.6.23 sendfile() has been rewritten to use splice framework. The bug may be introduced while rewriting.
BTW, it seems that thttpd does not use sendfile.
Thanks... Yo'av.
On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe wrote:
Thanks Igor. I don't want to be rude, but I find it hard to believe that it's
a
kernel
bug.
First, I'm using the default CentOS 5.2 kernel, it's seems to me
pretty
odd
that there's such a bug in it. CentOS is being used on a lot of servers, some of them must be running nginx.
Second, I tried serving the same file with thttpd, and it runs
just
fine.
http://195.28.180.147:40/temp/hozer.pdf Note it's the exact same file which I cannot get from nginx: http://www.noal.org.il/static/temp/hozer.pdf
How come thttpd can serve the file just fine? If it's a kernel bug, I thought it would happen with all of my servers. If it's using some different way of sending the files which does
not
depend
on that epoll call - is there any way to configure nginx to use
that
as
well?
I do not look modern thttpd, but according its change log, it has no epoll support. You may try different ways in nginx: you need to
build
it
--with-rtsig_module --with-select_module --with-poll_module
and then choose method:
events { use select; #use poll; #use rtsig; }
Also, epoll has two modes: level triggered (default) and edge triggered (EPOLLET). nginx uses more effective edge triggered mode, lighttpd uses level triggered one. The bug may be in ET mode only.
As to probabilty of kernel bugs: I saw them in FreeBSD (kqueue, sendfile), Linux (epoll), Solaris (event ports), and MacOSX (kqueue, sendfile).
Can I do anything to confirm that it's a kernel bug? If it is, I should probably contact CentOS's maintainers.
Your straces confirms this: nginx added socket to epoll and did not deleted it, nevertheless epoll does not send event.
Thank you, again. Yo'av.
On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av Moshe wrote:
Igor, here's the new strace. http://pastebin.com/m5129c54b
Does the fact it works in low download rates still makes you
think
it's a
kernel issue?
Yes, this is seems like a kernel issue. Kernel excatly said
(with
EAGAIN)
that there is no place to output data, but never returned a new notification via epoll_wait() about free space:
21420 01:39:43 <... epoll_wait resumed> {{EPOLLOUT, {u32=153137960, u64=13811628390360331048}}}, 512, 60000) = 1 21420 01:39:43 gettimeofday({1232494783, 413764}, NULL) = 0 21420 01:39:43 sendfile64(20, 21, [1645625], 304605) = 40320 21420 01:39:43 sendfile64(20, 21, [1685945], 264285) = -1
EAGAIN
(Resource
temporarily unavailable) 21420 01:39:43 epoll_wait(11, <unfinished ...>
This is probably some race condition inside kernel, that does
not
appear on
low speed download.
Thank you, really! And thanks, Kiril and Jeff, for confirming that!
Yo'av.
On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <
kupo...@gmail.com>
wrote:
wget http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:36:00-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il|195.28.180.147|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 1950230 (1.9M) [application/pdf] Saving to: `barvazi2.pdf'
81% [=====================================> ] 1,596,665 --.-K/s eta 5s
wget -c http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:38:28-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il|195.28.180.147|:80... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 1950230 (1.9M), 305960 (299K) remaining [application/pdf] Saving to: `barvazi2.pdf'
100%[+++++++++++++++++++++++++++++++++++++++=======>] 1,950,230 295K/s in 1.0s
2009-01-20 21:38:30 (295 KB/s) - `barvazi2.pdf' saved [1950230/1950230]
On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger < jemm...@gmail.com> wrote:
Could anyone confirm that this works: wget --limit-rate=3500 http://www.noal.org.il/static/temp/barvazi2.pdf while this doesn't: wget http://www.noal.org.il/static/temp/barvazi2.pdf
Confirmed
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/






.Other