| From | Sent On | Attachments |
|---|---|---|
| Yo'av Moshe | Jan 11, 2009 5:43 am | |
| Jure Pečar | Jan 11, 2009 9:50 am | |
| Yo'av Moshe | Jan 11, 2009 10:46 am | |
| Jure Pečar | Jan 11, 2009 11:07 am | |
| Yo'av Moshe | Jan 11, 2009 11:22 am | |
| Jure Pečar | Jan 11, 2009 11:35 am | |
| Igor Sysoev | Jan 12, 2009 2:28 am | |
| Yo'av Moshe | Jan 18, 2009 3:35 pm | |
| Dave Cheney | Jan 18, 2009 4:24 pm | |
| Yo'av Moshe | Jan 19, 2009 2:49 am | |
| Just Marc | Jan 19, 2009 4:25 am | |
| Yo'av Moshe | Jan 19, 2009 4:59 pm | |
| Igor Sysoev | Jan 19, 2009 11:23 pm | .Other |
| Yo'av Moshe | Jan 20, 2009 1:44 am | |
| Igor Sysoev | Jan 20, 2009 2:08 am | |
| Yo'av Moshe | Jan 20, 2009 10:25 am | |
| jeff emminger | Jan 20, 2009 11:31 am | |
| Kiril Angov | Jan 20, 2009 11:38 am | |
| Yo'av Moshe | Jan 20, 2009 3:52 pm | |
| Resicow | Jan 20, 2009 4:17 pm | |
| Igor Sysoev | Jan 20, 2009 9:26 pm | |
| Yo'av Moshe | Jan 23, 2009 4:34 am | |
| Igor Sysoev | Jan 23, 2009 5:04 am | |
| Yo'av Moshe | Jan 25, 2009 8:26 am | |
| Igor Sysoev | Jan 25, 2009 9:33 am | |
| Yo'av Moshe | Jan 25, 2009 11:04 am | |
| Igor Sysoev | Jan 25, 2009 11:20 am | |
| Yo'av Moshe | Jan 26, 2009 4:05 am | .gz |
| Yo'av Moshe | Jan 27, 2009 12:42 am | |
| Igor Sysoev | Jan 27, 2009 12:50 am | |
| Igor Sysoev | Jan 27, 2009 1:00 am | |
| Yo'av Moshe | Jan 27, 2009 1:24 am | |
| Igor Sysoev | Jan 27, 2009 1:39 am | .nodelay |
| Yo'av Moshe | Jan 27, 2009 1:58 am | |
| Igor Sysoev | Jan 27, 2009 2:05 am | |
| Yo'av Moshe | Jan 27, 2009 2:24 am | |
| Dave Cheney | Jan 27, 2009 3:03 am | |
| Igor Sysoev | Jan 27, 2009 3:17 am | |
| Jure Pečar | Jan 28, 2009 1:52 pm |
| Subject: | Re: Nginx stops sending file after ~1.5mb ? | |
|---|---|---|
| From: | Yo'av Moshe (bje...@gmail.com) | |
| Date: | Jan 27, 2009 2:24:27 am | |
| List: | ru.sysoev.nginx | |
Right, it's working on the other port. I'll check it again with my VPS.
Sorry for all that.
Yo'av.
On Tue, Jan 27, 2009 at 12:05 PM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Tue, Jan 27, 2009 at 11:58:58AM +0200, Yo'av Moshe wrote:
Applied the patch, still the same. The other only difference I can think about is that I'm running Cherokee on a different port. Maybe my VPS is handling them differently. I'll try testing it on the same port this evening and let you know.
Could you run another nginx on this port ? Or even the same nginx:
server { listen 80; listen PORT;
... }
Thank you.
Yo'av
On Tue, Jan 27, 2009 at 11:40 AM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Tue, Jan 27, 2009 at 11:25:09AM +0200, Yo'av Moshe wrote:
Sure, here my Cherokee's strace: http://pastebin.com/m13cd213d
What you are saying about tcpdump's logs is strange, but I have no idea what's causing it.
I see only two difference with nginx:
1) it uses level triggered epoll (nginx uses edge triggered epoll), 2) and it uses TCP_NODELAY for first request (nginx set TCP_NODELAY only just before going to keep-alive).
The attached patch sets TCP_NODELAY from very start.
Yo'av.
On Tue, Jan 27, 2009 at 10:51 AM, Igor Sysoev <is...@rambler-co.ru> wrote:
On Tue, Jan 27, 2009 at 10:42:57AM +0200, Yo'av Moshe wrote:
Note I just compiled Cherokee 0.98. It is serving the file as it should using sendfile and epoll.
Have you tried this on the same host ? Could you post strace ?
On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bje...@gmail.com> wrote:
Hey, attached is my tcpdump log. I've never used tcpdump before, so I hope the log is fine. I used grep to show only the connections from my computer while running 'wget http://www.noal.org.il/static/temp/hozer.pdf'.
Hope this will lead us somewhere...
Yo'av.
On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is...@rambler-co.ru
wrote:
On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:
Sendfile is now off, using select, and still the same... http://pastebin.com/m2c3ea25b
I have no idea, why it may be so.
I see that select() returned event ("out [19]"):
28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 1 (out [19], left
{0,
380000})
However, at some stage select() did not return event, although
the
event
is still active ("select(20, [10 19], [19], NULL ..."):
28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0 28215 20:50:18 writev(19,
[{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"...,
9676}], 1) = 9676 28215 20:50:18 pread64(20,
"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
32768, 1605632) = 32768 28215 20:50:18 writev(19,
[{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,
32768}], 1) = 32768 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 0 (Timeout)
Could you run tcpdump of the session ?
Yo'av
On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <
is...@rambler-co.ru>
wrote:
On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe wrote:
Igor, I tried using select instead of poll, but I get
the
exact
same
problem.
Check my strace: http://pastebin.com/m65056ec0
I get the same thing with rtsig too.
Hints? Again, file is at
http://www.noal.org.il/static/temp/hozer.pdf.
Server
is
using select now.
According strace select() does not return event for socket
19
after
sendfile64() sent 1651005 bytes:
25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0}
<unfinished
...>
25951 18:14:27 <... select resumed> ) = 1 (out [19],
left
{58,
790000})
25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) = 0 25951 18:14:27 sendfile64(19, 20, [1509465], 440765
<unfinished
...>
25951 18:14:27 <... sendfile64 resumed> ) = 141540 25951 18:14:27 sendfile64(19, 20, [1651005], 299225
<unfinished
...>
25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN
(Resource
temporarily
unavailable)
25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000}
<unfinished
...>
25951 18:14:28 <... select resumed> ) = 0 (Timeout)
As you have got the same issue with all methods (even with
level
triggered
select()), I believe that the bug is in sendfile. You may
try
to
turn
it off even for the single file:
location = /static/temp/hozer.pdf { sendfile off; }
In Linux 2.6.23 sendfile() has been rewritten to use
splice
framework.
The bug may be introduced while rewriting.
BTW, it seems that thttpd does not use sendfile.
Thanks... Yo'av.
On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <
is...@rambler-co.ru>
wrote:
On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe wrote:
Thanks Igor. I don't want to be rude, but I find it hard to
believe
that
it's
a
kernel
bug.
First, I'm using the default CentOS 5.2 kernel, it's
seems
to me
pretty
odd
that there's such a bug in it. CentOS is being used
on a
lot
of
servers,
some of them must be running nginx.
Second, I tried serving the same file with thttpd,
and
it
runs
just
fine.
http://195.28.180.147:40/temp/hozer.pdf Note it's the exact same file which I cannot get
from
nginx:
http://www.noal.org.il/static/temp/hozer.pdf
How come thttpd can serve the file just fine? If
it's a
kernel
bug, I
thought it would happen with all of my servers. If it's using some different way of sending the
files
which
does
not
depend
on that epoll call - is there any way to configure
nginx
to
use
that
as
well?
I do not look modern thttpd, but according its change
log,
it
has
no
epoll support. You may try different ways in nginx:
you
need
to
build
it
--with-rtsig_module --with-select_module --with-poll_module
and then choose method:
events { use select; #use poll; #use rtsig; }
Also, epoll has two modes: level triggered (default)
and
edge
triggered (EPOLLET). nginx uses more effective edge
triggered
mode,
lighttpd uses level triggered one. The bug may be in
ET
mode
only.
As to probabilty of kernel bugs: I saw them in FreeBSD
(kqueue,
sendfile),
Linux (epoll), Solaris (event ports), and MacOSX
(kqueue,
sendfile).
Can I do anything to confirm that it's a kernel bug?
If
it
is, I
should
probably contact CentOS's maintainers.
Your straces confirms this: nginx added socket to
epoll
and
did
not
deleted it, nevertheless epoll does not send event.
Thank you, again. Yo'av.
On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <
is...@rambler-co.ru>
wrote:
On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av
Moshe
wrote:
Igor, here's the new strace. http://pastebin.com/m5129c54b
Does the fact it works in low download rates
still
makes
you
think
it's a
kernel issue?
Yes, this is seems like a kernel issue. Kernel
excatly
said
(with
EAGAIN)
that there is no place to output data, but never
returned
a
new
notification via epoll_wait() about free space:
21420 01:39:43 <... epoll_wait resumed>
{{EPOLLOUT,
{u32=153137960,
u64=13811628390360331048}}}, 512, 60000) = 1 21420 01:39:43 gettimeofday({1232494783, 413764},
NULL) =
0
21420 01:39:43 sendfile64(20, 21, [1645625],
304605) =
40320
21420 01:39:43 sendfile64(20, 21, [1685945],
264285) =
-1
EAGAIN
(Resource
temporarily unavailable) 21420 01:39:43 epoll_wait(11, <unfinished ...>
This is probably some race condition inside
kernel,
that
does
not
appear on
low speed download.
Thank you, really! And thanks, Kiril and Jeff, for confirming that!
Yo'av.
On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <
kupo...@gmail.com>
wrote:
wget http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:36:00-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il
|195.28.180.147|:80...
connected.
HTTP request sent, awaiting response... 200 OK Length: 1950230 (1.9M) [application/pdf] Saving to: `barvazi2.pdf'
81% [=====================================>
]
1,596,665
--.-K/s eta 5s
wget -c http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:38:28--
http://www.noal.org.il/static/temp/barvazi2.pdf
Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il
|195.28.180.147|:80...
connected.
HTTP request sent, awaiting response... 206
Partial
Content
Length: 1950230 (1.9M), 305960 (299K)
remaining
[application/pdf]
Saving to: `barvazi2.pdf'
100%[+++++++++++++++++++++++++++++++++++++++=======>]
1,950,230
295K/s in 1.0s
2009-01-20 21:38:30 (295 KB/s) -
`barvazi2.pdf'
saved
[1950230/1950230]
On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger
<
jemm...@gmail.com>
wrote:
Could anyone confirm that this works: wget --limit-rate=3500 http://www.noal.org.il/static/temp/barvazi2.pdf while this doesn't: wget http://www.noal.org.il/static/temp/barvazi2.pdf
Confirmed
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe
-- Igor Sysoev http://sysoev.ru/en/
-- Yo'av Moshe






.Other