atom feed39 messages in ru.sysoev.nginxRe: Nginx stops sending file after ~1...
FromSent OnAttachments
Yo'av MosheJan 11, 2009 5:43 am 
Jure PečarJan 11, 2009 9:50 am 
Yo'av MosheJan 11, 2009 10:46 am 
Jure PečarJan 11, 2009 11:07 am 
Yo'av MosheJan 11, 2009 11:22 am 
Jure PečarJan 11, 2009 11:35 am 
Igor SysoevJan 12, 2009 2:28 am 
Yo'av MosheJan 18, 2009 3:35 pm 
Dave CheneyJan 18, 2009 4:24 pm 
Yo'av MosheJan 19, 2009 2:49 am 
Just MarcJan 19, 2009 4:25 am 
Yo'av MosheJan 19, 2009 4:59 pm 
Igor SysoevJan 19, 2009 11:23 pm.Other
Yo'av MosheJan 20, 2009 1:44 am 
Igor SysoevJan 20, 2009 2:08 am 
Yo'av MosheJan 20, 2009 10:25 am 
jeff emmingerJan 20, 2009 11:31 am 
Kiril AngovJan 20, 2009 11:38 am 
Yo'av MosheJan 20, 2009 3:52 pm 
ResicowJan 20, 2009 4:17 pm 
Igor SysoevJan 20, 2009 9:26 pm 
Yo'av MosheJan 23, 2009 4:34 am 
Igor SysoevJan 23, 2009 5:04 am 
Yo'av MosheJan 25, 2009 8:26 am 
Igor SysoevJan 25, 2009 9:33 am 
Yo'av MosheJan 25, 2009 11:04 am 
Igor SysoevJan 25, 2009 11:20 am 
Yo'av MosheJan 26, 2009 4:05 am.gz
Yo'av MosheJan 27, 2009 12:42 am 
Igor SysoevJan 27, 2009 12:50 am 
Igor SysoevJan 27, 2009 1:00 am 
Yo'av MosheJan 27, 2009 1:24 am 
Igor SysoevJan 27, 2009 1:39 am.nodelay
Yo'av MosheJan 27, 2009 1:58 am 
Igor SysoevJan 27, 2009 2:05 am 
Yo'av MosheJan 27, 2009 2:24 am 
Dave CheneyJan 27, 2009 3:03 am 
Igor SysoevJan 27, 2009 3:17 am 
Jure PečarJan 28, 2009 1:52 pm 
Subject:Re: Nginx stops sending file after ~1.5mb ?
From:Yo'av Moshe (bje@gmail.com)
Date:Jan 27, 2009 2:24:27 am
List:ru.sysoev.nginx

Right, it's working on the other port. I'll check it again with my VPS.

Sorry for all that.

Yo'av.

On Tue, Jan 27, 2009 at 12:05 PM, Igor Sysoev <is@rambler-co.ru> wrote:

On Tue, Jan 27, 2009 at 11:58:58AM +0200, Yo'av Moshe wrote:

Applied the patch, still the same. The other only difference I can think about is that I'm running Cherokee on a different port. Maybe my VPS is handling them differently. I'll try testing it on the same port this evening and let you know.

Could you run another nginx on this port ? Or even the same nginx:

server { listen 80; listen PORT;

... }

Thank you.

Yo'av

On Tue, Jan 27, 2009 at 11:40 AM, Igor Sysoev <is@rambler-co.ru> wrote:

On Tue, Jan 27, 2009 at 11:25:09AM +0200, Yo'av Moshe wrote:

Sure, here my Cherokee's strace: http://pastebin.com/m13cd213d

What you are saying about tcpdump's logs is strange, but I have no idea what's causing it.

I see only two difference with nginx:

1) it uses level triggered epoll (nginx uses edge triggered epoll), 2) and it uses TCP_NODELAY for first request (nginx set TCP_NODELAY only just before going to keep-alive).

The attached patch sets TCP_NODELAY from very start.

Yo'av.

On Tue, Jan 27, 2009 at 10:51 AM, Igor Sysoev <is@rambler-co.ru> wrote:

On Tue, Jan 27, 2009 at 10:42:57AM +0200, Yo'av Moshe wrote:

Note I just compiled Cherokee 0.98. It is serving the file as it should using sendfile and epoll.

Have you tried this on the same host ? Could you post strace ?

On Mon, Jan 26, 2009 at 2:05 PM, Yo'av Moshe <bje@gmail.com> wrote:

Hey, attached is my tcpdump log. I've never used tcpdump before, so I hope the log is fine. I used grep to show only the connections from my computer while running 'wget http://www.noal.org.il/static/temp/hozer.pdf'.

Hope this will lead us somewhere...

Yo'av.

On Sun, Jan 25, 2009 at 9:21 PM, Igor Sysoev <is@rambler-co.ru

wrote:

On Sun, Jan 25, 2009 at 09:04:54PM +0200, Yo'av Moshe wrote:

Sendfile is now off, using select, and still the same... http://pastebin.com/m2c3ea25b

I have no idea, why it may be so.

I see that select() returned event ("out [19]"):

28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 1 (out [19], left

{0,

380000})

However, at some stage select() did not return event, although

the

event

is still active ("select(20, [10 19], [19], NULL ..."):

28215 20:50:18 gettimeofday({1232909418, 293895}, NULL) = 0 28215 20:50:18 writev(19,

[{"\227\1\2\1\3\1\4\1\5\0\373\0\374\1\230\1\231\1\232\1\233\0\375\0\376\1\6\1\7\1\10\0"...,

9676}], 1) = 9676 28215 20:50:18 pread64(20,

"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,

32768, 1605632) = 32768 28215 20:50:18 writev(19,

[{"\24\7\6#\"\'&\0214\22$63\1\6\25\24\26\27\02632654&#\"\6\3\226\204"...,

32768}], 1) = 32768 28215 20:50:18 pread64(20, "\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768, 1638400) = 32768 28215 20:50:18 writev(19, [{"\0 \0M\0o\0r\0i\0s\0o\0n\0\'\0s\0 \0d\0i\0r\0e\0c"..., 32768}], 1) = 1716 28215 20:50:18 select(20, [10 19], [19], NULL, {0, 500000} <unfinished ...> 28215 20:50:18 <... select resumed> ) = 0 (Timeout)

Could you run tcpdump of the session ?

Yo'av

On Sun, Jan 25, 2009 at 7:33 PM, Igor Sysoev <

is@rambler-co.ru>

wrote:

On Sun, Jan 25, 2009 at 06:26:58PM +0200, Yo'av Moshe wrote:

Igor, I tried using select instead of poll, but I get

the

exact

same

problem.

Check my strace: http://pastebin.com/m65056ec0

I get the same thing with rtsig too.

Hints? Again, file is at

http://www.noal.org.il/static/temp/hozer.pdf.

Server

is

using select now.

According strace select() does not return event for socket

19

after

sendfile64() sent 1651005 bytes:

25951 18:14:26 select(20, [7 8 16 19], [19], NULL, {60, 0}

<unfinished

...>

25951 18:14:27 <... select resumed> ) = 1 (out [19],

left

{58,

790000})

25951 18:14:27 gettimeofday({1232900067, 890376}, NULL) = 0 25951 18:14:27 sendfile64(19, 20, [1509465], 440765

<unfinished

...>

25951 18:14:27 <... sendfile64 resumed> ) = 141540 25951 18:14:27 sendfile64(19, 20, [1651005], 299225

<unfinished

...>

25951 18:14:27 <... sendfile64 resumed> ) = -1 EAGAIN

(Resource

temporarily

unavailable)

25951 18:14:27 select(20, [16 19], [19], NULL, {0, 500000}

<unfinished

...>

25951 18:14:28 <... select resumed> ) = 0 (Timeout)

As you have got the same issue with all methods (even with

level

triggered

select()), I believe that the bug is in sendfile. You may

try

to

turn

it off even for the single file:

location = /static/temp/hozer.pdf { sendfile off; }

In Linux 2.6.23 sendfile() has been rewritten to use

splice

framework.

The bug may be introduced while rewriting.

BTW, it seems that thttpd does not use sendfile.

Thanks... Yo'av.

On Fri, Jan 23, 2009 at 3:05 PM, Igor Sysoev <

is@rambler-co.ru>

wrote:

On Fri, Jan 23, 2009 at 02:35:00PM +0200, Yo'av Moshe wrote:

Thanks Igor. I don't want to be rude, but I find it hard to

believe

that

it's

a

kernel

bug.

First, I'm using the default CentOS 5.2 kernel, it's

seems

to me

pretty

odd

that there's such a bug in it. CentOS is being used

on a

lot

of

servers,

some of them must be running nginx.

Second, I tried serving the same file with thttpd,

and

it

runs

just

fine.

http://195.28.180.147:40/temp/hozer.pdf Note it's the exact same file which I cannot get

from

nginx:

http://www.noal.org.il/static/temp/hozer.pdf

How come thttpd can serve the file just fine? If

it's a

kernel

bug, I

thought it would happen with all of my servers. If it's using some different way of sending the

files

which

does

not

depend

on that epoll call - is there any way to configure

nginx

to

use

that

as

well?

I do not look modern thttpd, but according its change

log,

it

has

no

epoll support. You may try different ways in nginx:

you

need

to

build

it

--with-rtsig_module --with-select_module --with-poll_module

and then choose method:

events { use select; #use poll; #use rtsig; }

Also, epoll has two modes: level triggered (default)

and

edge

triggered (EPOLLET). nginx uses more effective edge

triggered

mode,

lighttpd uses level triggered one. The bug may be in

ET

mode

only.

As to probabilty of kernel bugs: I saw them in FreeBSD

(kqueue,

sendfile),

Linux (epoll), Solaris (event ports), and MacOSX

(kqueue,

sendfile).

Can I do anything to confirm that it's a kernel bug?

If

it

is, I

should

probably contact CentOS's maintainers.

Your straces confirms this: nginx added socket to

epoll

and

did

not

deleted it, nevertheless epoll does not send event.

Thank you, again. Yo'av.

On Wed, Jan 21, 2009 at 7:26 AM, Igor Sysoev <

is@rambler-co.ru>

wrote:

On Wed, Jan 21, 2009 at 01:52:53AM +0200, Yo'av

Moshe

wrote:

Igor, here's the new strace. http://pastebin.com/m5129c54b

Does the fact it works in low download rates

still

makes

you

think

it's a

kernel issue?

Yes, this is seems like a kernel issue. Kernel

excatly

said

(with

EAGAIN)

that there is no place to output data, but never

returned

a

new

notification via epoll_wait() about free space:

21420 01:39:43 <... epoll_wait resumed>

{{EPOLLOUT,

{u32=153137960,

u64=13811628390360331048}}}, 512, 60000) = 1 21420 01:39:43 gettimeofday({1232494783, 413764},

NULL) =

0

21420 01:39:43 sendfile64(20, 21, [1645625],

304605) =

40320

21420 01:39:43 sendfile64(20, 21, [1685945],

264285) =

-1

EAGAIN

(Resource

temporarily unavailable) 21420 01:39:43 epoll_wait(11, <unfinished ...>

This is probably some race condition inside

kernel,

that

does

not

appear on

low speed download.

Thank you, really! And thanks, Kiril and Jeff, for confirming that!

Yo'av.

On Tue, Jan 20, 2009 at 9:39 PM, Kiril Angov <

kupo@gmail.com>

wrote:

wget http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:36:00-- http://www.noal.org.il/static/temp/barvazi2.pdf Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il

|195.28.180.147|:80...

connected.

HTTP request sent, awaiting response... 200 OK Length: 1950230 (1.9M) [application/pdf] Saving to: `barvazi2.pdf'

81% [=====================================>

]

1,596,665

--.-K/s eta 5s

wget -c http://www.noal.org.il/static/temp/barvazi2.pdf --2009-01-20 21:38:28--

http://www.noal.org.il/static/temp/barvazi2.pdf

Resolving www.noal.org.il... 195.28.180.147 Connecting to www.noal.org.il

|195.28.180.147|:80...

connected.

HTTP request sent, awaiting response... 206

Partial

Content

Length: 1950230 (1.9M), 305960 (299K)

remaining

[application/pdf]

Saving to: `barvazi2.pdf'

100%[+++++++++++++++++++++++++++++++++++++++=======>]

1,950,230

295K/s in 1.0s

2009-01-20 21:38:30 (295 KB/s) -

`barvazi2.pdf'

saved

[1950230/1950230]

On Tue, Jan 20, 2009 at 9:31 PM, jeff emminger

jemm@gmail.com>

wrote:

Could anyone confirm that this works: wget --limit-rate=3500 http://www.noal.org.il/static/temp/barvazi2.pdf while this doesn't: wget http://www.noal.org.il/static/temp/barvazi2.pdf

Confirmed