Subject: Re: Is it possible to monitor the fair proxy balancer?
From: Alexander Staubo (alex@public.gmane.org)
Date: Jun 28, 2008 4:02:11 pm
List: ru.sysoev.nginx

On Sat, Jun 28, 2008 at 9:54 PM, Grzegorz Nosek <grze@public.gmane.org> wrote:

I'd like to gather ideas about how to notify the outside world. A log message? Sending a signal somewhere? An SNMP trap? Every way has its advantages and disadvantages, so I'd like to pick the one that sucks the least.

Why just one? A status page supplemented by machine-readable log output is a good solution that I think would satisfy most sysadmins.
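A minimal sketch of "a status page supplemented by machine-readable log output": the same snapshot of balancer state is rendered once as a human-readable table and once as one tag-followed-by-key=value line per backend. The per-backend fields (`active`, `fails`, `down`) and the `backend` tag are hypothetical; the fair balancer does not necessarily track or expose them under these names.

```python
from dataclasses import dataclass


@dataclass
class BackendState:
    # Hypothetical per-backend counters; the field names are illustrative only.
    address: str
    active: int   # requests currently in flight
    fails: int    # recent failures
    down: bool    # whether the balancer currently considers it unavailable


def status_page(backends):
    """Human-readable table for a status page."""
    rows = [f"{'backend':<22}{'active':>7}{'fails':>7}  state"]
    for b in backends:
        state = "DOWN" if b.down else "up"
        rows.append(f"{b.address:<22}{b.active:>7}{b.fails:>7}  {state}")
    return "\n".join(rows)


def log_lines(backends):
    """One machine-readable 'tag key=value ...' line per backend."""
    return [
        f"backend addr={b.address} active={b.active} fails={b.fails} down={int(b.down)}"
        for b in backends
    ]


if __name__ == "__main__":
    snapshot = [
        BackendState("127.0.0.1:10000", 3, 0, False),
        BackendState("127.0.0.1:10001", 0, 5, True),
    ]
    print(status_page(snapshot))
    for line in log_lines(snapshot):
        print(line)
```

The table is for humans glancing at a status page; the fixed `key=value` lines are what a monitoring script would grep or graph.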

Pardon me for asking a naive question, but to change the list of backends, wouldn't you simply edit the config file and send a SIGHUP? It would reset whatever internal structures are kept by the workers, but I can't think of anything there that's not okay to lose.

Yes. That's the obvious solution but apparently not always acceptable, especially when you'd want to use an external monitoring system to do this automatically.

What's simpler for an external monitoring system than sending a signal to a process?
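For a monitoring script, sending the signal is essentially one call. A minimal sketch, assuming the master process PID is stored in a pid file; the default path `/var/run/nginx.pid` is an assumption (it depends on the `pid` directive and on how nginx was built). On SIGHUP the nginx master re-reads its configuration and starts new workers with it.

```python
import os
import signal


def reload_nginx(pid_file="/var/run/nginx.pid"):
    """Ask the nginx master process to reload its configuration.

    The default pid_file path is an assumption; check the `pid`
    directive in your configuration.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    # SIGHUP: the master re-reads the config and replaces its workers.
    os.kill(pid, signal.SIGHUP)
    return pid
```

A monitoring system would rewrite the upstream block (for example, dropping a dead backend) and then call `reload_nginx()`.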

Of course, you could go all the way and do a Varnish-style admin interface. I have mentioned Varnish before on this list. Varnish has a pretty clever admin/monitoring infrastructure. For example, you can load multiple configs and selectively enable them:

    $ varnishadm vcl.load test /etc/varnish/test.vcl
    $ varnishadm vcl.use test
    # ... something goes horribly wrong ...
    $ varnishadm vcl.use boot

The use of named configs means the input can be anything (even your default set of config files). You can load it, try it out, and unload it.
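The load/use/unload workflow boils down to a registry of named configurations with exactly one active entry, so switching and rolling back are just pointer changes. The class below is a hypothetical sketch; its method names echo the `vcl.*` commands only loosely and it is not how Varnish is implemented. It assumes the startup config is registered as `boot`, as in the commands above, and refuses to unload whichever config is active.

```python
class ConfigRegistry:
    """Named configurations, one of which is active (hypothetical sketch)."""

    def __init__(self, boot_config):
        # The startup configuration is registered under the name "boot".
        self._configs = {"boot": boot_config}
        self.active = "boot"

    def load(self, name, config):
        if name in self._configs:
            raise KeyError(f"config {name!r} already loaded")
        self._configs[name] = config

    def use(self, name):
        if name not in self._configs:
            raise KeyError(f"no config named {name!r}")
        self.active = name  # switching is only a pointer change

    def unload(self, name):
        if name == self.active:
            raise ValueError("cannot unload the active config")
        del self._configs[name]

    def current(self):
        return self._configs[self.active]
```

Usage mirrors the session above: `reg.load("test", cfg)`, `reg.use("test")`, and if something goes horribly wrong, `reg.use("boot")` followed by `reg.unload("test")`.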

You could do worse than looking at Varnish's logging system for ideas. Varnish uses circular buffers in shared memory for logging, and its logs are explicitly machine-readable, each line being a tag followed by a value. So log output looks like this:

    14 Debug        c "Hash Match: /-/cache/border/w=6;h=6;sw=true;sx=0;sy=3;sbr=10;sbs=5;sm=10;sp=0;c=fff;t=r_24.png#origo.no#"
    14 Hit          c 1402130806
    14 VCL_call     c hit
    14 VCL_return   c deliver
    14 Length       c 217
    14 VCL_call     c deliver
    14 VCL_return   c deliver
    14 TxProtocol   c HTTP/1.1
    14 TxStatus     c 200
    14 TxResponse   c OK
    14 TxHeader     c Status: 200 OK

and so on.

In addition to making it superbly easy for scripts to graph, analyze and monitor activity in real time, this lets you tail the log for specific events or strings. And since it's all RAM-based, you can get real-time, low-overhead debug output immediately, without changing any configuration settings or reloading the daemon. As far as I know, Varnish only logs when someone is listening to the log output, and only what the listener is filtering for, but I could be wrong.
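A minimal sketch of how a script could consume such output, assuming each record is a numeric identifier, a tag, a one-letter marker and a free-form value, as in the excerpt above. This parses the text format only; it is not Varnish's own log-reading API. The `only()` helper gives the "tail the log for specific events" behaviour.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# numeric id, tag, one-letter marker, then the rest of the line as the value
_RECORD = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S)\s+(.*)$")


class Record(NamedTuple):
    ident: int
    tag: str
    marker: str
    value: str


def parse(lines: Iterable[str]) -> Iterator[Record]:
    for line in lines:
        m = _RECORD.match(line)
        if m:  # silently skip lines that do not look like records
            yield Record(int(m.group(1)), m.group(2), m.group(3), m.group(4))


def only(tags, records):
    """Keep only records whose tag is in `tags`."""
    wanted = set(tags)
    return (r for r in records if r.tag in wanted)
```

For example, `for r in only({"TxStatus"}, parse(sys.stdin)): print(r.value)` would print just the response status codes from log text piped into the script.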

Using shared memory with Nginx's worker-process model should not pose any problems, as each worker could maintain its own shared-memory buffer and thus avoid the need for locking.
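A minimal in-process sketch of the per-worker circular buffer, assuming one buffer per worker so that each has exactly one writer: the writer never takes a lock or waits for readers, old records are simply overwritten, and a reader that falls behind learns how many records it missed. A real implementation would place the slots in shared memory (for example an `mmap`-ed region) and would need care with memory ordering when publishing the write position; both are omitted here, and the plain Python list below is not shared between processes.

```python
class RingLog:
    """Fixed-size circular log with a single writer (sketch, not shared memory)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.written = 0  # total records ever written; only the writer updates it

    def write(self, record):
        # Overwrite the oldest slot; the writer never waits for readers.
        self.slots[self.written % self.capacity] = record
        self.written += 1

    def read_since(self, seq):
        """Return (records from sequence `seq` on that are still buffered,
        number of records lost to overwriting, next sequence to ask for)."""
        oldest = max(0, self.written - self.capacity)
        start = max(seq, oldest)
        lost = start - seq
        records = [self.slots[i % self.capacity] for i in range(start, self.written)]
        return records, lost, self.written
```

Because readers only ever read and keep their own position, any number of them can attach and detach without the worker noticing, which is what makes "listen only when you want to" cheap.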

- a new option, e.g. max_requests 10 10 20 20 (specifying the number for each backend in the order of server directives)

That's a horrible syntax and one that is going to cause problems as you add or remove backends from the config. A max_requests setting belongs on each backend declaration.

Like I wrote in the snipped part, I cannot easily add options to the server directives (at least not without patching nginx or reinventing the square wheel). I don't like the max_requests idea either, for precisely the same reason. I presume that means overloading weight=X is at least acceptable.

I think you have to push Igor for a more flexible internal infrastructure. :-)

Even something string-based would work, though it would be hackier than a true syntax:

server 127.0.0.1:10000 option <key>=<value> [option ...];

E.g.,

server 127.0.0.1:10000 option fair.max_conns=5;
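To make the proposal concrete, here is a sketch of how a module could pick its own namespaced keys out of such a directive while ignoring everything else. The `option key=value` syntax is only the hypothetical one proposed above (nginx does not accept it), and this is Python for illustration, not an nginx directive handler.

```python
def parse_server_directive(line, namespace):
    """Parse 'server ADDR ... option ns.key=value ...;' (hypothetical syntax).

    Returns the address and a dict of the options that belong to
    `namespace`; options for other modules are ignored.
    """
    tokens = line.strip().rstrip(";").split()
    if len(tokens) < 2 or tokens[0] != "server":
        raise ValueError(f"not a server directive: {line!r}")
    address, rest = tokens[1], tokens[2:]
    options = {}
    i = 0
    while i < len(rest):
        if rest[i] != "option" or i + 1 >= len(rest):
            i += 1  # leave other parameters (e.g. weight=5) to nginx itself
            continue
        key, sep, value = rest[i + 1].partition("=")
        if not sep:
            raise ValueError(f"expected key=value after 'option', got {rest[i + 1]!r}")
        ns, dot, name = key.partition(".")
        if dot and ns == namespace:
            options[name] = value
        i += 2
    return address, options
```

The namespace prefix (`fair.`) is what keeps the scheme extensible: each module reads only its own keys, so adding or removing a backend line never shifts anyone else's settings.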

You should only return an error if a request cannot be served within a given timeout, not when all backends are full.

Will have to think about it. This has the potential for busy-looping when all the backends are indeed full (or down, but then one can just send a hard error and be done with it). I don't think nginx has a way to be told "everything is unavailable now, come back to me in a second or two" or, even better, "I'll tell you when to ask me again".

I think Nginx needs something like this.
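"I'll tell you when to ask me again" is the classic wait-with-timeout pattern: a request waits on a condition variable until a backend slot is released or its deadline passes, so nothing busy-loops, and an error is returned only on timeout rather than as soon as all backends are full. A minimal thread-based sketch; the pool, its per-backend `max_conns` limit and the least-busy choice are assumptions for illustration, not how nginx or the fair balancer schedules requests (nginx workers are event-driven rather than thread-per-request).

```python
import threading
import time


class BackendPool:
    """Hands out backend slots; callers wait (without spinning) when all are full."""

    def __init__(self, max_conns):
        # max_conns: hypothetical per-backend concurrency limit, e.g. {"a": 1}
        self.max_conns = dict(max_conns)
        self.active = {name: 0 for name in self.max_conns}
        self.cond = threading.Condition()

    def _free_backend(self):
        free = [b for b in self.active if self.active[b] < self.max_conns[b]]
        return min(free, key=lambda b: self.active[b]) if free else None

    def acquire(self, timeout):
        """Return a backend name, or None if none frees up within `timeout` seconds."""
        deadline = time.monotonic() + timeout
        with self.cond:
            while True:
                backend = self._free_backend()
                if backend is not None:
                    self.active[backend] += 1
                    return backend
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None  # the caller turns this into an error (e.g. a 503)
                # Sleep until release() notifies us or the deadline passes.
                self.cond.wait(remaining)

    def release(self, backend):
        with self.cond:
            self.active[backend] -= 1
            self.cond.notify()  # "ask me again now": wake one waiting request
```

Each release frees exactly one slot, so waking a single waiter with `notify()` is enough; the loop re-checks the state after waking because another request may have taken the slot first.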

Alexander.