Subject: Re: Feature request: Run a script when upstream detected down/up
From: Cliff Wells (clif@public.gmane.org)
Date: Apr 28, 2008 2:18:14 pm
List: ru.sysoev.nginx

On Mon, 2008-04-28 at 14:02 -0700, Rt Ibmer wrote:

>> This sounds like a job for a heartbeat monitor, not a web server.

> For our needs this would be best handled by nginx. Here's why... Nginx is the first one to know that it considers a server down and has stopped routing traffic to it until fail_timeout occurs.

Well, it *might* be, depending on the timing of the heartbeat and whether/when a particular request causes Nginx to try that backend.

> So regardless of whether it's right and the upstream is really down, or was tripped by a false positive, the bottom line is that it is now ignoring that upstream for the fail_timeout duration.
>
> Currently nginx is the only one that knows this. So yes, I can use Heartbeat or whatever other monitoring tools are out there. Those tools can say an upstream is up or down, but nginx could see the upstream's state differently (i.e., monitoring could say it's up when in fact it missed a condition that led nginx to consider the upstream down; monitoring goes on saying the upstream is fine while nginx is treating it as offline, and all the while we have no idea of this).
>
> Bottom line is that it doesn't make any difference whether a monitoring script says an upstream server is down or not. What matters is whether nginx considers it down or not. And for me to know that, nginx needs to tell me.

But it does. It's in your error logs. There are alternate loggers that can even run scripts when a regex is matched (metalog for one). I've used metalog successfully to deter brute-force SSH attacks, for example.

http://metalog.sourceforge.net/

Metalog is available in most Linux distros (I've used it on Gentoo and Fedora).
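
To make the log-watching idea concrete, here is a minimal sketch in Python rather than metalog's own configuration. It scans error-log lines for a few phrases and pulls out the upstream address. The phrases in the regex are assumptions about what your error log contains when a backend fails, so check them against your own log first. The hook path is hypothetical.

```python
import re
import subprocess
import sys
from typing import Iterable, Iterator, Sequence

# Assumed failure phrases; verify them against your own error log.
UPSTREAM_ERROR = re.compile(
    r'(?:upstream timed out|connect\(\) failed|no live upstreams)'
    r'.*?upstream: "(?P<upstream>[^"]+)"'
)

# Hypothetical hook script; replace with whatever should be notified.
HOOK = ("/usr/local/bin/upstream-down.sh",)


def find_upstream_failures(lines: Iterable[str]) -> Iterator[str]:
    """Yield the upstream address from each line that looks like a failure."""
    for line in lines:
        match = UPSTREAM_ERROR.search(line)
        if match:
            yield match.group("upstream")


def notify(upstream: str, hook: Sequence[str] = HOOK) -> subprocess.Popen:
    """Start the hook with the upstream address as its only argument."""
    return subprocess.Popen([*hook, upstream])


def main() -> None:
    """Read log lines from standard input and run the hook for each failure."""
    for upstream in find_upstream_failures(sys.stdin):
        notify(upstream)
```

One way to use it would be to pipe `tail -F` of the error log into a script that calls `main()`. Because this runs outside the web server, a slow or failing hook cannot hold up request handling.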

> The beauty of it is that it seems like quite a trivial yet very useful function to implement. Basically, wherever the code is that decides to ignore an upstream for fail_timeout, it just needs to launch some script and pass it a parameter such as the name of the upstream entity that went down. Seems like something that could be done in just minutes. Unfortunately I'm not a coder or I would take a crack at it.

Except that Nginx is asynchronous, not threaded. This means that when your script is called, Nginx will now be delayed while the script is launched (and what if the script fails?). You might be able to work around this, but I suspect it won't be as trivial as you might hope.
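
As an illustration of the general workaround (not of how Nginx itself is written, and not a patch for it), the sketch below starts a hook without waiting for it to finish and checks on it later without blocking. It is Python, the function names are made up for this example, and starting a process still has a cost of its own even when nobody waits for the result.

```python
import subprocess
import sys
from typing import List, Sequence


def launch_without_waiting(argv: Sequence[str]) -> subprocess.Popen:
    """Start the hook and return as soon as the child has been started."""
    return subprocess.Popen(
        list(argv),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def reap_finished(children: List[subprocess.Popen]) -> List[subprocess.Popen]:
    """Check each child without blocking and return those still running.

    A child that exited with a non-zero status is reported on stderr,
    which covers the "what if the script fails?" case.
    """
    still_running = []
    for child in children:
        code = child.poll()  # None while the child is still running
        if code is None:
            still_running.append(child)
        elif code != 0:
            print(f"hook {child.args!r} exited with status {code}", file=sys.stderr)
    return still_running
```

An event loop would call `reap_finished` periodically from a timer, so neither launching nor cleanup ever waits on the script.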

Regards, Cliff