| From | Sent On | Attachments |
|---|---|---|
| François Battail | Apr 30, 2008 12:29 pm | |
| Manlio Perillo | Apr 30, 2008 1:16 pm | |
| François Battail | Apr 30, 2008 1:39 pm | |
| Manlio Perillo | May 1, 2008 1:30 am | |
| François Battail | May 1, 2008 1:58 am | |
| Manlio Perillo | May 1, 2008 2:39 am | |
| François Battail | May 1, 2008 5:42 am | |
| Rt Ibmer | May 1, 2008 9:59 am | |
| Grzegorz Nosek | May 1, 2008 11:35 am | |
| Manlio Perillo | May 1, 2008 11:57 am | |
| François Battail | May 1, 2008 1:25 pm | |
| Manlio Perillo | May 2, 2008 1:52 am | |
| François Battail | May 2, 2008 6:50 am | |
| Rt Ibmer | May 2, 2008 10:31 am | |

| Subject: | Re: Feature requestED: monitoring Nginx from the outside | |
|---|---|---|
| From: | François Battail (fb-f...@public.gmane.org) | |
| Date: | May 2, 2008 6:50:50 am | |
| List: | ru.sysoev.nginx | |
On Friday, 2 May 2008 at 10:52 +0200, Manlio Perillo wrote:
> The problem with this is that the script can arbitrarily block Nginx if it holds the lock for too much time.
I would not call sem_wait() but sem_trywait(), of course! If Nginx cannot write because a script holds the semaphore, it simply skips that update and the script reads the old values. I don't see an issue.
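To make the non-blocking idea concrete, here is a minimal sketch of what the writer side could look like. The names stats_block_t and stats_try_publish are made up for illustration and are not Nginx code; it also assumes a POSIX unnamed semaphore, which in a real setup would sit in shared memory and be initialised with a non-zero pshared argument.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical shared block (assumption, not Nginx code). In practice it
   would live in shared memory so that an external script can map it. */
typedef struct {
    sem_t lock;      /* binary semaphore guarding the values */
    long  requests;  /* example monitored value */
} stats_block_t;

/* Try to publish a new value without ever blocking the caller.
   Returns  1 if the value was written,
            0 if somebody else holds the semaphore (update skipped,
              readers keep seeing the old value),
           -1 on any other error. */
static int stats_try_publish(stats_block_t *b, long requests)
{
    assert(b != NULL);

    if (sem_trywait(&b->lock) != 0) {
        /* EAGAIN means the semaphore is currently taken. */
        return (errno == EAGAIN) ? 0 : -1;
    }

    b->requests = requests;
    sem_post(&b->lock);
    return 1;
}
```

With this shape, a slow or stuck script can at worst make the published values stale; it cannot stall the server.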
> Ok, but I think that providing a file system interface is not the better solution.
> If you want to monitor global variables, then you can use the stub_status module (maybe adding new global shared variables).
stub_status works, but it's not the cleanest code in Nginx, and there's no simple way to extend the set of watched variables: you have to modify other modules individually, using blocks of conditional compilation. If you modify the stub_status output you potentially break the Collectd and Nagios plugins. A file system interface is universal, which means it will be easy to use with whatever tool you want. A monitoring agent written in C will be happy to read a file, and a little less happy if it needs a WWW library or has to execute wget to fetch the data.
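As an illustration of how little a C agent needs when the data is just a file, here is a sketch of a line parser. It assumes a plain-text format with one key:value pair per line and integer values; the function name parse_stat_line and the splitting on the last ':' (so that keys may themselves contain colons, e.g. a host:port) are my assumptions, nothing is decided yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "key:value" line (hypothetical format, integer values only).
   The key is copied into `key` (at most key_size - 1 characters).
   Returns 0 on success, -1 if the line is malformed or the key does
   not fit. An optional trailing newline is accepted. */
static int parse_stat_line(const char *line, char *key, size_t key_size,
                           long *value)
{
    const char *sep;
    char       *end;
    size_t      key_len;
    long        v;

    assert(line != NULL && key != NULL && value != NULL);

    /* Split on the last ':' so the key itself may contain colons. */
    sep = strrchr(line, ':');
    if (sep == NULL || sep == line) {
        return -1;                      /* no separator or empty key */
    }

    key_len = (size_t) (sep - line);
    if (key_len >= key_size) {
        return -1;                      /* key would be truncated */
    }

    errno = 0;
    v = strtol(sep + 1, &end, 10);
    if (end == sep + 1 || errno == ERANGE) {
        return -1;                      /* no digits or out of range */
    }
    if (*end != '\0' && *end != '\n') {
        return -1;                      /* trailing garbage */
    }

    memcpy(key, line, key_len);
    key[key_len] = '\0';
    *value = v;
    return 0;
}
```

The agent does not need to understand the key to report or graph the value, which is the whole point of a generic format.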
That's why I propose two things:
1) A generic interface for monitoring agents
The easiest one: a file-like interface exposing a list of key:value pairs. Even if the monitoring agent doesn't know the semantics of a key, it can report the value back and a graph can be made. Of course it's possible to modify stub_status (and break compatibility) to do the same thing, but that would be of no help for point 2. I don't know yet whether it will be shared memory or a regular mmap()ed file (file locking on Unices is a complete mess :( ).
2) An API
An API that helps other modules provide variables for monitoring. At the cost of one indirection, it may be possible to choose at runtime whether a variable is monitored, and therefore whether to do an atomic_t operation or not. If a module offers some variables for monitoring, the user can choose whether or not to monitor them. That's value for the software *and* for the user.
The API could be as simple as:
ngx_monitoring_value_t * ngx_register_monitoring_value (ngx_str_t * name, ngx_str_t * command_name, ngx_int_t option) ;
void ngx_monitoring_value_add (ngx_monitoring_value_t * value, ngx_int_t nbr) ;
void ngx_monitoring_value_set (ngx_monitoring_value_t * value, ngx_int_t new_value) ;
For example, in the case of the upstream server round robin module, code would be like this (pseudo code):
init:
    servers = array of ngx_monitoring_value_t * [nbr_servers]
    for each upstream server
        servers [i] = ngx_register_monitoring_value ("upstream-status-"+server_name [i], "upstream_server_status", 0) ;
    ...

run:
    if (event == down) {
        ...
        ngx_monitoring_value_set (servers [i], 0) ;
    } else if (event == up) {
        ...
        ngx_monitoring_value_set (servers [i], 1) ;
    }
Just put "monitor upstream_server_status ;" in nginx.conf and my module will do all the atomic_t stuff; otherwise it will use normal operations. Cost at runtime: one function call and one conditional per variable...
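To show where the "one function call and one conditional" cost comes from, here is a stand-alone sketch. It deliberately does not use Nginx types: monitoring_value_t and the monitoring_value_* functions are hypothetical stand-ins for the proposed API, and C11 atomics stand in for Nginx's atomic_t operations. In a real module the atomic counter would live in shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed ngx_monitoring_value_t. */
typedef struct {
    const char  *name;       /* key reported to the monitoring agent */
    int          monitored;  /* chosen at configuration time */
    atomic_long  shared;     /* used only when monitored */
    long         local;      /* plain value used otherwise */
} monitoring_value_t;

static void monitoring_value_init(monitoring_value_t *v, const char *name,
                                  int monitored)
{
    assert(v != NULL && name != NULL);
    v->name = name;
    v->monitored = monitored;
    atomic_init(&v->shared, 0);
    v->local = 0;
}

/* One conditional per call: atomic update if monitored, plain otherwise. */
static void monitoring_value_add(monitoring_value_t *v, long n)
{
    if (v->monitored) {
        atomic_fetch_add(&v->shared, n);
    } else {
        v->local += n;
    }
}

static void monitoring_value_set(monitoring_value_t *v, long new_value)
{
    if (v->monitored) {
        atomic_store(&v->shared, new_value);
    } else {
        v->local = new_value;
    }
}

static long monitoring_value_get(monitoring_value_t *v)
{
    return v->monitored ? atomic_load(&v->shared) : v->local;
}
```

A module would call monitoring_value_set (value, 0) when an upstream server goes down and monitoring_value_set (value, 1) when it comes back, without knowing whether the user enabled monitoring for that variable.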
> If you want to monitor things like gzip compression ratio, then just implement a custom variable $gzip_ratio that the user can use in the log file.
OK, gzip ratio was not the best real-life example ;-) but imagine you have MRTG graphs and important values in the log. You run stress tests for 24 h and, 1.4 × 10^9 requests later, the error log is 100 MB long. Digging through that log to correlate it with load, for example, looks like a nightmare, doesn't it?
Just a different example where I want logging *and* monitoring. I have a special Nginx module with a circular buffer used to communicate with threads. If the buffer overflows, I log it, but it would be nice (for me) to have a circular buffer overflow counter included in the monitored set. Of course I can "hack" stub_status and the collectd plugin, but it's better to propose a more general solution without breaking anything and with no significant performance hit.
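For the circular buffer case, the counter I have in mind is as simple as the sketch below. This is a simplified, single-threaded illustration with made-up names (ring_t, ring_push, ring_pop), not the actual module: the real buffer talks to threads and would need proper synchronisation. The overflows field is the value one would hand to the monitoring interface, in addition to logging the event.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 4

/* Hypothetical fixed-size circular buffer with an overflow counter. */
typedef struct {
    int           slots[RING_CAPACITY];
    size_t        head;       /* index of the oldest item */
    size_t        count;      /* number of items stored */
    unsigned long overflows;  /* pushes rejected because the buffer was full */
} ring_t;

static void ring_init(ring_t *r)
{
    assert(r != NULL);
    r->head = 0;
    r->count = 0;
    r->overflows = 0;
}

/* Returns 0 on success, -1 (and counts an overflow) if the buffer is full. */
static int ring_push(ring_t *r, int item)
{
    assert(r != NULL);
    if (r->count == RING_CAPACITY) {
        r->overflows++;           /* this is what monitoring would watch */
        return -1;
    }
    r->slots[(r->head + r->count) % RING_CAPACITY] = item;
    r->count++;
    return 0;
}

/* Returns 0 and stores the oldest item in *item, or -1 if empty. */
static int ring_pop(ring_t *r, int *item)
{
    assert(r != NULL && item != NULL);
    if (r->count == 0) {
        return -1;
    }
    *item = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    return 0;
}
```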
Thank you very much for your time and your input, Manlio. Even if we don't agree on some points, it is very stimulating for me to have a sparring partner such as you.
Best regards.





