3 messages in com.mysql.lists.cluster: RE: NDBD process always going down

From            Sent On
Casto, Robert   18 Jan 2005 11:10
Martin Pála     19 Jan 2005 00:06
Casto, Robert   19 Jan 2005 08:51
Subject:RE: NDBD process always going down
From:Casto, Robert (rca@amazon.com)
Date:01/19/2005 08:51:01 AM
List:com.mysql.lists.cluster

Martin,

I have a pretty extensive internally developed monitoring tool that I use to
keep track of everything.

My real question is why the angel process is shutting down. I was under the
impression that it would stay running no matter what. The log entry I gave shows
it shutting down after endTakeOver. I don't understand why this is happening or what
it is. If there is some way to configure it so that it does not shut down, I
would be most grateful for the information.

Robert Casto Quality Measurements 206.266.3695

-----Original Message-----
From: Martin Pála [mailto:Mart@oskar.cz]
Sent: Wednesday, January 19, 2005 12:07 AM
To: Casto, Robert; clus@lists.mysql.com
Subject: RE: NDBD process always going down

I am noticing that after a couple of hours, the NDBD process goes down. The
SHOW command in the NDB_MGM program shows me that one of the nodes has
disconnected. I can reactivate it and it comes back fine, but I want to keep it
alive all the time.

I read about an "angel" process that watches the ndbd process, but I don't see
any way to configure it. The documentation says it is possible, but I can't find how.

The angel is the parent process of the ndbd server, for example:

--8<--
root@ndb1:[/root]# ps -ef | grep ndb
    root  8625     1  0 08:38:01 ?        0:00 /usr/local/mysql/bin/ndbd
    root  8626  8625  4 08:38:01 ?        0:01 /usr/local/mysql/bin/ndbd
--8<--

The angel has PID 8625; its task is to check whether its child (currently
PID 8626) is running:

--8<--
root@ndb1:[/root]# truss -p 8625
waitid(P_PID, 8626, 0xFFBFFB20, WEXITED|WTRAPPED) (sleeping...)
--8<--
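To make the parent/child relationship concrete, here is a minimal Python sketch of an angel-style supervisor. It is not ndbd's actual code: the function name `run_angel`, the `max_restarts` limit, and the rule "restart after a non-zero or abnormal exit, stop after a clean exit" are hypothetical assumptions chosen for illustration. Like the angel in the truss output, the parent blocks in `waitid(P_PID, <child>, ..., WEXITED)` until its specific child exits. The sketch also shows one way an angel can end up exiting itself: once its child stops cleanly, or the restart budget runs out, the loop ends.

```python
import os


def run_angel(child_main, max_restarts=3):
    """Hypothetical angel-style supervisor (Unix only, needs os.waitid).

    Forks a child that runs child_main(attempt), waits for that specific
    child like the waitid(P_PID, ...) call seen in truss, and restarts it
    after an abnormal exit. Returns a list of (pid, status) per exited child.
    """
    history = []
    for attempt in range(max_restarts + 1):
        pid = os.fork()
        if pid == 0:
            # Child: run the workload; any exception or non-int result
            # becomes exit code 1. os._exit never returns to parent code.
            try:
                code = int(child_main(attempt))
            except BaseException:
                code = 1
            os._exit(code)
        # Parent (the "angel"): block until this child exits.
        info = os.waitid(os.P_PID, pid, os.WEXITED)
        # For CLD_EXITED, si_status is the exit code; for CLD_KILLED it
        # is the signal number that terminated the child.
        history.append((info.si_pid, info.si_status))
        if info.si_code == os.CLD_EXITED and info.si_status == 0:
            break  # hypothetical policy: clean shutdown, so the angel stops too
    return history
```

For example, a `child_main` that returns 1 on its first two attempts and 0 on the third yields three entries in the returned history, after which the supervisor itself returns.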

If you want to make sure that the angel is running at all times, you have to use
independent monitoring, for example Monit
(http://www.tildeslash.com/monit). It's an open-source application that
checks processes and performs custom actions when they fail. For ndbd, it is
useful to check that the angel is running. You can also monitor disk usage and
the CPU and memory usage of the angel itself plus its children (the ndbd server),
which lets you detect unusual behavior such as a memory leak. Monit can run from
init in respawn mode, has web and command-line interfaces, and offers remote
methods (via HTTP) for monitoring and service control, so you can integrate it
with the MySQL Cluster.