3 messages in com.mysql.lists.cluster: odd failure

| From | Sent On | Attachments |
|---|---|---|
| B. Keith Murphy | 19 Sep 2007 12:15 | |
| B. Keith Murphy | 20 Sep 2007 10:21 | |
| Stewart Smith | 23 Sep 2007 06:39 | |

| Subject: | odd failure |
|---|---|
| From: | B. Keith Murphy (kmur...@icontact.com) |
| Date: | 09/19/2007 12:15:18 PM |
| List: | com.mysql.lists.cluster |
I have set up a development cluster for our developers. It consists of two
physical servers, each running the SQL daemon and a data node, with the
management node running on another server.
About an hour and a half ago the SQL node on one of the two servers stopped
responding. The data node part was still responding and showing up in the
ndb_mgm console. As you can see, node 4 started missing heartbeats at 1:06 pm.
2007-09-19 13:06:59 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:08:57 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:14:43 [MgmSrvr] INFO -- Node 2: Local checkpoint 134 started. Keep GCI = 207358 oldest restorable GCI = 207369
2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3
2007-09-19 13:33:49 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:50 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 4
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 declared dead due to missed heartbeat
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2007-09-19 13:33:50 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2007-09-19 13:33:53 [MgmSrvr] INFO -- Node 3: Communication to Node 4 opened
2007-09-19 13:33:54 [MgmSrvr] INFO -- Node 3: Node 4 Connected
2007-09-19 13:33:55 [MgmSrvr] INFO -- Node 2: Communication to Node 4 opened
2007-09-19 13:33:56 [MgmSrvr] INFO -- Node 2: Node 4 Connected
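
For anyone who wants to tally messages like these, here is a minimal Python sketch. It assumes the cluster log lines look exactly like the excerpt above; the function name `summarize_missed_heartbeats` and both regular expressions are illustrative and not part of any MySQL tool.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the excerpt above:
#   <date> <time> [MgmSrvr] <LEVEL> -- Node <reporter>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] "
    r"(?P<level>\w+) -- Node (?P<reporter>\d+): (?P<msg>.*)$"
)
# Matches only the numbered warnings, not the "declared dead" alert.
MISSED_RE = re.compile(r"Node (?P<target>\d+) missed heartbeat (?P<count>\d+)")


def summarize_missed_heartbeats(lines):
    """Group missed-heartbeat warnings by (reporting node, missing node).

    Returns a dict mapping (reporter, target) to a list of
    (timestamp string, heartbeat count) tuples in log order.
    """
    summary = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip wrapped continuation lines and anything unrecognized
        hb = MISSED_RE.search(m.group("msg"))
        if hb:
            key = (int(m.group("reporter")), int(hb.group("target")))
            summary[key].append((m.group("ts"), int(hb.group("count"))))
    return dict(summary)


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
2007-09-19 13:06:59 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:33:44 [MgmSrvr] WARNING -- Node 2: Node 4 missed heartbeat 2
2007-09-19 13:33:46 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2
2007-09-19 13:33:48 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3
2007-09-19 13:33:50 [MgmSrvr] WARNING -- Node 3: Node 4 missed heartbeat 4
2007-09-19 13:33:50 [MgmSrvr] ALERT -- Node 3: Node 4 declared dead due to missed heartbeat
"""

if __name__ == "__main__":
    for (reporter, target), events in summarize_missed_heartbeats(SAMPLE.splitlines()).items():
        print(f"Node {reporter} reported {len(events)} missed-heartbeat warnings for node {target}")
```

Grouping by reporting node makes it easier to see whether both data nodes lost contact with node 4 at the same moment or only one of them did.
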
I could log into the MySQL server node as normal and was able to switch
databases and list tables. Any query against a table (select * from users, for
instance) would give error 157.
The two servers I have set up (each running a sql node and a data node) are
running in virtual machines on the same server. So I can't figure out why the
heartbeat failed. The management node is on another server, but it is on the
same network.
To get things going I ended up shutting everything down and restarting. I
couldn't get the mysql processes on the SQL nodes to shut down normally
(/etc/init.d/mysql stop) and had to kill the processes on one server. On the
second server I ended up rebooting the server just to shut it down. Once
everything was reset it looked fine. I can start and stop the mysql nodes,
etc. Everything looks normal.
Oh, I am running 5.1.20 all around on 64-bit Debian etch.
Any suggestions?
thanks,
Keith
--
B. Keith Murphy
Database Administrator
iContact
2635 Meridian Parkway, 2nd Floor
Durham, North Carolina 27713
blog: http://blog.paragon-cs.com
(o) 919-433-0786
(c) 850-637-3877




