2 messages in com.mysql.lists.cluster: RE: Problem with missed heartbeats
From             Sent On
Jeff Tucker      28 Aug 2006 07:41
Jonathan Miller  29 Aug 2006 07:33
Subject: RE: Problem with missed heartbeats
From: Jonathan Miller (jmil@mysql.com)
Date: 08/29/2006 07:33:41 AM
List: com.mysql.lists.cluster

Try moving mysqld off that system for a while.

I would also check for network issues, e.g. card problems, cables, the switch, etc.

/jeb

Jonathan Miller
Austin, Texas USA
Senior Lead Quality Assurance Developer
MySQL AB
www.mysql.com

Jumpstart your cluster! http://www.mysql.com/consulting/packaged/cluster.html

Get training on clusters http://www.mysql.com/training/courses/mysql_cluster.html

All-in-one Enterprise-grade Database, Support and Services http://www.mysql.com/network/

-> -----Original Message-----
-> From: Jeff Tucker [mailto:je@jltnet.com]
-> Sent: Monday, August 28, 2006 9:42 AM
-> To: clus@lists.mysql.com
-> Subject: Problem with missed heartbeats
->
-> Hi, guys.
->
-> I had an "event" a few minutes ago. I had a similar one a few days ago
-> that actually kicked one node offline. I'd like to avoid these, so I
-> need any advice I can get from you.
->
-> First, I have two machines running a cluster, with a third for the
-> management node. I have the ndb engine running on each machine. I also
-> have mysql running on each of those machines, although I'm only using
-> one of them at the present time.
->
-> So, in my case, node 1 is the management node named db1. Node 2 is ndb
-> on db2. Node 3 is ndb on db3. Node 4 is mysql on db2 and Node 5 is mysql
-> on db3. The nodes are all connected via gigabit ethernet. Here, this
-> should help:
->
->
-> -- NDB Cluster -- Management Client --
-> ndb_mgm> show
-> Connected to Management Server at: localhost:1186
-> Cluster Configuration
-> ---------------------
-> [ndbd(NDB)] 2 node(s)
-> id=2 @192.168.1.82 (Version: 5.0.22, Nodegroup: 0)
-> id=3 @192.168.1.83 (Version: 5.0.22, Nodegroup: 0, Master)
->
-> [ndb_mgmd(MGM)] 1 node(s)
-> id=1 @192.168.1.81 (Version: 5.0.22)
->
-> [mysqld(API)] 2 node(s)
-> id=4 @192.168.1.82 (Version: 5.0.22)
-> id=5 @192.168.1.83 (Version: 5.0.22)
->
->
-> Now, here is what happened a few minutes ago:
->
-> 2006-08-28 10:05:30 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2
-> 2006-08-28 10:05:36 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2
-> 2006-08-28 10:05:44 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2
-> 2006-08-28 10:05:45 [MgmSrvr] INFO -- Node 1: Node 2 Connected
-> 2006-08-28 10:05:45 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 3
-> 2006-08-28 10:05:47 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
-> 2006-08-28 10:05:47 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
-> 2006-08-28 10:05:48 [MgmSrvr] INFO -- Mgmt server state: nodeid 4 freed, m_reserved_nodes 0000000000000022.
-> 2006-08-28 10:05:49 [MgmSrvr] INFO -- Mgmt server state: nodeid 4 reserved for ip 192.168.1.82, m_reserved_nodes 0000000000000032.
-> 2006-08-28 10:05:49 [MgmSrvr] INFO -- Node 4: mysqld --server-id=1
-> 2006-08-28 10:05:51 [MgmSrvr] INFO -- Node 1: Node 2 Connected
-> 2006-08-28 10:05:51 [MgmSrvr] INFO -- Node 3: Communication to Node 4 opened
-> 2006-08-28 10:05:51 [MgmSrvr] INFO -- Node 3: Node 4 Connected
-> 2006-08-28 10:05:51 [MgmSrvr] INFO -- Node 3: Node 4: API version 5.0.22
->
->
-> It appears that communications to Node 2 and Node 4 (which are really
-> the same machine) were lost temporarily maybe. Node 4 disconnected and
-> reconnected automatically. When this happened last week, Node 2 missed
-> more heartbeats and was actually kicked out of the cluster.
->
-> Can you give me some hints on what could be happening here? These are
-> dual-processor Xeon machines. The load average on the busy machine
-> (node2/node4) is around 1.0. The others are lower.
->
-> Thanks for your help.
->
-> Jeff
->
-> --
-> MySQL Cluster Mailing List
-> For list archives: http://lists.mysql.com/cluster
-> To unsubscribe: http://lists.mysql.com/cluster?unsub=jbmi@mysql.com
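
For anyone chasing a similar problem: a quick way to see which node keeps missing heartbeats, and how close it came to being dropped, is to tally the "missed heartbeat" warnings from the cluster log. The Python sketch below is only an illustration and not a MySQL tool. It assumes each log entry sits on one line in the format quoted in this thread; the names `tally_missed_heartbeats` and `SAMPLE` are made up for the example.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt quoted in this thread:
# "2006-08-28 10:05:30 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2"
MISSED_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[MgmSrvr\] WARNING\s+--\s+"
    r"Node (?P<reporter>\d+): Node (?P<missed>\d+) missed heartbeat (?P<count>\d+)"
)


def tally_missed_heartbeats(lines):
    """Return (warnings per node, highest heartbeat counter seen per node)."""
    warnings = Counter()
    worst = {}
    for line in lines:
        line = line.strip()
        # Tolerate the "-> " quote prefix used in this mail.
        if line.startswith("-> "):
            line = line[3:]
        m = MISSED_RE.match(line)
        if m is None:
            continue  # not a missed-heartbeat warning
        node = int(m.group("missed"))
        warnings[node] += 1
        worst[node] = max(worst.get(node, 0), int(m.group("count")))
    return warnings, worst


# Hypothetical input: a few entries copied from the quoted log, one per line.
SAMPLE = [
    "2006-08-28 10:05:30 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2",
    "2006-08-28 10:05:36 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2",
    "2006-08-28 10:05:44 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2",
    "2006-08-28 10:05:45 [MgmSrvr] INFO -- Node 1: Node 2 Connected",
    "2006-08-28 10:05:45 [MgmSrvr] WARNING -- Node 3: Node 2 missed heartbeat 3",
]

warnings, worst = tally_missed_heartbeats(SAMPLE)
for node in sorted(warnings):
    print(f"Node {node}: {warnings[node]} warnings, highest counter {worst[node]}")
```

Run against a longer stretch of the log, the per-node counts may help show whether the warnings cluster around one machine (pointing at its card or cable) or are spread across nodes (pointing more toward the switch).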