4 messages in com.mysql.lists.clusterRe: Node restart issue| From | Sent On | Attachments |
|---|---|---|
| Adam Dixon | 22 Aug 2005 19:06 | |
| Jonathan Miller | 22 Aug 2005 19:24 | |
| Adam Dixon | 22 Aug 2005 19:37 | |
| Adam Dixon | 23 Aug 2005 21:22 |
| Subject: | Re: Node restart issue![]() |
|---|---|
| From: | Adam Dixon (adam...@gmail.com) |
| Date: | 08/23/2005 09:22:26 PM |
| List: | com.mysql.lists.cluster |
So does anyone have any idea whats happening here.
Just to see, I deleted all the tables in the 2 databases I am running across the cluster, then ran the 'nodeid restart' command. It restarted fine, nice and quick obviously. But as soon as I load the cluster with some data, apon running the restart command i get this still; ndb_mgm> 15 restart Restart failed. * 2003: Restart failed * 5005-Send to process or receive failed.
and the clusterlog shows; 2005-08-24 13:44:05 [MgmSrvr] ALERT -- Node 10: Node 15 Disconnected 2005-08-24 13:44:05 [MgmSrvr] INFO -- Node 10: Communication to Node 15 closed 2005-08-24 13:44:05 [MgmSrvr] INFO -- Node 10: Communication to Node 15 closed 2005-08-24 13:44:05 [MgmSrvr] ALERT -- Node 20: Node 15 Disconnected ..... <more communication closed messages>... 2005-08-24 13:44:05 [MgmSrvr] ALERT -- Node 10: Arbitration check won - node group majority 2005-08-24 13:44:05 [MgmSrvr] INFO -- Node 10: President restarts arbitration thread [state=6] 2005-08-24 13:44:05 [MgmSrvr] INFO -- Mgmt server state: nodeid 15 freed, m_reserved_nodes 0000004842140406. 2005-08-24 13:44:05 [MgmSrvr] WARNING -- Allocate nodeid (15) failed. Connection from ip 10.10.10.15. Returned error string "Id 15 already allocated by another node." 2005-08-24 13:44:05 [MgmSrvr] INFO -- Mgmt server state: node id's 15 connected but not reserved 2005-08-24 13:44:05 [MgmSrvr] INFO -- Mgmt server state: node id's 1 not connected but reserved 2005-08-24 13:44:08 [MgmSrvr] INFO -- Mgmt server state: nodeid 15 reserved for ip 10.10.10.15, m_reserved_nodes 0000004842148406. 2005-08-24 13:44:08 [MgmSrvr] INFO -- Node 1: Node 15 Connected 2005-08-24 13:44:25 [MgmSrvr] INFO -- Node 10: Communication to Node 15 opened 2005-08-24 13:44:25 [MgmSrvr] INFO -- Node 10: Node 15 Connected 2005-08-24 13:44:25 [MgmSrvr] INFO -- Node 35: Communication to Node 15 opened
But the ndbd process comes back just like normal. No trace logs or anything in the ndbd's errorlog. And absolutely nothing suss in the ndb_out log file either. Looks like it happened without error from the ndbd side of things.
Any help on this would be good. Is it a bug in the mgmd side of things?
On 8/23/05, Jonathan Miller <jmil...@mysql.com> wrote:
Adam,
For the mysqld:
" New command in ndb_mgm, PURGE STALE SESSIONS, as a workaround for cases where nodes fail to allocate a node id even if it is free to use."
Was there an error log or trace file on the system that 35 resides on?
JBM
-----Original Message----- From: Adam Dixon [mailto:adam...@gmail.com] Sent: Monday, August 22, 2005 9:07 PM To: clus...@lists.mysql.com Subject: Node restart issue
Bit of a weird one here; I would assume that with no problem with the cluster to begin with from the mgt node running; ndb_mgm> 35 restart
I get this showing up in the ndb_mgm; Restart failed. * 2003: Restart failed * 5005-Send to process or receive failed.
And this in the clusterlog; 2005-08-23 11:16:15 [MgmSrvr] INFO -- Node 35: Node shutdown initiated 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 1: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 10: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 10: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 10: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 15: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 15: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 20: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 25: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 25: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 30: Node 35 Disconnected 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 30: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 15: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 20: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 25: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 30: Communication to Node 35 closed 2005-08-23 11:16:25 [MgmSrvr] ALERT -- Node 10: Arbitration check won - node group majority 2005-08-23 11:16:25 [MgmSrvr] INFO -- Node 10: President restarts arbitration thread [state=6] 2005-08-23 11:16:26 [MgmSrvr] INFO -- Mgmt server state: nodeid 35 freed, m_reserved_nodes 0400000042108402. 2005-08-23 11:16:26 [MgmSrvr] WARNING -- Allocate nodeid (35) failed. Connection from ip 10.10.10.35. Returned error string "Id 35 already allocated by another node." 2005-08-23 11:16:26 [MgmSrvr] INFO -- Mgmt server state: node id's 14 18 23 27 33 35 59 connected but not reserved 2005-08-23 11:16:26 [MgmSrvr] INFO -- Mgmt server state: node id's 1 not connected but reserved 2005-08-23 11:16:29 [MgmSrvr] INFO -- Mgmt server state: nodeid 35 reserved for ip 10.10.10.35, m_reserved_nodes 0400000842108402. 2005-08-23 11:16:29 [MgmSrvr] INFO -- Node 1: Node 35 Connected 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 30: Communication to Node 35 opened 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 20: Communication to Node 35 opened 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 20: Node 35 Connected 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 35: Node 20 Connected 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 35: Node 30 Connected 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 30: Node 35 Connected 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 25: Communication to Node 35 opened 2005-08-23 11:16:45 [MgmSrvr] INFO -- Node 15: Communication to Node 35 opened 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 25: Node 35 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 10: Communication to Node 35 opened 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 35: Node 10 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 35: Node 15 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 35: Node 25 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 10: Node 35 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 15: Node 35 Connected 2005-08-23 11:16:46 [MgmSrvr] INFO -- Node 35: CM_REGCONF president = 10, own Node = 35, our dynamic id = 7
It did come back ok (well it shows up fine in all status and show) so I dont know.. seems a little scary for a managed restart of a node.
I also had something similar when a mysqld crashed, it attempted to connect again, and failed due to the id not being free twice, then it came up on one of the spare id connections i have leftover with no ip tied to it instead of the one that it is supposed to have; eg; [mysqld(API)] 9 node(s) id=2 (not connected, accepting connect from 10.10.10.1) ...(few others) id=33 (Version: 4.1.13) id=38 (not connected, accepting connect from 10.10.10.35) id=58 @10.10.10.1 (Version: 4.1.13) id=59 (Version: 4.1.13)
-- MySQL Cluster Mailing List For list archives: http://lists.mysql.com/cluster To unsubscribe: http://lists.mysql.com/cluster?unsub=jbmi...@mysql.com




