Subject: Re: Two NDB nodes taking a while to restart after crash
From: Geert Vanderkelen (gee@mysql.com)
Date: 05/19/2007 05:09:13 AM
List: com.mysql.lists.cluster

Hi Andy,

On May 19, 2007, at 13:52, Andy Smith wrote: ..

2007-05-19 11:56:43 [ndbd] INFO -- Error handler shutting down system
2007-05-19 11:56:45 [ndbd] INFO -- Error handler shutdown completed - exiting
2007-05-19 11:56:45 [ndbd] ALERT -- Node 5: Forced node shutdown completed. Initiated by signal 0. Caused by error 2306: 'Pointer too large(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

This could be a problem that has since been fixed, since 5.0.24 is quite old.

2007-05-19 12:13:00 [MgmSrvr] INFO -- Node 5: Receive arbitrator node 1 [ticket=68170002a3fb8cd5]
2007-05-19 12:13:01 [MgmSrvr] INFO -- Node 5: Start phase 4 completed (node restart)
..
2007-05-19 12:13:03 [MgmSrvr] INFO -- Node 6: Start phase 1 completed
2007-05-19 12:13:04 [MgmSrvr] INFO -- Node 6: Receive arbitrator node 1 [ticket=68170002a3fb8cd5]

however that's all I've seen since then, and it's now 12:51. Is this normal? It's never taken this long before.

I think Node 6 is waiting for Node 5 to complete its start. If Node 5 is stuck in phase 4, that would .. well.. take forever. You could try killing node 5 and checking whether node 6 then comes up. If nothing helps, you might need to do an initial start on one or both of them. Depending on data size, that can take a while too, but they were not down that long.
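To see at a glance where each node stopped making progress, the "Start phase N completed" lines in the cluster log can be reduced to the last phase reported per node. The following is only a rough sketch, not an official tool: the function name last_start_phase and the regular expression are made up for illustration, and the pattern assumes the log line layout shown in the excerpt above.

```python
import re

# Assumed layout, taken from the excerpt above:
# "2007-05-19 12:13:01 [MgmSrvr] INFO -- Node 5: Start phase 4 completed (node restart)"
PHASE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] INFO -- "
    r"Node (\d+): Start phase (\d+) completed"
)


def last_start_phase(lines):
    """Return {node_id: (timestamp, phase)} for the latest phase line of each node."""
    latest = {}
    for line in lines:
        match = PHASE_RE.match(line.strip())
        if match:
            timestamp = match.group(1)
            node_id = int(match.group(2))
            phase = int(match.group(3))
            # Later lines overwrite earlier ones, so the last report wins.
            latest[node_id] = (timestamp, phase)
    return latest


# Sample input: the four management-server lines quoted in this thread.
sample = [
    "2007-05-19 12:13:00 [MgmSrvr] INFO -- Node 5: Receive arbitrator node 1 [ticket=68170002a3fb8cd5]",
    "2007-05-19 12:13:01 [MgmSrvr] INFO -- Node 5: Start phase 4 completed (node restart)",
    "2007-05-19 12:13:03 [MgmSrvr] INFO -- Node 6: Start phase 1 completed",
    "2007-05-19 12:13:04 [MgmSrvr] INFO -- Node 6: Receive arbitrator node 1 [ticket=68170002a3fb8cd5]",
]

progress = last_start_phase(sample)
for node_id, (timestamp, phase) in sorted(progress.items()):
    print(f"node {node_id}: last completed start phase {phase} at {timestamp}")
```

On the quoted lines this reports phase 4 for node 5 and phase 1 for node 6, which matches the reading that node 6 is held back while node 5 has not moved past phase 4.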

Make sure you have backups! (Can't repeat it too much.)

Cheers,