Subject: Re: Node dies after DELETE
From: Martin Skold (Mart@mysql.com)
Date: 12/01/2006 12:21:22 AM
List: com.mysql.lists.cluster

Hi!

Please file a bug report with instructions on how to reproduce.

BR -- Martin

Kevin Burton wrote:

OUCH.. now it gets worse..... I can't even delete from the table with a DELETE FROM FOO LIMIT 10000 as it causes all the data nodes to restart. The mgm console logs this:

ndb_mgm> Node 4: Forced node shutdown completed, restarting. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 3: Forced node shutdown completed, restarting. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed, restarting. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 5: Forced node shutdown completed, restarting. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 4: Forced node shutdown completed, restarting. Occured during startphase 1. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart

Ouch..... and I just looked at the 5.1.13 changelog and there weren't any bugs listed for this. Maybe I can get a test case set up......

OUCH..... this bug is reproducible......... at least with my data set.

Want me to file a bug? I'm running 5.1.12 with the ON DUPLICATE KEY UPDATE patch.... maybe I should look at the recent NDB changes in 5.1.13...

I had one node right now die after the following SQL:

mysql> DELETE FROM POST_NODE;
ERROR 1297 (HY000): Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER
mysql> DELETE FROM POST_NODE LIMIT 10000;
ERROR 1297 (HY000): Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER

.........

Now the thing is I did it AGAIN and another node died. No ndbd process at ALL.

The mgm node reports this:

Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

I'm going to try a rolling restart...... maybe that will fix the problem.
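
For the error 233 part only (the failed ndbrequire crash itself still needs a bug report, as Martin says): the message indicates that a single unbounded DELETE needs more operation records than MaxNoOfConcurrentOperations allows in one transaction. One common approach is to delete in small batches and commit after each one. What follows is a minimal sketch, not something taken from this thread: delete_in_batches is a hypothetical helper, the batch size of 1000 is an arbitrary assumption, and conn is assumed to be any Python DB-API 2.0 connection to a server that accepts DELETE ... LIMIT.

```python
def delete_in_batches(conn, table, batch_size=1000, max_batches=None):
    """Delete every row from `table` in small, separately committed batches.

    Hypothetical helper (not part of MySQL or any driver).
    `conn` is assumed to be a DB-API 2.0 connection whose server
    supports `DELETE ... LIMIT n` (MySQL does).
    `table` is interpolated into the SQL text, so it must be a trusted
    identifier, never user input.
    Returns the total number of rows deleted.
    """
    total = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        cur = conn.cursor()
        try:
            # Each statement touches at most `batch_size` rows, which keeps
            # the number of operations in one transaction bounded.
            cur.execute(f"DELETE FROM {table} LIMIT {int(batch_size)}")
            deleted = cur.rowcount
        finally:
            cur.close()
        # Commit per batch so the operations are released before the next one.
        conn.commit()
        if deleted <= 0:
            break  # nothing left to delete
        total += deleted
        batches += 1
    return total
```

Usage would be something like delete_in_batches(conn, "POST_NODE", batch_size=1000). If a batch of that size still takes a node down, as DELETE FROM POST_NODE LIMIT 10000 did above, this loop only gives a smaller reproduction for the bug report; it does not work around the crash.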