Subject: Max number of open files exceeded / Error while reading REDO log / I can't start my cluster!
From: Alex Davies (davi@gmail.com)
Date: 02/27/2005 10:54:40 AM
List: com.mysql.lists.cluster

Dear All,

I have a three-server cluster that I am trying to restart. For various reasons it got SHUTDOWN (cleanly). When I attempt to restart it I am getting all sorts of problems. When I run ndbd on each storage machine, the management server shows them as "starting" for about 5 minutes. It then shows one server up:

[ndbd(NDB)] 2 node(s)
id=2 @81.29.81.196 (Version: 4.1.9, Nodegroup: 0, Master)
id=3 @81.29.81.197 (Version: 4.1.9, starting, Nodegroup: 0)

But after another few minutes, both disconnect:

[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from 81.29.81.196)
id=3 (not connected, accepting connect from 81.29.81.197)

The errors are:

Server ID 2:
Date/Time: x 27 February 2005 - 18:50:53
Type of error: error
Message: Max number of open files exceeded
Fault ID: 2806
Problem data:
Object of reference: Ndbfs::createAsyncFile
ProgramName: ndbd
ProcessID: 3864
TraceFile: /var/lib/mysql-cluster/ndb_2_trace.log.10
***EOM***

Server ID 3:
Date/Time: x 27 February 2005 - 18:50:57
Type of error: error
Message: Node failed during system restart
Fault ID: 2308
Problem data: Unhandled node failure during restart
Object of reference: NDBCNTR (Line: 1389) 0x0000000e
ProgramName: ndbd
ProcessID: 26476
TraceFile: /var/lib/mysql-cluster/ndb_3_trace.log.1
***EOM***

Please note I have tried to increase /proc/sys/fs/file-max to avoid the number of open files problem but it has not worked.
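For comparison, /proc/sys/fs/file-max is the system-wide ceiling on open file handles, while each process also has its own limit (RLIMIT_NOFILE, the value shown by ulimit -n). The following is a minimal sketch, assuming Linux and Python, that prints both. The function name open_file_limits is hypothetical, and the script reports the limits of its own process, so it would need to be run as the same user and from the same shell that starts ndbd. Nothing in this message establishes that either limit is what triggers the error above.

```python
import resource
from pathlib import Path


def open_file_limits(file_max_path="/proc/sys/fs/file-max"):
    """Return the per-process and system-wide open-file limits.

    The per-process values come from RLIMIT_NOFILE for the current
    process; the system-wide value is read from file_max_path and is
    None when that file does not exist (e.g. on a non-Linux system).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    path = Path(file_max_path)
    system_wide = int(path.read_text().split()[0]) if path.exists() else None
    return {"soft": soft, "hard": hard, "system_wide": system_wide}


if __name__ == "__main__":
    for name, value in open_file_limits().items():
        print(f"{name}: {value}")
```

If the soft value is much lower than the system-wide value, raising file-max alone would not change what a single process is allowed to open.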

Any ideas? Thanks everyone for your help as usual,

Alex

PS - I am starting the server with ID 3 from an empty DataDir with the ndbd --initial command because I got a horrible error with it earlier (below), but I am using just plain ndbd on server ID 2 because I want/need to keep the data on the cluster.

Server 2 error log:
Date/Time: x 27 February 2005 - 18:22:10
Type of error: error
Message: Error while reading the REDO log
Fault ID: 2310
Problem data: Error while reading REDO log. D=8, F=0 Mb=0 FP=1 W1=35 W2=0
Object of reference: DBLQH (Line: 14928) 0x0000000a
ProgramName: ndbd
ProcessID: 25653
TraceFile: /var/lib/mysql-cluster/ndb_3_trace.log.12
***EOM***
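
The error log entries quoted in this message all use the same "Label: value" fields and end with ***EOM***. A minimal Python sketch for splitting one entry into its fields, so that several entries can be compared by Fault ID or TraceFile, might look like this. The function name parse_entry is hypothetical, and the label list is taken only from the entries shown here, so other ndbd versions may use fields this sketch does not know about.

```python
import re

# Field labels exactly as they appear in the entries quoted in this message.
FIELDS = [
    "Date/Time", "Type of error", "Message", "Fault ID", "Problem data",
    "Object of reference", "ProgramName", "ProcessID", "TraceFile",
]
_LABEL = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + "):")


def parse_entry(text):
    """Split one error log entry into a {label: value} dict.

    Works whether the fields are on separate lines or run together on
    one line, because it splits on the known labels, not on newlines.
    """
    body = text.split("***EOM***")[0]
    # re.split with one capture group yields:
    # [prefix, label1, value1, label2, value2, ...]
    parts = _LABEL.split(body)
    return {label: value.strip() for label, value in zip(parts[1::2], parts[2::2])}


if __name__ == "__main__":
    entry = (
        "Date/Time: x 27 February 2005 - 18:50:53 Type of error: error "
        "Message: Max number of open files exceeded Fault ID: 2806 "
        "Problem data: Object of reference: Ndbfs::createAsyncFile "
        "ProgramName: ndbd ProcessID: 3864 "
        "TraceFile: /var/lib/mysql-cluster/ndb_2_trace.log.10 ***EOM***"
    )
    for label, value in parse_entry(entry).items():
        print(f"{label} = {value!r}")
```

An empty field such as "Problem data" in the first entry comes back as an empty string instead of being dropped.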