Subject:Re: DB node hang on start
From:Tomas Ulin (tom@mysql.com)
Date:06/21/2004 04:37:11 AM
List:com.mysql.lists.cluster

Did you try to start the second node with "ndbd -i"?

T

Brancaleoni Matteo wrote:

Hi, thanks for the fast answer :) see my comments inline.

On Mon, 2004-06-21 at 00:43, Tomas Ulin wrote:

first of all, if you download the latest source you don't have to specify the "[TCP]" connections at all

Ok, done.

1) please look where you started ndb_mgmd, you should find a cluster.log (look at the end "tail -n100 cluster.log")

ok, got it. Unfortunately there is no trace of db node #3, which is the one on the remote machine.

2) please make sure that you don't have any trailing "ndbd" processes on the failing machine (we're working on better detection of clashes). If you do, kill them and restart. If a "ndb" process hangs, it is often because there are "multiple" processes trying to connect as the same "id".

ok. no trailing processes.

3) make sure you have your [COMPUTER] sections correct in the config file

ok, done

4) make sure that your Ndb.cfg/NDB_CONNECTSTRING points to the actual host:port that runs ndb_mgmd

Sure, done. If I write something wrong there (done just for testing), the node doesn't enter the starting phase at all (it should be phase 1, I think). But when it does start, it stays stuck in that state.

and try again until you get the config right
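Point 2 above (no leftover ndbd processes on the failing machine) is easy to check by hand with ps. As a rough sketch only, the same check could be scripted along these lines. The function names `ndbd_pids` and `running_ndbd_pids` are made up for illustration, and the sketch assumes a POSIX-style `ps` that accepts `-e` and `-o pid= -o comm=`; it only lists candidates and does not kill anything.

```python
import subprocess


def ndbd_pids(ps_output):
    """Return PIDs of processes whose command name is exactly 'ndbd'.

    Expects lines of the form '<pid> <command>' with no header line.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == "ndbd":
            pids.append(int(parts[0]))
    return pids


def running_ndbd_pids():
    """Ask ps for 'PID COMMAND' pairs (headers suppressed) and filter for ndbd."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return ndbd_pids(result.stdout)
```

Calling `running_ndbd_pids()` on the failing machine before starting the node gives the list of ndbd processes that are already there; anything unexpected in that list is a candidate for the "multiple processes with the same id" clash described in point 2.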

Hmm... I tried to start 2 db nodes on the same machine (with different file system paths, of course). The 2nd db node starts, but crashes after phase #4.

I have a rather long trace file for that. The error in the ndbd error.log is:

Date/Time: x 20 June 2004 - 23:15:49
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbdihMain.cpp
Object of reference: DBDIH (Line: 1080) 0x00000002
ProgramName: NDB Kernel
ProcessID: 10904
TraceFile: NDB_TraceFile_1.trace
***EOM***

The mgm config is (for 2 db nodes on same machine):

[COMPUTER]
Id: 1
ByteOrder: Little
HostName: bestia

[COMPUTER]
Id: 2
ByteOrder: Little
HostName: bestia

[MGM]
Id: 1
ExecuteOnComputer: 1
ArbitrationRank: 1

[DB DEFAULT]
NoOfReplicas: 2

[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data1

[DB]
Id: 3
ExecuteOnComputer: 2
FileSystemPath: /root/ndb/ndb_data2

[API]
Id: 4
ExecuteOnComputer: 1
ArbitrationRank: 1
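For point 3 (correct [COMPUTER] sections), a small script can at least rule out the mechanical mistakes: a node whose ExecuteOnComputer names a computer Id that is not defined, or two nodes sharing the same Id. The following is only a sketch under assumptions: it handles just the "Key: Value" layout used in the config quoted above, `parse_sections` and `check_config` are hypothetical helper names, and it is not how ndb_mgmd itself validates the file.

```python
CONFIG = """\
[COMPUTER]
Id: 1
ByteOrder: Little
HostName: bestia
[COMPUTER]
Id: 2
ByteOrder: Little
HostName: bestia
[MGM]
Id: 1
ExecuteOnComputer: 1
ArbitrationRank: 1
[DB DEFAULT]
NoOfReplicas: 2
[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /root/ndb/ndb_data1
[DB]
Id: 3
ExecuteOnComputer: 2
FileSystemPath: /root/ndb/ndb_data2
[API]
Id: 4
ExecuteOnComputer: 1
ArbitrationRank: 1
"""


def parse_sections(text):
    """Parse '[SECTION]' headers and 'Key: Value' lines into (name, dict) pairs.

    Section names repeat ([COMPUTER], [DB]), so a list is used, not a dict.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_config(text):
    """Return a list of problems found; an empty list means none were found."""
    sections = parse_sections(text)
    computers = {s["Id"] for name, s in sections
                 if name == "COMPUTER" and "Id" in s}
    problems, seen_ids = [], set()
    for name, s in sections:
        # [COMPUTER] and "... DEFAULT" sections do not describe a node.
        if name == "COMPUTER" or name.endswith("DEFAULT"):
            continue
        node_id = s.get("Id")
        if node_id in seen_ids:
            problems.append(f"duplicate node Id {node_id} in [{name}]")
        seen_ids.add(node_id)
        comp = s.get("ExecuteOnComputer")
        if comp not in computers:
            problems.append(f"[{name}] Id {node_id}: unknown computer {comp}")
    return problems
```

`check_config(CONFIG)` finds no problems in the file above, which fits the observation that the nodes get as far as the start phases. A check like this cannot catch a HostName that is syntactically valid but points at the wrong machine.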

Regarding 2 db nodes on different machines, I'm stuck with node #3 not starting (it stops at phase 1, without exiting...). The only difference in the mgm config.ini is the hostname of the COMPUTER with id #2.

Any clue?