6 messages in com.mysql.lists.clusterRe: Major problems migrating to a clu...| From | Sent On | Attachments |
|---|---|---|
| James Lesworth | 23 Aug 2006 08:43 | |
| Apostolos Pantsiopoulos | 23 Aug 2006 11:58 | |
| James Lesworth | 23 Aug 2006 12:02 | |
| Apostolos Pantsiopoulos | 23 Aug 2006 12:05 | |
| Stewart Smith | 23 Aug 2006 18:15 | |
| Brendan Bouffler | 23 Aug 2006 19:12 |
| Subject: | Re: Major problems migrating to a cluster env from a single device![]() |
|---|---|
| From: | Apostolos Pantsiopoulos (mys...@easy-things.com) |
| Date: | 08/23/2006 11:58:23 AM |
| List: | com.mysql.lists.cluster |
I am only going to comment 2 things that I saw at the end of your post.
DataMemory = 80M IndexMemory = 20M
I don't know the size of your db (ndb_size.pl would show you that), but since you are using cluster (on dual xeons with 4-6 GB) I am guessing that it is quite big. The mysql-cluster stores everything in memory. So you use the equation shown in the documentation to specify the amount of ram you need. And then update the above parameters. If your DB is 1G. Then you need about : 1.1 GB of ram dedicated for ndb in order 2 ndbs with 2 replicas (like your config) to work.
The second thing I have noticed is that you put the ndb_mgmd on one of the NDB machines. I don't know if this really works, but from my experience it is somehow buggy. Try putting the ndb_mgmd service on another machine (even a celeron 300 would do the job).
James Lesworth wrote:
Hello list,
Yesterday we completed what we thought was a successful upgrade from a single MySQL database to a nice new ndb clustered system. This morning all hell broke lose and we have had to roll back to our previous setup. I have had a dig around on the list, docs, and goggle and pestered everyone I know who knows even a little about ndb but with no luck. If you have a few spare minutes could you have a look over the details below and tell me what I did wrong as I have reached the limit of what I can debug.
Many thanks in advance.
James
The issue we are getting that transaction time goes though the roof. Statements that took a few msec before start to take 40+ seconds. However, this is not always the case. We have cleaned the database of all data and moved it back into the live envy, i.e. it has 0 rows, things seem ok for about 40seconds and then the statement time goes up again, that’s with maybe 300 rows. In the logs on the devices issuing the insert query ( listed below ) we have a lot of these errors DBD::mysql::st execute failed: Lock wait timeout exceeded; try restarting transaction DBD::mysql::st execute failed: Got error 4350 'Transaction already aborted' from ndbcluster at Which seems to suggest its a locking issue. I turned on the slow log and was seeing almost every transaction in it, # Query_time: 58 Lock_time: 3 Rows_sent: 3 Rows_examined: 8307 select user,pass,time,sessions.session from `attributes` left join sessions on attributes.sso_id=sessions.sso_id where `attributes`.`sso_id`='c92a3c8e2702c87fc24fff518d983139'; There were many others with Lock times up towards 20seconds
While the NDB was in the live env I picked one of the insert statements and was running it on the MySQL console. It would run ok for a period of time and then would suddenly take 30-50seconds to complete. The few requests might also take the same period of time, but after that it would return to normal again. After a period of time where the query time had dropped back down it would once again rise up again. On my second test of the ndb back in the live environment after about 90seconds all queries stopped and I had to once again back out the changes. Sometimes I would see a Lock error reported on the console other times the statement would complete successfully after a few msec or up to 50 seconds.
My first suspicion was a network issue, as was everyone else I have spoken to. However, having looking on the switch and the devices I can see no rx or tx errors. I have setup a `ping -f -s 1200` between both devices for a good hour and not seen a single packet lost. I have also tested the speed between the two using a large file and I get a between 90-110MByte each way.
All the logs on the NDB and sql show no indication of anything being wrong.
The load on the cluster is not that great at present either, school holidays and its 90% used by them. Averaging about 31 statements a second with ~17% of the total statements being inserts and ~2% being updates.
The error message that the DBI dumps out seems to indicate that it’s a locking issue, which would make sense as to why everything kind of stops. However, I do not understand why the cluster can not take this level of load. It’s not like its doing a lot really. The old setup using myisam tables one of the devices was doing ~10x the work the ndb version was doing. I would have hoped to equal that with the increase in hardware. As such I do not think that is the real issue but a precursor to something else that I am missing.
Well I think that’s everything. If you got this far thanks for reading and if there is any other info you need about this please let me know.
- Cluster setup All devices are Dell Blades with 4-6 Gig RAM and Dual Xeon's Its a simple 2 device cluster, its purely for resilience as we do a fair amount of updates/inserts and these are critical to the system. It’s why we discounted a slave/master setup. device ndb_1 (10.0.0.1) runs ndbd, ndb_mgmd and a mysqld device ndb_2 (10.0.0.2) runs ndbd and mysqld We have a NLB to split the in coming connections between the two mysqld procs
there are a number of nodes that access the database ( 10 to be exact ) using apache/mod_perl and the DBI modules
-NDB manager config
[NDBD DEFAULT] NoOfReplicas=2 [MYSQLD DEFAULT] [NDB_MGMD DEFAULT] [TCP DEFAULT] [NDB_MGMD] HostName=10.0.0.1 DataDir=/var/lib/mysql-cluster
[NDBD] TransactionInactiveTimeout=250 HostName=10.0.0.1 DataDir= /var/lib/mysql-cluster DataMemory = 80M IndexMemory = 20M MaxNoOfConcurrentTransactions = 5000 MaxNoOfConcurrentOperations = 25000 [NDBD] TransactionInactiveTimeout=250 HostName=10.0.0.2 DataDir=/var/lib/mysql-cluster DataMemory = 80M IndexMemory = 20M MaxNoOfConcurrentTransactions = 5000 MaxNoOfConcurrentOperations = 25000
[MYSQLD] [MYSQLD]
MySQL my.conf [mysqld] ndbcluster ndb-connectstring=10.0.0.1 [mysql_cluster] ndb-connectstring=10.0.0.1
- tables the tables are very simple we are just passing some attributes between the devices
CREATE TABLE `attributes` ( `id` int(32) NOT NULL auto_increment, `sso_id` char(32) NOT NULL, `user` varchar(32) NOT NULL, `class` varchar(32) NOT NULL, `time` int(32) NOT NULL, `ttl` int(32) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `sso_id` (`sso_id`) ) ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
CREATE TABLE `sessions` ( `sso_id` char(32) NOT NULL, `session` blob NOT NULL, UNIQUE KEY `id` (`sso_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
- Statement The statements run against these tables are very limited. Below is a full list of what is run ( its not a lot ) 1 devices run select id from `attributes` where `sso_id`='<some_id>' select user,pass,time,sessions.session from `attributes` left join sessions on attributes.sso_id=sessions.sso_id where `attributes`.`sso_id`='<some_id>' insert into `attributes` ( `sso_id`, `user`,`class`,`time`,`ttl` ) VALUES ( '<some_id','<some_user>','<some_class>',<time>, <ttl>) on duplicate key update user=values(user), pass=values(pass), time=time, ttl=values(ttl) insert into `sessions` ( `sso_id`, session ) values ( '<some_id>','<apache_session_data>') on duplicate key update session=values(session) delete from `attributes` where `sso_id`='<some_id>' delete from `sessions` where `sso_id`='<some_id>' The other devices run select * from `attributes` where `sso_id`='<some_id>' update `attributes` set ttl=<ttl> where sso_id='<some_id>'
--
--
------------------------------- Apostolos Pantsiopoulos R & D Dpt. Kinetix Tele.com Support Center
-------------------------------




