4 messages in com.mysql.lists.cluster: Re: unable to start (infinite crash l...

| From | Sent On | Attachments |
|---|---|---|
| Devananda | 20 Jul 2004 18:55 | |
| Devananda | 20 Jul 2004 19:36 | |
| Johan Andersson | 20 Jul 2004 21:32 | |
| Johan Andersson | 23 Jul 2004 14:12 | |

| Subject: | Re: unable to start (infinite crash loop) and workaround |
|---|---|
| From: | Johan Andersson (joh...@mysql.com) |
| Date: | 07/23/2004 02:12:29 PM |
| List: | com.mysql.lists.cluster |
Hi,
Thank you very much.
Devananda wrote:
While not running any inserts, the results of /sbin/hdparm were:

  Timing buffer-cache reads: 3744 MB in 2.00 seconds = 1872.00 MB/sec
  Timing buffered disk reads: 140 MB in 3.02 seconds = 46.36 MB/sec
At the time of this crash, I was only running inserts from 3 API nodes. I can just as easily run from 1 or from all 5. Today, I cleared all the data directories, started up fresh, and began running inserts from 4 of the 5 API nodes, and watched vmstat. The 'bo' column was consistently over 1000, usually around 5000, and sometimes spiked to 9- or 10,000.
So the disk is pretty loaded (assuming you have 4096-byte blocks), especially considering that write performance is generally slower than read. Change TimeBetweenGlobalCheckpoints to 500 ms, and also set TimeBetweenLocalCheckpoints to 20. Both of these changes will spread the disk writes out a bit more. Try that; I am curious to know whether it makes any difference in the blocks out column.
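As a rough check of the figures quoted above, the following Python sketch converts the reported 'bo' values into MB/sec. It takes the 4096-byte block size from the reply as an assumption (the unit vmstat actually reports in should be verified on the system in question), and the function name `write_mb_per_sec` is only illustrative.

```python
# Rough write-throughput estimate from the vmstat 'bo' figures quoted in the thread.
# ASSUMPTION: 4096-byte blocks, as stated in the reply; verify on the real system.
BLOCK_SIZE_BYTES = 4096


def write_mb_per_sec(blocks_out_per_sec, block_size=BLOCK_SIZE_BYTES):
    """Convert blocks written per second to MB/sec (1 MB = 1024 * 1024 bytes)."""
    return blocks_out_per_sec * block_size / (1024 * 1024)


# The three load levels mentioned: baseline, typical, and peak.
for bo in (1000, 5000, 10000):
    print(f"bo={bo:>6}: {write_mb_per_sec(bo):.2f} MB/sec")
```

Under that assumption, 5000 blocks per second is about 19.5 MB/sec and 10,000 is about 39 MB/sec, against the 46.36 MB/sec measured for buffered disk reads.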
Also, it would be interesting to see the output of cat /proc/loadavg when you have these peaks.
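For capturing the load average alongside the vmstat peaks, a minimal Python sketch such as the one below could be used. It assumes the usual Linux /proc/loadavg layout (three load averages, runnable/total tasks, last PID); the function name `parse_loadavg` and the dictionary keys are illustrative, not part of any tool mentioned in the thread.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict of named fields."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
        "runnable": int(runnable),     # currently runnable scheduling entities
        "total_tasks": int(total),     # total scheduling entities on the system
        "last_pid": int(fields[4]),    # PID of the most recently created process
    }


# Only read the file where it exists (Linux); elsewhere this is a no-op.
if os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

Running this once a second while the insert script is active would give load figures that can be lined up against the 'bo' spikes.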
I will distribute the information to the right people. Thank you very much.
Good luck to you too, johan
The only thing running on these boxes, besides the cluster, is PHP, which I use for the insert script.
Good luck! Let me know if there's anything more I can do.
Best regards, Devananda
Johan Andersson wrote:
Hi,
Thank you for providing us with important test data! I have also noticed problems with system restart and your information is very valuable. We are very interested in getting the tracefiles. Can you put together the information as follows:
devananda_mgm.tgz (cluster.log + config.ini)
devananda_db12.tgz (tracefiles + error.log)
devananda_db13.tgz (tracefiles + error.log)
devananda_db14.tgz (tracefiles + error.log)
devananda_db15.tgz (tracefiles + error.log)
Send this to me privately (because of the size of the attachments) and I will distribute it to the right people!
Also, I noticed that your NDB nodes started to miss heartbeats. So I have a couple of questions and recommendations:
* Was the system heavily loaded when the nodes started to miss heartbeats?
* What hardware are you using (disk subsystem (IDE, SCSI), CPU)?
* Do two or more NDB nodes share a single disk?
* Can you run /sbin/hdparm -Tt /dev/hdX (where X is the drive that holds the NDB filesystem)?
If the system is heavily loaded and the disks are slow, then there is a chance that the NDB nodes can miss heartbeats. This can happen because the NDB nodes write checkpoints and transaction logs to disk, and this can be very disk intensive. Can you run vmstat 1 (a program that exists at least on Linux) and tell me how many blocks per second are written to disk (look for the "bo" column), and also which block size and file system (ext3, reiserfs...) you are currently using?
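To pull the "bo" figures out of captured vmstat output, a small Python sketch along these lines may help. It assumes the usual Linux vmstat layout, where a header row names the columns (including "bo") and the data rows below it are purely numeric; the function name `bo_values` is illustrative.

```python
def bo_values(vmstat_text):
    """Return the 'bo' (blocks out) value from each data row of vmstat output."""
    bo_index = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if bo_index is None:
            # Look for the header row that names the columns.
            if "bo" in fields:
                bo_index = fields.index("bo")
            continue
        # Keep numeric data rows; this skips banners and repeated header rows.
        if len(fields) > bo_index and fields[bo_index].isdigit():
            values.append(int(fields[bo_index]))
    return values
```

Feeding it the saved text of a vmstat 1 run and taking max() and the average of the returned list gives the peak and typical blocks-out rates asked for above.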
A way to flatten out the disk writes is to change TimeBetweenLocalCheckpoints to ~500. This means that the redo log buffers will be flushed to disk more often, so each flush writes less data. Otherwise, during high write load the REDO log buffers can become big, resulting in a lot of information that must be written to disk at once. During these disk writes on a heavily loaded system, other processes can stall while they wait for I/O, so heartbeats might not be sent when they should be.
In any case, you should always be able to do a system restart. Sorry for the inconvenience caused and thanks again for your help.
Best regards, Johan Andersson
Devananda wrote:
Sorry! I forgot to post the workaround.
I executed 'all stop' and deleted the ndb data storage directory on 2 of my 4 DB nodes (one from each pair). When I restarted the cluster, it started slowly and then copied data over onto the 2 that I deleted. This worked once, but I am having trouble making it work a second time.




