Thread: Re: Real disk space - NDBcluster - My... (3 messages in com.mysql.lists.cluster)
Apostolos Pantsiopoulos, 18 Aug 2006 13:15
Benton, Kevin, 18 Aug 2006 15:08
Apostolos Pantsiopoulos, 18 Aug 2006 17:06
Subject: Re: Real disk space - NDBcluster - MyISAM - InnoDB
From: Apostolos Pantsiopoulos (mys@easy-things.com)
Date: 08/18/2006 05:06:08 PM
List: com.mysql.lists.cluster

Packing the data will be an essential part of the application, but the way to pack it depends on the engine being used. With MyISAM, packing the data into a compressed engine would be a good solution, and that would work with InnoDB too. With ndbcluster it is not very efficient: my app uses multiple MySQL API nodes to access the data on different hosts, so an archive table would only be accessible through the particular node on which it was created. And since the app uses a random algorithm to connect the client to a database API node, this will not work. So I was thinking of using a second ndbcluster table to pack the data. That would not save much disk space, but at least this rarely accessed table will not be cached after a while. The idea of moving the data to my own archive flat files is also interesting.

Since this will be a billing platform, performance (throughput, response time), consistency and redundancy are all important factors.

When I posted the question I was looking for a simpler, empirical opinion: examples of people who migrated to cluster and saw a great deal of growth in disk usage.

And since the performance issue came up: does striping have a great impact on performance? If I use 4 ndb nodes with 2 replicas, would that configuration be faster than 2 nodes with 2 replicas (or 3 nodes with 3 replicas)?
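
To make the three configurations in the question concrete: in NDB Cluster the data nodes are arranged into node groups, and the number of node groups is the number of data nodes divided by the number of replicas. Each node group holds a distinct share of the data. The sketch below only works out that arithmetic; the function name `node_groups` is made up for illustration, and it says nothing about how fast any of these configurations would actually be under a given workload.

```python
def node_groups(data_nodes: int, replicas: int) -> int:
    """Return the number of node groups for a data-node / replica count.

    Illustrative helper (not part of any MySQL tool). The data-node count
    must be a positive multiple of the replica count.
    """
    if data_nodes < 1 or replicas < 1:
        raise ValueError("data_nodes and replicas must be positive")
    if data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a multiple of replicas")
    return data_nodes // replicas


# The three configurations asked about in the message.
for nodes, replicas in [(4, 2), (2, 2), (3, 3)]:
    groups = node_groups(nodes, replicas)
    print(
        f"{nodes} data nodes, {replicas} replicas: "
        f"{groups} node group(s), each holding 1/{groups} of the data"
    )
```

Only the 4-node, 2-replica layout splits the data across more than one node group; the other two keep a full copy of the data on every node.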

Benton, Kevin wrote:

Has anyone measured the real disk data space difference between the three engines (on the same database schema and data)?

Any measurement one could make would be specific to the method, type and amount of data added, regardless of the storage engine. A better way to forecast disk space utilization is to take your own measurements and project from them, based on the kinds of data you expect to see. With MyISAM, for example, do you plan to pack your data after a period of time? Do you plan to use the archive engine at all? Do you do more inserts / updates or more queries? How important is overall performance to you? What kind of performance is most important? What base OS do you plan to use? How is the underlying file system organized?
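
One rough way to turn "your own measurements" into a forecast is to load a representative sample into each candidate engine, note the on-disk size, and scale by the expected row count. The sketch below assumes linear growth per row, which will not hold exactly for every engine; the function name `forecast_bytes`, the `headroom` parameter and all numbers are placeholders, not measurements.

```python
import math


def forecast_bytes(sample_bytes: int, sample_rows: int,
                   expected_rows: int, headroom: float = 0.0) -> int:
    """Scale a measured sample size up to an expected row count.

    sample_bytes / sample_rows come from your own test load (data plus
    indexes). headroom is an optional safety margin, e.g. 0.2 for 20%.
    Assumes size grows linearly with rows, which is only an approximation.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    if expected_rows < 0 or sample_bytes < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    bytes_per_row = sample_bytes / sample_rows
    return math.ceil(bytes_per_row * expected_rows * (1 + headroom))


# Placeholder numbers only: replace with sizes measured on your own schema.
estimate = forecast_bytes(sample_bytes=50_000_000, sample_rows=1_000_000,
                          expected_rows=200_000_000, headroom=0.2)
print(f"Estimated space needed: {estimate / 1e9:.1f} GB")
```

Repeating the same sample load for each engine gives per-engine figures that can be compared on equal terms.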

I can tell you that disk is usually the slowest method of accessing data and if performance is your goal, plan to spend good money on disk, high-speed memory, and if possible, up to eight fast processors for each host.

How's that for "not answering your question." :)