15 messages in com.mysql.lists.clusterRe: Will I have a runaway primary key...
FromSent OnAttachments
Jayson Larose04 Dec 2006 08:41 
mizi...@aim.com04 Dec 2006 09:20 
Jayson Larose04 Dec 2006 11:03 
Kevin Burton04 Dec 2006 13:28 
Adam Dixon04 Dec 2006 14:55 
Kevin Burton04 Dec 2006 15:23 
Jayson Larose04 Dec 2006 16:21 
Anatoly Pidruchny04 Dec 2006 17:23 
Kevin Burton04 Dec 2006 19:04 
Jayson Larose04 Dec 2006 20:51 
Anatoly Pidruchny04 Dec 2006 21:09 
mizi...@aim.com05 Dec 2006 08:56 
mizi...@netscape.net05 Dec 2006 09:10 
Stewart Smith12 Dec 2006 19:44 
Stewart Smith12 Dec 2006 19:46 
Subject:Re: Will I have a runaway primary key problem in the future if I do this?
From:Jayson Larose (jay@gni.com)
Date:12/04/2006 04:21:35 PM
List:com.mysql.lists.cluster

Hello, Kevin, Adam, et al.

The data being returned from the SNMP servers are key->value pairs, but each individual server that is polled will be giving back the same keys, so you run into the same duplicate primary key issue if you try to base your primary key off 'description' as in my sample database spec.

My question is more along the lines of how data will be allocated between the various NDB nodes during the course of the entire contents of the table getting deleted and recreated every polling cycle.

For example, using the original sample table I mentioned:

CREATE TABLE discoverables ( colo INT UNSIGNED NOT NULL, server INT UNSIGNED NOT NULL, description VARCHAR(255), value VARCHAR(255), KEY (colo), KEY (server) );

There is of course an 'invisible' BIGINT NOT NULL AUTO_INCREMENT primary key hiding in there, which gets incremented each insert.

When the poller for colo '1' wakes up, polls the various servers in it's polling jurisdiction it does the following: DELETE FROM discoverables WHERE colo='1'; INSERT INTO discoverables VALUES ( ('1','1','foo','bar'), ('1','1','baz','gonk'), ('1','2','foo','morestuff'), ('1','2','baz','yetmorestuff'), (...this continues for potentially several thousand records...) );

Then the poller for colo '2' wakes up, and does the same thing to it's subset of the data: DELETE FROM discoverables WHERE colo='2'; INSERT INTO discoverables VALUES ( ('2','1','foo','cheeseburger'), ('2','1','baz','twinkie'), (...et cetera...) );

And this happens on for as many colos as there are. Every 5 minutes or so.

How does all this deleting and inserting affect the state of the cluster? The MySQL 5.1 Reference Manual states in section 16.2.4 that:

Note: Also beginning with MySQL 5.1.6, tables using the NDB Cluster storage engine are implicitly partitioned by KEY, again using the table's primary key as the partitioning key. In the event that the Cluster table has no explicit primary key, the “hidden” primary key generated by the NDB storage engine for each Cluster table is used as the partitioning key.

( http://dev.mysql.com/doc/refman/5.1/en/partitioning-key.html )

Does this mean that as old, lower-ordered keys are deleted and new, higher-ordered keys are added to the database, the fields are actively being shifted amongst the various NDB nodes?

Kevin Burton wrote:

Another note... since it's key->value you could just use memcached .....

Just make sure to overprovision so that it doesn't start to GC the values.

Kevin

On 12/4/06, Adam Dixon <adam@gmail.com> wrote:

Hi, I have a table without a PK, which does about 5-10 inserts per second,

1) Am I going to have a table that hits its INT (or BIGINT, or whatever) max sometime in the future without periodic table rebuilding, even if there are only 10,000 or so rows in the table due high INSERT/DELETE volume and autoincrement?

In NDB at least, tables without a PK will automatically get a BIGINT UNSIGNED column (pretty sure although its not specifically mentioned in the manual to be a BIGINT) which is a maximum of 18,446,744,073,709,551,615. (or lots)

2) I read that MySQL cluster performs partitioning by default on the primary key. Would there be any performance hit due to this "wandering primary key", like having all your data winding up on one NDB node?

I can say that my particular table with about 32million rows is lightning quick (obviously with correct indexes) Cluster certainly achieved the speed and scalability we needed because of this table.