Subject: [google-appengine] Re: flexible sharded counter
From: conman (cons@googlemail.com)
Date: Aug 14, 2008 11:58:39 pm
List: com.googlegroups.google-appengine

@Bill

"If two threads try to put() self.num_shards and one gets overridden, that's OK."

Yes, I think the same way, but the put() can result in a failed transaction. You can catch that with a try/except block; otherwise the user receives an internal server error, which wouldn't be necessary, because by ignoring the failure you already know how to handle this case.
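Something like the following is what I mean by catching it. This is only a rough sketch: TransactionFailedError here is a stand-in class I define myself so the snippet runs on its own (in a real app it would be whatever exception the datastore raises for a failed transaction), and save_num_shards is a made-up helper name.

```python
import logging


class TransactionFailedError(Exception):
    """Stand-in for the datastore's failed-transaction exception (assumption)."""


def save_num_shards(put):
    """Call put() and swallow a failed transaction instead of raising.

    `put` is any zero-argument callable that writes the new num_shards.
    Returns True if the write went through, False if it was dropped.
    """
    try:
        put()
        return True
    except TransactionFailedError:
        # Losing this write is acceptable: another request already
        # stored a value, and the count itself is not affected.
        logging.warning("num_shards update lost; keeping the stored value")
        return False


def failing_put():
    """Hypothetical put() that always hits contention, for demonstration."""
    raise TransactionFailedError("too much contention")


print(save_num_shards(lambda: None))  # write succeeds
print(save_num_shards(failing_put))   # failure is swallowed, no server error
```

Only the one exception type is caught, so any other error would still surface as usual.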

@Bryan

You are right: a maximum shard number is necessary because of the query limit. And your point 2 is also a very reasonable one that I hadn't thought about! Any transient datastore problem would cause my solution to increase the shard count up to the allowed maximum, and it would stay there even after the datastore problem is gone. What would be necessary is some kind of statistic on how many transactions have failed in the last hour or so, and based on that increase or decrease the shard number. But as you said, this is a risky thing, and it is also kind of overkill :)
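Just to make the statistics idea concrete, here is a rough sketch of a failure-rate window that could drive the shard count up or down. Everything in it is an assumption of mine: the names FailureWindow and adjust_num_shards, the one-hour window, and the 10% / 1% thresholds are invented for illustration, and the events are kept in process memory, which would not survive across requests in a real app.

```python
import time
from collections import deque


class FailureWindow:
    """Tracks transaction outcomes over a sliding time window (sketch)."""

    def __init__(self, window_seconds=3600.0, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the window can be tested
        self.events = deque()   # (timestamp, failed) pairs, oldest first

    def _trim(self):
        # Drop events that have fallen out of the window.
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def record(self, failed):
        self.events.append((self.clock(), bool(failed)))
        self._trim()

    def failure_rate(self):
        self._trim()
        if not self.events:
            return 0.0
        failures = sum(1 for _, failed in self.events if failed)
        return failures / len(self.events)


def adjust_num_shards(current, rate, max_shards=1000, high=0.10, low=0.01):
    """Double the shard count when failures are frequent, halve it when rare.

    The thresholds are arbitrary placeholders, not measured values.
    """
    if rate > high:
        return min(current * 2, max_shards)
    if rate < low:
        return max(current // 2, 1)
    return current
```

Note that lowering num_shards this way only changes where new increments land; the old high-numbered shards still exist and still have to be counted.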

I guess I'm going back to a manually chosen static number of shards...

Thanks for this interesting discussion!

Regards, Constantin

On 14 Aug., 21:15, "Bryan A. Pendleton" <bpen@gmail.com> wrote:

In the question of "how much is enough", a few concerns present themselves:

1) Increasing the number of shards past 1000 is going to require a rethink of how to perform an actual count, because of the limit on results from DataStore queries.

2) Updating the number of shards based on a failed transaction seems risky - don't you really want to update the number of shards based on how often transactions are failing? If the contention is transient, increasing the number of shards will still permanently increase the time to calculate the count. Heck, in an ideal world you might periodically combine/reduce high-number shards, and slowly back down the shard count to keep the number of shards to count from staying too large for too long. Can transient problems with Google's DataStore cause transactions to fail that would otherwise not? These kinds of unbounded auto-adjusting algorithms have a tendency to blow up given unexpected causes for expected error conditions.
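As a rough illustration of the combine/reduce idea in point 2, here is a sketch over a plain in-memory dict that maps shard index to count. The function name fold_shards and the modulo scheme are assumptions made for the example; against a real datastore each move would also have to be transactional and safe against concurrent increments, which this sketch ignores.

```python
def fold_shards(shards, new_num_shards):
    """Fold every shard into one of the first `new_num_shards` shards.

    `shards` maps shard index -> count. The total count is preserved;
    only the number of shards that must be read afterwards shrinks.
    """
    if new_num_shards < 1:
        raise ValueError("new_num_shards must be at least 1")
    folded = {}
    for index, count in shards.items():
        target = index % new_num_shards  # high-numbered shards wrap to low ones
        folded[target] = folded.get(target, 0) + count
    return folded


before = {0: 5, 1: 3, 2: 7, 3: 1}
after = fold_shards(before, 2)
print(after)                                        # {0: 12, 1: 4}
print(sum(before.values()) == sum(after.values()))  # True
```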

I second the question about how much contention costs - how many counter writes per second can be expected on a non-contended shard vs. one with heavy contention? How much contention starts to actually cause transactions to fail (vs. just retry and eventually succeed)?

On Aug 14, 2:39 pm, Bill <bill@gmail.com> wrote:

I should have mentioned why num_shards doesn't matter to getting accurate results from the Counter. The get_count() finds all shards regardless of the value of num_shards because it's querying on the name property. So it'd be nice to get a large enough num_shards to spread the increment transactions around, but it's not necessary, so overwrites on num_shards put()s are OK.

-Bill
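A small in-memory sketch of why that holds. This is not the actual Counter code from the thread: the dict stands in for the datastore, the (name, index) key layout is an assumption, and only the names num_shards and get_count() come from the discussion.

```python
import random


class Counter:
    """Toy sharded counter; `store` is a dict standing in for the datastore."""

    def __init__(self, name, store, num_shards=20):
        self.name = name
        self.store = store            # maps (name, shard_index) -> count
        self.num_shards = num_shards

    def increment(self):
        # num_shards only decides where the write lands.
        index = random.randint(0, self.num_shards - 1)
        key = (self.name, index)
        self.store[key] = self.store.get(key, 0) + 1

    def get_count(self):
        # Sum every shard carrying this counter's name,
        # without looking at num_shards at all.
        return sum(count for (name, _), count in self.store.items()
                   if name == self.name)


store = {}
hits = Counter("hits", store, num_shards=8)
for _ in range(50):
    hits.increment()

hits.num_shards = 2   # simulate a lost update that leaves an older, smaller value
for _ in range(50):
    hits.increment()

print(hits.get_count())  # 100
```

A stale num_shards only concentrates later writes on fewer shards; the shards written earlier are still found by name.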