One discussion I am aware of was related to replication. The question was whether effort should be put into edit conflict resolution by tracking the causal ordering of edits, e.g. with vector clocks or similar techniques, or whether that is even necessary given Bigtable's inherent multiversioning. Either technique pushes the task of conflict resolution up to the client, but the latter is in line with Bigtable's data model and conventional API, so the implementation effort is lower and the resulting replication model is simpler, which is an important consideration (it is easier to demonstrate correctness, etc.). It is true that if clients set the timestamp dimension to user-defined values, then an edit on one cluster may conflict with an edit written to another, with the conflict "resolved" from the client's perspective as if by an arbitrary coin toss.
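To make that last point concrete, here is a toy sketch in Python. It is not HBase code and not a description of how replication is or will be implemented; the Cell class and the simulate() helper are hypothetical names for illustration. It assumes a cell is a map from timestamp to value, that a read returns the value at the highest timestamp, that a write at an existing timestamp replaces the value stored there, and that each cluster applies its own edit before the edit shipped from its peer.

```python
class Cell:
    """Toy multiversioned cell: maps timestamp -> value (hypothetical model)."""

    def __init__(self):
        self.versions = {}

    def put(self, timestamp, value):
        # Assumption: a write at an existing timestamp replaces that version.
        self.versions[timestamp] = value

    def get(self):
        # A read returns the value stored at the highest timestamp.
        return self.versions[max(self.versions)]


def simulate(ts_a, ts_b):
    """Write one edit on each cluster, cross-replicate, return both reads."""
    cluster_a, cluster_b = Cell(), Cell()

    # Each cluster accepts a local edit first.
    cluster_a.put(ts_a, "from-A")
    cluster_b.put(ts_b, "from-B")

    # Then each cluster applies the edit shipped from its peer.
    cluster_a.put(ts_b, "from-B")
    cluster_b.put(ts_a, "from-A")

    return cluster_a.get(), cluster_b.get()


if __name__ == "__main__":
    # Distinct timestamps: both versions survive and both clusters agree.
    print("distinct timestamps:", simulate(100, 101))
    # Same user-defined timestamp: the winner depends on apply order.
    print("same timestamp:     ", simulate(7, 7))
```

With distinct timestamps, both clusters keep both versions and agree on the newest one. With the same user-defined timestamp, each cluster ends up holding whichever edit it applied last, so in this toy model the two clusters can even disagree with each other, which is the coin toss from the client's point of view.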
From: "Bria...@nokia.com" <Bria...@nokia.com>
Sent: Sunday, June 21, 2009 9:30:39 AM
Subject: Use of versions in transaction and replication strategies
Not too long ago there were some discussions related to implementing transactions and/or replication strategies that made use of column versions. I can't remember whether the discussions happened in this forum or in one or more Jira tickets. Regardless, my question is whether there will indeed be a reliance on cell versions to implement either of these features. I ask because currently the API allows the caller to decide what value is used for the version stamp. It doesn't have to be a timestamp. In fact, some folks have talked about not even using versions as versions, but rather as an additional data dimension.
So my question really boils down to whether HBase would need to restrict the use of versions if versions were indeed used to implement transactions and/or replication.