Subject: HTable checkAndPut equivalent for Deletes
From: Michael Dalton (mwda@gmail.com)
Date: Apr 30, 2010 2:50:51 pm
List: org.apache.hadoop.hbase-user

Hi everyone,

I have a quick question -- I'd like to do a simple atomic check-and-Delete for a row. For Put operations, HTable.checkAndPut appears to allow a simple atomic compare-and-update, which is great. However, there doesn't seem to be an equivalent function for deletes.

I was thinking about approximating this by writing NULL or a zero-length byte array as the value in a Put to emulate deleting a cell. It appears that checkAndPut already treats a zero-length array as equivalent to a non-existent value when performing its comparison (before committing the Put). The only drawback I can see is that I never truly remove rows; I just end up with 'dead' rows containing empty byte arrays. I'd imagine that every N hours or days I would need to garbage collect these empty rows somehow, which brings us back full circle to the issue of how to atomically check and delete a row.
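To make the idea concrete, here is a minimal in-memory sketch of the "empty value as tombstone" approach. It is not HBase code: `TombstoneCellStore` and all of its methods are hypothetical names for a toy model of one column, and the only behavior borrowed from the description above is that a missing cell and a zero-length value compare as equal.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Toy in-memory stand-in for a single column; hypothetical, not the HBase API. */
class TombstoneCellStore {
    private static final byte[] EMPTY = new byte[0];
    private final ConcurrentHashMap<String, byte[]> cells = new ConcurrentHashMap<>();

    /** Assumption from the text: a missing cell and a zero-length value are equivalent. */
    private static byte[] normalize(byte[] v) {
        return v == null ? EMPTY : v;
    }

    /** Atomically writes newValue only if the current value equals expected. */
    boolean checkAndPut(String row, byte[] expected, byte[] newValue) {
        boolean[] applied = {false};
        // ConcurrentHashMap.compute runs the compare and the write as one atomic step.
        cells.compute(row, (key, current) -> {
            if (Arrays.equals(normalize(current), normalize(expected))) {
                applied[0] = true;
                return normalize(newValue);
            }
            return current; // comparison failed: leave the cell unchanged
        });
        return applied[0];
    }

    /** Emulated delete: overwrite the cell with an empty value if it matches. */
    boolean emulatedCheckAndDelete(String row, byte[] expected) {
        return checkAndPut(row, expected, EMPTY);
    }

    /** True if the row is missing or holds only an empty tombstone value. */
    boolean isLogicallyDeleted(String row) {
        return normalize(cells.get(row)).length == 0;
    }

    /** Counts stored rows, including 'dead' rows that hold an empty value. */
    int physicalRowCount() {
        return cells.size();
    }
}
```

After a successful `emulatedCheckAndDelete`, `isLogicallyDeleted` reports true but `physicalRowCount` still counts the row, which is exactly the garbage-collection problem described above.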

The only real alternative I can see for doing this would be to emulate checkAndDelete by using RowLocks to lock the row, perform a Get, verify that the row contains the expected value, then perform a delete, and then unlock the row itself. Correct me if I'm wrong, but this should definitely emulate the semantics of atomic compare-and-Delete (assuming the compare and delete operate on the same row and use the RowLock). However, I'm not sure what the performance would be for using RowLocks to emulate checkAndDelete on the client side vs. using Put+checkAndPut to emulate checkAndDelete on the server side. Does anyone have any advice on this issue, or any idea what the relative tradeoffs are?
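The lock, Get, verify, delete, unlock sequence can be sketched the same way. Again this is a toy in-memory model, not the HTable or RowLock API: `LockedRowStore`, `lockRow`, and `checkAndDelete` are hypothetical names, and a `ReentrantLock` per row stands in for a row lock.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy in-memory model of lock-based check-and-delete; hypothetical, not the HBase API. */
class LockedRowStore {
    private final ConcurrentHashMap<String, byte[]> rows = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Acquires the per-row lock (a stand-in for a row lock) and returns it. */
    ReentrantLock lockRow(String row) {
        ReentrantLock lock = locks.computeIfAbsent(row, key -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    /** Writers must take the same row lock, or the check below is not atomic. */
    void put(String row, byte[] value) {
        ReentrantLock lock = lockRow(row);
        try {
            rows.put(row, value.clone());
        } finally {
            lock.unlock();
        }
    }

    byte[] get(String row) {
        return rows.get(row);
    }

    /** Deletes the row only if it currently holds the expected value. */
    boolean checkAndDelete(String row, byte[] expected) {
        ReentrantLock lock = lockRow(row);          // 1. lock the row
        try {
            byte[] current = rows.get(row);         // 2. read the current value
            if (current == null || !Arrays.equals(current, expected)) {
                return false;                       // 3. verification failed
            }
            rows.remove(row);                       // 4. delete
            return true;
        } finally {
            lock.unlock();                          // 5. always unlock
        }
    }
}
```

The sketch only gives compare-and-delete semantics because `put` acquires the same per-row lock; any writer that bypassed the lock could change the value between the read and the delete.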

In the long run, it seems to me that the clearly optimal solution would be to have a checkAndDelete function in HTable, and I'd be interested in adding this functionality if no one else is currently working on it. Is that something that would be worth integrating and committing back to mainline? Are there any hidden pitfalls I should be aware of, or some technical/design reason why this API call doesn't already exist? If not, I'll take a hard look at the delete and checkAndPut code in the regionserver, then open an issue in JIRA sometime soon and start coding.

Best regards,

Mike