atom feed4 messages in org.codehaus.btm.devRe: [btm-dev] BTM-35
FromSent OnAttachments
Brett WooldridgeJan 17, 2010 4:21 pm 
Ludovic OrbanJan 18, 2010 2:28 am 
Brett WooldridgeJan 18, 2010 8:02 am 
Ludovic OrbanJan 22, 2010 12:38 am 
Subject:Re: [btm-dev] BTM-35
From:Brett Wooldridge (bret@gmail.com)
Date:Jan 18, 2010 8:02:28 am
List:org.codehaus.btm.dev

Ludovic,

I checked-in all BTM-35 related changes. I also addressed the issue with the recovery being asymmetric. In PoolingDataSource.startRecovery(), we call pool.getConnectionHandle() ... but in endRecovery(), rather than calling close() on the connection handle, we were calling close() on the RecoveryXAResourceHolder which was just setting the connection back to IN_POOL. This bypassed the new connection usage counting. endRecovery() now calls close() on the connection handle and everything seems happy. I put an error level log in the JdbcPooledConnection that will log if a connection is every set back to IN_POOL with a usage other than 0. In running the unit tests with logging enabled, I never saw this log occur.

In testing the recoverer, I did come across an invalid test. RecovererTest.testReentrance() is invalid. It fails sporadically on my machine, and after analysis it is due to non-determinism in the test itself, not indicative of anything wrong with the code.

The test creates 10 threads, starts them, then joins them. The assumption is that the first thread starts recovery, and the other 9 exit because recovery is already running. At the end of the test a check is made to ensure there was only 1 execution. However, on a sufficiently busy machine with other things going in the background, it is actually possible that (for example) thread 1 completes recovery before thread 10 is even serviced by the operating system. In this case, thread 10 can legitimately enter recovery. The result is that at the end of the test, executions might be greater than 1. I actually saw as many as four executions on repeated runs of the test -- but typically it completes without error. In any case, it is the non-determinism in the test that is the failure, not the underlying recoverer code (I checked). I'm not sure what you want to do with this test. The only way to make it deterministic is to somehow inject a sufficient lengthy delay into the recovery thread such that all other threads gets a chance to start and fail.

Brett

On Mon, Jan 18, 2010 at 7:28 PM, Ludovic Orban <ludo@gmail.com>wrote:

Brett,

Working with patches on Subversion is a major pain. I wish I had a Mercurial repository...

I've found a few things worth discussing but nothing that should hold you from committing your latest changes. Please commit all your changes to the trunk, I'll review them all in one go and send you my comments.

If you're curious the only thing which surprised me for now was your comment about forcing the counter back to 0 when the state changes to IN_POOL because of some shortcut in the recovery code. I need to dig in the details but if your analysis is correct this is a design mistake in the recovery code we will need to fix.

A bit unrelated: I've implemented and committed JTA 1.1 support (BTM-65). There is still one minor thing left: there is a potential race condition I left in the Scheduler class (there is one TODO in it). If you like mind blurring algorithmic challenges I invite you to fix this issue. If you don't care or have no time don't worry, I'll do it myself.

Thanks, Ludovic

2010/1/18 Brett Wooldridge <bret@gmail.com>

Ludovic,

I'm holding onto my BTM-35 changes pending your review. Normally I would just check-in a fix, but given the difficulty of this defect, I've been holding the changes out. I'm not sure what mode you prefer to work in -- using patches, or diff'ing directly to svn. Let me know, I'm fine with either mode in this case.