Subject: [axis2] Status on Axis2 1.5.1 and Rampart 1.5
From: Glen Daniels (gl...@thoughtcraft.com)
Date: Oct 12, 2009 7:08:10 am
OK, so here are the results of my weekend investigations. The lockup when running the Rampart 1.5 tests with Axis2 1.5.1 was due to HTTP connection starvation. I've fixed two issues and everything works now, but I'd like to respin both Axis2 1.5.1 and Rampart 1.5 as a result. Details below.
First, a quick summary of a major change in Axis2 1.5.1: we were formerly creating a new MultiThreadedHttpConnectionManager (MHCM) all the time in the HTTP sender code. In typical usage you'd never see connection pool starvation (since each new MHCM had a new pool), but two major problems occurred: 1) connection reuse wasn't really possible, and 2) we would eventually (in high-volume situations) run into the OS limits for open sockets. So I fixed this so that 1.5.1 now reuses a single MHCM for each ConfigurationContext, which allows for sharing connections across ServiceClient instances.
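As a rough illustration of the difference between a manager per send and a manager per context, here is a minimal Java sketch. It is a toy model, not Axis2 code: PoolManager, Context, MANAGER_KEY and both methods are hypothetical names, and storing the manager in a property map is an assumption, since the message does not say how the shared MHCM is actually kept.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedManagerSketch {

    /** Stand-in for a connection manager; here only its identity matters. */
    static final class PoolManager {}

    /** Stand-in for a ConfigurationContext: a bag of named properties (assumed). */
    static final class Context {
        final Map<String, Object> properties = new ConcurrentHashMap<>();
    }

    // Hypothetical property key, invented for this sketch.
    static final String MANAGER_KEY = "sketch.connectionManager";

    /** Old behaviour as described: every send builds a fresh manager and pool. */
    static PoolManager managerPerSend() {
        return new PoolManager();
    }

    /** New behaviour as described: one manager per context, created on first use. */
    static PoolManager managerFor(Context ctx) {
        return (PoolManager) ctx.properties.computeIfAbsent(MANAGER_KEY, k -> new PoolManager());
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        // Two lookups on the same context yield the same manager, so its pool is shared.
        System.out.println(managerFor(ctx) == managerFor(ctx));
        // Two per-send managers are distinct, so nothing is ever reused.
        System.out.println(managerPerSend() == managerPerSend());
    }
}
```

The trade-off is visible even in the toy: a shared manager makes reuse possible, but it also means every client on that context now competes for one bounded pool.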
The bigger problem *behind* the problem above is that users of the Commons HttpClient library (like Axis2) need to call releaseConnection() on each and every HttpMethod after they are finished with it. The ServiceClient.cleanupTransport() call does this, but since we never told people to call that explicitly, no one was in the habit of doing it. A number of bugs about connection starvation came up, and we put in the Options.setCallTransportCleanup() option, which automatically calls cleanupTransport() after each call, but at a cost: since we're releasing connection resources, you need to make sure you've read everything first, which means building the whole Axiom tree. Bye-bye, streaming. So I also added a different connection cleanup option which automatically cleans up the *last* operation as you're setting up the next one.
So, to make the Rampart story very short, the problem was this: a new ServiceClient gets created to deal with SecureConversation interactions (see STSClient.getServiceClient()). This SC shares the same ConfigurationContext with the outer (i.e. user) SC, so it shares an MHCM and a connection pool. The problem is that, since the STS operations happen inside a user-level operation, the record of the "last operation" gets overwritten, and as a result my automatic cleanup mechanism can't catch both! So we lose one connection each time we go through the STS process, and once the pool is exhausted that causes a hard lock.
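The leak described above can be reproduced in a small toy model. The following Java sketch is not Axis2 or Rampart code: the pool is a Semaphore, the "last operation" record is a single boolean, and the exact ordering of cleanup and send inside a nested call is an assumption made for illustration. All class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

final class LastOperationLeakSketch {

    /** Toy stand-in for the state shared through one ConfigurationContext. */
    static final class SharedContext {
        final Semaphore pool;        // one permit per pooled connection
        boolean lastOperationHeld;   // the single "last operation" record

        SharedContext(int maxConnections) {
            this.pool = new Semaphore(maxConnections);
        }

        /** Deferred cleanup: release the connection of the recorded last operation, if any. */
        void cleanupLastOperation() {
            if (lastOperationHeld) {
                pool.release();
                lastOperationHeld = false;
            }
        }

        /** Take a connection and record it; returns false where a real client would block. */
        boolean send() {
            if (!pool.tryAcquire()) {
                return false;
            }
            // Overwrites any earlier record without releasing its connection.
            lastOperationHeld = true;
            return true;
        }
    }

    /** One user-level call, optionally with a nested token request on the same context. */
    static boolean userCall(SharedContext ctx, boolean nestedTokenRequest) {
        ctx.cleanupLastOperation();          // user client cleans up its previous operation
        if (nestedTokenRequest) {
            ctx.cleanupLastOperation();      // nested client: nothing is recorded at this point
            if (!ctx.send()) {
                return false;                // nested connection taken and recorded
            }
        }
        return ctx.send();                   // user connection overwrites the nested record
    }

    /** Counts user calls that complete before the pool is exhausted, up to a limit. */
    static int callsUntilStarved(int maxConnections, boolean nested, int limit) {
        SharedContext ctx = new SharedContext(maxConnections);
        int done = 0;
        while (done < limit && userCall(ctx, nested)) {
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println("plain calls completed:  " + callsUntilStarved(2, false, 1000));
        System.out.println("nested calls completed: " + callsUntilStarved(2, true, 1000));
    }
}
```

In this model a lone client never holds more than one connection, so it runs up to the limit. With a nested request, each round strands one permit, and a pool of N connections starves after N - 1 user calls.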
I did two things to fix this, both of which I think should be reflected in the released code. First, in Rampart, I added a call to setCallTransportCleanup(true) in STSClient - this means that the STS operations will be forced to build the complete Axiom tree (see above), but solves the connection starvation issue. Second, in Axis2, I added a default 30-second timeout while waiting for new connections - this doesn't change the functionality at all, but it does mean that we can no longer get into situations where the system just locks up forever. With that change, we'll now at least get an Exception if there's a starvation issue, which can then be debugged.
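The effect of the timeout can be sketched with the same kind of toy pool. This is a minimal illustration, not the actual Axis2 change: the pool is a Semaphore, acquireConnection is a hypothetical helper, and the use of IllegalStateException is an assumption chosen to keep the example short.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class BoundedWaitSketch {

    /**
     * Waits up to timeoutMillis for a free connection (permit).
     * Throws instead of blocking forever when the pool stays empty.
     */
    static void acquireConnection(Semaphore pool, long timeoutMillis) {
        try {
            if (!pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "no connection available after " + timeoutMillis + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    public static void main(String[] args) {
        Semaphore pool = new Semaphore(1);        // a pool with a single connection
        acquireConnection(pool, 100);             // succeeds immediately
        try {
            acquireConnection(pool, 100);         // never released, so this times out
        } catch (IllegalStateException e) {
            System.out.println("starvation surfaced as: " + e.getMessage());
        }
    }
}
```

A leaked connection still leaks here; the only difference is that the caller gets a stack trace pointing at the acquire site rather than a silent hang.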
Nandana/all, can you check what I did in Rampart and let me know if you foresee any problems with it? I'm going to respin Axis2 1.5.1 with this and one other fix, and we should respin Rampart 1.5 as well.