Adam Bultman writes:
For about 6 months, it was only the one "large" server (the ~22k user
one) and things ran fine. We have a Foundry SI4G load balancer which we
were using to load balance LDAP requests over 3 servers (one Linux, two
Solaris).
What we found is that Courier-authdaemon will make a heck of a lot of
connections to LDAP, and never close them down.
No, it doesn't. Each authdaemon will open exactly one LDAP connection, which
will remain open as long as authdaemon is running, or until there's 1-2
minutes of inactivity. There may be a temporary second connection, if you
use authenticated binds, but there will never be more than two persistent
LDAP connections from a single authdaemon process.
I have also tried changing the number of daemons that the authdaemon
fires up, but that doesn't seem to make much of a difference. It has
ranged from 500 to 55 (I have it at 55 currently) but no matter what,
courier tempfails and irritates the users.
55 connections are required for, maybe, Yahoo or AOL. For you, that's
overkill, and the default of five authdaemon processes should be sufficient.
Assuming a very generous 100 milliseconds per LDAP lookup, a single
authdaemon process will handle ten lookups per second. Five of them will
handle fifty a second, 3,000 a minute, and 180,000 per hour.
So, unless every one of your 22K users logs on more than about eight times
an hour, five connections will be more than enough.
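
The estimate can be re-derived with a few lines of Python. This is only a
sketch: the 100 ms lookup time, the five processes, and the 22K users come
from the discussion above, while the function name is made up for
illustration, and it assumes each process handles one lookup at a time.

```python
def lookups_per_hour(processes: int, seconds_per_lookup: float) -> int:
    """Upper bound on lookups per hour, one lookup at a time per process."""
    per_second = processes / seconds_per_lookup
    return round(per_second * 3600)


# Figures taken from the discussion: five authdaemons, 100 ms per lookup.
capacity = lookups_per_hour(processes=5, seconds_per_lookup=0.1)
users = 22_000

print("lookups per hour:", capacity)
print("logins per user per hour that this absorbs:", capacity / users)
```

Changing the two arguments shows how quickly the headroom grows with even a
handful of extra processes, or with a faster LDAP server.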
I'm at the end of my rope, and I can't figure out what to do next. Any
help would be appreciated.
More than likely your load balancer is broken, and must be fixed. My guess
is that it assumes that LDAP connections are short term connections. And
they are, with most simple-minded LDAP clients, which bind, query, and
disconnect. Authdaemon is more efficient than that. It opens a connection
and holds it open, avoiding the utterly useless waste of time for connecting
and disconnecting from the LDAP server for each authentication.
Each authdaemon is probably connecting to your LDAP server, through your
load balancer, with the first authentication attempt. After the response is
received, the load balancer assumes that the connection is no longer needed,
and drops it from its memory, but as far as the authdaemon and LDAP server
are concerned, it's still a valid connection.
On the next authentication attempt, authdaemon gets a broken socket
indication, since the load balancer has dropped the first connection from
memory. It automatically reconnects to the LDAP server, which still thinks
the first connection exists.
It won't take long before everything comes crashing down.
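
To see why this adds up so quickly, here is a toy model of the guess above.
It talks to no real LDAP server or load balancer; the function name and the
"balancer forgets the session after each response" rule are assumptions made
purely for illustration, and it ignores any idle timeout the LDAP server
might apply on its own.

```python
def simulate(authentications: int, balancer_forgets: bool) -> int:
    """Return how many connections the LDAP server believes are open
    for one authdaemon process after the given number of authentications."""
    server_connections = 0    # connections the server still considers open
    client_connected = False  # whether authdaemon holds a usable socket

    for _ in range(authentications):
        if not client_connected:
            # authdaemon (re)connects; the server gains a connection.
            server_connections += 1
            client_connected = True
        # The lookup itself would happen here.
        if balancer_forgets:
            # Assumed behaviour: the balancer drops its session entry after
            # the response, so the client's socket is dead on next use,
            # while the server never sees a close.
            client_connected = False

    return server_connections


print("well-behaved path:", simulate(1000, balancer_forgets=False))
print("forgetful balancer:", simulate(1000, balancer_forgets=True))
```

In this model a persistent connection costs the server one slot per
authdaemon no matter how many logins it handles, while the forgetful
balancer leaves one stale connection behind per login.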