Could you post some of your ideas for achieving these speedups? I'm
fascinated by this area, because it is such a crucial one if FreeBSD
is to perform well after all the work in unwinding Giant.

Mostly boring stuff like making sure that important mutexes live in
their own cache line to avoid false sharing and tweaking some code to
avoid unnecessary invalidation of cache lines. There are also some
architecture-specific assembly tweaks that I'd like to try. Maybe a few
hacks for dynamic run-time patching to allow processor-specific and
SMP/UP optimizations on a GENERIC kernel. Replacing cli/sti with a
spl()-style interrupt enabler/disabler for i386 is also something I
would like to test to speed up spin locks. Restoring single-thread
wakeup for sleep mutexes is also on the list. Once I start digging I
will probably find more things to try.

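To make the cli/sti idea concrete, here is a rough userland sketch of one
possible reading of a "spl()-style" enabler/disabler: disabling interrupts
only sets a software flag, the interrupt entry path records anything that
arrives while the flag is set, and enabling replays what was deferred. All
names here (soft_intr_disable, soft_intr_enable, deliver_irq, NIRQ) are
hypothetical and are not taken from the FreeBSD tree. A real kernel would
keep this state per CPU and would also mask the interrupt source in
hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NIRQ 16                 /* hypothetical number of interrupt lines */

typedef void (*irq_handler_t)(int irq);

/* Simulated state; a real kernel would keep this per CPU. */
static volatile int      soft_intr_disabled;  /* non-zero: defer delivery */
static volatile uint32_t soft_intr_pending;   /* bitmask of deferred IRQs */
static irq_handler_t     irq_handlers[NIRQ];
static int               irq_count[NIRQ];

/* Example handler: just counts how often it ran. */
static void count_irq(int irq)
{
    irq_count[irq]++;
}

/* Stands in for the low-level interrupt entry path. */
static void deliver_irq(int irq)
{
    assert(irq >= 0 && irq < NIRQ);
    if (soft_intr_disabled) {
        /* Cheap path: remember the interrupt, run nothing now. */
        soft_intr_pending |= (uint32_t)1 << irq;
        return;
    }
    if (irq_handlers[irq] != NULL)
        irq_handlers[irq](irq);
}

/* Replaces cli: a plain store instead of a serializing instruction. */
static void soft_intr_disable(void)
{
    soft_intr_disabled = 1;
}

/* Replaces sti: clear the flag, then replay anything that was deferred. */
static void soft_intr_enable(void)
{
    soft_intr_disabled = 0;
    while (soft_intr_pending != 0) {
        for (int irq = 0; irq < NIRQ; irq++) {
            uint32_t bit = (uint32_t)1 << irq;
            if (soft_intr_pending & bit) {
                soft_intr_pending &= ~bit;
                deliver_irq(irq);
            }
        }
    }
}
```

The point of the sketch is that the common case (no interrupt arrives
inside the protected region) costs two stores and one test. It does not
nest, and repeated arrivals of the same IRQ while disabled collapse into a
single replay.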
If you want an excellent candidate for cache line contention foo, you
might take a glance at the uma_pcpu_mtx array in UMA.
This may well be obsoleted by my changes to UMA to use critical sections
instead of mutexes here, but it would be very interesting to see what
happens, since it's an example of tightly packed mutexes with a high
probability of simultaneous access but a low probability of contention.
The impact on performance, if significant, would be measurable using a
broad range of benchmarks. Some other interesting candidates might be:
- Mutex pool mutexes in kern_mtxpool.c.
- The sockbuf send/receive mutexes in struct socket, and in fact the
struct sockbufs themselves.
We might also want to investigate a struct mtx_with_pad that includes the
necessary padding, to be used for static mutex structures that are
probably getting packed with the oddest stuff. E.g., sigio_mtx, devmtx,
mac_policy_mtx, malloc_mtx, lockbuilder_pool, tid_lock, callout_lock,
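
A minimal userland sketch of what a struct mtx_with_pad could look like,
assuming a 64-byte cache line and using a one-word stand-in for the real
struct mtx (whose layout is not reproduced here). The member name mwp_mtx,
the NCPU value and the same_cache_line() helper are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64      /* assumption: typical x86 line size */
#define NCPU            4       /* hypothetical CPU count for the arrays */

/* Stand-in for the kernel's struct mtx: only the lock word. */
struct mtx {
    volatile uintptr_t mtx_lock;
};

/*
 * Aligning the member forces both the alignment and the size of the
 * wrapper up to a full cache line, so adjacent objects never share one.
 */
struct mtx_with_pad {
    alignas(CACHE_LINE_SIZE) struct mtx mwp_mtx;
};

static_assert(sizeof(struct mtx_with_pad) == CACHE_LINE_SIZE,
    "padded mutex should occupy exactly one cache line");
static_assert(alignof(struct mtx_with_pad) == CACHE_LINE_SIZE,
    "padded mutex should start on a cache line boundary");

/* Tightly packed array, in the style of a per-CPU mutex array. */
static struct mtx packed_mtx[NCPU];

/* Padded array: one mutex per cache line. */
static struct mtx_with_pad padded_mtx[NCPU];

/* Return non-zero if two addresses fall in the same cache line. */
static int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE_SIZE) ==
        ((uintptr_t)b / CACHE_LINE_SIZE);
}
```

With the packed array, neighbouring locks land in the same line, so CPUs
that never contend on a lock still bounce the line between their caches.
The padded array trades memory (here 64 bytes per lock) for removing that
false sharing.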
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
rob...@fledge.watson.org Principal Research Scientist, McAfee Research