Subject: macro benchmark for mutex locks needed.
From: Robert Watson (rwat@freebsd.org)
Date: Nov 23, 2004 10:49:56 pm
List: org.freebsd.freebsd-arch

On Tue, 23 Nov 2004, Stephan Uphoff wrote:

> On Tue, 2004-11-23 at 11:32, Phil Brennan wrote:
>
> > Could you post up some of your ideas to achieve these speedups? I'm fascinated by this area, because it is such a crucial one if FreeBSD is to perform well after all the work in unwinding Giant.
>
> Mostly boring stuff like making sure that important mutexes live in their own cache line to avoid false sharing and tweaking some code to avoid unnecessary invalidation of cache lines. There are also some architecture-specific assembly tweaks that I would like to try. Maybe a few hacks for dynamic run-time patching to allow processor-specific and SMP/UP optimizations on a GENERIC kernel. Replacing cli/sti with an spl()-style interrupt enabler/disabler for i386 is also something I would like to test to speed up spin locks. Restoring single-thread wakeup for sleep mutexes is also on the list. Once I start digging I will probably find more things to try.

If you want an excellent candidate for cache line contention foo, you might take a glance at the uma_pcpu_mtx array in UMA.

This may well be obsoleted by my changes to UMA to use critical sections instead of mutexes here, but it would be very interesting to see what happens, since it's an example of tightly packed mutexes with a high probability of simultaneous access but a low probability of contention. The impact on performance, if significant, would be measurable using a broad range of benchmarks. Some other interesting candidates might be:

- Mutex pool mutexes in kern_mtxpool.c.
- The sockbuf send/receive mutexes in struct socket, and in fact the struct sockbufs themselves.
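
To make the packing concern concrete, here is a minimal userland sketch of how adjacent entries in a tightly packed lock array end up in the same cache line. It is not the UMA or mutex pool code: `struct small_lock`, `packed_locks`, `NCPU_SKETCH`, and both helper functions are hypothetical names, and the 64-byte line size is an assumption that varies by CPU.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real size is CPU-specific. */
#define CACHE_LINE_SIZE 64
/* Hypothetical CPU count, for illustration only. */
#define NCPU_SKETCH 4

/* Hypothetical stand-in for a small lock; not the kernel's struct mtx. */
struct small_lock {
	volatile uint32_t sl_word;
};

static_assert(sizeof(struct small_lock) <= CACHE_LINE_SIZE,
    "stand-in lock must fit in one cache line");

/* Tightly packed per-CPU lock array, starting on a cache line boundary. */
static alignas(CACHE_LINE_SIZE) struct small_lock packed_locks[NCPU_SKETCH];

/* True if the two addresses fall within the same cache line. */
static bool
same_cache_line(const volatile void *a, const volatile void *b)
{
	return ((uintptr_t)a / CACHE_LINE_SIZE ==
	    (uintptr_t)b / CACHE_LINE_SIZE);
}

/* Number of locks of the given size that fit in one cache line. */
static size_t
locks_per_line(size_t lock_size)
{
	return (CACHE_LINE_SIZE / lock_size);
}
```

Under these assumptions, `same_cache_line(&packed_locks[0], &packed_locks[1])` holds, so two CPUs taking their own uncontended locks would still invalidate each other's cache line on every acquire and release.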

We might also want to investigate a struct mtx_with_pad that includes the necessary padding, to be used for static mutex structures that are probably getting packed with the oddest stuff. For example: sigio_mtx, devmtx, mac_policy_mtx, malloc_mtx, lockbuilder_pool, tid_lock, callout_lock, cache_lock.
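
One possible shape for such a padded wrapper, sketched in userland C11 rather than kernel code. Only the name `struct mtx_with_pad` comes from the message; the layout, the 64-byte line size, the `struct fake_mtx` stand-in, and `example_padded` are assumptions for illustration, and a real kernel version would wrap `struct mtx` and use the platform's own cache line constant.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real size is CPU-specific. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for the kernel's struct mtx. */
struct fake_mtx {
	volatile uintptr_t mtx_lock;
};

/*
 * Aligning the member to the cache line size raises the alignment of the
 * whole struct, and the compiler then rounds sizeof() up to a multiple of
 * that alignment, which supplies the trailing padding automatically.
 */
struct mtx_with_pad {
	alignas(CACHE_LINE_SIZE) struct fake_mtx mwp_mtx;
};

static_assert(alignof(struct mtx_with_pad) == CACHE_LINE_SIZE,
    "padded mutex must start on a cache line boundary");
static_assert(sizeof(struct mtx_with_pad) % CACHE_LINE_SIZE == 0,
    "padded mutex must occupy whole cache lines");

/* Hypothetical static mutexes; each entry gets a cache line to itself. */
static struct mtx_with_pad example_padded[2];
```

The cost is memory: under these assumptions each padded mutex takes a full 64 bytes, which is probably acceptable for a handful of static mutexes but less so for large arrays or per-object locks.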

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
rob@fledge.watson.org         Principal Research Scientist, McAfee Research