|Subject:||macro benchmark for mutex locks needed.|
|From:||Stephan Uphoff (up...@tree.com)|
|Date:||Nov 23, 2004 8:00:05 pm|
On Tue, 2004-11-23 at 11:49, Robert Watson wrote:
On Tue, 23 Nov 2004, Stephan Uphoff wrote:
I have a bunch of ideas to speed up spin and mutex locks somewhat. For this I need benchmarks to test different modifications.
While the micro-benchmark from rwatson@ is a good way to quickly test modifications and weed out unlikely candidates, jhb@'s tests have shown that micro- and macro-benchmarks do not always show the same result.
Running benchmarks and booting takes a lot of time. Since this is NOT one of my favorite tasks I want to run generally accepted benchmarks so I can test (boot) each modification exactly once for each test machine.
If you think I should run certain benchmarks with certain parameters please tell me BEFORE I start testing!
I like to use netblast from src/tools/tools/netrate/netblast. It attempts to send packets as quickly as possible on a network interface, which is a CPU-intensive operation that is very sensitive to the cost of synchronization. On an SMP system, it also generates a moderate ithread load as the gig-e interface transmits, and that ithread will often contend on the network interface driver lock with the running netblast thread. As such, changes that affect the cost and handling of contention are also visible in this benchmark. With the synchronization micro-benchmark, I see spin locks on SMP being faster with the atomic release removed, but in the netblast test, I see those spinlocks as slower on SMP, since they behave less well under contention.
(The above with 64-bit if_em cards on a dual-Xeon). Note that you'll want to make sure netreceive is running on a second box, or that you're sending to the broadcast address, or the ICMP errors will substantially quench your send ability due to the asynchronous reports that the port is closed.
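For readers unfamiliar with this kind of test, here is a minimal sketch of a netblast-style sender in Python. It is not the netblast source: the function name blast and its parameters are hypothetical, and it only illustrates the idea of sending UDP datagrams in a tight loop for a fixed time. Because the socket is connected, an ICMP port-unreachable from the receiver typically shows up as an error on a later send, which is the quenching effect described above.

```python
import socket
import time


def blast(host, port, payload_size, duration):
    """Send UDP datagrams as fast as possible for `duration` seconds.

    Hypothetical helper for illustration only.
    Returns a tuple (packets_sent, send_errors).
    """
    payload = b"\0" * payload_size
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A connected UDP socket lets the kernel report asynchronous ICMP
    # errors (e.g. port unreachable) on a subsequent send().
    sock.connect((host, port))
    sent = 0
    errors = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            try:
                sock.send(payload)
                sent += 1
            except OSError:
                # Typically ECONNREFUSED (no listener) or ENOBUFS (queue full).
                errors += 1
    finally:
        sock.close()
    return sent, errors


if __name__ == "__main__":
    # Assumed values: loopback address and an arbitrary port.
    sent, errors = blast("127.0.0.1", 9999, 32, 1.0)
    print("sent:", sent, "errors:", errors)
```

Comparing the sent count across kernels on otherwise identical hardware gives a rough macro-level number; a userland Python loop adds its own overhead, so this is only a sketch of the method, not a substitute for the C tool.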
My initial SMP test machine will be a Dell 1600SC dual-Xeon (P4 - 2.8 GHz/400MHz bus). It has a built-in em Ethernet interface. Unfortunately it is only an 82540EM / 32-bit chip and it shares the PCI bus with a few 33MHz PCI cards :-(. The machine has an unused PCI bus with a free PCI-X slot but I would need to order a server card. What is your normal data rate with this test - any chance that the 82540EM will be sufficient?
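As a rough back-of-the-envelope check on the bus question, the theoretical limits can be worked out as below. This is only a sketch: the helper names are hypothetical, the PCI figure is the peak burst rate of a 32-bit bus at the nominal 33.33 MHz clock and ignores arbitration and sharing with other cards, and the frame rates are the standard Ethernet line-rate limits, not measured netblast results.

```python
def pci_peak_mbit_per_s(width_bits, clock_mhz):
    """Theoretical peak PCI transfer rate in Mbit/s (one transfer per clock)."""
    return width_bits * clock_mhz


def max_frames_per_s(link_bit_per_s, frame_bytes):
    """Upper bound on Ethernet frames per second at line rate.

    frame_bytes counts the frame from destination MAC through FCS;
    frames shorter than 64 bytes are padded to 64 on the wire.
    """
    preamble_and_sfd = 8   # bytes sent before every frame
    inter_frame_gap = 12   # mandatory idle time, expressed in bytes
    on_wire = max(frame_bytes, 64) + preamble_and_sfd + inter_frame_gap
    return link_bit_per_s / (on_wire * 8)


if __name__ == "__main__":
    print("32-bit/33MHz PCI peak:", round(pci_peak_mbit_per_s(32, 33.33)), "Mbit/s")
    print("gig-e, 64-byte frames:", int(max_frames_per_s(1e9, 64)), "frames/s")
    print("gig-e, 1518-byte frames:", int(max_frames_per_s(1e9, 1518)), "frames/s")
```

The PCI peak comes out at roughly 1067 Mbit/s, only slightly above gigabit line rate, so a shared 32-bit/33MHz bus leaves little headroom once other cards and bus overhead are taken into account.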
The data sink will be a 32-bit em card with an ancient slow P4 processor using a cross-over cable. Since this combination is probably not able to sink enough data I plan to add a dummy static ARP entry for a dummy remote IP address to the SMP machine. This should keep the data sink's em card from actually filling the receive buffers. Since this takes the PCI bus and the slow processor out of the equation this should be a perfect data sink - right?