| From | Sent On | Attachments |
|---|---|---|
| Sean Chittenden | Jun 2, 2008 5:21 am | |
| Claus Guttesen | Jun 2, 2008 8:26 am | |
| Gary Stanley | Jun 2, 2008 9:34 am | |
| Bruce Evans | Jun 2, 2008 10:55 am | |
| Bruce Evans | Jun 2, 2008 11:25 am | |
| Bruce Evans | Jun 2, 2008 5:10 pm | |
| Sean Chittenden | Jun 2, 2008 7:05 pm | |
| Sean Chittenden | Jun 2, 2008 7:11 pm | |
| Gary Stanley | Jun 2, 2008 7:58 pm | |
| Gary Stanley | Jun 2, 2008 8:24 pm | |
| Bruce Evans | Jun 3, 2008 8:02 am | |
| Bruce Evans | Jun 3, 2008 9:19 am | |
| Bruce Evans | Jun 3, 2008 9:31 am | |
| Bruce Evans | Jun 3, 2008 10:14 am | |

| Subject: | Micro-benchmark for various time syscalls... | |
|---|---|---|
| From: | Bruce Evans (br...@optusnet.com.au) | |
| Date: | Jun 3, 2008 9:19:43 am | |
| List: | org.freebsd.freebsd-performance | |
On Mon, 2 Jun 2008, Gary Stanley wrote:
At 06:19 AM 6/2/2008, Bruce Evans wrote:
These are very slow. Are they on a 486? :-) I get about 262 ns for CLOCK_REALTIME using the TSC timecounter on all ~2GHz UP systems. The syscall overhead is about 200 nsec (170 nsec for a simpler syscall and maybe 30 nsec extra for copyin/out for clock_gettime()) and reading the TSC timecounter adds another 60 nsec, including a whole 6 nsec for the hardware part of the read (perhaps more like 30 nsec than 60 for the whole read). The TSC doesn't work on all machines (never for SMP), but this will hopefully change. (Phenom is supposed to have TSCs that are coherent across CPUs, and rdtsc has slowed down from 12 cycles to 40+ to implement this :-(. Core2 already has a 40+ cycles rdtsc, but AFAIK it doesn't have coherent TSCs.) Other timecounters are much slower than the TSC, but I haven't seen one take 8000 nsec since 486 days.
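For readers who want to reproduce per-call figures like the ones above, here is a minimal sketch of such a measurement. It is not the benchmark used in this thread; the helper names `ts_to_ns` and `avg_clock_gettime_ns` are made up for illustration. It times a loop of back-to-back `clock_gettime()` calls and divides by the iteration count, so the result includes loop overhead. On systems that answer `clock_gettime()` without entering the kernel, it will not include syscall entry cost.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + (int64_t)ts->tv_nsec;
}

/*
 * Hypothetical helper: average wall time, in nanoseconds, of one
 * clock_gettime(clock_id, ...) call over `iterations` back-to-back calls.
 * The loop itself is timed with CLOCK_MONOTONIC.
 * Returns -1.0 if any clock_gettime() call fails.
 */
double avg_clock_gettime_ns(clockid_t clock_id, long iterations)
{
    struct timespec start, end, scratch;
    long i;

    assert(iterations > 0);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (i = 0; i < iterations; i++) {
        if (clock_gettime(clock_id, &scratch) != 0)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / (double)iterations;
}
```

Calling `avg_clock_gettime_ns(CLOCK_REALTIME, 1000000)` from a small `main()` and printing the result gives a number that can be compared against the figures quoted here, bearing in mind that hardware and timecounter choice dominate the outcome.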
Phenoms don't have TSCs that are coherent, at least on a few machines here:
According to the amd64 arch manual (volume 3 3.14 Sep 2007):
If CPUID 8000_0007.edx[8] = 1, then [details about hardware states...] then the TSC is suitable for use as a source of time. Google shows support for this feature in at least Linux and Xen.
Phenom also has a rdtscp instruction which is serializing.
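The CPUID bit quoted from the manual can be tested from userland. The sketch below assumes a GCC or Clang toolchain that ships `<cpuid.h>` with the `__get_cpuid()` helper. The function names `cpuid_bit`, `edx_reports_invariant_tsc` and `cpu_reports_invariant_tsc` are invented for this example. It only reports whether the bit is set; the other conditions elided in the manual quote are not checked.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Bit 8 of EDX from CPUID leaf 0x80000007, per the manual text quoted above. */
#define TSC_INVARIANT_BIT 8u

/* Return the value (0 or 1) of bit `bit` of a 32-bit CPUID register. */
int cpuid_bit(unsigned int reg, unsigned int bit)
{
    assert(bit < 32u);   /* shifting by 32 or more would be undefined */
    return (int)((reg >> bit) & 1u);
}

/* Decode an EDX value that was read from leaf 0x80000007. */
int edx_reports_invariant_tsc(unsigned int edx)
{
    return cpuid_bit(edx, TSC_INVARIANT_BIT);
}

/*
 * Query the running CPU.
 * Returns 1 if the bit is set, 0 if it is clear, and -1 if the leaf
 * is not available or the code was not built for x86.
 */
int cpu_reports_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_reports_invariant_tsc(edx);
#else
    return -1;
#endif
}
```

Keeping the bit decoding separate from the CPUID read makes it possible to check the decoding against EDX values copied from another machine.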
    4 CPUs, running 4 parallel test-tasks.
    checking for time-warps via:
    - read time stamp counter (RDTSC) instruction (cycle resolution)
    - gettimeofday (TOD) syscall (usec resolution)
    - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)

    new TSC-warp maximum: -4294919263 cycles, 00000000ffffe11b -> 0000000000009cbc
    new TSC-warp maximum: -4294919300 cycles, 00000000ffff74e4 -> 0000000000003060
    | TSC: 2.24us, fail:3 | TOD: 2.24us, fail:0 | CLK: 2.24us, fail:0 |
The difference seems to be only about -0x6000, with an overflow bug in the test giving a value near -2^32.
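The arithmetic behind a value near -2^32 can be checked with a short sketch. It assumes, which the thread does not confirm, that the two printed values are 32-bit counter samples that were subtracted at 64-bit width. The function names `delta_full_width` and `delta_mod_2_32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference computed over the full-width values, with no allowance
 * for a 32-bit counter wrapping between the two samples.
 */
int64_t delta_full_width(uint64_t before, uint64_t after)
{
    assert(before <= UINT32_MAX && after <= UINT32_MAX);
    return (int64_t)after - (int64_t)before;
}

/*
 * Difference computed modulo 2^32 and mapped into [-2^31, 2^31),
 * which is the appropriate form if both samples are 32-bit counters.
 */
int64_t delta_mod_2_32(uint64_t before, uint64_t after)
{
    uint32_t d;

    assert(before <= UINT32_MAX && after <= UINT32_MAX);
    d = (uint32_t)after - (uint32_t)before;   /* unsigned subtraction wraps modulo 2^32 */
    return d >= 0x80000000u ? (int64_t)d - 4294967296LL : (int64_t)d;
}
```

With the first pair from the output, `delta_full_width(0xffffe11b, 0x9cbc)` reproduces the reported -4294919263, while `delta_mod_2_32()` on the same pair yields a value whose magnitude is below 2^16.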
The code to test the TSC to check for warping:
However, it seems that Core2's don't have any warping of the TSC. I tested that code on a core2quad for 8 hours with no TSC failures.
Interesting. Please check the manual. I don't have current Intel arch manuals handy.
Bruce





