Subject: Re: Performance of SheevaPlug on 8-stable
From: Bernd Walter (tic@cicely7.cicely.de)
Date: Mar 6, 2010 1:51:30 pm
List: org.freebsd.freebsd-arm

On Sat, Mar 06, 2010 at 10:17:16PM +0100, Bernd Walter wrote:

On Sat, Mar 06, 2010 at 09:39:57PM +0100, Maks Verver wrote:

Hi everyone,

After a bit of patching and tinkering, I got my SheevaPlug to boot FreeBSD from a UFS2-formatted USB stick. To compare it with Linux, I ran nbench on both FreeBSD and Ubuntu (which ships with the SheevaPlug). To my surprise, the results were atrocious! FreeBSD scores about 50 times worse than Ubuntu.

Of course, a difference this large cannot be explained by implementation differences alone; something more fundamental must be wrong here. To simplify things, I wrote a minimal test case that counts up to the maximum value of an int:

int main(void)
{
    int i = 0;
    do
        ++i;
    while (i > 0);  /* exits once i wraps negative; C leaves signed overflow undefined */
    return 0;
}

This compiles to the following on both Linux and FreeBSD:

0000848c <main>:
    848c:  e3a03000   mov   r3, #0        ; 0x0
    8490:  e2833001   add   r3, r3, #1    ; 0x1
    8494:  e3530000   cmp   r3, #0        ; 0x0
    8498:  cafffffc   bgt   8490 <main+0x4>
    849c:  e3a00000   mov   r0, #0        ; 0x0
    84a0:  e1a0f00e   mov   pc, lr

This stresses the CPU and little else. The loop body is three instructions (add, cmp, bgt) and runs 2^31 times before i wraps negative, so on the 1.2 GHz SheevaPlug, assuming one instruction per clock cycle, I expect it to take around (1<<31)*3/1.2e9 ≈ 5.37 seconds. On Ubuntu:

$ time ./test
real    0m5.422s
user    0m5.390s
sys     0m0.020s

Exactly as expected. On FreeBSD, on the other hand:

% time ./test
286.000u 0.000s 4:47.22 99.8% 40+1321k 0+0io 0pf+0w

This takes almost five minutes, over 50 times as long! All of it is user-space CPU time. Does anybody have an idea why the CPU appears to run so slowly under FreeBSD?

I was tempted to blame different compiler optimisations, but you say the resulting code is the same. Such a massive speed difference sounds a bit like cache problems. For what it's worth, I see it taking minutes (not finished yet) on a 180 MHz RM9200 as well.

[67]chipmunk.cicely.de# ./test
2185.000u 3.000s 42:03.86 86.6% 46+1532k 0+0io 0pf+0w

This is really a long time to count to 2^31 at 180 MHz. I really think something is wrong here.

According to dmesg, IC is enabled:

CPU: ARM920T rev 0 (ARM9TDMI core)
  DC enabled IC enabled WB enabled LABT
  16KB/32B 64-way Instruction cache
  16KB/32B 64-way write-back-locking-A Data cache

If the above calculation is correct, I would expect it to finish in about 7 times the calculated time (1.2 GHz / 180 MHz ≈ 6.7). If the calculation is wrong, then why does Ubuntu agree with it?

I pored over my kernel configuration but didn't see anything suspicious. I did (manually) apply Hans Petter Selasky's patch [1] to be able to boot from USB, and consequently removed the NFS and BOOTP options from the config provided at sys/arm/conf/SHEEVAPLUG. Furthermore, I removed the NO_SWAPPING and NO_FFS_SNAPSHOT options (because I plan to attach a USB disk drive), and I left in the KDB and DDB options because I think they do not significantly affect performance. Is this correct?

P.S. The strange thing is that other things, such as network performance, are perfectly fine. I can fetch data over FTP at 11 MB/s, which is about the maximum possible on the cheap 100 Mbit/s switch I use, and even a few percent better than Ubuntu. So it really seems to be the CPU that's the bottleneck, for no apparent reason.
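As a rough sanity check on the 11 MB/s figure, the best-case TCP payload rate on 100 Mbit/s Ethernet can be worked out as below. This assumes a 1500-byte MTU, 20-byte IPv4 and 20-byte TCP headers with no options (1460 bytes of payload per frame), 38 bytes of per-frame wire overhead (preamble and start delimiter, Ethernet header, FCS, inter-frame gap), and MB meaning 10^6 bytes:

```latex
\[
  \frac{100 \times 10^{6}\ \text{bit/s}}{8\ \text{bit/byte}} \times \frac{1460}{1500 + 38}
  = 12.5\ \text{MB/s} \times \frac{1460}{1538}
  \approx 11.9\ \text{MB/s}
\]
```

Under these assumptions, 11 MB/s is roughly 93% of the ceiling, which fits the description of a link that is close to saturated.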

FTP doesn't benefit that much from the cache, and our network stack might outweigh the loss, so this all makes sense if the instruction cache isn't working. I think you have found something very interesting, although I don't know exactly why it happens.