Subject:Re: Performance of SheevaPlug on 8-stable
From:Bernd Walter (tic@cicely7.cicely.de)
Date:Mar 8, 2010 11:54:26 am
List:org.freebsd.freebsd-arm

On Mon, Mar 08, 2010 at 01:37:23PM -0600, Mark Tinguely wrote:

<deleted>

This puzzled me as well. What requires such handling of shared pages? I thought handing over shared data is done by cache flushes, barriers or whatever an architecture has for this. Most systems we talk about are single CPU, so it is just DMA and handing over dcache writes to the icache, but we don't support self-modifying code, so it is always done in a controlled way. And even for SMP systems, handing over data requires using cache coherence mechanisms, e.g. those embedded in mutexes. So what is wrong in my picture and requires us to do special handling for shared pages on ARM?

And if there's only one copy of 'test' running, why does it hit the 'shared' case for this code?

Warner

-- B.Walter <ber@bwct.de> http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers and much more.

ARMv4/ARMv5 use virtual indexed / virtual tagged level one caches. They may or may not have level two caches. These are the ARM chips that we currently support, and I will explain the rules below.

On the newest processors, the ARMv6 can have virtual index / physical tagged or physical index / physical tagged level one caches; the ARMv7 must have physical index / physical tag level one caches. The ARMv6 and ARMv7 have more pde/pte bits describing the cache status of the "inner" and "outer" caches. The ARMv7 has the more mature cache management; it defines the "level of unification" and "level of coherence" for the caches. There is also a level of snooping for the ARMv7 multi-core, which I will just dance around. A PIPT cache must be synced to the "level of coherence" before DMA and when modified from another process; think of a debugger in another address space modifying instruction code. ARMv6/ARMv7 have address space identifiers (ASIDs) to avoid tlb flushes. If they are not used, then the tlbs have to be flushed on context switch. This is close to the i386/amd64 with the exception of DMA; the i386/amd64 have self-snooping cache buses.

VIVT cache rules:

1) flush cache and tlb on context change.

2) USER cache must be disabled if a physical page has AT LEAST one writable user mapping AND is also mapped more than one time in the same user address space. (Multiple read mappings and no writes are fine; they just take up multiple cache entries. Obviously, a single read or a single write mapping is fine. If the mappings are in different user address spaces, we will be okay because the flush on context change will sync things up.)

3) KERNEL spaces are global. a) If the page is mapped writable AT LEAST ONCE to a kernel space AND the page is mapped more than once, no matter if the second mapping is in the user or kernel space, all mappings must not be cached.

I never assumed to be happy without a direct map.

b) If the page has only readable kernel mappings but at least one writable user mapping, the cache must be disabled for the mappings of the page in this address space. This is slightly different from rule 2. Kernel mappings are typically writable, so this is a case that really does not happen.
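Rules 2, 3a and 3b above can be sketched as a small decision function. This is only an illustration of the rules as stated in this mail, not the actual pmap code. The names `struct mapping`, `KERNEL_SPACE` and `mapping_cacheable` are hypothetical, and treating a read-only kernel mapping as uncacheable once any user mapping is writable is one reading of rule 3b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KERNEL_SPACE 0	/* hypothetical id for the global kernel space */

/* One virtual mapping of a single physical page (hypothetical type). */
struct mapping {
	int  space;	/* KERNEL_SPACE, or a user address space id > 0 */
	bool writable;
};

/*
 * Return true if mapping m[idx] may stay cached on a VIVT cache,
 * given all n mappings of the same physical page.
 */
bool
mapping_cacheable(const struct mapping *m, size_t n, size_t idx)
{
	size_t kernel = 0, kernel_w = 0;	/* kernel mappings, writable ones */
	size_t same = 0, same_w = 0;		/* user mappings in idx's space */
	size_t user_w = 0;			/* writable user mappings anywhere */
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++) {
		if (m[i].space == KERNEL_SPACE) {
			kernel++;
			if (m[i].writable)
				kernel_w++;
			continue;
		}
		if (m[i].writable)
			user_w++;
		if (m[i].space == m[idx].space) {
			same++;
			if (m[i].writable)
				same_w++;
		}
	}

	/* Rule 3a: a writable kernel mapping plus any other mapping. */
	if (kernel_w > 0 && n > 1)
		return (false);

	if (m[idx].space == KERNEL_SPACE) {
		/* Rule 3b, kernel side (assumed reading): user writers exist. */
		return (user_w == 0);
	}

	/*
	 * Rules 2 and 3b, user side: a writable user mapping in this
	 * address space, and more than one mapping visible in it
	 * (kernel mappings are global, so they count here too).
	 */
	if (same_w > 0 && same + kernel > 1)
		return (false);

	return (true);
}
```

Mappings in different user address spaces never disable each other's caching here, which matches the note in rule 2 that the flush on context change keeps them in sync.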

It gets a little tricky to implement, because we have to catch the transition from cache -> non-cache (change pte and wbinv/inv data or instruction caches) and from non-cache -> cache (change the pte).
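The two transitions just described can be written down as a tiny state check. This is a minimal sketch; `enum cache_action` and `cache_transition` are hypothetical names, and the real work (rewriting the pte, write-back/invalidate of the data or instruction caches) is only named in the comments, not performed.

```c
#include <stdbool.h>

/* What has to happen to one mapping when its cacheability changes. */
enum cache_action {
	CACHE_NO_CHANGE,	/* state is unchanged, nothing to do */
	CACHE_DISABLE,		/* change the pte and wbinv/inv the caches */
	CACHE_ENABLE		/* only change the pte */
};

/*
 * was_cached: the mapping is currently entered as cacheable.
 * may_cache:  the mapping may be cacheable under the VIVT rules now.
 */
enum cache_action
cache_transition(bool was_cached, bool may_cache)
{
	if (was_cached && !may_cache)
		return (CACHE_DISABLE);
	if (!was_cached && may_cache)
		return (CACHE_ENABLE);
	return (CACHE_NO_CHANGE);
}
```

The asymmetry is the point: only the cache -> non-cache direction needs cache maintenance, because stale or dirty lines may still sit in the cache for that virtual address.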

Thanks for the detailed explanation. It took a while, but now I get it. My picture wasn't expecting caching of virtual pages.