Subject: Re: [RFT][patch] Scheduling for HTT and not only
From: Alexander Motin (ma@FreeBSD.org)
Date: Apr 6, 2012 7:26:51 am
List: org.freebsd.freebsd-hackers

On 04/06/12 17:13, Attilio Rao wrote:

On 5 April 2012 19:12, Arnaud Lacombe <laco@gmail.com> wrote:

Hi,

[Sorry for the delay, I got a bit sidetracked...]

2012/2/17 Alexander Motin<ma@freebsd.org>:

On 17.02.2012 18:53, Arnaud Lacombe wrote:

On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin<ma@freebsd.org> wrote:

On 02/15/12 21:54, Jeff Roberson wrote:

On Wed, 15 Feb 2012, Alexander Motin wrote:

I've decided to stop those cache black magic practices and focus on things that really exist in this world -- SMT and CPU load. I've dropped most of the cache-related parts from the patch and made the rest stricter and more predictable: http://people.freebsd.org/~mav/sched.htt34.patch

This looks great. I think there is value in considering the other approach further, but I would like to do this part first. It would also be nice to give priority a greater influence in the load balancing.

I don't have a good idea yet about balancing priorities, but I've rewritten the balancer itself. Since sched_lowest() / sched_highest() are more intelligent now, they made it possible to remove the topology traversal from the balancer itself. That should fix the double-swapping problem, allow keeping some affinity while moving threads, and make balancing fairer. I ran a number of tests with 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and 16 threads everything is stationary, as it should be. With 9 threads I see regular and random load movement between all 8 CPUs. Measurements over a 5-minute run show a deviation of only about 5 seconds. That is the same deviation I see caused by just scheduling 16 threads on 8 cores, without any balancing needed at all. So I believe this code works as it should.

Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch
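To make the 4/8/9/16-thread experiment above easier to picture, here is a toy model of it in Python. This is only a sketch under stated assumptions and is not the code in the patch: the names simulate() and spread() are made up for illustration, threads on one CPU are assumed to share each tick equally, and the balancer rule (move one random thread from a most-loaded CPU to a least-loaded one, but only if the source CPU has at least two threads) is a simplification chosen so that the 4, 8 and 16 thread cases stay stationary.

```python
import random


def simulate(ncpus, nthreads, ticks, balance_every, seed=0):
    """Return per-thread CPU time after `ticks` ticks of a toy balancer.

    balance_every=0 disables balancing. Everything here is a hypothetical
    model for illustration, not SCHED_ULE itself.
    """
    rng = random.Random(seed)
    # Initial placement: thread i starts on CPU i % ncpus.
    queues = [[] for _ in range(ncpus)]
    for t in range(nthreads):
        queues[t % ncpus].append(t)
    runtime = [0.0] * nthreads

    for tick in range(1, ticks + 1):
        # Assumption: threads queued on one CPU share the tick equally.
        for q in queues:
            for t in q:
                runtime[t] += 1.0 / len(q)

        if balance_every and tick % balance_every == 0:
            loads = [len(q) for q in queues]
            hi, lo = max(loads), min(loads)
            # Toy rule: a CPU only gives up a thread if it has more than
            # one, and only if some other CPU is less loaded.
            if hi >= 2 and hi > lo:
                src = rng.choice([c for c, n in enumerate(loads) if n == hi])
                dst = rng.choice([c for c, n in enumerate(loads) if n == lo])
                victim = rng.choice(queues[src])
                queues[src].remove(victim)
                queues[dst].append(victim)
    return runtime


def spread(runtime):
    """Difference between the most and least CPU time any thread received."""
    return max(runtime) - min(runtime)


if __name__ == "__main__":
    for n in (4, 8, 9, 16):
        balanced = spread(simulate(8, n, 3000, 10))
        static = spread(simulate(8, n, 3000, 0))
        print(n, "threads: spread with balancing", balanced,
              "without", static)
```

In this model the 4, 8 and 16 thread cases never migrate anything, while with 9 threads the extra thread keeps hopping between CPUs, so the penalty of sharing a CPU is spread over all threads instead of landing on the same two.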

I plan this to be the final patch of this series (more to come :)) and if there are no problems or objections, I am going to commit it (except some debugging KTRs) in about ten days. So now is a good time for reviews and testing. :)

Is there a place where all the patches are available?

All my scheduler patches are cumulative, so all you need is the last one mentioned here, sched.htt40.patch.

You may want to have a look at the results I collected in the `runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results available in `runs/freebsd'. On the dual package platform, your patch is not a definite win.

But in some cases, especially for multi-socket systems, to let it show its best you may want to apply an additional patch from avg@ to better detect CPU topology: https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

The test I conducted specifically for this patch did not show much improvement...

Can you please clarify this point? Did the test you ran compare cases where the topology was detected badly against cases where it was detected correctly by the patched kernel (and you still didn't see a performance improvement), in terms of cache line sharing?

At this moment SCHED_ULE does almost nothing in terms of cache line sharing affinity (though it is probably worth some further experiments). What this patch may improve is the opposite case -- reducing cache sharing pressure for cache-hungry applications. For example, proper cache topology detection (such as the lack of a global L3 cache, but an L2 shared per pair of cores on Core2Quad-class CPUs) increases pbzip2 performance when the number of threads is less than the number of CPUs (i.e. when there is room for optimization).
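A small Python sketch of why the detected topology matters when there are fewer threads than CPUs. It is an illustration only, not what SCHED_ULE or avg@'s patch actually does: spread_over_caches() is a hypothetical helper, and the CPU numbering for the Core2Quad-like layout (CPUs 0-1 share one L2, CPUs 2-3 share another) is assumed.

```python
def spread_over_caches(cache_groups, nthreads):
    """Pick CPUs for `nthreads` so each shared cache is as lightly loaded
    as possible.

    cache_groups is a list of CPU-id lists; CPUs in the same inner list
    are assumed to share a cache. Hypothetical helper for illustration.
    """
    load = [0] * len(cache_groups)
    chosen = []
    for _ in range(nthreads):
        # Only groups that still have an unused CPU are candidates.
        candidates = [g for g in range(len(cache_groups))
                      if load[g] < len(cache_groups[g])]
        if not candidates:
            break  # more threads than CPUs: nothing left to place
        # Prefer the group with the fewest threads so far (lowest index
        # wins ties, to keep the result deterministic).
        g = min(candidates, key=lambda i: (load[i], i))
        chosen.append(cache_groups[g][load[g]])
        load[g] += 1
    return chosen


if __name__ == "__main__":
    # Assumed Core2Quad-like layout: two pairs of cores, one L2 per pair.
    correct = [[0, 1], [2, 3]]
    # What a kernel sees if it wrongly assumes one cache shared by all.
    wrong = [[0, 1, 2, 3]]
    print(spread_over_caches(correct, 2))  # [0, 2]: one thread per L2
    print(spread_over_caches(wrong, 2))    # [0, 1]: both on the same L2
```

With the correct layout two cache-hungry threads each get an L2 to themselves. With the wrongly detected layout they may end up sharing one, which is the kind of pressure the paragraph above is talking about.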