Subject:Re: [Xen-ia64-devel]RID virtualization discussion
From:INAKOSHI Hiroya (inak@jp.fujitsu.com)
Date:06/12/2007 11:06:29 PM
List:com.xensource.lists.xen-ia64-devel

Hi, Anthony,

Here are two experimental results regarding the discussion on RID virtualization. One is SpecJBB, where two VT-i guests run it while sharing the same logical processors. The other is TPC-C, as a more practical workload.

1/ SpecJBB

I employed a 4-core server. Domain-0 has one vcpu pinned on lp#0. Each VT-i guest has two vcpus pinned on lp#2 and lp#3, so the two guests share the same logical processors. I will show only the overhead caused by the patch. It was about 2.2%. The number of TLB flushes on each lp in 60 seconds was:

lp#0    lp#1    lp#2    lp#3
----------------------------
36734   0       6733    8104

Most of them occurred in Domain-0.

2/ TPC-C

I employed a different, 8-core server for TPC-C. Domain-0 has one vcpu pinned on lp#0. The VT-i guest has four vcpus pinned on lp#1 through lp#4. Please note that there is only one VT-i guest in this case. I will show only the overhead caused by the patch. It was about 1.6%. The number of TLB flushes on each lp in 60 seconds was:

lp#0    lp#1    lp#2    lp#3    lp#4    lp#5    lp#6    lp#7
------------------------------------------------------------
505550  17531   23472   21544   21154   0       0       0

Similarly, most of them occurred in Domain-0. A TPC-C expert told me that there should be at most 2% variation among trials with this server configuration. Note that this comment applies to the bare-metal case, though I have no evidence that the situation is different on virtualized servers.

TLB flushing seems infrequent in VT-i guests. Because the frequency would be sub-linear in the number of guests, I suppose that the penalty caused by the lack of RID virtualization would be less significant.
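As a rough check of the two tables above, the following sketch computes the share of flushes that landed on lp#0 and the per-second rate on each lp. It assumes that every flush counted on lp#0 belongs to Domain-0, since only Domain-0 is pinned there. The names `flush_summary`, `specjbb` and `tpcc` are made up for this illustration.

```python
# Flush counts per logical processor over 60 seconds, copied from the tables above.
WINDOW_SECONDS = 60

specjbb = {"lp#0": 36734, "lp#1": 0, "lp#2": 6733, "lp#3": 8104}
tpcc = {
    "lp#0": 505550, "lp#1": 17531, "lp#2": 23472, "lp#3": 21544,
    "lp#4": 21154, "lp#5": 0, "lp#6": 0, "lp#7": 0,
}


def flush_summary(counts, dom0_lp="lp#0", window=WINDOW_SECONDS):
    """Return the Domain-0 share of all flushes and the per-second rate per lp.

    Assumption: flushes counted on dom0_lp are attributed to Domain-0.
    """
    total = sum(counts.values())
    return {
        "dom0_share": counts[dom0_lp] / total,
        "per_second": {lp: n / window for lp, n in counts.items()},
    }


for name, counts in (("SpecJBB", specjbb), ("TPC-C", tpcc)):
    summary = flush_summary(counts)
    print(f"{name}: Domain-0 share of flushes = {summary['dom0_share']:.1%}")
    for lp, rate in summary["per_second"].items():
        print(f"  {lp}: {rate:.0f} flushes/s")
```

With these inputs the lp#0 share is a bit over 70% for SpecJBB and a bit over 85% for TPC-C, which matches the statement that most flushes occurred in Domain-0.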

Regards,

Hiroya

Xu, Anthony wrote:

More tests.

Test case: Specjbb

Platform: 6 physical cpus with HT disabled

Guest Env: 2 vti guests, each with 4 vcpus pinned on the same physical cpus
Guest1:
Vcpu1 pinned on pcpu2
Vcpu2 pinned on pcpu3
Vcpu3 pinned on pcpu4
Vcpu4 pinned on pcpu5
Guest2:
Same as Guest1

Without flushing:
Score: 11066

With flushing:
Score: 11031
Flushing TLB times: 3973286
Flushing times per second: 3014/s

The penalty is less than 0.5%.
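The arithmetic behind that figure can be sketched as follows. The helper name `relative_penalty` is made up for this illustration, and the implied run length assumes that the total flush count and the per-second rate cover the same interval, which the message does not state.

```python
def relative_penalty(baseline_score, patched_score):
    """Fraction of the baseline score lost when flushing is enabled."""
    return (baseline_score - patched_score) / baseline_score


# Scores reported above: 11066 without flushing, 11031 with flushing.
penalty = relative_penalty(11066, 11031)
print(f"penalty: {penalty:.2%}")

# Assumption: 3973286 flushes and 3014 flushes/s describe the same interval.
implied_seconds = 3973286 / 3014
print(f"implied measurement interval: {implied_seconds:.0f} s")
```

The result is roughly 0.3%, consistent with the "less than 0.5%" stated above.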

Definitely, we need to run a "big benchmark" to find out how much ptc.e will impact performance. I hope the community can do more tests.

Thanks, Anthony

-----Original Message-----
From: xen-@lists.xensource.com [mailto:xen-@lists.xensource.com] On Behalf Of Xu, Anthony
Sent: May 24, 2007 17:05
To: Isaku Yamahata
Cc: Xen-ia64-devel
Subject: RE: [Xen-ia64-devel]RID virtualization discussion

From: Isaku Yamahata
Sent: May 24, 2007 17:00
To: Xu, Anthony
Cc: Xen-ia64-devel
Subject: Re: [Xen-ia64-devel]RID virtualization discussion

We have tested the following cases. There are 6 physical processors, and local_purge_all is executed about 2000 times per second on each processor.

Dom0(1vcpu) + domU(2vcpu)
Dom0(1vcpu) + domU(4vcpu)
Dom0(1vcpu) + vti(2vcpu)
Dom0(1vcpu) + vti(4vcpu)
Dom0(1vcpu) + vti(2vcpu) + vti(2vcpu)

Thank you for the explanation. Given that # of vcpus < # of pcpus, we can assume that each vcpu is bound to a pcpu. So context_switch() is called only when a pcpu goes idle or is woken up from idle.

You may want to insert a TLB flush into continue_running(), which is called when a vcpu uses up its time slice and is chosen again. That way the TLB is flushed on every time slice.
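The difference between the two flush points can be illustrated with a toy model in Python. This is not Xen code. It assumes one flush whenever a non-idle vcpu takes over a pcpu (standing in for the context_switch() path), plus an optional flush when the same vcpu is chosen again (standing in for continue_running()). The names `count_flushes`, `IDLE` and `pinned_slices` are hypothetical.

```python
IDLE = "idle"


def count_flushes(slices, flush_in_continue_running):
    """Count TLB flushes on one pcpu for a sequence of per-slice scheduling choices."""
    flushes = 0
    previous = IDLE
    for chosen in slices:
        if chosen != previous:
            # Stand-in for context_switch(): a different vcpu takes over the pcpu.
            if chosen != IDLE:
                flushes += 1
        elif flush_in_continue_running and chosen != IDLE:
            # Stand-in for continue_running(): the same vcpu is chosen again.
            flushes += 1
        previous = chosen
    return flushes


# One pinned vcpu runs 8 slices, the pcpu idles for one slice, then 3 more slices.
pinned_slices = ["v0"] * 8 + [IDLE] + ["v0"] * 3

print(count_flushes(pinned_slices, flush_in_continue_running=False))
print(count_flushes(pinned_slices, flush_in_continue_running=True))
```

In this model a pinned vcpu triggers a flush only around idle transitions unless the continue_running() path also flushes, in which case every non-idle slice costs one flush.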

There are about 2000 vcpu switches per second on each processor. That's a lot of vcpu switches.

I can do a test with #vcpu > #pcpu.

Thanks, Anthony