Subject:RE: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks
From:Dan Magenheimer (dan.@oracle.com)
Date:02/19/2008 03:37:43 PM
List:com.xensource.lists.xen-devel

percentage. However, our 32-bit clock skew seems to show a measurable problem now.

For the 32 bit guest, which timesource did it pick?

The dmesg output is hard to interpret on a 32-bit guest, but based on what we've seen, I think it was selecting hpet as timesource (because we specified clocksource=pit which would have been ignored on RHEL4-32). We are running another test with "clock=pit" to see if the skew goes away.

For Xen RHEL5 HVM guests:
- I *think* clock=pit is sufficient for RHEL5-32

But still poor accuracy, right?

Unproven yet but I hope not. The nohpet and nopmtimer parameters are ignored on RHEL5-32 so the clock=pit (or clocksource=pit) is the only way to choose the clock source, and thus the only way to get good accuracy on RHEL5-32.

Oops, I see from my long list that I neglected to say that the two clocks (WALL and GTOD) on RHEL5 are only reported on RHEL5-64. RHELx-32 looks to have only the one, which is overridden with clock=.

[1] Is notsc necessary for proper ticks for RHEL4-32/RHEL5-64? (I *think* not as it has never come up in any email.)

I have not investigated this yet.

My *think* is based on: 1) observation of dmesg output for RHEL5-64 where specifying "nohpet nopmtimer" seems to select PIT for WALL timer; 2) no mention of tsc in the generic clocksource.c code nor in the i386-specific time code.

Still I would sleep better if this were definitive.

[2] In RHEL5, I *think* it is the WALL source that we care about?

I'll have to check on this too.

My *think* is based on our observations to date that clock=pit is insufficient to fix the skew problem (and doesn't change the dmesg WALL source output on RHELx-64)... nohpet and nopmtimer is required to change the WALL source output and fix the skew.

Again I would sleep better if this were definitive.

Thanks, Dan

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Tuesday, February 19, 2008 1:50 PM To: dan.@oracle.com Cc: Keir Fraser; xen-@lists.xensource.com; Deepak Patel; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Hi Dan,

Thanks for all the investigation you've done!

-Dave

Dan Magenheimer wrote:

Hi Dave --

Thanks for that observation on ltp running on one vcpu!

With "clocksource=pit nohpet nopmtimer" our clock skew problems seem to have been reduced to a reasonable percentage. However, our 32-bit clock skew seems to show a measurable problem now.

For the 32 bit guest, which timesource did it pick?

As a result, I've been doing some digging into kernel sources and have observed the following relative to RHEL4 (2.6.9-based) kernels and RHEL5 (2.6.18-based) kernels and thought I would document them for posterity. Some of our confusion arises from the fact that invalid command line parameters are silently ignored.

RHEL4:
- clock= is a valid parameter for RHEL4-32
- clocksource= is not a valid parameter for RHEL4-xx
- nohpet is a valid parameter for RHEL4-64, not RHEL4-32
- nopmtimer is not a valid parameter for RHEL4-xx
- notsc is a valid parameter for RHEL4-32, not RHEL4-64
- SMP vs UP RHEL4-64 reports timekeeping in dmesg differently

For Xen RHEL4 HVM guests:
- I *think* clock=pit is sufficient for RHEL4-32 [1]
- I *think* nohpet is sufficient for RHEL4-64 [1]

RHEL5:
- there are two kinds of timekeeping, WALL and gtod
- clocksource= is a valid parameter for RHEL5-xx
- clock= is a valid but deprecated parameter for RHEL5-xx
- clock= and clocksource= are essentially equivalent
- nohpet is a valid parameter for RHEL5-64, not RHEL5-32
- nopmtimer is a valid parameter for RHEL5-64, not RHEL5-32
- notsc is a valid parameter for RHEL5-64, not RHEL5-32 [1]
- clock=pit changes the gtod source but not the WALL source [2]
- nohpet nopmtimer changes the WALL source to PIT
- /sys/devices/system/clocksource/clocksource0/...
  available_clocksource lists the possible clock sources
  current_clocksource lists the chosen clock source
  ..but neither of these works in a RHEL5 guest!

For Xen RHEL5 HVM guests:
- I *think* clock=pit is sufficient for RHEL5-32

But still poor accuracy, right?

- I *think* clock=pit nohpet nopmtimer is sufficient for RHEL5-64

Other info:
- As of 2.6.24.2, clock= is still valid (though still deprecated)

So, some open questions:

[1] Is notsc necessary for proper ticks for RHEL4-32/RHEL5-64? (I *think* not as it has never come up in any email.)

I have not investigated this yet.

[2] In RHEL5, I *think* it is the WALL source that we care about?

I'll have to check on this too.

And finally, since invalid command line parameters are ignored, I think specifying: clock=pit nohpet nopmtimer will force the guest clock sources into the optimal state for all RHEL4 and RHEL5 guests, both 32-bit and 64-bit (though see the question above on tsc). And we should keep an eye on kernel/time/clocksource.c to ensure the __setup("clock="...) line doesn't go away before RHEL6.

Note that if hpet=0 and pmtimer=0 were the default hvm platform parameters for all xen hvm guests (on all versions of xen), specifying kernel command line parameters would be unnecessary, but c'est la vie.

Oh, and to be complete, timer_mode=0 for 32-bit RHEL guests and timer_mode=2 for 64-bit RHEL guests.
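
The parameter notes above can be condensed into a lookup table. What follows is only a minimal Python sketch: the table copies the validity notes from this message, which are themselves marked as tentative, and the names VALID_PARAMS, param_key, effective_params and timer_mode are hypothetical helpers, not part of Xen or the Linux kernel. Dave's rh4u5-64 boot logs elsewhere in this thread suggest that nopmtimer may in fact be honoured on that kernel, so the RHEL4-64 entry in particular should be treated as provisional.

```python
# Which boot parameters each guest kernel honours, copied from the lists in
# this message. These are the poster's observations, not verified facts.
VALID_PARAMS = {
    "RHEL4-32": {"clock=", "notsc"},
    "RHEL4-64": {"nohpet"},
    "RHEL5-32": {"clocksource=", "clock="},
    "RHEL5-64": {"clocksource=", "clock=", "nohpet", "nopmtimer", "notsc"},
}


def param_key(param):
    """Reduce 'clock=pit' to 'clock='; leave bare flags such as 'nohpet' alone."""
    name, sep, _value = param.partition("=")
    return name + sep


def effective_params(guest, cmdline):
    """Return the parameters in cmdline that the guest would not silently ignore."""
    valid = VALID_PARAMS[guest]
    return [p for p in cmdline.split() if param_key(p) in valid]


def timer_mode(guest):
    """timer_mode suggested in this thread: 0 for 32-bit guests, 2 for 64-bit."""
    return 2 if guest.endswith("-64") else 0


if __name__ == "__main__":
    for guest in sorted(VALID_PARAMS):
        kept = effective_params(guest, "clock=pit nohpet nopmtimer")
        print(guest, kept, "timer_mode=%d" % timer_mode(guest))
```

Under these assumptions the single command line "clock=pit nohpet nopmtimer" reduces to clock=pit on the 32-bit guests, nohpet on RHEL4-64, and all three parameters on RHEL5-64, which matches the per-guest "sufficient" settings listed above.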

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Tuesday, February 19, 2008 8:27 AM To: dan.@oracle.com Cc: Dave Winchell; Keir Fraser; xen-@lists.xensource.com; Deepak Patel Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Hi Dan,

ltp runs by default loading up only one vcpu. The -x option can be used to run multiple instances, though in this mode you will get test failures. I ran 8 instances on each guest for 16 hours, 25 min and the time error was -11 sec (-.019%) on each guest.
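
The percentage quoted here is simply the accumulated error divided by the elapsed test time. A small Python sketch of that arithmetic follows; drift_percent is a hypothetical helper name used only for illustration.

```python
def drift_percent(error_seconds, hours=0, minutes=0):
    """Clock error as a percentage of the elapsed test time."""
    elapsed_seconds = hours * 3600 + minutes * 60
    return 100.0 * error_seconds / elapsed_seconds


if __name__ == "__main__":
    # 8 ltp instances per guest, 16 hours 25 min, -11 sec of error.
    print("%.3f%%" % drift_percent(-11, hours=16, minutes=25))  # -0.019%
```

The same function reproduces the other figures in this thread, for example +18 seconds over 3900 minutes giving roughly +.0077%.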

Regards, Dave

Dave Winchell wrote:

Hi Dan,

Mine was oversubscribed. 8 physical cpu, 2 guests, each with 8 vcpu. I ran one instance of ltp on each guest, continuously. I hope ltp loaded up all the vcpus. I seem to recall that it did, but I could be wrong. If it didn't, that would be a major difference between our tests. I'll verify this afternoon and run multiple instances, if necessary.

Thanks, Dave

Dan Magenheimer wrote:

Hi Dave --

No new results yet but one other question:

The problems we've seen with our testing have been with a heavily oversubscribed system: 8 physical CPU, six 2-vcpu 2GB guests running LTP simultaneously.

Was your LTP testing oversubscribed or just a single guest?

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Thursday, February 14, 2008 10:56 AM To: dan.@oracle.com Cc: Keir Fraser; xen-@lists.xensource.com; Deepak Patel; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Dan,

Here are some boot snippets for rh4u564 on xen 3.2.

#1:

Feb 14 10:44:59 vs076 kernel: Bootdata ok (command line is ro root=LABEL=/ console=ttyS0 clocksource=pit nohpet)
Feb 14 10:44:59 vs076 kernel: Linux version 2.6.9-55.ELsmp (brew@hs20-bc2-4.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Fri Apr 20 16:36:54 EDT 2007
...
Feb 14 10:44:59 vs076 kernel: Kernel command line: ro root=LABEL=/ console=ttyS0 clocksource=pit nohpet
Feb 14 10:44:59 vs076 kernel: Initializing CPU#0
Feb 14 10:44:59 vs076 kernel: PID hash table entries: 2048 (order: 11, 65536 bytes)
Feb 14 10:44:59 vs076 kernel: time.c: Using 3.579545 MHz PM timer.
Feb 14 10:44:59 vs076 kernel: time.c: Detected 1992.050 MHz processor.
...
Feb 14 10:45:00 vs076 kernel: checking TSC synchronization across 8 CPUs: passed.
Feb 14 10:45:00 vs076 kernel: Brought up 8 CPUs
Feb 14 10:45:00 vs076 kernel: Disabling vsyscall due to use of PM timer
Feb 14 10:45:00 vs076 kernel: time.c: Using PM based timekeeping.

#2:

Feb 14 10:47:57 vs076 kernel: Bootdata ok (command line is ro root=LABEL=/ console=ttyS0 clocksource=pit nohpet nopmtimer)
Feb 14 10:47:57 vs076 kernel: Linux version 2.6.9-55.ELsmp (brew@hs20-bc2-4.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Fri Apr 20 16:36:54 EDT 2007
...
Feb 14 10:47:58 vs076 kernel: Kernel command line: ro root=LABEL=/ console=ttyS0 clocksource=pit nohpet nopmtimer
Feb 14 10:47:58 vs076 kernel: Initializing CPU#0
Feb 14 10:47:58 vs076 kernel: PID hash table entries: 2048 (order: 11, 65536 bytes)
Feb 14 10:47:58 vs076 kernel: time.c: Using 1.193182 MHz PIT timer.
Feb 14 10:47:58 vs076 kernel: time.c: Detected 1991.959 MHz processor.
...
Feb 14 10:47:59 vs076 kernel: checking TSC synchronization across 8 CPUs: passed.
Feb 14 10:47:59 vs076 kernel: Brought up 8 CPUs
Feb 14 10:47:59 vs076 kernel: time.c: Using PIT/TSC based timekeeping.

As you can see, I only get the pit if I specify nopmtimer.
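
Checking which source a guest ended up with comes down to finding the final "based timekeeping" line in the boot log. A minimal Python sketch follows, assuming the RHEL4-64 message format shown in the two snippets above; TIMEKEEPING_RE and timekeeping_source are illustrative names, and other kernels (RHEL5, 32-bit) may print different messages.

```python
import re

# Matches the final RHEL4-64 timekeeping line, e.g.
# "time.c: Using PIT/TSC based timekeeping."
TIMEKEEPING_RE = re.compile(r"time\.c: Using (\S+) based timekeeping\.")


def timekeeping_source(log_text):
    """Return the timekeeping source named in a boot log, or None if absent."""
    match = TIMEKEEPING_RE.search(log_text)
    return match.group(1) if match else None


# The last line of each boot snippet quoted above.
BOOT_1 = "Feb 14 10:45:00 vs076 kernel: time.c: Using PM based timekeeping."
BOOT_2 = "Feb 14 10:47:59 vs076 kernel: time.c: Using PIT/TSC based timekeeping."

if __name__ == "__main__":
    print(timekeeping_source(BOOT_1))  # PM
    print(timekeeping_source(BOOT_2))  # PIT/TSC
```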

Dan Magenheimer wrote:

Hi Dave --

Thanks for continuing to run tests!

Hmmm... I thought I had noticed that even though Linux will acknowledge the existence of the pmtimer, it still prints:

time.c: Using PIT/TSC based timekeeping.

I will check again, but assuming the clocksource for our tests is indeed pit, the huge difference in the results (yours vs ours) is baffling. I wonder if the difference may be the underlying hardware. Maybe we will try to ensure we can duplicate the results on a different box.

So your testing was with stock 3.2.0 xen bits (what cset?) without any of your [quote from below] "clock related tweaks that I haven't submitted, because I'm still characterizing them"?

None of the tweaks I mentioned are in this test. It was stock with some patches. However, none of the patches are time related, to my knowledge, and I checked vpt.c to make sure that it is the same as what's in unstable. The only difference is in pt_intr_post, where I set the timer mode. I don't have timer mode tied into our config process yet, which is different from the official xen method.

(In pt_intr_post)
     else
     {
+        if(v->arch.paging.mode->guest_levels == 4)
+            v->domain->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] = HVMPTM_no_missed_ticks_pending;
+        else
+            v->domain->arch.hvm_domain.params[HVM_PARAM_TIMER_MODE] = HVMPTM_delay_for_missed_ticks;
         if ( mode_is(v->domain, one_missed_tick_pending) ||
              mode_is(v->domain, no_missed_ticks_pending) )
         {

Could you also send detail on the rhel4u4-64 kernel you are testing with, just to ensure we are not comparing apples and oranges? (Perhaps there's some way we can even share the identical disk image and vm.cfg file?)

And if our problem is indeed the pmtimer, I will need to submit another patch to Keir to add an hvm pmtimer platform variable. (Hmmm... I don't think he's even accepted the hpet variable patch yet. I'll have to check.)

Thanks, Dan

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Thursday, February 14, 2008 9:00 AM To: dan.@oracle.com Cc: Dave Winchell; Keir Fraser; xen-@lists.xensource.com; Deepak Patel Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Hi Dan,

I ran the ltp tests with 3.2 and found the errors for a 16 hour run to be:

rh4u564 -9.9 sec (-.017%) rh4u464 -7.3 sec (-.013%)

There were no cliffs and the drift was linear.

I think the problem you had may be due to the use of the pm timer. If you still have the boot log, it would tell you.

When I first tried a guest on 3.2 with "clocksource=pit nohpet" I noticed that it picked the pm timer. Adding "nopmtimer", the guest will pick the pit.

The reason I didn't have the problem with our 3.1 base is that I had disabled the hpet and the pmtimer by not advertising them in the acpi tables. I did this so long ago, I forgot that I had to disable pmtimer as well as hpet.

So, can you re-run your test with "clocksource=pit nohpet nopmtimer"?

You should see this in the boot messages:

time.c: Using PIT/TSC based timekeeping.

Thanks, Dave

Dave Winchell wrote:

Hi Dan,

Over the weekend the drift was +18 seconds for each guest (no ntp). The duration was 3900 minutes, so the error for each was +.0077%.

Looking back through the data, it appears to drift linearly at this rate. I've attached a plot for rh4u5-64.

This accuracy is better than what I've seen before (.03-.05%). This may be due to the different load (ltp vs usex) or to one of the changes I've made recently. I'll do some experimentation to see if there is a fix I should propose.

This still doesn't address the radical drift you saw. The next step for me is to run 3.2 and see if I can reproduce it.

Regards, Dave

Dave Winchell wrote:

Hi Dan,

Sorry it took me so long, but I finally ran an ltp test today. It's on rh4u4-64. I'm using the defaults for ltp and using a script called runltp. I had a usex load on rh4u5-64. No ntpd. virtual processors / physical processors = 2.

The clocks drifted -1 sec (4u5) and +1.5 sec (4u4) in 300 minutes for -.005% and .008%.

I'm running a 3.1 based hypervisor with some clock related tweaks that I haven't submitted, because I'm still characterizing them.

I'm stopping the usex load on 4u5-64 now and replacing it with ltp and will leave the two guests running ltp over the weekend.

Regards, Dave

Dave Winchell wrote:

Hi Dan, Deepak:

Thanks for the data. Those drifts are severe - no wonder ntp couldn't keep them in synch. I'll try to reproduce that behaviour here, with my code base. If I can't reproduce it, I'll try 3.2.

If you can isolate what ltp is doing during the cliffs, that would be very helpful.

thanks, Dave

Dan Magenheimer wrote:

OK, Deepak repeated the test without ntpd and using ntpdate -b before the test.

The attached graph shows his results: el5u1-64 (best=~0.07%), el4u5-64 (middle=~0.2%), and el4u5-32 (worst=~0.3%).

We will continue to look at LTP to try to isolate.

Thanks, Dan

P.S. elXuY is essentially RHEL XuY with some patches.

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Wednesday, January 30, 2008 2:45 PM To: Deepak Patel Cc: dan.@oracle.com; Keir Fraser; xen-@lists.xensource.com; akir@oracle.com; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Dan, Deepak,

It may be that the underlying clock error is too great for ntp to handle. It would be useful if you did not run ntpd and, instead, did ntpdate -b <timeserver> at the start of the test for each guest. Then capture the data as you have been doing. If the drift is greater than .05%, then we need to address that.

Another option is, when running ntpd, to enable loop statistics in /etc/ntp.conf by adding this to the file:

statistics loopstats
statsdir /var/lib/ntp/

Then you will see loop data in that directory. Correlating the data in the loopstats files with the peaks in skew would be interesting. You will see entries of the form

54495 76787.701 -0.045153303 -132.569229 0.020806776 239.735511 10

Where the second to last column is the Allan Deviation. When that gets over 1000, ntpd is working pretty hard. However, I have not seen ntpd completely lose it like you have.
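
Picking the Allan deviation out of each loopstats entry is a matter of splitting the line on whitespace. A minimal Python sketch follows, assuming seven whitespace-separated columns as in the sample entry above. Only the "Allan deviation" label for the second-to-last column and the 1000 threshold come from this message; the other field names are this sketch's own labels, and LoopSample, parse_loopstats_line and is_working_hard are hypothetical helpers.

```python
from collections import namedtuple

# Field names are this sketch's own labels for the seven loopstats columns;
# only "Allan deviation" for the second-to-last column comes from the message.
LoopSample = namedtuple(
    "LoopSample",
    "day seconds offset frequency jitter allan_deviation time_constant",
)


def parse_loopstats_line(line):
    """Split one loopstats entry into named, typed fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return LoopSample(
        day=int(fields[0]),
        seconds=float(fields[1]),
        offset=float(fields[2]),
        frequency=float(fields[3]),
        jitter=float(fields[4]),
        allan_deviation=float(fields[5]),
        time_constant=int(fields[6]),
    )


def is_working_hard(sample, threshold=1000.0):
    """True when the Allan deviation exceeds the rule-of-thumb threshold."""
    return sample.allan_deviation > threshold


if __name__ == "__main__":
    entry = "54495 76787.701 -0.045153303 -132.569229 0.020806776 239.735511 10"
    sample = parse_loopstats_line(entry)
    print(sample.allan_deviation, is_working_hard(sample))
```

Running this over a whole loopstats file and printing the timestamps where is_working_hard returns True would give a list of moments to compare against the peaks in the skew graph.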

I'm on vacation until Monday, and won't be reading email.

Thanks for all your work on this!

-Dave

Deepak Patel wrote:

Is the graph for RHEL5u1-64? (I've never tested this one.)

I do not know which graph was attached with this. But I saw this behavior in EL4u5 - 32, EL4U5 - 64 and EL5U1 - 64 hvm guests when I was running ltp tests continuously.

What was the behaviour of the other guests running?

All pvm guests are fine. But the behavior of most of the hvm guests was as described.

If they had spikes, were they at the same wall time?

No. They are not at the same wall time.

Were the other guests running ltp as well?

Yes, all 6 guests (4 hvm and 2 pvm) are running ltp continuously.

How are you measuring skew?

I was collecting the output of "ntpdate -q <timeserver>" every 300 seconds (5 minutes) and have created the graph based on that.

Are you running ntpd?

Yes. ntp was running on all the guests.

I am investigating what causes these spikes and will let everyone know what my findings are.

Thanks, Deepak

Anything that you can discover that would be in sync with the spikes would be very helpful!

The code that I test with is our product code, which is based on 3.1. So it is possible that something in 3.2 other than vpt.c is the cause. I can test with 3.2, if necessary.

thanks, Dave

Dan Magenheimer wrote:

Hi Dave (Keir, see suggestion below) --

Thanks!

Turning off vhpet certainly helps a lot (though see below).

I wonder if timekeeping with vhpet is so bad that it should be turned off by default (in 3.1, 3.2, and unstable) until it is fixed? (I have a patch that defaults it off, can post it if there is agreement on the above point.) The whole point of an HPET is to provide more precise timekeeping and if vhpet is worse than vpit, it can only confuse users. Comments?

In your testing, are you just measuring % skew over a long period of time? We are graphing the skew continuously and seeing periodic behavior that is unsettling, even with pit. See attached. Though your algorithm recovers, the "cliffs" could still cause real user problems. I wonder if there is anything that can be done to make the "recovery" more responsive?

We are looking into what part(s) of LTP is causing the cliffs.
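
The difference between a long-run average skew and a "cliff" can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: the samples are (elapsed seconds, offset seconds) pairs such as one might collect every 300 seconds, the SAMPLES values are invented for illustration and are not data from this thread, and find_cliffs and average_skew_percent are hypothetical helpers.

```python
def find_cliffs(samples, max_step):
    """Return (time, step) pairs where the offset jumps by more than max_step.

    samples is a list of (elapsed_seconds, offset_seconds) tuples in time order.
    """
    cliffs = []
    for (_t0, off0), (t1, off1) in zip(samples, samples[1:]):
        step = off1 - off0
        if abs(step) > max_step:
            cliffs.append((t1, step))
    return cliffs


def average_skew_percent(samples):
    """Average skew between the first and last sample, as a percentage."""
    (t_first, off_first), (t_last, off_last) = samples[0], samples[-1]
    return 100.0 * (off_last - off_first) / (t_last - t_first)


# Invented example data: slow linear drift with one sudden jump at t=900.
SAMPLES = [(0, 0.0), (300, -0.1), (600, -0.2), (900, -2.5), (1200, -2.6)]

if __name__ == "__main__":
    print(find_cliffs(SAMPLES, max_step=1.0))
    print("%.3f%%" % average_skew_percent(SAMPLES))
```

A single end-to-end percentage hides the jump entirely, which is why the per-interval view matters for spotting behaviour that could affect users even when the long-run average looks acceptable.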

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Monday, January 28, 2008 8:21 AM To: dan.@oracle.com Cc: Keir Fraser; xen-@lists.xensource.com; deep@oracle.com; akir@oracle.com; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Dan,

I guess I'm a bit out of date calling for clock= usage. Looking at linux 2.6.20.4 sources, I think you should specify "clocksource=pit nohpet" on the linux guest bootline.

You can leave the xen and dom0 bootlines as they are. The xen and guest clocksources do not need to be the same. In my tests, xen is using the hpet for its timekeeping and that appears to be the default.

When you boot the guests you should see

time.c: Using PIT/TSC based timekeeping.

on the rh4u5-64 guest, and something similar on the others.

(xm dmesg shows 8x Xeon 3.2GHz stepping 04, Platform timer 14.318MHz HPET.)

This appears to be the xen state, which is fine. I was wrongly assuming that this was the guest state. You might want to look in your guest logs and see what they were picking for a clock source.

Regards, Dave

Dan Magenheimer wrote:

Thanks, I hadn't realized that! No wonder we didn't see the same improvement you saw!

Try specifying clock=pit on the linux boot line...

I'm confused... do you mean "clocksource=pit" on the Xen command line or "nohpet" / "clock=pit" / "clocksource=pit" on the guest (or dom0?) command line? Or both places? Since the tests take a while, it would be nice to get this right the first time. Do the Xen and guest clocksources need to be the same?

Thanks, Dan

-----Original Message----- *From:* Dave Winchell [mailto:dwin@virtualiron.com] *Sent:* Sunday, January 27, 2008 2:22 PM *To:* dan.@oracle.com; Keir Fraser *Cc:* xen-@lists.xensource.com; deep@oracle.com; akir@oracle.com; Dave Winchell *Subject:* RE: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Hi Dan,

Hpet timer does have a fairly large error, as I was trying this one recently. I don't remember what I got for error, but 1% sounds about right. The problem is that hpet is not built on top of vpt.c, the module Keir and I did all the recent work in, for its periodic timer needs. Try specifying clock=pit on the linux boot line. If it still picks the hpet, which it might, let me know and I'll tell you how to get around this.

Regards, Dave

------------------------------------------------------------------------

*From:* Dan Magenheimer [mailto:dan.@oracle.com] *Sent:* Fri 1/25/2008 6:50 PM *To:* Dave Winchell; Keir Fraser *Cc:* xen-@lists.xensource.com; deep@oracle.com; akir@oracle.com *Subject:* RE: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Sorry for the very late followup on this but we finally were able to get our testing set up again on stable 3.1 bits and have seen some very bad results on 3.1.3-rc1, on the order of 1%.

Test environment was a 4-socket dual core machine with 24GB of memory running six two-vcpu 2GB domains, four hvm plus two pv. All six guests were running LTP simultaneously. The four hvm guests were: RHEL5u1-64, RHEL4u5-32, RHEL5-64, and RHEL4u5-64. Timer_mode was set to 2 for 64-bit guests and 0 for 32-bit guests.

All four hvm guests experienced skew around -1%, even the 32-bit guest. Less intensive testing didn't exhibit much skew at all.

A representative graph is attached.

Dave, I wonder if some portion of your patches didn't end up in the xen trees?

(xm dmesg shows 8x Xeon 3.2GHz stepping 04, Platform timer 14.318MHz HPET.)

Thanks, Dan

P.S. Many thanks to Deepak and Akira for running tests.

-----Original Message----- From: xen-@lists.xensource.com [mailto:xen-@lists.xensource.com]On Behalf Of Dave Winchell Sent: Wednesday, January 09, 2008 9:53 AM To: Keir Fraser Cc: dan.@oracle.com; xen-@lists.xensource.com; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Hi Keir,

The latest change, c/s 16690, looks fine. I agree that the code in c/s 16690 is equivalent to the code I submitted. Also, your version is more concise.

The error tests confirm the equivalence. With overnight cpu loads, the checked in version was accurate to +.048% for sles and +.038% for red hat. My version was +.046% and +.032% in a 2 hour test. I don't think the difference is significant.

i/o loads produced errors of +.01%.

Thanks for all your efforts on this issue.

Regards, Dave

Keir Fraser wrote:

Applied as c/s 16690, although the checked-in patch is smaller. I think the only important fix is to pt_intr_post() and the only bit of the patch I totally omitted was the change to pt_process_missed_ticks(). I don't think that change can be important, but let's see what happens to the error percentage...

-- Keir

On 4/1/08 23:24, "Dave Winchell" <dwin@virtualiron.com> wrote:

Hi Dan and Keir,

Attached is a patch that fixes some issues with the SYNC policy (no_missed_ticks_pending). I have not tried to make the change the minimal one, but, rather, just ported into the new code what I know to work well. The error for no_missed_ticks_pending goes from over 3% to .03% with this change according to my testing.

Regards, Dave

Dan Magenheimer wrote:

Hi Dave --

Did you get your correction ported? If so, it would be nice to see this get into 3.1.3.

Note that I just did some very limited testing with timer_mode=2(=SYNC=no missed ticks pending) on tip of xen-3.1-testing (64-bit Linux hv guest) and the worst error I've seen so far is 0.012%. But I haven't tried any exotic loads, just LTP.

-----Original Message----- From: Dave Winchell [mailto:dwin@virtualiron.com] Sent: Wednesday, December 19, 2007 12:33 PM To: dan.@oracle.com Cc: Keir Fraser; Shan, Haitao; xen-@lists.xensource.com; Dong, Eddie; Jiang, Yunhong; Dave Winchell Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Dan,

I did some testing with the constant tsc offset SYNC method (now called no_missed_ticks_pending) and found the error to be very high, much larger than 1 %, as I recall. I have not had a chance to submit a correction. I will try to do it later this week or the first week in January. My version of constant tsc offset SYNC method produces .02 % error, so I just need to port that into the current code.

The error you got for both of those kernels is what I would expect for the default mode, delay_for_missed_ticks.

I'll let Keir answer on how to set the time mode.

Regards, Dave

Dan Magenheimer wrote:

Anyone make measurements on the final patch?

I just ran a 64-bit RHEL5.1 pvm kernel and saw a loss of about 0.2% with no load. This was xen-unstable tip today with no options specified. 32-bit was about 0.01%.

I think I missed something... how do I run the various accounting choices and which ones are known to be appropriate for which kernels?

-----Original Message----- From: xen-@lists.xensource.com [mailto:xen-@lists.xensource.com]On Behalf Of Keir Fraser Sent: Thursday, December 06, 2007 4:57 AM To: Dave Winchell Cc: Shan, Haitao; xen-@lists.xensource.com; Dong, Eddie; Jiang, Yunhong Subject: Re: [Xen-devel] [PATCH] Add a timer mode that disables pending missed ticks

Please take a look at xen-unstable changeset 16545.

-- Keir

On 26/11/07 20:57, "Dave Winchell" <dwin@virtualiron.com> wrote:

Keir,

The accuracy data I've collected for i/o loads for the various time protocols follows. In addition, the data for cpu loads is shown.

The loads labeled cpu and i/o-8 are on an 8 processor AMD box. Two guests, red hat and sles 64 bit, 8 vcpu each. The cpu load is usex -e36 on each guest. (usex is available at http://people.redhat.com/anderson/usex.) i/o load is 8 instances of dd if=/dev/hda6 of=/dev/null.

The loads labeled i/o-32 are 32 instances of dd. Also, these are run on 4 cpu AMD box. In addition, there is an idle rh-32bit guest. All three guests are 8vcpu.

The loads labeled i/o-4/32 are the same as i/o-32 except that the redhat-64 guest has 4 instances of dd.

Date   Duration         Protocol  sles, rhat error (sec)    sles, rhat error (%)  load

11/07  23 hrs 40 min    ASYNC     -4.96 sec, +4.42 sec      -.006%, +.005%        cpu
11/09  3 hrs 19 min     ASYNC     -.13 sec, +1.44 sec       -.001%, +.012%        cpu
11/08  2 hrs 21 min     SYNC      -.80 sec, -.34 sec        -.009%, -.004%        cpu
11/08  1 hr 25 min      SYNC      -.24 sec, -.26 sec        -.005%, -.005%        cpu
11/12  65 hrs 40 min    SYNC      -18 sec, -8 sec           -.008%, -.003%        cpu
11/08  28 min           MIXED     -.75 sec, -.67 sec        -.045%, -.040%        cpu
11/08  15 hrs 39 min    MIXED     -19. sec, -17.4 sec       -.034%, -.031%        cpu

11/14  17 hrs 17 min    ASYNC     -6.1 sec, -55.7 sec       -.01%, -.09%          i/o-8
11/15  2 hrs 44 min     ASYNC     -1.47 sec, -14.0 sec      -.015%, -.14%         i/o-8
11/13  15 hrs 38 min    SYNC      -9.7 sec, -12.3 sec       -.017%, -.022%        i/o-8
11/14  48 min           SYNC      -.46 sec, -.48 sec        -.017%, -.018%        i/o-8
11/14  4 hrs 2 min      MIXED     -2.9 sec, -4.15 sec       -.020%, -.029%        i/o-8
11/20  16 hrs 2 min     MIXED     -13.4 sec, -18.1 sec      -.023%, -.031%        i/o-8

11/21  28 min           MIXED     -2.01 sec, -.67 sec       -.12%, -.04%          i/o-32
11/21  2 hrs 25 min     SYNC      -.96 sec, -.43 sec        -.011%, -.005%        i/o-32
11/21  40 min           ASYNC     -2.43 sec, -2.77 sec      -.10%, -.11%          i/o-32

11/26  113 hrs 46 min   MIXED     -297. sec, 13. sec        -.07%, .003%          i/o-4/32
11/26  4 hrs 50 min     SYNC      -3.21 sec, 1.44 sec       -.017%, .01%          i/o-4/32

Overhead measurements:

Progress in terms of number of passes through a fixed system workload on an 8 vcpu red hat with an 8 vcpu sles idle. The workload was usex -b48.

ASYNC   167 min   145 passes  .868 passes/min
SYNC    167 min   144 passes  .862 passes/min
SYNC    1065 min  919 passes  .863 passes/min
MIXED   221 min   196 passes  .887 passes/min

Conclusions:

The only protocol which meets the .05% accuracy requirement for ntp tracking under the loads above is the SYNC protocol. The worst case accuracies for SYNC, MIXED, and ASYNC are .022%, .12%, and .14%, respectively.

We could reduce the cost of the SYNC method by only scheduling the extra wakeups if a certain number of ticks are missed.
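
The worst-case figures in the conclusions can be rechecked from the percentage columns of the accuracy table. The following Python sketch does that; the RESULTS rows are copied from the table in this message, while RESULTS, NTP_LIMIT_PERCENT, worst_case and protocols_within_limit are hypothetical names introduced here for illustration.

```python
# (protocol, load, sles error %, rhat error %) copied from the accuracy table.
RESULTS = [
    ("ASYNC", "cpu", -0.006, 0.005),
    ("ASYNC", "cpu", -0.001, 0.012),
    ("SYNC", "cpu", -0.009, -0.004),
    ("SYNC", "cpu", -0.005, -0.005),
    ("SYNC", "cpu", -0.008, -0.003),
    ("MIXED", "cpu", -0.045, -0.040),
    ("MIXED", "cpu", -0.034, -0.031),
    ("ASYNC", "i/o-8", -0.01, -0.09),
    ("ASYNC", "i/o-8", -0.015, -0.14),
    ("SYNC", "i/o-8", -0.017, -0.022),
    ("SYNC", "i/o-8", -0.017, -0.018),
    ("MIXED", "i/o-8", -0.020, -0.029),
    ("MIXED", "i/o-8", -0.023, -0.031),
    ("MIXED", "i/o-32", -0.12, -0.04),
    ("SYNC", "i/o-32", -0.011, -0.005),
    ("ASYNC", "i/o-32", -0.10, -0.11),
    ("MIXED", "i/o-4/32", -0.07, 0.003),
    ("SYNC", "i/o-4/32", -0.017, 0.01),
]

# Accuracy requirement for ntp tracking quoted in this thread.
NTP_LIMIT_PERCENT = 0.05


def worst_case(results):
    """Largest absolute error (percent) seen for each protocol, over both guests."""
    worst = {}
    for protocol, _load, sles, rhat in results:
        err = max(abs(sles), abs(rhat))
        worst[protocol] = max(worst.get(protocol, 0.0), err)
    return worst


def protocols_within_limit(results, limit=NTP_LIMIT_PERCENT):
    """Protocols whose worst-case error stays at or below the limit."""
    return sorted(p for p, err in worst_case(results).items() if err <= limit)


if __name__ == "__main__":
    print(worst_case(RESULTS))
    print(protocols_within_limit(RESULTS))
```

With the rows as transcribed, the worst cases come out as .022% for SYNC, .12% for MIXED and .14% for ASYNC, and only SYNC stays within the .05% limit.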

Regards, Dave

Keir Fraser wrote:

On 9/11/07 19:22, "Dave Winchell" <dwin@virtualiron.com> wrote:

Since I had a high error (~.03%) for the ASYNC method a couple of days ago, I ran another ASYNC test. I think there may have been something wrong with the code I used a couple of days ago for ASYNC. It may have been missing the immediate delivery of interrupt after context switch in.

My results indicate that either SYNC or ASYNC give acceptable accuracy, each running consistently around or under .01%. MIXED has a fairly high error of greater than .03%. Probably too close to .05% ntp threshold for comfort.

I don't have an overnight run with SYNC. I plan to leave SYNC running over the weekend. If you'd rather I can leave MIXED running instead.

It may be too early to pick the protocol and I can run more overnight tests next week.

I'm a bit worried about any unwanted side effects of the SYNC+run_timer approach -- e.g., whether timer wakeups will cause higher system-wide CPU contention. I find it easier to think through the implications of ASYNC. I'm surprised that MIXED loses time, and is less accurate than ASYNC. Perhaps it delivers more timer interrupts than the other approaches, and each interrupt event causes a small accumulated error?

Overall I would consider MIXED and ASYNC as favourites and if the latter is actually more accurate then I can simply revert the changeset that implemented MIXED.

Perhaps rather than running more of the same workloads you could try idle VCPUs and I/O bound VCPUs (e.g., repeated large disc reads to /dev/null)? We don't have any data on workloads that aren't CPU bound, so that's really an obvious place to put any further effort imo.

-- Keir


diff -r cfdbdca5b831 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c    Thu Dec 06 15:36:07 2007 +0000
+++ b/xen/arch/x86/hvm/vpt.c    Fri Jan 04 17:58:16 2008 -0500
@@ -58,7 +58,7 @@ static void pt_process_missed_ticks(stru
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
     if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
-        pt->do_not_freeze = !pt->pending_intr_nr;
+        pt->do_not_freeze = 1;
     else
         pt->pending_intr_nr += missed_ticks;
     pt->scheduled += missed_ticks * pt->period;
@@ -127,7 +127,12 @@ static void pt_timer_fn(void *data)
 
     pt_lock(pt);
 
-    pt->pending_intr_nr++;
+    if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) ) {
+        pt->pending_intr_nr = 1;
+        pt->do_not_freeze = 0;
+    }
+    else
+        pt->pending_intr_nr++;
 
     if ( !pt->one_shot )
     {
@@ -221,8 +226,6 @@ void pt_intr_post(struct vcpu *v, struct
         return;
     }
 
-    pt->do_not_freeze = 0;
-
     if ( pt->one_shot )
     {
         pt->enabled = 0;
@@ -235,6 +238,10 @@ void pt_intr_post(struct vcpu *v, struct
             pt->last_plt_gtime = hvm_get_guest_time(v);
             pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
         }
+        else if ( mode_is(v->domain, no_missed_ticks_pending) ) {
+            pt->pending_intr_nr--;
+            pt->last_plt_gtime = hvm_get_guest_time(v);
+        }
         else
         {
             pt->last_plt_gtime += pt->period_cycles;