17 messages in com.xensource.lists.xen-ia64-devel
RE: [Xen-ia64-devel] PATCH: slightly ...

From                                      Sent On             Attachments
Tristan Gingold                           27 Apr 2006 08:13   .diffs
Xu, Anthony                               27 Apr 2006 18:47
Xu, Anthony                               27 Apr 2006 20:18   .patch
Magenheimer, Dan (HP Labs Fort Collins)   28 Apr 2006 07:48
Xu, Anthony                               28 Apr 2006 23:02   .patch
Magenheimer, Dan (HP Labs Fort Collins)   29 Apr 2006 06:57
Magenheimer, Dan (HP Labs Fort Collins)   29 Apr 2006 09:12
Xu, Anthony                               29 Apr 2006 18:04
Zhang, Xiantao                            29 Apr 2006 18:20
Xu, Anthony                               29 Apr 2006 22:43
Zhang, Xiantao                            30 Apr 2006 02:25
Magenheimer, Dan (HP Labs Fort Collins)   30 Apr 2006 20:23
Alex Williamson                           08 May 2006 13:14
Alex Williamson                           08 May 2006 13:18
Isaku Yamahata                            23 Jun 2006 02:19   .patch
Alex Williamson                           23 Jun 2006 08:05
Alex Williamson                           23 Jun 2006 15:28
Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
From: Xu, Anthony (anth@intel.com)
Date: 04/27/2006 06:47:37 PM
List: com.xensource.lists.xen-ia64-devel

From: xen-@lists.xensource.com [mailto:xen-@lists.xensource.com] On Behalf Of Tristan Gingold
Sent: 2006-04-27 23:14
To: xen-@lists.xensource.com; Magenheimer, Dan (HP Labs Fort Collins); Alex Williamson
Subject: [Xen-ia64-devel] PATCH: slightly improve stability

> Hi,
>
> As reported earlier, this patch seems to improve stability: crashes are at least more coherent and maybe less frequent.
>
> RSE handling seems to have a bug: crashes are now due to either a bad value in a stacked register or the use of an invalid stacked register (although CFM seems correct in gdb!)

I'm looking at this too. Yes, there is a bug related to handle_lazy_cover.

    void ia64_do_page_fault (unsigned long address, unsigned long isr,
                             struct pt_regs *regs, unsigned long itir)
    {
        unsigned long iip = regs->cr_iip, iha;
        // FIXME should validate address here
        unsigned long pteval;
        unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
        IA64FAULT fault;

        if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
            return;
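The two ISR tests in this excerpt can be tried in isolation. This is a minimal model, not Xen code: the helper names are hypothetical, and the bit positions (32 for ISR.x, 38 for ISR.ir) are assumed to mirror IA64_ISR_X_BIT and IA64_ISR_IR_BIT in Linux's asm-ia64/kregs.h. ISR.ir ("incomplete register frame") is assumed here to be set when the fault is raised by a mandatory RSE load, which is why only such faults are offered to handle_lazy_cover.

```c
#include <stdint.h>

/* Assumed bit positions, mirroring IA64_ISR_X_BIT and IA64_ISR_IR_BIT in
 * Linux's asm-ia64/kregs.h; check them against your own tree. */
#define ISR_X_BIT  32   /* ISR.x: fault on an execute (instruction) access */
#define ISR_IR_BIT 38   /* ISR.ir: incomplete register frame (mandatory RSE load) */

#define ISR_X  (UINT64_C(1) << ISR_X_BIT)
#define ISR_IR (UINT64_C(1) << ISR_IR_BIT)

/* Same expression as is_data in ia64_do_page_fault: a data access unless
 * ISR.x says the fault came from an instruction fetch. */
static int isr_is_data(uint64_t isr)
{
    return !((isr >> ISR_X_BIT) & 1u);
}

/* First half of the guard "(isr & IA64_ISR_IR) && ...": only faults
 * raised by a mandatory RSE load reach handle_lazy_cover. */
static int isr_from_rse_fill(uint64_t isr)
{
    return (isr & ISR_IR) != 0;
}
```

For example, an ISR value with only bit 38 set is classified as a data access raised by an RSE fill, while one with only bit 32 set is an instruction fetch that never reaches handle_lazy_cover.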

The handle_lazy_cover check in ia64_do_page_fault is intended to handle the following scenario:

1. The guest executes br.ret; this may cause a mandatory RSE load, and that load may cause a TLB miss.
2. The VMM gets control but cannot handle this TLB miss itself, so it injects the TLB miss into the guest's TLB miss handler. When the VMM executes "rfi" to jump to the guest TLB miss handler, this TLB miss happens again.
3. At this point interrupt_collection_enabled is 0, so handle_lazy_cover executes "cover" on behalf of the guest and returns to the guest TLB miss handler again; this time there is no TLB miss.
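Step 3 can be captured in a small model. This is a hypothetical, simplified sketch and not Xen's handle_lazy_cover: struct vcpu_model, lazy_cover_model and the constants are invented for illustration, ISR.ir is assumed to be bit 38, and "cover" is modelled only by its architectural effect when psr.ic is off (the current frame marker is copied into IFS with IFS.v, bit 63, set, and the guest is left with an empty frame).

```c
#include <stdint.h>

#define ISR_IR    (UINT64_C(1) << 38)        /* assumed ISR.ir position */
#define IFS_V     (UINT64_C(1) << 63)        /* IFS.v: saved frame marker valid */
#define CFM_MASK  ((UINT64_C(1) << 38) - 1)  /* CFM is a 38-bit field */

/* Hypothetical, simplified guest state: only what the decision uses. */
struct vcpu_model {
    int      ic;    /* guest's virtual psr.ic (interrupt_collection_enabled) */
    uint64_t cfm;   /* guest current frame marker */
    uint64_t ifs;   /* guest cr.ifs */
};

/* Returns 1 when the fault is consumed by emulating "cover" for the guest,
 * 0 when it should be handled (or reflected) the normal way. */
static int lazy_cover_model(struct vcpu_model *v, uint64_t isr)
{
    if (!(isr & ISR_IR))   /* not raised by a mandatory RSE load */
        return 0;
    if (v->ic)             /* guest collects interruptions itself */
        return 0;
    /* Assumed to be the step-3 case: the VMM's rfi into the guest TLB
     * miss handler faulted again.  Cover on the guest's behalf: save the
     * frame marker in IFS and drop to an empty frame, so that, as step 3
     * describes, the retry no longer misses. */
    v->ifs = IFS_V | (v->cfm & CFM_MASK);
    v->cfm = 0;
    return 1;
}
```

The model makes the key assumption explicit: the only evidence used is ISR.ir plus the guest's psr.ic being 0.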

The following code sequence is on the ia64_leave_kernel path, which runs with psr.ic and psr.i off. When br.ret.dptk.many b0 is executed, there may be a mandatory RSE load and therefore a TLB miss. Following the description above, handle_lazy_cover then executes "cover" on behalf of the guest and returns to the guest, which is not correct in this scenario: because the guest itself has psr.ic off here, handle_lazy_cover cannot tell this case apart from step 3.

I didn't find an easy way to fix this bug.

    mov loc6=0
    mov loc7=0
    (pRecurse) br.call.dptk.few b0=rse_clear_invalid
    ;;
    mov loc8=0
    mov loc9=0
    cmp.ne pReturn,p0=r0,in1    // if recursion count != 0, we need to do a br.ret
    mov loc10=0
    mov loc11=0
    (pReturn) br.ret.dptk.many b0
    #endif /* !CONFIG_ITANIUM */
    # undef pRecurse
    # undef pReturn
    ;;
    alloc r17=ar.pfs,0,0,0,0    // drop current register frame
    ;;
    loadrs
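The ambiguity described above can be reduced to what the check is able to see. In this hypothetical model (the enum, struct and function names are invented; it is not Xen code) the VMM observes only ISR.ir and the guest's psr.ic, while the origin field records which of the two scenarios actually happened. Both scenarios present identical observable inputs, so a test built on those two inputs alone fires in both, although, per the analysis in this message, covering is only correct for the VMM-rfi case.

```c
#include <stdbool.h>

/* Hypothetical labels for where an RSE-fill TLB miss with the guest's
 * psr.ic off can come from. */
enum fault_origin {
    ORIGIN_VMM_RFI,      /* steps 2-3: the VMM's rfi into the guest miss handler */
    ORIGIN_GUEST_LEAVE   /* guest br.ret in ia64_leave_kernel with psr.ic off */
};

struct fault_view {
    bool isr_ir;              /* ISR.ir: raised by a mandatory RSE load */
    bool guest_ic;            /* guest's virtual psr.ic when the fault hit */
    enum fault_origin origin; /* ground truth, NOT visible to the VMM */
};

/* The lazy-cover test can only use what the VMM observes. */
static bool lazy_cover_fires(const struct fault_view *f)
{
    return f->isr_ir && !f->guest_ic;
}

/* Per the analysis in this message, covering on the guest's behalf is
 * only right when the fault came from the VMM's own rfi. */
static bool cover_is_correct(const struct fault_view *f)
{
    return f->origin == ORIGIN_VMM_RFI;
}
```

Any predicate over isr_ir and guest_ic alone gives the same answer for both origins, so distinguishing them would need some additional piece of state.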

Thanks,
Anthony

> Tested by doing many Linux kernel compilations (> 100) in an SMP domU.
>
> Tristan.