Subject:Re: [Xen-devel] memory fault
From:David Becker (bec@cs.duke.edu)
Date:03/19/2004 10:57:58 AM
List:com.xensource.lists.xen-devel

" > DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)

" It's pretty unlikely this is anything to do with Xen -- I bet you
" could reproduce this on a stock Linux compiled without CONFIG_HIGHMEM

You are correct. This message pops up on stock Linux as well if memory is constrained as tightly as in our Xen config.

" > DOM3: Unable to handle kernel paging request at virtual address c3f77820

The EIP is in arch/xeno/drivers/network/network.c:_network_interrupt(). Unfortunately, I no longer have the oops messages. We had to get the hosts going again for that project, and the oops got lost.

" > DOM1: Weird failure in hard_start_xmit

Xen prints this message here: xeno-1.2.bk/xen/net/dev.c:816: printk("Weird failure in hard_start_xmit!\n");

Last night a user sent me a detailed report on NIC trouble:

" When the machines freeze up running bbsend, bbrecv, or netgen, they _also_
" freeze up on incoming SSH connections.
" If I'm already logged into rack217 via SSH when I start a netgen, then my
" interactive session gets laggy or freezes completely.
"
" At any time, killing the netgen process makes whatever was frozen resume
" almost immediately.
"
" We're not talking about large amounts of traffic here: 12KB/s causes all
" of the above symptoms. netgen and bbsend both do some busy-waiting, but
" not that much of it.
"
" For some reason, the system load goes sky-high, even with just one netgen
" process. netgen is single-threaded and spends less than half of its time
" busy-waiting, yet system load often ends up above 3.
"
" End of symptoms, beginning of theory: all the bad systems are P4s running
" Xeno and using Broadcom ethernet cards. (At least, they used to be
" Broadcoms. With Xeno running, I can no longer check.) The working
" systems are a mix of P4 and P3, Xeno is running on two of them (but only
" on P3s), and they're all eepro100 cards.
"
" My guess is that Xeno is interacting badly with either the bcm5700 or the
" P4. I'm leaning toward the former. Is there any way to boot the machines

That "hard_start_xmit" message showed up on the hosts with Broadcom BCM5703 NICs.

We'll set up a test cluster to isolate what is going on with these network apps.