Subject: Re: [regression] unable to boot: no GEOM devices found.
From: John Baldwin (jh@freebsd.org)
Date: May 9, 2011 11:48:28 am
List: org.freebsd.freebsd-current

On Monday, May 09, 2011 2:24:37 pm David Naylor wrote:

On Friday 15 April 2011 18:28:06 John Baldwin wrote:

On Wednesday, April 13, 2011 1:07:06 pm David Naylor wrote:

On Tuesday 12 April 2011 22:12:55 Alexander Motin wrote:

David Naylor wrote:

On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:

David Naylor wrote:

I am running -current and since a few days ago (at least 2011/04/11) I am unable to boot.

The boot process stops when it looks to find a bootable device. The prompt (when pressing '?') does not display any device and yielding one second (or more) to the kernel (by pressing '.') does not improve the situation.

A known working date is 2011/02/20.

I am running amd64 on a nVidia MCP51 chipset.

MCP51... again...

+ata2: reiniting channel ..
+ata2: SATA connect time=0ms status=00000113
+ata2: reset tp1 mask=01 ostat0=58 ostat1=00
+ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
+ata2: reset tp2 stat0=50 stat1=00 devices=0x1
+ata2: reinit done ..
+unknown: FAILURE - ATA_IDENTIFY timed out LBA=0

Since all devices are detected but do not respond to commands, I would suppose that there is something wrong with ATA interrupts. There is a long chain of interrupt problems in this chipset. I have already tried to debug one case where ATA wasn't generating interrupts at all. Unfortunately, without success -- requests were executing but not generating interrupts, and it didn't look like an ATA driver problem.

As for a possible candidate revision triggering your problem, I would look at this message:
+pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0

At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb) and it is interrupt related.

I reverted those two revs and everything works again.

Hmm, can you provide a full boot verbose dmesg? Alternatively, can you see if the device at pci0:0:9:0 is a PCI-PCI bridge?

I can provide a verbose dmesg if the following is not enough:

none17@pci0:0:9:0: class=0x050000 card=0x50011458 chip=0x027010de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'MCP51 Host Bridge'
    class      = memory
    subclass   = RAM

I see two PCI-PCI bridges at pci0:0:3:0 and pci0:0:16:0. I've attached the full `pciconf -lv` output.
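For readers following along, the packed class= and chip= values in the pciconf entry for pci0:0:9:0 can be decoded by hand. The sketch below is only an illustration, assuming the usual layout (vendor ID in the low 16 bits of chip=, device ID in the high 16 bits; base class, subclass and programming interface packed into class=). The function names are made up for this example and are not part of pciconf or the kernel.

```python
def decode_pciconf_ids(chip: int, class_code: int) -> dict:
    """Split the packed chip= and class= values into their fields."""
    return {
        "vendor": chip & 0xFFFF,                  # low 16 bits of chip=
        "device": (chip >> 16) & 0xFFFF,          # high 16 bits of chip=
        "base_class": (class_code >> 16) & 0xFF,  # e.g. 0x05 = memory
        "subclass": (class_code >> 8) & 0xFF,     # e.g. 0x00 = RAM
        "prog_if": class_code & 0xFF,
    }


def is_pci_pci_bridge(class_code: int) -> bool:
    """A PCI-PCI bridge has base class 0x06 and subclass 0x04."""
    return ((class_code >> 16) & 0xFF, (class_code >> 8) & 0xFF) == (0x06, 0x04)


# Values taken from the pciconf line for pci0:0:9:0 quoted above.
ids = decode_pciconf_ids(0x027010DE, 0x050000)
print(f"vendor=0x{ids['vendor']:04x} device=0x{ids['device']:04x}")
print("PCI-PCI bridge:", is_pci_pci_bridge(0x050000))
```

With the values above this gives vendor 0x10de and device 0x0270, and class 0x05/0x00 (memory/RAM), so the device at pci0:0:9:0 does not report itself as a PCI-PCI bridge.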

FYI, this issue is still present on current (~24 hours old). Reverting the above mentioned revisions still fixes the problem.

Yes, I'm still chewing on how best to fix this. The problem is that for the most part we should enable the MSI mapping window everywhere, but for certain broken Nvidia chipsets it seems that doing so breaks INTx interrupts and we need to not enable it (and disable MSI globally) on those chipsets. Linux has some grotty code to allow PCI devices to figure out which Host Bridge device on PCI bus 0 is the real host bridge for each HT slave and to selectively enable it in the host bridge when an MSI interrupt is first enabled.

They also have a quirk to disable MSI altogether on certain nvidia chipsets if the MSI mapping window is not enabled by the BIOS. I attempted to implement the latter, but it broke perfectly good nvidia chipsets on older ppc-based Macs. I think I want to just disable MSI entirely on busted chipsets like yours, but I need to come up with a good way to detect your chipset (and similar).
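One way to picture the "disable MSI entirely on busted chipsets" idea is a lookup keyed on vendor and device ID. The following is a minimal sketch under stated assumptions, not the fix discussed in this thread: the table `MSI_DISABLED_CHIPSETS` and the function `msi_allowed` are hypothetical names, and the only entry is the MCP51 host bridge ID (chip=0x027010de) reported earlier in the thread. Whether ID matching alone is a good enough way to detect affected chipsets is exactly the open question.

```python
# Hypothetical quirk table: (vendor ID, device ID) pairs on which MSI
# would be disabled. The single entry comes from the pciconf output in
# this thread; a real list would need more entries and more care.
MSI_DISABLED_CHIPSETS = {
    (0x10DE, 0x0270),  # NVIDIA MCP51 Host Bridge
}


def msi_allowed(chip: int) -> bool:
    """Return False if the packed chip= ID matches a listed chipset."""
    vendor = chip & 0xFFFF          # low 16 bits
    device = (chip >> 16) & 0xFFFF  # high 16 bits
    return (vendor, device) not in MSI_DISABLED_CHIPSETS


print(msi_allowed(0x027010DE))  # MCP51 host bridge from this thread
```

Matching on an exact ID pair, rather than on the vendor alone, keeps the quirk from affecting NVIDIA chipsets that work correctly, such as the ones on older ppc-based Macs mentioned above.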