3 messages in org.opensolaris.mdb-discuss: [mdb-discuss] A question about ::vmem...

From            Sent On
Oliver Yang     Aug 11, 2007 9:50 pm
Jonathan Adams  Aug 20, 2007 7:31 pm
Oliver Yang     Sep 6, 2007 8:08 pm
Subject: [mdb-discuss] A question about ::vmem debugging by using mdb
From: Jonathan Adams (jwad@gmail.com)
Date: Aug 20, 2007 7:31:47 pm
List: org.opensolaris.mdb-discuss

(Oliver and I had an off-list discussion; I got his permission to forward the useful bits back to the list, for the curious and the record. Not everyone knows about ::vmem_seg and its useful options.)

On 8/12/07, Oliver Yang <Oliver.Yang at sun.com> wrote:

Hi All,

My DR testing failed after about 6000 DR loops, and I got lots of vmem allocation failures during the testing. It seems the space of the "device" arena is exhausted.

::vmem ! grep device

fffffffecac54000 device 1073741824 1073741824 1326624 109685
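For readers who have not seen this output before, here is a minimal sketch of how that line can be read. It assumes the columns are ADDR, NAME, INUSE, TOTAL, SUCCEED, FAIL; the header row is not shown in the message, so that ordering is an assumption. The helper name `parse_vmem_line` is hypothetical.

```python
from typing import NamedTuple


class VmemArena(NamedTuple):
    addr: str
    name: str
    inuse: int    # bytes currently allocated from the arena (assumed column)
    total: int    # bytes the arena spans (assumed column)
    succeed: int  # successful allocations (assumed column)
    fail: int     # failed allocations (assumed column)


def parse_vmem_line(line: str) -> VmemArena:
    """Parse one whitespace-separated line of ::vmem output (assumed column order)."""
    addr, name, inuse, total, succeed, fail = line.split()
    return VmemArena(addr, name, int(inuse), int(total), int(succeed), int(fail))


arena = parse_vmem_line(
    "fffffffecac54000 device 1073741824 1073741824 1326624 109685"
)
exhausted = arena.inuse == arena.total
print(
    f"{arena.name}: {arena.inuse / 2**30:.0f} GiB of "
    f"{arena.total / 2**30:.0f} GiB in use, exhausted={exhausted}, "
    f"failed allocations={arena.fail}"
)
```

Under that reading, the arena is 1 GiB in size and fully in use, which matches the "1 gigabyte of VA" remark later in this message.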

I also got lots of warning messages from pcihp and pcicfg. Since these drivers are important to DR, I think my DR failures were caused by the device vmem exhaustion.

Aug 12 12:13:30 pcihp: WARNING: pcihp (pcie_pci3): failed to attach one or more drivers for the card in the slot pcie1
Aug 12 12:13:32 pcicfg: WARNING: pcicfg: cannot map config space, to get map type

And I didn't find any memory leaks with the ::findleaks dcmd, and physical memory was 90% free at that time.

::findleaks can't find vmem leaks in anything but the kmem_oversize arena.

Now I have 3 questions about this issue:

1. How can we tell which kernel modules or drivers are using a given vmem arena? Can we check the vmem allocation info with mdb?

Yes; if you want the full stack trace of every allocation, you can do (on a machine with kmem_flags=0xf):

addr::walk vmem_alloc | ::vmem_seg -v

where addr is the address for the vmem arena (in the output above, fffffffecac54000).

A first cut at determining who's leaking would be to do:

addr::walk vmem_alloc | ::vmem_seg -v ! sort | uniq -c | sort -n

The first few will be the allocation function; the ones after that should point you in the right direction. To see all segments with a particular function or function offset in their stack traces, use the '-c' option:
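To make the counting step concrete, here is a minimal Python sketch of what the `sort | uniq -c | sort -n` stage computes. The function name `count_lines` is hypothetical, and the sample input is illustrative: the frame names are taken from the trace later in this thread, but the lines are not real captured output.

```python
from collections import Counter


def count_lines(lines):
    """Return (count, line) pairs in ascending order, like `sort | uniq -c | sort -n`."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted((n, text) for text, n in counts.items())


# Illustrative input: frames from two segments; only the first one
# passes through e1000g_attach.
sample = [
    "vmem_hash_insert+0x8b",
    "vmem_seg_alloc+0xbd",
    "vmem_alloc+0x129",
    "device_arena_alloc+0x27",
    "e1000g_attach+0xb1",
    "vmem_hash_insert+0x8b",
    "vmem_seg_alloc+0xbd",
    "vmem_alloc+0x129",
    "device_arena_alloc+0x27",
]

result = count_lines(sample)
for n, text in result:
    print(f"{n:7d} {text}")
```

Because the final sort is ascending, the most frequent lines end up at the bottom of the output. Frames shared by every allocation (the vmem functions themselves) have the highest counts, and the frames just below them in frequency are the callers worth looking at.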

addr::walk vmem_alloc | ::vmem_seg -v -c func
addr::walk vmem_alloc | ::vmem_seg -v -c $[func+offset]

(note the use of $[] for computed arguments)
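If the `::vmem_seg -v` output has been saved to a file, a similar filter can be applied offline. The following is a rough sketch only: it assumes each segment starts with a line of the form "address TYPE ..." (with a four-letter type such as ALLC) and that stack frames appear one per line as `symbol+0xoffset`. The names `parse_segments` and `segments_calling` are hypothetical, and the sample text reuses a subset of the frames from the trace quoted in this thread.

```python
import re

# Assumed layout: a segment line starts with a hex address and a 4-letter type.
HEADER_RE = re.compile(r"^\s*([0-9a-f]{8,16})\s+([A-Z]{4})\b")
# Assumed layout: a frame line is a lone "symbol+0xoffset" token.
FRAME_RE = re.compile(r"^\s*(\S+)\+0x[0-9a-f]+\s*$")


def parse_segments(text):
    """Group stack frames under the segment line that precedes them."""
    segments = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            segments.append(
                {"addr": header.group(1), "type": header.group(2), "frames": []}
            )
            continue
        frame = FRAME_RE.match(line)
        if frame and segments:
            # Drop a "module`" prefix if one is present.
            segments[-1]["frames"].append(frame.group(1).split("`")[-1])
    return segments


def segments_calling(segments, func):
    """Return the segments whose stack trace contains func."""
    return [seg for seg in segments if func in seg["frames"]]


SAMPLE = """\
fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096
                 fffffffed0b60500 1ee9c045cd
                 vmem_hash_insert+0x8b
                 vmem_seg_alloc+0xbd
                 vmem_alloc+0x129
                 device_arena_alloc+0x27
                 rootnex_map_regspec+0x102
                 pci_config_setup+0x66
                 e1000g_attach+0xb1
"""

segments = parse_segments(SAMPLE)
for seg in segments_calling(segments, "e1000g_attach"):
    print(seg["addr"], seg["type"], len(seg["frames"]), "frames")
```

This only matches on function name; it does not reproduce the function-plus-offset matching that `-c $[func+offset]` provides inside mdb.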

2. The device vmem arena seems to be important to the vmem allocations required by driver attach, is it?

Looks like it's used to map device memory:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#1649

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#2518

Maybe something isn't cleaning up after itself? That's 1 gigabyte of VA that's being used up.

--- cut here ---

Oliver responded:

I found lots of e1000g driver stack traces using ::vmem_seg -v; one of them looks like this:

fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096
                 fffffffed0b60500 1ee9c045cd
                 vmem_hash_insert+0x8b
                 vmem_seg_alloc+0xbd
                 vmem_alloc+0x129
                 device_arena_alloc+0x27
                 rootnex_map_regspec+0x102
                 rootnex_map+0x126
                 ddi_map+0x4d
                 npe_bus_map+0x3a9
                 pepb_bus_map+0x31
                 ddi_map+0x4d
                 ddi_regs_map_setup+0xc4
                 pci_config_setup+0x66
                 e1000g_attach+0xb1
                 devi_attach+0x7f
                 attach_node+0x98
                 i_ndi_config_node+0x9d
                 i_ddi_attachchild+0x3f
                 devi_attach_node+0x7f
                 devi_config_one+0x2bd
                 ndi_devi_config_one+0xb0

Since my driver was already detached, I think it should be a bug.

...

After some investigation, I found that even after a single driver attach and detach, I can still find the e1000g stack trace above. But e1000g does call pci_config_teardown in its e1000g_detach code; I verified this with dtrace and mdb while running a single driver detach.

I don't think it is an e1000g driver bug; it might be a DDI or bus driver bug.

And I think we'd better file a bug against the DDI routines for initial evaluation.

---- cut here ---

I agreed that it looked like the driver was doing the right thing.

Oliver, do you have a bugid for this?

Cheers,
- jonathan