

3 messages in org.opensolaris.mdb-discuss: [mdb-discuss] A question about ::vmem...

| From | Sent On | Attachments |
|---|---|---|
| Oliver Yang | Aug 11, 2007 9:50 pm | |
| Jonathan Adams | Aug 20, 2007 7:31 pm | |
| Oliver Yang | Sep 6, 2007 8:08 pm | |

| Subject: | [mdb-discuss] A question about ::vmem debugging by using mdb | |
|---|---|---|
| From: | Jonathan Adams (jwad...@gmail.com) | |
| Date: | Aug 20, 2007 7:31:47 pm | |
| List: | org.opensolaris.mdb-discuss | |
(Oliver and I had an off-list discussion; I got his permission to forward the useful bits back to the list, for the curious and the record. Not everyone knows about ::vmem_seg and its useful options.)
On 8/12/07, Oliver Yang <Oliver.Yang at sun.com> wrote:
Hi All,
My DR testing failed after about 6000 DR loops. I got lots of vmem allocation failures during the testing; it seems the space of the "device" arena is exhausted.
::vmem ! grep device
fffffffecac54000 device 1073741824 1073741824 1326624 109685
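As a reading aid for the row above, here is a minimal sketch that splits it into named fields. It assumes the usual `::vmem` column order (ADDR, NAME, INUSE, TOTAL, SUCCEED, FAIL); the helper name `parse_vmem_line` is hypothetical and not part of mdb.

```python
def parse_vmem_line(line):
    """Split one data row of ::vmem output into named fields.

    Assumed column order: ADDR NAME INUSE TOTAL SUCCEED FAIL.
    """
    addr, name, inuse, total, succeed, fail = line.split()
    return {
        "addr": addr,
        "name": name,
        "inuse": int(inuse),      # bytes currently allocated
        "total": int(total),      # bytes the arena spans
        "succeed": int(succeed),  # successful allocations
        "fail": int(fail),        # failed allocations
    }


row = parse_vmem_line(
    "fffffffecac54000 device 1073741824 1073741824 1326624 109685")
pct_used = 100.0 * row["inuse"] / row["total"]
gib = row["total"] / 2**30
print(f"{row['name']}: {pct_used:.0f}% of {gib:.0f} GiB in use, "
      f"{row['fail']} failed allocations")
```

Read this way, INUSE equals TOTAL, so the arena is completely full, which matches the large failure count.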
I also got lots of warning messages from pcihp and pcicfg. Since these drivers are important to DR, I think my DR failures were caused by the device vmem exhaustion.
Aug 12 12:13:30 pcihp: WARNING: pcihp (pcie_pci3): failed to attach one or more drivers for the card in the slot pcie1
Aug 12 12:13:32 pcicfg: WARNING: pcicfg: cannot map config space, to get map type
I didn't find any memory leaks with the ::findleaks dcmd, and physical memory had 90 free at that time.
::findleaks can't find vmem leaks in anything but the kmem_oversize arena.
Now I have 3 questions about this issue:
1. How can we tell which kernel modules or drivers are using a given vmem arena? Can we check the vmem allocation info with mdb?
Yes; if you want the full stack trace of every allocation, you can do (on a machine with kmem_flags=0xf):
addr::walk vmem_alloc | ::vmem_seg -v
where addr is the address of the vmem arena (in the output above, fffffffecac54000).
A first cut at determining who's leaking would be to do:
addr::walk vmem_alloc | ::vmem_seg -v ! sort | uniq -c | sort -n
The first few will be the allocation function; the ones after that should point you in the right direction. To see all segments with a particular function or function offset in their stack traces, use the '-c' option:
addr::walk vmem_alloc | ::vmem_seg -v -c func
addr::walk vmem_alloc | ::vmem_seg -v -c $[func+offset]
(note the use of $[] for computed arguments)
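For anyone who saves the `::vmem_seg -v` output to a file and wants the same tally outside mdb, the `sort | uniq -c` idea can be sketched as below. This is only an illustration: it assumes stack frames appear as single `symbol+0xoffset` tokens, the sample is abbreviated, and its second record (including `hypothetical_attach`) is invented for the example.

```python
from collections import Counter


def count_frames(text):
    """Count how often each stack-frame token (symbol+0xoffset) appears."""
    counts = Counter()
    for line in text.splitlines():
        token = line.strip()
        # Frames are a single token containing "+0x"; header and
        # thread/timestamp lines contain spaces and are skipped.
        if "+0x" in token and " " not in token:
            counts[token] += 1
    return counts


# Abbreviated sample; the second record is made up for illustration.
sample = """\
fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096
    fffffffed0b60500 1ee9c045cd
    vmem_alloc+0x129
    device_arena_alloc+0x27
    e1000g_attach+0xb1
fffffffed8e74a00 ALLC ffffff028ea18000 ffffff028ea19000 4096
    fffffffed0b60500 1ee9c04700
    vmem_alloc+0x129
    device_arena_alloc+0x27
    hypothetical_attach+0x10
"""

frame_counts = count_frames(sample)
for frame, n in frame_counts.most_common():
    print(n, frame)
```

Frames shared by every allocation (the allocator itself) get the highest counts; the frames just below them are the ones that distinguish callers.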
2. The device vmem arena seems to be important to the vmem allocations required by driver attach, doesn't it?
Looks like it's used to map device memory:
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#1649
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#2518
Maybe something isn't cleaning up after itself? That's 1 gigabyte of VA that's being used up.
--- cut here ---
Oliver responded:
I found lots of e1000g driver stack traces by using ::vmem_seg -v; one of them looks like this:
fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096
    fffffffed0b60500 1ee9c045cd
    vmem_hash_insert+0x8b
    vmem_seg_alloc+0xbd
    vmem_alloc+0x129
    device_arena_alloc+0x27
    rootnex_map_regspec+0x102
    rootnex_map+0x126
    ddi_map+0x4d
    npe_bus_map+0x3a9
    pepb_bus_map+0x31
    ddi_map+0x4d
    ddi_regs_map_setup+0xc4
    pci_config_setup+0x66
    e1000g_attach+0xb1
    devi_attach+0x7f
    attach_node+0x98
    i_ndi_config_node+0x9d
    i_ddi_attachchild+0x3f
    devi_attach_node+0x7f
    devi_config_one+0x2bd
    ndi_devi_config_one+0xb0
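To make the record above easier to read, here is a hedged sketch that splits it into fields. The field names are assumptions based on the usual `::vmem_seg -v` layout (segment address, type, start, end, size, then thread and timestamp, then the stack); `parse_vmem_seg` is a hypothetical helper, not an mdb facility.

```python
def parse_vmem_seg(text):
    """Split one ::vmem_seg -v record into named fields (assumed layout)."""
    tokens = text.split()
    addr, seg_type, start, end, size, thread, timestamp = tokens[:7]
    return {
        "addr": addr,
        "type": seg_type,
        "start": int(start, 16),
        "end": int(end, 16),
        "size": int(size),       # printed in decimal in this record
        "thread": thread,
        "timestamp": timestamp,
        "stack": tokens[7:],     # innermost frame first
    }


record = (
    "fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096 "
    "fffffffed0b60500 1ee9c045cd vmem_hash_insert+0x8b vmem_seg_alloc+0xbd "
    "vmem_alloc+0x129 device_arena_alloc+0x27 rootnex_map_regspec+0x102 "
    "rootnex_map+0x126 ddi_map+0x4d npe_bus_map+0x3a9 pepb_bus_map+0x31 "
    "ddi_map+0x4d ddi_regs_map_setup+0xc4 pci_config_setup+0x66 "
    "e1000g_attach+0xb1 devi_attach+0x7f attach_node+0x98 "
    "i_ndi_config_node+0x9d i_ddi_attachchild+0x3f devi_attach_node+0x7f "
    "devi_config_one+0x2bd ndi_devi_config_one+0xb0"
)

seg = parse_vmem_seg(record)
# Sanity check: the segment spans exactly one 4096-byte page.
span = seg["end"] - seg["start"]
driver_frames = [f for f in seg["stack"] if f.startswith("e1000g")]
print(seg["type"], span, driver_frames)
```

The span check confirms a single 4096-byte mapping, and the only driver-specific frame is `e1000g_attach+0xb1`, reached through `pci_config_setup`.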
Since my driver was already detached, I think this is a bug.
...
After some investigation, I found that even after a single driver attach and detach, I can still find the e1000g stack trace above. But e1000g does call pci_config_teardown in its e1000g_detach code; I verified this with dtrace and mdb while running a single driver detach.
I don't think it's an e1000g driver bug; it might be a DDI or bus driver bug.
I think we'd better file a bug against the DDI routines for initial evaluation.
---- cut here ---
I agreed that it looked like the driver was doing the right thing.
Oliver, do you have a bugid for this?
Cheers,
- jonathan







