

![]() | Start a set with this search |
![]() | Include this search in one of my sets |
![]() | Exclude this search from one of my sets |
![]() | Permalink to these results Paste this link in email or IM: |
| Atom feed for tracking future search results Paste this URL into your reader: |
4 messages in org.opensolaris.mdb-discuss[mdb-discuss] [Fwd: [osol-code] Quest...| From | Sent On | Attachments |
|---|---|---|
| Saurabh Misra | Aug 27, 2007 1:15 pm | |
| Oliver Yang | Aug 27, 2007 6:57 pm | |
| Saurabh Misra | Aug 27, 2007 8:56 pm | |
| Oliver Yang | Aug 28, 2007 12:43 am |

![]() | Permalink for this message Paste this link in email or IM: |
![]() | Permalink for this thread Paste this link in email or IM: |
| Atom feed for this thread Paste this URL into your reader: |
| Subject: | [mdb-discuss] [Fwd: [osol-code] Questions about overdue callout_execute] | Actions... |
|---|---|---|
| From: | Saurabh Misra (Saur...@Sun.COM) | |
| Date: | Aug 27, 2007 1:15:07 pm | |
| List: | org.opensolaris.mdb-discuss | |
The problem is that the level6 interrupt handler (network interrupt) is blocked (base spl of CPU 0 is 6) which blocked level1 as well. That's the reason softcall state is still SOFT_PEND (2).
This problem is being worked as part of bug :-
6292092 callout should not be blocked by interrupts from executing realtime timeouts 6540436 kpreempt() needs a more reliable way to generate level1 intr
We have seen this problem happening very often with e1000g.
Thanks,
-- Saurabh Mishra, Solaris Filesystems & Kernel | NAS OS http://blogs.sun.com/saurabh_mishra
Eric Saxe wrote:
FYI, question posed on opensolaris forums, I figured you probably have an answer. :)
-Eric
------------------------------------------------------------------------
Subject: [osol-code] Questions about overdue callout_execute From: Oliver Yang <Oliver.Yang at sun.com> Date: Mon, 27 Aug 2007 16:48:36 +0800 To: mdb-discuss at opensolaris.org, opensolaris-code at opensolaris.org
To: mdb-discuss at opensolaris.org, opensolaris-code at opensolaris.org
Hi All,
I have several questions about overdue callout_execute
1. How does the system handle a overdue callout_execute?
2. As far as I know, setrun entry in callout table will be handled as a softcall, which is a software interrupt(softlevel1) with PIL 1, right?
3. If we found there are so many overdue setrun entries in callout table, what we should check by using mdb?
Here are background information with my other questions:
I had encountered a system hang issue in my test env and I forced a crash dump file successfully.
In this crash dump file, I found a e1000g driver thread blocked on cv_timedwait, but we supposed it should return after 1 tick, but it had never returned.
::callout ! grep fffffe80044e5c80
setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) ---> it was overdue about 33730 ticks.
stack pointer for thread fffffe80044e5c80: fffffe80044e5880 [ fffffe80044e5880 _resume_from_idle+0xf8() ] fffffe80044e58c0 swtch+0x167() fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388, 1036d) fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390, ffffffff82f76388, 1036d) fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40) fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180) fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180) fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70) fffffe80044e5b50 runservice+0x42(ffffffff894acb70) fffffe80044e5b80 queue_service+0x42(ffffffff894acb70) fffffe80044e5bc0 stream_service+0x73(ffffffff83905740) fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820) fffffe80044e5c70 thread_start+8()
After I check the callout table, I found the relevant callout entry for this thread, and it was overdue for some reason:
::callout ! grep fffffe80044e5c80
setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) --------> This indicate it is overdue.
I also find about 2573 overdue entries in callout table, it was really a big number:
::callout
FUNCTION ARGUMENT ID TIME sigalarm2proc ffffffff9569aae0 7fffffffffffc010 144a1 (T-17038) -----> This indicated the it was overdue. sigalarm2proc ffffffff91bb7510 7fffffffffffe010 14484 (T-17067) sigalarm2proc ffffffff9569c380 7fffffffffffc020 144a1 (T-17038) sigalarm2proc ffffffff95428d48 7fffffffffffc030 144a1 (T-17038) sigalarm2proc ffffffff91bb8db0 7fffffffffffe030 14483 (T-17068) sigalarm2proc ffffffff9542b238 7fffffffffffc040 144a1 (T-17038) .....................[snipped]....................................................................
::callout ! grep "T-" | wc -l
2573
Why we could find so many overdue entries in callout table?
Since the setrun entry will be processed by softcall, I tried to print the global softcall list by softhead. And we did find there are two callout_execute was pending to handled by softint:
*softhead::list softcall_t sc_next|::print softcall_t
{ sc_func = callout_execute sc_arg = 0xffffffff80219000 sc_next = softcalls+0x1290 } { sc_func = callout_execute sc_arg = 0xffffffff80216000 sc_next = 0 }
softcall_state/J
softcall_state: softcall_state: 2
#define SOFT_PEND 0x02 /* softcall list needs processing */
But when I checked the softint by following command, I found no pending count in softlevel1 line, why?
I think the softlevel1 should call into softint(), then, the callout_execute would be executed.
But it seemed it had never happened, who can give me some ideas about it?
::softint
ADDR PEND PIL ARG1 ARG2 ISR(s) ffffffff8277a5c0 0 1 ffffffff8048da80 0 errorq_intr fffffffffbc05ae8 0 1 0 0 softlevel1 ------------> no pending count. ffffffff8277a4c0 0 2 ffffffff8048dd00 0 errorq_intr fffffffffbc00070 0 2 0 0 cbe_low_level ffffffff83a706c0 0 4 ffffffff90004d18 0 ghd_doneq_process ffffffff8ed31880 0 4 ffffffff90004d18 0 ghd_timeout_softintr ffffffff83946e00 0 4 ffffffff8f046c40 0 power_soft_intr ffffffff83a70c00 0 4 ffffffff83b1b000 0 bge_chip_factotum ffffffff83a70cc0 0 4 ffffffff83b1b000 0 bge_reschedule ffffffff8277a2c0 0 4 0 0 asysoftintr ffffffff82f4d100 0 9 ffffffff82f76370 0 e1000g_tx_softint_worker ffffffff82f4df00 0 9 ffffffff82f86370 0 e1000g_tx_softint_worker ffffffff833b3e80 0 9 ffffffff801af7e8 0 hcdi_soft_intr ffffffff8277a000 0 9 ffffffff801afb68 0 hcdi_soft_intr fffffffffbc00030 0 10 0 0 cbe_softclock
Actually, at that time, system still had 1 IDLE CPU, I think the system shouldn't have so many overdue callout entries.
::cpuinfo -v
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 0 fffffffffbc27730 1f 1 6 169 no no t-3 fffffe80000bfc80 sched | | | RUNNING <--+ | +--> PIL THREAD READY | 10 fffffe80000bfc80 QUIESCED | 6 fffffe80000b9c80 EXISTS | ENABLE +--> PRI THREAD PROC 99 fffffe80000d1c80 sched
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 1 fffffffffbc2f260 1b 1 0 -1 no no t-17 fffffe8000401c80 (idle) | | RUNNING <--+ +--> PRI THREAD PROC READY 60 fffffe80044d9c80 sched EXISTS ENABLE
What else should I check?







