Subject:Re: mpt request timed out
From:Artem Belevich (fbsd@src.cx)
Date:Jun 5, 2010 7:24:56 pm
List:org.freebsd.freebsd-scsi

I used to have "UNIT ATTENTION asc:29,0" errors on mpt when it was running with IR firmware (the one that supports RAID). The issue disappeared after the firmware was changed to the IT variant (w/o RAID). Keep in mind that in my case the original IR firmware was quite a bit older than the IT version that replaced it. It's possible that it was the upgraded firmware that fixed the issue for me, not the switch to IT.

In my case those errors correlated pretty well with the disks' SMART UDMA_CRC_Error_Count. I guess the corrupted transaction was re-issued and succeeded in the end, as there were no ZFS errors in my case either.
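
A minimal sketch of how that correlation could be checked, assuming smartmontools is installed and that `smartctl -A` prints the usual ATA attribute table. The helper names `parse_crc_count` and `read_crc_count` are made up for illustration, and depending on the controller the disks may need an extra device-type option that is not shown here.

```python
import re
import subprocess

CRC_ATTR = "UDMA_CRC_Error_Count"


def parse_crc_count(smart_output):
    """Return the raw UDMA_CRC_Error_Count found in `smartctl -A` text, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == CRC_ATTR:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def read_crc_count(device):
    """Run smartctl on one device (assumes smartctl is on PATH) and parse it."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_crc_count(result.stdout)
```

Calling `read_crc_count("/dev/da0")` before and after a batch of timeouts, for each disk, would show whether the counter grows along with the errors. A rising count usually points at cabling, backplane or expander links rather than at the disk media.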

--Artem

2010/6/5 Ståle Kristoffersen <sta@kristoffersen.ws>:

Hi, I'm not sure if this is the right list; please tell me if it isn't.

I'm having problems with mpt timeouts when putting load on the disks connected to it. I have the mpt adapter connected to a SAS expander, and several disks connected to that expander:

mpt0: <LSILogic SAS/SATA Adapter> port 0xc800-0xc8ff mem 0xfe8fc000-0xfe8fffff,0xfe8e0000-0xfe8effff irq 16 at device 0.0 on pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.20.0

ses0 at mpt0 bus 0 scbus0 target 0 lun 0
ses0: <LSILOGIC SASX36 A.1 7015> Fixed Enclosure Services SCSI-3 device
ses0: 300.000MB/s transfers
ses0: Command Queueing enabled
ses0: SCSI-3 SES Device

And all the disks are consumer-grade SATA disks like this:

da0 at mpt0 bus 0 scbus0 target 1 lun 0
da0: <ATA ST31000528AS CC38> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)

The error I'm seeing is this: http://folk.uio.no/stalk/mpt/timeout.txt

I've also put up a full dmesg from boot: http://folk.uio.no/stalk/mpt/dmesg.txt (I've since added 4 new disks, but the error was there before that).

What can be causing these timeouts? The controller resets everything and zfs is not complaining:

  pool: media
 state: ONLINE
 scrub: scrub stopped after 0h4m with 0 errors on Wed May 12 13:58:05 2010
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da14    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da15    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da4     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da13    ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da7     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da17    ONLINE       0     0     0
            da18    ONLINE       0     0     0
            da19    ONLINE       0     0     0
            da20    ONLINE       0     0     0

errors: No known data errors

but clients time out or get an error if they try to do I/O while the connection is down, and that's causing havoc. The timeouts last from 10 to 30 seconds each.

I'd appreciate any ideas!