Subject: Re: ZFS reports problem on iscsi target
From: Rolf Grossmann (rg@xamine.com)
Date: Jun 12, 2010 7:26:06 am
List: org.freebsd.freebsd-scsi

On 12.06.2010 10:06, Daniel Braniss wrote:

Hi,

I'm having some trouble with iscsi on FreeBSD 8. My current setup is a stock FreeBSD 8.1-PRERELEASE (as of 2 days ago), GENERIC kernel with some modules loaded, running on a Dell PowerEdge R905 with 64GB RAM and 4 quad-core CPUs. Attached is an EqualLogic PS6500 storage array with some configured volumes, one of which is for testing. It is configured in /etc/iscsi.conf like this:

test2 {
        TargetName=iqn.2001-05.com.equallogic:0-8a0906-7a4bb9f06-038000000304c0d1-test2
        TargetAddress=10.26.17.10:3260,1
        tags = 256
}

Now I'm running the following sequence of commands (shown with output):

# iscontrol -n test2
iscontrol[56255]: running
iscontrol[56255]: (pass2:iscsi0:0:0:0): tagged openings now 256
iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:1
iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:2
iscontrol[56255]: cam_open_btl: no passthrough device found at 2:0:3
iscontrol: supervise starting main loop
# zpool create test2 da2
# zpool scrub test2
# zpool status test2
  pool: test2
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Fri Jun 11 16:56:33 2010
config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0     0
          da2       ONLINE       0     0     0

errors: No known data errors
# cp -Rp /export/system /test2/
# zpool scrub test2
# zpool status test2
  pool: test2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 19 errors on Fri Jun 11 17:00:38 2010
config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0    19
          da2       ONLINE       0     0    38

errors: 19 data errors, use '-v' for a list
#

/export/system is a FreeBSD distribution (make install DESTDIR=/export/system). Note that ZFS reports 19 broken files after the copy. If I repeat the process, the files vary, but there are always some reported as broken. In this case, they don't seem to be (as checked with md5 and rsync --checksum), but I have also had files that only gave me an I/O error. Also, if I repeat the same steps on a local disk, ZFS reports no errors.
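For reference, a check of the kind mentioned above (comparing the copy against the source by checksum) could be sketched roughly as follows. This is only an illustration: the function names `tree_sums` and `compare_trees` are made up here, and POSIX `cksum` stands in for md5 so the sketch runs unchanged on other systems.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post): compare two directory
# trees file by file using checksums.

tree_sums() {
    # Print "checksum size path" for every regular file under $1.
    # Paths are relative to $1 and sorted by path so two trees line up.
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

compare_trees() {
    src=$1
    dst=$2
    a=$(mktemp /tmp/treesums.XXXXXX)
    b=$(mktemp /tmp/treesums.XXXXXX)
    tree_sums "$src" > "$a"
    tree_sums "$dst" > "$b"
    # diff prints the checksum lines that differ and returns non-zero
    # when the two listings are not the same.
    if diff "$a" "$b"; then
        echo "identical"
        rc=0
    else
        rc=1
    fi
    rm -f "$a" "$b"
    return $rc
}
```

With the paths from this thread, the call would be something like `compare_trees /export/system /test2/system`. It prints `identical` when every file matches and otherwise prints the differing checksum lines and returns a non-zero status.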

What I would like to know is:

- Is there anything I'm doing wrong? Is there a known problem?
- Are there any tools to debug or more reliably reproduce (and narrow down) the problem? I've tried fsx (from /usr/src/tools/regression), but I couldn't find any usage suggestions (other than the usage when run without options) and it doesn't complain when run.
- On a different system I've tried using a newer iscsi version from http://www.cs.huji.ac.il/~danny/ftp/freebsd/ but it didn't make any difference. Is that still preferable?
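One simple way to take the filesystem out of the picture when trying to reproduce this would be a raw write and read-back comparison. The following is a minimal sketch under stated assumptions, not an established tool: `readback_check` is a hypothetical name, and pointing it at a device such as /dev/da2 overwrites the start of that device.

```shell
#!/bin/sh
# WARNING: running this against a raw device (e.g. /dev/da2) destroys the
# data at the start of that device. Only use it on a scratch volume.

# readback_check TARGET [MEGABYTES]
# Hypothetical helper: write random data to TARGET, read it back, and
# compare checksums of what was written and what was read.
readback_check() {
    target=$1
    mb=${2:-16}
    pattern=$(mktemp /tmp/pattern.XXXXXX)
    # Generate the reference data once so both checksums cover the same bytes.
    dd if=/dev/urandom of="$pattern" bs=1048576 count="$mb" 2>/dev/null
    want=$(cksum < "$pattern")
    # conv=notrunc only matters when TARGET is a regular file.
    dd if="$pattern" of="$target" bs=1048576 conv=notrunc 2>/dev/null
    got=$(dd if="$target" bs=1048576 count="$mb" 2>/dev/null | cksum)
    rm -f "$pattern"
    if [ "$want" = "$got" ]; then
        echo "ok"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Calling `readback_check /dev/da2 64` in a loop would show whether plain writes over the iSCSI session come back altered. Running it against a file on a local disk gives a baseline for comparison.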

Some help would be appreciated.

Hi Rolf,
I just ran a bunch of tests, like yours, without any problem.
My setup: the target is a NetApp, and the host running the initiator is an AMD Phenom(tm) II X6 1090T Processor, running a very recent 8.1-PRERELEASE with 4GB of RAM, so that "vfs.zfs.prefetch_disable" is true. So maybe you can try disabling prefetch?
Apart from that, maybe you can check the EqualLogic's logs.

HTH,
danny
PS: you should use the latest iscsi-2.2.4.tar.gz

Hi Danny,

thanks for your reply. I've just tried again with vfs.zfs.prefetch_disable=1, but it makes no difference. I also don't expect zfs to be my problem, so I've just had the idea to try ufs with the following result (still on stock 8.1-PRERELEASE):

# newfs /dev/da2
/dev/da2: 20490.0MB (41963520 sectors) block size 16384, fragment size 2048
        using 112 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376416, 752672, 1128928, 1505184, 1881440, 2257696, 2633952,
 [...]
 41388320, 41764576
# fsck -t ufs /dev/da2
** /dev/da2
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=94272
CLEAR? [yn]

*ouch*

Interestingly, this time the problem seems to be very repeatable. I can even have fsck fix it (and subsequent fsck runs are fine), but after every newfs, fsck complains about this inode again.

There is nothing in the EqualLogic's logs except for connect and disconnect entries. Also, I'm using a different volume on the same EqualLogic from a different machine running Ubuntu Linux with open-iscsi and fuse-zfs with no problems (except less performance ;P), so I don't suspect a hardware problem.

I guess I'll spend some time looking at a tcpdump of the newfs/fsck test, but it will be a while until I understand all the protocols involved. Any other suggestions would be very welcome.
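A capture of that kind might be set up along these lines. This is only a sketch: the interface name `em0` and the output path are placeholders, while the address and port are taken from the TargetAddress in the iscsi.conf entry above. The script prints the tcpdump command rather than running it, so it can be reviewed and then started as root before the newfs/fsck test.

```shell
#!/bin/sh
# Placeholders (assumptions, adjust to the actual system):
IFACE=${IFACE:-em0}                  # network interface facing the storage array
OUT=${OUT:-/tmp/iscsi-newfs.pcap}    # capture file for later analysis

# -s 0 captures full packets, so the data carried in the iSCSI PDUs
#      is not truncated in the capture file.
# -w   writes raw packets to a file instead of printing them.
# The filter limits the capture to traffic with the iSCSI target.
capture_cmd="tcpdump -i $IFACE -s 0 -w $OUT host 10.26.17.10 and tcp port 3260"

echo "$capture_cmd"
```

The resulting file can be opened in a protocol analyzer that decodes iSCSI, which saves decoding the PDUs by hand.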

Thanks, Rolf.