Subject: SCSI tape data loss
From: Justin T. Gibbs (gib@scsiguy.com)
Date: Jun 1, 2003 1:08:26 pm
List: org.freebsd.freebsd-scsi

Hello,

I'm the author of a GPL'ed network backup program called Bacula (www.bacula.org). For the last three years, it has been working flawlessly on Solaris and Linux systems. When users attempted to use it recently on FreeBSD, it did not work. I subsequently modified Bacula so that it would work on FreeBSD -- basically, I had to program around some important differences in the way FreeBSD handles EOFs compared to Solaris and Linux. At some point in the future, I would like to discuss the problems I had in detail, if that interests you.

I would be interested, as I'm sure other readers of this list would be.

We've worked on this problem for several weeks, and I believe we have now isolated the data loss to the point where the end of medium is reached.

We have now confirmed that Bacula correctly wrote to the tape, but when it was read back, 13 blocks of 64512 bytes were missing.

Below, I have listed in pseudo-language what Bacula was doing. Each write with the exception of the first block on the second tape is 64512 bytes:

  first tape mounted
  write(block 1)
  ...
  write(block 1554);
  write(block 1555);   <=== block lost
  ...                  <=== blocks lost
  write(block 1567);   <=== block lost
  write(block 1568)    failed because of EOM detected
  ioctl(MTIOCERRSTAT);

What was the residual reported by MTIOCERRSTAT? If the device is in buffered mode, that residual can be larger than the last transaction that failed. My guess is that either MTIOCERRSTAT is not properly pulling the residual out of the info field, or you are not backing up far enough in the data stream when the EOM occurs.
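To make the "backing up far enough" point concrete, here is a minimal sketch of the arithmetic in C. It is not taken from Bacula or from the sa(4) driver: the function name first_block_to_rewrite() is hypothetical, and it assumes the residual is a byte count covering everything that did not reach the medium, including blocks from earlier writes that had already returned success.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Bacula or the FreeBSD tape driver.
 *
 * blocks_attempted: number of fixed-size blocks handed to write(2) so far,
 *                   including the one that failed at EOM.
 * resid_bytes:      residual reported at EOM. ASSUMPTION: it counts every
 *                   byte that did not reach the medium, buffered or not.
 * block_size:       size of each block in bytes.
 *
 * Returns the 1-based number of the first block that has to be rewritten
 * on the next tape.
 */
uint64_t
first_block_to_rewrite(uint64_t blocks_attempted, uint64_t resid_bytes,
    uint64_t block_size)
{
	assert(block_size > 0);

	/* Round up: a partially written block must be rewritten in full. */
	uint64_t blocks_lost = (resid_bytes + block_size - 1) / block_size;

	assert(blocks_lost <= blocks_attempted);
	return blocks_attempted - blocks_lost + 1;
}
```

With the numbers from this report (1568 writes attempted, 64512-byte blocks), a residual of 14 blocks (the 13 lost ones plus the write that failed) would point back to block 1555, whereas a residual of a single block would point only to block 1568. Whether the driver actually reports the larger value is exactly the open question.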

I have verified that Bacula did successfully write 1567 blocks to the first tape, but in reading back the tape, blocks 1555-1567 are not on the tape.

Now, the big question is: what caused the loss of those blocks? The most likely causes I can think of are:

1. Bacula is doing something (e.g. MTIOCERRSTAT, or the MTBSF) to cause the data to be lost. If this is the case, it is something specific to FreeBSD since this sequence of commands works on both Solaris and Linux (except that MTIOCERRSTAT is MTIOCLRERR on those systems).

Perhaps both Linux and Solaris force the tape drives to run in unbuffered mode?

2. The SCSI driver is doing asynchronous writes (very bad) and the End of Medium is not sent to Bacula until many writes after the end of the tape.

Disabling the tape drive's write buffer kills performance. All of the information required to handle buffered writes should be available to you.
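As an illustration of why a buffered drive can acknowledge writes that never reach the tape, here is a toy model in C. It is only a sketch: struct toy_drive, toy_write() and toy_fill() are hypothetical names, the model ignores early-warning EOM and any flushing the real hardware does, and the capacity and buffer depth used in the note below are picked to mirror the block numbers in the report, not measured from a real drive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a tape drive with a write buffer. All names are hypothetical. */
struct toy_drive {
	size_t capacity;   /* blocks that fit on the medium */
	size_t on_medium;  /* blocks actually committed to tape */
	size_t buffered;   /* blocks acknowledged but still in the buffer */
	size_t buf_depth;  /* blocks the buffer can hold (0 = unbuffered) */
};

/*
 * Offer one block to the drive. Returns true if the drive acknowledges
 * the write, false once it reports end of medium. In buffered mode an
 * acknowledgement only means the block reached the buffer.
 */
bool
toy_write(struct toy_drive *d)
{
	if (d->buffered < d->buf_depth) {
		d->buffered++;          /* acknowledged straight from the buffer */
		return true;
	}
	/* Buffer full (or unbuffered): one block must reach the medium first. */
	if (d->on_medium == d->capacity)
		return false;           /* EOM is only reported now */
	d->on_medium++;                 /* oldest block committed, new one queued */
	return true;
}

/* Write until the drive reports EOM; return the number of acknowledged writes. */
size_t
toy_fill(struct toy_drive *d)
{
	size_t acked = 0;

	while (toy_write(d))
		acked++;
	return acked;
}
```

Starting from { .capacity = 1554, .buf_depth = 13 } and the other fields zero, toy_fill() returns 1567 while on_medium stops at 1554 and buffered stays at 13. The application has seen 1567 successful writes, and the 13-block gap is the amount a correctly reported residual would have to account for. With buf_depth set to 0 the two counts agree, at the performance cost mentioned above.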

Perhaps we should also implement the MTCACHE/MTNOCACHE opcodes so that userland apps can control this. It's not clear if this is exactly what they were created for, but it may be better to use these than to add some other opcodes.

3. The SCSI driver has some sort of bug that causes buffers to be lost.

I doubt that this would occur only at EOM.