atom feed19 messages in org.opensolaris.zfs-discussRe: [zfs-discuss] Does your device ho...
FromSent OnAttachments
Bryant EadonFeb 10, 2009 10:35 am 
Peter SchullerFeb 10, 2009 10:52 am 
Miles NordinFeb 10, 2009 11:23 am 
Chris RiddFeb 10, 2009 11:27 am 
TimFeb 10, 2009 12:19 pm 
Peter SchullerFeb 10, 2009 1:36 pm 
David Collier-BrownFeb 10, 2009 1:55 pm 
Miles NordinFeb 10, 2009 2:56 pm 
Peter SchullerFeb 10, 2009 3:45 pm 
Bob FriesenhahnFeb 10, 2009 4:08 pm 
Jeff BonwickFeb 10, 2009 4:41 pm 
Toby ThainFeb 10, 2009 5:23 pm 
Miles NordinFeb 10, 2009 6:10 pm 
Frank CusackFeb 10, 2009 7:36 pm 
Toby ThainFeb 10, 2009 8:53 pm 
Bryant EadonFeb 10, 2009 10:28 pm 
Eric D. MudamaFeb 11, 2009 12:25 am 
David Dyer-BennetFeb 11, 2009 7:27 am 
Frank CusackFeb 11, 2009 8:24 am 
Subject:Re: [zfs-discuss] Does your device honor write barriers?
From:Miles Nordin (car@Ivy.NET)
Date:Feb 10, 2009 11:23:35 am

"ps" == Peter Schuller <> writes:

ps> A test I did was to write a minimalistic program that simply ps> appended one block (8k in this case), fsync():ing in between, ps> timing each fsync().

were you the one that suggested writing backwards to make the difference bigger? I guess you found that trick unnecessary---speeds differed enough when writing forwards?

ps> * Write-back caching on the RAID controller (lowest latency).

Did you find a good way to disable this case so you could distinguish between the second two?

like, I thought there was some type of SYNCHRONIZE CACHE with a certain flag-bit set, which demands a flush to disk not to NVRAM, and that years ago ZFS was mistakenly sending this overly aggressive command instead of the normal ``just make it persistent'' sync, so there was that stale best-practice advice to lobotomize the array by ordering it to treat the two commands equivalent.

Maybe it would be possible to send that old SYNC command on purpose. Then you could start the tool by comparing speeds with to-disk-SYNC and normal-nvramallowed-SYNC: if they're the same speed and oddly fast, then you know the array controller is lobotomized, and the second half of the test is thus invalid. If they're different speeds, then you can trust the second half is actually testing the disks, so lnog as you send old-SYNC. If they're the same speed but slow, then you don't have NVRAM.

ps> you could write an ever increasing sequence of values to ps> deterministic but pseudo-random pages in some larger file, ps> such that you can, after a powerfail test, read them back in ps> and test the sequence of numbers (after sorting it) for the ps> existence of holes.

yeah, the perl script I linked to requires a ``server'' which is not rebooted and a ``client'' which is rebooted during the test, and the client checks in its behavior with the server. I think the server should be unnecessary---the script should just know itself, know in the check phase what it would have written. I guess the original script author is thinking more of the SYNC comand and less of the write barrier, but in terms of losing pools or _corrupting_ databases, it's really only barriers that matter, and SYNC matters only because it's also an implicit barrier, doesn't matter exactly when it returns.

so....I guess you would need the listening-server to test SYNC is not returning early, like if you want to detect that someone has disabled the ZIL, or if you have an n-tier database system with retries at higher tiers or a system that's distributed or doing replication, then you do care when SYNC returns and need the not-rebooted listening-server. But you should be able to make a serverless tool just to check write barriers and thus corruption-proofness.