| Subject: | Re: OS support for fault tolerance | |
|---|---|---|
| From: | Dieter BSD (diet...@engineer.com) | |
| Date: | Feb 20, 2012 10:57:58 am | |
| List: | org.freebsd.freebsd-hackers | |
Rayson writes:
> The question is, are we planning to handle >95% of the errors for >99% of the hardware we run on, or are we really planning to spend years trying to design something that would require special hardware support?
I assume this started as: "Oh look, most CPUs have multiple cores these days, maybe we could play with fault tolerance". That could be useful if CPU cores failed a lot, but in reality what fails is disks, disks, controllers, disks, random other things, and disks. That is assuming you have avoided the garbage-quality stuff and have the system on a UPS. If you have enough ports, you can add more disks and mirror them, or use some other RAID level.
The next step is to duplicate everything. Not by looking for a mainboard with redundant everything, but by simply adding another computer. And rather than getting two of the same machine, you're better off if they are different, so that they don't have the same bugs.
The problem then is how to feed both machines the same inputs, and compare the outputs. Do we need a third machine to supervise? Which then leads to the issue of how to avoid problems when *it* breaks. Can we have each machine keep an eye on the other, avoiding the need for a third machine?
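To make the two-machine idea concrete, here is a rough sketch of the two pieces in question: feeding identical inputs to both machines and comparing outputs, and letting each machine watch the other's heartbeat instead of using a third supervisor. Everything here is hypothetical (the names `run_lockstep`, `PeerWatch`, `healthy`, and `faulty` are made up for illustration), and the "machines" are just local functions. Note what it cannot do: with only two replicas a mismatch tells you that something is wrong, not which side is wrong.

```python
import hashlib
from typing import Callable, Iterable, List

Replica = Callable[[bytes], bytes]


def digest(output: bytes) -> str:
    # Compare digests rather than raw outputs, as two real machines
    # would likely exchange short checksums over the wire.
    return hashlib.sha256(output).hexdigest()


def run_lockstep(inputs: Iterable[bytes], a: Replica, b: Replica) -> List[int]:
    """Feed the same inputs to both replicas; return indices where outputs differ."""
    mismatches = []
    for i, item in enumerate(inputs):
        if digest(a(item)) != digest(b(item)):
            mismatches.append(i)
    return mismatches


class PeerWatch:
    """Held by each machine to track the other's heartbeats (no third supervisor)."""

    def __init__(self, timeout: float, now: float) -> None:
        self.timeout = timeout
        self.last_seen = now

    def beat(self, now: float) -> None:
        # Called whenever a heartbeat arrives from the peer.
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        # The peer is presumed dead once it has been silent longer than timeout.
        return now - self.last_seen <= self.timeout


# Stand-ins for two different machines (assumed, for illustration only).
def healthy(data: bytes) -> bytes:
    return data.upper()


def faulty(data: bytes) -> bytes:
    # Simulates a fault that corrupts one particular input.
    return b"garbage" if data == b"c" else data.upper()


if __name__ == "__main__":
    print(run_lockstep([b"a", b"b", b"c"], healthy, faulty))
    watch = PeerWatch(timeout=3.0, now=10.0)
    print(watch.peer_alive(12.0), watch.peer_alive(14.0))
```

A heartbeat timeout also cannot tell a dead peer from a dead link between the two machines, which is one reason the third-machine question keeps coming back.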