Mar 15, 2007

Of sloping baths and disk drive failure

Disk drive manufacturers quote MTBF (Mean Time Between Failures) figures of around a million hours under ideal conditions, suggesting a failure rate of less than 1% per year, but some studies show significantly worse performance (2-10% of drives failing per year) in the Real World™. It seems the “bathtub” reliability curve has a sharply upward-sloping or even stepped bottom, not the long flat period of stability often assumed.

Thanks to George Spafford's Daily News for both the above links :-)
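
For anyone wondering how a million-hour MTBF turns into "less than 1% per year", here is a rough sketch of the arithmetic. It assumes a constant failure rate (an exponential lifetime model) and drives running 24 hours a day, which is exactly the flat-bottomed bathtub assumption the studies call into question. The function names are my own, purely for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours; ignores leap years


def annual_failure_rate(mtbf_hours: float) -> float:
    """Chance that a drive fails within one year of continuous use.

    Assumes a constant failure rate, i.e. exponentially distributed
    lifetimes. Real drives apparently do not behave this neatly.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def implied_mtbf_hours(afr: float) -> float:
    """Inverse of the above: the MTBF implied by an observed annual failure rate."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


if __name__ == "__main__":
    # The manufacturers' figure: about a million hours.
    print(f"Quoted MTBF of 1,000,000 h -> {annual_failure_rate(1_000_000):.2%} per year")

    # The range of field failure rates mentioned above.
    for afr in (0.02, 0.10):
        print(f"Observed {afr:.0%} per year -> implied MTBF of {implied_mtbf_hours(afr):,.0f} h")
```

Under those assumptions the quoted MTBF works out to roughly 0.87% per year, whereas field failure rates of 2-10% per year would correspond to MTBFs of only about 430,000 down to 83,000 hours.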

If your data are vital and their availability is critical, the studies suggest the value of carefully monitoring drive age, error rates and temperatures. Techniques such as RAID will also help. However, the unpredictability of disk failure also implies the need for contingency plans, backups and hot-swappable drives. Or, if money is no object, solid state disks might be the way to go (plus cosmic ray shielding!).

