Backups: a cautionary tale

by Charles Miller on January 6, 2009

There's probably some bad karma coming my way for kicking someone when they're so very down, but some situations just seem to embody the aphorism: “It could be that the purpose of your life is only to serve as a warning to others.” Such is the fate of blogging provider Journalspace. (mirror)

Journalspace is no more.

DriveSavers called today to inform me that the data was unrecoverable.

Here is what happened: the server which held the journalspace data had two large drives in a RAID configuration. As data is written (such as saving an item to the database), it's automatically copied to both drives, as a backup mechanism.

The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that's gone bad, and the server is back to operating with two redundant drives.

But that's not what happened here. There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten.

The first lesson here is that if you rely on any service ‘in the cloud,’ you should be very, very interested in how they are keeping your data safe. In my not so humble opinion, SaaS providers should be required, if not by law then by industry standard practice, to provide a detailed statement of their disaster recovery provisions. The provider should then be legally liable if they don't follow their stated procedures.

Or in other words, SaaS providers should be allowed to provide whatever shoddy service they want to, so long as they let you know about it clearly in advance.

The second lesson here? RAID is not a backup mechanism. It's really that simple. RAID mirroring protects you from one failure state, a single drive crash, and because that happens to be the most common cause of data loss, people seem to think that's enough. Unfortunately, the number of situations that can wipe data off your entire RAID array all at once is legion. Not only are the drives generally in the same enclosure and on the same power supply, but any problem that occurs outside the hardware itself will be faithfully mirrored to both drives by the RAID controller before you even know it's happening.
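
To make that last point concrete, here is a toy sketch in Python. It is not a model of Journalspace's actual setup or of any real RAID controller; the `MirroredArray` class, its methods, and the `post-1` key are all made up for illustration. It only shows the shape of the problem: a mirror survives a dead drive, but a bad write lands on both drives, and only a separate point-in-time copy gets the data back.

```python
class MirroredArray:
    """Toy stand-in for RAID 1 (hypothetical): every write goes to every live drive."""

    def __init__(self):
        # Each "drive" is just a dict; None marks a failed drive.
        self.drives = [{}, {}]

    def write(self, key, value):
        for drive in self.drives:
            if drive is not None:
                drive[key] = value

    def fail_drive(self, index):
        self.drives[index] = None

    def read(self, key):
        # Serve the read from the first drive that is still alive.
        for drive in self.drives:
            if drive is not None:
                return drive.get(key)
        raise IOError("all drives have failed")


# Case 1: hardware failure. The mirror does exactly what it promises.
array = MirroredArray()
array.write("post-1", "hello world")
array.fail_drive(0)
survived = array.read("post-1")

# Case 2: logical failure. A bad write is mirrored just as faithfully.
array2 = MirroredArray()
array2.write("post-1", "hello world")
backup = dict(array2.drives[0])   # point-in-time copy, kept somewhere else
array2.write("post-1", "")        # the accidental overwrite
copies = [drive["post-1"] for drive in array2.drives]
restored = backup["post-1"]

print(survived)   # data survives the dead drive
print(copies)     # both mirrored copies now hold the overwritten value
print(restored)   # only the separate backup still has the original
```

The design point is that the backup is a copy taken at an earlier moment and held outside the array, so later writes, good or bad, cannot reach it.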

As jwz pointed out in his classic post about backups, “The universe tends towards maximum irony. Don't push it.”
