Backups: a cautionary tale

January 6, 2009 11:23 AM

There's probably some bad karma coming my way for kicking someone when they're so very down, but some situations just seem to embody the aphorism: “It could be that the purpose of your life is only to serve as a warning to others.” Such is the fate of blogging provider Journalspace. (mirror)

Journalspace is no more.

DriveSavers called today to inform me that the data was unrecoverable.

Here is what happened: the server which held the journalspace data had two large drives in a RAID configuration. As data is written (such as saving an item to the database), it's automatically copied to both drives, as a backup mechanism.

The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that's gone bad, and the server is back to operating with two redundant drives.

But that's not what happened here. There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten.

The first lesson here is that if you rely on any service ‘in the cloud,’ you should be very, very interested in how they are keeping your data safe. In my not so humble opinion, SaaS providers should be required, if not by law then by industry standard practice, to provide a detailed statement of their disaster recovery provisions. The provider should then be legally liable if they don't follow their stated procedures.

Or in other words, SaaS providers should be allowed to provide whatever shoddy service they want to, so long as they let you know about it clearly in advance.

The second lesson here? RAID is not a backup mechanism. It's really that simple. RAID mirroring protects you from one failure state (a single drive crash), and because that happens to be the most common cause of data loss, people seem to think that's enough. Unfortunately, the number of ways data can be wiped off your entire RAID array all at once is legion. Not only are the drives generally in the same enclosure and on the same power supply, but any problem that originates outside the hardware itself will be faithfully mirrored to both drives by the RAID controller before you even know it's happening.
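To make the difference concrete, here is a toy simulation of a mirrored pair next to a point-in-time backup. It is only a sketch: the `MirroredVolume` class, its methods, and the `post-1` key are all made up for illustration, and plain dictionaries stand in for drives. It says nothing about how Journalspace's server or any real RAID controller was actually set up.

```python
import copy


class MirroredVolume:
    """Toy model of RAID 1 (hypothetical, for illustration only)."""

    def __init__(self):
        # Two "drives", each just a dict of key -> data.
        self.drives = [{}, {}]

    def _working_drives(self):
        return [d for d in self.drives if d is not None]

    def write(self, key, value):
        # Every write lands on every working drive.
        for drive in self._working_drives():
            drive[key] = value

    def delete(self, key):
        # Every delete ALSO lands on every working drive.
        for drive in self._working_drives():
            drive.pop(key, None)

    def fail_drive(self, index):
        # Simulate a hardware failure of one drive.
        self.drives[index] = None

    def read(self, key):
        for drive in self._working_drives():
            if key in drive:
                return drive[key]
        return None

    def snapshot(self):
        # A point-in-time copy that later writes and deletes cannot touch.
        return copy.deepcopy(self._working_drives()[0])


volume = MirroredVolume()
volume.write("post-1", "hello world")
backup = volume.snapshot()

# Failure mode RAID handles: one drive dies, the data is still readable.
volume.fail_drive(0)
after_drive_failure = volume.read("post-1")

# Failure mode RAID does not handle: the data is deleted or overwritten.
volume.delete("post-1")
after_delete = volume.read("post-1")

# Only the separate copy still has it.
restored = backup.get("post-1")
```

The mirror survives the dead drive, but the delete goes to every copy the mirror has. The snapshot is the only thing that still holds the post, and only because it was taken before the mistake and kept somewhere the mistake couldn't reach.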

As jwz pointed out in his classic post about backups, “The universe tends towards maximum irony. Don't push it.”

3 Comments

Mild disagreement. The first lesson here is that you should be very, very interested in how you get your data back from storage in the cloud.

Notwithstanding the JournalSpace fiasco, the most likely catastrophic failure mode for many online SaaS providers is a failure of business model. (In fact it may be argued that JournalSpace was exactly that, assuming that they were not able to hire or retain decent system administrators.)

No amount of rigorous data protection is going to help if they go out of business.

This is one of the many reasons why data export is the number one feature that I look for in any cloud service.

Good point. It sort of all goes back to something I said a while back. Ideally I'd want to separate the storage of my data from the services I choose to add value to that data. So if Flickr ever went tits up I could just tell Flickr's successor: "My photos are all here. Have at them." (And similarly, if my data provider went down, I'd have one backup of all my stuff ready to move to the next host.)

Digital Railroad already had the business model failure and it was similarly catastrophic: their hard drives were gone within days.

My related blog entry is at http://puzzling.org/logs/thoughts/2009/January/3/backup-policies

