
The Roof (and The Planet) is on Fire


It isn’t very often that you get to watch a real live-action example of true crisis PR. But when you do, it’s always instructive, and usually amusing in that “bang head against wall, wash, rinse, repeat” kind of way.

This weekend, The Planet experienced a catastrophic outage when a transformer at H1, their Houston data center, exploded, blew out the walls of the electrical room, and started a fire. The building was evacuated, the fire brigade was called, and at the insistence of the fire chief, all power – including backup power – was shut down.

The good news: nobody was hurt, and data on all 9,000 servers was secure. The bad news: none of the servers had any power.

I do not consider this to be a particular failing of The Planet. Catastrophes happen, even with the best of plans. While it sucks for the people affected, and we were lucky to not be on that list, at the end of the day people, not hosts, are responsible for data resiliency and catastrophic backup plans. If you’re not prepared to pay for the technical know-how and costs associated with that, then you’re either not running anything mission critical (my blog: not mission critical) or you’re going to have to be prepared to suck it every now and then.

Amidst all the bitching, moaning, threats of lawsuits, and a small contingent of cheerleading, The Planet did a lot of things right:

  • Within hours, they promised updates every sixty minutes on their forum, and delivered them – even if all they had to report was that there was no update.
  • They let customers know very early on that their SLAs would be honoured and that refunds and credits would be calculated as soon as normal operations resumed.
  • They pulled in manpower from their vendors, contractors and staff in the middle of the night on a weekend and worked for 28 straight hours to rebuild a power system virtually from scratch and manage a huge volume of support calls.

There were also some extremely odd choices made, some of which are harmful to their PR. As everyone playing along at home can guess, these failures were primarily in the area of transparency:

  • NONE of these updates appeared on The Planet’s blog. Not a single word. If you were pissed off enough to hunt down and root through their customer forum, you got info. If you went to their blog, which is where you’d expect to get crisis updates, you got bupkis.
  • For a particular set of legacy customers, the NS1 and NS2 nameservers were both hosted in the H1 data center. This was an example of EPIC FAIL on the part of The Planet, one which they remained basically silent about while the 3,000 customers hosed by this oversight were still without websites.
  • They did not post photos of the crispy data center. Seriously, guys: pics or it didn’t happen.

But there is one more thing they didn’t do that was a complete no-brainer. Let us assume that several thousand people were on the phone, screaming for their boxes to be rescued from the embers and transported to the nearest operating DC. Let us also assume that there are only a certain number of boxes The Planet can fit into the racks in Dallas. At that point, you either take the customers who make you the most money, or, knowing that you’re going to lose a boatload of customers one way or the other anyway, you take the customers who are in a position to do the most damage to your reputation.

They should have located the servers that host b3ta, the world’s most awesome and snarkiest website for nerds and geeks, picked them up, put them in a car, driven them to the Dallas data centre and prayed to the gods of DNS propagation for mercy.

But they didn’t. And three days later, b3ta is still in the “hosed” camp, without a website* and instead running an emergency forum, where the punters are predictably making “The roof, the roof, the roof is on fire” jokes and creating graphics that will commemorate this event for far longer than The Planet’s lack of blog entries.

This is really not a group of people you want to fuck with.

  
   03 Jun 2008 | In: Technology |
