
4 Definitive Ways To Deal With Outages


by Shankar Ganesh, Marketing Analyst at Freshdesk

Back in 2007-08, I remember that not a week went by without Twitter going down. In fact, sometimes even the fail whale would take ages to load. The good people behind Twitter would apologize and, if it was a slow news day, tech media would pick it up. People grumbled.

But those were the dark ages. People went offline and “cloud computing” was just a bunch of castles in the air. Now, with more and more people spending their days permanently online, companies can’t afford to go offline anymore. Even a minute of downtime can mean millions lost in revenue and tons of bad PR.

But every now and then, either voluntarily (maintenance and such) or involuntarily (hackers and DDoS attacks), services go down, and the problems that arise have to be dealt with.

Here are four ways in which companies shone in dealing with their outages:

1. Buffer – Live blog the post-mortem.

Last October, Buffer users across the world saw spam being posted to their social profiles. When users started panicking, Buffer did not stay silent. Their Happiness Heroes got into action within minutes and posted real-time updates on all of their official channels. They suspended Buffer’s sharing features and clearly acknowledged that they were working on the problem.

Buffer’s nimble, distributed team came together on Hangouts on a Saturday afternoon and live-blogged developments as their engineers worked behind the scenes to fix the hack and restore services. It was probably the first time I saw a ‘downtime, sorry’ blog post without an F-bomb in the comments section.

The team also patiently responded to every tweet from their customers, and made sure to tell users how they could get back up to speed once everything was ironed out.

2. Gmail and Google Apps – Process to the madness.

Millions of people rely on Google Apps at work, and every minute of Gmail downtime translates to thousands of dollars lost in productivity across the world. But when Gmail went down two weeks ago, people’s pipe dreams of heading home for the day were busted in 50 minutes – because the Reliability Team at Google had processes in place to thwart uncertainties.

While Yahoo! was milking the moment as people flocked to Twitter, the engineers at Google weren’t panicking, according to employees from the Reliability Team who were participating in an AMA on reddit at around the same time. The irony is hard to miss, but there were some interesting nuggets everyone could learn from.

As the AMA continued, they clarified that they have a “well-oiled” process for dealing with the unexpected, and have “incident management” procedures to rely on when outages occur. It’s Google, what do you expect?

You may not own something as mission-critical as Google Apps or Gmail, but it pays to invest in frameworks like ITIL and tools like service desks that keep things under control when something goes haywire. The processes you defined the last time such an issue occurred will go a long way toward helping you when something new goes wrong.

3. Bebo – Use a banner ad to let everyone know.

There’s nothing like an honest apology from the CEO to defuse a downtime aftermath. But how do you make millions of users read it? A banner ad.

When Bebo, the UK social network, went down in 2012, users flocked to Twitter to complain. The outage led people to assume that the site was shutting down, and the hashtag #bebomemories started trending worldwide. Their community manager had left one month before the downtime, so it didn’t seem like queries were being attended to.

What Bebo failed to do during the downtime, they did later by plastering a site-wide banner ad leading users to a letter from Adam, the then CEO, in which he profusely apologized for the outage and promised to be more careful in the future.

A banner ad worked particularly well for Bebo because it’s a challenge in itself to get a million users to read an official blog post. A banner ad is a simple but surefire way to reassure users who rely on you day after day.

4. Sony – Nothing like a heartfelt executive apology.

In what was arguably the most massive security breach in history, hackers stole credit card and personal information from 77 million PlayStation users in mid-April 2011. Sony had to suspend the PlayStation Network while it investigated the intrusion. Users weren’t able to play games or access music and movies for several weeks, as Sony engineers worked hard to put additional security measures in place.

When services were restored after nearly a month, Sony executives at the Tokyo headquarters bowed for several seconds to apologize for the security breach. They offered freebies as well: the ‘welcome back’ program gave all PSN users 30 days of free access to the PlayStation Plus service. But that wasn’t what people talked about most.

An apology bow is part of Japan’s etiquette: the deeper and longer the bow, the more sincere the apology is deemed to be. A company of Sony’s size could have gotten away with a note from the CEO hosted on their official website. Or they could have just said, ‘A hack happens every once in a while, move on.’ But the heartfelt apology did go a long way in restoring their reputation after the breach.


Shankar Ganesh is a Marketing Analyst at Freshdesk. More than 15,000 businesses use Freshdesk to deliver an exceptional support experience to their customers.