Lightning strikes Amazon's data center in Dublin, Ireland. Could this incident have been avoided?
Is it possible to design a data center to protect it from lightning?
Source: http://www.zdnet.com/blog/saas/lightning-strike-zaps-ec2-ireland
6 Answers
Agree with Andrew on the lightning... Mother Nature is hard to control. And also agree with his question on redundancy.
But what's more problematic from my perspective is the time and effort it is taking to recover EBS instances. Obviously, if the physical infrastructure is dead in the water then there's not much you can do about it. But we've had lots of discussion lately here on Focus and other forums about cloud reliability, recovery, SLAs, etc. This looks to be turning into another example of how not being prepared for a worst-case scenario can have a (significant) negative impact on your business. OK, I can hear you out there saying "hey, this is a *hundred year* event." But those hundred years are starting to come around pretty often.
Recovery in the cloud is not automatic, and needs lots of forethought, planning and practice. Read the AWS dashboard to see what I mean.
Avoiding lightning? I'm sure there are ways to lower the likelihood, but I don't know that there is a foolproof anti-lightning option that is cost-effective.
A more pertinent question, for me anyway, is why wasn't there enough redundancy that the lightning strike only took out half of the power generation infrastructure?
"why wasn't there enough redundancy that the lightning strike only took out half of the power generation infrastructure?"
Great point to highlight.
One thing I would point out: I'm sure future designs and subsequent upgrades to these data centers will ensure this level of redundancy for "black-swan" events.
But despite all our best efforts, events like this will happen. The Japanese nuclear disaster, the Ireland cloud outage, or even the U.S. downgrade (of what was once considered the "benchmark" financial note)... no matter how unpredictable they are, or how much we plan for them or mitigate against them, black-swan events will happen.
The best plan is to have your contingency for failure in place, something AWS is clearly doing a great job of, albeit more slowly than some customers would like. Customers who had multiple providers to scale across for such events, or who walked into their SLA with realistic expectations (a pricing-based uptime guarantee, not a physical guarantee), weathered these storms.
Outages happen. The key is how fast the platform and applications can recover from them. And that depends on how they’ve been architected and built.
If the cloud platform is enterprise grade--with inherent fault tolerance and built in disaster recovery--customers won’t be nearly as vulnerable.
A few steps you can take to minimize your downtime in the event of an outage include:
• Look for an enterprise grade platform with inherent fault tolerance architected into ALL layers of the stack including server, storage, networking and virtualization. Many providers have redundancy in only one or two of these layers.
• Architect your application to take advantage of the cloud--don't assume the cloud provider will never fail. Build risk mitigation and fault tolerance into your app via clustering, DNS load balancing, etc.
• Realize that outages do happen - so what is important is mean time to recover (MTTR). Look and see what the provider's track record is.
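To put a number on the MTTR point above, here is a minimal sketch using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The function names and the sample figures are illustrative assumptions, not data from any provider.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTBF and MTTR."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: same failure rate, two different recovery times.
    for mttr in (1.0, 24.0):
        a = availability(999.0, mttr)
        d = annual_downtime_hours(999.0, mttr)
        print(f"MTTR {mttr:>4.0f} h -> availability {a:.4%}, downtime {d:.1f} h/yr")
```

With the failure rate held constant, the recovery time alone drives the difference in yearly downtime, which is why a provider's recovery track record is worth checking.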
Yes, it is certainly possible to design data centers to survive lightning strikes or many other natural disasters.
The real question is whether it is cost-effective to do so.
Outside of a certain PR hit, how many customers will Amazon lose as a result of this outage? How much potential revenue will they lose as other customers stay away from EC2 because they are nervous about Amazon minimizing EC2 data center redundancy expenditures?
If the actual and potential revenue losses are far less than the significant costs it would take to deploy multiple generators, electrical distribution systems, and fire suppression systems, then it turns out Amazon made a good business call in allowing the data center to fail due to a lightning strike.
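That comparison can be written down as a rough expected-value check. This is only a sketch: the function names are made up for illustration, and the inputs (yearly outage probability, loss per outage, yearly cost of the extra infrastructure) are estimates someone would have to supply, not known Amazon figures.

```python
def expected_annual_loss(outage_probability_per_year: float,
                         loss_per_outage: float) -> float:
    """Expected yearly loss from outages: probability times cost per outage."""
    if not 0.0 <= outage_probability_per_year <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return outage_probability_per_year * loss_per_outage


def redundancy_pays_off(outage_probability_per_year: float,
                        loss_per_outage: float,
                        redundancy_cost_per_year: float) -> bool:
    """True when the expected loss avoided exceeds the yearly redundancy cost.

    Assumes, for simplicity, that the extra redundancy would prevent the
    outage entirely.
    """
    loss = expected_annual_loss(outage_probability_per_year, loss_per_outage)
    return loss > redundancy_cost_per_year


if __name__ == "__main__":
    # Hypothetical numbers: a 1-in-100-years event costing 5M per occurrence,
    # against 200k per year of additional power infrastructure.
    print(redundancy_pays_off(0.01, 5_000_000, 200_000))
```

The result is only as good as the estimates, and reputational damage is hard to fold into the loss-per-outage figure.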
Outages will happen, failures will happen - this is not a problem unique to cloud computing. Designing a solution requires architects to design for failure since failure _will_ happen. This means avoiding single points of failure (including designing, implementing and testing disaster recovery scenarios for major failures such as the loss of a datacenter). Designing for failure at a large scale can be very expensive and stakeholders may be reluctant to fund such initiatives. When faced with such a dilemma the architect must help stakeholders weigh the cost of designing for failure vs. the cost of solution downtime.
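As a small illustration of why removing single points of failure helps, the sketch below computes the availability of N redundant components when any one of them is enough to keep the service up. It assumes failures are independent, which is a strong assumption: an event that takes out primary and backup systems together is a correlated failure that this formula does not capture. The function name is hypothetical.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` independent components in parallel.

    The system is down only if every replica is down at once:
    A_system = 1 - (1 - A_component) ** replicas
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - component_availability) ** replicas


if __name__ == "__main__":
    # Hypothetical component that is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} replica(s): {parallel_availability(0.99, n):.6f}")
```

Each extra replica adds cost, so this is the kind of figure an architect could set against the price of downtime when making the case to stakeholders.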
Blaming Amazon is not the appropriate reaction to this issue - a poor workman always blames his tools.