By now, you have most likely heard that Amazon suffered an outage yesterday of its industry-leading Cloud services. News of the outage spread beyond technical publications and the blogosphere and landed on the front page of CNNFN.com. How could this happen? Can the Cloud really go down? Where do you turn for help? For information?
Over the last few years, I’ve come to dislike the word Cloud. Even more amazing to me is the number of IT professionals who feel the same way. Why? The term Cloud implies a simplicity and carefree nature that discounts the complexity of building and maintaining a Cloud.
Simply put, Cloud computing takes the complexities of networking, compute, storage, applications, and security and abstracts them into new software layers. While these abstraction layers can dramatically improve efficiency and cut costs, they also require new levels of integration, orchestration, and monitoring. After all, Amazon’s services have an SLA of 99.95%, which still allows for over 4 hours of downtime a year.
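To see where that figure comes from, here is a minimal sketch of the arithmetic. The function name and the 365-day year are my own assumptions for illustration; they are not taken from any provider's SLA terms, which may measure availability over a different period.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in hours, that an availability SLA still permits.

    Hypothetical helper: an SLA of X% leaves (100 - X)% of the period
    during which the service may be down without breaching the SLA.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_hours * (100 - sla_percent) / 100


if __name__ == "__main__":
    # 0.05% of 8760 hours is roughly 4.38 hours per year.
    print(f"{allowed_downtime_hours(99.95):.2f} hours per year")
```

Put differently, a provider can be down for the better part of a working afternoon each year and still be within a 99.95% SLA.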
To put it another way, let’s think of Amazon as a utility such as the electric company. When you come home and flip the light switch, you expect the light to turn on. However, sometimes the power goes out and you must quickly adapt. Sure, you’ll call the electric company, only to be greeted by an automated voice response system that may give you an estimated time of repair. Otherwise, forgive the pun, you are in the dark until, voilà, the lights come back on. Is Cloud computing really any different? Is it not simply a black box that we pay a monthly fee for, enjoying its services while wrongly thinking that they will never go down?
Public, Private, or Hybrid Clouds require more than SLAs and guarantees from the Cloud providers. They require Cloud Instrumentation that provides you, the service owners, with the ability to quickly adapt and turn outages into incidents. It must not only monitor the services but also be able to understand the dynamic nature of the Cloud itself.
Just as a pilot wouldn’t fly through a storm without instruments, you can’t offer Cloud services without instrumentation. The key to this instrumentation is a dynamic, model-based, event-driven, and agentless monitoring platform that is heterogeneous and crosses all classic IT silos, giving you a true single pane of glass. Take the information from Amazon’s APIs, combine it with performance data, SLAs, and events from your own services, and make proactive decisions about your business independently of the Cloud’s black box.
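As a rough illustration of that idea, the sketch below combines what the provider reports with what your own monitoring measures and turns the two into a decision. Everything in it is hypothetical: the class, the field names, the status strings, and the decision rules are assumptions made for this example, and it does not call Amazon's APIs or any real monitoring product.

```python
from dataclasses import dataclass


@dataclass
class ServiceSnapshot:
    """One point-in-time view of a service (all fields are hypothetical)."""
    name: str
    provider_status: str     # what the Cloud provider reports, e.g. "ok" or "degraded"
    availability_pct: float  # availability measured by your own monitoring
    sla_target_pct: float    # availability you promised your customers
    open_events: int         # unresolved events raised by your own services


def assess(snapshot: ServiceSnapshot) -> str:
    """Return a suggested action based on provider and in-house data."""
    sla_breached = snapshot.availability_pct < snapshot.sla_target_pct
    provider_trouble = snapshot.provider_status != "ok"

    if sla_breached and provider_trouble:
        return "failover"     # the provider's problem is hurting your service
    if sla_breached:
        return "investigate"  # provider looks fine, so the fault is likely yours
    if provider_trouble or snapshot.open_events > 0:
        return "watch"        # no breach yet, but there are warning signs
    return "healthy"


if __name__ == "__main__":
    shop = ServiceSnapshot("web-shop", "degraded", 99.2, 99.9, 3)
    print(shop.name, "->", assess(shop))
```

The point of the sketch is the third branch: when the provider reports trouble but your own SLA is still intact, you have time to react before an outage on their side becomes an outage on yours.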
Today Amazon had an outage; tomorrow another public Cloud may go down. Public, Private, Hybrid, or whatever else you want to call them, Clouds will go down. The key is how you use your instruments to navigate the storm and react to the outages themselves.
Amazon’s outage is great news for the Cloud, as it reminds us that Clouds are a serious business. Amazon’s Cloud is still one of the industry’s best, and with the proper instrumentation it can be a wonderful addition to your Cloud strategy.
Image Credit: Bert Kaufmann