As an IT marketing guy, I spend a lot of time thinking about the number 9. Truth be told, I don’t always understand much about how developers add more “9s”, but I love to sing their virtues. I’ve become adept at making collateral & multimedia (videos showing servers exploding under ominous storm clouds are cool) outlining just how much better our “9s” are than the competition’s, and how much our customers need to worry about the number 9. After I finish making all that cool mission-critical app stuff, I take it to Sales and tell them to ask their customers whether they have any mission-critical applications.
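(For the record, each extra 9 buys you roughly a tenfold cut in allowed downtime per year. Here’s a quick back-of-the-envelope sketch in Python; it’s plain arithmetic, not tied to any particular product or SLA fine print:)

```python
# Rough arithmetic: how much downtime per year each level of "9s" allows.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines in range(2, 6):
    availability = 1 - 10 ** -nines            # e.g. 3 nines -> 99.9%
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines ({availability:.3%}): "
          f"~{downtime_minutes:,.1f} minutes of downtime per year")
```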
In the past, asking a customer whether they had any mission-critical apps always made sense, for a couple of reasons. Firstly, Sales could tell the story of how our “9s” are better than the competition’s, while using the aforementioned gloom & doom marketing material. Most importantly, however, a mission-critical application almost certainly meant more money – with redundancy demanding 2x the infrastructure, along with the extra overhead to manage it. Yeah, mission-critical applications were good business!
Nowadays, however, how many mission-critical applications are islands unto themselves? Yes, organizations still have some specific homegrown or off-the-shelf apps that they deem critical. These often run on fault-tolerant hardware, employing some sort of replication and/or HA clustering technology, often spread across two, or even three, geographically dispersed datacenters. But I would argue that the age of the mission-critical app has come and gone, as the status of the application is not nearly as significant as the status of its associated service.
Yes, closely managing the health of a critical application is still important, but only in the context of its service going down. How often does the service provided by these bullet-proof apps, ones embedded with more “9s” than Charlie Manson’s favorite Beatles song, fall prey to a seemingly insignificant piece of infrastructure that was never deemed mission-critical? Worse yet, how long does it take organizations to find, let alone fix, the root causes of these failures? The fact of the matter is that, in today’s world, millions of smartphone users can have their messaging service go south because a switch failed during a routine backup.
Surely, feces will continue to hit the fan, but it won’t always come from the sources we’ve come to expect. The model of over-investing in redundancy for a specific mission-critical database or app, to yield more “9s”, must be weighed against the high likelihood that a service outage won’t come from what we think of as the traditional mission-critical app. Even if you can build an application that will never go down, its service will inevitably fail.
So ensuring the health of the service, vs. the app, is where it’s at. Organizations that embrace the mission-critical app mindset will likely have more difficulty finding the cause of a service outage when it does occur – and occur, it will. Perhaps these organizations would do well to jettison that frame of mind, in favor of one in which the service is mission-critical? Adopting this model means that the application that monitors the service is, arguably, the most mission-critical piece of your stack. And yes, I know of a product that can help with that. I’ll be able to better explain how Service Dynamics can help organizations maintain mission-critical services once I figure out how to make lightning come out of the storm cloud floating above this exploding server.