In today’s cloud-filled world, the conversation around operations management is – as it should be – all about service assurance. Can IT deliver the server, storage, networking, and application resources necessary to meet business needs? The focus on service assurance isn’t really a new concept – it’s been talked about since the transition from mainframe to client/server architectures. But how we talk about it has changed significantly over time, reflecting the reality of datacenter evolution. With the emergence of virtualization, cloud, and converged infrastructure, how your organization talks about delivering operational excellence demands a new conversation. But what shape should that conversation take? Well, let’s start by looking at a few things that have been said in the past.
Enterprise Nirvana
At the beginning of the 1990s, monolithic systems management frameworks from IBM, HP, and CA were the loudest voices in the room when it came to how enterprises approached IT operations management. They promised to stem the proliferation of elemental management tools in the enterprise, simplifying operations and giving businesses the first real insight into overall corporate health. Companies were sold on this vision of super-centralized management, buying up giant contracts that covered every possible nook and cranny of their datacenter. This was the pathway to IT service nirvana. Systems administrators would reign supreme at consoles that gave them unprecedented control over vast domains. World peace would surely follow.
Enterprise Retrenchment: Stay Out of my Yard!
Of course by the end of the decade it became clear that “vision” was pretty much the sum of what the framework solutions offered to companies. Analysts estimated that of the bundled product suites sold, only 25% to 50% of the technologies included were ever deployed – the rest became shelfware. The integration work needed to bring these monoliths into production with a modicum of the promised functionality was overwhelming. So the industry shifted gears – surely what we needed to be talking about was best-of-breed point products! These vendors could offer a better solution for meeting the service requirements of the business, because they only had to deal with their chosen piece of the stack. The conversation shifted to service assurance for your applications OR your network OR your server infrastructure. Siloed infrastructure became standard operating practice – and as long as problems or outages didn’t sit in your stack, you were free and clear. The conversation quickly became: “It’s not my problem, must be yours. Keep your eyes on your own screen! Don’t step on my grass!” Long live siloed management! Commence with world peace!
What does the conversation need to be today?
In light of these (oversimplified, to be sure) past conversations, where does that leave us today? Pretty much still waiting around for world peace, I’d say. While frameworks under-delivered on cross-enterprise control, siloed solutions introduced their own problems: information overload, obscured visibility, and departmental cold wars. Despite all the solutions and approaches, the reality is that we’re still in a situation where IT struggles to deliver available, reliable services to support the business. We’re even being a bit more honest about it, recognizing that most businesses aren’t achieving anywhere close to 5 nines availability – or even 3 nines in most cases (see our recent Forrester study). But the good news is that right now we have unusually fertile ground for an interesting and meaningful conversation around IT operations management and service assurance – one that might even get us a few steps closer to world peace.
The emergence of virtualization, cloud computing, and converged infrastructure has added new complexities to the IT operations management equation. By design, these technologies actually help deliver services to the user faster and more reliably, getting you a step or two further down the road of service assurance. The difficulties for IT operations come with the real-time allocation of resources across servers, VMs, VLANs, and SANs, which complicates how you identify and monitor all of the elements that make up those resources. On top of that, new services are provisioned in hours or days – not weeks or months – and since multiple services might be underpinned by the same physical infrastructure, the impact of downtime can be quickly magnified. Every second matters in determining the root cause of an issue that impacts service delivery. Any conversation about IT operations management needs to reflect the benefits and hurdles of the new technologies entering the datacenter, as well as the need for a radically accelerated time to resolution.
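To make that magnification point concrete, here’s a minimal sketch in Python of how a single host failure fans out across services. The topology and every host, VM, and service name are invented for illustration – this isn’t any particular product’s data model:

```python
# Hypothetical topology: physical hosts run VMs, and VMs underpin services.
host_to_vms = {
    "esx-host-01": ["vm-web-1", "vm-db-1"],
    "esx-host-02": ["vm-web-2", "vm-queue-1"],
}
vm_to_services = {
    "vm-web-1": ["online-store", "partner-portal"],
    "vm-db-1": ["online-store", "reporting"],
    "vm-web-2": ["online-store"],
    "vm-queue-1": ["order-fulfillment"],
}

def impacted_services(failed_host):
    """Every service with at least one VM on the failed host is at risk."""
    services = set()
    for vm in host_to_vms.get(failed_host, []):
        services.update(vm_to_services.get(vm, []))
    return services

# One physical host going down touches three distinct business services:
print(impacted_services("esx-host-01"))
# {'online-store', 'partner-portal', 'reporting'}
```

One box, three impacted services – exactly the kind of blast radius that makes every second of root-cause analysis count.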
In shifting our conversations to accommodate the current realities, there is actually some value in re-examining past conversations. Identifying past missteps can often help root out obsolete practices. Taking that into account, we think there are four key concepts that should be part of any meaningful operations management conversation:
#1: How can I break down silos and get a better view of corporate health? The old framework approaches may have failed on execution – but their intended goal was actually on target. Finding a way to view the health of service delivery across your environment is the only way to understand how your business is operating. Switching between point product management consoles will never give you the holistic view that you need. So the conversation needs to be about the best way to incorporate data feeds from all the disparate elements across your network – but don’t forget about differentiating between truly critical events and “noise.” Without talking about some level of categorization, you’ll find yourself drowning in information, which is just as paralyzing as not having the information in the first place.
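As a sketch of what we mean by categorization – a toy example with invented event fields and severity levels, not any vendor’s API – consider normalizing events from different feeds into a common shape, then letting only deduplicated, high-severity events reach a human:

```python
from collections import Counter

# Toy severity scale for a hypothetical normalized event shape.
SEVERITY = {"info": 0, "warning": 1, "error": 2, "critical": 3}

def normalize(source, raw):
    """Map a feed-specific event into one common shape (fields invented)."""
    return {
        "source": source,
        "element": raw.get("host") or raw.get("device", "unknown"),
        "severity": raw.get("severity", "info"),
        "message": raw.get("msg", ""),
    }

def triage(events, min_severity="error"):
    """Drop repeats and keep only events worth a human's attention."""
    seen = Counter()
    actionable = []
    for event in events:
        key = (event["element"], event["message"])
        seen[key] += 1
        is_first = seen[key] == 1
        is_severe = SEVERITY[event["severity"]] >= SEVERITY[min_severity]
        if is_first and is_severe:
            actionable.append(event)
    return actionable

feed = [
    normalize("netmon", {"device": "core-sw-1", "severity": "critical", "msg": "link down"}),
    normalize("appmon", {"host": "vm-web-1", "severity": "info", "msg": "GC pause"}),
    normalize("netmon", {"device": "core-sw-1", "severity": "critical", "msg": "link down"}),
]
print(triage(feed))  # one actionable event survives, not three lines of noise
```

The details will differ in any real environment, but the principle holds: one normalized stream, ruthlessly filtered, beats a dozen consoles full of raw alerts.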
#2: How quickly can I understand the impact of operational failures to my business? The conversation today needs to be focused on REAL-TIME service assurance. What is the immediate impact to my business when resources are moved, services go down, etc.? Unfortunately for this one, looking in the traditional management playbook is not really going to help elevate the discussion. That’s largely because traditional approaches are grounded in a classic client/server ancestry. One of the best examples of this is the CMDB. The CMDB is an incredibly useful repository for tracking and managing all of your assets, and traditional management approaches rely heavily on a CMDB to take care of inventory and resource mapping. The trouble is that updates to the CMDB are not real-time, and in many cases require manual intervention. Think of an example where you have a private cloud and a critical application is reallocated from one VM to another. The CMDB has no way of tracking that activity – and using a CMDB-based service map (out of date in this case) might cause you to miss the culprit and lengthen your resolution time! So make sure you aren’t limiting your discussion by assuming there are only “traditional” tools or approaches available.
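To illustrate that staleness problem, here’s another hedged sketch – the data and the live_placement() stand-in are invented, not how any real CMDB or hypervisor API works – comparing a static CMDB service map against what a live query reports, and flagging the drift before it sends you down the wrong troubleshooting path:

```python
# Hypothetical data: what the CMDB recorded at its last (manual) update,
# versus what a live query of the virtualization layer reports right now.
cmdb_service_map = {"billing-app": "vm-07", "crm-app": "vm-12"}

def live_placement():
    """Stand-in for a real-time hypervisor/cloud API query (invented)."""
    return {"billing-app": "vm-22", "crm-app": "vm-12"}  # billing-app moved!

def find_drift(cmdb, live):
    """Services whose actual placement no longer matches the CMDB record."""
    return {svc: (cmdb[svc], live.get(svc))
            for svc in cmdb if cmdb[svc] != live.get(svc)}

for svc, (stale, actual) in find_drift(cmdb_service_map, live_placement()).items():
    print(f"{svc}: CMDB says {stale}, but it is actually running on {actual}")
# billing-app: CMDB says vm-07, but it is actually running on vm-22
```

A troubleshooter working from the stale map would be staring at vm-07 while the real problem lives on vm-22 – which is precisely how resolution times balloon.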
#3: What amount of effort is required to deploy, integrate, and maintain my approach? While appreciating all the wisdom of point #2, you also need to realize that your conversation is taking place among an existing (and often expensive) set of resources and tools. Any operations management discussion needs to take into account legacy applications, hardware, and tools – even if you’re setting up a brand new, “standalone” private cloud stack – because at the end of the day, if your operations management strategy can’t span all of your resources, you’re going to end up with “tool sprawl.” In our Forrester study, we found that some organizations are using 50 or more infrastructure and application management tools – purchased individually or as part of a suite. The amount of time needed to deploy, integrate, and update these tools takes an incredible toll on productivity. So think about whether the effort to integrate and maintain is being offset by new efficiencies in operational decision making. If not, then you probably need to keep talking about a better approach.
#4: How am I going to consume this approach to meet operating efficiency requirements? These days, the packaging of an operations management solution can have as much impact on efficiency as its features. While on-premises offerings are still largely preferred, SaaS offerings are gaining traction as a way to add IT staff bandwidth and reduce operating expense. Even the way products are bundled can impact efficiency. Bundling multiple management tools into suites is not a new concept – and the idea of getting additional functionality for “free” can be enticing. But (back to point #3) keep in mind that if the functionality doesn’t give you what you need to meet your business requirements, “free” starts to be pretty expensive.
We hope that these conversation starters help you hone your approach. We’ll be exploring how these conversations can be applied specifically to cloud and converged infrastructure environments over the next few weeks – and what Zenoss is doing to make these conversations even easier. We’d love to address your questions and concerns around these topics in this forum, so feel free to make liberal use of the comments section.