In this final installment of our series looking at real-world unified monitoring benefits shared by some of our largest customers, it is not much of a surprise that faster incident resolution is the focus. This benefit is perhaps the one most commonly touted when it comes to unified monitoring. But really, it’s what ANY monitoring solution is supposed to do for an organization: limit the downtime of business services caused by performance and availability issues. Unified monitoring can achieve better results than traditional monitoring solutions because it has a built-in understanding of the relationships between your infrastructure and services.
Because of this relationship insight, two key capabilities that help reduce Mean Time to Resolution (MTTR) are much more effective with unified monitoring. The first is root cause analysis. Unified monitoring platforms keep an up-to-date model of all of the resources in your environment, as well as how those resources factor into service delivery. So it’s easy for an administrator to quickly drill down to the root cause of service disruptions once an alert is received. One console gives you all the information you need – there is no need to hop between different point products.
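To make the drill-down idea concrete, here is a minimal sketch of how a resource model can narrow a service alert to a likely root cause. It is an illustration only, not how ZSD is implemented: the `DEPENDS_ON` map, the resource names, and the `root_cause_candidates` function are all hypothetical.

```python
from collections import deque

# Hypothetical resource model: each entry lists what a resource depends on.
DEPENDS_ON = {
    "web-store": ["app-server-1", "db-cluster"],
    "app-server-1": ["host-a"],
    "db-cluster": ["host-b"],
    "host-a": ["rack-1-fan"],
    "host-b": [],
    "rack-1-fan": [],
}


def root_cause_candidates(service, down, depends_on):
    """Return failed resources under `service` whose own dependencies are all healthy."""
    candidates = []
    seen = {service}
    queue = deque([service])
    while queue:
        node = queue.popleft()
        deps = depends_on.get(node, [])
        # A failed resource with no failed dependency is a likely root cause;
        # anything above it in the model is probably a symptom.
        if node in down and not any(dep in down for dep in deps):
            candidates.append(node)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return candidates


# Assumed state: the service, one app server, its host and a fan are all reporting failures.
down = {"web-store", "app-server-1", "host-a", "rack-1-fan"}
print(root_cause_candidates("web-store", down, DEPENDS_ON))
```

In this made-up scenario the walk starts at the alerting service and ends at the fan, which is the only failed resource with nothing failed beneath it.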
The second important capability is event filtering. It is not unusual for a single component failure, such as a fan failure, to cause a cascade of failures, resulting in an event storm totaling thousands of individual events. Because unified monitoring has a holistic view of the environment, it can automate event filtering so that only the relevant, service-impacting events get through. Even better, because managed resources are defined in the resource model, each event comes with a rich set of information about exactly what has gone wrong with the device, making it easier not only to identify the source but also to know how to fix it.
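The filtering and enrichment described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ZSD event pipeline: the `DEPENDS_ON` and `DETAILS` tables, the event fields, and the `filter_and_enrich` function are hypothetical names invented for the example.

```python
# Hypothetical resource model: device -> devices it depends on.
DEPENDS_ON = {
    "app-server-1": ["host-a"],
    "host-a": ["rack-1-fan"],
    "rack-1-fan": [],
}

# Hypothetical context stored in the model and attached to surviving events.
DETAILS = {
    "rack-1-fan": {"location": "rack 1", "impacts": ["web-store"]},
}


def upstream(device, depends_on):
    """Collect every direct and indirect dependency of a device."""
    found = set()
    stack = list(depends_on.get(device, []))
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(depends_on.get(dep, []))
    return found


def filter_and_enrich(events, depends_on, details):
    """Drop events explained by a failing dependency; enrich the rest."""
    failing = {event["device"] for event in events}
    kept = []
    for event in events:
        if upstream(event["device"], depends_on) & failing:
            continue  # symptom of a failure further down the model
        kept.append({**event, **details.get(event["device"], {})})
    return kept


# A tiny stand-in for an event storm: three events, one underlying cause.
events = [
    {"device": "app-server-1", "summary": "HTTP timeout"},
    {"device": "host-a", "summary": "thermal shutdown"},
    {"device": "rack-1-fan", "summary": "fan failure"},
]
result = filter_and_enrich(events, DEPENDS_ON, DETAILS)
print(result)
```

Of the three events, only the fan failure survives, and it carries the location and affected service taken from the model.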
By using the root cause analysis and event filtering and enrichment capabilities of Zenoss Service Dynamics (ZSD), one of our large enterprise customers was able to reduce their Mean Time to Resolution (MTTR) by 85%. That means instead of dealing with an average 6-8 hours of downtime while IT worked to resolve service issues, they are able to get services back online in less than an hour. More importantly, they are using unified monitoring to prevent issues from occurring in the first place. The ZSD unified console gives them the ability to identify at-risk infrastructure and resolve issues before services are affected. Without a platform that understands the relationships between infrastructure and services, they’d never be able to avoid future outages.
We hope these real-world examples that customers have shared with us have been helpful – hit the comments section if you have any questions, and we’ll be happy to respond. If you haven’t already, make sure to read Part 1 and Part 2 of our blogs in this series. You can also check out our 4 Profiles in Unified Monitoring Success paper.