Being new to Zenoss, I’m still trying to wrap my head around the service-oriented way of doing things. When it comes to Microsoft Exchange, for example, my mind still wanders to the familiar questions: What are the top 10 Microsoft Exchange Server metrics? How important is monitoring the size of the message queue? What’s an acceptable threshold for LDAP read time, in milliseconds? What’s a good target average delivery time for the last 10 messages? Really, who cares about all of this stuff?
Yes, the above metrics are interesting, but that’s really all they are. The overall health of the messaging service trumps every one of these metrics, any day of the week. It strikes me that there are really two ways to monitor the health of Microsoft Exchange: the usual suspects, and service health.
The Usual Suspects – This is the method I know and have become accustomed to. It’s pretty straightforward:
- Read up on the application you’re monitoring so that you can understand its metrics (e.g., what’s the difference between the local and remote delivery queues?).
- Scan the Internet to find a list of important metrics and thresholds. If you’re lucky, the monitoring product you use will include some pre-baked metrics.
- Plug them into your environment and see how far off they are. Your mileage will vary, and you’ll undoubtedly have to tweak thresholds to match your environment.
- Focus on monitoring the metrics that matter most for your specific environment.
- Wait to get a call from someone stating that mail service is down, because the first four steps are relatively ineffective at preventing outages. Once the messaging service is affected, however, the aforementioned metrics can usually help you isolate the problem.
The Service-Oriented Approach – This is the method that Service Dynamics uses:
- Construct a dynamic visual graph of your messaging infrastructure, incorporating hardware, OS, applications, networking, etc.
- Monitor the health of the service as a whole.
- If the service begins to degrade, allow administrators to drill down to the root causes before an actual outage occurs.
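To make the three steps above a little more concrete, here is a minimal Python sketch of the general idea, not of how Service Dynamics actually works. Everything in it is a hypothetical assumption for illustration: the component names, the hand-written `DEPENDS_ON` map standing in for the "dynamic visual graph", the three health states, the rule that a service is only as healthy as its worst dependency, and the drill-down rule that a root cause is an unhealthy component whose own dependencies are all healthy.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() returns the worst state.
    UP = 0
    DEGRADED = 1
    DOWN = 2


# Hypothetical messaging-service model: each node lists what it depends on.
DEPENDS_ON = {
    "messaging-service": ["exchange-app", "dns", "core-switch"],
    "exchange-app": ["windows-os", "active-directory"],
    "active-directory": ["windows-os", "dns"],
    "windows-os": ["server-hardware"],
    "dns": ["core-switch"],
    "server-hardware": [],
    "core-switch": [],
}

# Hypothetical per-component status, as reported by whatever collectors are in place.
OBSERVED = {name: Health.UP for name in DEPENDS_ON}
OBSERVED["core-switch"] = Health.DEGRADED


def rolled_up(node, graph, observed):
    """Service-level health: the worst of a node's own state and all of its dependencies."""
    return max([observed[node]] + [rolled_up(dep, graph, observed) for dep in graph[node]])


def root_causes(node, graph, observed, seen=None):
    """Drill down from a service; keep unhealthy nodes whose dependencies are all healthy."""
    if seen is None:
        seen = set()
    if node in seen:
        return set()
    seen.add(node)
    causes = set()
    for dep in graph[node]:
        causes |= root_causes(dep, graph, observed, seen)
    deps_healthy = all(rolled_up(d, graph, observed) == Health.UP for d in graph[node])
    if observed[node] != Health.UP and deps_healthy:
        causes.add(node)
    return causes


if __name__ == "__main__":
    # The messaging service shows as degraded, and the drill-down points at the network.
    print(rolled_up("messaging-service", DEPENDS_ON, OBSERVED).name)
    print(sorted(root_causes("messaging-service", DEPENDS_ON, OBSERVED)))
```

In this toy model the degraded "service" is traced to a switch rather than to Exchange itself, which is the kind of case where an operator without Exchange expertise benefits most from a service view.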
Admittedly, there will be times when the usual suspects approach is useful, and Service Dynamics does support that approach through SNMP, WMI, etc. We will give you the metrics you need to help an Exchange expert determine what Exchange-specific problems are occurring. But what if an Exchange expert isn’t on deck at 3 a.m.? And what if the problem that’s impacting MS Exchange doesn’t really have much to do with MS Exchange? You’ll lose valuable minutes or hours as the IT Operations staff on deck try to figure out what an acceptable outbound message queue looks like. And what if the application isn’t as ubiquitous as MS Exchange, but something more specialized and arcane? Clearly, there’s a need to model a service so that staff who aren’t experts in the underlying applications can still monitor them. Again, Service Dynamics supports both approaches.
Now, what are your top 10 MS Exchange Server metrics? Or do you still care?
Image courtesy of Caro's Lines