From time to time, organizations find that a critical application, say the one that produces a business-critical report, slows down and the reason isn’t immediately clear. When the CEO or a vice president starts calling to ask “where is that report?” the IT department often has to scramble to find out what’s happening. Why does this happen? What can be done about it?
Modern application architectures
Modern application architectures are built differently than they were in times past. It used to be common for a complex application to be constructed as a monolithic block of code that executed on a single mainframe or midrange machine. Today, it is far more common for applications to be built as a series of services or tiers.
Each tier may run on one or more machines to improve overall performance, scalability and reliability. These tiers may run in the local data center or in a remote one, and large companies often run application tiers in multiple data centers.
These tiers may run on physical systems, on virtual systems or in the cloud. All of these structures may be in use within a single workload.
Why do organizations use this approach?
The reason that organizations use this approach is that early industry-standard systems didn’t have the processor power, memory capacity or storage capacity to run a complex application effectively. So, organizations split up their applications to make the best use of the available technology. Even though systems have improved dramatically, this approach is still used.
This approach offers a number of key benefits. Workload management software can optimize how each tier performs. New instances of application services can be started when the workload overruns the capabilities of the services currently running. Idle instances can be stopped so that other functions can use the system.
Instances of application services can be moved from an overloaded system to a more lightly loaded system to improve performance. This might mean an application service was moved from one data center to another.
Application services might be moved from a physical machine to a virtual machine to consolidate functions on a single powerful system. The virtual machines may be moved from the organization’s own data center out to a cloud service provider’s environment to reduce costs or improve performance.
This approach has evolved from being a way to address the limited capabilities of early industry-standard systems to being a way to offer extreme performance, scalability and reliability through the use of redundancy.
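To make the scaling behavior described above concrete, here is a minimal sketch, in Python, of the kind of scale-up and scale-down decision that workload management software applies to a single application tier. The tier names, utilization thresholds and instance limits are purely illustrative assumptions, not the logic of any particular product.

```python
# A minimal sketch of the scale-up / scale-down decision applied to one
# application tier. Thresholds, tier names and metrics are hypothetical.

from dataclasses import dataclass


@dataclass
class TierStatus:
    name: str               # application tier, e.g. "report-rendering" (hypothetical)
    instances: int          # instances currently running
    avg_utilization: float  # 0.0 - 1.0, averaged across instances


def desired_instances(status: TierStatus,
                      scale_up_at: float = 0.80,
                      scale_down_at: float = 0.30,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Return how many instances this tier should run.

    Start another instance when the running ones are overloaded;
    stop an idle instance so other functions can use the system.
    """
    if status.avg_utilization > scale_up_at and status.instances < max_instances:
        return status.instances + 1
    if status.avg_utilization < scale_down_at and status.instances > min_instances:
        return status.instances - 1
    return status.instances


if __name__ == "__main__":
    # The reporting tier is overrunning its two instances, so the policy
    # asks for a third; the mostly idle web tier gives one instance back.
    print(desired_instances(TierStatus("report-rendering", 2, 0.92)))  # -> 3
    print(desired_instances(TierStatus("web-frontend", 4, 0.12)))      # -> 3
```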
It is also very, very complex.
When something goes wrong, it can be very hard to find the root cause
One of the key challenges of this distributed, multi-tier approach to designing applications is that a problem can emerge in a low-level function and produce wide-ranging effects. A storage device or a network link could be overutilized and create performance problems that appear to be a hung or failing service.
When traditional management tools check in on the application service, it appears to be running just fine. It may, however, not be doing anything useful while it waits for a network link or storage device that is far away.
That critical report the CEO wants might slow to a crawl and the IT department might be hard-pressed to discover the root cause. IT can spend quite a bit of time determining whether it is a machine failure, the failure of an application service, a database problem, a storage problem or a network problem. Even worse, it could turn out to be a problem caused by an IT administrator doing a backup or loading an update at the wrong time of day.
By the time the root cause is found and fixed, customers, partners, staff members and that angry CEO have already experienced a failure.
Organizations need to deploy more intelligent tools
What is clear is that organizations need to deploy more intelligent tools to address the complexity of their IT infrastructure. Such a tool needs to be dropped into the environment and discover what resources are in use on its own, without extensive configuration. It must collect useful runtime data and provide quick analysis without requiring the IT administrator to possess a degree in statistical analysis. Furthermore, the tool must have the intelligence to suggest ways to optimize the environment and prevent problems before they occur.
Regardless of the time of day, IT must know what's running, where it is running, and what issues are emerging long before they become problems.