My grandmother passed away over a month ago, and out of nostalgia (and also because my automatic drip coffeemaker made weak, lukewarm coffee, and I cracked yet another French press), I bought a stainless steel Farberware percolator. Grandma liked her coffee “H-O-T, hot,” and so do I.
For my taste, my first brewed pot was excellent. I followed Grandma’s rule of keeping the stovetop percolator on low-to-medium heat and eyeballing the clear knob for the right color in between reading Mad Men recaps on Instapaper. The resulting brew was hot enough to burn my tongue and made my cheap coffee taste full-flavored, certainly no more bitter than Starbucks. And it smelled like Grandma’s last apartment, minus the brisket.
So what is the connection between these competing brewing methods and the #monitoringsucks meme? I recently spoke with Knewton’s Dave Zwieback, the main subject of my last post, Seeking Out the IT Generalists in the Sea of DevOps, and he said that most of today’s monitoring tools supply a “daily drip of data” but fail to provide a coherent and repeatable means of correlating that data.
“Unless you’re in active troubleshooting mode, you’re just not putting it together,” Zwieback says.
Monitoring solutions generate so many alerts, based on the scripts you’ve set up in, say, Nagios, that the volume becomes overwhelming. Each drip of data seems to carry the same weight as every other drip, leaving you with a weak solution that can scorch you if you leave that data in the pot too long.
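To make that equal-weight drip problem concrete, here’s a minimal Python sketch; the hostnames, check names, and alert stream are invented for illustration, not taken from Zwieback or from any particular Nagios setup:

```python
# Hypothetical alert stream: every check result arrives as an equal "drip".
from collections import Counter

alerts = [
    {"host": "web-03", "check": "load", "status": "WARNING"},
    {"host": "web-07", "check": "disk", "status": "WARNING"},
    {"host": "db-01",  "check": "replication_lag", "status": "CRITICAL"},
    {"host": "web-03", "check": "load", "status": "WARNING"},
    # ...hundreds more per hour in a real deployment
]

# A naive monitor treats each alert identically: one notification per drip.
for a in alerts:
    print(f"[{a['status']}] {a['host']}: {a['check']}")

# Correlation means aggregating drips into a picture, e.g. by check type,
# so one failing database stands out against routine web-tier noise.
print(Counter((a["check"], a["status"]) for a in alerts))
```

The aggregation step at the end is the kind of correlation Zwieback says most tools leave to the human, and only when that human is already in active troubleshooting mode.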
The cloud only complicates the situation. In the old rack-’em, stack-’em days, IT people could monitor each physical server for aberrant processes and respond to those problems. If something went wrong, you could patch or replace the software on a given server, or pull the offending server and install the same software on a replacement box.
But in a cloud server environment, “the notion of persistence goes out the window,” Zwieback says.
Zwieback explains:
In a cloud environment, like Joyent or AWS, you may have 10 web servers that we’ll call System A1-to-A10. At some point you want to install a new version of software, so you load them on servers B1-to-B10. So for a short time you have 20 servers running, and then you kill all the old instances. So you still end up with 10 servers, but they’re not considered the same 10 servers to most [current monitoring systems].
Or in your A1-A10 scenario, A5 fails, and it’s replaced with A11. There are thousands of instances like this in the cloud. The cloud model has instances going away all the time, which causes problems for monitoring systems. It’s not like having 10 systems spinning up and down.
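Here’s a rough Python sketch of the mismatch Zwieback describes; the role check and the fleet-size threshold are my assumptions for illustration. A monitor keyed to fixed host identities reads a routine blue-green swap as ten outages, while one keyed to a role sees a healthy fleet throughout:

```python
# Hypothetical inventory before and after a blue-green deploy.
before = {f"A{i}" for i in range(1, 11)}  # A1..A10, old version
after = {f"B{i}" for i in range(1, 11)}   # B1..B10, new version

# Host-identity monitoring: every retired instance looks like an outage.
missing = before - after
print(f"{len(missing)} 'host down' alerts:", sorted(missing))  # all 10 fire

# Role-based monitoring: ask how many healthy instances carry the role,
# regardless of which instance IDs happen to exist right now.
def role_is_healthy(fleet, want=10):
    return len(fleet) >= want

print("web role healthy before swap:", role_is_healthy(before))
print("web role healthy after swap:", role_is_healthy(after))
```

The A5-to-A11 replacement works the same way: as long as the role-level count recovers, nobody needs to be paged about a hostname that was never coming back.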
So it’s no surprise that #monitoringsucks is percolating as a movement and that DevOps people are spearheading this push.