Zen and the Art of Monitoring

I’ve been thinking recently about complete stack monitoring and how a good solution should be like a teenager – it can primarily take care of itself, but occasionally needs to have someone step in and give it direction.

Now as a former Network Engineer and Exchange Administrator (yes, both), I have some very different ideas about what constitutes complete stack monitoring.  If you take either half of my previous roles, you get very different demands.  The Network Engineer in me asks, “Can you get from one device to another?  Was there any packet loss along the way?  How’s the latency between sites?”  The Exchange Administrator in me asks, “How many people are talking to my CAS Array?  Are all my database copies healthy?  Can people open Outlook Web App from anywhere?”  I tend to think of these as two sides of the same coin – one is based on the technical (unsympathetic) and the other is based on the user experience (sympathetic).

In this past life, I was also responsible for the network monitoring solution.  That was my introduction to SolarWinds Orion.  By no means was it the only solution that we had, but it was the one where I had the most free rein.  Because of that, I got to give it the “care and feeding” necessary to keep it growing.  It eventually supplanted (almost) all of our other monitoring company-wide.  This was a personal achievement of which I was particularly proud.  That’s not to say that there weren’t bumps and scrapes along the way – particularly in the beginning.  Early on, I fell into the trap of “let’s watch and alert on everything!”  So for a period of about six weeks, I hated the monitoring solution.

A few months ago, I was asked to be part of a session delivered at SolarWinds’ annual virtual conference, THWACKcamp, entitled “Don’t Hate your Monitoring.”

I wish that younger me could have seen this session… but if that were the case, would I have ever needed to record the session?  Damn you, Bootstrap Paradox!

I had the pleasure of presenting with cohorts @leonadato and @DaveJosephsen on some best practices you should use in any monitoring system.  In that session we covered a lot of ground, and I won’t even try to summarize it here.  There was actually so much that we had to have an extra session to cover some specifics.

Much of this took me back to the first days when I was learning how to care for a monitoring solution.  There were so many things that I wanted to do right out of the gate, but I needed someone to step in and tell me to slow down.  Unfortunately, there wasn’t someone like that, so my inbox would occasionally get flooded with 12,000 emails overnight.  At one point I created an inbox rule to move all alert messages to a different folder.  That’s pretty much the definition of “fail” when it comes to a monitoring solution.

Those years running the monitoring solution taught me a few lessons, but most importantly they taught me the reason that IT even exists.  An IT Department exists to facilitate business.  That business is made up of end users who use a variety of systems and need them to perform to certain standards.  Those end users are the cogs in the organizational machine and the IT systems are just the oil that keeps everything running smoothly.  New technology for the sake of new technology is cool – but if it doesn’t move the business forward, it’s just expensive and unnecessary.

Going back to the title of this post, Zen is achieved when end users can continue to do their work without ever thinking about the technology.

How can this be achieved?  Prevent slow everywhere – slow is the new down.  Remember to review your environment through the eyes of the end user, not the IT professional.  For some of us, this is easier than for others, but work towards it.  After all, IT facilitates human interaction; it doesn’t replace it.
