
Inside My Observability Stacks (and Why I Built Them This Way)

At some point, reading about observability stops being enough.

You actually have to build something.

In my case, that turned into building not one, but two different stacks that I’m running side by side. One built around Elastic, the other taking a different approach with Prometheus. The goal is not to pick a winner. It is to understand how they behave in the real world.

Which has been… enlightening.

It has also confirmed something I probably should have expected.

Everything works great until you actually try to use it.

Why I Built My Own Stack

There are a lot of solid observability platforms out there.

I have worked with several directly over the years, and I have been part of more than a few “bake-offs” where you line up tools side by side and try to figure out which one actually meets your needs.

Running two stacks at the same time is basically a long-running bake-off I control.

Same environment + same data sources + same questionable decisions = different implementations.

Most tools can:

  • Collect data
  • Store data
  • Show dashboards

That part is not the issue anymore.

The question I care about is:

“What happens when someone actually tries to use this to solve a problem?”

What I’m Running (at a High Level)

For this post, I’ll focus on the Elastic-based stack, since it’s the more traditional “everything in one ecosystem” approach.

At a high level:

  • Elasticsearch for storage and indexing
  • Kibana for dashboards and exploration
  • Logstash as the central ingestion, parsing, and enrichment pipeline
  • Filebeat, Metricbeat, and Heartbeat for collecting logs, metrics, and uptime data
  • Goflow2 for network flow visibility
  • Unpoller feeding UniFi metrics into the system
  • A small scheduler container keeping enrichment data current

All of it is running in Docker, tied together with a compose file that has steadily grown as I kept adding “just one more thing.”

The other stack takes a different approach, which I will dig into separately.

Running both has been useful in ways I did not fully expect.

Why I’m Running Two Stacks

This is not just about building something. It is about understanding the tradeoffs.

By running both stacks in parallel, I get to:

  • Send the same data through completely different pipelines
  • See how each system handles ingestion, enrichment, and storage
  • Compare how easy it is to actually answer questions
  • Notice where one approach feels more natural than the other

It is the difference between:

  • Reading documentation
  • Watching a demo
  • And actually living with the system day to day

Turns out, those are three very different experiences.

Why Logstash Sits in the Middle

In the Elastic stack, Logstash is doing a lot of heavy lifting.

Everything flows through it.

  • Beats send logs and metrics into it
  • Syslog feeds into it
  • Flow data eventually ends up there
  • Enrichment happens there before anything is indexed

That adds complexity, but it also gives control.

Because once the data is in Logstash, you can:

  • Normalize formats
  • Enrich events
  • Translate IPs into meaningful device names
  • Add GeoIP and ASN data
  • Route things cleanly into indices

And this is where things start to matter.

Because the decisions you make here directly impact how understandable the data is later.
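
To make that concrete, here is a minimal Python sketch of the normalize, enrich, and route sequence described above. It is an illustration only, not Logstash configuration and not the actual pipeline: the field names (`source_ip`, `type`, `timestamp`), the `DEVICE_NAMES` table, and the `type-YYYY.MM.DD` index pattern are all assumptions made up for the example. GeoIP and ASN lookups are left out because they would need an external database.

```python
from datetime import datetime, timezone

# Hypothetical lookup table. In a real setup this would be refreshed from an
# inventory source rather than hard-coded.
DEVICE_NAMES = {
    "192.168.1.1": "core-router",
    "192.168.1.20": "nas-01",
}


def normalize(event: dict) -> dict:
    """Lower-case the keys and make sure the event carries a timestamp."""
    out = {key.lower(): value for key, value in event.items()}
    out.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    return out


def enrich(event: dict) -> dict:
    """Attach a readable device name for the source IP, if one is known."""
    out = dict(event)
    out["device_name"] = DEVICE_NAMES.get(out.get("source_ip"), "unknown")
    return out


def route(event: dict) -> str:
    """Build an index name from the event type and the event's date."""
    kind = event.get("type", "misc")
    day = event["timestamp"][:10].replace("-", ".")
    return f"{kind}-{day}"


def process(raw: dict) -> tuple[str, dict]:
    """Run one raw event through normalize, enrich, and route."""
    event = enrich(normalize(raw))
    return route(event), event


if __name__ == "__main__":
    index, event = process({
        "Source_IP": "192.168.1.20",
        "Type": "flow",
        "Timestamp": "2024-05-01T12:00:00+00:00",
    })
    print(index, event["device_name"])
```

Even in a toy version, the choices that matter later are visible: what an unknown device is called, which field drives the index name, and whether every source agrees on key names.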

Collecting Data Is Still the Easy Part

With this setup, data shows up fast.

  1. Filebeat pulls logs.
  2. Metricbeat handles system and SNMP metrics.
  3. Heartbeat tracks uptime.
  4. Goflow2 streams network flows.
  5. Unpoller fills in UniFi-specific details.

You turn it on, give it a little time, and suddenly you have dashboards full of data.

It looks solid.

It feels like progress.

Which is usually where people stop. This is also where I stopped for one day and came back a week later.

Which is also where the comparison gets interesting.

Where the Comparison Starts to Matter

When you run two stacks side by side, you stop asking, “Does this work?”

You start asking:

“Which one helps me understand what is happening faster?”

That is a very different question.

Because now you are comparing things like:

  • How easy is it to follow a problem across systems?
  • How much cleanup or enrichment was required?
  • Do the dashboards help, or do they just look nice?
  • How much context do you have to carry in your head?

This is where the differences show up. And they show up quickly.

Making It Usable Is the Hard Part

No matter which stack I am looking at, I keep running into the same moment.

I open a dashboard and try to answer a simple question.

“What just broke?”

And I find myself:

  • Jumping between views
  • Filtering different data sets
  • Trying to reconstruct context
  • Wondering why I named something the way I did

I have done this on both stacks. Which is both reassuring and slightly concerning. Reassuring because it is not just one tool. Concerning because it means the problem is bigger than the tool.

Why I Built It This Way

Running both stacks is intentional. I wanted to feel where each approach works well and where it breaks down.

If I can:

  • Ingest the same data
  • Enrich it
  • Visualize it in different systems

…and still struggle to quickly understand what is happening…

Then I am looking at the same problem most teams deal with.

Just without the pressure of an outage.

Which is a much better place to learn.

The Glue Still Matters More Than the Tools

If anything, running two stacks reinforced this idea. The tools matter. The connections between them matter more.

  • How data flows
  • Where enrichment happens
  • How things are named
  • How dashboards are structured

Those decisions determine whether the system is usable. You can swap out components and still end up with the same problem if the design does not prioritize understanding. I have managed to do that in two different ways now. Which feels like an accomplishment, just not the kind you put on a resume.

What This Actually Taught Me

Running both stacks side by side made a few things very obvious.

  1. Collecting data is not the bottleneck
  2. Every tool looks good until you try to answer a real question
  3. Ease of understanding is the real differentiator

And maybe the most useful one:

If you are not designing for your users, you are designing for yourself.

And those are not the same thing.

Where This Is Going

In the next few posts, I am going to start breaking pieces of this apart.

Some posts will focus on the Elastic stack. Some will focus on the Prometheus-based approach. Most will compare the two. Because the interesting part is not which one is “better.”

It is:

  • Where each one makes things easier
  • Where each one gets in the way
  • And what actually helps people understand what they are looking at
