Inside My Observability Stacks (and Why I Built Them This Way)

At some point, reading about observability stops being enough.

You actually have to build something.

In my case, that turned into building not one but two stacks that I’m running side by side: one built around Elastic, the other taking a different approach with Prometheus. The goal is not to pick a winner. It is to understand how they behave in the real world.

Which has been… enlightening.

It has also confirmed something I probably should have expected.

Everything works great until you actually try to use it.

Why I Built My Own Stack

There are a lot of solid observability platforms out there.

I have worked with several directly over the years, and I have been part of more than a few “bake-offs” where you line up tools side by side and try to figure out which one actually meets your needs.

Running two stacks at the same time is basically a long-running bake-off I control.

Same environment + same data sources + same questionable decisions = different implementations.

Most tools can:

Collect data
Store data
Show dashboards

That part is not the issue anymore.

The question I care about is:

“What happens when someone actually tries to use this to solve a problem?”

What I’m Running (at a High Level)

For this post, I’ll focus on the Elastic-based stack, since it’s the more traditional “everything in one ecosystem” approach.

At a high level:

Elasticsearch for storage and indexing
Kibana for dashboards and exploration
Logstash as the central ingestion, parsing, and enrichment pipeline
Filebeat, Metricbeat, and Heartbeat for collecting logs, metrics, and uptime data
Goflow2 for network flow visibility
Unpoller feeding UniFi metrics into the system
A small scheduler container keeping enrichment data current

All of it is running in Docker, tied together with a compose file that has steadily grown as I kept adding “just one more thing.”

The other stack takes a different approach, which I will dig into separately.

Running both has been useful in ways I did not fully expect.

Why I’m Running Two Stacks

This is not just about building something. It is about understanding the tradeoffs.

By running both stacks in parallel, I get to:

Send the same data through completely different pipelines
See how each system handles ingestion, enrichment, and storage
Compare how easy it is to actually answer questions
Notice where one approach feels more natural than the other

It is the difference between:

Reading documentation
Watching a demo
And actually living with the system day to day

Turns out, those are three very different experiences.

Why Logstash Sits in the Middle

In the Elastic stack, Logstash is doing a lot of heavy lifting.

Everything flows through it.

Beats send logs and metrics into it
Syslog feeds into it
Flow data eventually ends up there
Enrichment happens there before anything is indexed

That adds complexity, but it also gives control.

Because once the data is in Logstash, you can:

Normalize formats
Enrich events
Translate IPs into meaningful device names
Add GeoIP and ASN data
Route things cleanly into indices

And this is where things start to matter.

Because the decisions you make here directly impact how understandable the data is later.
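
To make the enrichment idea a little more concrete, here is a minimal Python sketch of the logic, not my actual Logstash configuration. The field names (source_ip, device_name, index), the lookup table, and the index naming pattern are all assumptions made up for illustration.

```python
import ipaddress
from datetime import datetime, timezone

# Hypothetical lookup table; the names and addresses are invented for this sketch.
DEVICE_NAMES = {
    "192.168.1.1": "core-router",
    "192.168.1.20": "nas-01",
}


def enrich(event, device_names=DEVICE_NAMES):
    """Return a copy of the event with a device name and a target index."""
    enriched = dict(event)
    ip = enriched.get("source_ip", "")

    # Translate the IP into a meaningful name, falling back to the raw IP.
    enriched["device_name"] = device_names.get(ip, ip or "unknown")

    # Flag private addresses, since a GeoIP lookup is not useful for them.
    try:
        enriched["is_private"] = ipaddress.ip_address(ip).is_private
    except ValueError:
        enriched["is_private"] = False

    # Route by event type and UTC day (an assumed pattern like "flows-2024.01.15").
    ts = datetime.fromtimestamp(enriched.get("timestamp", 0), tz=timezone.utc)
    enriched["index"] = f"{enriched.get('type', 'misc')}-{ts:%Y.%m.%d}"
    return enriched
```

The point is less the code and more the order of operations: the name and the routing decision are attached before the event is stored, so nobody has to remember later which IP belongs to which device.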

Collecting Data Is Still the Easy Part

With this setup, data shows up fast.

Filebeat pulls logs.
Metricbeat handles system and SNMP metrics.
Heartbeat tracks uptime.
Goflow2 streams network flows.
Unpoller fills in UniFi-specific details.

You turn it on, give it a little time, and suddenly you have dashboards full of data.

It looks solid.

It feels like progress.

Which is usually where people stop. It is also where I stopped for a day, and then came back a week later.

That is where the comparison gets interesting.

Where the Comparison Starts to Matter

When you run two stacks side by side, you stop asking, “Does this work?”

You start asking:

“Which one helps me understand what is happening faster?”

That is a very different question.

Because now you are comparing things like:

How easy is it to follow a problem across systems?
How much cleanup or enrichment was required?
Do the dashboards help, or do they just look nice?
How much context do you have to carry in your head?

This is where the differences show up. And they show up quickly.

Making It Usable Is the Hard Part

No matter which stack I am looking at, I keep running into the same moment.

I open a dashboard and try to answer a simple question.

“What just broke?”

And I find myself:

Jumping between views
Filtering different data sets
Trying to reconstruct context
Wondering why I named something the way I did

I have done this on both stacks. Which is both reassuring and slightly concerning. Reassuring because it is not just one tool. Concerning because it means the problem is bigger than the tool.
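
For what it is worth, the context I keep reconstructing by hand looks roughly like the sketch below: pull events for one host out of several data sets and line them up in time. This is a Python illustration under my own assumptions, not a feature of either stack. The function name, the event shape (host, time), and the five-minute window are all hypothetical.

```python
from datetime import datetime, timedelta


def build_timeline(sources, host, center, window_seconds=300):
    """Merge events for one host from several sources into a single
    time-ordered list, keeping only events near the moment of interest."""
    window = timedelta(seconds=window_seconds)
    timeline = []
    for source_name, events in sources.items():
        for event in events:
            if event["host"] != host:
                continue
            if abs(event["time"] - center) > window:
                continue
            # Remember which data set the event came from.
            timeline.append({**event, "source": source_name})
    return sorted(timeline, key=lambda e: e["time"])


# Example with made-up events from two hypothetical data sets.
t0 = datetime(2024, 1, 1, 12, 0, 0)
sources = {
    "logs": [{"host": "nas-01", "time": t0 + timedelta(seconds=30), "msg": "disk error"}],
    "uptime": [{"host": "nas-01", "time": t0 - timedelta(seconds=10), "msg": "down"}],
}
for entry in build_timeline(sources, "nas-01", t0):
    print(entry["time"], entry["source"], entry["msg"])
```

If answering “What just broke?” requires doing this merge mentally across three browser tabs, the dashboards are not yet carrying their share of the work.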

Why I Built It This Way

Running both stacks is intentional. I wanted to feel where each approach works well and where it breaks down.

If I can:

Ingest the same data
Enrich it
Visualize it in different systems

…and still struggle to quickly understand what is happening…

Then I am looking at the same problem most teams deal with.

Just without the pressure of an outage.

Which is a much better place to learn.

The Glue Still Matters More Than the Tools

If anything, running two stacks reinforced this idea. The tools matter. The connections between them matter more.

How data flows
Where enrichment happens
How things are named
How dashboards are structured

Those decisions determine whether the system is usable. You can swap out components and still end up with the same problem if the design does not prioritize understanding. I have managed to do that in two different ways now. Which feels like an accomplishment, just not the kind you put on a resume.

What This Actually Taught Me

Running both stacks side by side made a few things very obvious.

Collecting data is not the bottleneck
Every tool looks good until you try to answer a real question
Ease of understanding is the real differentiator

And maybe the most useful one:

If you are not designing for your users, you are designing for yourself.

And those are not the same thing.

Where This Is Going

In the next few posts, I am going to start breaking pieces of this apart.

Some posts will focus on the Elastic stack. Some will focus on the Prometheus-based approach. Most will compare the two. Because the interesting part is not which one is “better.”

It is:

Where each one makes things easier
Where each one gets in the way
And what actually helps people understand what they are looking at
