Articles by Tag #observability

Observability in Action: A Google Cloud Next demo

Quick run-down of one of the interactive demos that was presented at Next 2025, from the architecture to the products and features showcased.

May 5 '25

DNS Failures in EKS? The Real Bottleneck Was AWS Network Limits

During the DNS investigation, I initially focused on CoreDNS and NodeLocal DNS metrics. The real...

Dec 18 '25

Pomerium’s OpenTelemetry Tracing Support: Deeper Observability, Made Easy

Pomerium allows you to securely access Kubernetes APIs, internal apps, databases, and more—without a...

May 1 '25

OpenTelemetry Tracing on the JVM

You may know I'm a big fan of OpenTelemetry. I recently finished developing a master class for the...

Aug 7 '25

OpenTelemetry configuration gotchas

Last week, I described several approaches to OpenTelemetry on the JVM, their requirements, and their...

Aug 14 '25

SRE in Action: Understanding How Real Teams Use SLOs, SLIs, and Error Budgets to Stay Reliable Through Case Studies - Part 1

When people talk about Site Reliability Engineering (SRE), they often share abstract principles about...

Nov 16 '25

Securing Solace Metrics: How to Use OAuth with solace-prometheus-exporter

Deep dive into securing Solace PubSub+ metrics with OAuth 2.0 and Keycloak. Learn how to protect observability endpoints, configure scopes, roles, and audience claims, and build a secure production-ready setup.

Dec 19 '25

Turn log lines into alerts (without building a whole observability stack)

Cold truth: problems always show up in logs first. The trick is turning those “uh-oh” lines into a...

Sep 13 '25

Zero-Code Observability: Using eBPF to Auto-Instrument Services with OpenTelemetry

Instrumenting services for observability often means sprinkling tracing code across hundreds of files...

Nov 7 '25

Predicting Failures in a Serverless App with AWS DevOps Guru and OpenTelemetry

Limitations of Traditional Monitoring: The management of modern distributed applications...

Oct 25 '25

Composite SLOs for Serverless Event-Driven Systems

Measuring What Users Experience Across API Gateway -> Lambda -> DynamoDB -> EventBridge ...

Jan 5

Day 2 | 🎅 He knows if you have been bad or good... But what if he gets it wrong?

When Santa's AI misjudges Emma and puts her on the Naughty List, traditional observability can't help. Find out why AI agents need three layers of observability.

Dec 9 '25

Prometheus Architecture

A detailed overview of the fundamental components of Prometheus architecture.

Dec 19 '25

Day 7 | 🎄✨The Rockefeller tree in NYC: SLOs that actually drive decisions

Learn how to build SLOs that actually drive decisions by starting with business impact (the roots), connecting to solid telemetry (the trunk), and ending with actionable targets (the leaves) that influence roadmaps and guide engineering choices.

Dec 17 '25

🔭 Observability Practices: The 3 Pillars with a Node.js + OpenTelemetry Example

🚀 Demystifying Observability: A Practical Guide with Node.js, OpenTelemetry, Prometheus, and...

Dec 1 '25

OpenTelemetry for Go: Measuring the Overhead

Everything comes at a cost, and...

Dec 10 '25

Azure APIM MCP Audit Logging Without Breaking Everything

Audit logging, distributed tracing, and monitoring for Azure APIM MCP servers.

Dec 3 '25

A Practical Introduction to AsyncLocalStorage in Node.js (With Real Use Cases)

AsyncLocalStorage (ALS) is a powerful but often misunderstood feature in Node.js. At its core, ALS...

Dec 12 '25

Observability- My New Experience and Beyond

From AI/ML Background... In this article, I’m trying to jot down my journey, moving from...

Nov 25 '25

Two KubeCons, One Conference: While Everyone Demos AI Agents, Engineers Are Fighting With Syslogs

KubeCon North America 2025 was actually two different events happening simultaneously in the same...

Nov 18 '25

How to Give AI Agents Access to Runtime Traces

Debugging Locally with Execution-Aware AI (Using Runtime Traces). Who is it for? This post is for...

Dec 30 '25

Why your support team can't use Grafana

I've been building B2B SaaS for 10 years, and there's one loop I've never escaped: Support gets a...

Dec 9 '25

The Unofficial Guide to Contributing to OpenTelemetry — where to look and who to talk to!

OpenTelemetry provides the tools and standards to collect metrics, logs, and traces from applications...

Dec 17 '25

The Case of the Zombie Transaction: Solving 'Unknown Unknowns' with OpenTelemetry & High Cardinality

Monitoring tells you the server is slow. Observability tells you WHY user #4094 failed to check out. Let's debug a real-world distributed payment trace using Python and OpenTelemetry.

Nov 30 '25

While We're Measuring Developer Productivity, Won't Someone Think of the Data Engineers?

Nicole Forsgren just dropped a new book, and I absolutely CONSUMED it. It's called Frictionless:...

Nov 25 '25

Your Audit Logs Are Lying to You: 6 Properties That Make Logs Actually Verifiable

The Uncomfortable Truth About Your Audit Logs: You've implemented logging. You have...

Dec 22 '25

Observability isn’t about the tool. It’s about the truth

An enterprise client reports latency. Your dashboards say everything is fine. They blame you. You...

Oct 13 '25

Your Observability Stack Is Optimized for the Wrong Thing

TL;DR: Modern observability tools—Prometheus, Jaeger, the ELK stack—excel at collecting signals...

Dec 15 '25

All I Want for Christmas is Observable Multi-Modal Agentic Systems

How Session Replay + Online Evals Revealed How My Holiday Pet App Actually Works. Original...

Dec 17 '25

Centralized EKS monitoring across multiple AWS accounts

Complex systems require extensive monitoring and observability. Systems as complex as Kubernetes...

Nov 20 '25