Articles by Tag #sitereliabilityengineering

Browse our collection of articles on a range of IT topics. Dive in and explore something new!

Key Concepts in SRE: SLA, SLO, and SLI Explained

Introduction to SLA, SLO, and SLI in Site Reliability Engineering (SRE): In a world driven...

Jan 5

Designing a fault-tolerant etcd cluster on AWS

Introduction: In this article, we are going to discuss a strongly consistent, distributed...

Nov 4 '24

Embrace simple tech stacks and code generation in DevOps and data engineering

DevOps, data engineering, and other platform engineering teams must recognize that the choices they...

Jul 8 '24

Insider Realities of Site Reliability Engineering: Lessons from a DevRel Perspective

Hey everyone, I’m Rohan from Zenduty. I’m not an SRE myself, but I get to work with some of the...

Feb 14

Platform Engineering vs Site Reliability Engineering (SRE)

Platform Engineering vs SRE: Detailed Comparison. Overview of Platform Engineering...

May 27

DevOps vs SRE: Detailed Comparison

Overview of DevOps and SRE. DevOps: A cultural and technical philosophy that bridges...

May 27

The Ultimate List of Incident Management Tools in 2024

Introduction: Incident management tools are important for organizations to effectively...

Oct 27 '24

From Rejection to Redemption: How I Landed My SRE Role

Countless applications. Hundreds of rejection emails. Failed interviews. It was easily one of the...

Jun 23

Integrating OpenShift CoreDNS with Active Directory DNS

Integrating OpenShift CoreDNS with Active Directory (AD) enables your OpenShift cluster to resolve...

Jan 9

Guide: How to build an AI Agent for SRE Teams

Background: The engineering team at Aptible AI has spent the last ~4 months building an AI...

Nov 7 '24

🪩 It's time for IDPCON!!

We are here to “learn, have fun, and make a difference.” - Dr. W. Edwards Deming, productivity...

Jul 23 '24

DevOps ShmevOps

Lessons from Software Engineering in Multi-Tenant Infrastructure. Why ITIL means 'idle',...

Apr 14

Innovative Incident Management Strategies in SRE

The Critical Role of Incident Management in Site Reliability Engineering (SRE). Hey folks,...

Aug 14 '24

Building a Kubernetes Cluster on Bare Metal: Insights, Challenges, and a Complete Setup Guide

Spent last weekend diving deep into Kubernetes—this time, on my own bare metal server! 🚀 After...

Apr 21

Distributed Systems | What can we learn from Roblox's 3-day outage?

In October 2021, Roblox suffered the longest outage in its history—73 hours of complete downtime,...

Jun 1

What I Wish I Knew Before Becoming a Site Reliability Engineer

When I transitioned into Site Reliability Engineering (SRE), I wasn't prepared for the challenges...

May 12

Understanding PostgreSQL Isolation Levels

In PostgreSQL, transaction isolation levels determine how transactions are isolated from one...

Jan 7

Top 6 Reasons Why You Need a Status Page Aggregator

Introduction: Your business depends on the reliability of the third-party services you use....

Apr 6