Articles by Tag #sitereliabilityengineering

Key Concepts in SRE: SLA, SLO, and SLI Explained

Introduction to SLA, SLO, and SLI in Site Reliability Engineering (SRE). In a world driven...

Jan 5

Designing a fault-tolerant etcd cluster on AWS

In this article, we are going to discuss a strongly consistent, distributed...

Nov 4 '24

Web Accessibility: A business imperative, not just a technical detail

Accessibility isn’t a...

Aug 17

Platform Engineering vs Site Reliability Engineering (SRE)

Platform Engineering vs SRE: Detailed Comparison. Overview of Platform Engineering...

May 27

DevOps vs SRE: Detailed Comparison

Overview of DevOps and SRE. DevOps: A cultural and technical philosophy that bridges...

May 27

Insider Realities of Site Reliability Engineering: Lessons from a DevRel Perspective

Hey everyone, I’m Rohan from Zenduty. I’m not an SRE myself, but I get to work with some of the...

Feb 14

Building a Kubernetes Cluster on Bare Metal: Insights, Challenges, and a Complete Setup Guide

Spent last weekend diving deep into Kubernetes—this time, on my own bare metal server! 🚀 After...

Apr 21

Integrating OpenShift CoreDNS with Active Directory DNS

Integrating OpenShift CoreDNS with Active Directory (AD) enables your OpenShift cluster to resolve...

Jan 9

Distributed Systems | What can we learn from Roblox's 3-day outage?

In October 2021, Roblox suffered the longest outage in its history—73 hours of complete downtime,...

Jun 1

Monitoring Celery Workers with Flower: Your Tasks Need Babysitting

So you've got Celery workers happily executing tasks in your Kubernetes cluster, but you're flying...

Jul 1

Understanding the Site Reliability Engineering Career Path

Site Reliability Engineering (SRE) has evolved into one of the most critical roles in modern tech...

Jul 9

Top 6 Reasons Why You Need a Status Page Aggregator

Your business depends on the reliability of the third-party services you use...

Apr 6

What I Wish I Knew Before Becoming a Site Reliability Engineer

When I transitioned into Site Reliability Engineering (SRE), I wasn't prepared for the challenges...

May 12

Too Many Handshakes: A Tale of Corporate Chaos

Why are there so many handshakes involved in getting anything done at work? You'd think changing the...

Jul 14

The Ultimate List of Incident Management Tools in 2024

Incident management tools are important for organizations to effectively...

Oct 27 '24

From Rejection to Redemption: How I Landed My SRE Role

Countless applications. Hundreds of rejection emails. Failed interviews. It was easily one of the...

Jun 23

Understanding PostgreSQL Isolation Levels

In PostgreSQL, transaction isolation levels determine how transactions are isolated from one...

Jan 7

Guide: How to build an AI Agent for SRE Teams

Background: The engineering team at Aptible AI has spent the last ~4 months building an AI...

Nov 7 '24

DevOps ShmevOps

Lessons from Software Engineering in Multi-Tenant Infrastructure. Why ITIL means 'idle',...

Apr 14