Understanding Measures of Central Tendency in Data Science

When you think of "mean", "median" or "mode", chances are your brain flashes back to a math class you didn’t think you'd ever use again. 😅

But here I am, knee-deep in datasets, and those three little words keep showing up. Not just as formulas, but as powerful tools that help tell the story behind the numbers.

This post is part of my continued journey into data science. After exploring tools like Excel and Power BI, I started digging into core concepts - and measures of central tendency are some of the first I’ve truly appreciated in the real world.

Let’s break it down in plain English 👇

What Are Measures of Central Tendency? 🤔

Measures of central tendency help us understand the “center” or “typical” value in a dataset. Basically, they summarize what’s "normal" in your data, and that's a huge help when you’re making sense of hundreds (or millions) of numbers.

The three most common ones are:

Mean - the average value (the sum of all values divided by how many there are)
Median - the middle value once the data is sorted
Mode - the most frequently occurring value

They each tell you something slightly different, and choosing the right one depends on the situation.
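To make those three definitions concrete, here is a minimal sketch that computes each one by hand. The helper names (mean_of, median_of, mode_of) and the scores list are made up for illustration, and mode_of simply returns one of the most common values if there is a tie.

```python
from collections import Counter


def mean_of(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_of(values):
    """Most frequent value; with ties, this sketch just returns one of them."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical sample data, only for illustration
scores = [2, 3, 3, 5, 7]
print(mean_of(scores), median_of(scores), mode_of(scores))  # 4.0 3 3
```

In practice you would reach for a library instead of writing these yourself, but seeing the logic spelled out makes it clearer why the three numbers can disagree.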

Why Do They Matter in Data Science? 🎯

When you're working with data, you're usually trying to:

Understand trends
Compare groups
Make decisions
Build predictive models

Measures of central tendency give you a quick pulse check on your dataset. For example:

If you’re analyzing income data, the median might be better than the mean because of outliers (like billionaires).
If you're reviewing customer ratings from 1 to 5 stars, the mode could show you the most common sentiment.
If your data is pretty clean and roughly symmetric (for example, normally distributed) with no extreme outliers, the mean gives a solid summary.
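The income case is easy to see in a quick sketch. The numbers below are invented purely for illustration: five ordinary salaries, then the same list with one very large income added.

```python
import statistics

# Made-up annual incomes, for illustration only
incomes = [32_000, 35_000, 38_000, 41_000, 45_000]
incomes_with_outlier = incomes + [5_000_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print(mean_before, median_before)            # 38200 38000
print(round(mean_after, 2), median_after)    # 865166.67 39500.0
```

One extreme value drags the mean from about 38 thousand to over 865 thousand, while the median barely moves. That is why the median is often the safer summary for skewed data like income.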

Real-World Examples 🔍

Here are a few situations where these measures pop up:

📈 Business Reporting
Companies use the mean to summarize average sales, costs or customer satisfaction scores over time.
🏥 Healthcare
Hospitals might use the median to report wait times, since a few extreme cases can skew the average.
🛍️ Retail and Marketing
The mode helps track the most popular product sizes, colors or price points.
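The mode is also the only one of the three that works on non-numeric data, which is what makes it handy for things like sizes or colors. Here is a small sketch with invented sales data. It assumes Python 3.8 or newer, where statistics.multimode is available for spotting ties.

```python
import statistics

# Hypothetical product sizes sold in one day
sizes_sold = ["M", "L", "S", "M", "XL", "L", "M", "S"]
most_popular_size = statistics.mode(sizes_sold)
print(most_popular_size)  # M

# Hypothetical colors sold, with a tie between two values
colors_sold = ["red", "blue", "blue", "red", "green"]
tied_colors = statistics.multimode(colors_sold)
print(tied_colors)  # ['red', 'blue']
```

When two values tie, multimode returns all of them in the order they first appear, which is usually more honest than reporting a single "most popular" item.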

A Quick Python Example 🐍

If you’ve got a list of numbers, you can calculate all three super easily:

import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

mean = statistics.mean(data)     # about 3.44 (31 / 9)
median = statistics.median(data) # 4 (the 5th of 9 sorted values)
mode = statistics.mode(data)     # 4 (appears three times)

print(mean, median, mode)

These tiny lines of code can give you a huge amount of insight.

My Reflection 💭

At first, I thought central tendency was just for passing stats exams. Now, I see it as one of the first things you should check when exploring a new dataset. It gives you a quick overview, helps spot data issues and sets the stage for deeper analysis or modeling.

Plus, it’s foundational. Whether you're in Excel, Python or SQL, you'll use these concepts everywhere.

If you're just getting started in data science like I am, don't overlook the basics. They’re called “central” for a reason. 😉

Naomi Jepkorir @datawithnaomi