Core Math Skills for Data Science: Statistics & Probability
Abhishek Peiris


Publish Date: May 9

Mean, Median, Mode

| Term | What It Means | Why We Need It | Real Example | How It Compares |
| --- | --- | --- | --- | --- |
| Mean | Average value | To find the center of the data | Average salary in a company | A few very large salaries (outliers) pull the mean away from what most people earn |
| Median | Middle value when the data is ordered | To find a fair "typical" value when the data has outliers | Median house price in a city | Better than the mean when the data is uneven (e.g., a few very expensive houses) |
| Mode | Most common value | To find the most frequent value | Best-selling shoe size in a store | Use it when "most common" matters more than "average" |

✅ In short:

  • Mean = good for roughly symmetric data without extreme outliers.

  • Median = better when data is skewed or has outliers.

  • Mode = for the most popular thing.
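
To make the outlier effect concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are invented for illustration; the last one is a deliberate outlier.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries; 50000 is a deliberate outlier.
salaries = [3000, 3200, 3200, 3500, 4000, 50000]

avg = mean(salaries)          # sum of values / number of values
mid = median(salaries)        # average of the two middle values (even count)
most_common = mode(salaries)  # value that appears most often

print("mean:", avg)
print("median:", mid)
print("mode:", most_common)
```

With these made-up numbers the mean lands far above what five of the six people earn, while the median and mode stay close to the typical salary.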

Range, Variance, Standard Deviation

| Term | What It Means | Why We Need It | Real Example | How It Compares |
| --- | --- | --- | --- | --- |
| Range | Max value minus min value | Quick idea of spread | Highest and lowest test marks in a class | Only shows the extremes, not how the other values behave |
| Variance | Average of squared differences from the mean | Measures how spread out all the data is | Variance of stock prices | Hard to interpret because it is in squared units |
| Standard Deviation | Square root of the variance | Easy-to-read spread measure (same unit as the data) | How much daily temperatures differ from the average | Easier to explain to non-math people |

✅ In short:

  • Range is rough and fast.
  • Variance is deep but harder to read.
  • Standard Deviation is what we usually tell people.
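
The three spread measures can be sketched in a few lines of Python. The marks below are invented for illustration. This sketch uses the population versions (`pvariance`, `pstdev`), which match the "average of squared differences" definition above; the sample versions (`variance`, `stdev`) divide by n - 1 instead of n.

```python
from statistics import pvariance, pstdev

# Hypothetical test marks for a small class.
marks = [50, 60, 70, 80, 90]

data_range = max(marks) - min(marks)  # spread between the extremes only
variance = pvariance(marks)           # average squared distance from the mean
std_dev = pstdev(marks)               # square root of variance, in marks

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```

Note that the variance is in "marks squared", while the standard deviation is back in plain marks, which is why it is the one usually reported.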

Correlation and Covariance

| Term | What It Means | Why We Need It | Real Example | How It Compares |
| --- | --- | --- | --- | --- |
| Correlation | Strength and direction of the linear relationship between two variables (scale: -1 to 1) | To find out whether two things move together | Height vs. weight; study hours vs. exam score | Shows both strength and direction |
| Covariance | Direction of the relationship, but the scale is not fixed | Intermediate step when computing correlation | Stock A and Stock B moving up or down together | Its size depends on the units, so it is hard to compare across datasets |

✅ In short:

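  • Use Correlation when you want to see if two things are linked, and how strongly.
  • Covariance is more technical: it is mostly an intermediate quantity, since correlation is covariance divided by both standard deviations.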
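
As a rough illustration of why covariance is hard to compare while correlation is not, the sketch below computes both by hand. The helper functions and the study data are hypothetical. The same study time is expressed once in hours and once in minutes.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of deviations from each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: study time and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
minutes = [h * 60 for h in hours]  # same study time, different unit

cov_hours = covariance(hours, scores)
cov_minutes = covariance(minutes, scores)
corr_hours = correlation(hours, scores)
corr_minutes = correlation(minutes, scores)

print("covariance (hours):  ", cov_hours)
print("covariance (minutes):", cov_minutes)
print("correlation (hours):  ", round(corr_hours, 3))
print("correlation (minutes):", round(corr_minutes, 3))
```

Changing the unit multiplies the covariance by 60 but leaves the correlation unchanged, because the standard deviations in the denominator scale by the same factor.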

Probability

What is Probability?
Probability = the chance of something happening, expressed as a number from 0 (impossible) to 1 (certain).

Example:

  • Tossing a coin: Chance of Heads = 0.5
  • Drawing a red card: Chance depends on the deck (26/52 = 0.5 for a standard 52-card deck).
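
For equally likely outcomes, both examples reduce to "favorable outcomes divided by total outcomes". A minimal sketch, assuming a fair coin and a standard 52-card deck with 26 red cards (the `probability` helper is hypothetical):

```python
from fractions import Fraction

def probability(favorable, total):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(favorable, total)

p_heads = probability(1, 2)   # one favorable side out of two
p_red = probability(26, 52)   # assumes a standard deck: 26 red cards of 52

print("P(heads):", p_heads, "=", float(p_heads))
print("P(red card):", p_red, "=", float(p_red))
```

Using `Fraction` keeps the results exact, which makes it easier to check them by hand.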

Important Probability Concepts

| Concept | What It Means | Why We Need It | Real Example | How It Compares |
| --- | --- | --- | --- | --- |
| Independent Events | Events that do not affect each other | To model truly random things | Tossing 2 coins separately | Do not assume dependence when there is none |
| Dependent Events | Events that affect each other | To model real-world connections | Drawing 2 cards without replacement | If you wrongly assume independence, you get wrong results |
| Bayes Theorem | Updating a probability when new information arrives | To make smarter decisions after getting new data | Medical test accuracy given patient history | Without Bayes, decision making is "blind" to new facts |

✅ In short:

  • In real life, events are often dependent, as in disease testing and fraud detection.
  • Bayes theorem makes models smarter when new evidence arrives.
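
The sketch below works through the three concepts with exact fractions. All numbers are assumptions chosen for illustration: a fair coin, a standard 52-card deck, and a made-up medical test (1% of patients have the disease, the test catches 95% of true cases, and it gives a false positive 5% of the time). The `bayes_posterior` helper is hypothetical.

```python
from fractions import Fraction

# Independent events: two separate coin tosses, so probabilities multiply.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Dependent events: drawing two aces without replacement.
# After the first ace is drawn, only 3 aces remain among 51 cards.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

# Wrongly assuming independence gives a different (incorrect) answer.
p_two_aces_wrong = Fraction(4, 52) * Fraction(4, 52)

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes theorem."""
    # Total probability of a positive test: true positives + false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical test characteristics, not taken from any real study.
posterior = bayes_posterior(
    prior=Fraction(1, 100),
    sensitivity=Fraction(95, 100),
    false_positive_rate=Fraction(5, 100),
)

print("P(two heads):", p_two_heads)
print("P(two aces, without replacement):", p_two_aces)
print("P(two aces, wrongly assuming independence):", p_two_aces_wrong)
print("P(disease | positive):", posterior, "=", round(float(posterior), 3))
```

Under these assumed numbers, a positive result raises the probability of disease from 1% to roughly 16%, not to 95%, because false positives from the large healthy group outnumber true positives.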

Real-World Data Science Examples

| Use Case | Math Concept Used | Why |
| --- | --- | --- |
| Predicting customer churn (customers leaving) | Probability + Statistics | Need to model uncertainty |
| Stock price prediction | Mean, Variance, Standard Deviation | Understand volatility |
| Health risk prediction (Diabetes, Cancer) | Bayes Theorem + Statistics | Update risk with patient data |
| Recommendation systems (YouTube, Netflix) | Mode, Correlation | Find popular/common patterns |
