Mean, Median, Mode
Term | What It Means | Why We Need It | Real Example | Why Not Another |
---|---|---|---|---|
Mean | Sum of all values divided by how many there are | To find the center of data | Average salary in a company | A few very large salaries (outliers) pull the mean upward, so it can misrepresent the typical value |
Median | Middle value when data is ordered | To find a fair "typical" middle when data has outliers | Median house price in a city | Median is better when data is skewed (e.g., a few very expensive houses) |
Mode | Most common value | To find the value that occurs most often | Most sold shoe size in a store | Mode is needed when "most common" matters more than "average" |
✅ In short:
- Mean = good when data is roughly symmetric with no extreme outliers.
- Median = better when data is skewed or has outliers.
- Mode = for the most frequent value.
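The effect of an outlier on the three measures can be sketched with Python's standard `statistics` module. The salary figures below are made up for illustration; the last one is a deliberate outlier.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries; the last value is a deliberate outlier.
salaries = [3000, 3200, 3200, 3500, 4000, 50000]

avg = mean(salaries)     # pulled upward by the single 50000 salary
mid = median(salaries)   # average of the two middle values after sorting
most = mode(salaries)    # the value that appears most often

print("mean:", avg)
print("median:", mid)
print("mode:", most)
```

With these numbers the mean lands far above what most people in the list earn, while the median and mode stay close to the bulk of the data.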
Range, Variance, Standard Deviation
Term | What It Means | Why We Need It | Real Example | Why Not Another |
---|---|---|---|---|
Range | Max - Min value | Quick idea of spread | Highest and lowest test marks in a class | Only shows extremes, not how all values behave |
Variance | Average of squared differences from the mean | Measure how spread out all data is | Variance of stock prices | Hard to interpret because it is in squared units |
Standard Deviation | Square root of variance | Easy-to-read spread measure (same unit as data) | How much daily temperatures change from average | Easier to explain to non-math people |
✅ In short:
- Range is rough and fast.
- Variance is deep but harder to read.
- Standard Deviation is what we usually tell people.
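A minimal sketch of the three spread measures, using made-up test marks. It assumes the population definitions (divide by n), which match the "average of squared differences" wording in the table; `statistics.variance` and `statistics.stdev` would instead divide by n - 1 for a sample.

```python
from statistics import pvariance, pstdev

# Hypothetical test marks for a small class.
marks = [50, 60, 70, 80, 90]

value_range = max(marks) - min(marks)  # only looks at the two extremes
variance = pvariance(marks)            # average squared distance from the mean (marks squared)
std_dev = pstdev(marks)                # square root of the variance (back in marks)

print("range:", value_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```

The variance here is in "marks squared", which is why the standard deviation is usually the number that gets reported.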
Correlation and Covariance
Term | What It Means | Why We Need It | Real Example | Why Not Another |
---|---|---|---|---|
Correlation | Strength and direction of the linear relationship between two variables (scale: -1 to 1) | Find if two things move together | Height vs Weight; Study hours vs Exam score | Shows both strength and direction on a fixed scale |
Covariance | Direction of relationship, but scale is not fixed | An intermediate step in computing correlation | Stock A and Stock B moving up/down together | Its value depends on the units, so it is hard to compare across different datasets |
✅ In short:
- Use Correlation when you want to see if two things are linked, and how strongly.
- Covariance is more technical and mostly appears as an intermediate quantity, because its raw value depends on the units of the data.
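The difference in scale can be sketched by computing both quantities by hand. The helper functions `covariance` and `correlation` and the study data below are illustrative, not from any library; the sketch assumes population covariance and Pearson correlation.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of deviations from each mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]

cov_hours_scores = covariance(hours, scores)
corr_hours_scores = correlation(hours, scores)

# Expressing the same study time in minutes rescales the covariance
# but leaves the correlation unchanged.
minutes = [h * 60 for h in hours]
cov_minutes_scores = covariance(minutes, scores)
corr_minutes_scores = correlation(minutes, scores)

print(cov_hours_scores, cov_minutes_scores)
print(corr_hours_scores, corr_minutes_scores)
```

Changing the unit from hours to minutes multiplies the covariance by 60, while the correlation stays the same. That is why correlation is the easier one to compare across datasets.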
Probability
What is Probability?
Probability = chance of something happening, expressed as a number from 0 (impossible) to 1 (certain).
Example:
- Tossing a coin: Chance of Heads = 0.5
- Drawing a red card from a standard 52-card deck: Chance = 26/52 = 0.5. With any other deck, the chance depends on how many red cards it holds.
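When all outcomes are equally likely, these examples reduce to "favorable outcomes divided by total outcomes". A small sketch, assuming a fair coin and a standard 52-card deck with 26 red cards; the `probability` helper is hypothetical:

```python
from fractions import Fraction

def probability(favorable, total):
    """Chance of an event when every outcome is equally likely."""
    return Fraction(favorable, total)

p_heads = probability(1, 2)       # one favorable side out of two
p_red_card = probability(26, 52)  # assumes a standard deck: 26 red cards out of 52

print(float(p_heads), float(p_red_card))
```

`Fraction` keeps the results exact, which avoids rounding surprises when probabilities are later multiplied or added.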
Important Probability Concepts
Concept | What It Means | Why We Need It | Real Example | Why Not Another |
---|---|---|---|---|
Independent Events | Events that do not affect each other | To model outcomes that have no influence on one another | Tossing 2 coins separately | Can't assume dependence when there is none |
Dependent Events | Events affect each other | To model real-world connections | Drawing 2 cards without replacement | If you assume independence wrongly, you get wrong results |
Bayes Theorem | Updating probability when new information arrives | To make smarter decisions after getting new data | Medical test accuracy given patient history | Without Bayes, decision making will be "blind" to new facts |
✅ In short:
- In real life, events are often dependent, as in disease testing and fraud detection.
- Bayes' theorem lets a model revise its probability estimates when new evidence arrives.
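The three concepts can be sketched with exact fractions. The medical-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are invented for illustration, and `bayes_posterior` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

# Independent: two separate coin tosses, so the probabilities simply multiply.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Dependent: drawing two aces without replacement. After the first ace,
# only 3 aces remain among 51 cards, so the second factor changes.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) using Bayes' theorem."""
    true_positive = sensitivity * prior
    false_positive = false_positive_rate * (1 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical test: 1% of patients have the disease, the test catches 95%
# of real cases, and 5% of healthy patients still test positive.
posterior = bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))

print(p_two_heads, p_two_aces, float(posterior))
```

Under these assumed numbers, a positive result raises the chance of disease from 1% to roughly 16%, not to 95%, because false positives among the many healthy patients outnumber the true positives.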
Real-World Data Science Examples
Use Case | Math Concept Used | Why |
---|---|---|
Predicting customer churn (customer leaving) | Probability + Statistics | Need to model uncertainty |
Stock price prediction | Mean, Variance, Standard Deviation | Understand volatility |
Health risk prediction (Diabetes, Cancer) | Bayes Theorem + Statistics | Update risk with patient data |
Recommendation systems (YouTube, Netflix) | Mode, Correlation | Find popular/common patterns |