Statistics, as a whole, is one of the subjects I enjoy most as a data scientist. In this article, we explore measures of central tendency, which are among the fundamentals of statistics, and look at how they are used.
Measures of central tendency are single values that summarize a data set by describing its center, or typical value, which is a first step toward understanding how the data is distributed. They include the mean, the median and the mode.
i) Mean
The mean is the average value of a data set. It is obtained by adding all the values and dividing the sum by the number of values.
The mean tells you where the average value of the data set lies, and comparing it with the median hints at whether the distribution is symmetric or skewed. It is also used to fill in missing values, but only when the distribution is symmetric and has no outliers.
Calculating the mean using the Python library NumPy:
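Written as a formula, for a data set of n values x_1, …, x_n (notation introduced here for illustration), the mean and a worked example on the data set used below are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\bar{x} = \frac{2 + 2 + 3 + 4 + 8 + 5}{6} = \frac{24}{6} = 4
```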
import numpy as np
num = [2, 2, 3, 4, 8, 5]
# np.mean adds the values (24) and divides by their count (6)
mean = np.mean(num)
print(mean)  # 4.0
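Filling missing values with the mean can be sketched with NumPy as follows, assuming missing entries are stored as `np.nan` in a float array. The array is made up for illustration, and this kind of imputation only makes sense under the conditions above: a roughly symmetric distribution without outliers.

```python
import numpy as np

# Hypothetical data set with two missing entries, marked as np.nan
values = np.array([2.0, 2.0, np.nan, 4.0, 8.0, np.nan, 5.0])

# np.nanmean ignores NaN entries when computing the average
fill_value = np.nanmean(values)

# Replace each NaN with the mean of the observed values
filled = np.where(np.isnan(values), fill_value, values)

print(fill_value)
print(filled)
```

`np.where` builds a new array, so the original `values` array is left unchanged.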
ii) Median
The median is the midpoint of the data set: the data is arranged in ascending or descending order and the middle value is taken. When the number of values is even, the median is the average of the two middle values. In a symmetric distribution, the median equals or is close to the mean. Because it is not pulled toward extreme values, the median is often used to fill missing values when the data is skewed or contains outliers.
Calculating the median using the Python library NumPy:
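The sorting-and-picking procedure can be written out by hand. The helper below, `median`, is a hypothetical function defined only for illustration; it handles an even number of values by averaging the two middle values, which matches what `np.median` does.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 2, 3, 4, 8, 5]))  # sorted: [2, 2, 3, 4, 5, 8] -> 3.5
print(median([7, 1, 3]))           # sorted: [1, 3, 7] -> 3
```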
import numpy as np
num = [2, 2, 3, 4, 8, 5]
# Sorted: [2, 2, 3, 4, 5, 8]; the two middle values are 3 and 4
median = np.median(num)
print(median)  # 3.5
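A useful property of the median is that it is far less sensitive to outliers than the mean. This sketch checks that by appending one made-up extreme value (100) to the data set above and computing both measures.

```python
import numpy as np

# Hypothetical data: the values used above plus one extreme outlier
num = [2, 2, 3, 4, 8, 5, 100]

mean = np.mean(num)      # pulled strongly upward by the outlier
median = np.median(num)  # depends only on the middle of the sorted data

print(mean, median)
```

The mean jumps from 4 to roughly 17.7, while the median stays at 4, which is why the median is the safer choice for summarizing or imputing data with outliers.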
iii) Mode
The mode is the value that appears most often in a data set. It can also be used to replace missing values, especially in categorical data; whether this is sensible depends on how often the mode actually appears and on the nature of the distribution.
Calculating the mode using Python's built-in statistics module:
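Finding the mode amounts to counting occurrences, which `collections.Counter` from the standard library does directly. Note that a data set can have more than one mode: since Python 3.8, `statistics.mode` returns the first mode it encounters (older versions raised `StatisticsError`), while `statistics.multimode` returns all of them.

```python
from collections import Counter
import statistics

num = [2, 2, 3, 4, 8, 5]

# Count how many times each value occurs
counts = Counter(num)
value, freq = counts.most_common(1)[0]
print(value, freq)  # 2 2

# A data set can have several modes; multimode returns all of them
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```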
import statistics
num = [2, 2, 3, 4, 8, 5]
# 2 occurs twice; every other value occurs once
mode = statistics.mode(num)
print(mode)  # 2
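Mode imputation is most common for categorical data, where a mean or median cannot be computed. A minimal sketch, assuming missing entries are stored as `None` in a plain Python list (the color values are made up for illustration):

```python
import statistics

# Hypothetical categorical column with missing entries stored as None
colors = ["red", "blue", None, "red", "green", None, "red"]

# Compute the mode from the observed (non-missing) values only
observed = [c for c in colors if c is not None]
most_common = statistics.mode(observed)

# Replace each missing entry with the mode
filled = [most_common if c is None else c for c in colors]
print(filled)
```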
In conclusion, measures of central tendency are fundamental when exploring your data: they summarize where the data is centered and, when compared with one another, hint at its shape and at the presence of outliers. I hope this article has shed some light on the measures of central tendency and their importance!