What averages do you remember?
We have encountered three average measures: mean (the arithmetic average), median (the middle value) and mode (the most frequent value).
But why do we have so many averages?!
Couldn't we just calculate the mean and call that the main average?
Well, which one is most useful depends on the actual scenario!
Let's say you are applying for a job and the salary for the position is not listed.
But you did your research and found that in last year's company report, the mean salary was £51,000 and the median was £31,000.
Which one will give you a better idea of what you will actually earn?
Let's take some data with a median of £31 k and a mean close to £51 k to think through which one is better:
£29,000, £30,000, £30,700, £30,800, £31,000, £31,200, £31,500, £32,000, £200,000
We can see that all but one of the salaries are between £29 k and £32 k, and then there is one very large salary - maybe that of the CEO!
This one big salary drags the mean up to roughly £50 k even though most employees are earning around the median, i.e. £31 k!
Which one do you think you would be most likely to earn?
That's right, the median! Nobody in the company actually earns the mean salary of £51 k - it is pulled up by the CEO's pay!
So unless you were to be employed at the very top of the company, you will earn much nearer to the median salary.
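To check the reasoning above, here is a minimal Python sketch using the nine example salaries. It relies only on the standard `statistics` module; the variable names and the "without the top salary" comparison are illustrative additions, not part of the company report. Note that these nine values give a mean of about £49,578, which is in the same region as the reported £51 k rather than an exact match.

```python
from statistics import mean, median

# The nine example salaries from the text, in pounds.
salaries = [29_000, 30_000, 30_700, 30_800, 31_000,
            31_200, 31_500, 32_000, 200_000]

mean_all = mean(salaries)      # sum of the values divided by how many there are
median_all = median(salaries)  # middle value of the sorted list

# Hypothetical "what if": the same list without the one very large salary.
without_top = sorted(salaries)[:-1]
mean_rest = mean(without_top)
median_rest = median(without_top)

print(f"All nine salaries:   mean £{mean_all:,.0f}, median £{median_all:,.0f}")
print(f"Without the top one: mean £{mean_rest:,.0f}, median £{median_rest:,.0f}")
```

Dropping a single salary moves the mean by almost £19,000 but the median by only £100, which is why the median is the safer guide here.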
Now, what about mode?
If we look back at the salaries, we can see that there is no mode as all the values appear exactly once, i.e. there is no most frequent value.
But even if there were, would we really care that, let's say, two people earn £29 k?
It wouldn't be particularly useful in this scenario either!
We care about mode when the most popular choice has some significance, e.g. which political party was chosen most frequently or how many products customers bought at a time.
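As a rough sketch of how the mode could be found, the snippet below counts how often each value appears. The `modes` helper and the `basket_sizes` list are made up for illustration; the helper follows this lesson's convention that there is no mode when every value appears exactly once.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []                      # no data, so no mode
    top = max(counts.values())
    if top == 1:
        return []                      # every value appears once: no mode
    return [value for value, count in counts.items() if count == top]

# The nine example salaries: each one appears exactly once.
salaries = [29_000, 30_000, 30_700, 30_800, 31_000,
            31_200, 31_500, 32_000, 200_000]

# Hypothetical data: how many products each of eight customers bought.
basket_sizes = [1, 2, 2, 3, 1, 2, 4, 2]

salary_modes = modes(salaries)
basket_modes = modes(basket_sizes)

print(salary_modes)
print(basket_modes)
```

For the salaries the result is empty, while for the invented basket sizes the most common purchase is two products at a time.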
To summarise:
Mean is the arithmetic average found by adding all the values together and dividing that sum by how many values there are.
This means that each value is treated with the 'same importance' so it can be heavily skewed by any outliers.
If there are no outliers, then it is definitely a handy average to consider!
On the other hand, median is the middle value.
So it is particularly useful when we want to know roughly where the middle of the data sits, because it is barely affected by outliers.
If we are given both the mean and the median, a large gap between them suggests that there are significant outliers.
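One possible way to put a number on the comparison between mean and median is to measure how far the mean sits from the median, relative to the median. The `relative_gap` helper below is a hypothetical illustration, not a standard statistic, and it assumes the median is not zero.

```python
from statistics import mean, median

def relative_gap(values):
    """Return (mean - median) / median. Assumes the median is non-zero."""
    middle = median(values)
    return (mean(values) - middle) / middle

# The nine example salaries from earlier, in pounds.
salaries = [29_000, 30_000, 30_700, 30_800, 31_000,
            31_200, 31_500, 32_000, 200_000]

gap_all = relative_gap(salaries)        # with the very large salary
gap_rest = relative_gap(salaries[:-1])  # same list without it

print(f"With the top salary:    {gap_all:.1%}")
print(f"Without the top salary: {gap_rest:.1%}")
```

With the large salary included, the mean is about 60% above the median; without it, the two differ by less than 1%.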
Finally, mode is the most frequently occurring value.
It is useful to consider when we care about the most popular choice.
It is also the only one of these averages we can find for qualitative (i.e. non-numerical) data such as colours or party names, so that is when we typically use it.
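For qualitative data, a short sketch with Python's standard `statistics.mode` shows the idea. The party names and votes are invented for the example.

```python
from statistics import mode

# Hypothetical votes: labels, not numbers, so there is nothing to add up or sort into a middle.
votes = ["Orange", "Purple", "Orange", "Green", "Purple", "Orange"]

most_popular = mode(votes)  # the label that appears most often
print(most_popular)
```

Here "Orange" appears three times, more than any other label, so it is the mode.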
Does that all make sense?
Let's test this out in some questions.