The standard deviation (SD) measures the typical distance between the values in a sample and their mean. Strictly, it is built from the squared distances to the mean, not the plain average distance. It tells you whether the data are clustered around the mean or scattered, and is therefore a key value for judging how reliably the mean represents the entire sample. The standard deviation can also be used to compare samples that have similar means but whose values are clustered or dispersed in different ways.
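The standard deviation is calculated using the formula:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where:
- $x_i$ is each value in the sample
- $\bar{x}$ is the mean
- $n$ is the number of values in the sample

Dividing by n − 1 gives the sample standard deviation, the usual choice when the data are a sample from a larger population. Dividing by n instead gives the population standard deviation.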
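The calculation can be sketched in Python. This is a minimal illustration that assumes the sample (n − 1) form of the formula. The data values are hypothetical, and `sample_sd` is an illustrative helper rather than a library function. Python's standard-library `statistics.stdev` (divides by n − 1) and `statistics.pstdev` (divides by n) serve as cross-checks.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

print(sample_sd(data))          # approx. 2.138
print(statistics.stdev(data))   # same result: the library also divides by n - 1
print(statistics.pstdev(data))  # 2.0: population version, divides by n
```

The gap between 2.138 and 2.0 shows why it matters which version of the formula is used, especially for small samples.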

The significance of the standard deviation is assessed by comparing it to the mean:
- Low SD = the values are tightly clustered (the distribution curve is narrow and steep) and the mean is a reliable representation of the entire sample
- High SD = the values are widely scattered (the distribution curve is relatively flat) and the mean is NOT a reliable representation of the entire sample
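One way to compare the SD with the mean is to divide one by the other. The ratio SD/mean is known as the coefficient of variation. The sketch below uses two hypothetical samples with the same mean. The helper name `sd_relative_to_mean` is illustrative, and where to draw the line between a "low" and a "high" ratio is a judgment call rather than a fixed rule.

```python
import statistics


def sd_relative_to_mean(values):
    """Return (mean, SD, SD / mean) for a sample.

    SD / mean (the coefficient of variation) expresses the spread as a
    fraction of the mean, so samples can be compared directly.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return mean, sd, sd / mean


tight = [48, 49, 50, 51, 52]   # hypothetical: values close to the mean
spread = [20, 35, 50, 65, 80]  # hypothetical: same mean, values far apart

for name, sample in [("tight", tight), ("spread", spread)]:
    mean, sd, ratio = sd_relative_to_mean(sample)
    print(f"{name}: mean={mean}, SD={sd:.2f}, SD/mean={ratio:.2f}")
# tight: mean=50, SD=1.58, SD/mean=0.03
# spread: mean=50, SD=23.72, SD/mean=0.47
```

Both samples have a mean of 50. Only the SD reveals that the mean describes the first sample far better than the second.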

In a normally distributed sample (bell curve):
- approximately 68% of all individuals lie within +/- 1 standard deviation of the mean
- approximately 95% of all individuals lie within +/- 2 standard deviations of the mean
- approximately 99.7% of all individuals lie within +/- 3 standard deviations of the mean
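These percentages can be checked against the standard normal distribution with Python's `statistics.NormalDist` (available from Python 3.8). The helper `fraction_within` is a hypothetical name used only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/-{k} SD: {fraction_within(k):.2%}")
# within +/-1 SD: 68.27%
# within +/-2 SD: 95.45%
# within +/-3 SD: 99.73%
```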
Example using the ages of a sample of men and women:
| Ages sampled | Mean | Standard deviation |
|---|---|---|
| 39, 45, 54, 66, 66, 66, 74 | | |
| 57, 69, 70, 72, 75, 82, 83 | | |
Conclusion:
- In both cases the SD is very high (approximately half the value of the mean), so the values in both samples are widely scattered around their means. The means are therefore not very reliable representations of the samples.
- If you pick a random man from the sample, and assuming the ages are normally distributed, there is a 68% probability that his age lies within 1 standard deviation of the mean: Mean +/- 1 SD = 41.7 +/- 18.98. There is therefore a 68% probability that he is at least 22.7 but no more than 60.7 years old.
- If you pick a random woman from the sample, and assuming the ages are normally distributed, there is a 95% probability that her age lies within 2 standard deviations of the mean: Mean +/- 2 SD = 51.6 +/- (2 x 23.74) = 51.6 +/- 47.48. There is therefore a 95% probability that she is at least 4.1 but no more than 99.1 years old.
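The arithmetic in these conclusions can be checked with a short Python sketch. It uses only the means and SDs quoted in the text. `interval` is a hypothetical helper, and reading its results as 68% or 95% ranges assumes the ages are roughly normally distributed.

```python
def interval(mean, sd, k):
    """Return the (lower, upper) bounds of mean +/- k * SD."""
    return mean - k * sd, mean + k * sd


# Means and SDs quoted in the conclusions
men_mean, men_sd = 41.7, 18.98
women_mean, women_sd = 51.6, 23.74

# SD relative to the mean: both close to one half
print(f"men SD/mean: {men_sd / men_mean:.2f}")        # 0.46
print(f"women SD/mean: {women_sd / women_mean:.2f}")  # 0.46

low, high = interval(men_mean, men_sd, k=1)  # ~68% range if ages are normal
print(f"men, +/-1 SD: {low:.1f} to {high:.1f}")       # 22.7 to 60.7

low, high = interval(women_mean, women_sd, k=2)  # ~95% range if ages are normal
print(f"women, +/-2 SD: {low:.1f} to {high:.1f}")     # 4.1 to 99.1
```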