What if you were allowed to use two numbers, rather than just one, to represent a set of data? What would the second number be?
- We consider 3 different measures of spread:
- Range;
- Interquartile Range;
- Standard Deviation.
Range = Largest value – Smallest value
Simplest measure of spread
Ignores the pattern of the spread (e.g. 1,9,9,9,10 and 1,1,1,1,10 have the same range)
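As a quick check of the point above, here is a minimal Python sketch (the function name `value_range` is just an illustrative choice) showing that the two data sets quoted have the same range even though their values are spread very differently.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

clustered_high = [1, 9, 9, 9, 10]
clustered_low = [1, 1, 1, 1, 10]

print(value_range(clustered_high))  # 9
print(value_range(clustered_low))   # 9
```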
IQR = Q3-Q1, where Q1 is the lower quartile and Q3 is the upper quartile
Quartiles are the ‘quarter way’ values (like the median is the ‘halfway’ value)
If there are an even number of data values, Q1 is the median of the lower half of the data and Q3 is the median of the upper half of the data;
If there are an odd number of data values, remove the median first; Q1 is then the median of the lower half of the remaining data and Q3 is the median of the upper half, as above.
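The rule above can be sketched in Python as follows. The data values and the function names `quartiles` and `iqr` are made up for illustration. Note that other quartile conventions exist (calculators and software often interpolate instead), so results can differ slightly from this median-of-halves method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves method.

    For an odd number of values the median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

even_data = [2, 4, 5, 7, 8, 10, 11, 13]  # hypothetical values, n = 8
odd_data = [2, 4, 5, 7, 8, 10, 11]       # hypothetical values, n = 7

print(quartiles(even_data), iqr(even_data))  # (4.5, 10.5) 6.0
print(quartiles(odd_data), iqr(odd_data))    # (4, 10) 6
```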
Finding Interquartile Range from tables or graphs
For frequency tables, apply the same process as with the median, measuring the cumulative frequency to find the class, and using interpolation based on the proportion through the class if it is necessary to estimate a specific value;
For cumulative frequency graphs, the process is easy: draw horizontal lines from the one-quarter and three-quarter points on the y-axis across to the curve, then read off the corresponding values on the x-axis.
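The interpolation step for a grouped frequency table might look like the sketch below. The table is invented for illustration, and the code assumes the quartiles sit at cumulative frequencies n/4 and 3n/4, which is a common convention for grouped continuous data but not the only one.

```python
def interpolate_position(classes, target):
    """Estimate the data value at cumulative frequency `target`.

    `classes` is a list of (lower_bound, upper_bound, frequency)
    tuples in ascending order.
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Proportion of the way through this class.
            proportion = (target - cumulative) / freq
            return lower + proportion * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds the total frequency")

# Hypothetical grouped data: (lower bound, upper bound, frequency)
table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
n = sum(freq for _, _, freq in table)  # 20

q1 = interpolate_position(table, n / 4)      # 5th value: 10 + (1/10) * 10
q3 = interpolate_position(table, 3 * n / 4)  # 15th value: 20 + (1/6) * 10
print(q1, q3, q3 - q1)
```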
Five Number Summaries and Box & Whisker Plots
These are quick and useful ways to summarise the 5 most important numbers that represent a data set, i.e. the minimum value, the lower quartile, the median, the upper quartile and the maximum value
They are also very useful for quickly comparing 2 datasets.
Worked Example – Draw a Box Plot for the following data
Identifying “skew” from Box Plots
Variance & Standard Deviation
The formula on the left is easiest to understand, but the formula on the right is easiest to calculate, so we will generally be using that.
Deriving the formula on the right from the one on the left is off syllabus, but is a good home exercise for those who are interested.
Note that the deviations from the mean need to be squared to stop them from cancelling out: without squaring, they always sum to zero.
Squaring also squares the units, which is why we use the square root of the variance (the standard deviation) as our measure, so that the units are the same as the individual data values.
The standard deviation is an excellent measure of dispersion, but can be significantly affected by outliers.
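The two formulas referred to in this section are not reproduced in these notes. Assuming they are the standard definitions (dividing by n, which is consistent with the worked examples that follow), they would read as below, with the definition first and the calculation form second.

```latex
\[
\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}
\qquad\text{and}\qquad
\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 ,
\qquad\text{with}\qquad
\sigma = \sqrt{\sigma^2}.
\]
```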
Example 1: Six masses were weighed as 4, 6, 6, 7, 9 and 10 kg. Find the mean, variance and standard deviation of these weights.
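A short Python check of this example, assuming the divide-by-n form of the variance (the mean of the squares minus the square of the mean):

```python
from math import sqrt

masses = [4, 6, 6, 7, 9, 10]  # kg
n = len(masses)

mean = sum(masses) / n                                  # 42 / 6
variance = sum(x * x for x in masses) / n - mean ** 2   # 318 / 6 - 7^2
sd = sqrt(variance)

print(mean, variance, sd)  # 7.0 4.0 2.0
```

So the mean is 7 kg, the variance is 4 kg² and the standard deviation is 2 kg.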
Variance and Standard Deviation from a Frequency Table
Example 1: Find the standard deviation of the values of x given in the following table, correct to 3 significant figures.
Example 2 (Grouped): Calculate an estimate of the standard deviation of the heights of the 20 children given in the following table:
|Number of children (f)||2||12||6|
Combined Sets of Data
We can’t find the variance of two combined datasets simply by adding the two variances together and dividing by 2.
As when we were calculating the mean of two datasets, we need to multiply up to find the other values in the formula first and then use these on the combined dataset.
Worked Example 2
The heights, x cm, of 10 boys are summarised by Σx = 1650 and Σx² = 275 490.
The heights, y cm, of 15 girls are summarised by Σy = 2370 and Σy² = 377 835.
Calculate, to 3 significant figures, the standard deviation of the heights of all 25 children together.
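One way to carry out this calculation is sketched below: add the totals for the two groups, then apply the variance formula to the combined totals. The helper name `sd_from_totals` is illustrative, and the divide-by-n form of the variance is assumed.

```python
from math import sqrt

def sd_from_totals(n, sum_x, sum_x2):
    """Standard deviation from n, the sum of values and the sum of squares."""
    mean = sum_x / n
    return sqrt(sum_x2 / n - mean ** 2)

n_all = 10 + 15               # boys + girls
sum_all = 1650 + 2370         # combined sum of heights
sum_sq_all = 275490 + 377835  # combined sum of squared heights

sd_all = sd_from_totals(n_all, sum_all, sum_sq_all)
print(round(sd_all, 1))  # 16.6
```

The combined mean is 160.8 cm and the combined variance is about 276.36, giving a standard deviation of 16.6 cm to 3 significant figures.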
Worked Example 3
In an examination, the percentage marks of the 120 boys are denoted by x, and the percentage marks of the 80 girls are denoted by y.
The marks are summarised by the totals Σx = 7020, Σx² = 424 320 and Σy² = 352 130. Calculate the girls’ mean mark, given that the standard deviation for all these students is 10.
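A possible route to the answer, sketched in Python: rearrange the variance formula for the combined data to get the overall mean, then work back to Σy. The divide-by-n variance is assumed, and the positive square root is taken because marks cannot be negative.

```python
from math import sqrt

n_boys, n_girls = 120, 80
sum_x, sum_x2 = 7020, 424320
sum_y2 = 352130
sd_all = 10

n = n_boys + n_girls
mean_of_squares = (sum_x2 + sum_y2) / n  # 776450 / 200

# variance = mean of squares - (mean)^2, so (mean)^2 = mean of squares - variance
mean_all = sqrt(mean_of_squares - sd_all ** 2)

sum_y = mean_all * n - sum_x  # total of the girls' marks
girls_mean = sum_y / n_girls
print(mean_all, sum_y, girls_mean)  # 61.5 5280.0 66.0
```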
Answer 1b is wrong; it should be 10.5.
Adding a constant to (or subtracting a constant from) all values in the data set translates the dataset but leaves the variance unchanged, so var(x) = var(x − b), whereas the mean is shifted: mean(x) = mean(x − b) + b.
You can use the formulae shown below, but again, in practice we solve problems by multiplying up to give the Σx and Σx² values that we require to use with the combined dataset.
Worked Example 2
Worked Solution 2
Worked Example 3
Eight values of x are summarised by the totals Σ(x − 10)² = 1490 and Σ(x − 10) = 100. Twelve values of y are summarised by the totals Σ(y + 5)² = 5139 and Σ(y + 5) = 234. Find the variance of the 20 values of x and y together.
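One way to approach this is to decode each set of totals back to Σ and Σ of squares for the raw values, then combine. The sketch below uses the expansion Σ(v − a)² = Σv² − 2aΣv + na². The helper name `decode_totals` is illustrative, and the divide-by-n variance is assumed.

```python
def decode_totals(n, sum_coded, sum_coded_sq, shift):
    """Recover (sum of v, sum of v^2) from totals of the coded values (v - shift)."""
    sum_v = sum_coded + n * shift
    sum_v2 = sum_coded_sq + 2 * shift * sum_v - n * shift ** 2
    return sum_v, sum_v2

# x is coded as (x - 10); y is coded as (y + 5), i.e. a shift of -5
sum_x, sum_x2 = decode_totals(8, 100, 1490, 10)   # 180, 4290
sum_y, sum_y2 = decode_totals(12, 234, 5139, -5)  # 174, 3099

n = 8 + 12
mean = (sum_x + sum_y) / n
variance = (sum_x2 + sum_y2) / n - mean ** 2
print(round(variance, 2))  # 56.16
```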
Worked Example 4
It is known that 20 girls each have at least one brother. The number of brothers that they have is denoted by x. Information about the values of x – 1 is given in the following table.
|Number of girls (f)||2||4||8||5||1|
Use the coded values to calculate the standard deviation of the number of brothers, to 3 decimal places.
Solutions to Exercise
Solutions to Mixed Exercise