What if you were allowed to use two numbers, rather than just one, to represent a set of data? What would the second number be?

We consider 3 different measures of **spread**:

- Range;
- Interquartile Range;
- Standard Deviation.

**Range**

Range = Largest value – Smallest value

Simplest measure of spread

Ignores the pattern of the spread (e.g. 1,9,9,9,10 and 1,1,1,1,10 have the same range)
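A quick check of this limitation, as a minimal Python sketch (the function name `data_range` is illustrative only):

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

a = [1, 9, 9, 9, 10]
b = [1, 1, 1, 1, 10]

# Both datasets run from 1 to 10, so the range cannot tell them apart,
# even though most of a sits near 9 and most of b sits at 1.
print(data_range(a), data_range(b))
```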

**Interquartile Range**

IQR = Q_{3}-Q_{1}, where Q_{1} is the lower quartile and Q_{3} is the upper quartile

Quartiles are the ‘quarter way’ values (like the median is the ‘halfway’ value)

Finding quartiles:

If there is an even number of data values, Q_{1} is the median of the lower half of the data and Q_{3} is the median of the upper half of the data;

If there is an odd number of data values, remove the median first; then Q_{1} is the median of the lower half and Q_{3} is the median of the upper half, as above.
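The rule above can be sketched in Python as follows. The function name `quartiles` and both datasets are made up for illustration; note that other textbooks and software packages use slightly different quartile conventions, so their results may not match this method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the split-halves rule described above."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, skip the middle value (the median) before taking the upper half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

# Hypothetical data, chosen only for illustration
even_set = [2, 4, 5, 7, 8, 10, 12, 15]       # 8 values
odd_set = [2, 4, 5, 7, 8, 10, 12, 15, 20]    # 9 values

for dataset in (even_set, odd_set):
    q1, med, q3 = quartiles(dataset)
    print(q1, med, q3, "IQR =", q3 - q1)
```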

**Worked Examples**

**Finding Interquartile Range from tables or graphs**

For frequency tables, apply the same process as for the median: use the cumulative frequency to find the class containing the quartile and, if a specific value must be estimated, interpolate according to how far through the class the quartile lies.

For cumulative frequency graphs, the process is easy: draw a horizontal line from the quarter and three-quarter points on the y-axis across to the curve, then read off the corresponding values on the x-axis.
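The interpolation step for a grouped frequency table might be sketched like this. The table is hypothetical, and the sketch assumes the quartiles sit at positions n/4 and 3n/4 of the cumulative frequency, which is a common convention for grouped data but not the only one.

```python
def interpolate(classes, position):
    """Estimate the value at a given cumulative-frequency position.

    classes: list of (lower_bound, upper_bound, frequency), in ascending order.
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            # Proportion of the way through this class
            fraction = (position - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 16), (30, 40, 10)]
n = sum(freq for _, _, freq in classes)

q1 = interpolate(classes, n / 4)
q3 = interpolate(classes, 3 * n / 4)
print(q1, q3, "IQR estimate =", q3 - q1)
```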

**Five Number Summaries and Box & Whisker Plots**

These are quick and useful ways to summarise the 5 most important numbers that represent a data set: the minimum value, the lower quartile, the median, the upper quartile and the maximum value.

They are also very useful for quickly comparing 2 datasets.

**Worked Example – Draw a Box Plot for the following data**

**Identifying “skew” from Box Plots**

**Exercise**

**Answers**

**Variance & Standard Deviation**
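For reference, the two standard forms of the variance are written side by side here. This assumes the form that divides by n (the number of values), which is consistent with the worked examples in this section; x̄ denotes the mean. The standard deviation is the square root of the variance.

```latex
\mathrm{Var}(x) = \frac{\sum (x - \bar{x})^2}{n}
\qquad\qquad
\mathrm{Var}(x) = \frac{\sum x^2}{n} - \bar{x}^2
```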

The formula on the left is easiest to understand, but the formula on the right is easiest to calculate, so we will generally be using that.

Deriving the formula on the right from the one on the left is off syllabus, but is a good home exercise for those who are interested.

Note that the deviations from the mean need to be squared; otherwise the positive and negative deviations would cancel out and always give a variance of zero.

Squaring changes the units, which is why we use the square root of the variance (the standard deviation) as our measure: its units are the same as those of the individual data values.

The standard deviation is an excellent measure of dispersion, but can be significantly affected by outliers.

**Worked Examples**

**Example 1:** Six masses were weighed as 4, 6, 6, 7, 9 and 10 kg. Find the mean, variance and standard deviation of these masses.
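One way to check Example 1 is the short Python sketch below. It assumes the variance is defined by dividing by n, computes it both from the squared deviations and from the sum of squares, and also shows that the unsquared deviations sum to zero.

```python
from math import sqrt

masses = [4, 6, 6, 7, 9, 10]  # kg, from Example 1
n = len(masses)
mean = sum(masses) / n

# Deviations from the mean always sum to zero, which is why they are squared.
deviations = [x - mean for x in masses]

# Definition form: mean of the squared deviations
var_definition = sum(d ** 2 for d in deviations) / n
# Calculation form: mean of the squares minus the square of the mean
var_shortcut = sum(x ** 2 for x in masses) / n - mean ** 2

sd = sqrt(var_shortcut)
print(mean, sum(deviations), var_definition, var_shortcut, sd)
```

Both forms give a variance of 4 kg², so the mean is 7 kg and the standard deviation is 2 kg.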

**Example 2:**

**Variance and Standard Deviation from a Frequency Table**

**Worked Examples**

**Example 1:** Find the standard deviation of the values of *x* given in the following table, correct to 3 significant figures.

| x | f |
|---|---|
| 12 | 13 |
| 14 | 28 |
| 16 | 10 |
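A minimal sketch of the calculation for this table, weighting each value and its square by the frequency and dividing by n. The function name `sd_from_frequencies` is illustrative only.

```python
from math import sqrt

def sd_from_frequencies(table):
    """Standard deviation (dividing by n) from a list of (value, frequency) pairs."""
    n = sum(f for _, f in table)
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x ** 2 for x, f in table)
    mean = sum_fx / n
    return sqrt(sum_fx2 / n - mean ** 2)

table = [(12, 13), (14, 28), (16, 10)]  # (x, f) pairs from Example 1
sd = sd_from_frequencies(table)
print(f"{sd:.3g}")  # 3 significant figures
```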

**Example 2 (Grouped):** Calculate an estimate of the standard deviation of the heights of the 20 children given in the following table:

| Height (metres) | 1.2- | 1.4- | 1.5-1.7 |
|---|---|---|---|
| Number of children (f) | 2 | 12 | 6 |
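For grouped data the class midpoints stand in for the actual values, so the result is only an estimate. The sketch below assumes that each open-ended class ("1.2-", "1.4-") runs up to the lower bound of the next class.

```python
from math import sqrt

# (lower bound, upper bound, frequency); the upper bounds of "1.2-" and "1.4-"
# are ASSUMED to be the next class's lower bound.
classes = [(1.2, 1.4, 2), (1.4, 1.5, 12), (1.5, 1.7, 6)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

sum_fx = sum(f * m for m, f in zip(midpoints, freqs))
sum_fx2 = sum(f * m ** 2 for m, f in zip(midpoints, freqs))

mean = sum_fx / n
sd = sqrt(sum_fx2 / n - mean ** 2)
print(round(mean, 2), round(sd, 2))
```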

**Exercise**

**Answers**

**Combined Sets of Data**

We can’t find the variance of two datasets combined simply by adding their variances together and dividing by 2.

As when we calculated the mean of two combined datasets, we first multiply up to recover the totals in the formula (n, Σx and Σx^{2}) for each dataset, then add these and apply the formula to the combined dataset.

**Worked Example**

**Worked Solution**

**Worked Example 2**

The heights, *x* cm, of 10 boys are summarised by Σ*x* = 1650 and Σ*x*^{2} = 275 490.

The heights, *y* cm, of 15 girls are summarised by Σ*y* = 2370 and Σ*y*^{2} = 377 835.

Calculate, to 3 significant figures, the standard deviation of the heights of all 25 children together.
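A sketch of the combined calculation: add the counts, the sums and the sums of squares, then apply the variance formula (dividing by n) once to the totals. The function name `combined_sd` is illustrative only.

```python
from math import sqrt

def combined_sd(groups):
    """groups: list of (n, sum_x, sum_x2) summaries, one per dataset."""
    n = sum(g[0] for g in groups)
    sum_x = sum(g[1] for g in groups)
    sum_x2 = sum(g[2] for g in groups)
    mean = sum_x / n
    return sqrt(sum_x2 / n - mean ** 2)

boys = (10, 1650, 275490)
girls = (15, 2370, 377835)
sd = combined_sd([boys, girls])
print(f"{sd:.3g}")  # 3 significant figures
```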

**NOTE**

**Worked Example 3**

In an examination, the percentage marks of the 120 boys are denoted by *x*, and the percentage marks of the 80 girls are denoted by *y*.

The marks are summarised by the totals Σ*x* = 7020, Σ*x*^{2} = 424 320 and Σ*y*^{2} = 352 130. Calculate the girls’ mean mark, given that the standard deviation for all these students is 10.
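Here the variance formula is used in reverse: the combined sum of squares and the combined standard deviation give the combined mean, and from that the missing total Σ*y*. A sketch, assuming the variance divides by n and taking the positive square root because marks cannot be negative:

```python
from math import sqrt

n_boys, n_girls = 120, 80
sum_x, sum_x2 = 7020, 424320
sum_y2 = 352130
sd_all = 10

n = n_boys + n_girls
# variance = (sum of squares) / n - mean^2, rearranged for the combined mean
mean_all = sqrt((sum_x2 + sum_y2) / n - sd_all ** 2)

# The combined total minus the boys' total leaves the girls' total
sum_y = mean_all * n - sum_x
girls_mean = sum_y / n_girls
print(mean_all, sum_y, girls_mean)
```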

**Exercise**

**Solutions**

Answer 1b is wrong; it should be 10.5.

**Coded Data**

Adding a constant to (or subtracting one from) every value translates the dataset but leaves the variance unchanged, so var(x) = var(x − b), whereas the mean shifts by the same constant: mean(x − b) = mean(x) − b.

You can use the formulae shown below, but again, in practice we solve problems by multiplying up to recover the Σx and Σx^{2} values, which we can then use with the combined dataset.

**Formulae**

**Worked Example**

**Worked Solution**

**Worked Example 2**

**Worked Solution 2**

**Worked Example 3**

Eight values of *x* are summarised by the totals Σ(*x* − 10)^{2} = 1490 and Σ(*x* − 10) = 100. Twelve values of *y* are summarised by the totals Σ(*y* + 5)^{2} = 5139 and Σ(*y* + 5) = 234. Find the variance of the 20 values of *x* and *y* together.
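Because the two datasets are coded with different constants, one approach is to decode each set back to Σx and Σx^{2} before combining. The sketch below does this by expanding the bracket; the helper name `decode` is illustrative only, and the variance is assumed to divide by n.

```python
def decode(n, sum_coded, sum_coded2, b):
    """Convert sums of (x - b) and (x - b)^2 into sums of x and x^2.

    Uses  sum(x)   = sum(x - b) + n*b
    and   sum(x^2) = sum((x - b)^2) + 2*b*sum(x - b) + n*b^2.
    """
    sum_x = sum_coded + n * b
    sum_x2 = sum_coded2 + 2 * b * sum_coded + n * b ** 2
    return sum_x, sum_x2

# x is coded as (x - 10); y is coded as (y + 5), i.e. b = -5
sum_x, sum_x2 = decode(8, 100, 1490, 10)
sum_y, sum_y2 = decode(12, 234, 5139, -5)

n = 8 + 12
mean = (sum_x + sum_y) / n
variance = (sum_x2 + sum_y2) / n - mean ** 2
print(sum_x, sum_x2, sum_y, sum_y2, round(variance, 2))
```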

**Worked Example 4**

It is known that 20 girls each have at least one brother. The number of brothers that they have is denoted by *x*. Information about the values of *x* − 1 is given in the following table.

| x-1 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Number of girls (f) | 2 | 4 | 8 | 5 | 1 |

Use the coded values to calculate the standard deviation of the number of brothers, to 3 decimal places.
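Since subtracting 1 from every value does not change the spread, the standard deviation can be computed directly from the coded values without decoding. A sketch, assuming the variance divides by n:

```python
from math import sqrt

coded = [0, 1, 2, 3, 4]   # values of x - 1
freqs = [2, 4, 8, 5, 1]   # number of girls

n = sum(freqs)
sum_fc = sum(f * c for c, f in zip(coded, freqs))
sum_fc2 = sum(f * c ** 2 for c, f in zip(coded, freqs))

# The standard deviation of x equals that of x - 1,
# because a shift moves the data without changing its spread.
sd = sqrt(sum_fc2 / n - (sum_fc / n) ** 2)
print(round(sd, 3))
```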

**Exercise**

**Mixed Exercise**

**Solutions to Exercise**

**Solutions to Mixed Exercise**