9231. Statistics. Non-Parametric Tests

So far, all of our tests have assumed an underlying distribution, and we have been testing one of its parameters, e.g. the mean or the variance.

If we don’t have an assumed underlying distribution, we can use non-parametric tests. Typically with such tests the measure of location we are interested in is the median.

The table below summarises the non-parametric tests we will learn. Which one we choose depends on whether we have a single sample or two samples, and on conditions regarding the symmetry of the data and the relationship between the datasets. Note that all of the tests assume independent underlying data.

| Type of test | Test | Assumptions |
|---|---|---|
| Single sample | SS1. Sign test | Underlying data are continuous |
| Single sample | SS2. Wilcoxon signed-rank test | Underlying data are continuous; underlying data are symmetric |
| Two samples | TS1. Paired sign test | Data are in matched pairs; differences between matched pairs are continuous |
| Two samples | TS2. Wilcoxon matched-pairs signed-rank test | Data are in matched pairs; differences between matched pairs are continuous; differences between matched pairs are symmetric |
| Two samples | TS3. Wilcoxon rank-sum test | Two samples are independent; underlying data are symmetric |

Single Sample Test 1: Sign Test

The single-sample sign test lets us test whether the median of a population differs from a stated value. The median is not treated as a parameter, as we do not know the underlying distribution.

To perform the test, every value greater than the stated median is replaced with a “+” and every value less than the stated median is replaced with a “−”; any value equal to the stated median is discarded, and n is reduced accordingly. If the data are evenly distributed about the median, we would expect similar numbers of + and − signs.

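Effectively this is a special case of the binomial test. For n data points, under H₀ the number of + signs X follows Bi(n, 0.5), and the observed number of + signs is the test statistic. We calculate the probability that X is greater than or equal to the observed value (H₁: median greater than the stated value) or less than or equal to it (H₁: median less than the stated value). For a two-tailed test, we compare the smaller of these two tail probabilities with half the significance level.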

Worked Example. Single-sample sign test

The following dataset is believed to come from a population with median 135:

150, 130, 125, 140, 170
140, 190, 180, 175, 165
160, 130, 140, 140, 145

Perform a single-sample sign test at the 5% significance level to test this claim.
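As a check on this example, here is a minimal Python sketch of the exact sign test. It assumes a two-tailed alternative (the claim is that the median is 135, so H₁: median ≠ 135), discards any values equal to the stated median, and reports twice the smaller binomial tail as the two-tailed p-value, which is equivalent to comparing that tail with 2.5%. The function `sign_test` and its `alternative` argument are illustrative names, not a library API.

```python
from math import comb


def sign_test(data, median, alternative="two-sided"):
    """Exact single-sample sign test (illustrative sketch).

    Values equal to the stated median are discarded. Under H0 the number
    of + signs, X, follows Bi(n, 0.5).
    """
    plus = sum(1 for x in data if x > median)
    minus = sum(1 for x in data if x < median)
    n = plus + minus

    def cdf(k):
        # P(X <= k) for X ~ Bi(n, 0.5); the sum is empty (0) when k < 0
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "greater":    # H1: median > stated value
        p = 1 - cdf(plus - 1)       # P(X >= observed number of + signs)
    elif alternative == "less":     # H1: median < stated value
        p = cdf(plus)               # P(X <= observed number of + signs)
    else:                           # two-tailed: double the smaller tail
        p = min(1.0, 2 * cdf(min(plus, minus)))
    return plus, minus, p


data = [150, 130, 125, 140, 170,
        140, 190, 180, 175, 165,
        160, 130, 140, 140, 145]
plus, minus, p = sign_test(data, 135)
print(plus, minus, p)  # 12 3 0.03515625
print("Reject H0" if p < 0.05 else "Do not reject H0")
```

With 12 values above 135 and 3 below, P(X ≤ 3) = 576/32768 ≈ 0.0176 for X ~ Bi(15, 0.5), so the two-tailed p-value is about 0.035.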

If n is large, we can also use the normal distribution with the sign test, as follows:

Let S = min(number of + signs, number of − signs). Under H₀ the number of + signs follows Bi(n, 0.5), which has mean n/2 and variance n/4.

For large n (n > 10), we can use the normal approximation to the binomial with p = 0.5, comparing S with the lower tail of N(n/2, n/4). We must also use a continuity correction, as we are using a continuous distribution to approximate a discrete distribution. Since S is the smaller count, it lies in the lower tail, so our z-value is z = \frac{S - \mu + 0.5}{\sigma}, where \mu = \frac{n}{2} and \sigma = \sqrt{\frac{n}{4}}.
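The normal approximation can be sketched in the same way. This assumes the counts of + and − signs have already been found; `sign_test_z` and `normal_cdf` are hypothetical helper names, and the standard normal CDF is written with `math.erf` to avoid any third-party library.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def sign_test_z(plus, minus):
    """Normal approximation to the sign test (illustrative sketch).

    S = min(plus, minus) is compared with N(n/2, n/4); the +0.5
    continuity correction applies because S lies in the lower tail.
    """
    n = plus + minus
    s = min(plus, minus)
    mu, sigma = n / 2, sqrt(n / 4)
    z = (s - mu + 0.5) / sigma
    return z, normal_cdf(z)


# Counts from the worked example above: 12 values above 135, 3 below
z, lower_tail = sign_test_z(12, 3)
print(round(z, 3), round(lower_tail, 4))
```

For these counts n = 15 and z ≈ −2.07, which lies beyond −1.96 for a two-tailed test at the 5% level. With n this small the exact binomial calculation is also straightforward, so the approximation mainly serves as a check.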

Exercise 1

Answers to Exercise 1

Worked Solutions to Exercise 1

Single-sample Wilcoxon signed-rank Test

If the underlying data are known to be symmetric, the Wilcoxon signed-rank test is preferable, because it uses the sizes of the deviations from the stated median (through their ranks) rather than just their signs.

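We rank each data point by how far it is from the stated population median: the absolute differences are ranked from smallest (rank 1) to largest, and each rank keeps the sign of its difference (values equal to the stated median are discarded). The test statistic, T, is the smaller of the sum of the negative ranks, N, and the sum of the positive ranks, P, i.e. T = min(P, N).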

Although the data are continuous, sums of ranks can only take whole-number values, so the distribution of T is discrete. Note also that P + N = 1 + 2 + ... + n = \frac{n(n+1)}{2}, so P lies between 0 and \frac{n(n+1)}{2}. The closer the test statistic is to zero, the more the data lie on one side of the stated median, and so the stronger the evidence against H₀.

Worked Example. Single-sample Wilcoxon signed-rank Test

The weights, in kilograms, of ten randomly selected mackerel are as follows: 1.6, 1.1, 2.1, 2.4, 2.2, 2.9, 2.6, 2.3, 2.7 and 1.9.

Test at the 5% significance level whether the median weight is greater than 1.8 kg.
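A minimal sketch of the signed-rank calculation for this example. Because H₁ is that the median is greater than 1.8, small values of N (the sum of the negative ranks) count as evidence against H₀. Rather than look up a critical value, the sketch counts, for each possible rank sum, how many of the 2ⁿ equally likely sign patterns produce it. The helpers `average_ranks`, `signed_rank_sums` and `exact_lower_tail` are hypothetical names, and with tied ranks the exact tail would only be approximate.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upwards; tied values share the mean rank."""
    s = sorted(values)
    # first position of v is s.index(v); last is len(s) - 1 - s[::-1].index(v)
    return [(s.index(v) + len(s) - s[::-1].index(v) + 1) / 2 for v in values]


def signed_rank_sums(data, median):
    """Return (P, N, n): positive and negative rank sums, number of non-zero differences."""
    # Rounding removes floating-point noise in differences such as 1.9 - 1.8
    diffs = [round(x - median, 9) for x in data]
    diffs = [d for d in diffs if d != 0]          # values equal to the median are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    P = sum(r for r, d in zip(ranks, diffs) if d > 0)
    N = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return P, N, len(diffs)


def exact_lower_tail(n, t):
    """P(T <= t) under H0, where each rank 1..n is + or - with probability 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum   # counts[s] = number of subsets of the ranks with sum s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[: int(t) + 1]) / 2 ** n


weights = [1.6, 1.1, 2.1, 2.4, 2.2, 2.9, 2.6, 2.3, 2.7, 1.9]
P, N, n = signed_rank_sums(weights, 1.8)
print(P, N, exact_lower_tail(n, N))  # 46.0 9.0 0.0322265625
```

Here N = 9 and P(N ≤ 9) = 33/1024 ≈ 0.032 under H₀, which is below 0.05.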

For large n, we can also approximate the distribution of the Wilcoxon signed-rank statistic by a normal distribution. Given the statistic T = min(P, N), E(T) = \frac{n(n+1)}{4} and Var(T) = \frac{n(n+1)(2n+1)}{24}, so for large n, T \sim N\left(\frac{n(n+1)}{4}, \frac{n(n+1)(2n+1)}{24}\right).

We use a continuity correction, as we are approximating a discrete distribution with a continuous distribution. Our z-value is: z = \frac{T - \mu + 0.5}{ \sigma }

Worked Example. Single-sample Wilcoxon signed-rank Test (Normal Approximation)

In a clinical trial, the survival times, in weeks, of 19 patients with non-Hodgkin’s lymphoma are given below:

37, 54, 73, 89, 94, 110, 112, 123, 129, 132
148, 151, 173, 189, 201, 204, 213, 276, 281

Test at the 5% significance level whether the median differs from 150.
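The normal approximation for this example can be sketched as follows, assuming a two-tailed test (H₁: median ≠ 150) and dropping any zero differences. `signed_rank_z` and the other helpers are hypothetical names; the standard normal CDF again uses `math.erf`.

```python
from math import erf, sqrt


def average_ranks(values):
    """Rank from smallest (rank 1) upwards; tied values share the mean rank."""
    s = sorted(values)
    # first position of v is s.index(v); last is len(s) - 1 - s[::-1].index(v)
    return [(s.index(v) + len(s) - s[::-1].index(v) + 1) / 2 for v in values]


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def signed_rank_z(data, median):
    """Normal approximation to the Wilcoxon signed-rank test (sketch)."""
    diffs = [x - median for x in data if x != median]   # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    P = sum(r for r, d in zip(ranks, diffs) if d > 0)
    N = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    T = min(P, N)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (T - mu + 0.5) / sigma    # +0.5 continuity correction (T is in the lower tail)
    return P, N, z


times = [37, 54, 73, 89, 94, 110, 112, 123, 129, 132,
         148, 151, 173, 189, 201, 204, 213, 276, 281]
P, N, z = signed_rank_z(times, 150)
p_two_tailed = 2 * normal_cdf(-abs(z))
print(P, N, round(z, 3), round(p_two_tailed, 3))
```

For these data P = 86 and N = 104, so T = 86, μ = 95, σ² = 617.5 and z ≈ −0.34, well inside ±1.96.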

Exercise 2

Exercise 2. Answers

Exercise 2. Worked Solutions

Two Sample Tests

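The paired-sample sign test applies the sign test to the differences between matched pairs: each pair is replaced by a + or a − according to the sign of its difference (pairs with zero difference are discarded), and we test whether the median difference is zero. In other respects it works the same as the single-sample sign test.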

Worked Example – Paired-sample sign test

Data is collected on the time, in seconds, that it takes nine children to tie up their left shoelace and their right shoelace:

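| Child | Left (s) | Right (s) |
|---|---|---|
| A | 42 | 45 |
| B | 38 | 36 |
| C | 51 | 52 |
| D | 42 | 39 |
| E | 31 | 35 |
| F | 48 | 49 |
| G | 61 | 62 |
| H | 38 | 39 |
| I | 44 | 45 |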

Test at the 10% significance level whether there is a difference in the time it takes for the children to tie each shoelace.
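A minimal sketch of the paired-sample sign test for the shoelace data, assuming a two-tailed test at 10% (the question asks whether there is a difference) and discarding any pair with zero difference. Differences are taken as left minus right; the variable names are illustrative.

```python
from math import comb

left = [42, 38, 51, 42, 31, 48, 61, 38, 44]    # children A to I
right = [45, 36, 52, 39, 35, 49, 62, 39, 45]

diffs = [l - r for l, r in zip(left, right) if l != r]   # zero differences discarded
plus = sum(d > 0 for d in diffs)
minus = sum(d < 0 for d in diffs)
n = plus + minus

# P(X <= min(plus, minus)) for X ~ Bi(n, 0.5), doubled for a two-tailed test
lower_tail = sum(comb(n, i) for i in range(min(plus, minus) + 1)) / 2 ** n
p_two_tailed = min(1.0, 2 * lower_tail)
print(plus, minus, lower_tail, p_two_tailed)  # 2 7 0.08984375 0.1796875
print("Reject H0" if p_two_tailed < 0.10 else "Do not reject H0")
```

Only children B and D took longer with the left shoelace, giving 2 + signs and 7 − signs; P(X ≤ 2) = 46/512 ≈ 0.090 for X ~ Bi(9, 0.5), which exceeds 0.05 (half of 10%).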

Exercise 3

Answers to Exercise 3

Worked Solutions to Exercise 3

Wilcoxon Matched-pairs signed-rank test

If we can assume that the differences between the paired data are symmetric, then we can use the Wilcoxon matched-pairs signed-rank test. It follows the same procedure as the single-sample Wilcoxon signed-rank test, applied to the difference within each pair, and tests whether the median of the paired differences is zero.

Worked Example: Wilcoxon matched-pairs signed-rank test

An investigation is carried out into the effectiveness of two types of post-operative pain relief drug: Drug 1 and Drug 2. Seven adults agree to take Drug 1 on one day and Drug 2 on the next. The time, in hours, of pain relief is recorded.

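| Adult | Drug 1 (hours) | Drug 2 (hours) |
|---|---|---|
| A | 4.1 | 3.9 |
| B | 3.2 | 3.3 |
| C | 5.3 | 5.0 |
| D | 5.1 | 4.6 |
| E | 4.2 | 4.6 |
| F | 3.8 | 3.2 |
| G | 3.6 | 4.3 |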
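The question for this example is not stated above, so the sketch below only computes the signed-rank quantities for the differences Drug 1 − Drug 2 and the exact lower-tail probability P(T ≤ t) under H₀ (median difference zero); which tail and significance level apply depends on the hypotheses chosen. The helper names are hypothetical, and differences are rounded to remove floating-point noise.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upwards; tied values share the mean rank."""
    s = sorted(values)
    # first position of v is s.index(v); last is len(s) - 1 - s[::-1].index(v)
    return [(s.index(v) + len(s) - s[::-1].index(v) + 1) / 2 for v in values]


def exact_lower_tail(n, t):
    """P(T <= t) under H0, where each rank 1..n is + or - with probability 1/2."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum   # counts[s] = number of subsets of the ranks with sum s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[: int(t) + 1]) / 2 ** n


drug1 = [4.1, 3.2, 5.3, 5.1, 4.2, 3.8, 3.6]   # adults A to G
drug2 = [3.9, 3.3, 5.0, 4.6, 4.6, 3.2, 4.3]

# Differences Drug 1 - Drug 2, rounded to remove float noise; zeros dropped
diffs = [round(a - b, 9) for a, b in zip(drug1, drug2)]
diffs = [d for d in diffs if d != 0]
ranks = average_ranks([abs(d) for d in diffs])
P = sum(r for r, d in zip(ranks, diffs) if d > 0)
N = sum(r for r, d in zip(ranks, diffs) if d < 0)
T = min(P, N)
print(P, N, T, exact_lower_tail(len(diffs), T))  # 16.0 12.0 12.0 0.40625
```

The positive ranks (adults A, C, D, F) sum to 16 and the negative ranks (B, E, G) to 12, and 52 of the 128 sign patterns give a rank sum of 12 or less.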

Exercise 4

Answers – Exercise 4

Worked Solutions – Exercise 4

Wilcoxon rank-sum test

We can only use matched-pairs tests if the data can be paired, which requires the two groups to be the same size. If we have two independent samples, possibly of different sizes, and want to test for a difference between their medians, we can use the Wilcoxon rank-sum test, which plays a similar role to the two-sample t-test for independent samples.

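First we rank all of the data as if it came from a single population, giving tied values the mean of the ranks they would occupy. We then sum the ranks separately for each group. The rank sum of the sample with m items of data is R_m and the rank sum of the sample with n items of data is R_n (where m ≤ n). The test statistic (a little tricky to memorise) is W = \min(R_m, m(n+m+1) - R_m). The second term is the rank sum the smaller sample would have if the ranks were assigned in reverse order, since each rank r then becomes n + m + 1 − r.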

Worked Example: Wilcoxon rank-sum test

Researchers are investigating the effect of vitamin B12 on the size of the brain. A sample of males aged between 25 and 40 years is selected. Nine of them are known to have low B12 levels and seven are known to have high B12 levels. After a brain scan, the ratio of brain volume to skull capacity is recorded.

Low B12 levels: 0.795, 0.798, 0.802, 0.805, 0.806, 0.807, 0.808, 0.810, 0.812
High B12 levels: 0.786, 0.789, 0.792, 0.796, 0.799, 0.800, 0.803

Carry out a Wilcoxon rank-sum test, at the 5% significance level, to see whether the level of vitamin B12 affects the size of the brain.
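A minimal sketch of the rank-sum test for the B12 data, assuming a two-tailed test (the question asks whether B12 level affects brain size). The smaller sample (high B12, m = 7) is ranked together with the larger (low B12, n = 9), and the exact distribution of R_m is found by counting, for each possible total, the ways of choosing m of the m + n ranks. `rank_sum_lower_tail` is a hypothetical helper; with tied ranks, flooring W makes the exact tail only approximate.

```python
from math import comb, floor


def average_ranks(values):
    """Rank from smallest (rank 1) upwards; tied values share the mean rank."""
    s = sorted(values)
    # first position of v is s.index(v); last is len(s) - 1 - s[::-1].index(v)
    return [(s.index(v) + len(s) - s[::-1].index(v) + 1) / 2 for v in values]


def rank_sum_lower_tail(m, n, w):
    """Exact P(R_m <= w) under H0: all C(m+n, m) sets of ranks are equally likely."""
    N = m + n
    max_sum = sum(range(n + 1, N + 1))      # largest possible R_m
    # ways[j][s] = number of j-element subsets of the ranks processed so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for r in range(1, N + 1):
        for j in range(min(r, m), 0, -1):
            for s in range(max_sum, r - 1, -1):
                ways[j][s] += ways[j - 1][s - r]
    return sum(ways[m][: floor(w) + 1]) / comb(N, m)


low = [0.795, 0.798, 0.802, 0.805, 0.806, 0.807, 0.808, 0.810, 0.812]
high = [0.786, 0.789, 0.792, 0.796, 0.799, 0.800, 0.803]

small, large = (high, low) if len(high) <= len(low) else (low, high)
m, n = len(small), len(large)
ranks = average_ranks(small + large)         # rank the pooled data
R_m = sum(ranks[:m])                         # rank sum of the smaller sample
W = min(R_m, m * (n + m + 1) - R_m)
p_two_tailed = min(1.0, 2 * rank_sum_lower_tail(m, n, W))
print(m, n, R_m, W, round(p_two_tailed, 4))  # 7 9 36.0 36.0 0.0115
```

Here R_m = 36 and the reversed total is 7 × 17 − 36 = 83, so W = 36; 66 of the 11440 possible rank sets give R_m ≤ 36, so the two-tailed p-value is about 0.012.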

Normal approximation

If m and n are large (both ≥ 10), the distribution of W can be approximated by a normal distribution, with E(W) = \frac{m(n+m+1)}{2} and Var(W) = \frac{mn(n+m+1)}{12}. We also need a continuity correction, so our z-value is z = \frac{W - \mu + 0.5}{\sigma}.

Worked Example: Wilcoxon rank-sum test – Normal Approximation

A company is investigating a new production technique to improve the quality of camera lenses for a phone. Samples of the lenses are given to a camera expert who is asked to rank the lenses, with rank 1 being the highest quality. The expert does not know which production technique has been used.

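| Lens | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | old | new | new | old | old | new | old | new | old | old | old | new |
| Rank | 12 | 1 | 2 | 9 | 10 | 5 | 21 | 6 | 20 | 22 | 23 | 17 |

| Lens | M | N | O | P | Q | R | S | T | U | V | W | X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | new | new | old | old | old | new | old | new | old | new | new | old |
| Rank | 14 | 13 | 3 | 4 | 19 | 11 | 24 | 16 | 18 | 8 | 7 | 15 |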

Using a suitable approximation, test at the 5% significance level whether there is a difference in the quality of the production techniques.
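The normal approximation for the lens data can be sketched as below, assuming a two-tailed test (the question asks whether there is a difference) and using the expert's ranks directly, so no re-ranking is needed. The lists are transcribed from the lens tables above; the variable names are illustrative.

```python
from math import erf, sqrt

# Method and rank for lenses A to X, in order
methods = ["old", "new", "new", "old", "old", "new",
           "old", "new", "old", "old", "old", "new",
           "new", "new", "old", "old", "old", "new",
           "old", "new", "old", "new", "new", "old"]
ranks = [12, 1, 2, 9, 10, 5, 21, 6, 20, 22, 23, 17,
         14, 13, 3, 4, 19, 11, 24, 16, 18, 8, 7, 15]

new = [r for r, meth in zip(ranks, methods) if meth == "new"]
old = [r for r, meth in zip(ranks, methods) if meth == "old"]
small, large = (new, old) if len(new) <= len(old) else (old, new)
m, n = len(small), len(large)

R_m = sum(small)                          # rank sum of the smaller sample
W = min(R_m, m * (n + m + 1) - R_m)
mu = m * (n + m + 1) / 2
sigma = sqrt(m * n * (n + m + 1) / 12)
z = (W - mu + 0.5) / sigma                # +0.5 continuity correction (lower tail)
p_two_tailed = 1 + erf(-abs(z) / sqrt(2)) # equals 2 * Phi(-|z|)
print(m, n, R_m, W, round(z, 3))          # 11 13 100 100 -2.144
```

The 11 new-method lenses have rank sum 100, so W = min(100, 275 − 100) = 100, μ = 137.5, σ² = 3575/12 ≈ 297.9 and z ≈ −2.14, beyond −1.96.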

Exercise 5

Answers to Exercise 5

End of “Non-Parametric Tests” Chapter Mixed Exercises

Worked Solutions to Exercise 5

Answers to End of “Non-Parametric Tests” Chapter Mixed Exercises