CHAPTER 7 - NON-PARAMETRIC STATISTICAL METHODS

Chapter 7: Non-parametric statistical methods
The aim of this chapter is for you to appreciate that nominal and ordinal data can also be
analysed using inferential tests known as non-parametric tests.
Key learning objectives are:
• to understand what nominal and ordinal data are
• to recognise when data cannot be analysed using t-tests and ANOVA
• to understand that significance testing can be applied to a comparison of group
  frequencies when analysing nominal data
• to understand that significance testing can be applied to an analysis of ‘ranks’
  when analysing ordinal data
• to know how to assign a rank to each raw score within a data set
• to be able to identify the correct non-parametric test, given specific experimental
  circumstances
• to be able to calculate and interpret test values
• to be able to perform post-hoc analyses using nominal and ordinal data.
7.2.1 TESTS FOR NOMINAL DATA
DISCUSSION QUESTION
1. Provide real-world examples of experiments and other studies that generate
nominal data.
SUGGESTED RESPONSE
This question should be answered in accordance with the interests and experiences of the
instructor and students.
7.2.1.1 THE BINOMIAL TEST
CALCULATION QUESTION
1. Using the binomial test, are the following pairs of data significantly different
given a total sample size of 10:
(a) 6 and 4
(b) 8 and 2
You might like to determine your finding using both the formula and Table 7.2.
2. What do these results tell you about the need for a large sample size?
Instructor Resource Manual t/a Research Design and Statistics by Edwards
7-1
SUGGESTED RESPONSE
1. The binomial test:
(a)
$$z = \frac{n_A - pn}{\sqrt{npq}} = \frac{6 - 0.5 \times 10}{\sqrt{10 \times 0.5 \times 0.5}} = \frac{1}{1.58} = 0.63$$
Given that zcritical = ±1.96 (see Table 7.1) for a two-tailed test when α = .05, our value
of 0.63 is smaller than zcritical and does not fall into the critical region. So we cannot
say that there is a statistically significant difference between 6 and 4.
From Table 7.2, the probability of finding a split of 6 and 4, assuming p = q = 0.5,
is 0.377 × 2 = 0.75, since we are performing a two-tailed test. This is in accord with the
non-significant finding above. However, advanced students may notice a difference
between the probability corresponding to the z value above and the probability derived
from Table 7.2. This discrepancy occurs because the values in Table 7.2 are exact, whereas
the z score relies on the normal distribution approximating the binomial distribution.
The approximation may be poor, especially when the total number of observations is small.
(b)
$$z = \frac{n_A - pn}{\sqrt{npq}} = \frac{8 - 0.5 \times 10}{\sqrt{10 \times 0.5 \times 0.5}} = \frac{3}{1.58} = 1.90$$
Given that zcritical = +/−1.96 (see Table 7.1) for a two-tail test when α = .05 then our value
of 1.90 for z falls short of the critical region and so we cannot say that there is a
statistically significant difference between 8 and 2. This may appear surprising but is a
consequence of having only 10 occurrences in total.
From Table 7.2, the probability is 0.055 × 2 = 0.11, which is in accord with the
non-significant finding above. Again there is a small numerical difference between the
arithmetic method and the value from Table 7.2, as explained above.
2. The result for (b) is surprising given the size of the difference between 8 and 2. It
suggests that small samples demonstrate significance only when extreme differences exist
between the two outcomes. As a corollary, if small differences between the two outcomes
are to be seen as statistically significant, you need a much larger sample size.
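The two routes to an answer used above (the z formula and the exact table probability) can be checked with a short script. This is a minimal sketch, assuming p = q = 0.5 throughout; the function names are illustrative and do not come from the textbook, and the exact probability is computed directly from the binomial distribution rather than read from Table 7.2.

```python
from math import comb, sqrt


def binomial_z(n_a, n, p=0.5):
    """z score from the normal approximation: (n_A - pn) / sqrt(npq)."""
    q = 1 - p
    return (n_a - p * n) / sqrt(n * p * q)


def exact_two_tailed(n_a, n):
    """Exact two-tailed probability of a split at least this extreme.

    Assumes p = q = 0.5, so the two tails are symmetric and the
    upper-tail probability can simply be doubled.
    """
    k = max(n_a, n - n_a)  # the larger of the two counts
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


for count in (6, 8):
    z = binomial_z(count, 10)
    p_exact = exact_two_tailed(count, 10)
    print(f"{count} vs {10 - count}: z = {z:.2f}, exact two-tailed p = {p_exact:.3f}")
```

Raising the total to, say, 100 with the same 80:20 proportion is a quick way to show students how strongly the conclusion depends on sample size.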
7.2.1.2 THE CHI-SQUARE TEST
CALCULATION QUESTION
1. In educational research, it is often useful to poll students as to what facilities best
aid their learning. As such, each student in a class of 100 was asked to nominate
the learning aid that was most important to them. The results were:
Learning aid    Number of students
Lectures        35
Tutorials       65
As we can see, most students thought that tutorials were more important than lectures.
But was there a statistically significant difference between the two categories that may
sway how the subject is taught in the future?
SUGGESTED RESPONSE
$$\chi^2 = \frac{(35 - 50)^2}{50} + \frac{(65 - 50)^2}{50} = \frac{225}{50} + \frac{225}{50} = 9$$
$$df = (k - 1) = (2 - 1) = 1$$
Using the table of chi-square critical values (Table 7.3), we find χ2critical = 3.84 when α =
.05. As our value for chi-square is bigger than the critical value (i.e. 9 > 3.84), it falls into
the critical region and we can declare a significant difference between our two categories.
As such, students clearly prefer tutorials to lectures.
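The arithmetic above can be reproduced in a few lines. This is a sketch only, assuming (as the worked answer does) that the expected frequency is the same in every category; the function name is hypothetical, and the critical value of 3.84 is simply the figure quoted from Table 7.3.

```python
def chi_square_goodness_of_fit(observed):
    """Chi-square for one row of category counts with equal expected frequencies.

    Returns the chi-square value and its degrees of freedom (k - 1).
    """
    k = len(observed)
    expected = sum(observed) / k  # assumption: no preference between categories
    chi_square = sum((o - expected) ** 2 / expected for o in observed)
    return chi_square, k - 1


chi_square, df = chi_square_goodness_of_fit([35, 65])
critical = 3.84  # quoted from Table 7.3 for df = 1, alpha = .05
print(f"chi-square = {chi_square:.2f}, df = {df}, significant: {chi_square > critical}")
```

The same function accepts any number of categories, so it can be reused for later single-row chi-square questions by passing a longer list of counts.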
7.2.2 TESTS FOR ORDINAL DATA
DISCUSSION QUESTION
1. Provide pertinent real-world examples of experiments and studies that generate
ordinal data.
SUGGESTED RESPONSE
This is an open-ended question that relies on the instructor’s knowledge and the interests
of the students.
7.2.2.1 HOW TO RANK ORDINAL SCORES
CALCULATION QUESTION
1. Ranking is the basis for tests of ordinal data. Therefore:
(a) rank the following scores:
5   2   4   1   8
(b) rank the following scores:
8   13   14   11   21
SUGGESTED RESPONSE
Ranking:
(a)
Raw score    Rank
1            1
2            2
4            3
5            4
8            5
(b)
Raw score    Rank
8            1
11           2
13           3
14           4
21           5
CALCULATION QUESTION
1. Tied scores are a common problem when ranking. Therefore:
(a) rank the following scores:
2   3   5   3   7
(b) rank the following scores:
5   5   5   2   7
SUGGESTED RESPONSE
Ties:
(a) The 3’s have the ordinal positions of 2 and 3. As such, their average rank is
(2 + 3)/2 = 2.5
Raw score    Rank
2            1
3            2.5
3            2.5
5            4
7            5
(b) The 5’s have the ordinal positions of 2, 3 and 4. Therefore, their average rank is
(2 + 3 + 4)/3 = 3
Raw score    Rank
2            1
5            3
5            3
5            3
7            5
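The tie-handling rule used in these answers (each tied score receives the mean of the ordinal positions it occupies) can be sketched as a small function. The name `average_ranks` is illustrative rather than anything defined in the textbook.

```python
def average_ranks(scores):
    """Rank scores from lowest to highest, giving tied scores the mean
    of the ordinal positions they occupy. Ranks are returned in the
    same order as the input scores."""
    positions = {}
    for position, value in enumerate(sorted(scores), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[value]) / len(positions[value]) for value in scores]


print(average_ranks([2, 3, 3, 5, 7]))  # the two 3's share positions 2 and 3
print(average_ranks([2, 5, 5, 5, 7]))  # the three 5's share positions 2, 3 and 4
```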
7.2.2.2 WILCOXON SIGNED-RANKS TEST
CALCULATION QUESTIONS
1. Which of the following combinations demonstrate significance for the Wilcoxon
signed-ranks test?
(a) Tlesser = 7, Tcritical = 8
(b) Tlesser = 8, Tcritical = 5
(c) Tlesser = 7, n = 13, α = .05, appropriate for a two-tailed hypothesis
(d) Tlesser = 7, n = 13, α = .05, appropriate for a one-tailed hypothesis
(e) Tlesser = 55, n = 20, α = .05, appropriate for a two-tailed hypothesis
(f) Tlesser = 62, n = 20, α = .05, appropriate for a one-tailed hypothesis
2. A comparison of two conditions gave a set of 10 difference scores. For the
following difference scores determine if Tlesser is significant given α = .05 and the
hypothesis is two-tailed. The difference scores are:
−5   −10   −2   +1   +2   −3   +1   −4   −3   −6
SUGGESTED RESPONSE
1. Wilcoxon:
(a) Significant as Tlesser < 8
(b) Not significant as Tlesser > 5
(c) Significant as Tlesser < 17
(d) Significant as Tlesser < 21
(e) Not significant as Tlesser > 52
(f) Not significant as Tlesser > 60
2. Raw scores with corresponding ranks (scores are ranked by size, ignoring sign):
Raw score    Rank
+1           1.5
+1           1.5
+2           3.5
−2           3.5
−3           5.5
−3           5.5
−4           7
−5           8
−6           9
−10          10
Division of ranks by sign:
Ranks associated with positive scores    Ranks associated with negative scores
1.5                                      3.5
1.5                                      5.5
3.5                                      5.5
                                         7
                                         8
                                         9
                                         10
Tlesser = 6.5                            Tgreater = 48.5
As Tlesser = 6.5 is less than Tcritical = 8, we can conclude a significant difference.
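The ranking and summing steps above can be sketched in code. This is an illustrative outline rather than the textbook's procedure verbatim: the function names are hypothetical, and dropping zero differences before ranking is an assumption (a common convention) that does not matter here because no difference score is zero.

```python
def average_ranks(values):
    """Rank values from lowest to highest; ties share the mean position."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def wilcoxon_t(differences):
    """Return (T_lesser, T_greater) for a list of difference scores."""
    nonzero = [d for d in differences if d != 0]  # assumption: zeros are discarded
    ranks = average_ranks([abs(d) for d in nonzero])  # rank by size, ignoring sign
    t_positive = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_negative = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return min(t_positive, t_negative), max(t_positive, t_negative)


differences = [-5, -10, -2, 1, 2, -3, 1, -4, -3, -6]
t_lesser, t_greater = wilcoxon_t(differences)
print(f"T_lesser = {t_lesser}, T_greater = {t_greater}")
```

A useful check for students is that the two sums always add up to n(n + 1)/2, which is 55 for ten difference scores.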
DISCUSSION QUESTIONS
1. What effect do a number of tied scores have on the Wilcoxon signed-ranks test?
2. How can tied scores be avoided when designing a study?
SUGGESTED RESPONSE
1. Tied scores decrease variability in the ranks. As such, it can be more difficult to reject
the null hypothesis. If you have several ties, you can apply a correction to the test found
in Siegel and Castellan (1988).
2. One simple way is to use scales that allow for some variety in the choice of score given
by each participant. For example, you might choose a 10-point scale as opposed to a
five-point scale. It then becomes less likely that the same score will be chosen by multiple
participants, and thus the number and extent of tied scores will be decreased. In addition,
the more participants you have the more likely you are to get tied scores. Therefore, your
analysis can become a compromise between the need for a large sample to maintain
statistical power and the problem of tied scores.
7.2.2.3 MANN-WHITNEY U TEST
CALCULATION QUESTIONS
1. For the data sets below calculate the sums of ranks:
(a) Data set A:   24   23   25   27   19
(b) Data set B:   3   4   3   6   1
2. For the following data determine if there is a significant difference using the
Mann-Whitney U Test assuming α = .05 and it is a two-tailed hypothesis:
Data set A:   2   4   3   5   3
Data set B:   5   4   6   8   8
3. Will UA equal UB if nA is different to nB?
SUGGESTED RESPONSE
1.
Raw score (data set)    Rank
1 (B)                   1
3 (B)                   2.5
3 (B)                   2.5
4 (B)                   4
6 (B)                   5
19 (A)                  6
23 (A)                  7
24 (A)                  8
25 (A)                  9
27 (A)                  10
ΣRA = 40, ΣRB = 15
2. Before calculating UA and UB we need to find the sum of ranks:
Raw score (data set)    Rank
2 (A)                   1
3 (A)                   2.5
3 (A)                   2.5
4 (A)                   4.5
4 (B)                   4.5
5 (A)                   6.5
5 (B)                   6.5
6 (B)                   8
8 (B)                   9.5
8 (B)                   9.5
ΣRA = 17, ΣRB = 38
Giving:
$$U_A = (5 \times 5) + \frac{5 \times (5 + 1)}{2} - 17 = 25 + 15 - 17 = 23$$
$$U_B = (5 \times 5) + \frac{5 \times (5 + 1)}{2} - 38 = 25 + 15 - 38 = 2$$
As UB is the lesser of the two values for U, we compare it to Ucritical. Given nA =
5, nB = 5 and α = .05, Ucritical = 2 (two-tailed). As UB equals Ucritical, and the lesser
value of U is significant when it is equal to or smaller than the critical value, we can
conclude a significant difference between these two data sets.
3. UA and UB can be equal if both sets of data are equally distributed amongst the ranks.
This is most likely to occur when the size of each data set is the same.
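The rank sums and U values in the answers above can be verified with a short script. This is a sketch that follows the formula used in the worked answer, U = nA × nB + n(n + 1)/2 − ΣR for each data set in turn; the function names are illustrative, and the comparison with Ucritical still has to be made against the textbook's table.

```python
def rank_sums(a, b):
    """Rank the pooled scores (ties share the mean position) and
    return the sum of ranks for each data set."""
    positions = {}
    for position, value in enumerate(sorted(a + b), start=1):
        positions.setdefault(value, []).append(position)
    rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return sum(rank[v] for v in a), sum(rank[v] for v in b)


def mann_whitney_u(a, b):
    """Return (U_A, U_B) using U = n_A * n_B + n(n + 1)/2 - sum of ranks."""
    r_a, r_b = rank_sums(a, b)
    n_a, n_b = len(a), len(b)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    return u_a, u_b


print(rank_sums([24, 23, 25, 27, 19], [3, 4, 3, 6, 1]))     # question 1
print(mann_whitney_u([2, 4, 3, 5, 3], [5, 4, 6, 8, 8]))     # question 2
```

Because UA + UB always equals nA × nB, printing both values gives students a built-in arithmetic check.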
7.3.1.1 THE CHI-SQUARE TEST AND OPTIONS FOR A POST-HOC ANALYSIS
CALCULATION QUESTION
1. Undergraduate medical students were polled as to their prior use of alternative
therapies. The results are below:
Therapy            Number of students who have used each therapy
Acupuncture        15
Herbal remedies    17
Homeopathy         7
Voodoo             1
Using chi-square determine whether there is a preference for some forms of alternative
therapies over others.
SUGGESTED RESPONSE
$$\chi^2 = \frac{(15 - 10)^2}{10} + \frac{(17 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \frac{(1 - 10)^2}{10} = \frac{25}{10} + \frac{49}{10} + \frac{9}{10} + \frac{81}{10} = 16.4$$
$$df = (k - 1) = (4 - 1) = 3$$
Assuming α = .05, then the critical value for chi-square is 7.82. As 16.4 is greater than
7.82, then there is a statistically significant difference between the preferences shown for
the four alternative therapies.
7.3.2.1 THE KRUSKAL-WALLIS TEST AND OPTIONS FOR A
POST-HOC ANALYSIS
CALCULATION QUESTIONS
1. Calculate the sum of ranks for the following data sets:
Condition A:   2   3   1   3   2
Condition B:   4   2   5   6   8
Condition C:   7   6   9   10   11
2. Assuming four treatment groups with the following characteristics, calculate KW:
Data set A:   sum of ranks = 28, sample size = 7
Data set B:   sum of ranks = 50, sample size = 5
Data set C:   sum of ranks = 93, sample size = 6
Data set D:   sum of ranks = 180, sample size = 8
SUGGESTED RESPONSE
1. First, rank all scores:
Raw score (condition)    Rank
1 (A)                    1
2 (A)                    3
2 (A)                    3
2 (B)                    3
3 (A)                    5.5
3 (A)                    5.5
4 (B)                    7
5 (B)                    8
6 (B)                    9.5
6 (C)                    9.5
7 (C)                    11
8 (B)                    12
9 (C)                    13
10 (C)                   14
11 (C)                   15
Now place the ranks into the various conditions in place of their raw scores and then sum
the ranks:
Condition A    Condition B    Condition C
1              3              9.5
3              7              11
3              8              13
5.5            9.5            14
5.5            12             15
ΣRA = 18       ΣRB = 39.5     ΣRC = 62.5
2. KW:
$$KW = \frac{12}{n(n + 1)} \sum \left( \frac{R^2}{n_k} \right) - 3(n + 1) = \frac{12}{26 \times (26 + 1)} \left( \frac{28^2}{7} + \frac{50^2}{5} + \frac{93^2}{6} + \frac{180^2}{8} \right) - 3 \times (26 + 1) = 23.33$$
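The KW calculation can be checked from the rank sums and sample sizes alone. This is a minimal sketch of the formula as used in the worked answer, with no correction for ties; the function name is hypothetical.

```python
def kruskal_wallis(rank_sums, sample_sizes):
    """KW = 12 / (n(n + 1)) * sum(R^2 / n_k) - 3(n + 1), where n is the
    total sample size. No tie correction is applied in this sketch."""
    n = sum(sample_sizes)
    weighted = sum(r ** 2 / n_k for r, n_k in zip(rank_sums, sample_sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


kw = kruskal_wallis([28, 50, 93, 180], [7, 5, 6, 8])
print(f"KW = {kw:.2f}")
```

As a sanity check on the question data, the four rank sums add up to 351, which equals n(n + 1)/2 for n = 26.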
CALCULATION QUESTIONS
1. Using Table 7.17 calculate z, given α = .05 for:
(a) A one-tailed test of four comparisons
(b) A two-tailed test of six comparisons
2. Is it possible to calculate the minimum significant difference given that z = 2.394,
n = 24 and the size of the first treatment group under comparison is 6?
3. What is the absolute difference between these average ranks:
(a) 4.5 versus 2.4
(b) 2.4 versus 4.5
SUGGESTED RESPONSE
1.
(a) A one-tailed test of four comparisons has z = 2.241
(b) A two-tailed test of six comparisons has z = 2.638
2. No. The need for post-hoc tests suggests three or more treatment groups. You have
only accounted for 6 out of 24 participants and cannot assume that the size of the second
treatment group will also be 6. Until you know the size of the second treatment group you
cannot calculate the minimum significant difference.
3. In both instances it is 2.1, because the ‘absolute’ value is the size of the difference
irrespective of sign.
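Table 7.17 is not reproduced in this manual, but both z values quoted in answer 1 agree with normal quantiles for α divided by the number of comparisons (and by two again for a two-tailed test). The sketch below rests on that assumption, which is an inference from the two quoted values and not a statement of how the table was built; the function name is illustrative.

```python
from statistics import NormalDist


def adjusted_z(alpha, comparisons, tails):
    """z value for alpha shared equally across all comparisons and tails.

    Assumption: the table divides alpha by (comparisons * tails) and
    looks up the corresponding standard normal quantile.
    """
    per_tail_alpha = alpha / (comparisons * tails)
    return NormalDist().inv_cdf(1 - per_tail_alpha)


print(round(adjusted_z(0.05, comparisons=4, tails=1), 3))  # one-tailed, four comparisons
print(round(adjusted_z(0.05, comparisons=6, tails=2), 3))  # two-tailed, six comparisons
```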
7.3.2.2 FRIEDMAN TWO-WAY ANALYSIS BY RANKS AND
OPTIONS FOR A POST-HOC ANALYSIS
CALCULATION QUESTION
1. If Fr = 7.81, assuming four categories and α = .05, is this value for Fr significant?
SUGGESTED RESPONSE
If there are four categories, then df = 3. Using the table of critical chi-square values, we
can see that our value for Fr must be greater than 7.82 to be declared statistically
significant. However, as 7.81 is just less than 7.82 you cannot assign statistical
significance. Nevertheless, consider the effects of rounding errors etc and whether these
could influence the value for Fr. In addition, remember that null hypothesis significance
testing is a blunt instrument. You may still wish to express interest in the findings even if
the maths suggests non-significance. Finally, if you had chosen α = .1 you would have
found significance and the problem would have evaporated.
CALCULATION QUESTION
1. Find the value for q given:
(a) three possible comparisons when α = .05 and you have a two-tailed
hypothesis
(b) four possible comparisons when α = .05 and you have a one-tailed hypothesis.
SUGGESTED RESPONSE
Post-hocs:
(a) q = 2.35
(b) q = 2.16
7.4.1 THE CHI-SQUARE TEST FOR A 2x2 TABLE
CALCULATION QUESTION
1. For the following 2x2 table calculate chi-square:
                          Variable A
Variable B      Category 1    Category 2
Category 1      23            12
Category 2      11            26
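SUGGESTED RESPONSE
$$\chi^2 = \frac{n \times \left( \left| (\text{freq}_{R1C1} \times \text{freq}_{R2C2}) - (\text{freq}_{R1C2} \times \text{freq}_{R2C1}) \right| - (n/2) \right)^2}{(\text{freq}_{R1C1} + \text{freq}_{R1C2}) \times (\text{freq}_{R2C1} + \text{freq}_{R2C2}) \times (\text{freq}_{R1C1} + \text{freq}_{R2C1}) \times (\text{freq}_{R1C2} + \text{freq}_{R2C2})}$$
$$\chi^2 = \frac{72 \times \left( \left| (23 \times 26) - (12 \times 11) \right| - (72/2) \right)^2}{(23 + 12) \times (11 + 26) \times (23 + 11) \times (12 + 26)} = \frac{13312800}{1673140} = 7.96$$
$$df = (r - 1) \times (c - 1) = (2 - 1) \times (2 - 1) = 1$$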
χ2critical = 3.84, given 1 degree of freedom when α = .05. As such, our value for chi-square
is much larger than the critical value and so we can declare a statistically significant
difference. This is not surprising given the differences between and across cells in the
data table above.
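The 2x2 calculation can be reproduced with a short function. This is a sketch of the formula as applied in the worked answer, including the n/2 term subtracted from the absolute difference of the cross-products; the function and argument names are illustrative.

```python
def chi_square_2x2(r1c1, r1c2, r2c1, r2c2):
    """Chi-square for a 2x2 table, subtracting n/2 from the absolute
    difference of the cross-products before squaring."""
    n = r1c1 + r1c2 + r2c1 + r2c2
    numerator = n * (abs(r1c1 * r2c2 - r1c2 * r2c1) - n / 2) ** 2
    denominator = (r1c1 + r1c2) * (r2c1 + r2c2) * (r1c1 + r2c1) * (r1c2 + r2c2)
    return numerator / denominator


chi_square = chi_square_2x2(23, 12, 11, 26)
print(f"chi-square = {chi_square:.2f}")
```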
7.4.2 FISHER’S EXACT 2x2 TEST
CALCULATION QUESTION
1. When using Fisher’s exact 2x2 test what do the following exact probabilities
suggest when compared to α = .05:
(a) Prexact = .001
(b) Prexact = .02
(c) Prexact = .1
(d) Prexact = .35
SUGGESTED RESPONSE
Fisher’s exact 2x2 test:
(a) Prexact = .001 < .05, so is considered an ‘improbable’ test result and thus we attribute
statistical significance to it.
(b) Prexact = .02 < .05, so is considered an ‘improbable’ test result and thus we attribute
statistical significance to it.
(c) Prexact = .1 > .05, so is considered a ‘probable’ test result and thus we do not attribute
statistical significance to it.
(d) Prexact = .35 > .05, so is considered a ‘probable’ test result and thus we do not attribute
statistical significance to it.
7.4.3 THE CHI-SQUARE TEST AS APPLIED TO A 2x3 TABLE AND
OPTIONS FOR A POST-HOC ANALYSIS OF INTERACTION
CALCULATION QUESTIONS
1. For the following table of data what are the expected cell frequencies?
                                      Variable A
Variable B    Category 1            Category 2            Category 3
Category 1    observed freq. = 5    observed freq. = 8    observed freq. = 11
Category 2    observed freq. = 8    observed freq. = 4    observed freq. = 2
2. For a chi-square analysis of a 2x4 table, how many degrees of freedom are there?
3. For a chi-square analysis of a 2x4 table, how many terms of the form
$$\frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$$
will have to be summed to calculate χ2?
SUGGESTED RESPONSE
1. Using:
$$\text{expected frequency}_{\text{cell}} = \frac{\text{column total} \times \text{row total}}{\text{total sample size}}$$
We derive column and row totals noting total sample size equals 38:
                                      Variable A
Variable B       Category 1            Category 2            Category 3             Row totals
Category 1       observed freq. = 5    observed freq. = 8    observed freq. = 11    24
Category 2       observed freq. = 8    observed freq. = 4    observed freq. = 2     14
Column totals    13                    12                    13
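We now get:
                                      Variable A
Variable B    Category 1            Category 2            Category 3
Category 1    observed freq. = 5    observed freq. = 8    observed freq. = 11
              expected freq. = 8    expected freq. = 8    expected freq. = 8
Category 2    observed freq. = 8    observed freq. = 4    observed freq. = 2
              expected freq. = 5    expected freq. = 4    expected freq. = 5
The expected frequencies shown are rounded to whole numbers. Unrounded, they are 8.21,
7.58 and 8.21 for Category 1 of Variable B, and 4.79, 4.42 and 4.79 for Category 2.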
2. $df = (r - 1) \times (c - 1) = (2 - 1) \times (4 - 1) = 3$
3. 2 × 4 = 8 cells, and thus 8 terms to be summed together.
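The expected-frequency formula from answer 1 (column total × row total ÷ total sample size) and the degrees-of-freedom rule from answer 2 can be sketched for a table of any size. The function names are illustrative; the code prints unrounded expected frequencies, which round to the whole numbers given in the answer table.

```python
def expected_frequencies(table):
    """Expected frequency for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def degrees_of_freedom(rows, columns):
    """df = (r - 1) * (c - 1) for an r x c table."""
    return (rows - 1) * (columns - 1)


observed = [[5, 8, 11],   # Variable B, Category 1
            [8, 4, 2]]    # Variable B, Category 2
for row in expected_frequencies(observed):
    print([round(value, 2) for value in row])
print("df for a 2x4 table:", degrees_of_freedom(2, 4))
```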