nonparametric tests - Winona State University

STAT 305: Chapter 15 – Nonparametric Tests
Spring 2017
In situations where the normality of the population(s) is suspect or the sample sizes are so
small that checking normality is not really feasible, it is sometimes preferable to use
nonparametric tests to make inferences about “average” value.
Wilcoxon Rank Sum Test (Mann-Whitney U Test)
This test is an alternative to the two-sample t-test for comparing the “average” value of
two populations where the samples from each population are taken independently.
The hypotheses tested can be stated as follows:
$H_0$: The distributions of population 𝑋 and population 𝑌 are identical.
If the populations are symmetric (but not necessarily normal) the null hypothesis
can be expressed in terms of the population medians as $M_X = M_Y$.
$H_a$: The distributions of population 𝑋 and population 𝑌 are different (two-tailed): $M_X \neq M_Y$
or
$H_a$: The distribution of population 𝑋 is shifted to the right of the distribution for
population 𝑌, i.e. the population 𝑋 values are generally larger than the population
𝑌 values (right-tailed): $M_X > M_Y$
or
$H_a$: The distribution of population 𝑋 is shifted to the left of the distribution for
population 𝑌, i.e. the population 𝑋 values are generally smaller than the population
𝑌 values (left-tailed): $M_X < M_Y$
The test statistic is based on the sum of the ranks assigned to the observed data from
each population when the combined sample is ranked from smallest to largest. We will
always assume that the sample size (m) from population 𝑋 is less than or equal to the
sample size (n) from population 𝑌.
Example 15.1 - Radial Lengths of Green and Red Sea Stars.
Researchers want to compare the radial lengths of two color morphs of the same species
of sea star. The red color morph is more liable to predation and, hence, those found
might be generally smaller than the green color morph.
$H_0: M_R = M_G$ vs. $H_a: M_R < M_G$
The data below are radial lengths (mm) of two independently drawn random samples of
these sea stars.
Red:   108, 64, 80, 92, 40
Green: 102, 116, 98, 132, 104, 124
The sum of the ranked lengths for the red sea stars is: _____________.
The sum of the ranked lengths for the green sea stars is: ____________.
The sum of the ranks for the red sea stars is smaller than the rank sum for the green sea
stars but this would be expected even if the null hypothesis were true. Why?
The test statistic, $W_R$, is the sum of the ranks for population R (red sea stars). Use the
Wilcoxon Rank Sum Test tables to determine whether or not to reject the null hypothesis. Intuitively
we will reject the null hypothesis if the sum of the ranked radial lengths for the red sea
stars is “small”. The table tells us what “small” is for a given significance level ($\alpha$).
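The ranking step described above can be sketched in a few lines of Python. This is only an illustration, not part of the JMP workflow used in this handout; the function name `rank_sums` is made up for the example, and the code assumes there are no tied values (true for these data).

```python
# Radial lengths (mm) from Example 15.1.
red = [108, 64, 80, 92, 40]
green = [102, 116, 98, 132, 104, 124]

def rank_sums(x, y):
    """Rank the combined sample (1 = smallest) and return each group's rank sum.

    Hypothetical helper for illustration; assumes no tied values.
    """
    combined = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(combined)}
    return sum(rank[v] for v in x), sum(rank[v] for v in y)

w_red, w_green = rank_sums(red, green)

# If the null hypothesis is true, each observation's expected rank is (N + 1) / 2,
# so a group of size m has expected rank sum m * (N + 1) / 2.
m, n = len(red), len(green)
N = m + n
expected_red = m * (N + 1) / 2
expected_green = n * (N + 1) / 2

print("rank sums:", w_red, w_green)
print("expected under H0:", expected_red, expected_green)
```

Because the red sample is the smaller one, its expected rank sum is smaller even when the two populations are identical, which is why the raw rank sums should be judged against the table rather than against each other.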
For m = 5 and n = 6 we find the following in the Lower Tail table.
For m = 5 and n = 6 we find the following in the Upper Tail table.
The table says we will reject the null at the $\alpha = .05$ level if:
$W_R \le 20$ for $H_a: M_R < M_G$
$W_R \ge 40$ for $H_a: M_R > M_G$
$W_R \le 18$ or $W_R \ge 42$ for $H_a: M_R \neq M_G$
Here $W_R = 18 \le 20$, so we reject the null hypothesis in favor of $H_a: M_R < M_G$.
Thus we have evidence to conclude that red sea stars are generally smaller than green sea
stars in terms of radial length (p < .05).
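As a rough check on what the lower-tail table encodes, the exact null distribution of the rank sum can be enumerated directly: under the null hypothesis every set of 5 ranks out of 1 to 11 is equally likely for the red sample. The sketch below is an illustration under that assumption (no ties), not the method used to build the handout's tables; the helper `lower_tail` is hypothetical.

```python
from itertools import combinations

m, n = 5, 6        # red and green sample sizes
w_observed = 18    # rank sum for the red sea stars

# All equally likely rank sums for the smaller sample under H0.
rank_sums = [sum(c) for c in combinations(range(1, m + n + 1), m)]
total = len(rank_sums)

def lower_tail(w):
    """Exact P(W_R <= w) under H0 (hypothetical helper for illustration)."""
    return sum(s <= w for s in rank_sums) / total

p_exact = lower_tail(w_observed)

# Largest cutoff whose lower-tail probability does not exceed .05.
critical = max(
    w for w in range(min(rank_sums), max(rank_sums) + 1)
    if lower_tail(w) <= 0.05
)

print("number of rank assignments:", total)
print("exact left-tailed p-value:", round(p_exact, 4))
print("left-tailed .05 critical value:", critical)
```

The enumeration gives the left-tailed cutoff of 20 quoted above and an exact p-value of about .015 for the observed rank sum.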
In JMP: Data Table and Results (screenshots not reproduced).
Normal Approximation to Wilcoxon Rank Sum Test
If one of our sample sizes exceeds 12 we cannot use the table in the back of this handout to
find our critical value. When this is the case we can use the normal approximation
approach to find an approximate p-value.
The normal approximation test statistic is
$$z = \frac{W - \mu_W}{\sigma_W}, \quad \text{where } \mu_W = \frac{n_R (n_R + n_G + 1)}{2} \text{ and } \sigma_W = \sqrt{\frac{n_R n_G (n_R + n_G + 1)}{12}}$$
Here we have $W = W_R = 18$,
$$\mu_W = \frac{5(5 + 6 + 1)}{2} = 30, \qquad \sigma_W = \sqrt{\frac{5 \cdot 6 (5 + 6 + 1)}{12}} = 5.48$$
$$z = \frac{18 - 30}{5.48} = -2.19$$
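The same arithmetic, together with the left-tailed p-value that would otherwise be read from the standard normal table, can be sketched as follows. The helper `normal_cdf` is a hypothetical name built on `math.erf`, and no continuity correction is applied, in keeping with the formula above.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal P(Z <= z), written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n_r, n_g = 5, 6   # sample sizes for the red and green sea stars
w = 18            # observed rank sum W_R

mu_w = n_r * (n_r + n_g + 1) / 2
sigma_w = sqrt(n_r * n_g * (n_r + n_g + 1) / 12)
z = (w - mu_w) / sigma_w

# Left-tailed alternative (red generally smaller), so use the lower tail.
p_left = normal_cdf(z)

print("mu_W =", mu_w, " sigma_W =", round(sigma_w, 2))
print("z =", round(z, 2), " approximate p-value =", round(p_left, 4))
```

With samples this small the table (or an exact calculation) is preferable; the approximation is shown here only to illustrate the formula.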
Wilcoxon Signed Rank Test
This test is an alternative to the paired t-test, used when we do not wish to
assume that the population of paired differences is normally distributed. As with the
Mann-Whitney U test, the Wilcoxon Signed-Rank Test uses ranks, here the ranks of the
absolute paired differences, rather than the actual values.
Example 15.2 - Effect of Togetherness on the Heart Rate of Rats
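Rat   Alone Rate (A_i)   Together Rate (T_i)   d_i = T_i - A_i   Sign (+/-)   |d_i|   Rank |d_i|   Signed Rank
 1         463                 523                  60            ____       ____      ____         ____
 2         462                 494                  32            ____       ____      ____         ____
 3         462                 461                  -1            ____       ____      ____         ____
 4         456                 535                  79            ____       ____      ____         ____
 5         450                 476                  26            ____       ____      ____         ____
 6         426                 454                  28            ____       ____      ____         ____
 7         418                 448                  30            ____       ____      ____         ____
 8         415                 408                  -7            ____       ____      ____         ____
 9         409                 470                  61            ____       ____      ____         ____
10         402                 437                  35            ____       ____      ____         ____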
We then calculate $W_+$ = the sum of the positive signed ranks = _______________
and $W_-$ = the sum of the negative signed ranks (in absolute value) = _______________
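The signed-rank bookkeeping for the table above can be sketched in Python as a way to check hand calculations. This is an illustration only; it assumes no zero differences and no tied absolute differences, which holds for these data.

```python
# Heart rates from Example 15.2, in rat order 1..10.
alone = [463, 462, 462, 456, 450, 426, 418, 415, 409, 402]
together = [523, 494, 461, 535, 476, 454, 448, 408, 470, 437]

# Paired differences d_i = T_i - A_i.
d = [t - a for t, a in zip(together, alone)]

# Rank |d_i| from smallest (rank 1) to largest.
# Assumes no zeros and no ties among the |d_i|.
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0] * len(d)
for r, i in enumerate(order, start=1):
    ranks[i] = r

# Sum the ranks separately for positive and negative differences.
w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
w_minus = sum(r for r, di in zip(ranks, d) if di < 0)

print("differences:", d)
print("ranks of |d|:", ranks)
print("W+ =", w_plus, " W- =", w_minus)
```

A quick sanity check is that the two sums add up to n(n+1)/2, the total of the ranks 1 through n.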
The hypotheses can be stated in terms of the median of the paired differences if the
distribution of the paired differences is reasonably symmetric. Listed below are the
hypotheses along with the test statistic based on the signed rank sums used to test it.
$H_0: M_d = 0$ vs. $H_a: M_d \neq 0$ (two-tailed)
Test statistic $W = \min(W_-, W_+)$
$H_0: M_d = 0$ vs. $H_a: M_d > 0$ (right-tailed)
Test statistic $W = W_-$
$H_0: M_d = 0$ vs. $H_a: M_d < 0$ (left-tailed)
Test statistic $W = W_+$
For this example, if we had originally hypothesized that the heart rate of a rat will increase
when it is placed in a social environment then we have the right-tailed alternative and our
test statistic is W = _______.
The Wilcoxon Signed-Rank Test table gives critical values to which we can compare our observed
test statistic value W for a given sample size, i.e. number of pairs, n.
Because our observed test statistic value is less than the table value we reject the null
hypothesis and conclude that the heart rate of a rat will generally increase when it is taken
from solitary confinement and placed in a social environment with other rats.
In JMP
Select Analyze > Distribution > Test Mean, enter 0 for the hypothesized value, and
check the nonparametric test box. The p-value for the right-tailed test has been
highlighted.
The test statistic reported by JMP = $\frac{W_+ - W_-}{2}$. Why? I don’t know, but we only need
the p-value anyway (p = .0049).
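One way to read the JMP statistic: since $W_+ + W_- = n(n+1)/2$, the quantity $(W_+ - W_-)/2$ equals $W_+ - n(n+1)/4$, i.e. $W_+$ measured from its mean under the null hypothesis. The sketch below checks this identity and reproduces the right-tailed p-value by enumerating all sign patterns. It is an illustration, not a description of JMP's internals; the values $W_+ = 52$ and $W_- = 3$ are what ranking the $|d_i|$ in Example 15.2 gives, and the code assumes no ties or zero differences.

```python
from itertools import product

n = 10
w_plus, w_minus = 52, 3   # signed-rank sums for Example 15.2

jmp_stat = (w_plus - w_minus) / 2      # statistic in the form JMP reports
centered = w_plus - n * (n + 1) / 4    # W+ minus its mean under H0

# Under H0 each rank 1..n is attached to a negative difference with
# probability 1/2, independently. Count the sign patterns whose negative
# rank sum is as small as, or smaller than, the one observed.
count = 0
for signs in product([0, 1], repeat=n):
    neg_sum = sum(rank for rank, s in zip(range(1, n + 1), signs) if s == 1)
    if neg_sum <= w_minus:
        count += 1

p_right = count / 2 ** n

print("JMP-style statistic:", jmp_stat, " centered W+:", centered)
print("exact right-tailed p-value:", round(p_right, 4))
```

The exact p-value from this enumeration rounds to .0049, in agreement with the JMP output quoted above.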
Conclusion:
Normal Approximation to Wilcoxon Signed-Rank Test
If n > 12 we can use a z-statistic and find the p-value from the standard normal table:
$$z_W = \frac{W - \mu_W}{\sigma_W}, \quad \text{where } \mu_W = \frac{n(n + 1)}{4} \text{ and } \sigma_W = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}.$$
Here we have n = 10 so we don’t need to, but we can use the above approximation as
follows:
$$\mu_W = \frac{n(n + 1)}{4} = \frac{10(10 + 1)}{4} = 27.5$$
$$\sigma_W = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{\frac{10(11)(21)}{24}} = 9.81$$
Thus our z-statistic is
$$z_W = \frac{W - \mu_W}{\sigma_W} =$$ _______
Now we find the p-value using the standard normal table.
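A minimal sketch of this calculation in Python is given below as a way to check the blank above. It assumes the right-tailed setup of Example 15.2, so the test statistic is taken to be $W = W_- = 3$ (the value obtained by ranking the $|d_i|$), and small values of W are evidence against the null hypothesis. The helper `normal_cdf` is a hypothetical name; no continuity correction is applied.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal P(Z <= z), written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 10   # number of pairs
w = 3    # ASSUMPTION: W = W- for the right-tailed alternative in Example 15.2

mu_w = n * (n + 1) / 4
sigma_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z_w = (w - mu_w) / sigma_w

# Small W supports the alternative, so the p-value is the lower tail.
p_value = normal_cdf(z_w)

print("mu_W =", mu_w, " sigma_W =", round(sigma_w, 2))
print("z_W =", round(z_w, 2), " approximate p-value =", round(p_value, 4))
```

The approximate p-value is close to, but not the same as, the exact value reported by JMP, which is expected with only 10 pairs.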
Nonparametric Approach to One-way ANOVA: Kruskal-Wallis Test
If the normality assumption is suspect or the sample sizes from each of the k populations
are too small to assess normality we can use the Kruskal-Wallis Test to compare the size
of the values drawn from the different populations. There are two basic assumptions for
the Kruskal-Wallis test:
1) The samples from the k populations are independently drawn.
2) The null hypothesis is that all k populations are identical in shape, with the
only potential difference being in the location of the typical values (medians).
Hypotheses:
$H_0$: All k populations have the same median or location.
$H_a$: At least one of the populations has a median different from the others
or
At least one population is shifted away from the others.
To perform the test we rank all of the data from smallest to largest and compute the
rank sum $R_i$ for each of the k samples. The test statistic looks at the difference between the
average rank for each group, $R_i / n_i$, and the average rank for all observations, $(N + 1)/2$,
where $N = n_1 + \dots + n_k$ is the total number of observations. If there
are differences in the populations we expect some groups to have an average rank much
larger than the average rank for all observations and some to have smaller average ranks.
$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} n_i \left( \frac{R_i}{n_i} - \frac{N + 1}{2} \right)^2 \sim \chi^2_{k-1} \quad \text{(Chi-square distribution with df = k-1)}$$
The larger H is the stronger the evidence we have against the null hypothesis that the
populations have the same location/median. Large values of H lead to small p-values!
Example 15.3 - Movement of Gastropods (Austrocochlea obtusa)
Preliminary observations on North Stradbroke Island indicated that the gastropod Austrocochlea obtusa
preferred the zone just below the mean tide line. In an experiment to test this, A. obtusa were collected,
marked, and placed either 7.5 m above this zone (Upper Shore), 7.5 m below this zone (Lower Shore), or
back in the original area (Control). After two tidal cycles, the snails were recaptured. The distance each
had moved (in cm) from where it had been placed was recorded. Is there a significant difference among the
median distances moved by the three groups?
Enter these data into two columns, one denoting the group and the other containing the recorded
movement for each snail.
$R_1 = 84$, $R_2 = 79$, $R_3 = 162$ and $H = 7.25$ (p-value = .0287).
We have evidence to suggest that the movement distances significantly differ between the groups
(𝑝 = .0287).
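The value of H can be checked from the rank sums alone. The sketch below is an illustration with a hypothetical helper `kruskal_wallis_h`; the group sizes 7, 9, and 9 are not stated with the rank sums and are an assumption inferred from N = 25 and the average ranks 12.0 and 18.0 used in the pairwise comparison for this example. No adjustment for tied observations is made.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from group rank sums R_i and group sizes n_i.

    Hypothetical helper for illustration; no tie correction.
    """
    N = sum(sizes)
    overall_mean_rank = (N + 1) / 2
    total = sum(
        n_i * (r_i / n_i - overall_mean_rank) ** 2
        for r_i, n_i in zip(rank_sums, sizes)
    )
    return 12 / (N * (N + 1)) * total

rank_sums = [84, 79, 162]   # R_1, R_2, R_3 from the JMP output
sizes = [7, 9, 9]           # ASSUMPTION: inferred group sizes (N = 25)

# The rank sums must add up to 1 + 2 + ... + N.
N = sum(sizes)
assert sum(rank_sums) == N * (N + 1) // 2

h = kruskal_wallis_h(rank_sums, sizes)
print("H =", round(h, 2))
```

Under these assumed group sizes the formula reproduces the reported H to two decimal places.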
Multiple Comparisons
To determine which groups significantly differ we can use the procedure outlined on pgs. 213-215 of the
textbook. To determine if group i significantly differs from group j we compute
$$z_{ij} = \frac{\dfrac{R_i}{n_i} - \dfrac{R_j}{n_j}}{\sqrt{\dfrac{N(N + 1)}{12} \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}}$$
and then compute p-value = $P(Z > z_{ij})$.
If the p-value is less than $\frac{\alpha}{2m}$ we conclude that groups i and j significantly differ, where
m = # of pair-wise comparisons to be made, which would typically be $\binom{k}{2}$ if all pair-wise
comparisons are of interest. For this example, we can make a total of $m = \binom{3}{2} = 3$
pair-wise comparisons so we compare our p-values to $\frac{.05}{2(3)} = .00833$.
Comparing Upper Shore vs. Control
$$z_{13} = \frac{18.0 - 12.0}{\sqrt{\dfrac{25(26)}{12} \left( \dfrac{1}{7} + \dfrac{1}{9} \right)}} = 1.618 \;\Rightarrow\; P(Z > 1.62) = .0526 > .00833$$
so we fail to conclude these locations significantly differ in terms of gastropod movement. Similar
comparisons show the only significant difference in movement is between lower and upper shore.
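All three pairwise comparisons can be sketched together as below. This is an illustration, not the textbook's procedure verbatim: the pairing of rank sums and sample sizes with group names is an assumption inferred from the numbers in the Upper Shore vs. Control comparison, the absolute difference in average ranks is used so that the order of the two groups does not matter, and `upper_tail` is a hypothetical helper built on `math.erf`.

```python
from math import sqrt, erf
from itertools import combinations

def upper_tail(z):
    """Standard normal P(Z > z), written with the error function."""
    return 0.5 * (1 - erf(z / sqrt(2)))

# ASSUMPTION: (rank sum, sample size) per group, inferred from the example.
groups = {"Control": (84, 7), "Lower": (79, 9), "Upper": (162, 9)}
N = sum(size for _, size in groups.values())

alpha = 0.05
m = 3                       # number of pairwise comparisons, C(3, 2)
cutoff = alpha / (2 * m)    # .00833

results = {}
for a, b in combinations(groups, 2):
    (r_a, n_a), (r_b, n_b) = groups[a], groups[b]
    se = sqrt(N * (N + 1) / 12 * (1 / n_a + 1 / n_b))
    z = abs(r_a / n_a - r_b / n_b) / se
    p = upper_tail(z)
    results[(a, b)] = (z, p, p < cutoff)
    print(f"{a} vs {b}: z = {z:.3f}, p = {p:.4f}, significant = {p < cutoff}")
```

The p-values here use the unrounded z, so they can differ slightly in the last digit from values read off a normal table at z rounded to two decimals.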
In JMP, there are multiple comparison procedures for comparing populations pairwise.
The Steel-Dwass method is the nonparametric equivalent to Tukey’s HSD from one-way
ANOVA. The Wilcoxon Each Pair option does all possible comparisons using the
Wilcoxon Rank Sum test but does not control for the experiment-wise error rate. The
Wilcoxon Each Pair option is fine to use if the number of groups is small, e.g. 𝑘 = 3
here.
Results of the Steel-Dwass method for the gastropod movement study.
Wilcoxon Rank Sum Test Tables