HYPOTHESIS TESTING: ABOUT MORE THAN TWO (K) RELATED POPULATIONS

Repeated Measures ANOVA
Often, one wants to administer the same test to the same
subjects repeatedly over a period of time or under different
circumstances.
In essence, one is interested in examining differences within
each subject, for example, subjects' improvement over time.
Such designs are referred to as within-subjects designs or
repeated measures designs.
For example, imagine that one wants to monitor the
improvement of students' algebra skills over three months
of instruction. A standardized algebra test is administered
after one month (level 1 of the repeated measures factor),
and comparable tests are administered after two months
(level 2 of the repeated measures factor) and after three
months (level 3 of the repeated measures factor). Thus,
the repeated measures factor (Time) has three levels.
Data layout for n subjects measured at t time points:

                       Time
Subject    1      ...    j      ...    t      Mean
1          x11    ...    x1j    ...    x1t    x1.
...
i          xi1    ...    xij    ...    xit    xi.
...
n          xn1    ...    xnj    ...    xnt    xn.
Mean       x.1    ...    x.j    ...    x.t    x..
Repeated measures can occur in different ways:
• Repeated measures can be taken at different time points in
a single group.
• Repeated measures can be taken at different time points in
several groups.
Example: Temperatures of the forehead (in degrees Celsius)
measured at 30-minute intervals in a single group of subjects are
given in the table.

Subject   Time 1   Time 2   Time 3   Time 4
1         30.9     30.7     30.9     30.9
2         31.9     31.6     31.6     31.7
3         31.3     31.1     31.0     31.3
4         32.1     31.0     31.7     31.3
5         30.9     31.2     30.5     30.8
6         31.3     31.7     31.4     31.2
7         31.3     31.8     31.8     31.7
8         32.1     33.0     31.7     31.5
9         30.3     30.9     30.8     30.6
10        32.2     32.1     32.2     32.4

H0: There is no difference between time periods.
Source of variation   SS       df   MS      F      Sig.
Times                 0.178    3    0.059   0.57   0.64
Subjects              10.086   9    1.121
Error                 2.812    27   0.104

Since Sig. = 0.64 > 0.05, H0 is not rejected: there is no evidence of a
difference between time periods.
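The sums of squares in the table above can be reproduced directly from the temperature data. The following is a minimal Python sketch, not the software output used for the table; the function name `repeated_measures_anova` and the dictionary keys are illustrative choices. It computes SS, df, MS and F only. The Sig. value would additionally require the F distribution (for example from a statistics library) and is left out here.

```python
# Forehead temperatures (degrees Celsius): one row per subject, one column per time point.
data = [
    [30.9, 30.7, 30.9, 30.9],
    [31.9, 31.6, 31.6, 31.7],
    [31.3, 31.1, 31.0, 31.3],
    [32.1, 31.0, 31.7, 31.3],
    [30.9, 31.2, 30.5, 30.8],
    [31.3, 31.7, 31.4, 31.2],
    [31.3, 31.8, 31.8, 31.7],
    [32.1, 33.0, 31.7, 31.5],
    [30.3, 30.9, 30.8, 30.6],
    [32.2, 32.1, 32.2, 32.4],
]


def repeated_measures_anova(rows):
    """Partition the total variation for a single-group repeated measures design."""
    n = len(rows)       # number of subjects
    t = len(rows[0])    # number of time points
    grand_total = sum(sum(r) for r in rows)
    correction = grand_total ** 2 / (n * t)

    ss_total = sum(x * x for r in rows for x in r) - correction
    ss_subjects = sum(sum(r) ** 2 for r in rows) / t - correction
    col_totals = [sum(r[j] for r in rows) for j in range(t)]
    ss_times = sum(c * c for c in col_totals) / n - correction
    ss_error = ss_total - ss_subjects - ss_times

    df_times, df_subjects = t - 1, n - 1
    df_error = df_times * df_subjects
    ms_times = ss_times / df_times
    ms_error = ss_error / df_error
    return {
        "ss_times": ss_times, "df_times": df_times,
        "ss_subjects": ss_subjects, "df_subjects": df_subjects,
        "ss_error": ss_error, "df_error": df_error,
        "ms_times": ms_times, "ms_error": ms_error,
        "F": ms_times / ms_error,
    }


result = repeated_measures_anova(data)
for key, value in result.items():
    print(f"{key:12s} {value:8.3f}")
```

The F ratio divides the mean square for Times by the Error mean square, because the between-subject variation has already been removed from the error term.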
Example: Temperatures of the forehead (in degrees Celsius) measured at
30-minute intervals in two groups of subjects are given in the table.

Subject   Group   Time 1   Time 2   Time 3   Time 4
1         1       30.90    30.70    30.90    30.90
2         1       31.90    31.60    31.60    31.70
3         1       31.30    31.10    31.00    31.30
4         1       32.10    31.00    31.70    31.30
5         1       30.90    31.20    30.50    30.80
6         1       31.30    31.70    31.40    31.20
7         1       31.30    31.80    31.80    31.70
8         1       32.10    33.00    31.70    31.50
9         1       30.30    30.90    30.80    30.60
10        1       32.20    32.10    32.20    32.40
11        2       31.50    30.60    30.80    31.00
12        2       31.20    31.20    31.10    31.30
13        2       31.30    31.30    31.50    31.40
14        2       30.40    30.80    30.40    30.20
15        2       30.70    30.90    30.90    30.90
16        2       29.80    30.80    30.90    30.80
17        2       31.40    32.00    31.70    31.60
18        2       30.90    32.40    31.80    31.90
19        2       31.10    31.30    31.20    31.20
20        2       31.30    31.50    31.60    31.70
Three pairs of hypotheses can be tested:
1. Hypothesis on groups
2. Hypothesis on time points
3. Hypothesis on interaction

Source of variation   df   SS       MS      F       Sig.
Groups                1    1.275    1.275   1.336   0.263
Times                 3    0.41     0.137   1.412   0.249
Interaction           3    0.336    0.112   1.158   0.334
Subjects              18   17.176   0.954
Residual              54   5.231    0.097
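The decomposition behind the two-group table can be sketched in Python as follows. This is an illustrative calculation under the assumption of equal group sizes, not the output of the software that produced the table; the function name `mixed_anova` and the result keys are hypothetical. In line with the F values shown in the table, the group effect is tested against the subjects (within groups) mean square, while time and interaction are tested against the residual mean square.

```python
# One row per subject, one column per time point (degrees Celsius).
group1 = [
    [30.9, 30.7, 30.9, 30.9], [31.9, 31.6, 31.6, 31.7],
    [31.3, 31.1, 31.0, 31.3], [32.1, 31.0, 31.7, 31.3],
    [30.9, 31.2, 30.5, 30.8], [31.3, 31.7, 31.4, 31.2],
    [31.3, 31.8, 31.8, 31.7], [32.1, 33.0, 31.7, 31.5],
    [30.3, 30.9, 30.8, 30.6], [32.2, 32.1, 32.2, 32.4],
]
group2 = [
    [31.5, 30.6, 30.8, 31.0], [31.2, 31.2, 31.1, 31.3],
    [31.3, 31.3, 31.5, 31.4], [30.4, 30.8, 30.4, 30.2],
    [30.7, 30.9, 30.9, 30.9], [29.8, 30.8, 30.9, 30.8],
    [31.4, 32.0, 31.7, 31.6], [30.9, 32.4, 31.8, 31.9],
    [31.1, 31.3, 31.2, 31.2], [31.3, 31.5, 31.6, 31.7],
]


def mixed_anova(groups):
    """Sums of squares for groups x time with repeated measures on time.

    Assumes every group has the same number of subjects.
    """
    g = len(groups)            # number of groups
    n = len(groups[0])         # subjects per group
    t = len(groups[0][0])      # time points
    rows = [r for grp in groups for r in grp]
    correction = sum(sum(r) for r in rows) ** 2 / (g * n * t)

    ss_total = sum(x * x for r in rows for x in r) - correction
    ss_between = sum(sum(r) ** 2 for r in rows) / t - correction
    ss_groups = sum(sum(sum(r) for r in grp) ** 2 for grp in groups) / (n * t) - correction
    ss_subjects = ss_between - ss_groups          # subjects within groups
    time_totals = [sum(r[j] for r in rows) for j in range(t)]
    ss_times = sum(c * c for c in time_totals) / (g * n) - correction
    cells = [sum(r[j] for r in grp) for grp in groups for j in range(t)]
    ss_inter = sum(c * c for c in cells) / n - correction - ss_groups - ss_times
    ss_resid = ss_total - ss_between - ss_times - ss_inter

    df = {"groups": g - 1, "times": t - 1, "inter": (g - 1) * (t - 1),
          "subjects": g * (n - 1), "resid": g * (n - 1) * (t - 1)}
    ss = {"groups": ss_groups, "times": ss_times, "inter": ss_inter,
          "subjects": ss_subjects, "resid": ss_resid}
    ms = {key: ss[key] / df[key] for key in ss}
    f = {"groups": ms["groups"] / ms["subjects"],
         "times": ms["times"] / ms["resid"],
         "inter": ms["inter"] / ms["resid"]}
    return df, ss, ms, f


df, ss, ms, f = mixed_anova([group1, group2])
for key in ("groups", "times", "inter", "subjects", "resid"):
    print(f"{key:9s} df={df[key]:2d}  SS={ss[key]:7.3f}  MS={ms[key]:6.3f}")
print({key: round(value, 3) for key, value in f.items()})
```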
[Figure: profile plot with Time (1 to 4) on the horizontal axis, values from 30.9 to 31.6 on the vertical axis, and separate lines for Group 1 and Group 2.]
Cochran’s Q Test
Cochran’s Q test extends the McNemar test to examine
change in a dichotomous variable (0-1) across more
than two observations. It is particularly appropriate
when subjects are used as their own controls and a
dichotomous outcome variable is measured across
multiple time periods or under several types of
conditions.
$$Q = \frac{(k-1)\left[k\sum_{j=1}^{k} G_j^2 - \left(\sum_{j=1}^{k} G_j\right)^2\right]}{k\sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2}$$
where
k: # of conditions or time periods
N: # of subjects
Gj: The total # of 1s in the jth column
Li: The total # of 1s in the ith row
Post hoc tests are necessary to determine where the differences
lie.
Example: The children in the anxiety reduction intervention
groups were evaluated for the presence of a certain symptom
before and after the intervention.
Is the clinical intervention effective in reducing the symptom?
Since the pretest-posttest measure is dichotomous and the
data are paired, the use of the McNemar test is appropriate to
answer this question.

Presence of symptom
Subject   Preint.   Postint.
1         1         0
2         1         0
3         1         1
4         0         0
5         1         0
6         1         0
7         1         1
8         1         0
9         1         1
10        1         0
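As an illustration of how the McNemar test uses only the children whose status changed, here is a small Python sketch applied to the table above. It uses the exact (binomial) form of the test, which is an assumption on our part: the slides do not say which version of the test is intended and do not report its result. The function name `mcnemar_exact` is hypothetical.

```python
from math import comb

# Presence of symptom (1 = yes, 0 = no) for the 10 children.
pre = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
post = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]


def mcnemar_exact(before, after):
    """Exact two-sided McNemar test based on the discordant pairs only."""
    b = sum(1 for x, y in zip(before, after) if x == 1 and y == 0)  # symptom disappeared
    c = sum(1 for x, y in zip(before, after) if x == 0 and y == 1)  # symptom appeared
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 each discordant pair is equally likely to go either way.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


b, c, p_value = mcnemar_exact(pre, post)
print(b, c, p_value)
```

Children whose status did not change (subjects 3, 4, 7 and 9) carry no information about the direction of change and do not enter the calculation.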
Suppose that one month after the end of the intervention program a
third measurement is taken from all children. Is the proportion of
children who present the symptom (yes, no) the same across all
three data collection periods?
Since there are now three points of data collection, the McNemar
test can no longer be used. Cochran’s Q test, however, would be
appropriate.
H0: The proportion of “yes” responses with regard to the presence
of the symptom is the same across all three time periods for those
children.
Presence of symptom
Subj   Preint.   Postint.   One month later   Li
1      1         0          1                 2
2      1         0          0                 1
3      1         1          1                 3
4      0         0          0                 0
5      1         0          1                 2
6      1         0          1                 2
7      1         1          1                 3
8      1         0          0                 1
9      1         1          1                 3
10     1         0          0                 1
Gj     9         3          6                 18

$$Q = \frac{(k-1)\left[k\sum_{j=1}^{k} G_j^2 - \left(\sum_{j=1}^{k} G_j\right)^2\right]}{k\sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2} = \frac{(3-1)\left[3(9^2+3^2+6^2) - 18^2\right]}{3(2+1+\dots+1) - (2^2+1^2+\dots+1^2)} = \frac{108}{54-42} = \frac{108}{12} = 9$$

$\chi^2_{(2,\,0.05)} = 5.99 < Q_{cal}$, $p < 0.05$
Reject H0.
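The calculation of Q can be checked with a few lines of Python. This is a minimal sketch of the formula given earlier, with a hypothetical function name `cochran_q`; the critical value 5.99 is simply taken from the text rather than computed.

```python
# Presence of symptom (1 = yes, 0 = no): columns are
# preintervention, postintervention, one month later.
symptom = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]


def cochran_q(rows):
    """Cochran's Q from a subjects x conditions table of 0/1 values."""
    k = len(rows[0])                                          # number of conditions
    col_totals = [sum(r[j] for r in rows) for j in range(k)]  # G_j
    row_totals = [sum(r) for r in rows]                       # L_i
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - sum(col_totals) ** 2)
    denominator = k * sum(row_totals) - sum(l * l for l in row_totals)
    return numerator / denominator, col_totals, row_totals


q, g_totals, l_totals = cochran_q(symptom)
critical_value = 5.99  # chi-square, df = k - 1 = 2, alpha = 0.05 (value quoted in the text)
print(q, g_totals, l_totals, q > critical_value)
```

Rows that are all 0 or all 1 (subjects 3, 4, 7 and 9 here) add the same amount to both terms of the denominator, so they do not change Q.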
Friedman Test
The Friedman Test extends the Wilcoxon Signed Ranks Test
to include more than two time periods of data collection or
conditions.
$$F_r = \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)$$
Where
Rj: The sum of the ranks for column j
N: The # of subjects
k: # of time periods or conditions
This statistic is distributed as a chi-square with df=k-1.
Multiple Comparisons Test
Once a determination has been made that the overall
Friedman test is significant, post hoc tests can be
undertaken that compare the differences in average ranks for
all possible pairs to determine where the differences lie.
The null hypothesis of no differences in mean ranks of the
pairs being examined will be rejected if the absolute value of
these differences is greater than a specified critical value.
That is, we would reject the null hypothesis if the following
condition holds true:
$$\left|\bar{R}_i - \bar{R}_j\right| > z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}}$$
$\bar{R}_i$: the mean rank for time or condition i
k: the number of time periods or conditions
N: the number of cases
Example: Suppose we had collected information concerning
the 10 children’s anxiety levels not only at pretest and
immediately following the anxiety reduction intervention but
also just prior to the administration of the preoperative
medication. What are the differences in the anxiety levels of
the 10 children who took part in the intervention across the
three time periods?
H0: there will be no differences among the median anxiety
scores at preintervention, at postintervention, and at
preoperative medication for the 10 children who took part in
the intervention.
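Children’s anxiety levels
Subj     Preint   Rank   Postint   Rank   Preop   Rank
1        7        3      5         1      6       2
2        4        2      4         2      4       2
3        3        3      5         1.5    5       1.5
4        3        1      6         3      4       2
5        6        3      3         1      4       2
6        7        3      3         1      6       2
7        6        3      5         1.5    5       1.5
8        7        3      5         1      6       2
9        5        3      4         1.5    4       1.5
10       7        3      5         1      6       2
Sum Rj            27               14.5           18.5

Note: for subject 3 the preintervention score is listed as 3 but carries rank 3, which does not agree with the other two scores (5 and 5); the rank sums use the ranks as listed.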
$$F_r = \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1) = \frac{12}{10(3)(3+1)}\left(27^2 + 14.5^2 + 18.5^2\right) - 3(10)(3+1) = 8.15$$

$\chi^2_{(2,\,0.05)} = 5.99 < F_r$, $p < 0.05$
Reject H0.
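The arithmetic above can be reproduced with a short Python sketch. It works from the rank sums reported in the table (27, 14.5 and 18.5) and also shows how ranks with ties are averaged within one subject. The function names `midranks` and `friedman_statistic` are illustrative, and the critical value 5.99 is taken from the text rather than computed.

```python
def midranks(scores):
    """Rank one subject's scores from lowest to highest, averaging tied ranks."""
    ranks = []
    for s in scores:
        lower = sum(1 for other in scores if other < s)
        tied = sum(1 for other in scores if other == s)
        ranks.append(lower + (tied + 1) / 2)
    return ranks


def friedman_statistic(rank_sums, n_subjects):
    """Friedman's Fr from the column rank sums R_j and the number of subjects N."""
    k = len(rank_sums)
    return (12 / (n_subjects * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3 * n_subjects * (k + 1))


# Ranking within a subject: subject 1 (7, 5, 6) and subject 7 (6, 5, 5).
print(midranks([7, 5, 6]))
print(midranks([6, 5, 5]))

# Rank sums for preintervention, postintervention and preoperative medication.
rank_sums = [27.0, 14.5, 18.5]
fr = friedman_statistic(rank_sums, n_subjects=10)
critical_value = 5.99  # chi-square, df = k - 1 = 2, alpha = 0.05 (value quoted in the text)
print(round(fr, 2), fr > critical_value)
```

As a quick consistency check, the rank sums must add up to N k (k + 1) / 2, which is 10 x 3 x 4 / 2 = 60 here.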
$$z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}} = 2.39\sqrt{\frac{3(4)}{6(10)}} \approx 1.07$$

Groups   |R̄i - R̄j|   Critical value   Decision
1-2      1.25         > 1.07           Reject H0
1-3      0.85         < 1.07           Do not reject H0
2-3      0.40         < 1.07           Do not reject H0
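The pairwise comparisons can be sketched in Python as follows. The z value of 2.39 and the rank sums are taken from the example; the variable names are illustrative only. With these inputs the critical difference evaluates to about 1.069, that is 1.07 to two decimals.

```python
from itertools import combinations
from math import sqrt

n_subjects = 10
rank_sums = {1: 27.0, 2: 14.5, 3: 18.5}  # 1 = Preint, 2 = Postint, 3 = Preop
k = len(rank_sums)
z = 2.39  # z for alpha / (k (k - 1)) with alpha = 0.05, as quoted in the example

# Mean rank per time period and the critical difference between two mean ranks.
mean_ranks = {j: total / n_subjects for j, total in rank_sums.items()}
critical = z * sqrt(k * (k + 1) / (6 * n_subjects))

diffs = {}
decisions = {}
for i, j in combinations(sorted(mean_ranks), 2):
    diffs[(i, j)] = abs(mean_ranks[i] - mean_ranks[j])
    decisions[(i, j)] = diffs[(i, j)] > critical  # True means reject H0 for this pair
    verdict = "reject H0" if decisions[(i, j)] else "do not reject H0"
    print(f"{i}-{j}: |difference| = {diffs[(i, j)]:.2f}, critical = {critical:.3f}, {verdict}")
```

Only the preintervention versus postintervention comparison exceeds the critical difference, which matches the decisions in the table.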