Chapter 6 - UF Department of Statistics

Chapter 6
Inferences Regarding Locations of
Two Distributions
Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations,
treatments, conditions)
• Observed individuals from the 2 groups are samples from
distinct populations (identified by $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$)
• Measurements across groups are independent (different
individuals in the 2 groups)
• Summary statistics obtained from the 2 groups:
Group 1: Mean: $\bar{y}_1$   Std. Dev.: $s_1$   Sample Size: $n_1$
Group 2: Mean: $\bar{y}_2$   Std. Dev.: $s_2$   Sample Size: $n_2$
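Sampling Distribution of $\bar{Y}_1 - \bar{Y}_2$
• Underlying distributions normal ⇒ sampling distribution is normal
• Underlying distributions nonnormal, but large sample sizes ⇒ sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator):
$$E\left(\bar{Y}_1 - \bar{Y}_2\right) = \mu_{\bar{Y}_1 - \bar{Y}_2} = \mu_1 - \mu_2$$
$$V\left(\bar{Y}_1 - \bar{Y}_2\right) = \sigma^2_{\bar{Y}_1 - \bar{Y}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
$$\sigma_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$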
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
• Test Statistic (where $s_p^2$ is a “pooled” estimate of $\sigma^2$):
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
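The pooled test statistic can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `pooled_t` is hypothetical, and the inputs are the summary statistics of the maze-learning example worked in this chapter.

```python
import math

def pooled_t(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Return (s_p, t_obs, df) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t_obs = ((ybar1 - ybar2) - delta0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t_obs, df

# Maze-learning summary statistics: adults (group 1) vs. children (group 2).
sp, t_obs, df = pooled_t(13.28, 4.47, 14, 18.28, 9.93, 10)
print(f"s_p = {sp:.2f}, t_obs = {t_obs:.2f}, df = {df}")
```

The critical value or P-value still has to come from a t table or statistical software, since the Python standard library has no t-distribution.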
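Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
– 2-sided alternative
• If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
• If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$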
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software
Packages
– 1-sided alternative
• $P = P(t \geq t_{obs})$ (From the $t_\nu$ distribution)
– 2-sided alternative
• $P = 2P(t \geq |t_{obs}|)$ (From the $t_\nu$ distribution)
• If P-Value $\leq \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval
for $\mu_1 - \mu_2$, Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this
rule would provide an interval that contains the true parameter
value $\mu_1 - \mu_2$ if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• Interpretation (at the $\alpha$ significance level):
– If interval contains 0, do not reject $H_0: \mu_1 = \mu_2$
– If interval is strictly positive, conclude that $\mu_1 > \mu_2$
– If interval is strictly negative, conclude that $\mu_1 < \mu_2$
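The interval rule and its interpretation can be sketched in a few lines. This is an illustrative sketch only: `pooled_ci` and `interpret` are hypothetical helper names, and the critical value `t_crit` must be supplied from a t table (here 2.074 for 22 df, the value used in this chapter's maze-learning example).

```python
import math

def pooled_ci(ybar1, s1, n1, ybar2, s2, n2, t_crit):
    """Equal-variance confidence interval for mu1 - mu2, given a t critical value."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = ybar1 - ybar2
    return diff - half_width, diff + half_width

def interpret(lower, upper):
    """Apply the three interpretation rules to an interval (lower, upper)."""
    if lower > 0:
        return "conclude mu1 > mu2"
    if upper < 0:
        return "conclude mu1 < mu2"
    return "do not reject H0: mu1 = mu2"

lower, upper = pooled_ci(13.28, 4.47, 14, 18.28, 9.93, 10, t_crit=2.074)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) -> {interpret(lower, upper)}")
```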
t-test when Variances are Unequal
• Case 2: Population Variances not assumed to be equal ($\sigma_1^2 \neq \sigma_2^2$)
• Approximate degrees of freedom
– Calculated from a function of sample variances and sample sizes (see formula
below) - Satterthwaite’s approximation
– Smaller of $n_1 - 1$ and $n_2 - 1$
• Estimated standard error and test statistic for testing $H_0: \mu_1 = \mu_2$:
$$\text{Estimated standard error: } SE\left(\bar{Y}_1 - \bar{Y}_2\right) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$\text{Test Statistic: } t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{SE\left(\bar{y}_1 - \bar{y}_2\right)} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$\text{Satterthwaite's df: } \nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
Example - Maze Learning (Adults/Children)
• Groups: Adults (n1=14) / Children (n2=10)
• Outcome: Average # of Errors in Maze Learning Task
• Raw Data on next slide
                  Mean    Std Dev   Sample Size
Adults (i=1)      13.28   4.47      14
Children (i=2)    18.28   9.93      10
• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning (Adults/Children)
Name   Group   Trials   Errors   Average
H      1       41       728      17.76
W      1       25       333      13.32
Mac    1       33       453      13.73
McG    1       31       528      17.03
L      1       41       335       8.17
R      1       48       553      11.52
Hv     1       24       217       9.04
Hy     1       32       711      22.22
F      1       46       839      18.24
Wd     1       47       473      10.06
Rh     1       35       532      15.20
D      1       69       538       7.80
Hg     1       27       213       7.89
Hp     1       27       375      13.89
Hl     2       42       254       6.05
McS    2       89      1559      17.52
Lin    2       38      1089      28.66
B      2       20       254      12.70
N      2       49       599      12.22
T      2       40       520      13.00
J      2       50       828      16.56
Hz     2       40       516      12.90
Lev    2       54      2171      40.20
K      2       58      1331      22.95

Group   n    Mean    Std Dev
1       14   13.28   4.47
2       10   18.28   9.93
Example - Maze Learning
Case 1 - Equal Variances
$H_0: \mu_1 - \mu_2 = 0$   $H_A: \mu_1 - \mu_2 \neq 0$   ($\alpha = 0.05$)
$$s_p = \sqrt{\frac{(14-1)(4.47)^2 + (10-1)(9.93)^2}{14 + 10 - 2}} = \sqrt{52.15} = 7.22$$
$$TS: t_{obs} = \frac{13.28 - 18.28}{7.22\sqrt{\dfrac{1}{14} + \dfrac{1}{10}}} = \frac{-5.00}{2.99} = -1.67$$
$$RR: |t_{obs}| \geq t_{.025,22} = 2.074$$
$$P\text{-value}: 2P(T \geq |-1.67|) = .1091 \text{ (From EXCEL)}$$
$$95\%\ CI: -5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.2, 1.2)$$
No significant difference between 2 age groups
Example - Maze Learning
Case 2 - Unequal Variances
$H_0: \mu_1 - \mu_2 = 0$   $H_A: \mu_1 - \mu_2 \neq 0$   ($\alpha = 0.05$)
$$\frac{s_1^2}{n_1} = \frac{(4.47)^2}{14} = 1.43 \qquad \frac{s_2^2}{n_2} = \frac{(9.93)^2}{10} = 9.86$$
$$\nu = \frac{(1.43 + 9.86)^2}{\dfrac{(1.43)^2}{13} + \dfrac{(9.86)^2}{9}} = \frac{127.46}{10.96} = 11.63$$
$$TS: t_{obs} = \frac{13.28 - 18.28}{\sqrt{\dfrac{(4.47)^2}{14} + \dfrac{(9.93)^2}{10}}} = \frac{-5.00}{3.36} = -1.49$$
$$RR: |t_{obs}| \geq t_{.025,11.63} = 2.19$$
$$95\%\ CI: -5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36, 2.36)$$
No significant difference between 2 age groups
Note: Alternative would be to use 9 df (10-1)
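The unequal-variance calculation above can be reproduced with a short sketch. The function name `welch_t` is an assumption made for illustration; it carries full precision, so its results can differ slightly from the rounded slide values.

```python
import math

def welch_t(ybar1, s1, n1, ybar2, s2, n2):
    """Return (SE, t_obs, Satterthwaite df) without assuming equal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s_i^2 / n_i for each group
    se = math.sqrt(v1 + v2)
    t_obs = (ybar1 - ybar2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, t_obs, df

se, t_obs, df = welch_t(13.28, 4.47, 14, 18.28, 9.93, 10)
conservative_df = min(14, 10) - 1            # the simpler alternative: smaller of n1-1, n2-1
print(f"SE = {se:.2f}, t_obs = {t_obs:.2f}, df = {df:.2f} (or {conservative_df})")
```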
SPSS Output
Group Statistics
AVE_ERR
GROUP   N    Mean      Std. Deviation   Std. Error Mean
Adult   14   13.2761   4.46784          1.19408
Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047
t-test for Equality of Means:
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831
$(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
Case 1 ($\sigma_1^2 = \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad df = n_1 + n_2 - 2$$
Maze Data ($df = 22$):
$$95\%\ CI: -5.00 \pm 2.074(2.99) \equiv -5.00 \pm 6.20 \equiv (-11.2, 1.2)$$
Case 2 ($\sigma_1^2 \neq \sigma_2^2$):
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad df = \text{Satterthwaite or smaller of } n_1 - 1,\ n_2 - 1$$
Maze Data ($df = 11.63$ or could use 9):
$$95\%\ CI: -5.00 \pm 2.19(3.36) \equiv -5.00 \pm 7.36 \equiv (-12.36, 2.36)$$
Small Sample Test to Compare Two
Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Null hypothesis: Population Medians are equal H0: M1 = M2
– Rank measurements across samples from smallest (1) to
largest (n1+n2). Ties take average ranks.
– Obtain the rank sum for the group with the smaller sample size ($T$)
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T > T_U$
– Conclude $H_A: M_1 < M_2$ if $T < T_L$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $T > T_U$ or $T < T_L$
– Values of TL and TU are given in Table 5, p. 1092 for various
sample sizes and significance levels.
– This test is mathematically equivalent to Mann-Whitney U-test
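The ranking step, including average ranks for ties, can be sketched as follows. The helper names `average_ranks` and `rank_sums` are hypothetical; the data are the levocabastine AUC values from the example in this chapter, where two observations tie at 392.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(sample1, sample2):
    """Rank the pooled observations and return the rank sum of each sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[:len(sample1)]), sum(ranks[len(sample1):])

non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
t1, t2 = rank_sums(non_dialysis, hemodialysis)
print(f"T1 = {t1}, T2 = {t2}")
```

The rank sum is then compared with the tabled bounds $T_L$ and $T_U$; the sketch does not reproduce that table.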
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)
• Outcome: Levocabastine AUC (1 Outlier/Group)
AUC values with ranks in parentheses:
Non-Dialysis   Hemodialysis
857 (12)       527 (7)
567 (9)        740 (11)
626 (10)       392 (2.5)
532 (8)        514 (6)
444 (5)        433 (4)
357 (1)        392 (2.5)
T1 = 45        T2 = 33
• 2-sided Test ($\alpha = 0.05$): $T_L = 26$, $T_U = 52$, $T = 45$ (Group 1)
• Conclude Medians differ ($M_1 < M_2$) if $T < 26$
• Conclude Medians differ ($M_1 > M_2$) if $T > 52$
• Neither criterion is met, so do not conclude medians differ
Source: Zazgornik, et al (1993)
Computer Output - SPSS
Ranks
AUC
GROUP          N    Mean Rank   Sum of Ranks
Non-Dialysis   6    7.50        45.00
Hemodialysis   6    5.50        33.00
Total          12

Test Statistics (b)
                                  AUC
Mann-Whitney U                    12.000
Wilcoxon W                        33.000
Z                                 -.962
Asymp. Sig. (2-tailed)            .336
Exact Sig. [2*(1-tailed Sig.)]    .394 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Note that SPSS uses rank sum for Group 2 as test statistic
Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two
groups (let $T$ be rank sum for group 1):
$$\mu_T = \frac{n_1(N+1)}{2} \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} \qquad N = n_1 + n_2$$
• A z-statistic can be computed and P-value (approximate)
can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$
Note: When there are many ties in ranks, a more complex formula
for $\sigma_T$ is often used, see p. 321 of Longnecker and Ott.
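As a rough illustration of the normal approximation, the sketch below computes $\mu_T$, $\sigma_T$, $z_{obs}$ and a two-sided P-value with Python's standard `statistics.NormalDist`. The function name `rank_sum_z` is hypothetical, no tie correction is applied, and the inputs ($T = 158$, $n_1 = 14$, $n_2 = 10$) are the maze-learning rank sum used in this chapter.

```python
import math
from statistics import NormalDist

def rank_sum_z(T, n1, n2):
    """Normal approximation for the rank sum T of group 1 (no tie correction)."""
    N = n1 + n2
    mu_T = n1 * (N + 1) / 2
    sigma_T = math.sqrt(n1 * n2 * (N + 1) / 12)
    z = (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu_T, sigma_T, z, p_two_sided

mu_T, sigma_T, z, p = rank_sum_z(T=158, n1=14, n2=10)
print(f"mu_T = {mu_T}, sigma_T = {sigma_T:.2f}, z = {z:.4f}, P = {p:.2f}")
```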
Example - Maze Learning
Adults = Group 1
Observations ordered by average errors (Adult/Child are 0/1 group indicators):
Name   Group   Trials   Errors   Average   Rank   Adult   Child   Adult Rank   Child Rank
Hl     2       42       254       6.05     1      0       1       0            1
D      1       69       538       7.80     2      1       0       2            0
Hg     1       27       213       7.89     3      1       0       3            0
L      1       41       335       8.17     4      1       0       4            0
Hv     1       24       217       9.04     5      1       0       5            0
Wd     1       47       473      10.06     6      1       0       6            0
R      1       48       553      11.52     7      1       0       7            0
N      2       49       599      12.22     8      0       1       0            8
B      2       20       254      12.70     9      0       1       0            9
Hz     2       40       516      12.90     10     0       1       0            10
T      2       40       520      13.00     11     0       1       0            11
W      1       25       333      13.32     12     1       0       12           0
Mac    1       33       453      13.73     13     1       0       13           0
Hp     1       27       375      13.89     14     1       0       14           0
Rh     1       35       532      15.20     15     1       0       15           0
J      2       50       828      16.56     16     0       1       0            16
McG    1       31       528      17.03     17     1       0       17           0
McS    2       89      1559      17.52     18     0       1       0            18
H      1       41       728      17.76     19     1       0       19           0
F      1       46       839      18.24     20     1       0       20           0
Hy     1       32       711      22.22     21     1       0       21           0
K      2       58      1331      22.95     22     0       1       0            22
Lin    2       38      1089      28.66     23     0       1       0            23
Lev    2       54      2171      40.20     24     0       1       0            24
Rank sums: T = T1 = 158, T2 = 142
Example - Maze Learning
$H_0: M_1 = M_2$   $H_A: M_1 \neq M_2$   $\alpha = 0.05$   Group 1: Adults
$$n_1 = 14 \quad n_2 = 10 \quad N = n_1 + n_2 = 24$$
$$T = 158 \qquad \mu_T = \frac{n_1(N+1)}{2} = \frac{14(25)}{2} = 175$$
$$\sigma_T = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{14(10)(25)}{12}} = 17.08$$
$$z_{obs} = \frac{158 - 175}{17.08} = -0.9954$$
$$RR: |z_{obs}| \geq z_{\alpha/2} = 1.96$$
$$\text{2-sided P-value}: 2P(Z \geq |-.9954|) = 2(.16) = .32$$
Computer Output - SPSS
Ranks
AVE_ERR
GROUP   N    Mean Rank   Sum of Ranks
Adult   14   11.29       158.00
Child   10   14.20       142.00
Total   24

Test Statistics (b)
                                  AVE_ERR
Mann-Whitney U                    53.000
Wilcoxon W                        158.000
Z                                 -.995
Asymp. Sig. (2-tailed)            .320
Exact Sig. [2*(1-tailed Sig.)]    .341 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
Inference Based on Paired Samples
(Crossover Designs)
• Setting: Each treatment is applied to each subject or pair
(preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject
(pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$
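Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = \Delta_0$
(almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > \Delta_0$
– 2-Sided: $H_A: \mu_D \neq \Delta_0$
• Test Statistic:
$$t_{obs} = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$$
Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
1-sided alternative ($H_A: \mu_D > \Delta_0$)
If $t_{obs} \geq t_\alpha$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} < t_\alpha$ ==> Do not reject $\mu_D = \Delta_0$
2-sided alternative ($H_A: \mu_D \neq \Delta_0$)
If $t_{obs} \geq t_{\alpha/2}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} \leq -t_{\alpha/2}$ ==> Conclude $\mu_D < \Delta_0$
If $-t_{\alpha/2} < t_{obs} < t_{\alpha/2}$ ==> Do not reject $\mu_D = \Delta_0$
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2} \left(\frac{s_d}{\sqrt{n}}\right)$$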
Example Antiperspirant Formulations
• Subjects - 20 Volunteers’ armpits (df=20-1=19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
– Higher scores imply more disagreeable odor
• Summary Statistics (Raw Data on next slide):
$$\bar{d} = 0.15 \qquad s_d = 0.248 \qquad n = 20$$
Source: E. Jungermann (1974)
Example Antiperspirant Formulations
Subject   Dry Powder   Powder-in-Oil   Difference
1         2.0          1.9              0.1
2         2.8          2.4              0.4
3         1.3          1.5             -0.2
4         1.8          1.8              0.0
5         1.9          1.8              0.1
6         2.8          2.4              0.4
7         2.0          2.2             -0.2
8         1.5          1.5              0.0
9         1.9          1.7              0.2
10        2.9          2.8              0.1
11        2.9          2.7              0.2
12        2.3          1.5              0.8
13        2.3          2.5             -0.2
14        3.6          3.2              0.4
15        2.2          2.1              0.1
16        2.1          1.9              0.2
17        2.5          2.6             -0.1
18        2.4          2.0              0.4
19        3.1          2.9              0.2
20        2.0          1.9              0.1
Mean of differences: 0.15
Std Dev of differences: 0.248151058
Example Antiperspirant Formulations
$H_0: \mu_D = 0$ (No difference in formulation effects)
$H_A: \mu_D \neq 0$ (Formulation effects differ)
$$TS: t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.15}{0.248 / \sqrt{20}} = \frac{0.15}{.0555} = 2.70$$
$$RR: |t_{obs}| \geq t_{\alpha/2} = t_{.025} = 2.093$$
$$P\text{-value} = 2P(t \geq 2.70)$$
$$95\%\ CI \text{ for } \mu_D: \bar{d} \pm t_{.025} \frac{s_d}{\sqrt{n}} \equiv 0.15 \pm 2.093(.0555) \equiv 0.15 \pm 0.116 \equiv (0.034, 0.266)$$
Evidence that scores are higher (more unpleasant) for the dry
powder (formulation 1)
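The paired calculation above can be reproduced from the raw ratings. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value 2.093 (19 df) is taken from the slide rather than computed.

```python
import math
from statistics import mean, stdev

dry_powder = [2, 2.8, 1.3, 1.8, 1.9, 2.8, 2, 1.5, 1.9, 2.9,
              2.9, 2.3, 2.3, 3.6, 2.2, 2.1, 2.5, 2.4, 3.1, 2]
powder_in_oil = [1.9, 2.4, 1.5, 1.8, 1.8, 2.4, 2.2, 1.5, 1.7, 2.8,
                 2.7, 1.5, 2.5, 3.2, 2.1, 1.9, 2.6, 2, 2.9, 1.9]

# Differences d_i = Trt1 - Trt2 for each subject.
d = [a - b for a, b in zip(dry_powder, powder_in_oil)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                      # sample std. dev. (divisor n - 1)
se = s_d / math.sqrt(n)
t_obs = d_bar / se                  # test of H0: mu_D = 0

t_crit = 2.093                      # t_{.025} with 19 df, from a t table
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t_obs = {t_obs:.2f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```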
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences di (as in the paired t-test) and obtain their
absolute values (ignoring 0s). n= number of non-zero differences
– Rank the observations by |di| (smallest=1), averaging ranks for ties
– Compute T+ and T- , the rank sums for the positive and negative
differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T = T_- \leq T_0$
– 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $T = \min(T_+, T_-) \leq T_0$
– Values of T0 are given in Table 6, pp 1093-1094 for various sample
sizes and significance levels. P-values printed by statistical
software packages.
Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the
two groups:
$$\mu_T = \frac{n(n+1)}{4} \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
• A z-statistic can be computed and P-value
(approximate) can be obtained from Z-distribution
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we’ll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum Ranks for positive and negative true differences
Source: Pasman, et al (1995)
Example - Caffeine and Endurance
Original Data
Cyclist   mg13    mg5     mg13-mg5
1         37.55   42.47    -4.92
2         59.30   85.15   -25.85
3         79.12   63.20    15.92
4         58.33   52.10     6.23
5         70.54   66.20     4.34
6         69.47   73.25    -3.78
7         46.48   44.50     1.98
8         66.35   57.17     9.18
9         36.20   35.05     1.15
Example - Caffeine and Endurance
Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)
1         37.55   42.47    -4.92      4.92
2         59.30   85.15   -25.85     25.85
3         79.12   63.20    15.92     15.92
4         58.33   52.10     6.23      6.23
5         70.54   66.20     4.34      4.34
6         69.47   73.25    -3.78      3.78
7         46.48   44.50     1.98      1.98
8         66.35   57.17     9.18      9.18
9         36.20   35.05     1.15      1.15

Ranked Absolute Differences
Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
9         36.20   35.05     1.15      1.15       1
7         46.48   44.50     1.98      1.98       2
6         69.47   73.25    -3.78      3.78       3
5         70.54   66.20     4.34      4.34       4
1         37.55   42.47    -4.92      4.92       5
4         58.33   52.10     6.23      6.23       6
8         66.35   57.17     9.18      9.18       7
3         79.12   63.20    15.92     15.92       8
2         59.30   85.15   -25.85     25.85       9

T+ = 1+2+4+6+7+8 = 28
T- = 3+5+9 = 17
Example - Caffeine and Endurance
Under null hypothesis of no difference in the two groups ($T = T_+$):
$$\mu_T = \frac{n(n+1)}{4} = \frac{9(9+1)}{4} = \frac{90}{4} = 22.5$$
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{9(9+1)(18+1)}{24}} = \sqrt{\frac{1710}{24}} = 8.44$$
$$z_{obs} = \frac{T - \mu_T}{\sigma_T} = \frac{28 - 22.5}{8.44} = \frac{5.5}{8.44} = 0.65$$
$$P\text{-value}: 2P(Z \geq |0.65|) = 2(.2578) = .5156$$
There is no evidence that endurance times differ for the 2
doses (we will see later that both are higher than no dose)
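The three steps and the normal approximation can be sketched end to end. The function `signed_rank` below is a hypothetical helper, not the source's software: it drops zero differences, averages ranks for tied absolute differences, and uses $T = T_+$. Because it keeps full precision, its P-value is close to the SPSS value rather than the hand-rounded .5156.

```python
import math
from statistics import NormalDist

def signed_rank(x1, x2):
    """Return (T+, T-, z, two-sided P) for paired samples x1, x2."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]   # eliminate 0s
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1           # average rank for ties
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu_T = n * (n + 1) / 4
    sigma_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_plus - mu_T) / sigma_T
    return t_plus, t_minus, z, 2 * (1 - NormalDist().cdf(abs(z)))

mg13 = [37.55, 59.30, 79.12, 58.33, 70.54, 69.47, 46.48, 66.35, 36.20]
mg5 = [42.47, 85.15, 63.20, 52.10, 66.20, 73.25, 44.50, 57.17, 35.05]
t_plus, t_minus, z, p = signed_rank(mg13, mg5)
print(f"T+ = {t_plus}, T- = {t_minus}, z = {z:.2f}, P = {p:.3f}")
```

Ties are detected by exact equality of floating-point differences, which is adequate here but may need rounding for other data.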
SPSS Output
Ranks
MG5 - MG13       N       Mean Rank   Sum of Ranks
Negative Ranks   6 (a)   4.67        28.00
Positive Ranks   3 (b)   5.67        17.00
Ties             0 (c)
Total            9
a. MG5 < MG13
b. MG5 > MG13
c. MG5 = MG13

Test Statistics (b)
                         MG5 - MG13
Z                        -.652 (a)
Asymp. Sig. (2-tailed)   .515
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
Note that SPSS is taking MG5-MG13, while we used MG13-MG5
Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error ($E$) for
estimating $\mu_1 - \mu_2$ (Width of 95% CI will be $2E$)
– Case 1: Independent Samples (Assumes equal variances)
$$E = z_{\alpha/2}\,\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = z_{\alpha/2}\,\sigma \sqrt{\frac{2}{n}} \text{ when } n_1 = n_2 = n \ \Rightarrow\ n = \frac{2 z_{\alpha/2}^2 \sigma^2}{E^2}$$
– Case 2: Paired Samples
$$E = z_{\alpha/2}\,\sigma_d \sqrt{\frac{1}{n}} \ \Rightarrow\ n = \frac{z_{\alpha/2}^2 \sigma_d^2}{E^2}$$
In practice, the variance will need to be estimated in a pilot study or
obtained from previously conducted work.
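A small sketch of both margin-of-error formulas, rounding up to a whole number of subjects. The function names are hypothetical, and the inputs ($\sigma = 1$, $E = 0.5$) are made-up illustration values, not data from the slides.

```python
import math
from statistics import NormalDist

def n_for_margin_independent(E, sigma, conf=0.95):
    """Per-group n (n1 = n2) so the CI for mu1 - mu2 has margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return math.ceil(2 * z**2 * sigma**2 / E**2)

def n_for_margin_paired(E, sigma_d, conf=0.95):
    """Number of pairs so the CI for mu_D has margin of error E."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * sigma_d**2 / E**2)

# Hypothetical inputs: standard deviation 1.0 and desired margin 0.5.
n_indep = n_for_margin_independent(E=0.5, sigma=1.0)
n_pairs = n_for_margin_paired(E=0.5, sigma_d=1.0)
print(n_indep, n_pairs)
```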
Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of
detecting a specified difference in $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta = \mu_1 - \mu_2$
• Step 2 - Choose the desired power to detect the clinically
meaningful difference ($1-\beta$, typically at least .80). For 2-sided test:
$$\text{Independent Samples}: n_1 = n_2 = \frac{2\sigma^2 \left(z_{\alpha/2} + z_\beta\right)^2}{\Delta^2}$$
$$\text{Paired Samples}: n = \frac{\sigma_d^2 \left(z_{\alpha/2} + z_\beta\right)^2}{\Delta^2}$$
For 1-sided tests, replace $z_{\alpha/2}$ with $z_\alpha$
In practice, variance must be estimated, or $\Delta$ given in units of $\sigma$
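The power-based formulas can be sketched the same way. The function names `n_per_group` and `n_pairs_for_power` are assumptions for illustration; expressing $\Delta$ in units of $\sigma$ corresponds to passing `sigma=1.0`. The call below uses $\Delta = 0.5\sigma$, power 0.80 and $\alpha = 0.05$, the settings of the rosiglitazone example in this chapter.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Per-group sample size (n1 = n2) for independent samples."""
    z_a = NormalDist().inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_b = NormalDist().inv_cdf(power)               # z_beta, with beta = 1 - power
    return math.ceil(2 * sigma**2 * (z_a + z_b) ** 2 / delta**2)

def n_pairs_for_power(delta, sigma_d, alpha=0.05, power=0.80, two_sided=True):
    """Number of pairs for a paired design."""
    z_a = NormalDist().inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(sigma_d**2 * (z_a + z_b) ** 2 / delta**2)

n = n_per_group(delta=0.5, sigma=1.0)
print(n)
```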
Example - Rosiglitazone for HIV-1
Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta = 0.5\sigma$
• Desired Power - $1-\beta = 0.80$
• Significance Level - $\alpha = 0.05$
$$z_{\alpha/2} = 1.96 \qquad z_\beta = z_{.20} = .84$$
$$n_1 = n_2 = \frac{2(1.96 + 0.84)^2}{(0.5)^2} \approx 63$$
Source: Carr, et al (2004)
Data Sources
• Zazgornik, J., M.L. Huang, A. Van Peer, et al. (1993). “Pharmacokinetics of
Orally Administered Levocabastine in Patients with Renal Insufficiency,”
Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved
in the Maze Learning of Human Adults and Children,” Journal of Experimental
Psychology, 1:122-???
• Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and
Testing Technology,” Journal of the Society of Cosmetic Chemists 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The
Effect of Different Dosages of Caffeine on Endurance Performance Time,”
International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Crey, et al. (2004). “No Effect of Rosiglitazone for
Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial,” Lancet, 363:429-438