Chapter 6
Inferences Regarding Locations of Two Distributions

Comparing 2 Means - Independent Samples
• Goal: Compare responses between 2 groups (populations, treatments, conditions)
• Observed individuals from the 2 groups are samples from distinct populations (identified by $(\mu_1,\sigma_1)$ and $(\mu_2,\sigma_2)$)
• Measurements across groups are independent (different individuals in the 2 groups)
• Summary statistics obtained from the 2 groups:

             Mean          Std. Dev.   Sample Size
  Group 1    $\bar{y}_1$   $s_1$       $n_1$
  Group 2    $\bar{y}_2$   $s_2$       $n_2$

Sampling Distribution of $\bar{Y}_1-\bar{Y}_2$
• Underlying distributions normal ==> sampling distribution is normal
• Underlying distributions nonnormal, but large sample sizes ==> sampling distribution approximately normal
• Mean, variance, standard error (Std. Dev. of estimator):

$$E\left(\bar{Y}_1-\bar{Y}_2\right)=\mu_{\bar{Y}_1-\bar{Y}_2}=\mu_1-\mu_2$$
$$V\left(\bar{Y}_1-\bar{Y}_2\right)=\sigma^2_{\bar{Y}_1-\bar{Y}_2}=\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$$
$$\sigma_{\bar{Y}_1-\bar{Y}_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Small-Sample Test for $\mu_1-\mu_2$ - Normal Populations
• Case 1: Common Variances ($\sigma_1^2=\sigma_2^2=\sigma^2$)
• Null Hypothesis: $H_0: \mu_1-\mu_2=\Delta_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1-\mu_2>\Delta_0$
– 2-Sided: $H_A: \mu_1-\mu_2\neq\Delta_0$
• Test Statistic (where $s_p^2$ is a "pooled" estimate of $\sigma^2$):

$$t_{obs}=\frac{(\bar{y}_1-\bar{y}_2)-\Delta_0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \qquad s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}$$

• Decision Rule (based on t-distribution with $\nu=n_1+n_2-2$ df):
– 1-sided alternative
  • If $t_{obs}\ge t_{\alpha,\nu}$ ==> Conclude $\mu_1-\mu_2>\Delta_0$
  • If $t_{obs}<t_{\alpha,\nu}$ ==> Do not reject $\mu_1-\mu_2=\Delta_0$
– 2-sided alternative
  • If $t_{obs}\ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2>\Delta_0$
  • If $t_{obs}\le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1-\mu_2<\Delta_0$
  • If $-t_{\alpha/2,\nu}<t_{obs}<t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1-\mu_2=\Delta_0$
• Observed Significance Level (P-Value)
• Special tables needed; printed by statistical software packages
– 1-sided alternative
  • $P=P(t\ge t_{obs})$ (from the $t_\nu$ distribution)
– 2-sided alternative
  • $P=2P(t\ge |t_{obs}|)$ (from the $t_\nu$ distribution)
• If P-value $\le\alpha$, then reject the null hypothesis

Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1-\mu_2$ - Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value $\mu_1-\mu_2$ if it were applied over all possible samples
• Rule:

$$(\bar{y}_1-\bar{y}_2)\pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

• Interpretation (at the $\alpha$ significance level):
– If interval contains 0, do not reject $H_0: \mu_1=\mu_2$
– If interval is strictly positive, conclude that $\mu_1>\mu_2$
– If interval is strictly negative, conclude that $\mu_1<\mu_2$

t-test when Variances are Unequal
• Case 2: Population variances not assumed to be equal ($\sigma_1^2\neq\sigma_2^2$)
• Approximate degrees of freedom, using either:
– A function of the sample variances and sample sizes (see formula below) - Satterthwaite's approximation
– The smaller of $n_1-1$ and $n_2-1$
• Estimated standard error and test statistic for testing $H_0: \mu_1=\mu_2$:

$$\text{Estimated standard error:}\quad SE_{\bar{Y}_1-\bar{Y}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$
$$\text{Test statistic:}\quad t_{obs}=\frac{\bar{y}_1-\bar{y}_2}{SE_{\bar{Y}_1-\bar{Y}_2}}=\frac{\bar{y}_1-\bar{y}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$
$$\text{Satterthwaite's df:}\quad \nu^*=\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$

Example - Maze Learning (Adults/Children)
• Groups: Adults ($n_1=14$) / Children ($n_2=10$)
• Outcome: Average # of Errors in Maze Learning Task
• Raw data in the table below

                   Mean     Std Dev   Sample Size
  Adults (i=1)     13.28    4.47      14
  Children (i=2)   18.28    9.93      10

• Conduct a 2-sided test of whether true mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)

  Name   Group   Trials   Errors   Average
  H      1       41       728      17.76
  W      1       25       333      13.32
  Mac    1       33       453      13.73
  McG    1       31       528      17.03
  L      1       41       335       8.17
  R      1       48       553      11.52
  Hv     1       24       217       9.04
  Hy     1       32       711      22.22
  F      1       46       839      18.24
  Wd     1       47       473      10.06
  Rh     1       35       532      15.20
  D      1       69       538       7.80
  Hg     1       27       213       7.89
  Hp     1       27       375      13.89
  Hl     2       42       254       6.05
  McS    2       89      1559      17.52
  Lin    2       38      1089      28.66
  B      2       20       254      12.70
  N      2       49       599      12.22
  T      2       40       520      13.00
  J      2       50       828      16.56
  Hz     2       40       516      12.90
  Lev    2       54      2171      40.20
  K      2       58      1331      22.95

  Group   n    Mean    Std Dev
  1       14   13.28   4.47
  2       10   18.28   9.93

Example - Maze Learning, Case 1 - Equal Variances
$H_0: \mu_1-\mu_2=0$   $H_A: \mu_1-\mu_2\neq 0$   ($\alpha=0.05$)

$$s_p=\sqrt{\frac{(14-1)(4.47)^2+(10-1)(9.93)^2}{14+10-2}}=\sqrt{52.15}=7.22$$
$$TS:\ t_{obs}=\frac{13.28-18.28}{7.22\sqrt{\frac{1}{14}+\frac{1}{10}}}=\frac{-5.00}{2.99}=-1.67$$
$$RR:\ |t_{obs}|\ge t_{.025,22}=2.074$$
$$P\text{-value}:\ 2P(T\ge|-1.67|)=.1091\quad\text{(from EXCEL)}$$
$$95\%\ CI:\ -5.00\pm 2.074(2.99)\equiv -5.00\pm 6.20\equiv(-11.2,\ 1.2)$$

No significant difference between the 2 age groups

Example - Maze Learning, Case 2 - Unequal Variances
$H_0: \mu_1-\mu_2=0$   $H_A: \mu_1-\mu_2\neq 0$   ($\alpha=0.05$)

$$\frac{s_1^2}{n_1}=\frac{(4.47)^2}{14}=1.43\qquad\frac{s_2^2}{n_2}=\frac{(9.93)^2}{10}=9.86$$
$$\nu^*=\frac{(1.43+9.86)^2}{\frac{(1.43)^2}{13}+\frac{(9.86)^2}{9}}=\frac{127.46}{10.96}=11.63$$
$$TS:\ t_{obs}=\frac{13.28-18.28}{\sqrt{\frac{(4.47)^2}{14}+\frac{(9.93)^2}{10}}}=\frac{-5.00}{3.36}=-1.49$$
$$RR:\ |t_{obs}|\ge t_{.025,11.63}=2.19$$
$$95\%\ CI:\ -5.00\pm 2.19(3.36)\equiv -5.00\pm 7.36\equiv(-12.36,\ 2.36)$$

No significant difference between the 2 age groups
Note: Alternative would be to use 9 df (10-1)

SPSS Output

Group Statistics (AVE_ERR)
  GROUP   N    Mean      Std. Deviation   Std. Error Mean
  Adult   14   13.2761   4.46784          1.19408
  Child   10   18.2759   9.93279          3.14102

Independent Samples Test (AVE_ERR)
Levene's Test for Equality of Variances: F = 4.420, Sig. = .047
t-test for Equality of Means:
                                t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
  Equal variances assumed       -1.672   22       .109              -4.9998           2.99017                 -11.20101      1.20145
  Equal variances not assumed   -1.488   11.621   .163              -4.9998           3.36034                 -12.34787      2.34831

$(1-\alpha)100\%$ Confidence Interval for $\mu_1-\mu_2$
Case 1 ($\sigma_1^2=\sigma_2^2$):

$$(\bar{y}_1-\bar{y}_2)\pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\qquad df=n_1+n_2-2$$

Maze Data (df = 22): 95% CI: $-5.00\pm 2.074(2.99)\equiv -5.00\pm 6.20\equiv(-11.2,\ 1.2)$

Case 2 ($\sigma_1^2\neq\sigma_2^2$):

$$(\bar{y}_1-\bar{y}_2)\pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\qquad df=\text{Satterthwaite, or smaller of } n_1-1,\ n_2-1$$

Maze Data (df = 11.63, or could use 9): 95% CI: $-5.00\pm 2.19(3.36)\equiv -5.00\pm 7.36\equiv(-12.36,\ 2.36)$

Small Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Null hypothesis: Population medians are equal, $H_0: M_1=M_2$
– Rank measurements across samples from smallest (1) to largest ($n_1+n_2$). Ties take average ranks.
– Obtain the rank sum for the group with the smallest sample size ($T$)
– 1-sided tests: Conclude $H_A: M_1>M_2$ if $T>T_U$; conclude $H_A: M_1<M_2$ if $T<T_L$
– 2-sided tests: Conclude $H_A: M_1\neq M_2$ if $T>T_U$ or $T<T_L$
– Values of $T_L$ and $T_U$ are given in Table 5, p. 1092 for various sample sizes and significance levels.
– This test is mathematically equivalent to the Mann-Whitney U-test

Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1=n_2=6$)
• Outcome: Levocabastine AUC (1 Outlier/Group); ranks in parentheses

  Non-Dialysis   Hemodialysis
  857 (12)       527 (7)
  567 (9)        740 (11)
  626 (10)       392 (2.5)
  532 (8)        514 (6)
  444 (5)        433 (4)
  357 (1)        392 (2.5)
  T1 = 45        T2 = 33

• 2-sided test ($\alpha=0.05$): $T_L=26$, $T_U=52$, $T=45$ (Group 1)
• Conclude medians differ ($M_1<M_2$) if $T<26$
• Conclude medians differ ($M_1>M_2$) if $T>52$
• Neither criterion is met, so do not conclude the medians differ
Source: Zazgornik, et al. (1993)

Computer Output - SPSS

Ranks (AUC)
  GROUP          N    Mean Rank   Sum of Ranks
  Non-Dialysis   6    7.50        45.00
  Hemodialysis   6    5.50        33.00
  Total          12

Test Statistics (b)
                                    AUC
  Mann-Whitney U                    12.000
  Wilcoxon W                        33.000
  Z                                 -.962
  Asymp. Sig. (2-tailed)            .336
  Exact Sig. [2*(1-tailed Sig.)]    .394 (a)
  a. Not corrected for ties.
  b. Grouping Variable: GROUP

Note that SPSS uses the rank sum for Group 2 as the test statistic

Rank-Sum Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups (let $T$ be the rank sum for group 1):

$$\mu_T=\frac{n_1(N+1)}{2}\qquad\sigma_T=\sqrt{\frac{n_1 n_2(N+1)}{12}}\qquad N=n_1+n_2$$

• A z-statistic can be computed and an approximate P-value can be obtained from the Z-distribution:

$$z_{obs}=\frac{T-\mu_T}{\sigma_T}=\frac{T-n_1(N+1)/2}{\sqrt{n_1 n_2(N+1)/12}}$$

Note: When there are many ties in ranks, a more complex formula for $\sigma_T$ is often used; see p. 321 of Longnecker and Ott.
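The ranking step and the normal approximation above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or SPSS's procedure: the function names `average_ranks` and `rank_sum_normal_approx` are made up here, only the standard library is used, and the tie-corrected formula for $\sigma_T$ is deliberately not applied. The data are the levocabastine AUC values from the example above.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_normal_approx(group1, group2):
    """Rank sum T for group 1 and its z-statistic (assumption: no tie correction)."""
    n1, n2 = len(group1), len(group2)
    N = n1 + n2
    ranks = average_ranks(list(group1) + list(group2))
    T = sum(ranks[:n1])
    mu_T = n1 * (N + 1) / 2
    sigma_T = sqrt(n1 * n2 * (N + 1) / 12)
    z = (T - mu_T) / sigma_T
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return T, mu_T, sigma_T, z, p_two_sided


non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
T, mu_T, sigma_T, z, p = rank_sum_normal_approx(non_dialysis, hemodialysis)
print(f"T = {T}, mu_T = {mu_T}, sigma_T = {sigma_T:.3f}, z = {z:.3f}, P = {p:.3f}")
```

Because the sketch skips the tie correction, its z-statistic and P-value can differ slightly in the third decimal from software output that adjusts $\sigma_T$ for the two tied values of 392.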
Example - Maze Learning (Adults = Group 1)

  Name   Group   Trials   Errors   Average   Rank   Adult (0/1)   Child (0/1)   Adult Rank   Child Rank
  Hl     2       42        254      6.05      1     0             1              0            1
  D      1       69        538      7.80      2     1             0              2            0
  Hg     1       27        213      7.89      3     1             0              3            0
  L      1       41        335      8.17      4     1             0              4            0
  Hv     1       24        217      9.04      5     1             0              5            0
  Wd     1       47        473     10.06      6     1             0              6            0
  R      1       48        553     11.52      7     1             0              7            0
  N      2       49        599     12.22      8     0             1              0            8
  B      2       20        254     12.70      9     0             1              0            9
  Hz     2       40        516     12.90     10     0             1              0           10
  T      2       40        520     13.00     11     0             1              0           11
  W      1       25        333     13.32     12     1             0             12            0
  Mac    1       33        453     13.73     13     1             0             13            0
  Hp     1       27        375     13.89     14     1             0             14            0
  Rh     1       35        532     15.20     15     1             0             15            0
  J      2       50        828     16.56     16     0             1              0           16
  McG    1       31        528     17.03     17     1             0             17            0
  McS    2       89       1559     17.52     18     0             1              0           18
  H      1       41        728     17.76     19     1             0             19            0
  F      1       46        839     18.24     20     1             0             20            0
  Hy     1       32        711     22.22     21     1             0             21            0
  K      2       58       1331     22.95     22     0             1              0           22
  Lin    2       38       1089     28.66     23     0             1              0           23
  Lev    2       54       2171     40.20     24     0             1              0           24
  Rank sum                                                                      158 (T=T1)   142 (T2)

$H_0: M_1=M_2$   $H_A: M_1\neq M_2$   ($\alpha=0.05$)
Group 1: Adults   $n_1=14$   $n_2=10$   $N=n_1+n_2=24$

$$T=158\qquad\mu_T=\frac{n_1(N+1)}{2}=\frac{14(25)}{2}=175$$
$$\sigma_T=\sqrt{\frac{n_1 n_2(N+1)}{12}}=\sqrt{\frac{14(10)(25)}{12}}=17.08$$
$$z_{obs}=\frac{158-175}{17.08}=-0.9954$$
$$RR:\ |z_{obs}|\ge z_{\alpha/2}=1.96$$
$$\text{2-sided P-value}:\ 2P(Z\ge|-.9954|)=2(.16)=.32$$

Computer Output - SPSS

Ranks (AVE_ERR)
  GROUP   N    Mean Rank   Sum of Ranks
  Adult   14   11.29       158.00
  Child   10   14.20       142.00
  Total   24

Test Statistics (b)
                                    AVE_ERR
  Mann-Whitney U                    53.000
  Wilcoxon W                        158.000
  Z                                 -.995
  Asymp. Sig. (2-tailed)            .320
  Exact Sig. [2*(1-tailed Sig.)]    .341 (a)
  a. Not corrected for ties.
  b. Grouping Variable: GROUP

Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject (pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:

$$\bar{d}=\frac{\sum_{i=1}^{n}d_i}{n}\qquad s_d^2=\frac{\sum_{i=1}^{n}\left(d_i-\bar{d}\right)^2}{n-1}\qquad s_d=\sqrt{s_d^2}$$

Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D=\Delta_0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D>\Delta_0$
– 2-Sided: $H_A: \mu_D\neq\Delta_0$
• Test Statistic:

$$t_{obs}=\frac{\bar{d}-\Delta_0}{s_d/\sqrt{n}}$$

• Decision Rule (based on t-distribution with $\nu=n-1$ df):
– 1-sided alternative ($H_A: \mu_D>\Delta_0$)
  • If $t_{obs}\ge t_{\alpha}$ ==> Conclude $\mu_D>\Delta_0$
  • If $t_{obs}<t_{\alpha}$ ==> Do not reject $\mu_D=\Delta_0$
– 2-sided alternative ($H_A: \mu_D\neq\Delta_0$)
  • If $t_{obs}\ge t_{\alpha/2}$ ==> Conclude $\mu_D>\Delta_0$
  • If $t_{obs}\le -t_{\alpha/2}$ ==> Conclude $\mu_D<\Delta_0$
  • If $-t_{\alpha/2}<t_{obs}<t_{\alpha/2}$ ==> Do not reject $\mu_D=\Delta_0$
• Confidence Interval for $\mu_D$:

$$\bar{d}\pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$

Example - Antiperspirant Formulations
• Subjects - 20 Volunteers' armpits (df = 20-1 = 19)
• Treatments - Dry Powder vs Powder-in-Oil
• Measurements - Average Rating by Judges
– Higher scores imply more disagreeable odor
• Summary Statistics (raw data in the table below): $\bar{d}=0.15$, $s_d=0.248$, $n=20$
Source: E. Jungermann (1974)

  Subject   Dry Powder   Powder-in-Oil   Difference
  1         2.0          1.9              0.1
  2         2.8          2.4              0.4
  3         1.3          1.5             -0.2
  4         1.8          1.8              0.0
  5         1.9          1.8              0.1
  6         2.8          2.4              0.4
  7         2.0          2.2             -0.2
  8         1.5          1.5              0.0
  9         1.9          1.7              0.2
  10        2.9          2.8              0.1
  11        2.9          2.7              0.2
  12        2.3          1.5              0.8
  13        2.3          2.5             -0.2
  14        3.6          3.2              0.4
  15        2.2          2.1              0.1
  16        2.1          1.9              0.2
  17        2.5          2.6             -0.1
  18        2.4          2.0              0.4
  19        3.1          2.9              0.2
  20        2.0          1.9              0.1
  Mean                                    0.15
  Std Dev                                 0.248151058

$H_0: \mu_D=0$ (No difference in formulation effects)
$H_A: \mu_D\neq 0$ (Formulation effects differ)

$$TS:\ t_{obs}=\frac{\bar{d}}{s_d/\sqrt{n}}=\frac{0.15}{0.248/\sqrt{20}}=\frac{0.15}{.0555}=2.70$$
$$RR:\ |t_{obs}|\ge t_{.025}=2.093\quad(19\text{ df})$$
$$P\text{-value}=2P(t\ge 2.70)$$
$$95\%\ CI\text{ for }\mu_D:\ \bar{d}\pm t_{.025}\frac{s_d}{\sqrt{n}}\equiv 0.15\pm 2.093(.0555)\equiv 0.15\pm 0.116\equiv(0.034,\ 0.266)$$

Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)

Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s). $n$ = number of non-zero differences
– Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
– Compute $T_+$ and $T_-$, the rank sums for the positive and negative differences, respectively
– 1-sided tests: Conclude $H_A: M_1>M_2$ if $T=T_-\le T_0$
– 2-sided tests: Conclude $H_A: M_1\neq M_2$ if $T=\min(T_+,T_-)\le T_0$
– Values of $T_0$ are given in Table 6, pp 1093-1094 for various sample sizes and significance levels. P-values printed by statistical software packages.
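The signed-rank procedure above (drop zero differences, rank the absolute differences with averaged ties, then sum the ranks by sign) can be sketched as follows. This is an illustrative sketch only: the function name `signed_rank_sums` is made up here, and the differences are the mg13 - mg5 values from the caffeine and endurance example that follows.

```python
def signed_rank_sums(diffs):
    """Return (n, T_plus, T_minus) for the Wilcoxon signed-rank procedure.

    Zero differences are dropped; tied absolute differences get average ranks.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of absolute differences tied with position i.
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return len(nonzero), t_plus, t_minus


# Differences (mg13 - mg5) for the 9 cyclists in the caffeine example.
caffeine_diffs = [-4.92, -25.85, 15.92, 6.23, 4.34, -3.78, 1.98, 9.18, 1.15]
n, t_plus, t_minus = signed_rank_sums(caffeine_diffs)
print(n, t_plus, t_minus)
```

As a quick check, $T_+ + T_-$ should always equal $n(n+1)/2$, the sum of the ranks 1 through $n$.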
Signed-Rank Test: Normal Approximation
• Under the null hypothesis of no difference in the two groups:

$$\mu_T=\frac{n(n+1)}{4}\qquad\sigma_T=\sqrt{\frac{n(n+1)(2n+1)}{24}}$$

• A z-statistic can be computed and an approximate P-value can be obtained from the Z-distribution:

$$z_{obs}=\frac{T-\mu_T}{\sigma_T}=\frac{T-n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

Example - Caffeine and Endurance
• Subjects: 9 well-trained cyclists
• Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
• Measurements: Minutes Until Exhaustion
• This is a subset of a larger study (we'll see later)
• Step 1: Take absolute values of differences (eliminating 0s)
• Step 2: Rank the absolute differences (averaging ranks for ties)
• Step 3: Sum ranks for positive and negative true differences
Source: Pasman, et al. (1995)

Original Data and Absolute Differences
  Cyclist   mg13    mg5     mg13-mg5   abs(diff)
  1         37.55   42.47    -4.92      4.92
  2         59.30   85.15   -25.85     25.85
  3         79.12   63.20    15.92     15.92
  4         58.33   52.10     6.23      6.23
  5         70.54   66.20     4.34      4.34
  6         69.47   73.25    -3.78      3.78
  7         46.48   44.50     1.98      1.98
  8         66.35   57.17     9.18      9.18
  9         36.20   35.05     1.15      1.15

Ranked Absolute Differences
  Cyclist   mg13    mg5     mg13-mg5   abs(diff)   rank
  9         36.20   35.05     1.15      1.15       1
  7         46.48   44.50     1.98      1.98       2
  6         69.47   73.25    -3.78      3.78       3
  5         70.54   66.20     4.34      4.34       4
  1         37.55   42.47    -4.92      4.92       5
  4         58.33   52.10     6.23      6.23       6
  8         66.35   57.17     9.18      9.18       7
  3         79.12   63.20    15.92     15.92       8
  2         59.30   85.15   -25.85     25.85       9

$T_+=1+2+4+6+7+8=28$   $T_-=3+5+9=17$

Under the null hypothesis of no difference in the two groups ($T=T_+$):

$$\mu_T=\frac{n(n+1)}{4}=\frac{9(9+1)}{4}=\frac{90}{4}=22.5$$
$$\sigma_T=\sqrt{\frac{n(n+1)(2n+1)}{24}}=\sqrt{\frac{9(9+1)(18+1)}{24}}=\sqrt{\frac{1710}{24}}=8.44$$
$$z_{obs}=\frac{T-\mu_T}{\sigma_T}=\frac{28-22.5}{8.44}=\frac{5.5}{8.44}=0.65$$
$$P\text{-value}:\ 2P(Z\ge|0.65|)=2(.2578)=.5156$$

There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)

SPSS Output

Ranks (MG5 - MG13)
                   N       Mean Rank   Sum of Ranks
  Negative Ranks   6 (a)   4.67        28.00
  Positive Ranks   3 (b)   5.67        17.00
  Ties             0 (c)
  Total            9
  a. MG5 < MG13
  b. MG5 > MG13
  c. MG5 = MG13

Test Statistics (b)
                           MG5 - MG13
  Z                        -.652 (a)
  Asymp. Sig. (2-tailed)   .515
  a. Based on positive ranks.
  b. Wilcoxon Signed Ranks Test

Note that SPSS is taking MG5-MG13, while we used MG13-MG5

Sample Sizes for Given Margin of Error
• Goal: Achieve a particular margin of error ($E$) for estimating $\mu_1-\mu_2$ (width of 95% CI will be $2E$)
– Case 1: Independent Samples (assumes equal variances)

$$E=z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}=z_{\alpha/2}\,\sigma\sqrt{\frac{2}{n}}\ \text{ when } n_1=n_2=n\quad\Rightarrow\quad n=\frac{2z_{\alpha/2}^2\sigma^2}{E^2}$$

– Case 2: Paired Samples

$$E=z_{\alpha/2}\,\sigma_d\sqrt{\frac{1}{n}}\quad\Rightarrow\quad n=\frac{z_{\alpha/2}^2\sigma_d^2}{E^2}$$

In practice, the variance will need to be estimated in a pilot study or obtained from previously conducted work.

Sample Size Calculations for Fixed Power
• Goal - Choose sample sizes to have a favorable chance of detecting a specified difference in $\mu_1$ and $\mu_2$
• Step 1 - Define an important difference in means: $\Delta=\mu_1-\mu_2$
• Step 2 - Choose the desired power to detect the clinically meaningful difference ($1-\beta$, typically at least .80). For a 2-sided test:

$$\text{Independent Samples:}\quad n_1=n_2=\frac{2\sigma^2\left(z_{\alpha/2}+z_\beta\right)^2}{\Delta^2}$$
$$\text{Paired Samples:}\quad n=\frac{\sigma_d^2\left(z_{\alpha/2}+z_\beta\right)^2}{\Delta^2}$$

For 1-sided tests, replace $z_{\alpha/2}$ with $z_\alpha$
In practice, the variance must be estimated, or $\Delta$ given in units of $\sigma$

Example - Rosiglitazone for HIV-1 Lipoatrophy
• Trts - Rosiglitazone vs Placebo
• Response - Change in Limb fat mass
• Clinically Meaningful Difference - $\Delta=0.5\sigma$
• Desired Power - $1-\beta=0.80$
• Significance Level - $\alpha=0.05$

$$z_{\alpha/2}=1.96\qquad z_\beta=z_{.20}=.84$$
$$n_1=n_2=\frac{2(1.96+0.84)^2}{(0.5)^2}\approx 63$$

Source: Carr, et al. (2004)

Data Sources
• Zazgornik, J., M.L. Huang, A. Van Peer, et al. (1993). "Pharmacokinetics of Orally Administered Levocabastine in Patients with Renal Insufficiency," Journal of Clinical Pharmacology, 33:1214-1218
• Gould, M.C. and F.A.C. Perrin (1916). "A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children," Journal of Experimental Psychology, 1:122-???
• Jungermann, E. (1974). "Antiperspirants: New Trends in Formulation and Testing Technology," Journal of the Society of Cosmetic Chemists, 25:621-638
• Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). "The Effect of Different Dosages of Caffeine on Endurance Performance Time," International Journal of Sports Medicine, 16:225-230
• Carr, A., C. Workman, D. Carey, et al. (2004). "No Effect of Rosiglitazone for Treatment of HIV-1 Lipoatrophy: Randomised, Double-Blind, Placebo-Controlled Trial," Lancet, 363:429-438