252meanx1 10/6/05 (Open this document in 'Outline' view!) Re-edited to replace with D.

D. COMPARISON OF TWO SAMPLES

Examples for comparison of means.

\[ H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \ne \mu_2 \]
or more generally
\[ H_0: D = D_0 \qquad H_1: D \ne D_0 , \]
where \( D = \mu_1 - \mu_2 \) and \( d = \bar{x}_1 - \bar{x}_2 \).

General formulas: Degrees of freedom \( DF = n_1 + n_2 - 2 \).
a. Confidence Interval: \( D = d \pm t_{\alpha/2}\, s_d \).
b. Test Ratio: \( t = \dfrac{d - D_0}{s_d} \).
c. Critical Value: \( d_{cv} = D_0 \pm t_{\alpha/2}\, s_d \).

The difference between the cases comes down to the choice of \( t \) and the formula for \( s_d \). Let us now consider the first four cases.

1. Two Means, Two Independent Samples, Large Samples.

If the total number of degrees of freedom is large (or the two samples come from normally distributed populations with known variances \( \sigma_1^2 \) and \( \sigma_2^2 \)), then replace \( t \) with \( z \) and use
\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} . \]

First Example: We wish to test the earnings of retail clerks in New York \( (x_1) \) and Philadelphia \( (x_2) \) for equality.

\( H_0: \mu_1 = \mu_2 \) or \( H_0: \mu_1 - \mu_2 = 0 \) or \( H_0: D = 0 \)
\( H_1: \mu_1 \ne \mu_2 \) or \( H_1: \mu_1 - \mu_2 \ne 0 \) or \( H_1: D \ne 0 \)
\( \alpha = .05 \)

Data: \( \bar{x}_1 = 300 \), \( \bar{x}_2 = 330 \), \( s_1^2 = 400 \), \( s_2^2 = 360 \), \( n_1 = 169 \), \( n_2 = 144 \).

Since \( DF = n_1 + n_2 - 2 = 169 + 144 - 2 = 311 \) is well over 100, we are justified in using a large sample method.
\[ d = \bar{x}_1 - \bar{x}_2 = 300 - 330 = -30 \]
\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{400}{169} + \frac{360}{144}} = \sqrt{2.3669 + 2.5000} = \sqrt{4.8669} = 2.2061 \]

Solutions: Use \( z_{\alpha/2} = z_{.025} = 1.960 \) in place of \( t_{\alpha/2} \).

a. Confidence Interval: \( D = d \pm t_{\alpha/2}\, s_d = -30 \pm 1.960(2.2061) = -30 \pm 4.32 \). Make a diagram showing a Normal curve centered at -30 and a confidence interval bounded by -34.32 and -25.68. Since \( D_0 = 0 \) is not between them, reject \( H_0 \).

b. Test Ratio: \( t = \dfrac{d - D_0}{s_d} = \dfrac{-30 - 0}{2.2061} = -13.60 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \( -z_{.025} = -1.960 \) and \( z_{.025} = 1.960 \). Since -13.60 is not between them, reject \( H_0 \). Since this is actually a value of \( z \), a p-value is easy to find: \( pval = 2P(d \le -30) = 2P(z \le -13.60) = 2(.5 - .5) = 0 \) to the accuracy of the Normal table.

c. Critical Value: \( d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 1.960(2.2061) = \pm 4.32 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -4.32 and 4.32. Since \( d = -30 \) is not between them, reject \( H_0 \).
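The three methods in the first example can be checked numerically. What follows is a minimal sketch in Python and is not part of the original handout: the function name `large_sample_two_means` and its argument names are illustrative assumptions, and the Normal quantile is taken from the standard library's `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_two_means(xbar1, xbar2, var1, var2, n1, n2, d0=0.0, alpha=0.05):
    """Two-sided large-sample comparison of two means (z used in place of t).

    Hypothetical helper: returns the quantities used by methods a, b and c.
    var1 and var2 are sample variances (s squared), not standard deviations.
    """
    d = xbar1 - xbar2                          # point estimate of D = mu1 - mu2
    s_d = sqrt(var1 / n1 + var2 / n2)          # standard error of d
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, about 1.960 for alpha = .05
    ratio = (d - d0) / s_d                     # test ratio
    return {
        "d": d,
        "s_d": s_d,
        "conf_int": (d - z * s_d, d + z * s_d),          # method a
        "test_ratio": ratio,                             # method b
        "crit_values": (d0 - z * s_d, d0 + z * s_d),     # method c
        "p_value": 2 * NormalDist().cdf(-abs(ratio)),    # two-sided p-value
        "reject": abs(ratio) > z,
    }


# Data from the retail clerk example: means, sample variances, sample sizes.
result = large_sample_two_means(300, 330, 400, 360, 169, 144)
for name, value in result.items():
    print(name, value)
```

With the example's data this gives a standard error of about 2.2061, an interval of roughly -34.32 to -25.68 and a test ratio near -13.6, so all three methods lead to the same rejection of the null hypothesis.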

Second Example: (Whitmore, Neter, Wasserman) We wish to learn if battery type B \( (x_2) \) has a longer service life (in months) than battery type A \( (x_1) \). The statement "B lasts longer than A" is \( \mu_1 < \mu_2 \); it becomes the alternate hypothesis because it does not contain an equality.

\( H_0: \mu_1 \ge \mu_2 \) or \( H_0: \mu_1 - \mu_2 \ge 0 \) or \( H_0: D \ge 0 \)
\( H_1: \mu_1 < \mu_2 \) or \( H_1: \mu_1 - \mu_2 < 0 \) or \( H_1: D < 0 \)
\( \alpha = .05 \)

Data: \( \bar{x}_1 = 18.4 \), \( \bar{x}_2 = 21.3 \), \( s_1 = 3.3 \), \( s_2 = 4.2 \), \( n_1 = 121 \), \( n_2 = 36 \).

Since \( DF = n_1 + n_2 - 2 = 121 + 36 - 2 = 155 \) is well over 100, we are justified in using a large sample method.
\[ d = \bar{x}_1 - \bar{x}_2 = 18.4 - 21.3 = -2.9 \]
\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{3.3^2}{121} + \frac{4.2^2}{36}} = \sqrt{0.09 + 0.49} = \sqrt{0.58} = 0.7616 \]

Solutions: Use \( z_{\alpha} = z_{.05} = 1.645 \) in place of \( t_{\alpha} \). This is a 1-sided test.

a. Confidence Interval: Given \( H_1: D < 0 \), use \( D \le d + t_{\alpha}\, s_d = -2.9 + 1.645(0.7616) = -2.9 + 1.253 = -1.647 \). Make a diagram showing a Normal curve centered at -2.9 and a confidence interval, \( D \le -1.647 \), represented by shading the area below -1.647. The null hypothesis \( H_0: D \ge 0 \) is represented by shading the area above zero. Since \( D_0 = 0 \) is not in the confidence interval, reject \( H_0 \).

b. Test Ratio: \( t = \dfrac{d - D_0}{s_d} = \dfrac{-2.9 - 0}{0.7616} = -3.808 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region above \( -z_{.05} = -1.645 \). Since -3.808 is below -1.645, reject \( H_0 \).

c. Critical Value: Given \( H_1: D < 0 \), we want a critical value below zero. Use \( d_{cv} = D_0 - t_{\alpha}\, s_d = 0 - 1.645(0.7616) = -1.253 \). Make a diagram showing a Normal curve centered at zero and a 'reject' region below -1.253. Since \( d = -2.9 \) is below -1.253, reject \( H_0 \).

2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.

\[ t = t^{(n_1 + n_2 - 2)} \quad \text{and} \quad s_d = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} , \quad \text{where} \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} . \]

Example: (Whitmore, Neter, Wasserman) Each of two groups of ten men is assigned a razor blade and asked how many shaves they got from a package. We wish to find out if there is a significant difference in the durability of the two blades. Type A will be \( x_1 \); type B will be \( x_2 \).

\( H_0: \mu_1 = \mu_2 \) or \( H_0: \mu_1 - \mu_2 = 0 \) or \( H_0: D = 0 \)
\( H_1: \mu_1 \ne \mu_2 \) or \( H_1: \mu_1 - \mu_2 \ne 0 \) or \( H_1: D \ne 0 \)
\( \alpha = .01 \)

Data: \( \bar{x}_1 = 46.9 \), \( \bar{x}_2 = 75.6 \), \( s_1 = 14.0 \), \( s_2 = 21.0 \), \( n_1 = 10 \), \( n_2 = 10 \).

Since \( DF = n_1 + n_2 - 2 = 10 + 10 - 2 = 18 \) is well below 100, we need a small sample method. Because of the similarity of the two samples we assume that \( \sigma_1^2 = \sigma_2^2 \).
\[ d = \bar{x}_1 - \bar{x}_2 = 46.9 - 75.6 = -28.7 \]
Because of our assumption about variances, we use a pooled variance:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)14.0^2 + (10 - 1)21.0^2}{10 + 10 - 2} = 318.50 \]
\[ s_d = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{318.50 \left( \frac{1}{10} + \frac{1}{10} \right)} = \sqrt{318.50(0.2)} = \sqrt{63.7} = 7.9812 \]

Solutions: Use \( t_{\alpha/2}^{(n_1 + n_2 - 2)} = t_{.005}^{(18)} = 2.878 \).

a. Confidence Interval: \( D = d \pm t_{\alpha/2}\, s_d = -28.7 \pm 2.878(7.9812) = -28.7 \pm 22.97 \). Make a diagram showing a Normal curve centered at -28.7 and a confidence interval bounded by -51.67 and -5.73. Since \( D_0 = 0 \) is not between them, reject \( H_0 \).

b. Test Ratio: \( t = \dfrac{d - D_0}{s_d} = \dfrac{-28.7 - 0}{7.9812} = -3.596 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \( -t_{.005}^{(18)} = -2.878 \) and \( t_{.005}^{(18)} = 2.878 \). Since -3.596 is not between them, reject \( H_0 \). If we want a p-value, we need \( pval = 2P(d \le -28.70) = 2P(t \le -3.596) \). Since 3.596 lies between \( t_{.005}^{(18)} = 2.878 \) and \( t_{.001}^{(18)} = 3.611 \), the one-tail probability lies between .001 and .005; we double it to get \( .002 < pval < .01 \).

c. Critical Value: \( d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 2.878(7.9812) = \pm 22.97 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -22.97 and 22.97. Since \( d = -28.7 \) is not between them, reject \( H_0 \).

3. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances not Assumed Equal.

This time the degrees of freedom for \( t \) must be calculated by the Satterthwaite approximation. The formula is
\[ df = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} , \]
but the formula for the standard deviation is the same as in method 1,
\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} . \]

Example: We wish to use a 2-sided 95% confidence interval to test for a significant difference between the time it takes an employee to type a page on word processor A \( (x_1) \) and word processor B \( (x_2) \). 16 pages are typed on each processor.

\( H_0: \mu_1 = \mu_2 \) or \( H_0: \mu_1 - \mu_2 = 0 \) or \( H_0: D = 0 \)
\( H_1: \mu_1 \ne \mu_2 \) or \( H_1: \mu_1 - \mu_2 \ne 0 \) or \( H_1: D \ne 0 \)
\( \alpha = .05 \)

Data: \( \bar{x}_1 = 8.20 \), \( \bar{x}_2 = 7.10 \), \( s_1^2 = 4.10 \), \( s_2^2 = 4.20 \), \( n_1 = 16 \), \( n_2 = 16 \).

Since \( DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30 \) is well below 100, we would be on very, very shaky ground if we used a large sample method. Even with a small sample method we probably need an assumption of Normality. If we do not want to assume \( \sigma_1^2 = \sigma_2^2 \), we need the Satterthwaite method.
\[ d = \bar{x}_1 - \bar{x}_2 = 8.20 - 7.10 = 1.10 \]
To find the standard error of \( d \) and the number of degrees of freedom, do the following calculations:
\[ \frac{s_1^2}{n_1} = \frac{4.1}{16} = 0.25625 , \qquad \frac{s_2^2}{n_2} = \frac{4.2}{16} = 0.26250 , \qquad \text{so} \quad \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.25625 + 0.26250 = 0.51875 , \]
\[ DF = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} = \frac{0.51875^2}{\dfrac{0.25625^2}{15} + \dfrac{0.26250^2}{15}} = \frac{0.269102}{0.0043776 + 0.0045938} = 29.996 . \]
I take the conservative approach of rounding this down to 29 degrees of freedom. Notice how little difference there is between this and \( DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30 \). This is because of the near-equality of the sample variances.
\[ s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.51875} = 0.720 \]
This is almost the same result as we would have gotten if we had assumed that \( \sigma_1^2 = \sigma_2^2 \), again because of the near-equality of the sample variances.

Solutions: Use \( t_{.025}^{(29)} = 2.045 \) for a 2-sided confidence interval. \( D = d \pm t_{\alpha/2}\, s_d = 1.10 \pm 2.045(0.720) = 1.10 \pm 1.47 \). Make a diagram showing a Normal curve centered at 1.10 and a confidence interval bounded by -0.37 and 2.57. Since \( D_0 = 0 \) is between them, do not reject \( H_0 \). A more complete version of this problem appears in Problem D3. A computer example of this method appears in document 252meanx3.

4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).

If \( n \) is the number of pairs of data, then \( t = t^{(n-1)} \) and the standard error of the mean difference is
\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} , \quad \text{where} \quad s_d^2 = \frac{\sum d^2 - n \bar{d}^2}{n - 1} . \]
Here \( s_{\bar{d}} \) takes the place of \( s_d \) in the general formulas. In this case \( d_1 = x_{11} - x_{21} \), \( d_2 = x_{12} - x_{22} \), etc.

Example: We have been told that income in a region has risen by $6.00/week over the last year. We interviewed 100 families last year and found an average weekly income \( \bar{x}_1 \) of $200.
We reinterview the same families and find out their present incomes so that we can compute how much they have risen. We find that the new average income \( \bar{x}_2 \) is $204. From the data we compute a standard deviation of the income change of $6.00. We wish to test whether the $6 rise is believable.

\( H_0: \mu_2 - \mu_1 = 6 \) or \( H_0: \mu_1 - \mu_2 = -6 \) or \( H_0: D = -6 \)
\( H_1: \mu_2 - \mu_1 \ne 6 \) or \( H_1: \mu_1 - \mu_2 \ne -6 \) or \( H_1: D \ne -6 \)
\( \alpha = .05 \)

Data: \( \bar{x}_1 = 200 \), \( \bar{x}_2 = 204 \), \( s_d = 6 \), \( n = 100 \).

Though we may have 200 pieces of data, we have 100 pairs, and the actual numbers we use are the 100 differences in income. \( DF = n - 1 = 100 - 1 = 99 \) and we ought to use \( t \).
\[ d = \bar{x}_1 - \bar{x}_2 = 200 - 204 = -4 , \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{6}{\sqrt{100}} = 0.6 \]
The standard error \( s_{\bar{d}} = 0.6 \) takes the place of \( s_d \) in the general formulas.

Solutions: Use \( t_{\alpha/2}^{(n-1)} = t_{.025}^{(99)} = 1.984 \).

a. Confidence Interval: \( D = d \pm t_{\alpha/2}\, s_{\bar{d}} = -4 \pm 1.984(0.6) = -4 \pm 1.19 \). Make a diagram showing a Normal curve centered at -4 and a confidence interval bounded by -5.19 and -2.81. Since \( D_0 = -6 \) is not between them, reject \( H_0 \).

b. Test Ratio: \( t = \dfrac{d - D_0}{s_{\bar{d}}} = \dfrac{-4 - (-6)}{0.6} = 3.333 \). Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by \( -t_{.025}^{(99)} = -1.984 \) and \( t_{.025}^{(99)} = 1.984 \). Since 3.333 is not between them, reject \( H_0 \). If we want a p-value, we need \( pval = 2P(d \ge -4) = 2P(t \ge 3.333) \). Since 3.333 lies above \( t_{.001}^{(99)} = 3.175 \), the one-tail probability is below .001; we double it to get \( pval < .002 \).

c. Critical Value: \( d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = -6 \pm 1.984(0.6) = -6 \pm 1.19 \). Make a diagram showing a Normal curve centered at -6 and an 'accept' region bounded by -7.19 and -4.81. Since \( d = -4 \) is not between them, reject \( H_0 \).

Note: We might have been better off in this problem defining \( D \) as \( \mu_2 - \mu_1 \). Then our hypotheses would read \( H_0: D = 6 \) and \( H_1: D \ne 6 \). We would say \( d = \bar{x}_2 - \bar{x}_1 = 204 - 200 = 4 \) and our critical value, for example, would be \( d_{cv} = 6 \pm 1.984(0.6) = 6 \pm 1.19 \). The conclusion would not change.

© 2002 Roger Even Bove
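
The paired-sample income example reduces to the general formulas once the standard error of the mean difference is known. The sketch below is not part of the original handout. It assumes hypothetical names (`paired_summary_test`, `t_crit`) and takes the table value 1.984 as given, since the Python standard library has no t quantile function.

```python
from math import sqrt


def paired_summary_test(dbar, s_diff, n, d0, t_crit):
    """Two-sided paired-sample test from summary statistics (hypothetical helper).

    dbar   : mean of the n paired differences
    s_diff : standard deviation of the n differences
    d0     : value of D under the null hypothesis
    t_crit : table value of t for n - 1 degrees of freedom, supplied by the caller
    """
    se = s_diff / sqrt(n)                                  # standard error of the mean difference
    ratio = (dbar - d0) / se                               # test ratio
    conf_int = (dbar - t_crit * se, dbar + t_crit * se)    # interval centered at dbar
    crit_values = (d0 - t_crit * se, d0 + t_crit * se)     # 'accept' region centered at d0
    reject = abs(ratio) > t_crit
    return se, ratio, conf_int, crit_values, reject


# Income example with D = mu1 - mu2: dbar = 200 - 204 and H0: D = -6.
se, ratio, conf_int, crit_values, reject = paired_summary_test(
    dbar=200 - 204, s_diff=6, n=100, d0=-6, t_crit=1.984
)
print(se, ratio, conf_int, crit_values, reject)
```

With these inputs the standard error is 0.6 and the test ratio is about 3.33. Both the interval around -4 and the 'accept' region around -6 have half-width 1.984(0.6), about 1.19, so the observed difference of -4 falls outside the 'accept' region and the null hypothesis is rejected.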