handout

```
Unit 8: Inference for Two Samples
Statistics 571: Statistical Methods
Ramón V. León
7/14/2004
Unit 8 - Stat 571 - Ramón V. León
1
Introductory Remarks
• A majority of statistical studies, whether experimental or
observational, are comparative
• Simplest type of comparative study compares two
populations
• Two principal designs for comparative studies
– Using independent samples
– Using matched pairs
• Graphical methods for informal comparisons
• Formal comparisons of means and variances of normal
populations
– Confidence intervals
– Hypothesis tests
Independent Samples Design
Example: Compare Control Group to Treatment Group
•Children are randomly assigned to two groups. One group is
taught using an old teaching method and the other group using a
new teaching method. To avoid one possible source of
confounding, both methods are taught by the same teacher.
Independent samples design:
Sample 1: $x_1, x_2, \ldots, x_{n_1}$
Sample 2: $y_1, y_2, \ldots, y_{n_2}$
(The sample sizes $n_1$ and $n_2$ are possibly different.)
•The two samples are independent
•Independent sample design relies on random assignment to make
the two groups equal (on the average) on all attributes except for
the treatment used (treatment factor).
Matched Pairs Design
Example: Comparison of Two Methods of Teaching Spelling.
•Pretest the children and match pairs of children with
similar test scores.
•Randomly assign one child of each pair to the control group and
the other to the treatment group
•The pretest score is called a blocking factor
Pair:      1     2     …    n
Sample 1:  x1    x2    …    xn
Sample 2:  y1    y2    …    yn
(Both samples contain the same number of observations.)
The x’s are the changes in the test scores of the children taught by the old
method.
The y’s are the changes in the test scores of the children taught by the new
method.
Pros and Cons of Each Design
• In the independent sample design the two groups may not be
quite equal on some attribute, especially if the sample sizes
are small, because randomization assures equality only on the
average. Does one group have a higher pretest score than the
other? A difference between the groups could be the result of this
imbalance rather than of the treatment factor.
• The matched pair design enables more precise comparisons to
be made between treatment groups because of smaller
experimental error
• However, it can be difficult to form matched pairs, especially
when dealing with human subjects
• The two types of designs can be used in observational studies
where the two groups are not formed by randomization. Then
a cause and effect relationship cannot be established.
Hospitalization Cost:
Independent Sample Design
Graphical Methods for Comparing
Independent Samples: Side-by-Side Box Plots
Outliers present and
variation seems
different
Side-by-Side Box Plots:
Log Costs
•No outliers for $\log_e$ cost
•Assumption of equal variance appears reasonable for the two
groups in the log scale
(Slide 8)
Another Graphical Method for Comparing Two Independent Samples: Q-Q Plot
• Plot of the ordered pairs of order statistics $(x_{(i)}, y_{(i)})$, which are the
$\left(\frac{i}{n+1}\right)$ quantiles of the respective samples
• Plot suggests that treatment group costs are less than control group
costs. But is this true?
• Book discusses how to prepare this graph when the
two samples are not of the same size.
(Slide 9)
Graphical Displays of Data from
Matched Pairs
• Plot the pairs (xi, yi) in a scatter plot. Using the 45° line as
a reference, one can judge whether the two sets of values
are similar or whether one set tends to be larger than the
other
• Plots of the differences or the ratios of the pairs may prove
to be useful
• Will show a JMP plot that is very informative
• A Q-Q plot is meaningless for paired data because the
same quantiles, based on the ordered observations, do not, in
general, come from the same pair.
Comparing Means of Two Populations:
Independent Samples Design
(Large Samples Case)
Suppose that the observations $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$
are independent random samples from two populations with means $\mu_1$
and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. Both means and variances
are assumed to be unknown. The goal is to compare $\mu_1$ and
$\mu_2$ in terms of their difference $\mu_1 - \mu_2$. We assume that $n_1$ and
$n_2$ are large (say > 30).
Comparing Means of Two Populations:
Independent Samples Design
$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2$
$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Therefore the standardized r.v.
$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
has mean = 0 and variance = 1.
If $n_1$ and $n_2$ are large, then Z is approximately N(0,1) by
the Central Limit Theorem, even though we did not assume the samples
came from normal populations. (We also use the fact that the
difference of independent normal r.v.'s is also normal.)
Large Sample (Approximate) 100(1-α)% CI for µ1−µ2
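$(\bar{x} - \bar{y}) - z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x} - \bar{y}) + z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Note: $s_i^2$ has been substituted for $\sigma_i^2$ because the samples are
large, i.e., bigger than 30.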
Example 8.2: Two large calculus classes of 150 students each, taught by
two different professors (99% CI for difference of means)
$\bar{x} - \bar{y} \pm z_{.005} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75 - 70) \pm 2.576 \sqrt{\frac{(15)^2}{150} + \frac{(12)^2}{150}} = [0.960, 9.040]$
Note that the interval does not contain zero, so the difference between
the two professors is significant at the α = .01 level.
Large Sample (Approximate) Test of Hypothesis
$H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \ne \delta_0$ (Typically $\delta_0 = 0$)
Test statistic: $z = \frac{(\bar{x} - \bar{y}) - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Example 8.2: Final exam grades for large calculus classes of 150
students taught by two different professors. (Same test used on
both classes.)
$z = \frac{(\bar{x} - \bar{y}) - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(75 - 70) - 0}{\sqrt{(15)^2/150 + (12)^2/150}} = 3.188$
$P\text{-value} = P[|Z| \ge 3.188] = 2[1 - \Phi(3.188)] = .0014$
Result is significant at the α = .01 level.
Inference for Small Samples
Case 1: Variances $\sigma_1^2$ and $\sigma_2^2$ assumed equal.
Assumption of normal populations is important
since we cannot invoke the CLT
Pooled estimate of the common variance:
$S^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{\sum (X_i - \bar{X})^2 + \sum (Y_i - \bar{Y})^2}{n_1 + n_2 - 2}$
Note: $S^2 = (S_1^2 + S_2^2)/2$ if sample sizes are equal
$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S \sqrt{1/n_1 + 1/n_2}}$ has a t-distribution with $n_1 + n_2 - 2$ d.f.
Inference for Small Samples:
Confidence Intervals and Hypothesis Tests
Case 1: Variances $\sigma_1^2$ and $\sigma_2^2$ assumed equal.
Two-sided 100(1-α)% CI is given by:
$\bar{x} - \bar{y} - t_{n_1+n_2-2,\alpha/2}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x} - \bar{y} + t_{n_1+n_2-2,\alpha/2}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Test of Hypothesis: $H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \ne \delta_0$
Test statistic: $t = \frac{\bar{x} - \bar{y} - \delta_0}{s \sqrt{1/n_1 + 1/n_2}}$
Reject $H_0$ if $|t| > t_{n_1+n_2-2,\alpha/2}$
JMP: Checking Assumptions for Log Costs
Normal Plots by Group
After log
transformation,
control group
data seems
normal
Normal Plots by Group
After log
transformation,
treatment group
data seems
normal
Testing for Normality
Control:
Treatment:
Hospitalization Cost Example
•Work with log cost since then the normal assumption holds
•Assumption of equal variance is also reasonable as seen by
the side-by-side box plots of $\log_e$ cost (Slide 8)
Pooled estimate of the common standard deviation (square root of the pooled variance):
$s = \sqrt{\frac{29(1.167)^2 + 29(0.905)^2}{58}} = 1.045$ with 58 d.f.
$SE(\bar{x} - \bar{y}) = 0.2698$
95% CI:
$\bar{x} - \bar{y} \pm t_{58,.025}\, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (8.251 - 8.084) \pm 2.002 \times 1.045 \sqrt{\frac{1}{30} + \frac{1}{30}} = [-0.373, 0.707]$
The interval contains zero, so the difference in cost on the logarithmic scale is not significant.
Contrast this conclusion with the apparent difference seen on the Q-Q plot in Figure 8.1 (Slide 9).
Interpretation of Difference in
Means on the Log Scale
Mean(log Cost) = Median(log Cost) = log(Median Cost)
• First equality: because the distribution of log cost is symmetric
• Second equality: because the log preserves ordering
$-0.373 \le \mathrm{Mean}(\log \mathrm{Cost}_C) - \mathrm{Mean}(\log \mathrm{Cost}_T) \le 0.707$
$-0.373 \le \log(\mathrm{Median\ Cost}_C) - \log(\mathrm{Median\ Cost}_T) \le 0.707$
$-0.373 \le \log\left(\frac{\mathrm{Median\ Cost}_C}{\mathrm{Median\ Cost}_T}\right) \le 0.707$
$.689 = \exp(-0.373) \le \frac{\mathrm{Median\ Cost}_C}{\mathrm{Median\ Cost}_T} \le \exp(0.707) = 2.028$
The last line is a 95% confidence interval for the ratio of median costs.
Note: This interpretation is not in your textbook.
JMP Analysis of Hospitalization Cost Example
Means Diamonds and x-Axis Proportional
A means diamond illustrates a sample mean and 95% confidence
interval, as shown by the schematic in "Means Diamonds and X-Proportional
Options". The line across each diamond represents the
group mean. The vertical span of each diamond represents the 95%
confidence interval for each group. Overlap marks are drawn
above and below the group mean. For groups with
equal sample sizes, overlapping marks indicate that the two group
means are not significantly different at the 95% confidence level.
If the X-Axis Proportional display option is in effect, the horizontal
extent of each group along the x-axis (the horizontal size of the
diamond) is proportional to the sample size of each level of the x
variable. It follows that the narrower diamonds are usually the taller
ones because fewer data points yield a less precise estimate of the
group mean.
The confidence interval computation assumes that variances are equal
across observations. Therefore, the height of the confidence interval
(diamond) is proportional to the reciprocal of the square root of the
number of observations in the group.
Inference for Small Samples
Case 2: Variances $\sigma_1^2$ and $\sigma_2^2$ unequal.
$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ does not have a Student t-distribution.
T is not a pivotal quantity since its distribution depends on the ratio of the sigmas.
However, when $n_1$ and $n_2$ are large, T has an approximate N(0,1) distribution.
Inference for Small Samples
Case 2: Variances $\sigma_1^2$ and $\sigma_2^2$ unequal.
For small samples
$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ has approximately a t-distribution
with $\nu = \frac{(w_1 + w_2)^2}{w_1^2/(n_1 - 1) + w_2^2/(n_2 - 1)}$ degrees of freedom,
where $w_1 = (\mathrm{SEM}(\bar{x}))^2 = \frac{s_1^2}{n_1}$ and $w_2 = (\mathrm{SEM}(\bar{y}))^2 = \frac{s_2^2}{n_2}$
Note: d.f. are estimated from the data and are not a function of the sample sizes alone
Note: ν is not usually an integer but is rounded down to the nearest integer
Inference for Small Samples
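Case 2: Variances $\sigma_1^2$ and $\sigma_2^2$ unequal.
Approximate 100(1-α)% two-sided CI for $\mu_1 - \mu_2$:
$\bar{x} - \bar{y} - t_{\nu,\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x} - \bar{y} + t_{\nu,\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Test statistic for $H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \ne \delta_0$
is $t = \frac{\bar{x} - \bar{y} - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Reject $H_0$ if $|t| > t_{\nu,\alpha/2}$.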
Hospitalization Costs: Inference Using
Separate Variances
Since $n_1 = n_2 = n$ we have
$\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = \left(\frac{2}{n}\right) \times \frac{s_1^2 + s_2^2}{2} = \left(\frac{2}{n}\right) s^2 = \frac{2 s^2}{n}$
where $s^2$ is the pooled sample variance. Hence,
$t = \frac{\bar{x} - \bar{y} - \delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{\bar{x} - \bar{y} - \delta_0}{\sqrt{2 s^2/n}} = \frac{\bar{x} - \bar{y} - \delta_0}{s \sqrt{1/n + 1/n}} = 0.619$
(Recall that we are working with log cost.)
That is, with equal sample sizes this t is the same as the t with pooled variance.
The textbook calculates the d.f. to be ν = 54.6, rounded down to 54.
Recall that assuming equal variance the d.f. were $n_1 + n_2 - 2 = 58$.
Since the t distributions with 54 and 58 d.f. do not differ by much, the results
not assuming equal variances do not differ by much from those obtained
assuming equal variances.
Hospitalization Costs: Inference Using
Separate Variances – Conclusions
• Results with equal variances and unequal variances do not differ
by much because
– The two sample variances are not too different
– The two sample sizes are equal
• In other cases there can be considerable loss in d.f. as a result
of using separate variance estimates
• If there is any doubt about the equality of variances, do not use
the pooled variance method, since this method is not robust
against the violation of the assumption of equal variances.
This is particularly so when sample sizes are unequal.
Testing for the Equality of Variances
Section 8.4 covers the classical F test for the equality of two variances and
associated confidence intervals. However, this method is not robust against
departures from normality. For example, p-values can be off by a factor of 10 if
the distributions have shorter or longer tails than the normal.
A robust alternative is Levene’s Test. His test applies the two-sample t-test to the
absolute value of the difference of each observation and the group mean
$|Y_{1i} - \bar{Y}_1|, \quad i = 1, 2, \ldots, n_1$
$|Y_{2i} - \bar{Y}_2|, \quad i = 1, 2, \ldots, n_2$
This method works well even though these absolute deviations are not
independent.
In the Brown-Forsythe test the response is the absolute value of the
difference of each observation and the group median
JMP Analysis Without Assuming Equal Variances
JMP 4 Analysis Without Assuming Equal Variances
• Classical test for equality of several variances is not robust against
departure from the normality assumption
• ν = 54.6, rounded down to 54
• These tests can be used when there are
more than two populations
• Note that $(.6181)^2 = 0.3820$
Independent Sample Design: Sample Size Determination
Assuming Equal Variances
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$
$n = n_1 = n_2 = 2 \left[\frac{\sigma (z_{\alpha/2} + z_\beta)}{\delta}\right]^2$
• Because we assume a known variance, this n is a slight
underestimate of the sample size
• $\delta$ is the smallest difference of practical
importance that we want to detect
Hospital cost example:
Assume $\sigma = 1$ in $\log_e$ scale costs (consistent with data)
Assume we want to detect a change of a factor of two in the median cost
with at least .90 power. The alpha level is .05.
Then $\delta = E(\log Y_1) - E(\log Y_2) = \log\left(\frac{\mathrm{median}_1\ \mathrm{Cost}}{\mathrm{median}_2\ \mathrm{Cost}}\right) = \log 2 = 0.693$
$\sigma = 1$, $z_{\alpha/2} = z_{.025} = 1.960$, and $z_\beta = z_{.10} = 1.282$
$n = 2 \left[\frac{1.0 (1.960 + 1.282)}{.693}\right]^2 = 43.75 \approx 44$
Sample Size Determination With JMP
Sample Size Determination With JMP: Method 1
.693/2
= δ/2
Sample Size Determination With JMP
Sample size
here is 2n
Sample Size Determination With JMP:
Exploring a Range of Standard Deviations
Sample Size Determination With JMP: A Simpler Way
Extra Params is only for multi-factor designs. Leave this field
zero in simple cases.
2n
Matched Pairs Design
Example: Clinical study to compare two methods of measuring
cardiac output discussed in Example 4.11. Method A is invasive
while Method B is noninvasive. To control for differences between
persons, both methods are used on each person participating in
the study. The purpose is to compare the two methods in terms of
the mean difference.
Cardiac Output Data
Cardiac Output JMP File
this file in the course
the Textbook Data Sets
Methods used for matched pairs are the one-sample methods
applied to the A-B data.
Cardiac Output JMP Analysis
Testing Equality of Means
Distribution JMP platform:
• Example 8.6 in the textbook has a slight rounding error in the
calculation of the mean difference
• Test that the mean difference is zero
Confidence Interval for Difference of Means
JMP’s Matched Pairs Platform: Another Way
to Compare Matched Pairs
JMP’s Matched Pairs Platform
Significant difference between the methods
because significance bands do not contain 0.
X axis measures how different the pairs are. The more
scatter in this axis the more likely the results are to
generalize to other persons outside the sample.
Statistical Justification of Matched Pairs Design
Sample Size Determination
$n = \left[\frac{(z_\alpha + z_\beta) \sigma_D}{\delta}\right]^2$ (One-Sided Test)
$n = \left[\frac{(z_{\alpha/2} + z_\beta) \sigma_D}{\delta}\right]^2$ (Two-Sided Test)
•One needs a planning value for $\sigma_D$
•These formulas come from the one-sample formulas applied
to the differences
Comparing Variances of Two Populations
•Application arises when comparing instrument precision or
uniformities of products. Also when checking assumptions.
•The method discussed in the book (F test) is applicable only
under the assumption of normality of the data. The F test is
highly sensitive to even modest departures from normality.
• In case of nonnormal data there are nonparametric and other
robust methods for comparing data dispersion. We recall
some supported by JMP.
Example 8.7 (Hospitalization Costs:
Test of Equality of Variances)
Comparing Variances with JMP
These tests are alternatives to the F test used in
Example 8.7 on Page 288. These tests can be used
with more than two samples. Levene’s test is
relatively robust against violations of the normality
assumption. See JMP’s help for details.
Comparing Variances of Two Populations: F Test
Independent sample design:
Sample 1: $x_1, x_2, \ldots, x_{n_1}$ is a random sample from $N(\mu_1, \sigma_1^2)$
Sample 2: $y_1, y_2, \ldots, y_{n_2}$ is a random sample from $N(\mu_2, \sigma_2^2)$
$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$ has an F distribution with $n_1 - 1$ and $n_2 - 1$ d.f., respectively
An Important Industrial Application:
Example 8.8
Do the two labs have equal measurement precision?
Example 8.8: Calculation Details
d.f. = (12 - 1) + (12 - 1) + (12 - 1) = 33
d.f. = (6 - 1) + (6 - 1) + (6 - 1) = 15
Used the fact that $f_{n,m,1-\alpha} = \frac{1}{f_{m,n,\alpha}}$
```
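
The worked examples in this handout (Example 8.2, the hospitalization log-cost comparison, and the sample size calculation) can be checked numerically. The following is a minimal Python sketch, not part of the original slides or of JMP. All function names are hypothetical and chosen only for illustration. It uses only the standard library, so the t critical value (2.002 for 58 d.f.) is supplied by hand from the slide rather than computed. Because the inputs are the rounded summary statistics printed on the slides, results may differ from the slide values in the third decimal place.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def large_sample_z(xbar, ybar, s1, s2, n1, n2, delta0=0.0, alpha=0.01):
    """Large-sample z statistic, two-sided P-value and CI for mu1 - mu2."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    z = (xbar - ybar - delta0) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    diff = xbar - ybar
    return z, p_value, (diff - z_crit * se, diff + z_crit * se)


def pooled_sd(s1, s2, n1, n2):
    """Square root of the pooled variance estimate (equal-variance case)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_ci(xbar, ybar, s1, s2, n1, n2, t_crit):
    """Pooled t statistic (delta0 = 0) and CI; t_crit is a table value
    supplied by the caller, since the standard library has no t quantiles."""
    se = pooled_sd(s1, s2, n1, n2) * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar - ybar
    return diff / se, (diff - t_crit * se, diff + t_crit * se)


def welch_df(s1, s2, n1, n2):
    """Approximate d.f. for the separate-variances t (before rounding down)."""
    w1, w2 = s1**2 / n1, s2**2 / n2
    return (w1 + w2) ** 2 / (w1**2 / (n1 - 1) + w2**2 / (n2 - 1))


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.90):
    """Per-group n for a two-sided test, assuming a known common sigma."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    return 2 * (sigma * (z_a + z_b) / delta) ** 2


if __name__ == "__main__":
    # Example 8.2: two calculus classes, n1 = n2 = 150, 99% CI
    print(large_sample_z(75, 70, 15, 12, 150, 150, alpha=0.01))
    # Hospitalization log costs: n1 = n2 = 30, t_{58,.025} = 2.002 from the slide
    print(pooled_t_ci(8.251, 8.084, 1.167, 0.905, 30, 30, t_crit=2.002))
    print(welch_df(1.167, 0.905, 30, 30))
    # Detect a factor-of-two change in median cost with power .90, alpha .05
    print(math.ceil(sample_size_per_group(1.0, math.log(2))))
```

Passing the t critical value in as an argument keeps the sketch dependency-free. With a statistics package available, the table lookup could instead be replaced by that package's t quantile function.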