October 30

ST 380
Probability and Statistics for the Physical Sciences
Comparing Two Samples
We are often interested in comparing measurements made under two
different sets of conditions.
Examples
Water quality measurements below an effluent outfall, compared
with upstream measurements.
Strengths of concrete beams manufactured with a plasticizer
versus without.
Anxiety levels in a group of heart attack survivors who are
visited by a volunteer with a trained dog, versus another group
who are visited only by a volunteer.
Two Sample Tests
Introduction
In each case, we have a sample of measurements $X_1, X_2, \ldots, X_m$
made under one set of conditions, and another sample $Y_1, Y_2, \ldots, Y_n$
made under a different set of conditions.
Note that the samples are often not of the same size.
If the populations have means $\mu_1$ and $\mu_2$ respectively, we usually
want to make inferences about the difference $\mu_1 - \mu_2$.
We shall want to use the three basic modes of inference:
A point estimator of $\mu_1 - \mu_2$.
An interval estimator of $\mu_1 - \mu_2$.
A test of the null hypothesis $H_0: \mu_1 = \mu_2$.
Comparing Population Means
Point Estimator
Since we have point estimators $\bar{X}$ and $\bar{Y}$ of $\mu_1$ and $\mu_2$ respectively,
the natural point estimator of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$.
Since $\bar{X}$ and $\bar{Y}$ are unbiased estimators of $\mu_1$ and $\mu_2$ respectively,
$\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$:
$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2.$$
In most cases, we assume that the two samples are independent, and
then
$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$
As always, we can estimate $\sigma_1^2$ by
$$S_1^2 = \frac{1}{m - 1} \sum_{i=1}^{m} (X_i - \bar{X})^2$$
and $\sigma_2^2$ by
$$S_2^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$
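But we often assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and estimate the common
variance $\sigma^2$ by the “pooled” estimate
$$S_p^2 = \frac{\sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2}{m - 1 + n - 1} = \frac{(m - 1) S_1^2 + (n - 1) S_2^2}{m + n - 2}.$$
We then estimate $\sigma_{\bar{X} - \bar{Y}}$ by
$$\sqrt{\frac{S_p^2}{m} + \frac{S_p^2}{n}} = S_p \sqrt{\frac{1}{m} + \frac{1}{n}}.$$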
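The pooled estimate can be checked numerically. The following is a minimal R sketch using simulated samples; the data, sample sizes, and variable names are illustrative assumptions, not course data.

```r
# Simulated samples; sizes, means, and standard deviations are arbitrary.
set.seed(1)
x <- rnorm(8, mean = 10, sd = 2)    # m = 8
y <- rnorm(12, mean = 9, sd = 2)    # n = 12
m <- length(x)
n <- length(y)

# Pooled variance as a weighted average of the two sample variances.
sp2 <- ((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2)

# Equivalent form: combined sum of squares divided by m + n - 2.
sp2_alt <- (sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (m + n - 2)

# Estimated standard error of mean(x) - mean(y) under equal variances.
se_pooled <- sqrt(sp2) * sqrt(1 / m + 1 / n)
```

The two forms of the pooled variance agree up to rounding, which is a quick way to confirm that the weights are $m - 1$ and $n - 1$ rather than $m$ and $n$.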
Confidence Interval
As you might expect, a $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$
is of the form
$$\bar{X} - \bar{Y} \pm (\text{critical value})_\alpha \times (\text{standard error}).$$
We have three ways of estimating the standard error, and each needs
a different critical value.
The simplest case is when the variances are known, or the sample
sizes are large enough that they are essentially known.
Then the standard error is
$$\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}} \quad \text{or} \quad \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
and the critical value $z_{\alpha/2}$ comes from the normal distribution.
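As a rough illustration of the large-sample case, the interval can be computed directly in R. The samples below are simulated and all names are assumptions made for this sketch.

```r
# Simulated large samples (illustrative values only).
set.seed(2)
x <- rnorm(60, mean = 5.0, sd = 1.0)
y <- rnorm(80, mean = 4.5, sd = 1.5)
alpha <- 0.05

# Standard error with the sample variances standing in for the true ones.
se <- sqrt(var(x) / length(x) + var(y) / length(y))

# Normal critical value z_{alpha/2}.
z <- qnorm(1 - alpha / 2)

# 100(1 - alpha)% confidence interval for mu1 - mu2.
ci_z <- (mean(x) - mean(y)) + c(-1, 1) * z * se
```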
Next, suppose that the variances are unknown but assumed to be
equal.
Then the standard error is
$$s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$$
and the critical value $t_{\alpha/2, m+n-2}$ comes from the t-distribution with
$(m + n - 2)$ degrees of freedom.
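The pooled interval can be sketched by hand and compared with what t.test() reports when var.equal = TRUE. The data are simulated for illustration and the variable names are assumptions.

```r
# Simulated small samples with a common standard deviation (illustrative).
set.seed(3)
x <- rnorm(10, mean = 20, sd = 3)
y <- rnorm(14, mean = 18, sd = 3)
m <- length(x)
n <- length(y)
alpha <- 0.05

# Pooled standard deviation and the resulting standard error.
sp <- sqrt(((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2))
se <- sp * sqrt(1 / m + 1 / n)

# t critical value with m + n - 2 degrees of freedom.
tcrit <- qt(1 - alpha / 2, df = m + n - 2)
ci_pooled <- (mean(x) - mean(y)) + c(-1, 1) * tcrit * se

# Reference interval from t.test() for comparison.
ci_ref <- t.test(x, y, var.equal = TRUE, conf.level = 1 - alpha)$conf.int
```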
The final case, when the variances are unknown and cannot be
assumed to be equal, is the most complicated.
Then the standard error is
$$\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
and the critical value $t_{\alpha/2, \nu}$ comes from the t-distribution with $\nu$
degrees of freedom, where $\nu$ must be calculated from $m$, $n$, $s_1^2$,
and $s_2^2$.
This is known as Welch’s procedure.
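The slides do not give the formula for $\nu$. A common choice, and the one R's t.test() reports as its df for the Welch test, is the Welch-Satterthwaite approximation. The sketch below assumes that formula; the helper name welch_df and the simulated data are hypothetical.

```r
# Welch-Satterthwaite approximation to the degrees of freedom
# (assumed here; the slides only say nu is computed from m, n, s1^2, s2^2).
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # s1^2 / m
  vy <- var(y) / length(y)   # s2^2 / n
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

# Simulated samples with clearly unequal variances (illustrative).
set.seed(4)
x <- rnorm(9, mean = 50, sd = 2)
y <- rnorm(15, mean = 48, sd = 6)

nu <- welch_df(x, y)

# Degrees of freedom reported by t.test() (Welch is its default).
nu_ref <- unname(t.test(x, y)$parameter)
```

The value of $\nu$ is generally not an integer, and it lies between $\min(m, n) - 1$ and $m + n - 2$.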
Hypothesis Test
When comparing two samples, the null hypothesis is usually
$H_0: \mu_1 = \mu_2$.
It could, more generally, be $H_0: \mu_1 - \mu_2 = \delta$ for some specified $\delta$; the
test is only slightly more complicated.
The test is based on
$$T = \frac{\bar{X} - \bar{Y}}{\text{standard error}}$$
and the test statistic is either $T$ or $|T|$, depending on whether the
test is one-sided or two-sided.
The calculation of the P-value, or the determination of the critical
value for rejection of H0 , uses:
the normal distribution;
the t-distribution with (m + n − 2) degrees of freedom;
the t-distribution with Welch’s ν degrees of freedom;
depending on the form of the standard error, as in the case of the
confidence interval.
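The three reference distributions can be compared side by side. This is a sketch on simulated data, with the Welch degrees of freedom computed from the Welch-Satterthwaite approximation (an assumption, since the slides do not state the formula); all names are illustrative.

```r
# Simulated samples (illustrative).
set.seed(5)
x <- rnorm(12, mean = 101, sd = 4)
y <- rnorm(15, mean = 98, sd = 4)
m <- length(x)
n <- length(y)
d <- mean(x) - mean(y)

# Unpooled and pooled standard errors.
se_unpooled <- sqrt(var(x) / m + var(y) / n)
sp <- sqrt(((m - 1) * var(x) + (n - 1) * var(y)) / (m + n - 2))
se_pooled <- sp * sqrt(1 / m + 1 / n)

t_unpooled <- d / se_unpooled
t_pooled <- d / se_pooled

# Welch-Satterthwaite degrees of freedom (assumed formula).
vx <- var(x) / m
vy <- var(y) / n
nu <- (vx + vy)^2 / (vx^2 / (m - 1) + vy^2 / (n - 1))

# Two-sided P-values under each reference distribution.
p_normal <- 2 * pnorm(-abs(t_unpooled))
p_pooled <- 2 * pt(-abs(t_pooled), df = m + n - 2)
p_welch  <- 2 * pt(-abs(t_unpooled), df = nu)
```

For the same statistic, the normal P-value is never larger than the Welch P-value, because the t-distribution has heavier tails.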
Example 9.7: strength of pipe liner
Liners used to reinforce a pipeline can be manufactured with or
without a certain fusion process. The question is whether the process
affects their tensile strength.
In R, the t.test() function can be used to test the null hypothesis
of equal strength against either two-sided or one-sided alternatives,
using either the Welch procedure or the pooled method:
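# Read the liner data; the formula below uses the columns Strength and Fused.
liner <- read.table("Data/Example-09-07.txt", header = TRUE)
# Welch procedure (the default: variances not assumed equal).
t.test(Strength ~ Fused, data = liner)
# Pooled method (variances assumed equal).
t.test(Strength ~ Fused, data = liner, var.equal = TRUE)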
t.test() also produces a confidence interval with the same
specification as the hypothesis test.
Analysis of Paired Data
So far, we have assumed that the two samples were collected in such
a way that they are statistically independent.
The calculation of the standard error of X̄ − Ȳ depended on this
assumption.
Sometimes the samples are deliberately collected so as not to be
independent, to improve the precision of the comparison.
Example 9.8: zinc concentration in rivers
The question is whether zinc concentrations differ between bottom
water ($x$) and surface water ($y$). Data from six locations:

Location    1      2      3      4      5      6
x         0.430  0.266  0.567  0.531  0.707  0.716
y         0.415  0.238  0.390  0.410  0.605  0.609
x − y     0.015  0.028  0.177  0.121  0.102  0.107
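If the data file is not at hand, the six pairs listed above can be entered directly. This sketch assumes the column names Bottom and Surface, matching the R code used later in these slides; the Location and Difference columns are added here for illustration.

```r
# Zinc concentrations at six locations, typed in from the table.
zinc <- data.frame(
  Location = 1:6,
  Bottom   = c(0.430, 0.266, 0.567, 0.531, 0.707, 0.716),  # x
  Surface  = c(0.415, 0.238, 0.390, 0.410, 0.605, 0.609)   # y
)

# Within-location differences x - y.
zinc$Difference <- zinc$Bottom - zinc$Surface

# Mean difference, the point estimate of the bottom-minus-surface effect.
mean_diff <- mean(zinc$Difference)
```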
Both measurements for a given river are affected by the overall zinc
levels in the sources of the river, so we would expect them to be
dependent.
The scatterplot and the correlation coefficient strongly support
dependence:
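zinc <- read.table("Data/Example-09-08.txt", header = TRUE)
# Scatterplot of the paired bottom and surface concentrations.
with(zinc, plot(Bottom, Surface))
# Sample correlation between the paired measurements.
with(zinc, cor(Bottom, Surface))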
The conventional analysis of paired data is through the differences:
with(zinc, t.test(Bottom - Surface))
t.test() can also work with the two samples, provided you use the
paired = TRUE option:
with(zinc, t.test(Bottom, Surface, paired = TRUE))
The calculations are the same, but the results are labeled better.
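To see what the paired test computes, the statistic can be formed by hand from the differences: it is a one-sample t statistic with $n - 1$ degrees of freedom. The sketch below types in the zinc values from Example 9.8; the vector names are assumptions.

```r
# Zinc data from Example 9.8, entered directly.
bottom  <- c(0.430, 0.266, 0.567, 0.531, 0.707, 0.716)
surface <- c(0.415, 0.238, 0.390, 0.410, 0.605, 0.609)

# One-sample t statistic on the differences.
d <- bottom - surface
n <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided P-value from the t-distribution with n - 1 degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Reference result from t.test() with paired = TRUE.
ref <- t.test(bottom, surface, paired = TRUE)
```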
The advantage of paired data is that $V(\bar{X} - \bar{Y})$ is generally smaller
than it would be with independent samples, $V(\bar{X}) + V(\bar{Y})$:
with(zinc, var(Bottom - Surface))
with(zinc, var(Bottom) + var(Surface))
The paired test has fewer degrees of freedom ($n - 1$ rather than $2n - 2$),
which by itself lowers power, but that is more than compensated for by the
increase in precision, except when the correlation is very low.
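The gain in precision comes from the covariance term: the sample variance of the differences equals the sum of the two sample variances minus twice the sample covariance. A short check on the zinc values from Example 9.8 (vector names are assumptions):

```r
# Zinc data from Example 9.8, entered directly.
bottom  <- c(0.430, 0.266, 0.567, 0.531, 0.707, 0.716)
surface <- c(0.415, 0.238, 0.390, 0.410, 0.605, 0.609)

# Variance of the paired differences.
v_paired <- var(bottom - surface)

# What the variance would be if the samples were treated as independent.
v_indep <- var(bottom) + var(surface)

# The two differ by exactly twice the sample covariance.
v_check <- v_indep - 2 * cov(bottom, surface)
```

When the covariance is positive, as it is here, the paired variance is the smaller of the two.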
Comparing Binomial Parameters
Suppose that $X \sim \mathrm{Bin}(m, p_1)$ and $Y \sim \mathrm{Bin}(n, p_2)$, and that $X$ and $Y$
are independent, and we wish to compare $p_1$ and $p_2$.
We have unbiased point estimators
$$\hat{p}_1 = \frac{X}{m} \quad \text{and} \quad \hat{p}_2 = \frac{Y}{n}$$
of $p_1$ and $p_2$, respectively, so $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of
$p_1 - p_2$.
Also we know the standard error of each, and large-sample normal
approximations to their distributions, from which we can derive a
confidence interval for $p_1 - p_2$.
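One way to write the large-sample interval is $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}$ times the estimated standard error of the difference. A sketch in R follows; the counts are hypothetical and chosen only for illustration.

```r
# Hypothetical counts: x successes in m trials, y successes in n trials.
x <- 45; m <- 120
y <- 30; n <- 110
alpha <- 0.05

p1_hat <- x / m
p2_hat <- y / n

# Estimated standard error of p1_hat - p2_hat (independent samples).
se <- sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)

# Large-sample 100(1 - alpha)% confidence interval for p1 - p2.
ci_prop <- (p1_hat - p2_hat) + c(-1, 1) * qnorm(1 - alpha / 2) * se
```

This should agree with the interval from prop.test(c(x, y), c(m, n), correct = FALSE), which omits the continuity correction.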
However, it is not clear that “comparing” $p_1$ and $p_2$ requires making
inferences about the difference $p_1 - p_2$.
Sometimes the interesting quantity is the log-odds-ratio
$$\log \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)} = \log \frac{p_1}{1 - p_1} - \log \frac{p_2}{1 - p_2}.$$
So instead of focusing on inference about $p_1 - p_2$, we study testing
the hypothesis $H_0: p_1 = p_2 = p$, because that is the same as equality
of the odds (a log-odds-ratio of zero).
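A small sketch shows why the two formulations of the null hypothesis coincide: the log-odds-ratio is zero exactly when $p_1 = p_2$. The function names are hypothetical helpers written for this illustration.

```r
# Log-odds of a probability p (the same quantity as base R's qlogis(p)).
log_odds <- function(p) log(p / (1 - p))

# Log-odds-ratio as a difference of log-odds.
log_odds_ratio <- function(p1, p2) log_odds(p1) - log_odds(p2)

# Equal probabilities give a log-odds-ratio of zero.
lor_equal <- log_odds_ratio(0.3, 0.3)

# Odds of 1 versus odds of 1/4 give an odds ratio of 4.
lor_example <- log_odds_ratio(0.5, 0.2)
```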
The obvious test statistic is of the form
$$\frac{\hat{p}_1 - \hat{p}_2}{\text{standard error}}$$
where
$$\text{standard error} = \sqrt{\frac{p_1 (1 - p_1)}{m} + \frac{p_2 (1 - p_2)}{n}},$$
which under $H_0$ is
$$\sqrt{p (1 - p) \left( \frac{1}{m} + \frac{1}{n} \right)}.$$
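When $H_0$ is true, both $X$ and $Y$ count successes in Bernoulli trials with the
same success probability, so we estimate the common parameter $p$ by
$$\hat{p} = \frac{X + Y}{m + n}.$$
The estimated standard error is then
$$\sqrt{\hat{p} (1 - \hat{p}) \left( \frac{1}{m} + \frac{1}{n} \right)}.$$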
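So the test statistic is
$$T = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} (1 - \hat{p}) \left( \frac{1}{m} + \frac{1}{n} \right)}}.$$
In a two-sided test, the P-value for an observed value $t$ is approximately
$$P(|T| \ge |t|) = 2 \times (1 - \Phi(|t|)).$$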
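The test can be sketched in a few lines of R. The counts are hypothetical; the P-value uses the absolute value of the observed statistic so that it is valid for either sign.

```r
# Hypothetical counts: x successes in m trials, y successes in n trials.
x <- 45; m <- 120
y <- 30; n <- 110

p1_hat <- x / m
p2_hat <- y / n

# Pooled estimate of the common p under H0: p1 = p2.
p_hat <- (x + y) / (m + n)

# Estimated standard error under H0 and the observed test statistic.
se0 <- sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
t_obs <- (p1_hat - p2_hat) / se0

# Two-sided P-value from the standard normal distribution.
p_value <- 2 * (1 - pnorm(abs(t_obs)))

# Reference: prop.test() without continuity correction reports t_obs^2
# as a chi-squared statistic with 1 degree of freedom.
ref <- prop.test(c(x, y), c(m, n), correct = FALSE)
```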