February 21

Applications of Hotelling's $T^2$
Testing a Hypothesis about the Mean
- Consider as usual a random sample $x_1, x_2, \ldots, x_N$ from $N_p(\mu, \Sigma)$.
- We can test the null hypothesis
  $$H_0 : \mu = \mu_0$$
  with the size-$\alpha$ critical region
  $$T^2 \ge T_0^2,$$
  where the critical value $T_0^2$ is
  $$T_0^2 = \frac{(N-1)p}{N-p}\, F_{p,N-p}(\alpha).$$
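As a minimal numerical sketch of this test, assume the data arrive as an $N \times p$ array and that numpy and scipy are available; the function name one_sample_T2 is illustrative rather than taken from any library.

import numpy as np
from scipy.stats import f

def one_sample_T2(X, mu0, alpha=0.05):
    """Hotelling's T^2 test of H0: mu = mu0 from an N x p data matrix X (sketch)."""
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                  # sample covariance, divisor N - 1
    d = xbar - np.asarray(mu0, dtype=float)
    T2 = N * d @ np.linalg.solve(S, d)           # N (xbar - mu0)' S^{-1} (xbar - mu0)
    T2_crit = (N - 1) * p / (N - p) * f.ppf(1 - alpha, p, N - p)
    return T2, T2_crit, T2 >= T2_crit            # reject H0 when T^2 >= T0^2

Using np.linalg.solve avoids forming $S^{-1}$ explicitly; $H_0$ is rejected at level $\alpha$ exactly when the returned $T^2$ meets or exceeds the returned critical value.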
- We can also set up a $100(1-\alpha)\%$ confidence region as the set of $\mu^*$ satisfying
  $$N(\bar{x} - \mu^*)' S^{-1} (\bar{x} - \mu^*) \le \frac{(N-1)p}{N-p}\, F_{p,N-p}(\alpha).$$
- We will often find the equivalent set of $100(1-\alpha)\%$ simultaneous confidence statements, that $c'\mu$ lies between
  $$c'\bar{x} \pm \sqrt{\left(N^{-1} c' S c\right) \frac{(N-1)p}{N-p}\, F_{p,N-p}(\alpha)},$$
  a more convenient and interpretable set of statements.
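A sketch of one such simultaneous interval, assuming $\bar{x}$, $S$, and $N$ have already been computed from the sample and $c$ is the coefficient vector of interest (simultaneous_ci is an illustrative name, not a library function):

import numpy as np
from scipy.stats import f

def simultaneous_ci(xbar, S, N, c, alpha=0.05):
    """Simultaneous 100(1-alpha)% confidence interval for c'mu (sketch)."""
    xbar, S, c = map(np.asarray, (xbar, S, c))
    p = xbar.shape[0]
    centre = c @ xbar
    # half-width: sqrt( (c'Sc / N) * (N-1)p/(N-p) * F_{p,N-p}(alpha) )
    half = np.sqrt((c @ S @ c) / N
                   * (N - 1) * p / (N - p)
                   * f.ppf(1 - alpha, p, N - p))
    return centre - half, centre + half

Because the same $F$ cut-off covers every $c$ simultaneously, these intervals can be reported for as many linear combinations as desired while retaining overall coverage $1 - \alpha$.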
Comparing Two Means
- A $T^2$ statistic can also be used to test equality of means in two populations.
- Suppose that
  - $x_1^{(1)}, x_2^{(1)}, \ldots, x_{N_1}^{(1)}$ is a random sample of size $N_1$ from $N_p(\mu^{(1)}, \Sigma)$, and
  - $x_1^{(2)}, x_2^{(2)}, \ldots, x_{N_2}^{(2)}$ is a random sample of size $N_2$ from $N_p(\mu^{(2)}, \Sigma)$.
- We estimate the common $\Sigma$ with the pooled matrix
  $$S = \frac{(N_1 - 1)S^{(1)} + (N_2 - 1)S^{(2)}}{N_1 + N_2 - 2},$$
  for which $(N_1 + N_2 - 2)S$ has the Wishart distribution $W_p(\Sigma, N_1 + N_2 - 2)$.
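A small computational counterpart, assuming the two samples are supplied as arrays X1 and X2 (pooled_cov is an illustrative name):

import numpy as np

def pooled_cov(X1, X2):
    """Pooled estimate of the common covariance Sigma from two samples (sketch)."""
    N1, N2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False)   # divisor N1 - 1
    S2 = np.cov(X2, rowvar=False)   # divisor N2 - 1
    return ((N1 - 1) * S1 + (N2 - 1) * S2) / (N1 + N2 - 2)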
- Since
  $$\bar{x}^{(1)} - \bar{x}^{(2)} \sim N_p\!\left(\mu^{(1)} - \mu^{(2)},\ \frac{1}{N_1}\Sigma + \frac{1}{N_2}\Sigma\right),$$
  the test statistic is
  $$T^2 = \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)' \left[\left(\frac{1}{N_1} + \frac{1}{N_2}\right) S\right]^{-1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)
        = \frac{N_1 N_2}{N_1 + N_2}\, \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right)' S^{-1} \left(\bar{x}^{(1)} - \bar{x}^{(2)}\right).$$
- The distributional result is that
  $$\frac{N_1 + N_2 - p - 1}{(N_1 + N_2 - 2)p} \times T^2$$
  has the $F$ distribution with $p$ and $N_1 + N_2 - p - 1$ degrees of freedom.
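Putting the pieces together, here is a sketch of the two-sample test; it reuses the pooled_cov helper from the earlier sketch, and the names remain illustrative:

import numpy as np
from scipy.stats import f

def two_sample_T2(X1, X2):
    """Two-sample Hotelling T^2 test of H0: mu(1) = mu(2) (sketch)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    N1, N2 = len(X1), len(X2)
    p = X1.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = pooled_cov(X1, X2)                       # pooled matrix from the sketch above
    T2 = (N1 * N2) / (N1 + N2) * d @ np.linalg.solve(S, d)
    # (N1 + N2 - p - 1) / ((N1 + N2 - 2) p) * T^2 ~ F_{p, N1+N2-p-1} under H0
    F_stat = (N1 + N2 - p - 1) / ((N1 + N2 - 2) * p) * T2
    p_value = f.sf(F_stat, p, N1 + N2 - p - 1)
    return T2, F_stat, p_value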
- Under the null hypothesis $\mu^{(1)} = \mu^{(2)}$, the distribution is central.
- We can test this hypothesis using $T^2$.
- We can set up a confidence region for $\mu^{(1)} - \mu^{(2)}$.
Comparing More Than Two Means
- Suppose that $x_1^{(i)}, x_2^{(i)}, \ldots, x_{N_i}^{(i)}$ is a random sample of size $N_i$ from $N_p(\mu^{(i)}, \Sigma)$, $i = 1, 2, \ldots, g$.
- Testing the null hypothesis
  $$H_0 : \mu^{(1)} = \mu^{(2)} = \cdots = \mu^{(g)}$$
  requires more general methods than $T^2$, and will be covered when discussing the multivariate general linear model.
- In certain problems, we may want to test the hypothesis
  $$H_0 : \sum_{i=1}^{g} \beta_i \mu^{(i)} = \mu_0$$
  for known constants $\beta_1, \beta_2, \ldots, \beta_g$ and $\mu_0$.
Examples:
- The two-sample comparison is of this form, with $g = 2$, $\beta_1 = 1$, $\beta_2 = -1$, and $\mu_0 = 0$.
- Fisher studied 4-dimensional data for three species of iris, for which genetic theory suggests
  $$3\mu^{(1)} = \mu^{(3)} + 2\mu^{(2)},$$
  where the species are Iris versicolor (1), Iris setosa (2), and Iris virginica (3). Here $\beta_1 = 3$, $\beta_2 = -2$, and $\beta_3 = -1$, and again $\mu_0 = 0$.
- The hypothesis can be tested using a $T^2$ statistic based on
  $$y = \sum_{i=1}^{g} \beta_i \bar{x}^{(i)} - \mu_0,$$
  since, under $H_0$,
  $$y \sim N_p\!\left[0,\ \left(\sum_{i=1}^{g} \frac{\beta_i^2}{N_i}\right) \Sigma\right].$$
- The statistic is
  $$\left(\sum_{i=1}^{g} \frac{\beta_i^2}{N_i}\right)^{-1} y' S^{-1} y,$$
  where $S$ is the corresponding pooled estimator of $\Sigma$.
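A sketch of this multi-sample contrast test, assuming the $g$ samples are supplied as a list of arrays. The $F$ conversion at the end is not stated on the slide; it is assumed here to follow the same pattern as the one- and two-sample cases, with $n = \sum_i (N_i - 1)$ pooled degrees of freedom.

import numpy as np
from scipy.stats import f

def linear_combination_T2(Xs, betas, mu0):
    """T^2 test of H0: sum_i beta_i mu(i) = mu0 across g independent samples (sketch)."""
    Xs = [np.asarray(X, float) for X in Xs]
    Ns = [len(X) for X in Xs]
    p = Xs[0].shape[1]
    # y = sum_i beta_i xbar^(i) - mu0
    y = sum(b * X.mean(axis=0) for b, X in zip(betas, Xs)) - np.asarray(mu0, float)
    n = sum(N - 1 for N in Ns)                               # pooled degrees of freedom
    S = sum((N - 1) * np.cov(X, rowvar=False) for N, X in zip(Ns, Xs)) / n
    c = sum(b * b / N for b, N in zip(betas, Ns))            # sum_i beta_i^2 / N_i
    T2 = (y @ np.linalg.solve(S, y)) / c
    # assumed T^2-to-F conversion with n Wishart degrees of freedom
    F_stat = (n - p + 1) / (n * p) * T2
    p_value = f.sf(F_stat, p, n - p + 1)
    return T2, F_stat, p_value

For Fisher's iris hypothesis above, betas would be (3, -2, -1) and mu0 a zero vector of length 4.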
Other Tests in a Single Sample
- In a sample from a single population, we may want to test other hypotheses than $\mu = \mu_0$.
- Examples:
  - Symmetry: $\mu_1 = \mu_2 = \cdots = \mu_p$.
  - In longitudinal data, linearity: $\mu_i = \alpha + \beta i$, $i = 1, 2, \ldots, p$.
  - In general:
    $$H_0 : C\mu = 0$$
    for some $q \times p$ coefficient matrix $C$.
- The test statistic is
  $$T^2 = N(C\bar{x})'(CSC')^{-1}(C\bar{x}),$$
  and under the null hypothesis,
  $$\frac{N - q}{(N - 1)q} \times T^2$$
  has the central $F$ distribution with $q$ and $N - q$ degrees of freedom.
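A sketch of this general contrast test, assuming a data matrix X and a $q \times p$ contrast matrix C are supplied (contrast_T2 and C_sym are illustrative names); the example matrix encodes the symmetry hypothesis for $p = 4$ via successive differences.

import numpy as np
from scipy.stats import f

def contrast_T2(X, C):
    """T^2 test of H0: C mu = 0 for a q x p contrast matrix C (sketch)."""
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    q = C.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    Cx = C @ xbar
    T2 = N * Cx @ np.linalg.solve(C @ S @ C.T, Cx)   # N (Cxbar)'(CSC')^{-1}(Cxbar)
    # (N - q) / ((N - 1) q) * T^2 ~ F_{q, N-q} under H0
    F_stat = (N - q) / ((N - 1) * q) * T2
    p_value = f.sf(F_stat, q, N - q)
    return T2, F_stat, p_value

# Example: symmetry hypothesis mu_1 = ... = mu_p for p = 4,
# expressed as (p - 1) successive differences.
C_sym = np.array([[1, -1,  0,  0],
                  [0,  1, -1,  0],
                  [0,  0,  1, -1]], dtype=float)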
Likelihood Ratio Tests
- In every one of the above cases, the test based on $T^2$ is equivalent to the generalized likelihood ratio test.
- The reason is that in each case
  $$\hat{\Sigma}_\omega = \hat{\Sigma}_\Omega + \xi\xi'$$
  for an appropriate $\xi$, and we use the result
  $$\frac{\det \hat{\Sigma}_\omega}{\det \hat{\Sigma}_\Omega} = 1 + \xi' \hat{\Sigma}_\Omega^{-1} \xi.$$
- In the multi-sample case, when testing the hypothesis $\mu^{(1)} = \mu^{(2)} = \cdots = \mu^{(g)}$, $\hat{\Sigma}_\omega$ and $\hat{\Sigma}_\Omega$ differ by a matrix with rank $> 1$.
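As a concrete instance (a sketch not spelled out above), take the one-sample test of $\mu = \mu_0$. The maximum likelihood estimates of $\Sigma$ under the two models are
$$\hat{\Sigma}_\Omega = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})', \qquad
\hat{\Sigma}_\omega = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_0)(x_i - \mu_0)' = \hat{\Sigma}_\Omega + (\bar{x} - \mu_0)(\bar{x} - \mu_0)',$$
so $\xi = \bar{x} - \mu_0$, and the determinant identity gives
$$\frac{\det \hat{\Sigma}_\omega}{\det \hat{\Sigma}_\Omega} = 1 + (\bar{x} - \mu_0)' \hat{\Sigma}_\Omega^{-1} (\bar{x} - \mu_0) = 1 + \frac{T^2}{N - 1},$$
a monotone increasing function of $T^2$, so the likelihood ratio test rejects for exactly the same samples as the $T^2$ test.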