Statistics 784: Multivariate Analysis
April 3

Principal Components
- Often, a multivariate response has relatively few dominant modes of variation.
- E.g., length, width, and height of turtle shells: overall size is a dominant mode of variation, making all three responses larger or smaller simultaneously.
- Principal components analysis (PCA) explores the modes of variation suggested by a variance-covariance matrix.
- Identifying the modes of variation can lead to:
  - new insights and interpretation;
  - reduction in dimensionality and collinearity.
- Often used to prepare data for further analysis, e.g., in regression analysis (Principal Components Regression).
- Basic idea: find linear combinations of the responses that have most of the variance.
- More precisely: given $X$ with $E(X) = 0$ and $\mathrm{Cov}(X) = \Sigma$, find $a_1$ to maximize $\mathrm{Var}(a_1'X)$, subject to $a_1'a_1 = 1$.
- Notes:
  - $a_1$ must be constrained, otherwise the variance could be made arbitrarily large just by making $a_1$ large.
  - Why this constraint? Because the problem then has a convenient solution.
- Solution:
  $\mathrm{Var}(a_1'X) = a_1'\Sigma a_1,$
  which is maximized at $a_1 = e_1$, the first eigenvector of $\Sigma$.
- The maximized value is $\lambda_1$, the associated eigenvalue.
- Often, the elements of $e_1$ are all positive and similar in magnitude $\Rightarrow$ a mode in which all responses vary together:
  - not often interesting;
  - but a useful summary or composite variable.
- What about other modes of variation? Find $a_2$ to maximize $\mathrm{Var}(a_2'X)$, subject to:
  - $a_2'a_2 = 1$;
  - $\mathrm{Cov}(a_1'X, a_2'X) = 0$.
- Solution:
  - $a_2 = e_2$, the second eigenvector of $\Sigma$;
  - the maximized value is $\lambda_2$, the associated eigenvalue.
- Similarly for modes $3, 4, \dots, p$.
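As an illustration (not from the original slides), a minimal R sketch of this construction on simulated data, using eigen():

# Sketch on simulated data: all principal component directions and
# variances from the eigen decomposition of a covariance matrix.
set.seed(1)
X  <- matrix(rnorm(200 * 3), 200, 3) %*% matrix(c(2, 1, 1,
                                                  0, 1, 1,
                                                  0, 0, 1), 3, 3)
S  <- cov(X)                       # sample covariance matrix
ev <- eigen(S, symmetric = TRUE)
ev$values                          # lambda_1 >= lambda_2 >= lambda_3
ev$vectors                         # columns are e_1, e_2, e_3
var(X %*% ev$vectors[, 1])         # equals lambda_1 (up to rounding)
cov(X %*% ev$vectors[, 1], X %*% ev$vectors[, 2])  # approximately 0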
Alternative Approach: Data Compression

- Observer sees $X$ but communicates only $Y = a'X$.
- Receiver uses $X^* = bY$ to approximate $X$.
- Error is $X - X^* = (I - ba')X$.
- Error covariance matrix is $\mathrm{Cov}(X - X^*) = (I - ba')\Sigma(I - ab')$.
- Total error variance is
  $\mathrm{trace}[\mathrm{Cov}(X - X^*)] = \mathrm{trace}(\Sigma) - a'\Sigma b - b'\Sigma a + b'b\, a'\Sigma a.$
- For a given $b$, total error variance is minimized by
  $a = \frac{b}{b'b}.$
- Minimum for a given $b$ is
  $\mathrm{trace}(\Sigma) - \frac{b'\Sigma b}{b'b}.$
- Best $b$ is $e_1$, and then best $a = e_1$ also.
- Best (i.e., minimum) total error variance is
  $\mathrm{trace}(\Sigma) - \lambda_1 = \lambda_2 + \lambda_3 + \dots + \lambda_p.$
- Higher dimensions: suppose the observer can communicate $Y = (Y_1, Y_2)' = A'X$, and the receiver uses $X^* = BY$ to approximate $X$.
- Similar math: best $A = B = (e_1, e_2)$, with total error
  $\mathrm{trace}(\Sigma) - \lambda_1 - \lambda_2 = \lambda_3 + \lambda_4 + \dots + \lambda_p.$
- Similarly for $k$ channels, $k < p$: the proportion of total variance "explained" (i.e., communicated) is
  $\frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_k + \lambda_{k+1} + \dots + \lambda_p}.$
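A small R sketch (my illustration, on simulated data) of the compression view: reconstruct X from its first k components and check that the total error variance equals the sum of the trailing eigenvalues:

# Sketch (simulated data): reconstruct X from k components; the total
# error variance matches the sum of the trailing eigenvalues of S.
set.seed(2)
X  <- scale(matrix(rnorm(500 * 4), 500, 4) %*% matrix(rnorm(16), 4, 4),
            center = TRUE, scale = FALSE)     # centered data
S  <- cov(X)
ev <- eigen(S, symmetric = TRUE)
k  <- 2
A  <- ev$vectors[, 1:k]        # "communicate" Y = A'X (k channels)
Xhat <- X %*% A %*% t(A)       # receiver reconstructs X* = BY with B = A
sum(diag(cov(X - Xhat)))       # total error variance
sum(ev$values[(k + 1):4])      # = lambda_3 + lambda_4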
Scaling in PCA

- Recall: PCA is the solution to a data compression problem, where "error" is quantified by total error variance.
- Question: is "total variance" appropriate?
- Variables in different units must be scaled.
- Variables in the same units but with very different variances are usually scaled.
- Simplest scaling: divide each variable by its standard deviation $\Rightarrow$ the covariances become correlations.
- In other words: use the eigen structure of the correlation matrix $R$, not the covariance matrix $\Sigma$.
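In R terms (an illustration, not from the slides), this is the difference between eigen(cov(X)) and eigen(cor(X)), or equivalently between prcomp(X) and prcomp(X, scale. = TRUE):

# Sketch: PCA of the covariance matrix vs. PCA of the correlation matrix.
set.seed(3)
X <- cbind(rnorm(100, sd = 10), rnorm(100, sd = 1))  # very different scales
eigen(cov(X))$values      # dominated by the high-variance column
eigen(cor(X))$values      # both variables on an equal footing
# prcomp(X)$sdev^2 and prcomp(X, scale. = TRUE)$sdev^2 give the same values.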
PCA for some Special Cases

- Diagonal matrix: if
  $\Sigma = \begin{pmatrix} \sigma_{1,1} & 0 & \dots & 0 \\ 0 & \sigma_{2,2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_{p,p} \end{pmatrix},$
  then the principal components are just the original variables.
- Compound symmetry: if
  $\Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \dots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \dots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \dots & \sigma^2 \end{pmatrix},$
  then (if $\rho > 0$):
  - $\lambda_1 = \sigma^2\,[1 + (p-1)\rho]$ and $e_1 = p^{-1/2}(1, 1, \dots, 1)'$;
  - $\lambda_k = \sigma^2\,(1 - \rho)$, $k > 1$;
  - $e_2, e_3, \dots, e_p$ are an arbitrary orthonormal basis for the rest of $\mathbb{R}^p$.
- If $\rho < 0$ the order is reversed, but note that $\rho$ must satisfy
  $1 + (p-1)\rho \ge 0 \;\Rightarrow\; \rho \ge -1/(p-1).$
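A quick numerical check of this eigen structure in R (my illustration):

# Sketch: verify the compound-symmetry eigenvalues and first eigenvector.
p <- 5; sigma2 <- 2; rho <- 0.4
Sigma <- sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, p, p))
ev <- eigen(Sigma, symmetric = TRUE)
ev$values                          # sigma2*(1+(p-1)*rho), then sigma2*(1-rho) repeated
sigma2 * c(1 + (p - 1) * rho, rep(1 - rho, p - 1))
ev$vectors[, 1]                    # proportional to (1, ..., 1)/sqrt(p)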
- Time series (1st order autoregression):
  $\Sigma = \begin{pmatrix} \sigma^2 & \phi\sigma^2 & \dots & \phi^{p-1}\sigma^2 \\ \phi\sigma^2 & \sigma^2 & \dots & \phi^{p-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \phi^{p-1}\sigma^2 & \phi^{p-2}\sigma^2 & \dots & \sigma^2 \end{pmatrix}.$
- No closed form, but for large $p$ the eigenvectors are like sines and cosines.
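An R sketch (my illustration) that builds an AR(1) covariance matrix and plots a few eigenvectors, which do look like sampled sinusoids for moderately large p:

# Sketch: eigenvectors of an AR(1) covariance matrix resemble sinusoids.
p <- 50; phi <- 0.7; sigma2 <- 1
Sigma <- sigma2 * phi^abs(outer(1:p, 1:p, "-"))   # entries sigma2 * phi^|i-j|
ev <- eigen(Sigma, symmetric = TRUE)
matplot(ev$vectors[, 1:3], type = "l",
        xlab = "position", ylab = "eigenvector element")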
Sample PCA

- Essentially the eigen analysis of $S$ (or $R$):
  $S\hat{e}_k = \hat{\lambda}_k \hat{e}_k,$
  and
  $\hat{y}_k = X_{\mathrm{dev}} \hat{e}_k,$
  where
  $X_{\mathrm{dev}} = X - \frac{1}{n}\mathbf{1}\mathbf{1}'X = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)X.$
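A direct R transcription of these formulas (my sketch, on simulated data):

# Sketch: sample PCA from first principles, following the formulas above.
set.seed(4)
X    <- matrix(rnorm(60 * 3), 60, 3)
n    <- nrow(X)
Xdev <- X - matrix(1, n, 1) %*% (rep(1, n) %*% X) / n  # (I - (1/n)11')X
S    <- crossprod(Xdev) / (n - 1)                      # = cov(X)
ev   <- eigen(S, symmetric = TRUE)                     # S e_k = lambda_k e_k
Y    <- Xdev %*% ev$vectors                            # scores y_k = Xdev e_k
round(cov(Y), 10)                                      # diag(lambda_1, ..., lambda_p)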
- Note:
  $S = \frac{1}{n-1}\, X_{\mathrm{dev}}' X_{\mathrm{dev}} = \left(\frac{1}{\sqrt{n-1}}\, X_{\mathrm{dev}}\right)' \left(\frac{1}{\sqrt{n-1}}\, X_{\mathrm{dev}}\right).$
- Singular value decomposition:
  $\frac{1}{\sqrt{n-1}}\, X_{\mathrm{dev}} = U D V',$
  where $U$ and $V$ have orthonormal columns and $D$ is diagonal (but may not be square).
- The diagonal entries of $D$ are the square roots of the largest $p$ eigenvalues of both $(n-1)^{-1} X_{\mathrm{dev}}' X_{\mathrm{dev}} = S$ and $(n-1)^{-1} X_{\mathrm{dev}} X_{\mathrm{dev}}'$.
- The columns of $V$ are the eigenvectors of $X_{\mathrm{dev}}' X_{\mathrm{dev}}$.
- Also
  $X_{\mathrm{dev}} V = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_p] = \sqrt{n-1}\, U D,$
  so the singular value decomposition of $(n-1)^{-1/2} X_{\mathrm{dev}}$ provides all the details of the sample principal components:
  - the coefficients $V$;
  - the values $UD$.
- Similarly, if $X^*$ is $X_{\mathrm{dev}}$ with its columns normalized (sum of squares = 1), then
  $R = X^{*\prime} X^*,$
  and the singular value decomposition of $X^*$ gives the PCA of $R$.
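In R (a sketch on simulated data), this correspondence can be checked against svd() and prcomp(), which uses the SVD internally:

# Sketch: the SVD of Xdev/sqrt(n-1) reproduces the sample PCA.
set.seed(5)
X    <- matrix(rnorm(80 * 4), 80, 4)
n    <- nrow(X)
Xdev <- scale(X, center = TRUE, scale = FALSE)
sv   <- svd(Xdev / sqrt(n - 1))
sv$d^2                               # eigenvalues of S
eigen(cov(X))$values                 # same values
scores <- Xdev %*% sv$v              # = sqrt(n-1) * U D
pc <- prcomp(X)
max(abs(abs(scores) - abs(pc$x)))    # ~0 (columns may differ in sign)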
- Example: 5 stocks
  - DU PONT E I DE NEM (NYSE:DD) (a former Dow Industrials stock)
  - HONEYWELL INTL INC (NYSE:HON) (a former Dow Industrials stock)
  - EXXON MOBIL CP (NYSE:XOM) (a Dow Industrials stock)
  - CHEVRON CORP (NYSE:CVX) (a Dow Industrials stock)
  - DOW CHEMICAL (NYSE:DOW) (a former Dow stock)
- SAS proc princomp program and output.
- R code and graphs for an updated set of stocks: DD, HON, and XOM, plus MSFT (Microsoft) and WMT (Walmart).
# stocks() is defined elsewhere in the course materials; it is assumed
# here to return the matrix (or data frame) of stock returns.
stocksPCAcor = prcomp(stocks(), scale. = TRUE)  # PCA of the correlation matrix
print(stocksPCAcor)
plot(stocksPCAcor)                              # scree plot
biplot(stocksPCAcor)
stocksPCAcov = prcomp(stocks())                 # PCA of the covariance matrix
print(stocksPCAcov)
plot(stocksPCAcov)                              # scree plot
biplot(stocksPCAcov)
[Figure: scree plot of stocksPCAcor (variances of the principal components of the standardized returns).]
Biplot for standardized stock returns

[Figure: biplot of the first two principal components of the standardized stock returns, with observations plotted by number and the stocks (e.g., MSFT, WMT) as arrows.]
[Figure: scree plot of stocksPCAcov (variances of the principal components of the unstandardized returns).]
Biplot for unstandardized stock returns

[Figure: biplot of the first two principal components of the unstandardized stock returns, with observations plotted by number and the stocks as arrows.]
How Many Components?

- How many components are important?
  - No definitive answer in the PCA framework.
  - The factor analysis model allows maximum likelihood estimation, hence hypothesis testing.
- For insight, use all components that have a substantive interpretation.
- For other uses, such as regression, we need an objective rule.
- Use a "scree" plot (a plot of the eigenvalues), and look for an "elbow"; very subjective.
- Simple rule of thumb: use all eigenvalues larger than the average.
  - Note: for the correlation matrix, the average is always 1, so the rule is: use all eigenvalues > 1.
- Overland and Preisendorfer (1982) proposed a rule ("Rule N") based on comparison of the observed eigenvalues with the distribution of eigenvalues in the case of iid $N(0, \sigma^2)$ data; a simulation in this spirit is sketched below.
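An R sketch (my illustration, simulated data) of the average-eigenvalue rule, together with a Rule-N-style comparison against eigenvalues from iid normal noise; the null construction here is my own simplification, not the published procedure:

# Sketch: the "larger than average" rule, and a Rule-N-style comparison
# of the observed eigenvalues with eigenvalues from iid normal noise.
set.seed(6)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + 0.5 * X[, 2]          # induce some structure
lam <- eigen(cor(X), symmetric = TRUE)$values
which(lam > mean(lam))                   # rule of thumb: mean(lam) = 1 for R
# Null distribution of the ordered eigenvalues under iid normal data:
null <- replicate(500, eigen(cor(matrix(rnorm(n * p), n, p)),
                             symmetric = TRUE)$values)
lam > apply(null, 1, quantile, probs = 0.95)  # components exceeding the null 95th percentile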
Large Samples

- Suppose that the rows of the data matrix $X$ are a random sample of size $n$ from $N_p(\mu, \Sigma)$.
- Assume that $\Sigma$ has distinct eigenvalues
  $\lambda_1 > \lambda_2 > \dots > \lambda_p > 0.$
- Then, approximately for large $n$,
  $\sqrt{n}\left(\hat{\lambda} - \lambda\right) \sim N_p\left(0,\; 2 \times \mathrm{diag}(\lambda^2)\right).$
- In other words, $\hat{\lambda}_i$ and $\hat{\lambda}_k$ are approximately independent for $k \ne i$, and
  $\hat{\lambda}_i \sim N\left(\lambda_i,\; \frac{2\lambda_i^2}{n}\right).$
- Note: if
  $\frac{n\hat{\lambda}}{\lambda} \sim \chi^2_n,$
  then similarly, approximately,
  $\hat{\lambda} \sim N\left(\lambda,\; \frac{2\lambda^2}{n}\right).$
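A small simulation sketch in R (my illustration) of the first approximation:

# Sketch: check Var(lambda_hat_1) ~ 2*lambda_1^2/n by simulation,
# for Sigma = diag(4, 2, 1) (distinct, well-separated eigenvalues).
set.seed(7)
n <- 200
lam1 <- replicate(2000, {
  X <- matrix(rnorm(n * 3), n, 3) %*% diag(sqrt(c(4, 2, 1)))
  eigen(cov(X), symmetric = TRUE)$values[1]
})
var(lam1)        # simulated variance of lambda_hat_1
2 * 4^2 / n      # asymptotic value 2 * lambda_1^2 / n = 0.16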
- So we could also state that, approximately,
  $\frac{n\hat{\lambda}_i}{\lambda_i} \sim \chi^2_n.$
- Simulations suggest that this is a better approximation for small $n$, if the eigenvalues are well separated.
- Asymptotics suggest that the degrees of freedom for $\hat{\lambda}_i$ could be $n - i + 1$ instead of $n$.
- Also, $\sqrt{n}\,(\hat{e}_i - e_i)$ is approximately $N_p(0, E_i)$, where
  $E_i = \lambda_i \sum_{k=1,\, k \ne i}^{p} \frac{\lambda_k}{(\lambda_k - \lambda_i)^2}\; e_k e_k'.$
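A direct R transcription of this formula (my sketch; the function name and example Sigma are hypothetical):

# Sketch: asymptotic covariance matrix E_i of the i-th sample eigenvector,
# computed directly from the formula above for a given Sigma.
Ei <- function(Sigma, i) {
  ev  <- eigen(Sigma, symmetric = TRUE)
  lam <- ev$values
  E   <- matrix(0, nrow(Sigma), ncol(Sigma))
  for (k in seq_along(lam)[-i]) {
    E <- E + lam[k] / (lam[k] - lam[i])^2 *
             tcrossprod(ev$vectors[, k])          # e_k e_k'
  }
  lam[i] * E
}
Ei(diag(c(4, 2, 1)), i = 1)   # example with well-separated eigenvalues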
- Test for equal correlations:
  $H_0: R = \begin{pmatrix} 1 & \rho & \dots & \rho \\ \rho & 1 & \dots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \dots & 1 \end{pmatrix}$
  against the alternative of a general (unstructured) $\Sigma$ or $R$.
- $H_0$ is equivalent to equality of all eigenvalues of $R$ but one.
- The likelihood ratio test (and Lawley's test) give a large-sample $\chi^2$ statistic with $(p+1)(p-2)/2$ d.f.
- Other related tests of interest:
  - Compound symmetry is a variance-components structure; it is stronger, requiring equal correlations and equal variances; it is equivalent to equality of all eigenvalues of $\Sigma$ but one.
  - Sphericity is the necessary and sufficient condition for univariate repeated measures to be valid; it is a weaker condition:
    $\sigma_{i,j} = (\sigma_{i,i} + \sigma_{j,j})/2 - \tau^2;$
    it has no simple characterization in terms of eigen structure.