Functional principal components

Přemysl Bejda
[email protected]
2013
Contents

1. Motivation and maximization problem
2. Optimal empirical orthonormal basis
3. Functional principal components
Motivation
The dimension of functional data ($L^2$) is infinite.
To manipulate the data more easily, we need to reduce the dimension to a finite number.
We employ a method similar to principal component analysis, where we try to reduce the number of dimensions to $p$ (with $p$ typically a single-digit number).
We are looking for an orthonormal basis that describes the maximum of the variance of the data.
Reminder
We work on the separable Hilbert space $L^2([0,1])$. It is the set of measurable real-valued functions $x$ defined on $[0,1]$ satisfying $\int_0^1 x^2(t)\,dt < \infty$. The inner product is $\langle x, y\rangle = \int_0^1 x(t)y(t)\,dt$.
A Hilbert-Schmidt operator is a linear operator satisfying the following further conditions:
There exist two orthonormal bases $v_j$ and $f_j$ and a real sequence $\lambda_j$ converging to zero such that $\Psi(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle f_j$ for all $x \in L^2([0,1])$.
$\sum_{j=1}^{\infty} \lambda_j^2 < \infty$.
An operator $\Psi$ is said to be symmetric if $\langle \Psi(x), y\rangle = \langle x, \Psi(y)\rangle$ for all $x, y \in L^2$.
An operator $\Psi$ is said to be positive definite if $\langle \Psi(x), x\rangle \geq 0$ for all $x \in L^2$.
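Everything below reduces to this inner product. As a minimal numerical sketch (not part of the lecture; the grid size and the test functions are illustrative), it can be approximated on a uniform grid:

```python
# Sketch: the L^2([0,1]) inner product <x, y> = \int_0^1 x(t) y(t) dt,
# approximated by the trapezoidal rule on a uniform grid.
import numpy as np

t = np.linspace(0.0, 1.0, 501)   # grid on [0, 1]
dt = t[1] - t[0]

def inner(x, y):
    """Trapezoidal-rule approximation of <x, y>."""
    f = x * y
    return float(np.sum((f[1:] + f[:-1]) / 2.0) * dt)

x = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)
y = np.sqrt(2.0) * np.cos(2.0 * np.pi * t)
print(inner(x, x))   # ~ 1.0, so ||x|| = 1
print(inner(x, y))   # ~ 0.0: x and y are orthonormal
```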
Maximization problem
Theorem 1 (Maximization problem theorem).
Suppose $\Psi$ is a symmetric, positive definite Hilbert-Schmidt operator with eigenfunctions $v_j$ and eigenvalues $\lambda_j$ satisfying $\lambda_1 > \lambda_2 > \dots$. Then
$$\sup \{ \langle \Psi(x), x\rangle : \|x\| = 1,\ \langle x, v_j\rangle = 0,\ 1 \leq j \leq i-1 \} = \lambda_i,$$
and the supremum is reached for $x = v_i$. The maximizing function is unique up to a sign.

Proof:
In the last lesson it was shown that a symmetric, positive definite Hilbert-Schmidt operator $\Psi$ admits the decomposition $\Psi(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle v_j$, where the $v_j$ are orthonormal eigenfunctions of $\Psi$ (i.e. $\Psi(v_j) = \lambda_j v_j$).
Maximization problem - proof
So we maximize
$$\langle \Psi(x), x\rangle = \Big\langle \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle v_j,\ x \Big\rangle = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle^2.$$
Parseval's equality tells us that $\|x\|^2 = \sum_{j=1}^{\infty} \langle x, e_j\rangle^2$. This holds for any orthonormal basis $e_j$.
Since we maximize subject to $\|x\| = 1$, the previous point shows that we have to maximize subject to $\sum_{j=1}^{\infty} \langle x, v_j\rangle^2 = 1$.
We ordered $\lambda_1 > \lambda_2 > \dots$, so we take $\langle x, v_1\rangle^2 = 1$ and $\langle x, v_j\rangle = 0$ for $j > 1$.
Thus $\langle \Psi(x), x\rangle$ is maximized at $v_1$ (or $-v_1$), and the maximum is $\lambda_1$.
Maximization problem - proof
Suppose now that we want to maximize $\langle \Psi(x), x\rangle$ subject not only to $\|x\| = 1$, but also to $\langle x, v_1\rangle = 0$.
Thus we want to find another unit-norm function which is orthogonal to the function found in the first step.
Such a function clearly satisfies $\langle \Psi(x), x\rangle = \sum_{j=2}^{\infty} \lambda_j \langle x, v_j\rangle^2$ and $\sum_{j=2}^{\infty} \langle x, v_j\rangle^2 = 1$.
By the same argument as before we get $x = v_2$, and the maximum is $\lambda_2$.
We can employ the same procedure for any $j \geq 1$. ∎
Assumptions
Suppose we observe functions $x_1, x_2, \dots, x_N$.
We can think of them as observed realizations of a random function in $L^2$.
For simplicity we suppose that the observations have mean zero.
Fix an integer $p < N$. We think of $p$ as being much smaller than $N$, typically a single-digit number.
We want to find an orthonormal basis $u_1, u_2, \dots, u_p$ minimizing
$$\hat{S}^2 = \sum_{i=1}^{N} \Big\| x_i - \sum_{k=1}^{p} \langle x_i, u_k\rangle u_k \Big\|^2.$$
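As an illustration (a sketch, not from the lecture, assuming the curves are centered and observed on a common uniform grid; all names are mine), $\hat{S}^2$ can be evaluated numerically:

```python
# Sketch: evaluating \hat S^2 for discretized curves and a candidate basis.
# X has one curve per row; U has one (discretized) basis function per row.
import numpy as np

def S2(X, U, dt):
    """Riemann-sum approximation of
    sum_i || x_i - sum_k <x_i, u_k> u_k ||^2."""
    scores = (X @ U.T) * dt        # scores[i, k] ~ <x_i, u_k>
    resid = X - scores @ U         # residual curves on the grid
    return float(np.sum(resid**2) * dt)
```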
Optimal empirical orthonormal basis
After finding such a basis we can replace each curve $x_i$ by its approximation $\sum_{k=1}^{p} \langle x_i, u_k\rangle u_k$.
For the $p$ we have chosen, this approximation is uniformly optimal, in the sense of minimizing $\hat{S}^2$.
This means that instead of working with infinite-dimensional curves, we can work with the $p$-dimensional vectors $\mathbf{x}_i = [\langle x_i, u_1\rangle, \langle x_i, u_2\rangle, \dots, \langle x_i, u_p\rangle]^\top$.
This is a central idea of functional data analysis: to perform any practical calculations we must reduce the dimension from infinity to a finite number.
The functions $u_j$ are collectively called the optimal empirical orthonormal basis, or the natural orthonormal components.
The words "empirical" and "natural" emphasize that they are computed directly from the functional data.
The optimal empirical orthonormal basis consists of eigenfunctions of the sample covariance operator
The sample covariance operator is defined as $\hat{C}(x) = N^{-1} \sum_{i=1}^{N} \langle x_i, x\rangle x_i$.
In the last seminar it was shown that the sample covariance operator is a symmetric, positive definite Hilbert-Schmidt operator.
It was also shown how to find the eigenfunctions and eigenvalues of the covariance operator.
We denote the first $p$ eigenvalues and eigenfunctions of $\hat{C}$ by $\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p$ and $\hat{v}_1, \hat{v}_2, \dots, \hat{v}_p$.

Proposition 2.
The functions $u_1, u_2, \dots, u_p$ minimizing $\hat{S}^2$ are equal (up to a sign) to the normalized eigenfunctions of the sample covariance operator.
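A minimal numerical sketch of Proposition 2 (under the assumption of centered curves on a common uniform grid; the function name and the discretization are mine, not from the lecture): after discretization, the eigenproblem for $\hat{C}$ becomes an ordinary symmetric matrix eigenproblem.

```python
# Sketch: EFPCs via the discretized sample covariance operator \hat C.
import numpy as np

def efpc(X, dt, p):
    """First p eigenvalues/eigenfunctions of \hat C for centered curves X
    (N x T, one curve per row, uniform grid spacing dt)."""
    N = X.shape[0]
    K = (X.T @ X) / N                   # kernel values c(t, s) on the grid
    lam, V = np.linalg.eigh(K * dt)     # integral operator ~ matrix K * dt
    order = np.argsort(lam)[::-1][:p]   # largest eigenvalues first
    lam, V = lam[order], V[:, order]
    return lam, (V / np.sqrt(dt)).T     # rescale so that ||v_j||_{L^2} = 1
```

The rescaling by $1/\sqrt{dt}$ converts the unit-Euclidean-norm eigenvectors returned by the matrix eigensolver into functions with unit $L^2$ norm.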
Proof
Proof:
In the beginning we suppose that $p = 1$; i.e., we want to find $u$ with $\|u\| = 1$ which minimizes
$$\sum_{i=1}^{N} \|x_i - \langle x_i, u\rangle u\|^2 = \sum_{i=1}^{N} \|x_i\|^2 - 2\sum_{i=1}^{N} \langle x_i, u\rangle^2 + \sum_{i=1}^{N} \langle x_i, u\rangle^2 \|u\|^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} \langle x_i, u\rangle^2.$$
I.e., we want to maximize $\sum_{i=1}^{N} \langle x_i, u\rangle^2$.
But we have
$$\langle \hat{C}(u), u\rangle = \Big\langle N^{-1} \sum_{i=1}^{N} \langle x_i, u\rangle x_i,\ u \Big\rangle = N^{-1} \sum_{i=1}^{N} \langle x_i, u\rangle^2.$$
So we want to maximize $\langle \hat{C}(u), u\rangle$, and according to the maximization problem theorem the maximum is attained at $\hat{v}_1$.
Proof
The general case is treated analogously.
The constraints are given by the fact that we are looking for an orthonormal basis.
In the same way as before we get
$$\hat{S}^2 = \sum_{i=1}^{N} \|x_i\|^2 - \sum_{i=1}^{N} \sum_{k=1}^{p} \langle x_i, u_k\rangle^2.$$
So we need to maximize
$$\sum_{k=1}^{p} \sum_{i=1}^{N} \langle x_i, u_k\rangle^2 = N \sum_{k=1}^{p} \langle \hat{C}(u_k), u_k\rangle.$$
According to the maximization problem theorem and the condition that the $u_k$ form an orthonormal basis, the maximum is attained if $u_1 = \hat{v}_1, \dots, u_p = \hat{v}_p$. ∎
Definition
Suppose $X_1, X_2, \dots, X_N$ are functional observations.
The eigenfunctions of the sample covariance operator $\hat{C}$ are called the empirical functional principal components (EFPC's).
If the observations are distributed as a square integrable $L^2$-valued random function $X$, then the eigenfunctions of the covariance operator $C$ of $X$ are called the functional principal components (FPC's).
In the last seminar it was shown that under some regularity conditions the EFPC's estimate the FPC's (up to a sign).
Interpretation
The previous section explains that the EFPC's can be interpreted as an optimal orthonormal basis with respect to which we can expand the data.
The inner product $\langle X_i, \hat{v}_j\rangle = \int_0^1 X_i(t)\hat{v}_j(t)\,dt$ is called the $j$th score of $X_i$.
It is interpreted as the contribution of $\hat{v}_j$ to the curve $X_i$.
Earlier we showed that $N^{-1} \sum_{i=1}^{N} \langle X_i, x\rangle^2 = \langle \hat{C}(x), x\rangle$.
This statistic can be viewed as the sample variance in the direction of the function $x$.
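A short sketch with toy data (all names illustrative, not from the lecture): the scores are computed as Riemann sums, and their mean square reproduces the sample variance in the chosen direction.

```python
# Sketch: scores <X_i, vhat> and the sample variance in the direction vhat.
import numpy as np

t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
rng = np.random.default_rng(0)

vhat = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)   # a unit-norm direction
X = rng.standard_normal((50, 1)) * vhat         # toy mean-zero curves
scores = (X @ vhat) * dt                        # score of each X_i
print(np.mean(scores**2))                       # ~ <\hat C(vhat), vhat>
```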
Variance
If we are interested in finding the function $x$ which is most correlated with the variability of the data, we must find the $x$ which maximizes $\langle \hat{C}(x), x\rangle$.
Further, we have to impose the restriction $\|x\| = 1$ on the norm; otherwise we would compare incomparable quantities.
According to the maximization problem theorem, $x = \hat{v}_1$.
If we want to find further functions which are orthogonal to the previous ones and explain the most variability, we can employ the same theorem, and we get that the directions are $\hat{v}_2, \dots, \hat{v}_N$.
Since $\hat{v}_1, \dots, \hat{v}_N$ is an orthonormal basis of the space generated by $X_1, \dots, X_N$, it can easily be shown that $X_i = \sum_{j=1}^{N} \langle X_i, \hat{v}_j\rangle \hat{v}_j$.
Variance
We need
$$\hat{\lambda}_j = \langle \hat{\lambda}_j \hat{v}_j, \hat{v}_j\rangle = \langle \hat{C}(\hat{v}_j), \hat{v}_j\rangle = \Big\langle \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat{v}_j\rangle X_i,\ \hat{v}_j \Big\rangle = \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat{v}_j\rangle^2.$$
Now let us compute the variance:
$$\frac{1}{N} \sum_{i=1}^{N} \|X_i\|^2 = \frac{1}{N} \sum_{i=1}^{N} \Big\langle X_i,\ \sum_{j=1}^{N} \langle X_i, \hat{v}_j\rangle \hat{v}_j \Big\rangle = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \langle X_i, \hat{v}_j\rangle^2 = \sum_{j=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \langle X_i, \hat{v}_j\rangle^2 = \sum_{j=1}^{N} \hat{\lambda}_j.$$
Thus, we may say that the variance in the direction $\hat{v}_j$ is $\hat{\lambda}_j$.
Alternatively, we say that $\hat{v}_j$ explains the fraction $\hat{\lambda}_j / \sum_{k=1}^{N} \hat{\lambda}_k$ of the total variance.
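The identity $N^{-1}\sum_{i=1}^{N}\|X_i\|^2 = \sum_j \hat{\lambda}_j$ can be checked numerically (a sketch with toy data; all names illustrative):

```python
# Sketch: total sample variance equals the sum of the eigenvalues of \hat C.
import numpy as np

t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
rng = np.random.default_rng(1)

e1 = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)      # orthonormal functions
e2 = np.sqrt(2.0) * np.cos(2.0 * np.pi * t)
X = rng.standard_normal((100, 1)) * e1 + 0.5 * rng.standard_normal((100, 1)) * e2

lam = np.linalg.eigvalsh((X.T @ X) / len(X) * dt)   # eigenvalues of \hat C
print(np.mean(np.sum(X**2, axis=1)) * dt)           # N^{-1} sum_i ||X_i||^2
print(lam.sum())                                    # the same number
```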
Variance
We proceed similarly for the population analysis of variance.
In the last seminar the covariance operator of a random function $X$ was defined as $C(y) = E[\langle X, y\rangle X]$, where $y \in L^2$.
It is a symmetric, positive definite Hilbert-Schmidt operator with eigenfunctions $v_1, v_2, \dots$ (an orthonormal basis) and eigenvalues $\lambda_1, \lambda_2, \dots$.
Let us compute $\langle C(v_j), v_j\rangle = \langle E[\langle X, v_j\rangle X], v_j\rangle = E\langle X, v_j\rangle^2$.
We get
$$E\|X\|^2 = E\Big\langle X,\ \sum_{j=1}^{\infty} \langle X, v_j\rangle v_j \Big\rangle = \sum_{j=1}^{\infty} E[\langle X, v_j\rangle^2] = \sum_{j=1}^{\infty} \langle C(v_j), v_j\rangle = \sum_{j=1}^{\infty} \langle \lambda_j v_j, v_j\rangle = \sum_{j=1}^{\infty} \lambda_j.$$
Example
Let us now present an example which describes how functional data with specified FPC's can be generated.
Let $a_j$ be real numbers, let $Z_{jn}$ be iid mean-zero random variables with unit variance for every $n$ and $j$, and let $e_j$ be orthonormal functions ($j = 1, \dots, p$). We put
$$X_n(t) = \sum_{j=1}^{p} a_j Z_{jn} e_j(t).$$
Denote by $X$ a random function with the same distribution as each $X_n$, i.e. $X(t) = \sum_{j=1}^{p} a_j Z_j e_j(t)$.
The covariance operator of $X$ acting on $x$ equals
$$C(x)(t) = E\Big[\Big(\int X(s)x(s)\,ds\Big) X(t)\Big] = \int E[X(t)X(s)]\,x(s)\,ds.$$
Example
From the independence of the $Z_j$ we get
$$E[X(t)X(s)] = E\Big[\sum_{j=1}^{p} a_j Z_j e_j(t) \sum_{i=1}^{p} a_i Z_i e_i(s)\Big] = \sum_{j=1}^{p} a_j^2 e_j(t) e_j(s).$$
Therefore,
$$C(x)(t) = \sum_{j=1}^{p} a_j^2 \Big(\int e_j(s)x(s)\,ds\Big) e_j(t).$$
Since $C$ is a symmetric, positive definite Hilbert-Schmidt operator, we know from the last lesson that we can write $C(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j\rangle v_j$, with the usual notation for eigenfunctions and eigenvalues.
From the above we see that the FPC's of the $X_n$ are the $e_j$, and the eigenvalues are $\lambda_j = a_j^2$.
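A sketch of this construction (with $p = 2$; the choices of $a_j$, $e_j$, the grid, and the sample size are illustrative): simulating $X_n = \sum_j a_j Z_{jn} e_j$ and checking that the empirical eigenvalues come out close to $a_j^2$.

```python
# Sketch: generating data with prescribed FPCs e_j and eigenvalues a_j^2.
import numpy as np

t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
rng = np.random.default_rng(2)

N = 2000
a = np.array([2.0, 1.0])                              # lambda_j = 4, 1
E = np.vstack([np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
               np.sqrt(2.0) * np.cos(2.0 * np.pi * t)])   # orthonormal e_j
Z = rng.standard_normal((N, 2))                       # iid, mean 0, variance 1
X = (Z * a) @ E                                       # X_n = sum_j a_j Z_jn e_j

lam = np.sort(np.linalg.eigvalsh((X.T @ X) / N * dt))[::-1]
print(lam[:2])                                        # ~ [4, 1] = a_j^2
```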
Choice of p
Methods of functional data analysis which employ the EFPC's assume that the data are well approximated by expansions of the form $\sum_{k=1}^{p} \langle x_i, u_k\rangle u_k$.
We suppose further that $p$ is small and that the functions $u_j$ are relatively smooth.
The crucial question is the choice of $p$: it should be small, but on the other hand it should approximate the data well enough.
A popular method is the scree plot.
To apply it, one plots the successive eigenvalues $\hat{\lambda}_j$ against their indices $j$.
Scree plot
Figure: Most of the variance is explained by the first component.
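A scree plot can be produced along the following lines (a sketch; the eigenvalues are made up for illustration and assumed sorted in decreasing order):

```python
# Sketch: a scree plot of the eigenvalues \hat\lambda_j against j.
import numpy as np
import matplotlib.pyplot as plt

lam_hat = np.array([4.1, 0.9, 0.12, 0.05, 0.02])   # illustrative eigenvalues
j = np.arange(1, len(lam_hat) + 1)
plt.plot(j, lam_hat, "o-")
plt.xlabel("component number j")
plt.ylabel("eigenvalue")
plt.title("Scree plot")
plt.show()
```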
Choice of p
In the scree plot we look for the $p$ at which the trend stabilizes.
Another commonly used method is the cumulative percentage of total variance (CPV).
For this method we compute what percentage of the variance is explained by the first $p$ components:
$$\mathrm{CPV}(p) = \frac{\sum_{k=1}^{p} \hat{\lambda}_k}{N^{-1} \sum_{i=1}^{N} \|X_i\|^2} = \frac{\sum_{k=1}^{p} \hat{\lambda}_k}{\sum_{k=1}^{N} \hat{\lambda}_k}.$$
We choose the $p$ for which $\mathrm{CPV}(p)$ exceeds a desired level; a value of about 85% is recommended.
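A one-function sketch of the CPV rule (the 85% default follows the recommendation above; the eigenvalues are assumed sorted in decreasing order):

```python
# Sketch: choose the smallest p with CPV(p) >= level.
import numpy as np

def choose_p(lam_hat, level=0.85):
    """Smallest p such that sum_{k<=p} lam_k / sum_k lam_k >= level."""
    cpv = np.cumsum(lam_hat) / np.sum(lam_hat)
    return int(np.argmax(cpv >= level)) + 1

print(choose_p(np.array([4.1, 0.9, 0.12, 0.05, 0.02])))   # -> 2
```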
There are also other criteria, such as pseudo-AIC and cross-validation.
These methods are implemented in Matlab in the PACE package.
End
Time for your questions
Thank you for your attention