How to Estimate Eigenvalues and Eigenvectors of Covariance

How to Estimate Eigenvalues and Eigenvectors of
Covariance Function when Parameters Are
Estimated
Kohtaro Hitomi
Kyoto Institute of Technologyв€—
6th March 2002
Abstract
This paper introduces an estimation method of the eigenvalues and
eigenvectors from the sample covariance function that involves estimated
parameters. We prove that the estimated eigenvalues and eigenvectors are
consistent.
It also includes the approximation method of the critical values of ICM
test using estimated eigenvalues.
в€— Department of Architecture and Design, Kyoto Institute of Technology, Sakyo, Kyoto
606-8585 Japan.
1
Introduction
This article proposes an estimation method for eigenvalues and eigenvectors of
covariance functions. This is an extension of Principal Components Analysis
(PCA) of finite dimensional random vectors to random “functions.”
Over the past few decades, a considerable number of studies have been conducted on functional data analysis in statistics. Some surveys of the field are
given in Ramsey and Silverman (1997) and Ramsey and Daizill (1991).
Such methods might be useful in econometrics. For example, suppose the
ICM test (Bierens and Ploberger 1997) for functional form E[y|x] = Q(Оё, x).
This test uses the following random function on Оћ вЉ‚ Rk ,
1
zn = в€љ
n
Л† xt )) exp(Оѕ xt ) Оѕ в€€ Оћ вЉ‚ Rk ,
(yt в€’ Q(Оё,
and the null distribution of the test statistic depends on the eigenvalues of the
covariance function of zn . Thus if we can estimate the eigenvalues, critical values
of the test statistic are easily calculated. See Hitomi (2000) for detail.
In many econometric models, sample covariance functions include estimated
parameters and are defined on Rk Г— Rk instead of R1 Г— R1 . Ramsey and Silver-
man (1997) have used a discrete approximation method for estimating eigenvalues and eigenvector of covariance functions on a subset of R1 Г—R1 . It is difficult,
however, to extend their method to higher dimensions. Dauxois, Pousse and
Romain (1982) have investigated the convergence of estimated eigenvalues and
eigenvectors of sample covariance functions on separable Hilbert space. Their
sample covariance function has not included estimated parameters and they
have proposed no estimation method, however.
This article solves the above problems for applying the functional data analysis to econometric models. It proposes an estimation method of eigenvalues and
eigenvectors from a sample covariance function on a subset of Rk Г— Rk , which
involves estimated parameters, and proves consistency of estimated eigenvalues
and eigenvectors.
The plan of the paper is the following. Section 2 explains the model and
the estimation method. The consistency of the estimated eigenvalues and eigenvectors is proved under high-level assumptions in section 3. As the example,
low-level assumptions for the ICM test are derived in section 3. The last section
is concluding remarks. Some mathematical proofs are included in Appendix.
1
2
Model and Estimation Method
Suppose that we are interested in eigenvalues and eigenvectors of a continuous
covariance function О“0 (Оѕ1 , Оѕ2 ) on Оћ Г— Оћ, where Оћ is a compact subset of Rk ,
k ≥ 1. We assume that Γ0 satisfies
|О“0 (Оѕ1 , Оѕ2 )| dВµ(Оѕ1 )dВµ(Оѕ2 ) < в€ћ,
where Вµ(Оѕ) is a known probability measure on Оћ.
An eigenvalue О» and an eigenvector П€(Оѕ) of О“0 (Оѕ1 , Оѕ2 ) are the solution of
characteristic equation
О“0 (Оѕ1 , Оѕ)П€(Оѕ1 )dВµ(Оѕ1 ) = О»П€(Оѕ).
(1)
Assume that there is a consistent estimator О“n of О“0 , which involves an estimate
of unknown parameters Оё0 в€€ О� вЉ‚ Rq ,
Л† Оѕ1 , Оѕ2 ) = 1
О“n (Оё,
n
n
Л† wt , Оѕ1 )an (Оё,
Л† wt , Оѕ2 ),
an (Оё,
(2)
t=1
where an (.) : О� Г— Rd Г— Оћ в†’ R1 is a function that satisfies an (Оё, wt , Оѕ) 2 < в€ћ
for all (Оё, wt ) в€€ О�Г—Rd , wt в€€ Rd is an i.i.d. random variable and ОёЛ† is a consistent
estimator of Оё0 .
We begin by introducing some notation. < f, g > is the inner product in
L2 (Вµ(Оѕ)), i.e.
< f, g >=
and В·
2
f(Оѕ)g(Оѕ)dВµ(Оѕ),
is L2 (Вµ(Оѕ)) norm. Let О¦ : L2 (Вµ(Оѕ)) в†’ L2 (Вµ(Оѕ)) be a bounded linear
operator, we write О¦f when we apply operator О¦ to f в€€ L2 (Вµ(Оѕ)). Thus О“n f
implies
О“n f =
В·
F
Л† Оѕ1 , Оѕ)f(Оѕ1 )dВµ(Оѕ1 ).
О“n (Оё,
is the uniform operator norm, i.e.
О¦
F
= sup
f
2 =1
О¦f
2.
Л† wi , Оѕ) by ai (Оѕ) or ai .
For notational simplicity, sometime we abbreviate an (Оё,
Л† Оѕ1 , Оѕ2 ). The operator О“n
We estimate eigenvalues and eigenvectors of О“n (Оё,
maps arbitrary function f(Оѕ) в€€ L2 (Вµ(x)) on a finite dimensional space, which is
2
spanned by {a1 (Оѕ), a2 (Оѕ), . . . , an (Оѕ)}, since
О“n f
=
Л† Оѕ1 , Оѕ)f(Оѕ1 )dВµ(Оѕ1 )
О“n (Оё,
=
1
n
=
n
at (Оѕ1 )at (Оѕ)f(Оѕ1 )dВµ(Оѕ1 )
t=1
n
1
n
at , f at (Оѕ)
t=1
n
=
bt at (Оѕ),
t=1
where bt = n1 at , f . Let Hn be the space that is spanned by {a1 (Оѕ), a2 (Оѕ), . . . , an (Оѕ)}.
Let О»n and П€n be a solution of the sample characteristic equation
Л† Оѕ1 , Оѕ)П€n (Оѕ1 )dВµ(Оѕ1 ) = О»n П€n (Оѕ)
О“n (Оё,
⇔
О“n П€n
= О»n П€n .
(3)
(4)
П€n is a linear combination of {a1 , a2 , . . . , an } since О“n П€n в€€ Hn . Therefore we
can express П€n as
n
П€n =
О±t at (Оѕ)
t=1
and put it into (3), we get
n
О“n П€n
= О»n
О±t at
t=1
n
⇔
1
n
⇔
n
t=1
⇔
n
t=1
n
t=1
at , П€n at
= О»n
О±t at
t=1
n
n
s=1
at ,
О±s as at
= О»n
О±t at
t=1
n
n
s=1
О±s at , as at
= О»n
О±t at .
t=1
Comparing the coefficients of at , we get
n
О±s at , as = О»n О±t
(5)
s=1
Now we define the n Г— n matrix A such that the (i, j) element of A is ai , aj ,
A = { ai , aj }
3
and the m Г— 1 vector О± as О± = (О±1 , . . . , О±n ) . Then the matrix expression of (5)
is
AО± = О»n О±.
(6)
This implies that an eigenvalue of (3) is an eigenvalue of matrix A and an eigenvector of (3) is П€n =
n
t=1
О±t at , where О±t is the t-th element of the eigenvector
of matrix A corresponding О»n .
We got the following lemma,
Lemma 1. Suppose an (Оё, wt , Оѕ)
2
< в€ћ for all (Оё, wt ) в€€ О� Г— Rd .
The following statements are equivalent,
1. О»n and П€n is a solution of the characteristic equation
Л† Оѕ1 , Оѕ)П€n (Оѕ1 )dВµ(Оѕ1 ) = О»n П€n (Оѕ).
О“n (Оё,
2. О»n is a eigenvalue of A, П€n =
n
t=1
О±t at (Оѕ), where О±t is the t-th element
of the corresponding eigenvector of О»n and
A =
3
{ ai , aj } .
Consistency
We assume the following two sets of high-level assumptions. An example of
low-level assumptions is discussed in section 4.
Assumption a.s.
1. (uniform convergence) О� and Оћ are compact subset of Rq and Rk respectively. Let О“n (Оё, Оѕ1 , Оѕ2 ) be
О“n (Оё, Оѕ1 , Оѕ2 ) =
1
n
n
an (Оё, wt , Оѕ1 )an (Оё, wt , Оѕ2 ).
t=1
О“n (Оё, Оѕ1 , Оѕ2 ) converges to a nonrandom continuous function О“(Оё, Оѕ1 , Оѕ2 )
a.s. uniformly on О� Г— Оћ Г— Оћ. And О“(Оё0 , Оѕ1 , Оѕ2 ) = О“0 (Оѕ1 , Оѕ2 ) for all (Оѕ1 , Оѕ2 ) в€€
Оћ Г— Оћ.
Л† ОёЛ† converges to Оё0 в€€ О� a.s.
2. (consistency of Оё)
4
Assumption pr
1. (uniform convergence) О� and Оћ are compact subset of Rq and Rk respectively. Let О“n (Оё, Оѕ1 , Оѕ2 ) be
О“n (Оё, Оѕ1 , Оѕ2 ) =
1
n
n
an (Оё, wt , Оѕ1 )an (Оё, wt , Оѕ2 ).
t=1
О“n (Оё, Оѕ1 , Оѕ2 ) converges to a nonrandom continuous function О“(Оё, Оѕ1 , Оѕ2 ) in
probability uniformly on О� Г— Оћ Г— Оћ. And О“(Оё0 , Оѕ1 , Оѕ2 ) = О“0 (Оѕ1 , Оѕ2 ) for all
(Оѕ1 , Оѕ2 ) в€€ Оћ Г— Оћ.
Л† ОёЛ† converges to Оё0 в€€ О� in probability.
2. (consistency of Оё)
The first set of assumptions corresponds to almost sure convergence of the eigenvalues and the eigenvectors and the second set of assumptions corresponds to
convergence in probability.
Let {О»i } be the decreasing sequence of the non-null eigenvalues of О“0 (Оѕ1 , Оѕ2 )
and {П€i } be the corresponding sequence of the eigenvectors of О“0 (Оѕ1 , Оѕ2 ), {О»ni }
Л† Оѕ1 , Оѕ2 ) and {П€ni }be
be the decreasing sequence of the non-null eigenvalues of О“n (Оё,
Л† Оѕ1 , Оѕ2 ) and define the set
the corresponding sequence of the eigenvectors of О“n (Оё,
Ii as the following,
Ii = {j|О»i = О»j },
|Ii | = ki .
Proposition 2. Suppose Assumption a.s. is satisfied. When О»i is of order ki ,
there are ki sequences {О»nj |j в€€ Ii } converging to О»i a.s. If Assumption pr is
satisfied instead of Assumption a.s., {О»nj |j в€€ Ii } converge to О»i in probability.
Moreover the convergence is uniform in j.
Proof. First we think the almost sure convergence case.
Л† Оѕ1 , Оѕ2 ) в€’ О“0 (Оѕ1 , Оѕ2 )
sup О“n (Оё,
Оѕ1 ,Оѕ2
Л† Оѕ1 , Оѕ2 ) в€’ О“(Оё,
Л† Оѕ1 , Оѕ2 ) + sup О“(Оё,
Л† Оѕ1 , Оѕ2 ) в€’ О“(Оё0 , Оѕ1 , Оѕ2 )
sup О“n (Оё,
≤
Оѕ1 ,Оѕ2
≤
Оё,Оѕ1 ,Оѕ2
Оѕ1 ,Оѕ2
Л† Оѕ1 , Оѕ2 ) в€’ О“(Оё0 , Оѕ1 , Оѕ2 ) .
sup |О“n (Оё, Оѕ1 , Оѕ2 ) в€’ О“(Оё, Оѕ1 , Оѕ2 )| + sup О“(Оё,
Оѕ1 ,Оѕ2
The first term of the last inequality converges to zero a.s. since О“n (Оё, Оѕ1 , Оѕ2 )
converges to О“(Оё, Оѕ1 , Оѕ2 ) a.s. uniformly on О� Г— Оћ Г— Оћ. О“(Оё, Оѕ1 , Оѕ2 ) is uniformly
continuous because О“(Оё, Оѕ1 , Оѕ2 ) is continuous on О� Г— Оћ Г— Оћ and О� Г— Оћ Г— Оћ is
a compact set. Thus the second term converges to zero a.s. since ОёЛ† в†’ Оё0 a.s.
Л† Оѕ1 , Оѕ2 ) в†’ О“0 (Оѕ1 , Оѕ2 ) a.s. uniformly on Оћ Г— Оћ.
Therefore О“n (Оё,
Let think the distance between О“n and О“0 in the uniform operator norm on
5
L2 (Вµ(Оѕ)) в†’ L2 (Вµ(Оѕ)).
О“n в€’ О“0
=
sup
f
≤
2 =1
2
Л† Оѕ1 , Оѕ) в€’ О“0 (Оѕ1 , Оѕ) f(Оѕ1 )dВµ(Оѕ1 )
О“n (Оё,
sup
f
F
О“n f в€’ О“0 f
2 =1
2
1/2
2
≤
sup
f
2 =1
≤
f
=
sup
пЈ«
sup пЈ­
f
≤
Л† Оѕ1 , Оѕ) в€’ О“0 (Оѕ1 , Оѕ) f(Оѕ1 )dВµ(Оѕ1 )
О“n (Оё,
2 =1
2 =1
2
dВµ(Оѕ)
1/2
Л† Оѕ1 , Оѕ) в€’ О“0 (Оѕ1 , Оѕ) dВµ(Оѕ1 )
О“n (Оё,
2
Л† Оѕ1 , Оѕ) в€’ О“0 (Оѕ1 , Оѕ) dВµ(Оѕ1 )dВµ(Оѕ)
О“n (Оё,
1/2
f(Оѕ1 )2 dВµ(Оѕ1 )
1/2
f(Оѕ)
2
Л† Оѕ1 , Оѕ) в€’ О“0 (Оѕ1 , Оѕ) .
sup О“n (Оё,
Оѕ1 ,Оѕ2
Thus О“n converges to О“0 a.s. in the uniform operator norm. The conclusion
follows directly from lemma 5 of Donford and Schwartz (1988, p1091).
The proof of the convergence in probability case is similar. Since О“n (Оё, Оѕ1 , Оѕ2 ) в†’
О“(Оё, Оѕ1 , Оѕ2 ) uniformly on О� Г— ОћГ— Оћ in probability and ОёЛ† в†’ Оё0 in probability, then
Л† Оѕ1 , Оѕ2 ) в†’ О“0 (Оѕ1 , Оѕ2 ) uniformly on Оћ Г— Оћ in probability. Thus any subseО“n (Оё,
quence nk in sequence n contains a further subsequence nki such that О“nki в†’ О“0
a.s. uniformly on О� Г— Оћ Г— Оћ. Therefore О“nki в†’ О“0 a.s. in the uniform operator
norm. Then apply lemma 5 of Donford and Schwartz (1988, p1091) to the sequence of operator О“nki . Therefore for j в€€ Ii , О»nki j в†’ О»i a.s. This implies for
j в€€ Ii , О»nj в†’ О»i in probability.
The following proposition is worth mentioning in passing.
Proposition 3. Soppose Assumption a.s. (Assumption pr)is satisfied. Then
n
|О»nj |
в†’
|О»j |
|О»nj |2
в†’
|О»j |2
j=1
n
j=1
a.s. (in pr)
a.s. (in pr).
Proof. Since
j=1
|О»j | =
О“0 (Оѕ, Оѕ)dВµ(Оѕ)
6
2
пЈ¶1/2
dВµ(Оѕ)пЈё
and
j=1
n
j=1
|О»j |2 =
О“0 (Оѕ1 , Оѕ2 )2 dВµ(Оѕ1 )dВµ(Оѕ2 ),
в€ћ
|О»nj | в€’
j=1
|О»j |
=
О“n (Оѕ, Оѕ)dВµ(Оѕ) в€’
≤
О“0 (Оѕ, Оѕ)dВµ(Оѕ)
|О“n (Оѕ, Оѕ) в€’ О“0 (Оѕ, Оѕ)| dВµ(Оѕ)
≤
sup О“n (Оѕ1 , Оѕ2 ) в€’ О“0 (Оѕ1 , Оѕ2 )
Оѕ1 ,Оѕ2
в†’ 0
a.s. (in pr).
Similarly,
в€ћ
n
j=1
|О»nj |2 в€’
j=1
|О»j |2
О“n (Оѕ1 , Оѕ2 )2 dВµ(Оѕ1 )dВµ(Оѕ2 ) в€’
=
О“n (Оѕ1 , Оѕ2 )2 dВµ(Оѕ1 )dВµ(Оѕ2 )
2
(О“0 в€’ (О“0 в€’ О“n )) в€’ О“20 dВµ(Оѕ1 )dВµ(Оѕ2 )
=
≤
2
О“0 (О“0 в€’ О“n )dВµ(Оѕ1 )dВµ(Оѕ2 ) +
≤
2 sup |О“0 (Оѕ1 , Оѕ2 ) в€’ О“n (Оѕ1 , Оѕ2 )|
(О“0 в€’ О“n )2 dВµ(Оѕ1 )dВµ(Оѕ2 )
О“0 dВµ(Оѕ1 )dВµ(Оѕ2 )
Оѕ1 ,Оѕ2
2
+ sup |О“0 (Оѕ1 , Оѕ2 ) в€’ О“n (Оѕ1 , Оѕ2 )|
Оѕ1 ,Оѕ2
в†’
0
a.s. (in pr).
Let П†i (Оѕ) and П†ni(Оѕ) be normalized eigenvectors corresponding to О»i and О»ni
respectively, thus
П†i (Оѕ) =
П€ni (Оѕ)
П€i (Оѕ)
, П†ni (Оѕ) =
and П†i (Оѕ) = П†ni(Оѕ) = 1 for all i and n.
П€i (Оѕ) 2
П€ni (Оѕ) 2
When the multiplicity of the eigenvalue О»i is larger than one, corresponding
normalized eigenvectors are not uniquely determined. Therefore we could not
have a convergence property for each corresponding eigenvector sequence. However the corresponding eigenspace is unique. Therefore next we think about the
convergence property of the projection operators that maps functions on the
eigenspace corresponding to the eigenvalue О»i .
Let Pj (Оѕ1 , Оѕ2 ) = kв€€Ij П†k (Оѕ1 )П†k (Оѕ2 ) be an orthogonal projection operator
that maps L2 (Вµ(Оѕ)) on the eigenspace corresponding to the eigenvalue О»i , and
7
Pnj (Оѕ1 , Оѕ2 ) =
φnk (ξ1 )φnk (ξ2 ) be an estimator of Pj . Note that the covariˆ ξ1 , ξ2 ) could be decomposed as the followings,
ance function О“(Оѕ1 , Оѕ2 ) and О“n (Оё,
kв€€Ij
в€ћ
О“(Оѕ1 , Оѕ2 ) =
О»j Pj (Оѕ1 , Оѕ2 )
j=1
m
Л† Оѕ1 , Оѕ2 ) =
О“n (Оё,
О»nj Pnj (Оѕ1 , Оѕ2 ).
j=1
Proposition 4. Suppose Assumption a.s. (Assumption pr) is satisfied. Then
for each j, Pnj converges Pj a.s. (in probability) in the uniform operator norm.
Proof. From the proof of Proposition 2, О“n converges to О“ a.s. (in probability)
in the uniform operator norm. In the proof of Proposition 3 in Dauxois, Pousse
and Romain (1982), they have shown that Pnj в€’ Pj
We have proved the proposition.
F
в†’ 0 if О“n в€’ О“
F
в†’ 0.
When the multiplicity of the eigenvalue О»i is equal to one, we could talk
about the convergence of eigenvectors. However if П†i (Оѕ) is the normalized eigenvector corresponding to О»i , в€’П†i (Оѕ) also satisfies the characteristic equation and
в€’П†i (Оѕ)
2
= 1. So we need further specification. Choose one of normalized
eigenvector, named φ1i (ξ), corresponding to λi and choose a sequence of eigenvectors φ1ni (ξ) corresponding to λnj such that φ1i (ξ), φ1ni (ξ) ≥ 0.
Corollary 5. Suppose Assumption a.s. (or Assumption pr) is satisfied and the
multiplicity of the eigenvalue О»i is one. Then П†1ni (Оѕ) converges to П†1i (Оѕ) a.s. (in
probability) on L2 (Вµ(Оѕ)).
Proof. Since the multiplicity of the eigenvalue О»i is 1, Pi (Оѕ1 , Оѕ2 ) = П†1i (Оѕ1 )П†1i (Оѕ2 )
and Pni (Оѕ1 , Оѕ2 ) = П†1ni(Оѕ1 )П†1ni (Оѕ2 ). Then
Pi в€’ Pnj
F
=
sup
f
=
sup
f
≥
2 =1
2 =1
П†1i (Оѕ1 )П†1i (Оѕ) в€’ П†1ni (Оѕ1 )П†1ni (Оѕ) f(Оѕ1 )dВµ(Оѕ1 )
П†1ni , f
П†1ni
в€’
П†1i , f
П†1ni , П†1i П†1ni в€’ П†1i , П†1i П†1i
2
П†1i 2
2
2 1/2
П†1ni, П†1i
=
1в€’
=
1 1
П† в€’ П†1i
2 ni
2
2
1/2
1 + П†1ni , П†1i
≥ 0.
Now 1 + φ1ni, φ1i ≥ 1 by the construction of φ1ni and Pi − Pnj
to zero a.s. (in probability). Therefore
probability).
8
П†1ni
в€’
П†1i 2
F
converges
converges to zero a.s. (in
4
Example: ICM test
Let wt = (yt , xt ) be a sequence of i.i.d. random variables on R1 Г— Rd . The ICM
test statistics for testing H0 : yt = Q(xt , Оё0 ) + ut uses the random function
1
zn (Оѕ) = в€љ
n
n
t=1
Л† exp(Оѕ О¦(xt )),
(y в€’ Q(xt , Оё))
where ОёЛ† is the nonlinear least squares estimator of Оё0 and О¦(x) is a bounded
one-to-one function on Rd в†’ Rd . The test statistics is
zn (Оѕ)2 dВµ(Оѕ).
Tn =
Under Assumption A in Bierens (1990), which is also included in Appendix A for
readers’ convenience, zn (ξ) converges a Gaussian process z(ξ) with covariance
function
∂Q(xt , θ0 )
∂θ
∂Q(xt , θ0 )
exp(Оѕ2 О¦(xt )) в€’ b(Оё0 , Оѕ2 )Aв€’1
,
∂θ
E u2t exp(Оѕ1 О¦(xt )) в€’ b(Оё0 , Оѕ1 )Aв€’1
О“0 (Оѕ1 , Оѕ2 ) =
Г—
where b(θ0 , ξ) = E[(∂/∂θ)Q(xt , θ0 ) exp(ξ Φ(xt ))] and
A=E
∂Q(xt , θ0 ) ∂Q(xt , θ0 )
.
∂θ
∂θ
A natural estimator of О“0 (Оѕ1 , Оѕ2 ) is
Л† Оѕ1 , Оѕ2 ) =
О“n (Оё,
1
n
n
t=1
Л†
yt в€’ Q(xt , Оё)
2
Л†
Л† Оѕ1 )An (Оё)
ˆ −1 ∂Q(xt , θ)
exp(Оѕ1 О¦(xt )) в€’ bn (Оё,
∂θ
Л† Оѕ2 )An (Оё)
Л† в€’1
Г— exp(Оѕ2 О¦(xt )) в€’ bn (Оё,
Л†
∂Q(xt , θ)
∂θ
,
where
bn (Оё, Оѕ)
An (Оё)
=
1
n
=
1
n
n
t=1
n
t=1
∂Q(xt , θ)
exp(Оѕ О¦(xt )),
∂θ
∂Q(xt , θ) ∂Q(xt , θ)
.
∂θ
∂θ
The low level assumptions for the consistency of the estimated eigenvalues and
the eigenvectors are not so restrictive. The following assumptions, which were
used in Bierens (1990) and Bierens and Ploberger (1997) to derive the asymp-
9
totic distribution of the ICM test, ensure the consistency of the estimated eigenvalues, the projection operators and the eigenvectors.
Assumption ICM
1. Assumption A is satisfied
2. Оћ is a compact subset of Rd . The probability measure Вµ(Оѕ) is chosen
absolutely continuous with respect to Lebesgue measure.
Proposition 6. Assumption ICM implies Assumption a.s.
Proof. See Appendix B.
One application of estimated eigenvalues is to estimate critical values of the ICM
test. Bierens and Ploberger (1997) could not get critical values of their test since
it depends on the distribution of independent variables. They reported only case
independent upper bounds of the critical values. It might be too conservative.
It might be possible to apply Hansen’s bootstrapping method (Hansen 1996),
however it is very time consuming. With the estimated eigenvalues we could
estimate the critical values.
As shown in Bierens and Ploberger (1997), under the null hypothesis the
test statistics Tn converges to
в€ћ
d
Tn в€’
в†’q=
О»i Zi2 ,
i=1
where λi s are the eigenvalues of Γ0 and Zi ’s are independent standard normal
random variables.
Construct new random variable qn based on the estimated eigenvalues as the
following,
n
О»ni Zi2,
qn =
i=1
where λni ’s are the eigenvalues of Γn . The critical values of Tn could be estimated by the critical values of qn . The following theorem justifies the above
method.
Theorem 7. Suppose Assumption ICM is satisfied. Let {О»j : j = 1, 2, . . . , в€ћ}
and {О»nj : j = 1, 2, . . . , n} be the decreasing sequences of the eigenvalues of
Л† Оѕ1 , Оѕ2 ) respectively and П†q (t) and П†q (t) be the characteristic
О“0 (Оѕ1 , Оѕ2 ) and О“n (Оё,
n
Л† Оѕ1 , Оѕ2 )
function of q and the conditional characteristic function of qn on О“n (Оё,
respectively. Then
П†qn (t) в†’ П†q (t)
10
a.s.
Proof. See Appendix B.
Consider local alternatives to the following form
Hn :
g(xt )
yt = Q(xt , Оё0 ) + в€љ + ut ,
n
(7)
where g(x) satisfies 0 < E[g(x)2 ] < в€ћ.
Under the local alternative (7), zn (Оѕ) converges to a Gaussian process with
mean function
Л†
Л† Оѕ1 )An (Оё)
ˆ −1 ∂Q(xt , θ)
О·(Оѕ) = E g(xt ) exp(Оѕ1 О¦(xt )) в€’ bn (Оё,
∂θ
and the covariance function О“0 (Оѕ1 , Оѕ2 ) as shown in Theorem 2 of Bierens and
Ploberger (1997). Under the Assumption A and the local alternative assumpton,
It could be shown that
в€љ
d
n(ОёЛ† в€’ Оё0 ) в€’
в†’
ОёЛ† в†’
N (E[g(xt )
Оё0
∂Q
∂Q ∂Q
], Aв€’1 E u2t
Aв€’1 )
∂θ
∂θ ∂θ
a.s.
О“n (Оё, Оѕ1 , Оѕ2 ) could be decomposed as the followings,
О“n (Оё, Оѕ1 , Оѕ2 ) =
=
1
n
1
n
n
2
t=1
n
2
(ut + Q(xt , Оё0 ) в€’ Q(xt , Оё)) w(xt , Оѕ1 , Оё)w(xt , Оѕ2 , Оё)
t=1
n
2
+
n
+
(yt в€’ Q(xt , Оё)) w(xt , Оѕ1 , Оё)w(xt , Оѕ2 , Оё)
1
n
g(xt )
(ut + Q(xt , Оё0 ) в€’ Q(xt , Оё)) в€љ w(xt , Оѕ1 , Оё)w(xt , Оѕ2 , Оё)
n
t=1
n
t=1
g(xt )2
w(xt , Оѕ1 , Оё)w(xt , Оѕ2 , Оё),
n
where
w(xt , Оѕ1 , Оё) = exp(Оѕ1 О¦(xt )) в€’ bn (Оё, Оѕ1 )An (Оё)в€’1
∂Q(xt , θ)
.
∂θ
Using the same argument in the proof of Theorem 7, the first term converges
to О“(Оё, Оѕ1 , Оѕ2 ) a.s. uniformly on Оћ. The second and the third term converges to
zero a.s. uniformly on Оћ.
Л† Оѕ1 , Оѕ2 ) converges to О“0 (Оѕ1 , Оѕ2 ) a.s. uniformly on Оћ under
Therefore О“n (Оё,
the local alternative. This implies that we can consistently estimate О»i s and
Theorem 7 also holds under the local alternative.
11
5
Concluding Remarks
This paper introduces an estimation method of the eigenvalues and eigenvectors from the sample covariance function, which involves estimated parameters.
Then prove the consistency of the estimated eigenvalues and the eigenvectors.
One drawback of the method is that it needs considerable computation time.
The estimated eigenvalues are the eigenvalues of the n Г— n matrix A where the
i-j element of A is
Л† wi , Оѕ)an (Оё,
Л† wj , Оѕ)dВµ(Оѕ).
an (Оё,
In general, it might be difficult to integrate analytically. We need about order
n Г— n times numerical integration to get A. Therefore some effective numerical
integration method or approximation method are required.
Unsolved problem is the asymptotic distribution of the estimated eigenvalues and eigenvectors. Some central limit theorems in Hilbert space might be
applicable. However, this further elaboration is beyond the scope of the present
paper.
12
References
[1] Philippe Besse and J. O. Ramsay. Principal components analysis of sampled
functions. Psychometrika, 51(2):285–311, 1986.
[2] Herman J. Bierens. A consistent conditional moment test of functional
form. Econometrica, 58(6):1443–1458, 1990.
[3] Herman J. Bierens. Topics in Advanced Econometrics. Cambridge University Press, 1994.
[4] Herman J. Bierens and Werner Ploberger. Asymptotic theory of integrated
conditional moment tests. Econometrica, 65(5):1129–1151, 1997.
[5] Nelson Dunford and Jacob T. Schwartz. Linear Operators Part II Spectral
Theory. Wiley, 1988.
[6] B. E. Hansen. Inference when a nuisance parameter is not identified under
the null hypothesis. Econometrica, 64(2):413–430, 1996.
[7] Kohtaro Hitomi. Common structure of consistent misspecification tests and
a new test. mimeo, 2000.
[8] A. Pousse J. Dauxois and Y. Romain. Asymptotic theory for the principal component analysis of vector random function: Some applications to
statistical inference. Journal of Multivariate Analysis, 12:136–154, 1982.
[9] James O. Ramsay and Bernard W. Silverman. Functional Data Analysis.
Springer-Verlog, 1997.
[10] J. O. Ramsey and C J. Dalzell. Some tools for functional data analysis (with
discussion). Journal of Royal Statistical Society Series B, 53(3):539–572,
1991.
[11] Alexander Weinstein and William Stenger. Methods of Intermediate Problems for Eigenvalues, volume 89 of Mathematics in Science and Engineering. Academic Press inc., 1972.
13
A
Maintained Assumption for ICM test
The following is Assumption A in Bierens (1990).
Assumption A
1. Let {wt = (yt , xt )|i = 1, 2, . . . , n} be a sequence of i.i.d. random variable
on R Г— Rd . Moreover, E[yt2 ] < в€ћ.
2. The parameter space О� is a compact and convex subset of Rq and Q(xt , Оё)
is for each Оё в€€ О� a Borel measurable real function on Rd and for each dvector x a twice continuously differentiable real function on О�. Moreover,
E[supОёв€€О� Q(xt , Оё)] < в€ћ and for i1 , i2 = 1, 2, . . . , q,
∂Q(xt , θ) ∂Q(xt , θ)
∂θi1
∂θi1
Оёв€€О�
∂Q(xt , θ) ∂Q(xt , θ)
E sup(yt в€’ Q(xt , Оё))2
∂θi1
∂θi2
Оёв€€О�
E sup
E sup (yt в€’ Q(xt , Оё))
Оёв€€О�
∂ 2 Q(xt , θ)
∂θi1 ∂θi2
< в€ћ,
< в€ћ,
< в€ћ.
3. E[(yt в€’ Q(xt , Оё))2 ] takes a unique minimum on О� at Оё0 . Under H0 the
parameter vector Оё0 is an interior point of О�.
4. A = E[(∂/∂θ)Q(xt , θ0 )(∂/∂θ )Q(xt , θ0 )] is nonsingular.
B
Mathematical Proofs
Lemma 8. Suppose An (Оё) and Bn (Оё) be random functions on a compact subset
О� вЉ‚ Rk , and An (Оё) and Bn (Оё) converges to nonrandom continuous functions
A(Оё) and B(Оё) a.s. uniformly on О� respectively. Then An (Оё)Bn (Оё) converges to
A(Оё)B(Оё) a.s. uniformly on О�.
Proof. There are null set N1 and N2 such that for every Оµ > 0 and every
ω ∈ Ω \ (N1 ∪ N2 ) ,
sup |An (ω, θ) − A(θ)| ≤ ε and sup |Bn (ω, θ) − B(θ)| ≤ ε if n > n0 (ω, ε).
Оёв€€О�
Оёв€€О�
Since A(Оё) and B(Оё) are continuous function on a compact set О�, there is M
such that supОёв€€О� |A(Оё)| < M and supОёв€€О� |B(Оё)| < M. And for every П‰ в€€
14
Ω \ (N1 ∪ N2 ),
sup |An (Оё)Bn (Оё) в€’ A(Оё)B(Оё)|
Оёв€€О�
≤
sup |An (Оё) в€’ A(Оё)| sup |B(Оё)| + sup |Bn (Оё) в€’ B(Оё)| sup |A(Оё)|
Оёв€€О�
Оёв€€О�
Оёв€€О�
Оёв€€О�
+ sup |An (Оё) в€’ A(Оё)| sup |Bn (Оё) в€’ B(Оё)|
Оёв€€О�
≤
Оёв€€О�
2MОµ + Оµ2
if n > n0 (П‰, Оµ). Thus supОёв€€О� |An (Оё)Bn (Оё) в€’ A(Оё)B(Оё)| в†’ 0 a.s.
Theorem 9. (theorem 2.7.5 Bierens 1994) Let X1 , X2 , . . . be a sequence of
i.i.d. random variable in Rd . Let f(x, Оё) be a Borel measurable function continuous on Rd Г— О�, where О� is a compact Borel set in Rk , which is continuous
in Оё for each x в€€ Rd . If E[supОёв€€О� |f(Xj , Оё)|] < в€ћ, then
1
n
n
i=1
E[f(x, Оё)] a.s. uniformly on О�.
f(Xi , Оё) в†’
Proof of Proposition 6
Л† Оѕ1 , Оѕ2 ) в†’ О“0 (Оѕ1 , Оѕ2 ) a.s. point
Proof. Almost sure convergence of ОёЛ† and О“n (Оё,
wisely satisfied under Assumption A using standard argument. So we concentrate a.s. uniform convergence of О“n (Оё, Оѕ1 , Оѕ2 ).
We could decompose О“n (Оё, Оѕ1 , Оѕ2 ) as the followings,
О“n (Оё, Оѕ1 , Оѕ2 ) =
1
n
n
t=1
(yt в€’ Q(xt , Оё)2 ) exp(Оѕ1 О¦(xt )) exp(Оѕ2 О¦(xt ))
в€’bn (Оё, Оѕ1 )An (Оё)в€’1
1
n
в€’bn (Оё, Оѕ2 )An (Оё)в€’1
1
n
n
t=1
n
t=1
: О“1n
∂Q(xt , θ)
(yt в€’ Q(xt , Оё) exp(Оѕ2 О¦(xt ))
∂θ
2
: bn Aв€’1
n О“n
∂Q(xt , θ)
(yt в€’ Q(xt , Оё) exp(Оѕ1 О¦(xt ))
∂θ
2
: bn Aв€’1
n О“n
+bn (Оё, Оѕ1 )An (Оё)в€’1 bn (Оё, Оѕ2 ).
Because of lemma 8, it is enough to show that
О“1n (Оё, Оѕ1 , Оѕ2 ) в†’
О“1n (Оё, Оѕ1 , Оѕ2 ) в†’
bn (Оё, Оѕ)
в†’
An (Оё)
в†’
E[(yt в€’ Q(xt , Оё))2 exp(Оѕ1 О¦(xt )) exp(Оѕ2 О¦(xt ))]
a.s. uniformly
∂Q(xt , θ)
E
(yt в€’ Q(xt , Оё) exp(Оѕ О¦(xt ))
∂θ
a.s. uniformly
∂Q(xt , θ)
E
exp(Оѕ О¦(xt )) a.s. uniformly
∂θ
∂Q(xt , θ) ∂Q(xt , θ)
E
a.s. uniformly.
∂θ
∂θ
15
(8)
(9)
(10)
(11)
Since Оћ is compact and О¦(x) is a bounded function, there exists M such that
for all x, | exp(Оѕ О¦(x))| < M. Thus
sup (yt в€’ Q(xt , Оё))2 exp(Оѕ1 О¦(xt )) exp(Оѕ2 О¦(xt ))
E
Оё,Оѕ1 ,Оѕ2
≤
M 2 E sup(yt в€’ Q(xt , Оё))2
≤
2M 2 E[yt2 ] + 2ME sup Q(xt , Оё)2
<
в€ћ,
Оё
Оё
by Assumption A.1 and A.2. Therefore (8) is satisfied by Theorem 9.
Similarly,
∂Q(xt , θ)
(yt в€’ Q(xt , Оё)) exp(Оѕ О¦(xt ))
∂θ
E sup
Оё,Оѕ
≤
ME sup
Оё
≤
M
<
в€ћ,
∂Q(xt , θ)
(yt в€’ Q(xt , Оё))
∂θ
∂Q(xt , θ)
E sup
∂θ
Оё
2
1/2
1/2
E sup (yt в€’ Q(xt , Оё))2
Оё
by Assumption A.1 and A.2. (9) is satisfied.
For bn (Оё, Оѕ),
∂Q(xt , θ)
exp(Оѕ О¦(xt ))
∂θ
E sup
Оё,Оѕ
∂Q(xt , θ)
∂θ
≤
ME sup
≤
∂Q(xt , θ)
ME sup
∂θ
Оё
<
в€ћ,
Оё
2 1/2
∂Q(xt , θ)
E sup
∂θ
Оё
2 1/2
by Assumption A.2. Therefore (10) is satisfied by Theorem 9. And (11) also
satisfied by Assumption A.2 and Theorem 9.
Then (8)-(11) were all satisfied. Thus by lemma 8, О“n (Оё, Оѕ1 , Оѕ2 ) в†’ О“(Оё, Оѕ1 , Оѕ2 )
a.s. uniformly on О� Г— Оћ Г— Оћ. We have proved the proposition.
Proof of Theorem 7
16
Proof. The characteristic function of q is
в€ћ
П†q (t) =
j=1
(1 в€’ 2iО»j t)в€’1/2 ,
and the conditional characteristic function of qn on О“n is
в€ћ
П†qn (t) =
j=1
where i =
в€љ
(1 в€’ 2iО»nj t)в€’1/2 ,
в€’1 and О»nj = 0 if j > n. It is enough to show that
log П†qn (t) в†’ log П†q (t)
a.s.
(12)
First, we prove that
в€’
1
2
в€ћ
j=1
{log(1 в€’ 2iО»nj t) + 2iО»nt t} в†’ в€’
1
2
в€ћ
j=1
{log(1 в€’ 2iО»j t) + 2iО»t t}
a.s.
(13)
as n increases. Let decompose the difference between the both side of the above
equation as the following, take a large enough integer r then
в€’
1
2
= в€’
1
2
1
в€’
2
в€ћ
j=1
r
j=1
в€ћ
{log(1 в€’ 2iО»nj t) + 2iО»nt t} +
1
2
{log(1 в€’ 2iО»nj t) + 2iО»nt t} +
1
2
j=r+1
в€ћ
j=1
r
j=1
1
{log(1 в€’ 2iО»nj t) + 2iО»nt t} +
2
{log(1 в€’ 2iО»j t) + 2iО»t t}
{log(1 в€’ 2iО»j t) + 2iО»t t}
: Sr
в€ћ
j=r+1
{log(1 в€’ 2iО»j t) + 2iО»t t}
= S r + Rr .
Since О»nj converges to О»j by Proposition 2, choosing large enough n we get
|Sr | < Оµ.
On the other hand there is a constant C1 that satisfies
| log(1 в€’ 2iО»t) + 2iО»t| < C1 О»2 t2 .
17
: Rr
Therefore
в€ћ
|Rr |
≤ C 1 t2
в€ћ
О»2nj + C1 t2
j=r+1
О»2j
j=r+1
в€ћ
≤ C1 t2 λn(r+1)
в€ћ
О»nj + C1 t2 О»r+1
j=1
О»j
j=1
since О»nj and О»j are decreasing sequence. By Proposition 2, О»nj converges to О»j
в€ћ
j=1
uniformly in j almost surely and
О»nj converges to
в€ћ
j=1
О»j almost surely
by Proposition 2,
lim |Rr | = 0
a.s.
r→∞
Thus (13) is satisfied.
Next, we will show that
в€’
1
2
n
j=1
log(1 в€’ 2iО»nj t) в†’ в€’
1
2
в€ћ
j=1
log(1 в€’ 2iО»j t) a.s.
The left hand side converges to
в€’
1
2
n
j=1
log(1 в€’ 2iО»nj t)
в€’
1
2
в†’ в€’
1
2
=
n
j=1
в€ћ
j=1
1
{log(1 в€’ 2iО»nj t) + 2iО»nj t} + 2it
2
1
{log(1 в€’ 2iО»j t) + 2iО»j t} + 2it
2
n
О»nj
j=1
в€ћ
О»j
a.s.
j=1
by (13) and Proposition 3. On the other hand,
lim в€’
n→∞
1
2
n
j=1
log(1 в€’ 2iО»j t) =
=
пЈ±
пЈІ 1
lim в€’
n→∞  2
пЈ±
пЈІ 1
lim в€’
n→∞  2
n
n
1
{log(1 в€’ 2iО»j t) + 2iО»j t} + 2it
О»j
пЈѕ
2
j=1
j=1
пЈј
n
n
пЈЅ
1
{log(1 в€’ 2iО»j t) + 2iО»j t} + lim 2it
О»j
 n→∞ 2
j=1
j=1
since the both term in the last equation converge. This implies that
в€’
1
2
n
j=1
log(1 в€’ 2iО»nj t) в†’ в€’
We have proved.
18
1
2
пЈј
пЈЅ
в€ћ
j=1
log(1 в€’ 2iО»j t) a.s.