
APPENDIX B
In Sections B.1, B.2, B.3 and B.4 we prove Lemma A.1, Lemma A.2, Corollary
A.3 and Lemma A.4, respectively, used in Appendices A.1-A.4 of the paper. These
proofs require technical Lemmas B.1-B.5, which are proved in Sections B.5-B.9.
B.1 Proof of Lemma A.1
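Write:
\[
\begin{aligned}
&\frac{1}{T}\sum_{t=1}^{T}\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-E_0\left[\varphi(\mu_t(\beta))\right]\\
&\quad=\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi(\mu_t(\beta))-E_0\left[\varphi(\mu_t(\beta))\right]\right\}\\
&\qquad+\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)\right)-\varphi(\mu_t(\beta))\right\}\\
&\qquad+\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)\right)\right\}\\
&\quad=:J_{1,T}(\beta)+J_{2,n,T}(\beta)+J_{3,n,T}(\beta).
\end{aligned}
\]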
Term $J_{1,T}(\beta)$ is the time series average of a nonlinear transformation of process $\mu_t(\beta)$. We show in Section B.1.1 that $\sup_{\beta\in\mathcal{B}}|J_{1,T}(\beta)| = o_p(1)$. Term $J_{2,n,T}(\beta)$ accounts for the discrepancy between the cross-sectional average $\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)$ and the conditional expectation $\mu_t(\beta)$. We prove that $\sup_{\beta\in\mathcal{B}}|J_{2,n,T}(\beta)| = o_p(1)$ in Section B.1.2. Finally, term $J_{3,n,T}(\beta)$ is induced by the approximation of the pseudo-true factor value $f_t(\beta)$ with estimator $\hat f_{n,t}(\beta)$, and we show that $\sup_{\beta\in\mathcal{B}}|J_{3,n,T}(\beta)| = o_p(1)$ in Section B.1.3.
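B.1.1 Proof that $\sup_{\beta\in\mathcal{B}}|J_{1,T}(\beta)| = o_p(1)$
From Theorem 21.9 in Davidson (1994) we have $\sup_{\beta\in\mathcal{B}}|J_{1,T}(\beta)| \overset{p}{\to} 0$ if and only if:
(i) $|J_{1,T}(\beta)| \overset{p}{\to} 0$, for any $\beta$ in a dense subset of $\mathcal{B}$,
(ii) $J_{1,T}(\beta)$ is stochastically equicontinuous, that is, for any $\varepsilon > 0$ there exists $\delta > 0$ such that:
\[
\limsup_{T\to\infty} P\left[\sup_{\beta,\beta'\in\mathcal{B}:\,\|\beta'-\beta\|\le\delta}\left|J_{1,T}(\beta)-J_{1,T}(\beta')\right| > \varepsilon\right] < \varepsilon.
\]
We prove points (i) and (ii) in Sections B.1.1.1 and B.1.1.2, respectively.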
B.1.1.1 Pointwise convergence
Let $\beta\in\mathcal{B}$ be given. Let us first show that $\mu_t(\beta)$ is Near-Epoch Dependent in $L^2$-norm ($L^2$-NED) on $\{f_t : t\in\mathbb{Z}\}$ with size $-\alpha$ (see footnote 1). Indeed, for $\mathcal{F}_{t-m}^{t+m} = \sigma(f_{t+\tau} : \tau = 0, \pm 1, ..., \pm m)$ we have:
\[
\left\|\mu_t(\beta)-E\left[\mu_t(\beta)\mid\mathcal{F}_{t-m}^{t+m}\right]\right\|_2
\le\left\|E\left[a(Y_{i,t},f_t(\beta),\beta)\mid f_t\right]-E\left[a(Y_{i,t},f_t(\beta),\beta)\mid f_t,...,f_{t-m}\right]\right\|_2
=O\left(m^{-\alpha}\right),
\]
by Condition (1) (iv), where $\|Z\|_2 = E[\|Z\|^2]^{1/2}$. Since $\mu_t(\beta)$ is $L^2$-NED of size $-\alpha$ on $\{f_t : t\in\mathbb{Z}\}$, and $\varphi$ is Lipschitz by condition (2), it follows that $\varphi(\mu_t(\beta))$ is also $L^2$-NED of size $-\alpha$ on $\{f_t : t\in\mathbb{Z}\}$ by Theorem 17.12 of Davidson (1994). Moreover, from condition (2) we have that $\varphi(\mu_t(\beta))$ is $L^\tau$-bounded, $\tau > 2$. Then, since $(f_t)$ is geometric strong mixing (Assumption A.4), it follows that $\varphi(\mu_t(\beta)) - E_0[\varphi(\mu_t(\beta))]$ is an $L^2$-mixingale of size $-\alpha$ w.r.t. the filtration $\mathcal{F}_t = \sigma(f_\tau : \tau\le t)$ by Theorem 17.5 in Davidson (1994) (see footnote 2). Moreover, $\varphi(\mu_t(\beta))$ is also uniformly integrable by Theorem 18.5 in Davidson (1994). It follows by the WLLN in Andrews (1988), Theorem 1, that
\[
\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi(\mu_t(\beta))-E_0\left[\varphi(\mu_t(\beta))\right]\right\}=o_p(1).
\]
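Footnote 1. A strictly stationary process $X_t$ is $L^2$-NED on $\{f_t : t\in\mathbb{Z}\}$ with size $-\alpha$ if:
\[
\left\|X_t-E\left[X_t\mid\mathcal{F}_{t-m}^{t+m}\right]\right\|_2=O\left(m^{-\alpha}\right),\quad m\to\infty,
\]
where $\mathcal{F}_{t-m}^{t+m} = \sigma(f_{t+\tau} : \tau = 0, \pm 1, ..., \pm m)$ [see Davidson (1994), Definition 17.1].
Footnote 2. A strictly stationary, $\mathcal{F}_t$-adapted, zero-mean process $X_t$ is an $L^2$-mixingale of size $-\alpha$ w.r.t. $\mathcal{F}_t$ if:
\[
\left\|X_t-E\left[X_t\mid\mathcal{F}_{t-m}\right]\right\|_2=O\left(m^{-\alpha}\right),\quad m\to\infty,
\]
[see Definition 1 in Andrews (1988) and Definition 16.1 in Davidson (1994)].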
B.1.1.2 Stochastic equicontinuity
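From Theorem 21.11 in Davidson (1994), $J_{1,T}(\beta) = \frac{1}{T}\sum_{t=1}^{T}q_t(\beta)$ is stochastically equicontinuous if:
\[
\left|q_t(\beta')-q_t(\beta)\right|\le B_t\left\|\beta'-\beta\right\|,\quad\forall\beta,\beta'\in\mathcal{B},\qquad\text{(B.1)}
\]
and $B_t$ is a stochastic sequence independent of $\beta$ such that $\frac{1}{T}\sum_{t=1}^{T}B_t = O_p(1)$. Now, $q_t(\beta) = \varphi(\mu_t(\beta)) - E_0[\varphi(\mu_t(\beta))]$. Using that $\varphi$ is Lipschitz, and denoting by $L$ the Lipschitz constant, we get:
\[
\left|q_t(\beta')-q_t(\beta)\right|\le L\left\|\mu_t(\beta')-\mu_t(\beta)\right\|+L\,E\left[\left\|\mu_t(\beta')-\mu_t(\beta)\right\|\right].
\]
Using
\[
\left\|\mu_t(\beta')-\mu_t(\beta)\right\|\le\sup_{\beta\in\mathcal{B}}E\left[\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\left\|\beta'-\beta\right\|,
\]
we see that (B.1) is satisfied, up to the multiplicative constant $L$, with
\[
B_t=\sup_{\beta\in\mathcal{B}}E\left[\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]+E\left[\sup_{\beta\in\mathcal{B}}E\left[\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\right].
\]
Thus $\frac{1}{T}\sum_{t=1}^{T}B_t = O_p(1)$ follows from
\[
E\left[\sup_{\beta\in\mathcal{B}}E\left[\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\right]<\infty,
\]
which is implied by condition (1) (iii).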
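B.1.2 Proof that $\sup_{\beta\in\mathcal{B}}|J_{2,n,T}(\beta)| = o_p(1)$
Let $\varepsilon > 0$. Using that $\varphi$ is Lipschitz, and denoting by $L$ the Lipschitz constant, we have that $\|x - y\|\le\varepsilon/L$ implies $|\varphi(x)-\varphi(y)|\le\varepsilon$. Thus, we get:
\[
\begin{aligned}
&P\left[\sup_{\beta\in\mathcal{B}}\left|\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)\right)-\varphi(\mu_t(\beta))\right\}\right|\ge\varepsilon\right]\\
&\quad\le P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},f_t(\beta),\beta)-\mu_t(\beta)\right]\right\|\ge\varepsilon/L\right]=:P_{1,\varepsilon}.
\end{aligned}
\]
To bound $P_{1,\varepsilon}$, define for any $\delta > 0$ the event:
\[
\Omega_{1,n,T}(\delta)=\left\{\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},f_t(\beta),\beta)-\mu_t(\beta)\right]\right\|\le\delta\right\}.
\]
Lemma B.1: Under conditions (1) (i)-(iii), (v)-(vi) and (4) of Lemma A.1: $P[\Omega_{1,n,T}(\delta)]\to 1$ as $n, T\to\infty$, for any $\delta > 0$.
Since $P_{1,\varepsilon}\le 1 - P[\Omega_{1,n,T}(\varepsilon/(2L))]$, by Lemma B.1 we get that $P_{1,\varepsilon}\to 0$ as $n, T\to\infty$, for any $\varepsilon > 0$. It follows that $\sup_{\beta\in\mathcal{B}}|J_{2,n,T}(\beta)| = o_p(1)$.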
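As a sketch of where the threshold in $P_{1,\varepsilon}$ comes from, assume the Lipschitz property is normalized as $|\varphi(x)-\varphi(y)|\le L\|x-y\|$, and write $\bar a_{n,t}(\beta)=\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)$ (a shorthand introduced here only for illustration). Then, for every $\beta\in\mathcal{B}$:
\[
\left|\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi(\bar a_{n,t}(\beta))-\varphi(\mu_t(\beta))\right\}\right|
\le\frac{1}{T}\sum_{t=1}^{T}L\left\|\bar a_{n,t}(\beta)-\mu_t(\beta)\right\|
\le L\sup_{1\le t\le T}\left\|\bar a_{n,t}(\beta)-\mu_t(\beta)\right\|.
\]
Taking the supremum over $\beta$, the left-hand side can be at least $\varepsilon$ only if $\sup_{\beta}\sup_{t}\|\bar a_{n,t}(\beta)-\mu_t(\beta)\|\ge\varepsilon/L$, which is the event whose probability is controlled through Lemma B.1.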
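B.1.3 Proof that $\sup_{\beta\in\mathcal{B}}|J_{3,n,T}(\beta)| = o_p(1)$
Let $\varepsilon > 0$ be given, and let $L$ denote the Lipschitz constant of $\varphi$. Then:
\[
\begin{aligned}
&P\left[\sup_{\beta\in\mathcal{B}}\left|\frac{1}{T}\sum_{t=1}^{T}\left\{\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},f_t(\beta),\beta)\right)\right\}\right|\ge\varepsilon\right]\\
&\quad\le P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},\hat f_{n,t}(\beta),\beta)-a(Y_{i,t},f_t(\beta),\beta)\right]\right\|\ge\varepsilon/L\right]=:P_{2,\varepsilon}.
\end{aligned}
\]
To bound $P_{2,\varepsilon}$, define for any $\delta > 0$ the event:
\[
\Omega_{2,n,T}(\delta)=\left\{\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},\hat f_{n,t}(\beta),\beta)-a(Y_{i,t},f_t(\beta),\beta)\right]\right\|\le\delta\right\}.
\]
Lemma B.2: Under conditions (1) (i)-(iii), (vii)-(viii) and (4) of Lemma A.1: $P[\Omega_{2,n,T}(\delta)]\to 1$ as $n, T\to\infty$, for any $\delta > 0$.
Since $P_{2,\varepsilon}\le 1 - P[\Omega_{2,n,T}(\varepsilon/(2L))]$, by Lemma B.2 we get that $P_{2,\varepsilon}\to 0$ as $n, T\to\infty$, for any $\varepsilon > 0$. It follows that $\sup_{\beta\in\mathcal{B}}|J_{3,n,T}(\beta)| = o_p(1)$.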
B.2 Proof of Lemma A.2
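Let us first show that $\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\in U$ for any $1\le t\le T$ and $\beta\in\mathcal{B}$, with probability approaching 1. We need the following Lemma B.3.
Lemma B.3: Under conditions (1) (ii) and (ix) of Lemma A.2, for any $\eta > 0$ there exists a compact set $K\subset U$ such that $P[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K]\ge 1-\eta$.
Now, let $\eta > 0$ be given and, by Lemma B.3, let $K_1$ be a compact subset of $U$ such that $P[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_1]\ge 1-\eta$. Let further $\delta > 0$ be such that $\{x\in SR^{r\times r} : \mathrm{dist}(x,K_1)\le\delta\}\subset U$, where $SR^{r\times r}$ is the set of $(r,r)$ symmetric matrices. Then:
\[
\begin{aligned}
P_{n,T}&:=P\left[\left\{\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta),\ 1\le t\le T,\ \beta\in\mathcal{B}\right\}\subset U\right]\\
&\ge P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_1\right]+P\left[\Omega_{1,n,T}(\delta)\right]-1\ \ge\ P\left[\Omega_{1,n,T}(\delta)\right]-\eta.
\end{aligned}
\]
From Lemma B.1, it follows that $\liminf_{n,T\to\infty}P_{n,T}\ge 1-\eta$. Since $\eta$ can be chosen arbitrarily small, we get $\lim_{n,T\to\infty}P_{n,T} = 1$. Therefore we can focus on the event
\[
\left\{\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta),\ 1\le t\le T,\ \beta\in\mathcal{B}\right\}\subset U.
\]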
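The first inequality in the bound on $P_{n,T}$ is an instance of the elementary Bonferroni inequality. As a brief worked step, for any two events $E_1$ and $E_2$ (generic symbols used only in this illustration):
\[
P[E_1\cap E_2]=P[E_1]+P[E_2]-P[E_1\cup E_2]\ \ge\ P[E_1]+P[E_2]-1.
\]
Applied with $E_1=\{\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_1\}$ and $E_2=\Omega_{1,n,T}(\delta)$, together with $P[E_1]\ge 1-\eta$, this gives the lower bound $P[\Omega_{1,n,T}(\delta)]-\eta$ stated above, provided the event of interest contains $E_1\cap E_2$.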
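Let us now prove Lemma A.2. Let $\varepsilon,\eta > 0$ be given. We have to prove that:
\[
\limsup_{n,T\to\infty}P\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{1}{T}\sum_{t=1}^{T}\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-E_0\left[\varphi(\mu_t(\beta))\right]\right\|\ge\varepsilon\right]\le\eta.\qquad\text{(B.2)}
\]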
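Let us introduce an approximation for $\varphi$ that is globally Lipschitz. More precisely, let $K\subset U$ be a compact set and let $\tilde\varphi$ be a Lipschitz function on $U$ such that
\[
\tilde\varphi=\varphi\ \text{on}\ K\quad\text{and}\quad|\tilde\varphi|\le|\varphi|\ \text{on}\ U.\qquad\text{(B.3)}
\]
Such a function exists by condition (2) (i). Then (B.2) follows if:
\[
A_{1,\varepsilon}=\limsup_{n,T\to\infty}P\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{1}{T}\sum_{t=1}^{T}\tilde\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-E_0\left[\tilde\varphi(\mu_t(\beta))\right]\right\|\ge\varepsilon/3\right]\le\eta/2,
\]
\[
A_{2,\varepsilon}=\limsup_{n,T\to\infty}P\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{1}{T}\sum_{t=1}^{T}\left[\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\tilde\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)\right]\right\|\ge\varepsilon/3\right]\le\eta/2,\qquad\text{(B.4)}
\]
and:
\[
\sup_{\beta\in\mathcal{B}}\left\|E_0\left[\tilde\varphi(\mu_t(\beta))\right]-E_0\left[\varphi(\mu_t(\beta))\right]\right\|\le\varepsilon/3.\qquad\text{(B.5)}
\]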
From (B.3) and conditions (1) (ii) and (2) (ii), function $\tilde\varphi$ is Lipschitz and such that $E_0|\tilde\varphi(\mu_t(\beta))|^{\delta} < \infty$ for some $\delta > 2$. Thus we get $A_{1,\varepsilon} = 0$ by Lemma A.1. Let us now show that the inequalities (B.4) and (B.5) hold for an appropriate choice of the approximating function $\tilde\varphi$.
Let us first consider $A_{2,\varepsilon}$. Since $\tilde\varphi = \varphi$ on $K$, in the event that defines $A_{2,\varepsilon}$ only the terms with $\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\in K^c$ contribute to the sum. Let $K_0\subset K$ be a compact set and $\delta > 0$ such that:
\[
\mathrm{dist}(K_0,K^c)>2\delta.\qquad\text{(B.6)}
\]
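Define for $\delta > 0$ the events:
\[
\Omega_{3,n,T}(\delta)=\left\{\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{\lambda_t(\beta)}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},f_t(\beta),\beta)-\mu_t(\beta)\right]\right\|\le\delta\right\},
\]
and:
\[
\Omega_{4,n,T}(\delta)=\left\{\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{\lambda_t(\beta)}\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},\hat f_{n,t}(\beta),\beta)-a(Y_{i,t},f_t(\beta),\beta)\right]\right\|\le\delta\right\},
\]
where $\lambda_t(\beta)$ denotes the smallest eigenvalue of $\mu_t(\beta)$ (see Section B.7).
Lemma B.4: Under conditions (1) (i)-(iii), (v)-(vi), (ix)-(x) and (4) of Lemma A.2: $P[\Omega_{3,n,T}(\delta)]\to 1$ as $n, T\to\infty$, for any $\delta > 0$.
Lemma B.5: Under conditions (1) (i)-(iii), (vii)-(x) and (4) of Lemma A.2: $P[\Omega_{4,n,T}(\delta)]\to 1$ as $n, T\to\infty$, for any $\delta > 0$.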
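When $\Omega_{n,T}(\delta)=\cap_{j=1}^{4}\Omega_{j,n,T}(\delta)$ occurs, we have that $\|A - D\|\le 2\delta$ and $A = (1+\Delta)D$ with $\|\Delta\|\le 2\delta$, $P$-a.s., for $A=\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)$ and $D=\mu_t(\beta)$, for $t = 1, ..., T$ and $\beta\in\mathcal{B}$. Then, by (B.3) and condition (2) (ii) we have:
\[
\left\|\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\tilde\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)\right\|
\le 2\left\|\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)\right\|
\le C\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta)),
\]
for a constant $C$ and $\delta < 1/4$, and by (B.6):
\[
\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\in K^c\ \Rightarrow\ \mu_t(\beta)\in K_0^c,
\]
$P$-a.s., for any $t = 1, ..., T$ and $\beta\in\mathcal{B}$. Thus, when $\Omega_{n,T}(\delta)$ occurs, we have:
\[
\begin{aligned}
&\sup_{\beta\in\mathcal{B}}\left\|\frac{1}{T}\sum_{t=1}^{T}\left[\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\tilde\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)\right]\right\|\\
&\quad\le C\sup_{\beta\in\mathcal{B}}\frac{1}{T}\sum_{t=1}^{T}\mathbf{1}\{\mu_t(\beta)\in K_0^c\}\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta))\\
&\quad\le\frac{C}{T}\sum_{t=1}^{T}\left(1-\mathbf{1}\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta)).
\end{aligned}
\]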
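It follows that:
\[
\begin{aligned}
&P\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{1}{T}\sum_{t=1}^{T}\left[\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)-\tilde\varphi\left(\frac{1}{n}\sum_{i=1}^{n}a(Y_{i,t},\hat f_{n,t}(\beta),\beta)\right)\right]\right\|\ge\varepsilon/3\right]\\
&\quad\le\sum_{j=1}^{4}P\left[\Omega_{j,n,T}(\delta)^c\right]
+P\left[\frac{C}{T}\sum_{t=1}^{T}\left(1-\mathbf{1}\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta))\ge\varepsilon/3\right].
\end{aligned}
\]
From Lemmas B.1-B.2 and B.4-B.5 we get:
\[
A_{2,\varepsilon}\le P\left[\frac{C}{T}\sum_{t=1}^{T}\left(1-\mathbf{1}\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta))\ge\varepsilon/3\right]=:P_{2,\varepsilon}.
\]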
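By the Markov inequality, the Minkowski inequality and the Cauchy-Schwarz inequality, we have:
\[
\begin{aligned}
P_{2,\varepsilon}&\le\frac{3C}{\varepsilon}E\left[\left(1-\mathbf{1}\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau}\psi(\mu_t(\beta))\right]\\
&\le\frac{3C}{\varepsilon}\left(1-P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)^{1/p}E\left[\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau q}\psi(\mu_t(\beta))^{q}\right]^{1/q}\\
&\le\frac{3C}{\varepsilon}\left(1-P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)^{1/p}E\left[\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{\tau qp'}\right]^{1/(p'q)}E\left[\sup_{\beta\in\mathcal{B}}\psi(\mu_t(\beta))^{qq'}\right]^{1/(qq')},
\end{aligned}
\]
with $p, q, p', q' > 1$ such that $1/p + 1/q = 1$, $1/p' + 1/q' = 1$. Fix $q\in\left(1,\frac{4}{1+\tau}\right)$ and $p' = 4/(\tau q)$. We get:
\[
A_{2,\varepsilon}\le\frac{3C}{\varepsilon}\left(1-P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)^{1/p}c_1,
\]
where
\[
c_1=E\left[\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|^{4}\right]^{\tau/4}E\left[\sup_{\beta\in\mathcal{B}}\psi(\mu_t(\beta))^{4}\right]^{1/q-\tau/4}<\infty
\]
by conditions (1) (ii) and (2) (ii).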
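The exponents in $c_1$ can be checked by direct arithmetic. The following worked step is a sketch that assumes $\tau > 0$, so that $p' = 4/(\tau q)$ is finite:
\[
\tau q p'=4,\qquad\frac{1}{p'q}=\frac{\tau}{4},\qquad q'=\frac{p'}{p'-1}=\frac{4}{4-\tau q},\qquad\frac{1}{qq'}=\frac{4-\tau q}{4q}=\frac{1}{q}-\frac{\tau}{4}.
\]
The restriction $q < 4/(1+\tau)$ is equivalent to $q < 4-\tau q$. It guarantees both $p' > 1$ (since then $\tau q < 4$) and $qq' = 4q/(4-\tau q) < 4$, so that the moment of $\psi(\mu_t(\beta))$ of order $qq'$ is controlled by its fourth moment.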
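Let us now bound $\sup_{\beta\in\mathcal{B}}\left\|E_0[\tilde\varphi(\mu_t(\beta))]-E_0[\varphi(\mu_t(\beta))]\right\|$. By similar arguments as above:
\[
\sup_{\beta\in\mathcal{B}}\left\|E_0\left[\tilde\varphi(\mu_t(\beta))\right]-E_0\left[\varphi(\mu_t(\beta))\right]\right\|\le C\left(1-P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\right)^{1/p}c_1.
\]
From Lemma B.3, we can fix $K_0$, $K$ and $\delta$ such that
\[
P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_0\right]\ge 1-\min\left\{\left(\frac{\varepsilon\eta}{6Cc_1}\right)^{p},\left(\frac{\varepsilon}{3Cc_1}\right)^{p}\right\}
\]
and (B.6) hold. Then (B.4) and (B.5) follow, and the proof is concluded.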
B.3 Proof of Corollary A.3
Let us first consider Case (A) and check condition (2) of Lemma A.2. (i) Let $K\subset U$ be compact, and let $w, z\in K$. Using that $w^{-1}-z^{-1}=-z^{-1}(w-z)w^{-1}$ we deduce that $\varphi$ is Lipschitz on $K$ with Lipschitz constant $L=\sup_{z\in K}\|z^{-1}\|^{2}<\infty$. (ii) Let $w, z\in U$, $w = (1+\Delta)z$, $\|\Delta\|\le 1/2$. Then $1+\Delta$ is a nonsingular matrix. From $w^{-1}=z^{-1}(1+\Delta)^{-1}$ we see that condition (2) (ii) is satisfied with $C=\|(1+\Delta)^{-1}\|$, $\tau = 0$ and $\psi(z)=\|z^{-1}\|$.
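The two facts used in Case (A) can be verified in a few lines. The sketch below assumes a submultiplicative matrix norm with $\|1\| = 1$ for the identity matrix (for instance the operator norm); that choice of norm is an assumption of this illustration. Expanding the right-hand side of the identity:
\[
-z^{-1}(w-z)w^{-1}=-z^{-1}ww^{-1}+z^{-1}zw^{-1}=-z^{-1}+w^{-1},
\]
so that $\|w^{-1}-z^{-1}\|\le\|z^{-1}\|\,\|w^{-1}\|\,\|w-z\|\le\sup_{z\in K}\|z^{-1}\|^{2}\,\|w-z\|$ for $w, z\in K$. Moreover, for $\|\Delta\|\le 1/2$ the Neumann series gives
\[
(1+\Delta)^{-1}=\sum_{k=0}^{\infty}(-\Delta)^{k},\qquad\left\|(1+\Delta)^{-1}\right\|\le\sum_{k=0}^{\infty}\|\Delta\|^{k}=\frac{1}{1-\|\Delta\|}\le 2,
\]
so under this assumption the factor $\|(1+\Delta)^{-1}\|$ is bounded uniformly in $\Delta$.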
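Let us now consider Case (B). Use the block notation:
\[
x=\begin{pmatrix}x_{11}&x_{12}\\x_{21}&x_{22}\end{pmatrix}.
\]
Then $\varphi(x)=x_{11}-x_{12}x_{22}^{-1}x_{21}=\varphi_1(x)-\varphi_2(x)$, where $\varphi_1(x)=x_{11}$ and $\varphi_2(x)=x_{12}x_{22}^{-1}x_{21}$. The convergence of the transformation with $\varphi_1$ follows from Lemma A.1. Let us focus on $\varphi_2$ and apply Lemma A.2. Condition (2) (i) is satisfied, since $\varphi_2$ is the product of three mappings that are Lipschitz on compact sets. To check condition (2) (ii), let $w, z\in U$, $w = (1+\Delta)z$, $\|\Delta\|\le 1/2$. Then:
\[
\left\|\varphi_2(w)\right\|\le\left\|w_{12}\right\|\left\|w_{22}^{-1}\right\|\left\|w_{21}\right\|\le\left\|w\right\|^{2}\left\|w_{22}^{-1}\right\|\le\left\|1+\Delta\right\|^{2}\left\|z\right\|^{2}\left\|w_{22}^{-1}\right\|.
\]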
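Denote by $d = r - s$ the dimension of $w_{22}$. Since $w$ is positive definite, and norms on $\mathbb{R}^{d\times d}$ are equivalent, we have:
\[
\begin{aligned}
\left\|w_{22}^{-1}\right\|&\le C\sup_{x\in\mathbb{R}^{d}:\|x\|=1}x'w_{22}^{-1}x=C\left(\inf_{x\in\mathbb{R}^{d}:\|x\|=1}x'w_{22}x\right)^{-1}\\
&\le C\left(\inf_{x\in\mathbb{R}^{r}:\|x\|=1}x'wx\right)^{-1}=C\sup_{x\in\mathbb{R}^{r}:\|x\|=1}x'w^{-1}x\le CC'\left\|w^{-1}\right\|,
\end{aligned}
\]
where $C, C'$ are constants. From the argument above for the inverse mapping, we know that $\|w^{-1}\|\le\|(1+\Delta)^{-1}\|\,\|z^{-1}\|$. We get that $\|\varphi_2(w)\|\le CC'\|(1+\Delta)^{-1}\|\,\|1+\Delta\|^{2}\,\|z\|^{2}\,\|z^{-1}\|$ and condition (2) (ii) is satisfied with $\tau = 2$ and $\psi(z)=\|z^{-1}\|$.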
B.4 Proof of Lemma A.4
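The proof consists of several steps. a) First, we prove that $\hat\beta$ is consistent. For any $\varepsilon > 0$ we have:
\[
P\left[|\hat\beta-\beta_0|\ge\varepsilon\right]\le P\left[\sup_{\beta\in\mathcal{B}:|\beta-\beta_0|\ge\varepsilon}L_{nT}(\beta,\hat\theta)\ge L_{nT}(\hat\beta,\hat\theta)\right]\le P\left[\sup_{\beta\in\mathcal{B}:|\beta-\beta_0|\ge\varepsilon}L_{nT}(\beta,\hat\theta)\ge L_{nT}(\beta_0,\theta_0)\right].
\]
By using (1i), (1ii) and (5i), we get:
\[
P\left[|\hat\beta-\beta_0|\ge\varepsilon\right]\le P\left[\sup_{\beta\in\mathcal{B}:|\beta-\beta_0|\ge\varepsilon}L^{*}(\beta)-L^{*}(\beta_0)\ge o_p(1)\right].
\]
The probability in the RHS is $o(1)$, since $\sup_{\beta\in\mathcal{B}:|\beta-\beta_0|\ge\varepsilon}L^{*}(\beta)-L^{*}(\beta_0)<0$ by identification condition (2i) and compactness of $\mathcal{B}$.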
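b) Second, we prove that $\hat\beta-\beta_0=O_p\left(\frac{1}{\sqrt{nT}}\right)$. Estimator $\hat\beta$ satisfies with probability approaching (w.p.a.) 1 the first-order condition:
\[
0=\frac{\partial L_{nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right)=\frac{\partial L^{*}_{nT}}{\partial\beta}\left(\hat\beta\right)+\frac{1}{n}\frac{\partial L_{1,nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right)+\frac{1}{n^{2}}\frac{\partial L_{2,nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right).
\]
By using the mean value theorem, (4ii) and (5ii), we get:
\[
0=\frac{\partial L^{*}_{nT}(\beta_0)}{\partial\beta}+\frac{\partial^{2}L^{*}_{nT}(\tilde\beta)}{\partial\beta\partial\beta'}\left(\hat\beta-\beta_0\right)+O_p\left(\frac{1}{n}\right),
\]
where $\tilde\beta$ is a mean value. From (3i) and the consistency of $\hat\beta$, matrix $\left(-\frac{\partial^{2}L^{*}_{nT}(\tilde\beta)}{\partial\beta\partial\beta'}\right)^{-1}$ exists w.p.a. 1 and is $O_p(1)$. Thus, from (4i) and since $T/n\to 0$, we get:
\[
\hat\beta-\beta_0=\left(-\frac{\partial^{2}L^{*}_{nT}(\tilde\beta)}{\partial\beta\partial\beta'}\right)^{-1}\frac{\partial L^{*}_{nT}(\beta_0)}{\partial\beta}+O_p\left(\frac{1}{n}\right)=O_p\left(\frac{1}{\sqrt{nT}}\right).
\]
c) Third, $\hat\beta^{*}-\beta_0=O_p\left(\frac{1}{\sqrt{nT}}\right)$ follows from similar arguments as above, by setting functions $L_{1,nT}(\beta,\theta)$ and $L_{2,nT}(\beta,\theta)$ equal to zero.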
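d) Fourth, let us show that $\hat\theta$ is consistent. For any $\varepsilon > 0$ we have:
\[
P\left[|\hat\theta-\theta_0|\ge\varepsilon\right]\le P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{nT}(\hat\beta,\theta)\ge L_{nT}(\hat\beta,\hat\theta)\right]\le P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{nT}(\hat\beta,\theta)\ge L_{nT}(\beta_0,\theta_0)\right].
\]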
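Using (1ii), (5i) and the consistency of $\hat\beta$, the RHS probability is such that:
\[
\begin{aligned}
&P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{nT}(\hat\beta,\theta)\ge L_{nT}(\beta_0,\theta_0)\right]\\
&\quad=P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}\frac{1}{n}L_{1,nT}(\hat\beta,\theta)-\frac{1}{n}L_{1,nT}(\beta_0,\theta_0)\ge L^{*}_{nT}(\beta_0)-L^{*}_{nT}\left(\hat\beta\right)+O_p\left(\frac{1}{n^{2}}\right)\right]\\
&\quad=P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{1}(\beta_0,\theta)-L_{1}(\beta_0,\theta_0)\ge n\left[L^{*}_{nT}(\beta_0)-L^{*}_{nT}\left(\hat\beta\right)\right]+o_p(1)\right].
\end{aligned}
\]
Thus:
\[
P\left[|\hat\theta-\theta_0|\ge\varepsilon\right]\le P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{1}(\beta_0,\theta)-L_{1}(\beta_0,\theta_0)\ge n\left[L^{*}_{nT}(\beta_0)-L^{*}_{nT}\left(\hat\beta\right)\right]+o_p(1)\right].
\]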
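To prove that $n\left[L^{*}_{nT}(\beta_0)-L^{*}_{nT}(\hat\beta)\right]=o_p(1)$, we can expand $L^{*}_{nT}(\beta_0)$ and $L^{*}_{nT}(\hat\beta)$ around $\hat\beta^{*}$ at order 2 and use that $\frac{\partial L^{*}_{nT}(\hat\beta^{*})}{\partial\beta}=0$. More precisely, we get:
\[
L^{*}_{nT}\left(\hat\beta\right)=L^{*}_{nT}\left(\hat\beta^{*}\right)+\frac{1}{2}\left(\hat\beta-\hat\beta^{*}\right)'\frac{\partial^{2}L^{*}_{nT}(\ddot\beta)}{\partial\beta\partial\beta'}\left(\hat\beta-\hat\beta^{*}\right),
\]
and:
\[
L^{*}_{nT}(\beta_0)=L^{*}_{nT}\left(\hat\beta^{*}\right)+\frac{1}{2}\left(\beta_0-\hat\beta^{*}\right)'\frac{\partial^{2}L^{*}_{nT}(\dot\beta)}{\partial\beta\partial\beta'}\left(\beta_0-\hat\beta^{*}\right),
\]
where $\dot\beta$ and $\ddot\beta$ are mean values. From b) and c) above, we have $\beta_0-\hat\beta^{*}=O_p(1/\sqrt{nT})$ and $\hat\beta-\hat\beta^{*}=O_p(1/\sqrt{nT})$, and from (3i) we get $n\left[L^{*}_{nT}(\beta_0)-L^{*}_{nT}(\hat\beta)\right]=O_p(1/T)=o_p(1)$. Therefore we get:
\[
P\left[|\hat\theta-\theta_0|\ge\varepsilon\right]\le P\left[\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{1}(\beta_0,\theta)-L_{1}(\beta_0,\theta_0)\ge o_p(1)\right].
\]
The RHS probability is $o(1)$, since $\sup_{\theta\in\Theta:|\theta-\theta_0|\ge\varepsilon}L_{1}(\beta_0,\theta)-L_{1}(\beta_0,\theta_0)<0$ from (2ii) and the compactness of $\Theta$.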
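e) Fifth, let us show the asymptotic normality. The first-order conditions for $\hat\beta,\hat\theta$ are:
\[
\begin{aligned}
0&=\frac{\partial L_{nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right)=\frac{\partial L^{*}_{nT}}{\partial\beta}\left(\hat\beta\right)+\frac{1}{n}\frac{\partial L_{1,nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right)+\frac{1}{n^{2}}\frac{\partial L_{2,nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right),\\
0&=\frac{\partial L_{nT}}{\partial\theta}\left(\hat\beta,\hat\theta\right)\ \Leftrightarrow\ 0=\frac{\partial L_{1,nT}}{\partial\theta}\left(\hat\beta,\hat\theta\right)+\frac{1}{n}\frac{\partial L_{2,nT}}{\partial\theta}\left(\hat\beta,\hat\theta\right),
\end{aligned}
\]
where a factor $1/n$ in the second equation cancels. Let us multiply the first equation by $\sqrt{nT}$, the second equation by $\sqrt{T}$, and use the mean-value theorem to get:
\[
\begin{aligned}
0&=\sqrt{nT}\frac{\partial L^{*}_{nT}(\beta_0)}{\partial\beta}+\frac{\partial^{2}L^{*}_{nT}(\tilde\beta)}{\partial\beta\partial\beta'}\sqrt{nT}\left(\hat\beta-\beta_0\right)\\
&\quad+\sqrt{\frac{T}{n}}\frac{\partial L_{1,nT}(\beta_0,\theta_0)}{\partial\beta}+\frac{1}{n}\frac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\beta\partial\beta'}\sqrt{nT}\left(\hat\beta-\beta_0\right)+\frac{1}{\sqrt{n}}\frac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\beta\partial\theta'}\sqrt{T}\left(\hat\theta-\theta_0\right)\\
&\quad+\frac{1}{n}\sqrt{\frac{T}{n}}\frac{\partial L_{2,nT}}{\partial\beta}\left(\hat\beta,\hat\theta\right),
\end{aligned}
\]
and:
\[
\begin{aligned}
0&=\sqrt{T}\frac{\partial L_{1,nT}(\beta_0,\theta_0)}{\partial\theta}+\frac{1}{\sqrt{n}}\frac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\theta\partial\beta'}\sqrt{nT}\left(\hat\beta-\beta_0\right)+\frac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\theta\partial\theta'}\sqrt{T}\left(\hat\theta-\theta_0\right)\\
&\quad+\frac{\sqrt{T}}{n}\frac{\partial L_{2,nT}}{\partial\theta}\left(\hat\beta,\hat\theta\right),
\end{aligned}
\]
where $\tilde\beta$ and $\tilde\theta$ are mean values. Thus, we get from (4ii), (5ii) and $T/n\to 0$:
\[
-\begin{pmatrix}
\dfrac{\partial^{2}L^{*}_{nT}(\tilde\beta)}{\partial\beta\partial\beta'}+\dfrac{1}{n}\dfrac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\beta\partial\beta'}&\dfrac{1}{\sqrt{n}}\dfrac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\beta\partial\theta'}\\[2ex]
\dfrac{1}{\sqrt{n}}\dfrac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\theta\partial\beta'}&\dfrac{\partial^{2}L_{1,nT}(\tilde\beta,\tilde\theta)}{\partial\theta\partial\theta'}
\end{pmatrix}
\begin{pmatrix}\sqrt{nT}\left(\hat\beta-\beta_0\right)\\[1ex]\sqrt{T}\left(\hat\theta-\theta_0\right)\end{pmatrix}
=\begin{pmatrix}\sqrt{nT}\dfrac{\partial L^{*}_{nT}(\beta_0)}{\partial\beta}\\[2ex]\sqrt{T}\dfrac{\partial L_{1,nT}(\beta_0,\theta_0)}{\partial\theta}\end{pmatrix}+o_p(1).
\]
Then, from (3i), (3ii), (3iii),
\[
\begin{aligned}
\sqrt{nT}\left(\hat\beta-\beta_0\right)&=\left(I_0^{*}\right)^{-1}\sqrt{nT}\frac{\partial L^{*}_{nT}(\beta_0)}{\partial\beta}+o_p(1),\\
\sqrt{T}\left(\hat\theta-\theta_0\right)&=I_{1,\theta\theta}^{-1}\sqrt{T}\frac{\partial L_{1,nT}(\beta_0,\theta_0)}{\partial\theta}+o_p(1).
\end{aligned}
\]
The joint asymptotic normality follows from (4i).
f) Finally, the asymptotic expansion of $\hat\beta^{*}$ is the same as that of $\hat\beta$. The conclusion follows.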
B.5 Proof of Lemma B.1
The proof of Lemma B.1 is obtained by a similar argument as in the proof of Lemma
B.4 (Section B.8) replacing λt (β) with 1.
B.6 Proof of Lemma B.2
The proof of Lemma B.2 is obtained by a similar argument as in the proof of Lemma
B.5 (Section B.9) replacing λt (β) with 1.
B.7 Proof of Lemma B.3
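Let $\lambda(x)$ and $\Lambda(x)$ denote the smallest and the largest eigenvalues, respectively, of the symmetric matrix $x\in SR^{r\times r}$, and let $\lambda_t(\beta)=\lambda(\mu_t(\beta))$ and $\Lambda_t(\beta)=\Lambda(\mu_t(\beta))$. For any constants $c, C$ such that $0 < c\le C < \infty$ define the compact set $K_{c,C}=\{x\in SR^{r\times r} : c\le\lambda(x)\le\Lambda(x)\le C\}\subset U$. Then:
\[
\begin{aligned}
P\left[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_{c,C}\right]&=P\left[\inf_{\beta\in\mathcal{B}}\lambda_t(\beta)\ge c,\ \sup_{\beta\in\mathcal{B}}\Lambda_t(\beta)\le C\right]\\
&\ge 1-P\left[\inf_{\beta\in\mathcal{B}}\lambda_t(\beta)<c\right]-P\left[\sup_{\beta\in\mathcal{B}}\Lambda_t(\beta)>C\right]\\
&=1-P\left[\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-1}>c^{-1}\right]-P\left[\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|>C\right]\\
&\ge 1-c\,E\left[\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-1}\right]-C^{-1}E\left[\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)\right\|\right],
\end{aligned}
\]
by the Markov inequality. The two expectations in the last line are finite by conditions (1) (ii) and (ix). Then, for any $\eta > 0$, there exist $c > 0$ and $C < \infty$ such that $P[\{\mu_t(\beta),\beta\in\mathcal{B}\}\subset K_{c,C}]\ge 1-\eta$.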
B.8 Proof of Lemma B.4
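Let us define $W_{n,t}(\beta)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[a(Y_{i,t},f_t(\beta),\beta)-\mu_t(\beta)\right]$. Then:
\[
\begin{aligned}
P\left[\Omega_{3,n,T}(\delta)^{c}\right]&\le P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{\left\|W_{n,t}(\beta)\right\|}{\lambda_t(\beta)}\ge\delta\right]\le\sum_{t=1}^{T}P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left\|W_{n,t}(\beta)\right\|}{\lambda_t(\beta)}\ge\delta\right]\\
&=T\,P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left\|W_{n,t}(\beta)\right\|}{\lambda_t(\beta)}\ge\delta\right].
\end{aligned}
\]
Denote by $W_{j,l,n,t}(\beta)$ the elements of matrix $W_{n,t}(\beta)$. Since $\left\|W_{n,t}(\beta)\right\|^{2}=\sum_{j,l=1}^{r}\left|W_{j,l,n,t}(\beta)\right|^{2}$, we have:
\[
P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left\|W_{n,t}(\beta)\right\|}{\lambda_t(\beta)}\ge\delta\right]\le\sum_{j,l=1}^{r}P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{r}\right].
\]
Thus, we have to show that:
\[
T\,P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{r}\right]\to 0,\qquad\text{(B.7)}
\]
for any $j, l$.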
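The reduction from the matrix norm to individual elements can be spelled out as follows. This is a sketch that uses only the stated identity $\|W_{n,t}(\beta)\|^{2}=\sum_{j,l=1}^{r}|W_{j,l,n,t}(\beta)|^{2}$ and the fact that the sum has $r^{2}$ terms:
\[
\left\|W_{n,t}(\beta)\right\|^{2}\le r^{2}\max_{j,l}\left|W_{j,l,n,t}(\beta)\right|^{2}
\quad\Rightarrow\quad
\sup_{\beta\in\mathcal{B}}\frac{\left\|W_{n,t}(\beta)\right\|}{\lambda_t(\beta)}\le r\max_{j,l}\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}.
\]
Hence the event that the left-hand side, scaled by $1/\sqrt{n}$, is at least $\delta$ is contained in the union over $(j,l)$ of the events that the corresponding element-wise supremum, scaled by $1/\sqrt{n}$, is at least $\delta/r$. The union bound over the $r^{2}$ pairs then gives the displayed sum.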
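To control the sup over $\beta\in\mathcal{B}$, let us introduce a finite covering of the compact set $\mathcal{B}\subset\mathbb{R}^{q}$ by $M$ open balls $B(\beta_m,\varepsilon)$ around $\beta_m$ and with radius $\varepsilon$, $m = 1, ..., M$. We let $M = M_T$ and $\varepsilon=\varepsilon_T$ depend on sample size $T$, such that $\varepsilon_T\to 0$, $M_T\to\infty$ and $M_T=O\left(\varepsilon_T^{-q}\right)$. We have:
\[
\begin{aligned}
\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}&\le\max_{m=1,..,M_T}\ \sup_{\beta\in B(\beta_m,\varepsilon_T)}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\\
&\le\max_{m=1,..,M_T}\frac{\left|W_{j,l,n,t}(\beta_m)\right|}{\lambda_t(\beta_m)}+\sup_{\beta,\beta'\in\mathcal{B}:\|\beta'-\beta\|\le\varepsilon_T}\left|\frac{W_{j,l,n,t}(\beta')}{\lambda_t(\beta')}-\frac{W_{j,l,n,t}(\beta)}{\lambda_t(\beta)}\right|.
\end{aligned}
\]
Thus we get:
\[
\begin{aligned}
P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{r}\right]
&\le P\left[\frac{1}{\sqrt{n}}\sup_{\beta,\beta':\|\beta'-\beta\|\le\varepsilon_T}\left|\frac{W_{j,l,n,t}(\beta')}{\lambda_t(\beta')}-\frac{W_{j,l,n,t}(\beta)}{\lambda_t(\beta)}\right|\ge\frac{\delta}{2r}\right]\\
&\quad+M_T\sup_{\beta\in\mathcal{B}}P\left[\frac{1}{\sqrt{n}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{2r}\right]=:A_1+A_2.\qquad\text{(B.8)}
\end{aligned}
\]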
B.8.1 Bound of A1
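By the Markov inequality we have:
\[
A_1\le\frac{2r}{\delta\sqrt{n}}E\left[\sup_{\beta,\beta':\|\beta'-\beta\|\le\varepsilon_T}\left|\frac{W_{j,l,n,t}(\beta')}{\lambda_t(\beta')}-\frac{W_{j,l,n,t}(\beta)}{\lambda_t(\beta)}\right|\right].
\]
To bound the expectation we use:
\[
\begin{aligned}
\sup_{\|\beta'-\beta\|\le\varepsilon_T}\left|\frac{W_{j,l,n,t}(\beta')}{\lambda_t(\beta')}-\frac{W_{j,l,n,t}(\beta)}{\lambda_t(\beta)}\right|
&\le\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-1}\sup_{\|\beta'-\beta\|\le\varepsilon_T}\left|W_{j,l,n,t}(\beta')-W_{j,l,n,t}(\beta)\right|\\
&\quad+\sup_{\beta\in\mathcal{B}}\left|W_{j,l,n,t}(\beta)\right|\sup_{\|\beta'-\beta\|\le\varepsilon_T}\left|\lambda_t(\beta')^{-1}-\lambda_t(\beta)^{-1}\right|.
\end{aligned}
\]
We have:
\[
\sup_{\beta\in\mathcal{B}}\left|W_{j,l,n,t}(\beta)\right|\le\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\sup_{\beta\in\mathcal{B}}\left\|a(Y_{i,t},f_t(\beta),\beta)\right\|+E\left[\sup_{\beta\in\mathcal{B}}\left\|a(Y_{i,t},f_t(\beta),\beta)\right\|\,\Big|\,f_t\right]\right\},
\]
and:
\[
\sup_{\|\beta'-\beta\|\le\varepsilon_T}\left|W_{j,l,n,t}(\beta')-W_{j,l,n,t}(\beta)\right|\le\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|+E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\right\}\varepsilon_T.
\]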
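Moreover, for any $\beta,\beta'\in\mathcal{B}$ such that $\|\beta'-\beta\|\le\varepsilon_T$:
\[
\begin{aligned}
\left|\lambda_t(\beta')^{-1}-\lambda_t(\beta)^{-1}\right|&=\left|\sup_{x:\|x\|=1}x'\mu_t(\beta')^{-1}x-\sup_{x:\|x\|=1}x'\mu_t(\beta)^{-1}x\right|\\
&\le\sup_{x:\|x\|=1}\left|x'\left[\mu_t(\beta')^{-1}-\mu_t(\beta)^{-1}\right]x\right|\\
&\le\left\|\mu_t(\beta')^{-1}-\mu_t(\beta)^{-1}\right\|\le\sup_{\beta\in\mathcal{B}}\left\|\mu_t(\beta)^{-1}\right\|^{2}E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\varepsilon_T.
\end{aligned}
\]
Thus, using $\sup_{\beta\in\mathcal{B}}\|\mu_t(\beta)^{-1}\|\le C\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-1}$, we get:
\[
\begin{aligned}
A_1&\le\frac{4Cr\varepsilon_T}{\delta}E\left[\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-1}E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\right]\\
&\quad+\frac{4Cr\varepsilon_T}{\delta}E\left[E\left[\sup_{\beta\in\mathcal{B}}\left\|a(Y_{i,t},f_t(\beta),\beta)\right\|\,\Big|\,f_t\right]\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-2}E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|\,\Big|\,f_t\right]\right]\\
&\le\frac{4Crc_1\varepsilon_T}{\delta},
\end{aligned}
\]
where:
\[
\begin{aligned}
c_1&=E\left[\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-2}\right]^{1/2}E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|^{2}\right]^{1/2}\\
&\quad+E\left[\sup_{\beta\in\mathcal{B}}\lambda_t(\beta)^{-4}\right]^{1/2}E\left[\sup_{\beta\in\mathcal{B}}\left\|a(Y_{i,t},f_t(\beta),\beta)\right\|^{4}\right]^{1/4}E\left[\sup_{\beta\in\mathcal{B}}\left\|\frac{\partial a(Y_{i,t},f_t(\beta),\beta)}{\partial\beta'}\right\|^{4}\right]^{1/4}<\infty,
\end{aligned}
\]
by Conditions (1) (ii), (iii), (ix).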
B.8.2 Bound of A2
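To bound $A_2$, write for $\beta\in\mathcal{B}$:
\[
\begin{aligned}
P\left[\frac{1}{\sqrt{n}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{2r}\right]
&=E\left[P\left[\frac{1}{\sqrt{n}}\left|W_{j,l,n,t}(\beta)\right|\ge\frac{\delta}{2r}\lambda_t(\beta)\,\Big|\,(f_t)\right]\right]\\
&=E\left[P\left[\left|\sum_{i=1}^{n}\left[a_{j,l}(Y_{i,t},f_t(\beta),\beta)-\mu_{j,l,t}(\beta)\right]\right|\ge\frac{\delta\lambda_t(\beta)}{2r}n\,\Big|\,(f_t)\right]\right],
\end{aligned}
\]
where $a_{j,l}$ and $\mu_{j,l,t}$ denote the elements of matrices $a$ and $\mu_t$, respectively. To bound the inner conditional probability, we use the independence property of the $Y_{i,t}$ conditional on $(f_t)$, and Bernstein's inequality [e.g., Bosq (1998), Theorem 1.2]. Let us first check Cramér's conditions. From conditions (v) and (x), we have (see footnote 3):
\[
E\left[\left|a_{j,l}(Y_{i,t},f_t(\beta),\beta)-\mu_{j,l,t}(\beta)\right|^{k}\,\Big|\,f_t\right]\le E\left[\left\|a(Y_{i,t},f_t(\beta),\beta)-\mu_t(\beta)\right\|^{k}\,\Big|\,f_t\right]\le\gamma_t(\beta)^{k}k!,
\]
for $k = 3, 4, \cdots$. Then, from Bernstein's inequality applied conditional on $(f_t)$, we get:
\[
\begin{aligned}
P\left[\left|\sum_{i=1}^{n}\left[a_{j,l}(Y_{i,t},f_t(\beta),\beta)-\mu_{j,l,t}(\beta)\right]\right|\ge n\frac{\delta}{2r}\lambda_t(\beta)\,\Big|\,(f_t)\right]
&\le 2\exp\left(-n\frac{\frac{\delta^{2}}{4r^{2}}\lambda_t(\beta)^{2}}{4\gamma_t(\beta)^{2}+\frac{\delta}{r}\lambda_t(\beta)\gamma_t(\beta)}\right)\\
&\le 2\exp\left(-c_4n\frac{\lambda_t(\beta)^{2}}{\gamma_t(\beta)^{2}+\lambda_t(\beta)\gamma_t(\beta)}\right),\qquad\text{(B.9)}
\end{aligned}
\]
$P$-a.s., where $c_4=\frac{\delta^{2}}{4r^{2}}\frac{1}{\max\{4,\delta/r\}}$. By using:
\[
\frac{\lambda_t(\beta)^{2}}{\gamma_t(\beta)^{2}+\lambda_t(\beta)\gamma_t(\beta)}=\left[\frac{\gamma_t(\beta)}{\lambda_t(\beta)}\left(1+\frac{\gamma_t(\beta)}{\lambda_t(\beta)}\right)\right]^{-1}\ge\xi_t,
\]
for any $\beta\in\mathcal{B}$, we get:
\[
\sup_{\beta\in\mathcal{B}}P\left[\frac{1}{\sqrt{n}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{2r}\right]\le 2E\left[\exp\left(-c_4n\xi_t\right)\right].
\]
Using conditions (vi) and (x), and condition (4) on the rate of divergence of $n$ and $T$, we get $E\left[\exp\left(-c_4n\xi_t\right)\right]\le C_1\exp\left(-c_5n^{\delta}\right)\le C_1\exp\left(-c_6T^{\delta d}\right)$. Thus:
\[
A_2\le 2C_1M_T\exp\left(-c_6T^{\delta d}\right).
\]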
Footnote 3. For the purpose of simplifying the regularity assumptions, Cramér's conditions are stated in a slightly different form compared to Theorem 1.2 in Bosq (1998). The inequality (B.9) can be proved by the same arguments as in the proof of Theorem 1.2 in Bosq (1998).
B.8.3 Proof of (B.7)
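From (B.8) we get:
\[
T\,P\left[\frac{1}{\sqrt{n}}\sup_{\beta\in\mathcal{B}}\frac{\left|W_{j,l,n,t}(\beta)\right|}{\lambda_t(\beta)}\ge\frac{\delta}{r}\right]\le\frac{4Crc_1}{\delta}T\varepsilon_T+2C_1TM_T\exp\left(-c_6T^{\delta d}\right).
\]
Now choose $\varepsilon_T=T^{-b}$ for $b > 1$. Using $M_T=O\left(\varepsilon_T^{-q}\right)=O\left(T^{qb}\right)$, (B.7) follows.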
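The two rates behind this last step can be made explicit. The sketch below assumes that $c_6 > 0$ and that the exponent $\delta d$ of $T$ inside the exponential is strictly positive; both are read off from the display above rather than restated conditions:
\[
T\varepsilon_T=T^{1-b}\to 0\quad\text{since } b>1,
\qquad
TM_T\exp\left(-c_6T^{\delta d}\right)=O\left(T^{1+qb}\exp\left(-c_6T^{\delta d}\right)\right)\to 0,
\]
where the second limit holds because an exponential decay in a positive power of $T$ dominates any polynomial growth in $T$. Both terms on the right-hand side of the bound therefore vanish as $T\to\infty$.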
B.9 Proof of Lemma B.5
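For any $\eta > 0$, if $\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\hat f_{n,t}(\beta)-f_t(\beta)\right\|\le\eta$ then:
\[
\left\|\frac{1}{n}\sum_{i=1}^{n}\left[a(Y_{i,t},\hat f_{n,t}(\beta),\beta)-a(Y_{i,t},f_t(\beta),\beta)\right]\right\|\le\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{n}\sum_{i=1}^{n}\sup_{f:\|f-f_t(\beta)\|\le\eta}\left\|\frac{\partial a(Y_{i,t},f,\beta)}{\partial f'}\right\|\eta.
\]
Thus, for any sequence $\eta_T\to 0$ and constant $\eta^{*} > 0$ we get:
\[
\begin{aligned}
P\left[\Omega_{4,n,T}(\delta)^{c}\right]&\le P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\hat f_{n,t}(\beta)-f_t(\beta)\right\|>\eta_T\right]\\
&\quad+P\left[\eta_T\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{\lambda_t(\beta)}\frac{1}{n}\sum_{i=1}^{n}\sup_{f:\|f-f_t(\beta)\|\le\eta^{*}}\left\|\frac{\partial a(Y_{i,t},f,\beta)}{\partial f'}\right\|>\delta\right].
\end{aligned}
\]
By denoting $b(Y_{i,t},f_t(\beta),\beta)=\sup_{f:\|f-f_t(\beta)\|\le\eta^{*}}\left\|\frac{\partial a(Y_{i,t},f,\beta)}{\partial f'}\right\|$, $\nu_t(\beta)=E_0\left[b(Y_{i,t},f_t(\beta),\beta)\mid f_t\right]$ and $\varsigma_t=\sup_{\beta\in\mathcal{B}}\frac{1}{\lambda_t(\beta)}\nu_t(\beta)$, we get:
\[
\begin{aligned}
P\left[\Omega_{4,n,T}(\delta)^{c}\right]&\le P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\left\|\hat f_{n,t}(\beta)-f_t(\beta)\right\|>\eta_T\right]\\
&\quad+P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{\lambda_t(\beta)}\frac{1}{n}\sum_{i=1}^{n}\left|b(Y_{i,t},f_t(\beta),\beta)-\nu_t(\beta)\right|\ge\frac{\delta}{2\eta_T}\right]\\
&\quad+P\left[\sup_{1\le t\le T}\varsigma_t\ge\frac{\delta}{2\eta_T}\right]=:P_{1,n,T}+P_{2,n,T}+P_{3,T}.
\end{aligned}
\]
Let $\eta_T=R(\log T)^{-1/\bar\delta}$, where $R > 0$ is a constant and $\bar\delta$ is defined in condition (1) (x). Then $P_{1,n,T}=o(1)$ as $n, T\to\infty$ from condition (3). We prove that $P_{2,n,T}=o(1)$ and $P_{3,T}=o(1)$ in Sections B.9.1 and B.9.2, respectively. Then the conclusion follows.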
B.9.1 Proof that P2,n,T = o(1)
Since $\frac{\delta}{2\eta_T}\to\infty$, we have:
\[
P_{2,n,T}\le P\left[\sup_{\beta\in\mathcal{B}}\sup_{1\le t\le T}\frac{1}{\lambda_t(\beta)}\frac{1}{n}\sum_{i=1}^{n}\left|b(Y_{i,t},f_t(\beta),\beta)-\nu_t(\beta)\right|\ge\delta^{*}\right],
\]
for any $\delta^{*} > 0$ and large $T$. The RHS probability converges to zero by the same argument as in the proof of Lemma B.4 in Section B.8 and using conditions (1) (i)-(iii), (vii), (ix)-(x).
B.9.2 Proof that P3,T = o(1)
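We have from condition (1) (x):
\[
P_{3,T}\le T\,P\left[\varsigma_t\ge\frac{\delta}{2\eta_T}\right]\le C_3T\exp\left(-c_4R^{-\bar\delta}\log T\right)=C_3T^{1-c_4R^{-\bar\delta}},
\]
for $c_4=C_4(\delta/2)^{\bar\delta}$. Then, for $R<(1/c_4)^{-1/\bar\delta}$ we get $P_{3,T}=o(1)$.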
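The restriction on $R$ can be traced through the exponent. As a sketch, assume the tail bound from condition (1) (x) takes the form $P[\varsigma_t\ge u]\le C_3\exp\left(-C_4u^{\bar\delta}\right)$ (this explicit form is inferred from the display above, not quoted from the condition). With $u=\delta/(2\eta_T)$ and $\eta_T=R(\log T)^{-1/\bar\delta}$:
\[
u^{\bar\delta}=\left(\frac{\delta}{2}\right)^{\bar\delta}R^{-\bar\delta}\log T,
\qquad
C_3T\exp\left(-C_4u^{\bar\delta}\right)=C_3T^{1-c_4R^{-\bar\delta}}.
\]
The bound tends to zero if and only if $c_4R^{-\bar\delta}>1$, that is $R^{\bar\delta}<c_4$, or equivalently $R<c_4^{1/\bar\delta}=(1/c_4)^{-1/\bar\delta}$.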