Supplement to “Nonparametric Stochastic Discount Factor
Decomposition”
Timothy M. Christensen
This supplementary material contains sufficient conditions for several assumptions in Sections
3 and 4 and proofs of all results in the main text.
C Some sufficient conditions
This appendix presents sufficient conditions for Assumptions 3.3, 3.4(b) and 4.3, and bounds for the terms η_{n,k} and η*_{n,k} in display (23) and ν_{n,k} in display (37). Proofs of results in this appendix are contained in the Online Appendix.
C.1 Sufficient conditions for Assumptions 3.3 and 3.4(b)
We assume that the state process X = {X_t : t ∈ T} is either beta-mixing or rho-mixing. The beta-mixing coefficient between two σ-algebras 𝒜 and ℬ is:

    β(𝒜, ℬ) = ½ sup Σ_{(i,j)∈I×J} |P(A_i ∩ B_j) − P(A_i)P(B_j)|

with the supremum taken over all 𝒜-measurable finite partitions {A_i}_{i∈I} and ℬ-measurable finite partitions {B_j}_{j∈J}. The beta-mixing coefficients of X are defined as:

    β_q = sup_t β(σ(…, X_{t−1}, X_t), σ(X_{t+q}, X_{t+q+1}, …)).

We say that X is exponentially beta-mixing if β_q ≤ Ce^{−cq} for some C, c > 0. The rho-mixing coefficients of X are defined as:

    ρ_q = sup_{ψ∈L²: E[ψ]=0, ‖ψ‖=1} E[(E[ψ(X_{t+q})|X_t])²]^{1/2}.

We say that X is exponentially rho-mixing if ρ_q ≤ e^{−cq} for some c > 0.

We use the sequence ξ_k = sup_x ‖G^{−1/2}b^k(x)‖ to bound convergence rates. When X has bounded rectangular support and Q has a density that is bounded away from 0 and ∞, ξ_k is known to be O(√k) for (tensor-product) spline, cosine, and certain wavelet bases and O(k) for (tensor-product) polynomial series (Newey, 1997; Chen and Christensen, 2015). It is also possible to derive alternative sufficient conditions in terms of higher moments of ‖G^{−1/2}b^k(X_t)‖ (instead of sup_x ‖G^{−1/2}b^k(x)‖) by extending arguments in Hansen (2015) to accommodate weakly-dependent data and asymmetric matrices.
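The √k rate is easy to verify numerically for well-behaved bases. The following Python sketch is our illustration, not part of the paper; the grid resolution and the choice of the cosine basis on [0, 1] (orthonormal under the uniform measure, so G = I) are assumptions made for the example.

import numpy as np

# Cosine basis on [0,1]: b_1(x) = 1, b_j(x) = sqrt(2) cos((j-1) pi x) for j >= 2.
# Under the uniform measure the basis is orthonormal, so G = I and
# xi_k = sup_x ||G^{-1/2} b^k(x)|| = sup_x ||b^k(x)||.

def xi_k(k, n_grid=20_000):
    x = np.linspace(0.0, 1.0, n_grid)
    B = np.vstack([np.ones_like(x)] +
                  [np.sqrt(2.0) * np.cos(j * np.pi * x) for j in range(1, k)])
    return np.linalg.norm(B, axis=0).max()

for k in [10, 40, 160, 640]:
    print(k, xi_k(k) / np.sqrt(k))   # ratio settles near sqrt(2): xi_k = O(sqrt(k))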
C.1.1 Sufficient conditions in Case 1
The first result below uses an exponential inequality for weakly-dependent random matrices
from Chen and Christensen (2015). The second extends arguments from Gobet et al. (2004).
Lemma C.1 Let the following hold:
(a) X is exponentially beta-mixing;
(b) E[m(X_t, X_{t+1})^r] < ∞ for some r > 2;
(c) ξ_k^{2+4/r}(log n)²/n = o(1).
Then: (1) Assumption 3.3 holds. (2) We may take η_{n,k} = η*_{n,k} = ξ_k^{1+2/r}(log n)/√n in display (23). (3) If, in addition, ξ_k^{4+8/r}(log n)⁴/n = o(1) then Assumption 3.4(b) holds.
Lemma C.2 Let the following hold:
(a) X is exponentially rho-mixing;
(b) E[m(X_t, X_{t+1})^r] < ∞ for some r > 2;
(c) ξ_k^{2+4/r}k/n = o(1).
Then: (1) Assumption 3.3 holds. (2) We may take η_{n,k} = η*_{n,k} = ξ_k^{1+2/r}/√n in display (23). (3) If, in addition, ξ_k^{4+8/r}k²/n = o(1) then Assumption 3.4(b) also holds.
C.1.2 Sufficient conditions in Case 2 with parametric first-stage

The following lemma presents one set of sufficient conditions for Assumptions 3.3 and 3.4(b) when α₀ ∈ A ⊆ R^{d_α} is a finite-dimensional parameter.
Lemma C.3 Let the conditions of Lemma C.1 hold for m(x₀, x₁) = m(x₀, x₁; α₀), and let:
(a) ‖α̂ − α₀‖ = O_p(n^{−1/2});
(b) m(x₀, x₁; α) be continuously differentiable in α on a neighborhood N of α₀ for all (x₀, x₁) ∈ X², and let there exist a function m̄ : X² → R with E[m̄(X_t, X_{t+1})²] < ∞ such that:

    sup_{α∈N} ‖∂m(x₀, x₁; α)/∂α‖ ≤ m̄(x₀, x₁) for all (x₀, x₁) ∈ X².

Then: (1) Assumption 3.3 holds. (2) We may take η_{n,k} = η*_{n,k} = ξ_k^{1+2/r}(log n)/√n in display (23). (3) If, in addition, ξ_k^{4+8/r}(log n)⁴/n = o(1) then Assumption 3.4(b) holds.

The conditions on k and the bounds for η_{n,k} and η*_{n,k} are the same as in Lemma C.1. Therefore, here first-stage estimation of α does not reduce the convergence rates of Ĝ and M̂ relative to Case 1.
C.1.3 Sufficient conditions in Case 2 with semi/nonparametric first-stage

We now present one set of sufficient conditions for Assumptions 3.3 and 3.4(b) when α₀ ∈ A ⊆ 𝔸 is an infinite-dimensional parameter, where the parameter space A is a subset of a Banach space 𝔸 equipped with a norm ‖·‖_𝔸. This includes the case in which α is a function, i.e. α = h with A = H a function space, and the case in which α consists of both finite-dimensional and function parts, i.e. α = (θ, h) with A = Θ × H where Θ ⊆ R^{dim(θ)}.

For each α ∈ A we define M(α) as the operator M(α)ψ(x) = E[m(X_t, X_{t+1}; α)ψ(X_{t+1})|X_t = x], with the understanding that M(α₀) = M. Let ℳ = {m(x₀, x₁; α) − m(x₀, x₁; α₀) : α ∈ A}. We say ℳ has an envelope function E if there exists some measurable E : X² → [1, ∞) such that |m(x₀, x₁)| ≤ E(x₀, x₁) for every (x₀, x₁) ∈ X² and m ∈ ℳ. Let ℳ* = {m/E : m ∈ ℳ}. The functions in ℳ* are clearly bounded by ±1. Let N_{[ ]}(u, ℳ*, ‖·‖_p) denote the entropy with bracketing of ℳ* with respect to the L^p norm ‖·‖_p. Finally, let ℓ*(α) = ‖M(α) − M‖ and observe that ℓ*(α₀) = 0.
Lemma C.4 Let the conditions of Lemma C.1 hold for m(x₀, x₁) = m(x₀, x₁; α₀), and let:
(a) ℳ have envelope function E with ‖E‖_{4s} < ∞ for some s > 1;
(b) log N_{[ ]}(u, ℳ*, ‖·‖_{4sv/(2s−v)}) ≤ C_{[ ]}u^{−2ζ} for some constants C_{[ ]} > 0, ζ ∈ (0, 1) and v ∈ (1, 2s);
(c) ℓ*(α) be pathwise differentiable at α₀ with |ℓ*(α) − ℓ*(α₀) − ℓ̇*_{α₀}[α − α₀]| = O(‖α − α₀‖²_𝔸), ‖α̂ − α₀‖_𝔸 = o_p(n^{−1/4}) and √n ℓ̇*_{α₀}[α̂ − α₀] = O_p(1);
(d) ξ_k^{(4s−2v)/(sv)}(k log k)/n = o(1), ξ_k^{ζ(2s−v)/(2sv)} = O(√(k log k)) and (log n) = O(ξ_k^{1/3}).
Then: (1) Assumption 3.3 holds. (2) We may take η_{n,k} = η*_{n,k} = ξ_k^{1+2/r}(log n)/√n + ξ_k^{(2s−v)/(sv)}√((k log k)/n) in display (23). (3) If, in addition, [ξ_k^{4+8/r}(log n)⁴ + ξ_k^{(8s−4v)/(sv)}(k log k)²]/n = o(1) then Assumption 3.4(b) holds.

Note that the condition ξ_k^{ζ(2s−v)/(2sv)} = O(√(k log k)) is trivially satisfied when ξ_k = O(√k).

C.2 Sufficient conditions for Assumption 4.3

The following is one set of sufficient conditions for Assumption 4.3 assuming beta-mixing. Recall that ξ_k = sup_x ‖G^{−1/2}b^k(x)‖.
Lemma C.5 Let the following hold:
(a) X is exponentially beta-mixing;
(b) E[(G1_{t+1})^{2s}] < ∞ for some s > 1;
(c) [ξ_k²(log n)² + ξ_k^{2+2/s}k]/n = o(1) and (log n)^{(2s−1)/(s−1)}k/n = o(1).
Then: (1) Assumption 4.3 holds. (2) We may take ν_{n,k} = ξ_k^{1+1/s}√(k/n) + ξ_k(log n)/√n in display (37).
D Proofs of results in the main text
Notation: For v ∈ R^k, define:

    ‖v‖²_G = v′G_k v,

or equivalently ‖v‖_G = ‖G_k^{1/2}v‖. For any matrix A ∈ R^{k×k} we define:

    ‖A‖_G = sup{‖Av‖_G : v ∈ R^k, ‖v‖_G = 1}.

We also define the inner product weighted by G_k, namely ⟨u, v⟩_G = u′G_k v. The inner product ⟨·,·⟩_G and its norm ‖·‖_G are germane for studying convergence of the matrix estimators, as (R^k, ⟨·,·⟩_G) is isometrically isomorphic to (B_k, ⟨·,·⟩). The notation a_n ≲ b_n for two positive sequences a_n and b_n means that there exists a finite positive constant C such that a_n ≤ Cb_n for all n sufficiently large; a_n ≍ b_n means a_n ≲ b_n and b_n ≲ a_n.
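As a quick illustration of these definitions (a sketch under the assumption that G_k is symmetric positive definite; the matrices below are synthetic, not from the paper), ‖v‖_G and ‖A‖_G can be computed from a Cholesky factor of G_k, since ‖A‖_G equals the ordinary spectral norm of L′A(L′)^{−1} when G_k = LL′:

import numpy as np

rng = np.random.default_rng(0)
k = 5
Z = rng.standard_normal((k, k))
G = Z @ Z.T + k * np.eye(k)      # synthetic symmetric positive definite Gram matrix
A = rng.standard_normal((k, k))

L = np.linalg.cholesky(G)        # G = L @ L.T, so ||v||_G = ||L.T @ v||

def norm_G(v):
    return np.linalg.norm(L.T @ v)

v = rng.standard_normal(k)
print(norm_G(v), np.sqrt(v @ G @ v))   # two equivalent evaluations of ||v||_G

# Substituting w = L.T @ v gives ||Av||_G / ||v||_G = ||L.T A (L.T)^{-1} w|| / ||w||,
# so the weighted operator norm reduces to a spectral norm:
norm_A_G = np.linalg.norm(L.T @ A @ np.linalg.inv(L.T), 2)
print(norm_A_G)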
D.1 Proofs of results in Sections 2, 3 and 4
Proof of Proposition 2.1. Theorem V.6.6 of Schaefer (1974) implies, in view of Assumption 2.1, that ρ := r(M) > 0 and that M has a unique positive eigenfunction φ ∈ L² corresponding to ρ. Applying the result to M* in place of M guarantees existence of φ* ∈ L². This proves part (a). Theorem V.6.6 of Schaefer (1974) also implies that ρ is isolated and the largest eigenvalue of M. Theorem V.5.2(iii) of Schaefer (1974), in turn, implies that ρ is simple, completing the proof of part (c). Theorem V.5.2(iv) of Schaefer (1974) implies that φ is the unique positive solution to (6). The same result applied to M* in place of M guarantees uniqueness of φ*, proving part (b). Part (d) follows from Proposition F.3.
Proof of Theorem 3.1. Immediate from Lemmas A.2 and A.4.
Proof of Corollary 3.1. We first verify Assumption 3.2. By Theorem 12.8 of Schumaker (2007) and conditions (ii)–(iv), for each ψ ∈ L² there exists h_k(Mψ) ∈ B_k such that:

    ‖Mψ − h_k(Mψ)‖ ≲ k^{−p̄/d}‖Mψ‖_{W^{p̄}} ≲ k^{−p̄/d}‖ψ‖.

Therefore,

    ‖Mψ − Π_kMψ‖ = ‖Mψ − h_k(Mψ) + Π_k(h_k(Mψ) − Mψ)‖ ≤ 2‖Mψ − h_k(Mψ)‖ ≲ k^{−p̄/d}‖ψ‖,

and so ‖M − Π_kM‖ = O(k^{−p̄/d}) = o(1) as required. Similar arguments yield δ_k = O(k^{−p/d}) and δ*_k = O(k^{−p/d}).

By Lemma C.2, conditions (iv)–(vii) are sufficient for Assumption 3.3 and we may take η_{n,k} = η*_{n,k} = k^{(r+2)/(2r)}/√n. Choosing k ≍ n^{rd/(2rp+(2+r)d)} balances the bias and variance terms and we obtain the convergence rates as stated.
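To see the balancing concretely, here is a small numerical sketch (our illustration; the values of n, p, d, r are hypothetical) that equates the bias order k^{−p/d} with the standard-deviation order k^{(r+2)/(2r)}/√n:

import numpy as np

def optimal_k(n, p, d, r):
    # Setting k^{-p/d} = k^{(r+2)/(2r)} / sqrt(n) and solving for k gives
    # k = n^{rd / (2rp + (2+r)d)}, the choice made in the proof of Corollary 3.1.
    return n ** (r * d / (2 * r * p + (2 + r) * d))

n, p, d, r = 10_000, 2.0, 1.0, 4.0
k = optimal_k(n, p, d, r)
bias = k ** (-p / d)
sd = k ** ((r + 2) / (2 * r)) / np.sqrt(n)
print(f"k = {k:.1f}, bias order = {bias:.4f}, sd order = {sd:.4f}")  # same order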
Proof of Remark 3.1. First observe that Mψ = Σ_{n=1}^∞ μ_n⟨ψ, ϕ_n⟩g_n. Taking the inner product of both sides of Mφ = ρφ with g_n, we obtain μ_n⟨φ, ϕ_n⟩ = ρ⟨φ, g_n⟩ for each n ∈ N. By Parseval's identity, ‖φ‖² = Σ_{n∈N}⟨φ, ϕ_n⟩² ≥ ρ²Σ_{n∈N: μ_n>0}μ_n^{−2}⟨φ, g_n⟩². Similarly, ‖φ*‖² ≥ ρ²Σ_{n∈N: μ_n>0}μ_n^{−2}⟨φ*, ϕ_n⟩². Note that ⟨φ, g_n⟩ = 0 and ⟨φ*, ϕ_n⟩ = 0 if μ_n = 0.

As B_k spans the linear subspace in L² generated by {g_n}_{n=1}^k, we have φ_k := Σ_{n=1}^k⟨φ, g_n⟩g_n ∈ B_k. Therefore, assuming μ_{k+1} > 0 (else the result is trivially true):

    ‖φ − φ_k‖² = Σ_{n≥k+1}⟨φ, g_n⟩² = μ²_{k+1}Σ_{n≥k+1}μ_{k+1}^{−2}⟨φ, g_n⟩² ≤ μ²_{k+1}Σ_{n≥k+1: μ_n>0}μ_n^{−2}⟨φ, g_n⟩² ≤ μ²_{k+1}(‖φ‖²/ρ²).

It follows that:

    δ_k = ‖φ − Π_kφ‖ = ‖φ − φ_k + Π_k(φ_k − φ)‖ ≤ 2‖φ − φ_k‖ = O(μ_{k+1}).

A similar argument gives δ*_k = O(μ_{k+1}).
Before proving Theorem 3.2 we first present a lemma that controls higher-order bias terms involving φ_k and φ*_k. Define:

    ψ_{ρ,k}(x₀, x₁) = φ*_k(x₀)m(x₀, x₁)φ_k(x₁) − ρ_kφ*_k(x₀)φ_k(x₀),

with φ_k and φ*_k normalized so that ‖φ_k‖ = 1 and ⟨φ_k, φ*_k⟩ = 1, and:

    ψ̄_{n,k} = (1/n)Σ_{t=0}^{n−1}(ψ_{ρ,k}(X_t, X_{t+1}) − ψ_ρ(X_t, X_{t+1})),

where ψ_ρ is from display (25).

To simplify notation, let φ_t = φ(X_t), φ*_t = φ*(X_t), φ_{k,t} = φ_k(X_t) and φ*_{k,t} = φ*_k(X_t).

Lemma D.1 Let Assumptions 3.1 and 3.2 hold. Then: ψ̄_{n,k} = O_p(δ_k + δ*_k).
Proof of Lemma D.1. First write:

    ψ̄_{n,k} = (1/n)Σ_{t=0}^{n−1}(φ*_{k,t} − φ*_t)m(X_t, X_{t+1})φ_{k,t+1} + (1/n)Σ_{t=0}^{n−1}φ*_t m(X_t, X_{t+1})(φ_{k,t+1} − φ_{t+1})
      − (ρ_k − ρ)(1/n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t} − ρ(1/n)Σ_{t=0}^{n−1}(φ*_{k,t}φ_{k,t} − φ*_tφ_t)
      =: T̂₁ + T̂₂ + T̂₃ + T̂₄.

By iterated expectations:

    E[|(φ*_{k,t} − φ*_t)m(X_t, X_{t+1})φ_{k,t+1}|] = ⟨|φ*_k − φ*|, M(|φ_k|)⟩ ≤ ‖φ*_k − φ*‖‖M‖‖φ_k‖ = O(δ*_k)

using Cauchy–Schwarz, boundedness of M (Assumption 3.1) and Lemma A.2 (note that the normalizations ⟨φ*_k, φ_k⟩ = 1 and ⟨φ, φ*⟩ = 1 instead of ‖φ*_k‖ = 1 and ‖φ*‖ = 1 do not affect the conclusions of Lemma A.2). Markov's inequality then implies T̂₁ = O_p(δ*_k). Similarly,

    E[|φ*_t m(X_t, X_{t+1})(φ_{k,t+1} − φ_{t+1})|] = ⟨φ*, M(|φ_k − φ|)⟩ ≤ ‖φ*‖‖M‖‖φ_k − φ‖ = O(δ_k)

and so T̂₂ = O_p(δ_k). Since ρ_k − ρ = O(δ_k) by Lemma A.2(a) and (1/n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t} = O_p(1) follows from Lemma A.2(b)(c), we obtain T̂₃ = O_p(δ_k). Finally,

    E[|φ*_{k,t}φ_{k,t} − φ*_tφ_t|] ≤ ‖φ*_k − φ*‖‖φ_k‖ + ‖φ*‖‖φ_k − φ‖ = O(δ_k + δ*_k)

again by Cauchy–Schwarz and Lemma A.2. Therefore, T̂₄ = O_p(δ_k + δ*_k).
Proof of Theorem 3.2. First note that:

    √n(ρ̂ − ρ) = √n(ρ̂ − ρ_k) + √n(ρ_k − ρ)
      = √n(ρ̂ − ρ_k) + o(1)
      = √n c*′_k(M̂ − ρ_kĜ)c_k + o_p(1)   (S.1)

where the second line is by Assumption 3.4(a) and the third line is by Lemma B.1 and Assumption 3.4(b) (under the normalizations ‖c_k‖_G = 1 and c*′_kGc_k = 1). By identity, we may write the first term on the right-hand side of display (S.1) as:

    √n c*′_k(M̂ − ρ_kĜ)c_k = (1/√n)Σ_{t=0}^{n−1}ψ_{ρ,k}(X_t, X_{t+1}) = (1/√n)Σ_{t=0}^{n−1}ψ_ρ(X_t, X_{t+1}) + √n × ψ̄_{n,k}
      = (1/√n)Σ_{t=0}^{n−1}ψ_ρ(X_t, X_{t+1}) + o_p(1)   (S.2)

where the second line is by Lemma D.1 and Assumption 3.4(a). The result follows by substituting (S.2) into (S.1) and applying a CLT for stationary and ergodic martingale differences (e.g. Billingsley (1961)), which is valid in view of Assumption 3.4(c).
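Since ψ_ρ is a martingale difference, Theorem 3.2 suggests a simple plug-in standard error for ρ̂. The following is a hedged sketch, not the paper's code: the callables m, phihat, phistarhat and the estimate rhohat are assumed to be available from a first-stage fit with the normalizations ‖φ̂‖ = 1 and ⟨φ̂*, φ̂⟩ = 1.

import numpy as np

def rho_standard_error(X, m, phihat, phistarhat, rhohat):
    """Plug-in s.e. for rhohat from the influence function
    psi_rho(x0, x1) = phi*(x0) m(x0, x1) phi(x1) - rho phi*(x0) phi(x0).
    No HAC correction is needed: psi_rho is a martingale difference."""
    x0, x1 = X[:-1], X[1:]
    psi = (phistarhat(x0) * m(x0, x1) * phihat(x1)
           - rhohat * phistarhat(x0) * phihat(x0))
    return np.sqrt(np.mean(psi ** 2) / len(psi))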
Proof of Theorem 3.3. This is a consequence of Theorem B.1 in Appendix B.
Proof of Theorem 3.4. Let m_t(α) = m(X_t, X_{t+1}; α). By Assumption 3.4(a), Lemma B.1 and Assumption 3.4(b):

    √n(ρ̂ − ρ) = (1/√n)Σ_{t=0}^{n−1}(φ*_{k,t}φ_{k,t+1}m(X_t, X_{t+1}; α̂) − ρ_kφ*_{k,t}φ_{k,t}) + o_p(1)
      = (1/√n)Σ_{t=0}^{n−1}ψ_ρ(X_t, X_{t+1}) + (1/√n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀)) + o_p(1)   (S.3)

where the second equality is by Lemma D.1.
We decompose the second term on the right-hand side of (S.3) as:

    (1/√n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀))
      = (1/√n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}(∂m_t(α₀)/∂α′)(α̂ − α₀)
      + (1/√n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}(m_t(α̂) − m_t(α₀) − (∂m_t(α₀)/∂α′)(α̂ − α₀))
      + (1/√n)Σ_{t=0}^{n−1}(φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1})(m_t(α̂) − m_t(α₀))
      =: (1/√n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}(∂m_t(α₀)/∂α′)(α̂ − α₀) + T̂₁ + T̂₂.   (S.4)
For term T̂₁, whenever α̂ ∈ N (which it is wpa1) we may take a mean value expansion to obtain:

    T̂₁ = (1/n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}(∂m_t(α̃)/∂α′ − ∂m_t(α₀)/∂α′) × √n(α̂ − α₀)

where α̃ is in the segment between α̂ and α₀. It follows by routine arguments (e.g. Lemma 4.3 of Newey and McFadden (1994), replacing the law of large numbers by the ergodic theorem) that:

    (1/n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}(∂m_t(α̃)/∂α′ − ∂m_t(α₀)/∂α′) = o_p(1)   (S.5)

holds under Assumption 3.5(c)(d). Moreover, √n(α̂ − α₀) = O_p(1) by Assumption 3.5(a)(b). Therefore, T̂₁ = o_p(1).
For term T̂₂, observe that by Assumption 3.5(c), whenever α̂ ∈ N (which it is wpa1) we have:

    |m_t(α̂) − m_t(α₀)| ≤ m̄(X_t, X_{t+1}) × ‖α̂ − α₀‖

where max_{0≤t≤n−1}|m̄(X_t, X_{t+1})| = o_p(n^{1/s}) because E[m̄(X_t, X_{t+1})^s] < ∞. Therefore, wpa1 we have:

    T̂₂ ≤ √n × (1/n)Σ_{t=0}^{n−1}|φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1}| × max_{0≤t≤n−1}|m̄(X_t, X_{t+1})| × ‖α̂ − α₀‖
      = (1/n)Σ_{t=0}^{n−1}|φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1}| × o_p(n^{1/s})
      = O_p(δ_k + δ*_k) × o_p(n^{1/s})

by similar arguments to the proof of Lemma D.1. Finally, observe that n^{1/s}(δ_k + δ*_k) = o(1) by Assumption 3.4(a) and the condition s ≥ 2. Therefore, T̂₂ = o_p(1).
Since T̂₁ and T̂₂ in display (S.4) are both o_p(1), we have:

    (1/√n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀)) = ((1/n)Σ_{t=0}^{n−1}φ*_tφ_{t+1}∂m_t(α₀)/∂α′)√n(α̂ − α₀) + o_p(1)
      = E[φ*(X_t)φ(X_{t+1})∂m(X_t, X_{t+1}; α₀)/∂α′]√n(α̂ − α₀) + o_p(1).

Substituting into (S.3) and using Assumption 3.5(a):

    √n(ρ̂ − ρ) = (1/√n)Σ_{t=0}^{n−1}h′_{[2a]}(ψ_{ρ,t}, ψ_{α,t})′ + o_p(1)

and the result follows by Assumption 3.5(b).
Proof of Theorem 3.5. We follow similar arguments to the proof of Theorem 3.4. Here, we can decompose the second term on the right-hand side of display (S.3) as:

    (1/√n)Σ_{t=0}^{n−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀)) = √n(ℓ(α̂) − ℓ(α₀)) + T̂₁ + T̂₂
      = (1/√n)Σ_{t=0}^{n−1}ψ_{ℓ,t} + o_p(1) + T̂₁ + T̂₂

where the second line is by Assumption 3.6(b)(c), with:

    T̂₁ = (1/√n)Σ_{t=0}^{n−1}(φ*_tφ_{t+1}(m_t(α̂) − m_t(α₀))) − √n(ℓ(α̂) − ℓ(α₀))
    T̂₂ = (1/√n)Σ_{t=0}^{n−1}(φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1})(m_t(α̂) − m_t(α₀)).

The result will follow by Assumption 3.6(c)(d) provided T̂₁ and T̂₂ are both o_p(1).
For term T̂₁, notice that T̂₁ = Z_n(g_α̂) where Z_n denotes the centered empirical process on G. We have K(g_α̂, g_α̂) = o_p(1) by Assumption 3.6(c). Appropriately modifying the arguments of Lemma 19.24 in van der Vaart (1998) (i.e. replacing the L² norm by the norm induced by K, which is the appropriate semimetric for the weakly dependent case) gives Z_n(g_α̂) →_p 0.

For term T̂₂, observe that:

    E[|(φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1})(m_t(α̂) − m_t(α₀))|] ≲ E[|φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1}|^{s/(s−1)}]^{(s−1)/s}

by Assumption 3.6(e) and Hölder's inequality. We complete the proof assuming ‖φ_k‖_{2s/(s−2)} = O(1) and ‖φ*‖_{2s/(s−2)} < ∞; the proof under the alternative condition in Assumption 3.6(e) is analogous. By the Minkowski and Hölder inequalities and Assumption 3.6(e) we have:

    E[|φ*_{k,t}φ_{k,t+1} − φ*_tφ_{t+1}|^{s/(s−1)}]^{(s−1)/s}
      ≤ E[|(φ*_{k,t} − φ*_t)φ_{k,t+1}|^{s/(s−1)}]^{(s−1)/s} + E[|φ*_t(φ_{k,t+1} − φ_{t+1})|^{s/(s−1)}]^{(s−1)/s}
      ≤ ‖φ*_k − φ*‖‖φ_k‖_{2s/(s−2)} + ‖φ_k − φ‖‖φ*‖_{2s/(s−2)}
      = O(1) × O(δ_k + δ*_k).

It follows by Assumption 3.4(a) and Markov's inequality that T̂₂ = o_p(1).
The following lemma is based on Lemma 6.10 in Akian, Gaubert, and Nussbaum (2016).

Lemma D.2 Let the conditions of Proposition 4.1 hold. Then: there exist finite positive constants C, c and a neighborhood N of h such that:

    ‖T^nψ − h‖ ≤ Ce^{−cn}

for all ψ ∈ N.
Proof of Lemma D.2. Fix some constant ā such that r(D_h) < ā < 1. By the Gelfand formula, there exists m ∈ N such that ‖D_h^m‖ < ā^m. Fréchet differentiability of T at h together with the chain rule for Fréchet derivatives implies that:

    ‖T^mψ − T^mh − D_h^m(ψ − h)‖ = o(‖ψ − h‖) as ‖ψ − h‖ → 0,

hence:

    ‖T^mψ − h‖ ≤ ‖D_h^m‖‖ψ − h‖ + o(‖ψ − h‖) < (ā^m + o(1)) × ‖ψ − h‖.

We may therefore choose ε > 0 and a ∈ (ā, 1) such that ‖T^mψ − h‖ ≤ a^m‖ψ − h‖ for all ψ ∈ B_ε(h) := {ψ ∈ L² : ‖ψ − h‖ < ε}. (B_ε(h) is the neighborhood in the statement of the lemma.) Then for any ψ ∈ B_ε(h) and any k ∈ N we have:

    ‖T^{km}ψ − h‖ ≤ a^{km}‖ψ − h‖.   (S.6)

It is straightforward to show via induction that boundedness of G and homogeneity of degree σ of T together imply:

    ‖T^nψ₁ − T^nψ₂‖ ≤ (1 + ‖G‖)^{1/(1−σ)}‖ψ₁ − ψ₂‖^{σ^n}   (S.7)

for any ψ₁, ψ₂ ∈ L².

Take any n ≥ m and let k = ⌊n/m⌋. By (S.6) and (S.7) we have:

    ‖T^nψ − h‖ = ‖T^{(n−km)}T^{km}ψ − T^{(n−km)}h‖ ≤ (1 + ‖G‖)^{1/(1−σ)}‖T^{km}ψ − h‖^{σ^{(n−km)}} ≤ (1 + ‖G‖)^{1/(1−σ)}(a^{km})^{σ^{(n−km)}}ε^{σ^{(n−km)}}

for any ψ ∈ B_ε(h). The result follows for suitable choice of C and c.
Proof of Proposition 4.1. Take C and c from Lemma D.2 and B_ε(h) from the proof of Lemma D.2. Let N = {ψ ∈ L² : ‖ψ − h/‖h‖‖ < ε/‖h‖} and note that {‖h‖ψ : ψ ∈ N} = B_ε(h). Take any ψ ∈ {af : f ∈ N, a ∈ R \ {0}}. For any such ψ we can write ψ = (a/‖h‖)f* where f* = ‖h‖f ∈ B_ε(h). By homogeneity of T:

    φ_{n+1}(ψ) = T(φ_n(ψ))/‖T(φ_n(ψ))‖ = T(φ_n(f*))/‖T(φ_n(f*))‖ = φ_{n+1}(f*)

for each n ≥ 1 (note positivity of G ensures that ‖T^nf*‖ > 0 for each n and each f* ∈ B_ε(h)). It follows from Lemma D.2 that:

    ‖φ_{n+1}(ψ) − φ‖ = ‖φ_{n+1}(f*) − φ‖ = ‖T^n(f*)/‖T^n(f*)‖ − h/‖h‖‖ ≤ (2/‖h‖)‖T^n(f*) − h‖ ≤ (2C/‖h‖)e^{−cn}

as required.
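Proposition 4.1 is easy to visualize numerically. The sketch below is our illustration, not the paper's operator: T(v) = A(v^σ) (elementwise power) is a hypothetical positive, degree-σ homogeneous map on R^k standing in for T, and the normalized iterates converge geometrically to φ from an arbitrary positive starting point.

import numpy as np

rng = np.random.default_rng(1)
k, sigma = 20, 0.7
A = rng.uniform(0.5, 1.5, size=(k, k))   # positive kernel: T preserves the positive cone

def T(v):
    return A @ (v ** sigma)              # homogeneous of degree sigma in v

h = np.ones(k)                           # locate the fixed point h = T(h) by iteration
for _ in range(200):
    h = T(h)
phi = h / np.linalg.norm(h)              # phi = h / ||h||, as in Proposition 4.1

psi = np.abs(rng.standard_normal(k))     # arbitrary positive starting point
for n in range(1, 16):
    psi = T(psi)
    psi /= np.linalg.norm(psi)           # phi_{n+1} = T(phi_n) / ||T(phi_n)||
    print(n, np.linalg.norm(psi - phi))  # error decays like e^{-cn}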
Proof of Corollary 4.1. The result for φ is stated in the text. For h, let C, c, and B_ε(h) be as in Lemma D.2 and its proof. Suppose h₀ is a fixed point of T belonging to B_ε(h). Then by Lemma D.2:

    ‖h₀ − h‖ = ‖T^nh₀ − h‖ ≤ Ce^{−cn} → 0,

hence h₀ = h.
Proof of Theorem 4.1. Immediate from Lemmas A.6 and A.8.
D.2 Proofs for Appendix A.1
Proof of Lemma A.1. We first prove that there exists K ∈ N such that the maximum eigenvalue ρ_k of the operator Π_kM : L² → L² is real and simple whenever k ≥ K.
Under Assumption 3.1, ρ is a simple isolated eigenvalue of M. Therefore, there exists an ε > 0 such that |λ − ρ| > 2ε for all λ ∈ σ(M) \ {ρ}. Let Γ denote a positively oriented circle in C centered at ρ with radius ε. Let R(M, z) = (M − zI)^{−1} denote the resolvent of M evaluated at z ∈ C \ σ(M), where I is the identity operator. Note that:

    C_R := sup_{z∈Γ}‖R(M, z)‖ < ∞   (S.8)

because R(M, z) is a holomorphic function on Γ and Γ is compact.

By Assumption 3.2, there exists K ∈ N such that:

    C_R × ‖Π_kM − M‖ < 1   (S.9)

holds for all k ≥ K. It follows by Theorem IV.3.18 on p. 214 of Kato (1980) that whenever k ≥ K: (i) the operator Π_kM has precisely one eigenvalue ρ_k inside Γ and ρ_k is simple; (ii) Γ ⊂ (C \ σ(Π_kM)); and (iii) σ(Π_kM) \ {ρ_k} lies on the exterior of Γ. Note that ρ_k must be real whenever k ≥ K because complex eigenvalues come in conjugate pairs. Thus, if ρ_k were complex-valued then its conjugate would also be inside Γ, which would contradict the fact that ρ_k is the unique eigenvalue of Π_kM on the interior of Γ.

Any nonzero eigenvalue of Π_kM is also an eigenvalue of (M, G) with the same multiplicity. Therefore, the largest eigenvalue ρ_k of (M, G) is positive and simple whenever k ≥ K.

Let Π_kM|_{B_k} : B_k → B_k denote the restriction of Π_kM to B_k. Recall that φ*_k(x) = b^k(x)′c*_k where c*_k solves the left-eigenvector problem in (15). Here, φ*_k is the eigenfunction of the adjoint (Π_kM|_{B_k})* : B_k → B_k corresponding to ρ_k. That is, ⟨(Π_kM|_{B_k})*φ*_k, ψ⟩ = ρ_k⟨φ*_k, ψ⟩ for all ψ ∈ B_k.

Another adjoint is also relevant for the next proof, namely (Π_kM)* : L² → L², which is the adjoint of Π_kM in the space L². It follows from Lemma A.1 that (Π_kM)* has an eigenfunction, say φ⁺_k, corresponding to ρ_k whenever k ≥ K. That is, ⟨(Π_kM)*φ⁺_k, ψ⟩ = ρ_k⟨φ⁺_k, ψ⟩ for all ψ ∈ L². Notice that φ⁺_k does not necessarily belong to B_k, so we may have φ⁺_k ≠ φ*_k.
Proof of Lemma A.2. Step 1: Proof of part (b). By Proposition 4.2 of Gobet et al. (2004) (taking T = M, T_ε = Π_kM and Γ = the boundary of B(·, ρ) in their notation), the inequality:

    ‖φ − φ_k‖ ≤ const × ‖(Π_kM − M)φ‖

holds for all k sufficiently large, where the constant depends only on C_R. The result follows by noticing that:

    ‖(Π_kM − M)φ‖ = ρ × ‖Π_kφ − φ‖ = O(δ_k).   (S.10)

Step 2: Proof of part (a). By Corollary 4.3 of Gobet et al. (2004), the inequality:

    |ρ − ρ_k| ≤ const × ‖(Π_kM − M)φ‖

holds for all k sufficiently large, where the constant depends only on C_R and ‖M‖. The result follows by (S.10).
Step 3: Proof that ‖φ⁺_k − φ*‖ = O(δ*_k) under the normalizations ‖φ*‖ = 1 and ‖φ⁺_k‖ = 1. Let P*_k denote the spectral projection onto the eigenspace of (Π_kM)* corresponding to ρ_k. By the proof of Proposition 4.2 of Gobet et al. (2004) (taking T = M*, T_ε = (Π_kM)* and Γ = the boundary of B(·, ρ) in their notation, and noting that ‖R(M*, z)‖ = ‖R(M, z̄)‖ holds for all z ∈ Γ), the inequality:

    ‖φ* − P*_kφ*‖ ≤ const × ‖((Π_kM)* − M*)φ*‖

holds for all k sufficiently large, where the constant depends only on C_R. Moreover,

    ‖((Π_kM)* − M*)φ*‖ = ‖(M*Π_k − M*)φ*‖ = ‖M*(Π_kφ* − φ*)‖ ≤ ‖M‖‖Π_kφ* − φ*‖ = O(δ*_k)

by definition of δ*_k (cf. display (22)) and boundedness of M. Therefore,

    ‖φ* − P*_kφ*‖ = O(δ*_k).   (S.11)

Define (φ⁺_k ⊗ φ_k)ψ(x) = ⟨φ_k, ψ⟩ × φ⁺_k(x) for any ψ ∈ L². We use the fact that:

    P*_k = ⟨φ_k, φ⁺_k⟩^{−1}(φ⁺_k ⊗ φ_k)

under the normalizations ‖φ_k‖ = 1 and ‖φ⁺_k‖ = 1 (Chatelin, 1983, p. 113). Then, under the sign normalization ⟨φ*, φ⁺_k⟩ ≥ 0, we have:

    ‖φ* − (φ⁺_k ⊗ φ⁺_k)φ*‖² ≤ ‖φ* − ⟨φ_k, φ⁺_k⟩^{−1}(φ⁺_k ⊗ φ_k)φ*‖² = ‖φ* − P*_kφ*‖²

(see the proof of Proposition 4.2 of Gobet et al. (2004)), since (φ⁺_k ⊗ φ⁺_k)φ* is the orthogonal projection of φ* onto the span of φ⁺_k while P*_kφ* also lies in that span. Because ‖φ*‖ = ‖φ⁺_k‖ = 1 and ⟨φ*, φ⁺_k⟩ ≥ 0, we also have ‖φ⁺_k − φ*‖² = 2(1 − ⟨φ*, φ⁺_k⟩) ≤ 2(1 − ⟨φ*, φ⁺_k⟩²) = 2‖φ* − (φ⁺_k ⊗ φ⁺_k)φ*‖². It follows by (S.11) that ‖φ⁺_k − φ*‖ = O(δ*_k).
Step 4: Proof that ‖φ*_k − φ*‖ = O(δ*_k). To relate φ⁺_k to φ*_k, observe that by definition of (Π_kM)* and (Π_kM|_{B_k})* we must have:

    E[φ⁺_k(X)Π_kMψ(X)] = ρ_kE[φ⁺_k(X)ψ(X)] for all ψ ∈ L²
    E[φ*_k(X)Π_kMψ_k(X)] = ρ_kE[φ*_k(X)ψ_k(X)] for all ψ_k ∈ B_k.

It follows from taking ψ = ψ_k in the first line of the above display that Π_kφ⁺_k = φ*_k. Now by the triangle inequality and the fact that Π_k is a weak contraction, we have:

    ‖φ* − φ*_k‖ = ‖φ* − Π_kφ⁺_k‖ ≤ ‖φ* − Π_kφ*‖ + ‖Π_kφ* − Π_kφ⁺_k‖ ≤ ‖φ* − Π_kφ*‖ + ‖φ* − φ⁺_k‖ = O(δ*_k) + O(δ*_k)

where the final equality is by definition of δ*_k (see display (22)) and Step 3.
The following lemma collects some useful bounds on the orthogonalized estimators.

Lemma D.3 (a) If Ĝ is invertible then:

    (Ĝᵒ)^{−1}M̂ᵒ − Mᵒ = M̂ᵒ − ĜᵒMᵒ + (Ĝᵒ)^{−1}((Ĝᵒ − I)²Mᵒ + (I − Ĝᵒ)(M̂ᵒ − Mᵒ)).

(b) In particular, if ‖Ĝᵒ − I‖ ≤ ½ we obtain:

    ‖(Ĝᵒ)^{−1}M̂ᵒ − Mᵒ‖ ≤ ‖M̂ᵒ − ĜᵒMᵒ‖ + 2‖Ĝᵒ − I‖ × (‖Mᵒ‖ + ‖M̂ᵒ − Mᵒ‖).

Proof of Lemma D.3. If Ĝ is invertible we have:

    (Ĝᵒ)^{−1}M̂ᵒ − Mᵒ = (I − (Ĝᵒ)^{−1}(Ĝᵒ − I))M̂ᵒ − Mᵒ
      = M̂ᵒ − Mᵒ − (Ĝᵒ)^{−1}(Ĝᵒ − I)Mᵒ − (Ĝᵒ)^{−1}(Ĝᵒ − I)(M̂ᵒ − Mᵒ).

Substituting (Ĝᵒ)^{−1} = I − (Ĝᵒ)^{−1}(Ĝᵒ − I) into the preceding display yields:

    (Ĝᵒ)^{−1}M̂ᵒ − Mᵒ = M̂ᵒ − Mᵒ − (I − (Ĝᵒ)^{−1}(Ĝᵒ − I))(Ĝᵒ − I)Mᵒ − (Ĝᵒ)^{−1}(Ĝᵒ − I)(M̂ᵒ − Mᵒ)
      = M̂ᵒ − ĜᵒMᵒ + (Ĝᵒ)^{−1}(Ĝᵒ − I)²Mᵒ − (Ĝᵒ)^{−1}(Ĝᵒ − I)(M̂ᵒ − Mᵒ)

as required. Part (b) follows by the triangle inequality, noting that ‖(Ĝᵒ)^{−1}‖ ≤ 2 whenever ‖Ĝᵒ − I‖ ≤ ½.
Proof of Lemma A.3. Step 1: We show that:

    ‖R(Π_kM|_{B_k}, z)‖ ≤ ‖R(Π_kM, z)‖

holds for all z ∈ C \ (σ(Π_kM) ∪ σ(Π_kM|_{B_k})). Fix any such z. For any ψ_k ∈ B_k we have R(Π_kM|_{B_k}, z)ψ_k = ζ_k where ζ_k = ζ_k(ψ_k) ∈ B_k is given by ψ_k = (Π_kM − zI)ζ_k. For any ψ ∈ L² we have R(Π_kM, z)ψ = ζ where ζ = ζ(ψ) ∈ L² is given by ψ = (Π_kM − zI)ζ. In particular, taking ψ = ψ_k ∈ B_k we must have ζ_k(ψ_k) = ζ(ψ_k). Therefore, R(Π_kM|_{B_k}, z)ψ_k = R(Π_kM, z)ψ_k holds for all ψ_k ∈ B_k. We now have:

    ‖R(Π_kM|_{B_k}, z)‖ = sup{‖R(Π_kM|_{B_k}, z)ψ_k‖ : ψ_k ∈ B_k, ‖ψ_k‖ = 1}
      = sup{‖R(Π_kM, z)ψ_k‖ : ψ_k ∈ B_k, ‖ψ_k‖ = 1}
      ≤ sup{‖R(Π_kM, z)ψ‖ : ψ ∈ L², ‖ψ‖ = 1} = ‖R(Π_kM, z)‖.
Step 2: We show that (M̂, Ĝ) has a unique eigenvalue ρ̂ inside Γ wpa1, where Γ is from the proof of Lemma A.1.

As the nonzero eigenvalues of Π_kM, Π_kM|_{B_k}, and G^{−1}M are the same, it follows from the proof of Lemma A.1 that for all k ≥ K the curve Γ encloses precisely one eigenvalue of G^{−1}M, namely ρ_k, and that ρ_k is a simple eigenvalue of G^{−1}M.

Recall that G^{−1}M is isomorphic to Π_kM|_{B_k} on (R^k, ⟨·,·⟩_G). Let R(G^{−1}M, z) denote the resolvent of G^{−1}M on (R^k, ⟨·,·⟩_G). By Step 1, we then have:

    sup_{z∈Γ}‖R(G^{−1}M, z)‖_G = sup_{z∈Γ}‖R(Π_kM|_{B_k}, z)‖ ≤ sup_{z∈Γ}‖R(Π_kM, z)‖.   (S.12)

The second resolvent identity gives R(Π_kM, z) = R(M, z) + R(Π_kM, z)(M − Π_kM)R(M, z). It follows that whenever (S.9) holds (which it does for all k ≥ K):

    sup_{z∈Γ}‖R(Π_kM, z)‖ ≤ C_R/(1 − C_R‖Π_kM − M‖) = C_R(1 + o(1))   (S.13)

by Assumption 3.2. Combining (S.12) and (S.13), we obtain:

    sup_{z∈Γ}‖R(G^{−1}M, z)‖_G = O(1).   (S.14)

By Lemma D.3(b), Assumption 3.3, and boundedness of M:

    ‖Ĝ^{−1}M̂ − G^{−1}M‖_G = ‖(Ĝᵒ)^{−1}M̂ᵒ − Mᵒ‖ = o_p(1).

It follows by (S.14) that the inequality:

    ‖Ĝ^{−1}M̂ − G^{−1}M‖_G × sup_{z∈Γ}‖R(G^{−1}M, z)‖_G < 1   (S.15)

holds wpa1.

By Theorem IV.3.18 on p. 214 of Kato (1980), whenever (S.15) holds: Ĝ^{−1}M̂ has precisely one eigenvalue, say ρ̂, inside Γ; ρ̂ is simple; and the remaining eigenvalues of Ĝ^{−1}M̂ are on the exterior of Γ. Note that ρ̂ must necessarily be real whenever (S.15) holds (because complex eigenvalues come in conjugate pairs), hence the corresponding left- and right-eigenvectors ĉ* and ĉ are also real and unique (up to scale).
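In practice the triple (ρ̂, ĉ, ĉ*) characterized above is obtained from the generalized eigenvalue problem M̂c = ρĜc. The following is a minimal sketch (assuming M̂ and Ĝ are the k × k sample matrices of the main text; the function and variable names are ours, not the paper's):

import numpy as np
from scipy.linalg import eig

def sdf_eigendecomposition(Mhat, Ghat):
    # Largest real eigenvalue of Mhat c = rho Ghat c, with right (chat) and
    # left (chatstar) eigenvectors; wpa1 this eigenvalue is real and simple.
    w, vl, vr = eig(Mhat, Ghat, left=True, right=True)
    i = np.argmax(w.real)
    rhohat = w[i].real
    chat, chatstar = vr[:, i].real, vl[:, i].real
    chat /= np.sqrt(chat @ Ghat @ chat)    # normalization ||c||_G = 1
    chatstar /= chatstar @ Ghat @ chat     # normalization c*' G c = 1
    return rhohat, chat, chatstar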
Proof of Lemma A.4. Take k ≥ K from Lemma A.1 and work on the sequence of events upon which

    ‖Ĝ^{−1}M̂ − G^{−1}M‖_G × sup_{z∈Γ}‖R(G^{−1}M, z)‖_G < ½   (S.16)

holds. By the proof of Lemma A.3, this inequality holds wpa1 and the solutions ρ̂, ĉ and ĉ* to (16) are unique on this sequence of events.

Step 1: Proof of part (b). Under the normalizations ‖ĉ‖_G = 1 and ‖c_k‖_G = 1, whenever (S.16) holds (which it does wpa1), we have:

    ‖φ̂ − φ_k‖ = ‖ĉ − c_k‖_G ≤ √8 sup_{z∈Γ}‖R(G^{−1}M, z)‖_G × ‖(Ĝ^{−1}M̂ − G^{−1}M)c_k‖_G

by Proposition 4.2 of Gobet et al. (2004) (setting Ĝ^{−1}M̂ = T_ε, G^{−1}M = T and Γ = the boundary of B(·, ρ) in their notation). The result now follows by (S.14) and the fact that:

    ‖(Ĝ^{−1}M̂ − G^{−1}M)c_k‖_G = ‖((Ĝᵒ)^{−1}M̂ᵒ − Mᵒ)c̃_k‖ = O_p(η_{n,k})   (S.17)

(cf. display (23)).

Step 2: Proof of part (a). In view of (S.16), (S.14) and the fact that ‖G^{−1}M‖_G = ‖Π_kM|_{B_k}‖ ≤ ‖M‖ < ∞, by Corollary 4.3 of Gobet et al. (2004), we have:

    |ρ̂ − ρ_k| ≤ O(1) × ‖(Ĝ^{−1}M̂ − G^{−1}M)c_k‖_G.
The result follows by (S.17).
Step 3: Proof of part (c). Identical arguments to the proof of part (b) yield:

    ‖φ̂* − φ*_k‖ = ‖ĉ* − c*_k‖_G ≤ √8 sup_{z∈Γ}‖R(G^{−1}M′, z)‖_G × ‖(Ĝ^{−1}M̂′ − G^{−1}M′)c*_k‖_G

under the normalization ‖ĉ*‖_G = ‖c*_k‖_G = 1. The result now follows by (S.14), noting that sup_{z∈Γ}‖R(G^{−1}M′, z)‖_G = sup_{z∈Γ}‖R(G^{−1}M, z)‖_G, and the fact that:

    ‖(Ĝ^{−1}M̂′ − G^{−1}M′)c*_k‖_G = ‖((Ĝᵒ)^{−1}M̂ᵒ′ − Mᵒ′)c̃*_k‖ = O_p(η*_{n,k})

(cf. display (23)).

D.3 Proofs for Appendix A.2
Some of the proofs in this subsection make use of properties of fixed point indices. We refer
the reader to Section 19.5 of Krasnosel’skii et al. (1972) for details.
Proof of Lemma A.5. By Assumption 4.1 and Corollary 4.1, we may choose ε > 0 such that N = {ψ ∈ L² : ‖ψ − h‖ ≤ ε} contains only one fixed point of T, namely h. We verify the conditions of Theorem 19.4 in Krasnosel'skii et al. (1972) where, in our notation, Ω = N, E_n = B_k, P_n = Π_k, T = T, and T_n = Π_kT|_{B_k} (i.e. the restriction of Π_kT to B_k). The compactness condition is satisfied by Assumption 4.1(b) (recall that compactness of G implies compactness of T). The fixed point h has nonzero index by Assumption 4.1(c); see result (5) on p. 300 of Krasnosel'skii et al. (1972). Finally, condition (19.28) in Krasnosel'skii et al. (1972) holds by Assumption 4.2(b) and their condition (19.29) is trivially satisfied.
Proof of Remark A.1. This follows by the proof of result (19.31) in Theorem 19.3 in
Krasnosel’skii et al. (1972).
Proof of Remark A.2. This follows by Theorem 19.7 in Krasnosel’skii et al. (1972).
Proof of Lemma A.6. Part (c) follows by the proof of display (19.50) on p. 310 in Krasnosel'skii et al. (1972) where, in our notation, x₀ = h, x_n = h_k, P_n = Π_k, P^{(n)} = I − Π_k, T = T, and T′(x₀) = D_h. Note that Assumption 4.2(a) implies their condition ‖T′(x₀) − P_nT′(x₀)‖ → 0 as n → ∞. Part (b) then follows from the inequality:

    ‖h/‖h‖ − h_k/‖h_k‖‖ ≤ (2/‖h‖)‖h − h_k‖.

Finally, part (a) follows from the fact that |‖h‖ − ‖h_k‖| = O(τ_k) and continuous differentiability of x ↦ x^{−1} at each x > 0.
The next lemma presents some bounds on the estimators which are used in the proofs of Lemmas A.7 and A.8.

Lemma D.4 (a) Let Assumptions 4.1(b) and 4.3 hold. Then, for any fixed c > 0:

    sup_{v∈R^k: ‖v‖_G ≤ c} ‖Ĝ^{−1}T̂v − G^{−1}Tv‖_G = o_p(1).

(b) Moreover:

    sup_{v∈R^k: ‖v′b^k − h‖ ≤ ε} ‖Ĝ^{−1}T̂v − G^{−1}Tv‖_G = O_p(ν_{n,k})

where ν_{n,k} is from display (37).

Proof of Lemma D.4. By definition of Ĝᵒ, T̂ᵒ, and Tᵒ, we have:

    sup_{v∈R^k: ‖v‖_G ≤ c} ‖Ĝ^{−1}T̂v − G^{−1}Tv‖_G = sup_{v∈R^k: ‖v‖ ≤ c} ‖(Ĝᵒ)^{−1}T̂ᵒv − Tᵒv‖.

Whenever ‖I − Ĝᵒ‖ < 1 (which it is wpa1 by Assumption 4.3), for any v ∈ R^k we have:

    (Ĝᵒ)^{−1}T̂ᵒv − Tᵒv = T̂ᵒv − Tᵒv − (Ĝᵒ)^{−1}(Ĝᵒ − I)Tᵒv − (Ĝᵒ)^{−1}(Ĝᵒ − I)(T̂ᵒv − Tᵒv).   (S.18)

Part (a) follows by the triangle inequality and Assumption 4.3, noting that sup_{v∈R^k: ‖v‖ ≤ c}‖Tᵒv‖ ≤ sup_{ψ: ‖ψ‖ ≤ c}‖Tψ‖ < ∞ holds for each c by Assumption 4.1(b).

Part (b) follows similarly by definition of Ĝᵒ, T̂ᵒ, Tᵒ, and ν_{n,k} in display (37).
Proof of Lemma A.7. Let ε, K and N_k be as in Lemma A.5. Also define the sets N = {ψ ∈ L² : ‖ψ − h‖ < ε}, S = {ψ ∈ L² : ‖ψ − h‖ = ε}, S_k = {ψ ∈ B_k : ‖ψ − h‖ = ε}, 𝒩_k = {v ∈ R^k : v′b^k(·) ∈ N_k}, and 𝒮_k = {v ∈ R^k : v′b^k(·) ∈ S_k}.

Let γ(I − T; S) denote the rotation of the field (I − T) on S. Assumption 4.1 implies that |γ(I − T; S)| = 1; see result (5) on p. 300 of Krasnosel'skii et al. (1972). Also notice that:

    sup_{ψ∈S}‖Tψ − Π_kTψ‖ < inf_{ψ∈S}‖ψ − Tψ‖   (S.19)

holds for all k sufficiently large by Assumption 4.2(b) (note that inf_{ψ∈S}‖ψ − Tψ‖ > 0, otherwise T would have a fixed point on S, contradicting the definition of N in the proof of Lemma A.5). Result (2) on p. 299 of Krasnosel'skii et al. (1972) then implies that whenever (S.19) holds we have |γ(I − Π_kT; S)| = |γ(I − T; S)| = 1. Result (3) on p. 299 of Krasnosel'skii et al. (1972) then implies that |γ(I − Π_kT|_{B_k}; S_k)| = 1 whenever (S.19) holds. Finally, by isomorphism, we have that |γ(I − G^{−1}T; 𝒮_k)| = 1 whenever (S.19) holds.

We now show that the inequality:

    sup_{v∈𝒮_k}‖(Ĝ^{−1}T̂ − G^{−1}T)v‖_G < inf_{ψ∈S_k}‖ψ − Π_kTψ‖   (S.20)

holds wpa1. The left-hand side is o_p(1) by Lemma D.4(a). For the right-hand side, we claim that lim inf_{k→∞} inf_{ψ∈S_k}‖ψ − Π_kTψ‖ > 0. Suppose the claim is false. Then there exists a subsequence {ψ_{k_l} : l ≥ 1} with ψ_{k_l} ∈ S_{k_l} such that ‖ψ_{k_l} − Π_{k_l}Tψ_{k_l}‖ → 0. Since T is compact, there exists a convergent subsequence {Tψ_{k_{l_j}} : j ≥ 1}. Let ψ* = lim_{j→∞}Tψ_{k_{l_j}}. Then:

    ‖ψ_{k_{l_j}} − ψ*‖ ≤ ‖ψ_{k_{l_j}} − Π_{k_{l_j}}Tψ_{k_{l_j}}‖ + ‖Π_{k_{l_j}}Tψ_{k_{l_j}} − Π_{k_{l_j}}ψ*‖ + ‖Π_{k_{l_j}}ψ* − ψ*‖ → 0

as j → ∞, where the first term vanishes by definition of ψ_{k_l}, the second vanishes by definition of ψ*, and the third vanishes by Assumption 4.2(b). Therefore, ψ* ∈ S. Moreover, by continuity of T and definition of ψ*:

    ‖Tψ* − ψ*‖ ≤ ‖Tψ* − Tψ_{k_{l_j}}‖ + ‖Tψ_{k_{l_j}} − ψ*‖ → 0

as j → ∞, hence ψ* ∈ S is a fixed point of T. But this contradicts the fact that h is the unique fixed point of T in N̄ = N ∪ S (cf. the proof of Lemma A.5). This proves the claim.

Result (2) on p. 299 of Krasnosel'skii et al. (1972) then implies that whenever (S.19) and (S.20) hold (which they do wpa1), we have γ(I − Ĝ^{−1}T̂; 𝒮_k) = γ(I − G^{−1}T; 𝒮_k). Therefore, |γ(I − Ĝ^{−1}T̂; 𝒮_k)| = 1 also holds wpa1 and hence, by result (1) on p. 299 of Krasnosel'skii et al. (1972), Ĝ^{−1}T̂ has at least one fixed point v̂ ∈ 𝒩_k. We have therefore shown that ĥ(x) = b^k(x)′v̂ is well defined wpa1 and that ‖ĥ − h‖ < ε wpa1. Consistency of ĥ follows by repeating the preceding argument with any positive ε′ < ε.
Proof of Remark A.3. Fix any positive ε′ < ε and let A = {ψ ∈ L² : ε′ ≤ ‖ψ − h‖ ≤ ε}, A_k = {ψ ∈ B_k : ε′ ≤ ‖ψ − h‖ ≤ ε} and 𝒜_k = {v ∈ R^k : v′b^k(·) ∈ A_k}. Clearly T has no fixed point in A. Moreover, similar arguments to the proof of result (19.31) in Theorem 19.3 in Krasnosel'skii et al. (1972) imply that A_k contains no fixed points of Π_kT for all k sufficiently large. By similar arguments to the proof of Lemma A.7 we may deduce that lim inf_{k→∞} inf_{ψ∈A_k}‖ψ − Π_kTψ‖ =: c* > 0. Then for any v ∈ 𝒜_k, we have ‖v − Ĝ^{−1}T̂v‖_G ≥ c* − o_p(1), where the o_p(1) term holds uniformly over 𝒜_k by Lemma D.4(a). Therefore, ‖v − Ĝ^{−1}T̂v‖_G ≥ c*/2 holds for all v ∈ 𝒜_k wpa1. On the other hand, any fixed point v̂ of Ĝ^{−1}T̂ with b^k(·)′v̂ ∈ N_k necessarily has ‖v̂ − Ĝ^{−1}T̂v̂‖_G = 0. Therefore, no such fixed point v̂ belongs to 𝒜_k wpa1.
Proof of Lemma A.8. We first prove part (c). The Fréchet derivative of Π_kT|_{B_k} at h is Π_kD_h|_{B_k}. This may be represented on (R^k, ⟨·,·⟩_G) by the matrix G^{−1}D_h where D_h = E[b^k(X_t)σG1_{t+1}h(X_{t+1})^{σ−1}b^k(X_{t+1})′]. By Lemma A.7, v̂ (equivalently, ĥ) is well defined wpa1. Therefore, wpa1 we have:

    (I − G^{−1}D_h)(v_k − v̂) = (G^{−1}Tv̂ − Ĝ^{−1}T̂v̂) − (G^{−1}Tv̂ − G^{−1}Tv_k − G^{−1}D_h(v̂ − v_k)).

Note that ‖G^{−1}Tv̂ − Ĝ^{−1}T̂v̂‖_G = O_p(ν_{n,k}) by Lemma D.4(b) and consistency of ĥ. Therefore,

    ‖(I − G^{−1}D_h)(v_k − v̂)‖_G ≤ O_p(ν_{n,k}) + ‖G^{−1}Tv̂ − G^{−1}Tv_k − G^{−1}D_h(v̂ − v_k)‖_G.   (S.21)

By isomorphism, we have ‖(I − G^{−1}D_h)(v_k − v̂)‖_G = ‖(I − Π_kD_h)(h_k − ĥ)‖. Assumptions 4.1(c) and 4.2(a) together imply that (I − Π_kD_h)^{−1} exists for all k sufficiently large and that the norms ‖(I − Π_kD_h)^{−1}‖ are uniformly bounded (for all k sufficiently large). Therefore,

    ‖(I − G^{−1}D_h)(v_k − v̂)‖_G ≥ const × ‖h_k − ĥ‖   (S.22)

holds for all k sufficiently large. Also notice that:

    ‖G^{−1}Tv̂ − G^{−1}Tv_k − G^{−1}D_h(v̂ − v_k)‖_G = ‖Π_kTĥ − Π_kTh_k − Π_kD_h(ĥ − h_k)‖
      ≤ ‖Tĥ − Th − D_h(ĥ − h) − (Th_k − Th − D_h(h_k − h))‖
      ≤ ‖Tĥ − Th − D_h(ĥ − h)‖ + ‖Th_k − Th − D_h(h_k − h)‖
      = o(1) × (‖ĥ − h_k‖ + ‖h_k − h‖) + o(1) × ‖h − h_k‖   (S.23)

where the first inequality is because Π_k is a (weak) contraction on L² and the final line is by Assumption 4.1(c). Substituting (S.22) and (S.23) into (S.21) and rearranging, we obtain:

    (1 − o(1)) × ‖h_k − ĥ‖ ≤ O_p(ν_{n,k}) + o_p(τ_k).

Parts (a) and (b) follow by similar arguments to the proof of Lemma A.6.
D.4 Proofs for Appendix B
Proof of Proposition B.1. First note that:

    √n(L̂ − L) = √n(log ρ̂ − log ρ − ((1/n)Σ_{t=0}^{n−1}log m(X_t, X_{t+1}) − E[log m(X_t, X_{t+1})]))
      = (1/√n)Σ_{t=0}^{n−1}(ρ^{−1}ψ_{ρ,t} − ψ_{lm,t}) + o_p(1)

where the second line is by display (24) and a delta-method type argument. The result now follows from the joint convergence in the statement of the proposition.
Proof of Proposition B.2. Similar arguments to the proof of Proposition B.1 yield:

    √n(L̂ − L) = (1/√n)Σ_{t=1}^{n}(ρ^{−1}ψ_{ρ,t} − ψ_{lm,t} + ρ^{−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀)) − (log m_t(α̂) − log m_t(α₀))) + o_p(1).

By similar arguments to the proof of Theorem 3.4, we may deduce:

    (1/√n)Σ_{t=0}^{n−1}(ρ^{−1}φ*_{k,t}φ_{k,t+1}(m_t(α̂) − m_t(α₀)) − (log m_t(α̂) − log m_t(α₀))) = D_{α,lm}√n(α̂ − α₀) + o_p(1)

where

    D_{α,lm} = E[(ρ^{−1}φ*(X_t)φ(X_{t+1}) − m(X_t, X_{t+1}; α₀)^{−1})(∂m(X_t, X_{t+1}; α₀)/∂α′)].

Substituting into the expansion for L̂ and using Assumption 3.5(a) yields:

    √n(L̂ − L) = (1/√n)Σ_{t=1}^{n}(ρ^{−1}ψ_{ρ,t} + D_{α,lm}ψ_{α,t} − ψ_{lm,t}) + o_p(1).

The result follows by the joint CLT assumed in the statement of the proposition.
Proof of Theorem B.1. We prove part (1) first. We begin by characterizing the tangent space as on pp. 878–880 of Bickel and Kwon (2001) (their arguments trivially extend to R^d-valued Markov processes). Let Q₂ denote the stationary distribution of (X_t, X_{t+1}). Consider the tangent space H₀ = {h(X_t, X_{t+1}) : E[h(X_t, X_{t+1})²] < ∞ and E[h(X_t, X_{t+1})|X_t = x] = 0 almost surely}, endowed with the L²(Q₂) norm. Take any bounded h ∈ H₀ and consider the one-dimensional parametric model, which we identify with the collection of transition probabilities {P¹_{τ,h} : |τ| ≤ 1}, where each transition probability P¹_{τ,h} is dominated by P¹ (the true transition probability) and is given by:

    dP¹_{τ,h}(x_{t+1}|x_t)/dP¹(x_{t+1}|x_t) = e^{τh(x_t, x_{t+1}) − A(τ, x_t)}

where:

    A(τ, x_t) = log(∫e^{τh(x_t, x_{t+1})}P¹(dx_{t+1}|x_t)).

For each τ we define the linear operator M_{(τ,h)} on L² by:

    M_{(τ,h)}ψ(x_t) = ∫m(x_t, x_{t+1})ψ(x_{t+1})P¹_{τ,h}(dx_{t+1}|x_t).

Observe that:

    (M_{(τ,h)} − M)ψ(x_t) = ∫m(x_t, x_{t+1})ψ(x_{t+1})(e^{τh(x_t, x_{t+1}) − A(τ, x_t)} − 1)P¹(dx_{t+1}|x_t)   (S.24)

is a bounded linear operator on L² (since ‖M‖ < ∞ and h is bounded). By Taylor's theorem:

    e^{τh(x_t, x_{t+1}) − A(τ, x_t)} − 1 = τh(x_t, x_{t+1}) + O(τ²)   (S.25)

where the O(τ²) term is uniform in (x_t, x_{t+1}). It now follows by boundedness of h that ‖M_{(τ,h)} − M‖ = O(τ). Similar arguments to the proof of Lemma A.1 imply that there exist ε > 0 and τ̄ > 0 such that the largest eigenvalue ρ_{(τ,h)} of M_{(τ,h)} is simple and lies in the interval (ρ − ε, ρ + ε) for each τ < τ̄. Taking a perturbation expansion of ρ_{(τ,h)} about τ = 0 (see, for example, equation (3.6) on p. 89 of Kato (1980), which also applies in the infinite-dimensional case, as made clear in Section VII.1.5 of Kato (1980)):

    ρ_{(τ,h)} − ρ = ⟨(M_{(τ,h)} − M)φ, φ*⟩ + O(τ²)
      = τE[m(X_t, X_{t+1})h(X_t, X_{t+1})φ(X_{t+1})φ*(X_t)] + O(τ²)
      = τ∫m(x_t, x_{t+1})φ(x_{t+1})φ*(x_t)h(x_t, x_{t+1})dQ₂(x_t, x_{t+1}) + O(τ²)   (S.26)

under the normalization ⟨φ, φ*⟩ = 1, where the second line is by (S.24) and (S.25). Expression (S.26) shows that the derivative of ρ_{(τ,h)} at τ = 0 is ψ̃_ρ = m(x_t, x_{t+1})φ(x_{t+1})φ*(x_t).

As bounded functions are dense in H₀, we have shown that ρ is differentiable relative to H₀ with derivative ψ̃_ρ. The efficient influence function for ρ is the projection of ψ̃_ρ onto H₀, namely:

    ψ̃_ρ(x_t, x_{t+1}) − E[ψ̃_ρ(X_t, X_{t+1})|X_t = x_t] = ψ_ρ(x_t, x_{t+1})

because E[ψ̃_ρ(X_t, X_{t+1})|X_t = x_t] = φ*(x_t)Mφ(x_t) = ρφ(x_t)φ*(x_t). It follows that V_ρ = E[ψ_ρ(X_t, X_{t+1})²] is the efficiency bound for ρ. A similar argument shows that h′(ρ)ψ_ρ is the efficient influence function for h(ρ).

We now prove part (2). By linearity, the efficient influence function for L is:

    ψ_L = ρ^{−1}ψ_ρ − ψ_{log m}

where ψ_{log m} is the efficient influence function for E[log m(X_t, X_{t+1})]. It is well known that:

    ψ_{log m}(x₀, x₁) = l(x₀, x₁) + Σ_{t=0}^{∞}(E[l(X_{t+1}, X_{t+2})|X₁ = x₁] − E[l(X_t, X_{t+1})|X₀ = x₀])

where l(x_t, x_{t+1}) = log m(x_t, x_{t+1}) − E[log m(X_t, X_{t+1})] (see, e.g., Greenwood and Wefelmeyer (1995)). It may be verified using the telescoping property of the above sum that V_L = E[ψ_L(X₀, X₁)²].
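The exponential tilt at the heart of this proof is easy to sanity-check on a toy finite-state chain. The sketch below is our illustration (the chain P and the direction h are synthetic): the tilted kernel integrates to one by construction of A(τ, x), and the expansion (S.25) holds to O(τ²).

import numpy as np

rng = np.random.default_rng(2)
S = 4                                     # toy state space {0, ..., 3}
P = rng.uniform(size=(S, S))
P /= P.sum(axis=1, keepdims=True)         # transition matrix P^1
h = rng.standard_normal((S, S))
h -= (P * h).sum(axis=1, keepdims=True)   # center so E[h(X_t, X_{t+1}) | X_t] = 0

tau = 1e-3
A = np.log((P * np.exp(tau * h)).sum(axis=1, keepdims=True))   # A(tau, x)
P_tau = P * np.exp(tau * h - A)           # tilted transition kernel P^1_{tau,h}
print(P_tau.sum(axis=1))                  # each row sums to one
print(np.max(np.abs(np.exp(tau * h - A) - 1 - tau * h)))       # O(tau^2), cf. (S.25)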
Proof of Lemma B.1. Take k ≥ K from Lemma A.1 and work on the sequence of events upon which (S.16) holds, so that ρ̂, ĉ and ĉ* are uniquely defined by Lemma A.3.

Normalize ĉ, ĉ*, c_k, and c*_k so that ‖ĉ‖_G = 1, ‖c_k‖_G = 1, ĉ′Gĉ* = 1 and c′_kGc*_k = 1. Let P = c_kc*′_kG and P̂ = ĉĉ*′G. We then have trace(P) = 1, trace(P̂) = 1, ρ̂ = trace(P̂Ĝ^{−1}M̂), ρ_k = trace(PG^{−1}M), Ĝ^{−1}M̂P̂ = P̂Ĝ^{−1}M̂ = ρ̂P̂ and G^{−1}MP = PG^{−1}M = ρ_kP. Now observe that:

    ρ̂ − ρ_k = trace(P̂Ĝ^{−1}M̂) − trace(PG^{−1}M) = trace((P̂ − P)Ĝ^{−1}M̂) + trace(P(Ĝ^{−1}M̂ − G^{−1}M)).

By addition and subtraction of terms, we have:

    trace((P̂ − P)Ĝ^{−1}M̂) = ρ̂ − ρ̂ trace(PP̂) + trace(PĜ^{−1}M̂(P̂ − I))
      = ρ̂ trace(P(I − P̂)) + trace(PĜ^{−1}M̂(P̂ − I))
      = (ρ̂ − ρ_k)trace(P(I − P̂)) + ρ_k trace(P(I − P̂)) + trace(PĜ^{−1}M̂(P̂ − I))
      = (ρ̂ − ρ_k)trace(P(I − P̂)) + trace(PG^{−1}M(I − P̂)) + trace(PĜ^{−1}M̂(P̂ − I))
      = (ρ̂ − ρ_k)trace(P(I − P̂)) + trace(P(Ĝ^{−1}M̂ − G^{−1}M)(P̂ − I))   (S.27)

where:

    |trace(P(I − P̂))| = |c*′_kG(c_k − P̂c_k)| ≤ ‖c*_k‖_G‖c_k − P̂c_k‖_G.   (S.28)

By the proof of Proposition 4.2 of Gobet et al. (2004) (setting P̂ = P_ε, Ĝ^{−1}M̂ = T_ε, G^{−1}M = T and taking Γ from the proof of Lemma A.1 as the boundary of B(·, ρ) in their notation) and similar arguments to the proof of Lemma A.4:

    ‖c_k − P̂c_k‖_G ≲ ‖(Ĝ^{−1}M̂ − G^{−1}M)c_k‖_G = O_p(η_{n,k}).   (S.29)

Moreover,

    ‖c*_k‖_G = ‖c_k‖_G‖c*_k‖_G = ‖P‖_G = ‖(2πi)^{−1}∮_Γ(ρ_k − z)^{−1}R(G^{−1}M, z)dz‖_G

(Kato, 1980, expression (6.19), p. 178), which is O(1) by display (S.14) and the fact that ρ_k → ρ. By displays (S.28) and (S.29) and the fact that ρ̂ − ρ_k = O_p(η_{n,k}) (by Lemma A.4), we obtain:

    (ρ̂ − ρ_k)trace(P(I − P̂)) = O_p(η²_{n,k}).   (S.30)

Moreover:

    |trace(P(Ĝ^{−1}M̂ − G^{−1}M)(P̂ − I))| = |c*′_kG(Ĝ^{−1}M̂ − G^{−1}M)(P̂ − I)c_k|
      ≤ ‖c*_k‖_G‖Ĝ^{−1}M̂ − G^{−1}M‖_G‖c_k − P̂c_k‖_G
      = O_p(η_{n,k,1} + η_{n,k,2}) × O_p(η_{n,k})   (S.31)

by Lemma D.3(b) and display (S.29). It follows by (S.27), (S.30) and (S.31) that:

    ρ̂ − ρ_k = trace(P(Ĝ^{−1}M̂ − G^{−1}M)) + O_p(η_{n,k,1} + η_{n,k,2}) × O_p(η_{n,k}) + O_p(η²_{n,k}).

Finally,

    trace(P(Ĝ^{−1}M̂ − G^{−1}M)) = c*′_kG(Ĝ^{−1}M̂ − G^{−1}M)c_k = c̃*′_k((Ĝᵒ)^{−1}M̂ᵒ − Mᵒ)c̃_k
      = c̃*′_k(M̂ᵒ − ĜᵒMᵒ)c̃_k + O_p(η_{n,k,1} × (η_{n,k,1} + η_{n,k,2}))

by Lemma D.3(a) and the fact that ‖c̃*_k‖ = ‖c*_k‖_G = O(1). The result follows by noting that:

    c̃*′_k(M̂ᵒ − ĜᵒMᵒ)c̃_k = c*′_k(M̂ − ρ_kĜ)c_k

and that η_{n,k} is of at least as small order as η_{n,k,1} and η_{n,k,2} (cf. Lemma D.3(a)).
References

Akian, M., S. Gaubert, and R. Nussbaum (2016). Uniqueness of the fixed point of nonexpansive semidifferentiable maps. Transactions of the American Mathematical Society 368(2), 1271–1320.

Bickel, P. J. and J. Kwon (2001). Inference for semiparametric models: Some questions and an answer (with discussion). Statistica Sinica 11, 863–960.

Billingsley, P. (1961). The Lindeberg-Lévy theorem for martingales. Proceedings of the American Mathematical Society 12(1), 788–792.

Chatelin, F. (1983). Spectral Approximation of Linear Operators. Academic Press, New York.

Chen, X. and T. M. Christensen (2015). Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. Journal of Econometrics 188(2), 447–465.

Gobet, E., M. Hoffmann, and M. Reiß (2004). Nonparametric estimation of scalar diffusions based on low frequency data. Annals of Statistics 32, 2223–2253.

Greenwood, P. E. and W. Wefelmeyer (1995). Efficiency of empirical estimators for Markov chains. Annals of Statistics 23(1), 132–143.

Hansen, B. E. (2015). A unified asymptotic distribution theory for parametric and non-parametric least squares. Working paper, University of Wisconsin.

Kato, T. (1980). Perturbation Theory for Linear Operators. Springer-Verlag, Berlin.

Krasnosel'skii, M. A., G. M. Vainikko, P. P. Zabreiko, Ya. B. Rutitskii, and V. Ya. Stetsenko (1972). Approximate Solution of Operator Equations. Wolters-Noordhoff, Groningen.

Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators. Journal of Econometrics 79(1), 147–168.

Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. In Handbook of Econometrics, Volume 4, pp. 2111–2245. Elsevier.

Schaefer, H. H. (1974). Banach Lattices and Positive Operators. Springer-Verlag, Berlin.

Schumaker, L. L. (2007). Spline Functions: Basic Theory. Cambridge University Press, Cambridge.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.