
Efficient Derivative Pricing by the Extended Method of Moments,
Supplementary material: APPENDIX B
In this Appendix we provide the proofs of theoretical results and technical Lemmas that have been omitted in the paper. We first give in Section B.1 the proof of Lemma A.1. Then, in Section B.2 we give a detailed proof of consistency of the XMM estimator (see Appendix A.1.3 in the paper). In Sections B.3-B.6 we prove Lemma A.2, Corollary 6, Lemma A.3 and Corollary 8, respectively. In Section B.7 we discuss regularity conditions for XMM estimation when the DGP is the stochastic volatility model of Section 3.2 of the paper. In Section B.8 we derive the risk-neutral distribution in the stochastic volatility model. In Section B.9 we prove Lemma A.4. Finally, in Section B.10 we provide the Fourier transform methods used for option pricing and cross-sectional calibration in the stochastic volatility model. We use the following notation. We denote by $K_1$ and $K_2$ the dimensions of functions $g_1$ and $g_2$, respectively. Further, $\tilde g_2$ denotes the function $\tilde g_2=(g_2^{*\prime},1)'=(g_2',a',1)'$.
B.1 Proof of Lemma A.1
B.1.1 Conditions for weak convergence of the kernel empirical process
The process $\hat\nu_T(\theta)$, $\theta\in\Theta$, can be written as:
$$\hat\nu_T(\theta)=\begin{pmatrix}\sqrt T\,(\hat E[g_1(\theta)]-E[g_1(\theta)])\\ \sqrt{Th_T^d}\,(\hat E[g_2^*(\theta)\,|\,x_0]-E[g_2^*(\theta)\,|\,x_0])\end{pmatrix},\qquad\theta\in\Theta,\qquad(B.1)$$
where $g_2^*$ denotes the function $g_2^*=(g_2',a')'$. Let us rewrite the second component of $\hat\nu_T(\theta)$. For $\theta\in\Theta$, let us define (see Assumption A.12):
$$\varphi(\theta):=\varphi(x_0;\theta)=E[g_2^*(\theta)\,|\,x_0]\,f(x_0),$$
and the corresponding kernel estimator:
$$\hat\varphi(\theta)=\frac1{Th_T^d}\sum_{t=1}^Tg_2^*(y_t;\theta)\,K\!\left(\frac{x_t-x_0}{h_T}\right).$$
We have:
$$\sqrt{Th_T^d}\,(\hat E[g_2^*(\theta)\,|\,x_0]-E[g_2^*(\theta)\,|\,x_0])=\sqrt{Th_T^d}\left(\frac{\hat\varphi(\theta)}{\hat f(x_0)}-\frac{\varphi(\theta)}{f(x_0)}\right).$$
This can be rewritten as:
$$\sqrt{Th_T^d}\,(\hat E[g_2^*(\theta)\,|\,x_0]-E[g_2^*(\theta)\,|\,x_0])=\left[\frac1{f(x_0)}\sqrt{Th_T^d}\,(\hat\varphi(\theta)-\varphi(\theta))-\frac{E[g_2^*(\theta)\,|\,x_0]}{f(x_0)}\sqrt{Th_T^d}\,(\hat f(x_0)-f(x_0))\right]\left[1+\frac{\hat f(x_0)-f(x_0)}{f(x_0)}\right]^{-1}.\qquad(B.2)$$
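As an illustration of the kernel smoothing step above, the following minimal Python sketch computes $\hat\varphi(\theta)$, $\hat f(x_0)$ and $\hat E[g_2^*(\theta)\,|\,x_0]$ for a fixed $\theta$. The Gaussian product kernel and all function and variable names are illustrative assumptions, not objects defined in the paper.

```python
import numpy as np

def kernel_moment_estimates(x, g2_values, x0, h):
    """Sketch of the kernel estimates phi_hat(theta), f_hat(x0), E_hat[g2*(theta)|x0].

    x         : (T, d) array of conditioning variables X_t
    g2_values : (T, K) array with rows g2*(y_t; theta) for a fixed theta
    x0        : (d,) conditioning point
    h         : bandwidth h_T
    A Gaussian product kernel is used as an illustrative choice of K.
    """
    T, d = x.shape
    u = (x - x0) / h
    K = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)  # product kernel
    f_hat = K.sum() / (T * h ** d)                                     # f_hat(x0)
    phi_hat = (g2_values * K[:, None]).sum(axis=0) / (T * h ** d)      # phi_hat(theta)
    E_hat = phi_hat / f_hat                                            # E_hat[g2*(theta) | x0]
    return phi_hat, f_hat, E_hat
```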
Under Assumptions A.5-A.9, we have [see Bosq (1998), Theorem 2.3]:
$$\hat f(x_0)-f(x_0)=o_p(1).\qquad(B.3)$$
From (B.1)-(B.3) we get:
$$\hat\nu_T(\theta)=H_0(\theta)\,\Psi_T(\theta)\,(1+o_p(1)),\qquad\theta\in\Theta,\qquad(B.4)$$
where process $\Psi_T(\theta)$ is defined by:
$$\Psi_T(\theta)=\begin{pmatrix}\sqrt T\,(\hat E[g_1(\theta)]-E[g_1(\theta)])\\ \sqrt{Th_T^d}\,(\hat\varphi(\theta)-\varphi(\theta))\\ \sqrt{Th_T^d}\,(\hat f(x_0)-f(x_0))\end{pmatrix},\qquad\theta\in\Theta,\qquad(B.5)$$
matrix $H_0(\theta)$ is given by:
$$H_0(\theta)=\begin{pmatrix}Id_{K_1}&0&0\\ 0&\dfrac1{f(x_0)}Id_{K_2+L}&-\dfrac1{f(x_0)}E[g_2^*(\theta)\,|\,x_0]\end{pmatrix},\qquad\theta\in\Theta,$$
$K_1:=\dim(g_1)$, $K_2:=\dim(g_2)$ and the $o_p(1)$ term is uniform in $\theta\in\Theta$. The following Lemma shows that process $\Psi_T(\theta)$ is asymptotically equivalent to a zero-mean empirical process plus a bias function.
Lemma B.1: Under Assumptions A.4, A.6, A.8, A.9 and A.12:
$$\begin{pmatrix}\sqrt{Th_T^d}\,(E[\hat\varphi(\theta)]-\varphi(\theta))\\ \sqrt{Th_T^d}\,(E[\hat f(x_0)]-f(x_0))\end{pmatrix}=\sqrt{Th_T^{d+2m}}\;\frac{w_m}{m!}\begin{pmatrix}\nabla^m\varphi(x_0;\theta)\\ \nabla^mf(x_0)\end{pmatrix}+o(1),\qquad(B.6)$$
uniformly in $\theta\in\Theta$.
Proof: From a standard bias expansion and Assumption A.8, we have
$$E[\hat\varphi(\theta)]-\varphi(\theta)=\frac1{h_T^d}E\left[\varphi(X;\theta)K\!\left(\frac{X-x_0}{h_T}\right)\right]-\varphi(\theta)=\int[\varphi(x_0+h_Tu;\theta)-\varphi(x_0;\theta)]K(u)\,du=\frac{h_T^m}{m!}\sum_{\alpha:|\alpha|=m}\int\nabla^\alpha\varphi(x_0+h_T\tilde u;\theta)\,u^\alpha K(u)\,du,$$
where $\tilde u$ is an intermediary point (depending on $u$). Since $\nabla^\alpha\varphi$ is uniformly continuous on $\mathcal X\times\Theta$ (Assumption A.12), and $\Theta$ is compact (Assumption A.4), we have that $\int\nabla^\alpha\varphi(x_0+h_T\tilde u;\theta)\,u^\alpha K(u)\,du$ converges to $\nabla^\alpha\varphi(x_0;\theta)\int u^\alpha K(u)\,du$, uniformly in $\theta\in\Theta$, for any $\alpha\in\mathbb N^d$ with $|\alpha|=m$. A similar argument applies for $E[\hat f(x_0)]-f(x_0)$. Since $K$ is a product kernel of order $m$ (Assumption A.8), the conclusion follows.
From (B.4)-(B.6) we deduce:
$$\hat\nu_T(\theta)=[H_0(\theta)\,\nu_T(\theta)+b(\theta)+o(1)]\,(1+o_p(1)),$$
uniformly in $\theta\in\Theta$, where the empirical process $\nu_T(\theta)$ is defined as:
$$\nu_T(\theta)=\begin{pmatrix}\sqrt T\,(\hat E[g_1(\theta)]-E[g_1(\theta)])\\ \sqrt{Th_T^d}\,(\hat\varphi(\theta)-E[\hat\varphi(\theta)])\\ \sqrt{Th_T^d}\,(\hat f(x_0)-E[\hat f(x_0)])\end{pmatrix},\qquad\theta\in\Theta.\qquad(B.7)$$
Lemma A.1 follows if the empirical process $\nu_T(\theta)$ converges weakly:
$$\nu_T(\theta)\Longrightarrow\nu(\theta),\qquad\theta\in\Theta,\qquad(B.8)$$
where $\nu(\theta)$ is a Gaussian process on $\Theta$ with covariance operator:
$$\Omega_0(\theta,\theta^*)=\begin{pmatrix}S_0(\theta,\theta^*)&0&0\\ 0&w^2f(x_0)\,E[g_2^*(\theta)g_2^*(\theta^*)'\,|\,x_0]&w^2f(x_0)\,E[g_2^*(\theta)\,|\,x_0]\\ 0&w^2f(x_0)\,E[g_2^*(\theta^*)'\,|\,x_0]&w^2f(x_0)\end{pmatrix},\qquad\theta,\theta^*\in\Theta,$$
and:
$$S_0(\theta,\theta^*)=\sum_{k=-\infty}^\infty\operatorname{Cov}[g_1(X_t,Y_t;\theta),g_1(X_{t-k},Y_{t-k};\theta^*)].$$
To prove the weak convergence (B.8) of empirical process $\nu_T(\theta)$, let us note that:
$$\nu_T(\theta)=T^{-1/2}\sum_{t=1}^T(v_{t,T}(\theta)-E[v_{t,T}(\theta)]),\qquad\theta\in\Theta,$$
where
$$v_{t,T}(\theta)=\left(g_1(X_t,Y_t;\theta)',\;h_T^{-d/2}\,\tilde g_2(Y_t;\theta)'\,K\!\left(\frac{X_t-x_0}{h_T}\right)\right)',$$
and $\tilde g_2$ denotes the function $\tilde g_2=(g_2^{*\prime},1)'$. From Theorem 10.2 of Pollard (1990), the weak convergence of $\nu_T(\theta)$ to the Gaussian process $\nu(\theta)$ over $\Theta\subset\mathbb R^p$ compact is implied by the conditions i) and ii) of Proposition B.2 below.
Proposition B.2: The following conditions are satisfied:
i) Under Assumptions A.1, A.5-A.15, for any $\theta_1,\ldots,\theta_n\in\Theta\subset\mathbb R^p$, $n\in\mathbb N$, the vector $(\nu_T(\theta_1)',\ldots,\nu_T(\theta_n)')'$ is asymptotically normally distributed with mean zero, and asymptotic variance-covariance matrix such that:
$$\operatorname{AsCov}(\nu_T(\theta_i),\nu_T(\theta_j))=\Omega_0(\theta_i,\theta_j),\qquad i,j=1,\ldots,n.$$
ii) Under Assumptions A.4, A.5, A.8-A.9 and A.16-A.18, the empirical process $\nu_T(\theta)$ is stochastically equicontinuous, that is, $\forall\,\varepsilon,\eta>0\ \exists\,\delta>0$:
$$\limsup_{T\to\infty}P^*\left[\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)<\delta}\|\nu_T(\theta)-\nu_T(\theta^*)\|>\eta\right]\le\varepsilon,$$
where $d(\cdot,\cdot)$ is a metric on $\Theta$, and $P^*$ denotes the outer probability.
These conditions imply the weak convergence of empirical process $\nu_T$ (and, thus, the weak convergence of $\hat\nu_T$). Conditions i) and ii) of the previous proposition are verified below in Sections B.1.2 and B.1.3, respectively.
B.1.2 Finite dimensional convergence
To prove condition i) of Proposition B.2, we use the Cramér-Wold device, and follow an approach similar to Bosq (1998), Theorems 2.3 and 3.4, and Tenreiro (1995), Theorem 1.3.10. Let $\lambda=(\lambda_1',\ldots,\lambda_n')'\in\mathbb R^{n(K_1+K_2+L+1)}$, and define the zero-mean triangular array:
$$Z_{t,T}=\frac1{\sqrt T}\sum_{i=1}^n\lambda_i'(v_{t,T}(\theta_i)-E[v_{t,T}(\theta_i)]),\qquad t\le T,\;T\ge1.$$
Then, we can write:
$$\lambda'(\nu_T(\theta_1)',\ldots,\nu_T(\theta_n)')'=\sum_{t=1}^TZ_{t,T}.$$
Let $m=m_T$ and $q=q_T$ be sequences of integer numbers such that:
$$m_T=O(T^a),\qquad q_T=O(T^b),\qquad0<b<a<1,$$
and let us define $k=k_T=\lfloor T/(m_T+q_T)\rfloor$. In particular $k_T=O(T^{1-a})$. Let us divide the sample into $2k+1$ blocks, whose length is equal to $m$ for blocks $1,3,\ldots,2k-1$, equal to $q$ for blocks $2,4,\ldots,2k$, and equal to $T-k(m+q)$ for the last block. More specifically, define:
$$Y_{1,T}=Z_{1,T}+\cdots+Z_{m,T},\qquad Y'_{1,T}=Z_{m+1,T}+\cdots+Z_{m+q,T},$$
$$Y_{2,T}=Z_{m+q+1,T}+\cdots+Z_{2m+q,T},\qquad Y'_{2,T}=Z_{2m+q+1,T}+\cdots+Z_{2m+2q,T},$$
$$\ldots$$
$$Y_{k,T}=Z_{(k-1)(m+q)+1,T}+\cdots+Z_{km+(k-1)q,T},\qquad Y'_{k,T}=Z_{km+(k-1)q+1,T}+\cdots+Z_{km+kq,T}.$$
Thus, we can write:
$$\lambda'(\nu_T(\theta_1)',\ldots,\nu_T(\theta_n)')'=\sum_{l=1}^kY_{l,T}+\sum_{l=1}^kY'_{l,T}+Y''_T,\qquad(B.9)$$
where $Y''_T=Z_{k(m+q)+1,T}+\cdots+Z_{T,T}$. Then, we will prove that the first term in the decomposition is asymptotically normal, and that the last two terms are negligible.
a) The first term is asymptotically normal
Lemma B.3: Under Assumptions A.1, A.5-A.9, A.11-A.14, there exist i.i.d. random variables $\bar Y_{l,T}$, $l=1,\ldots,k$, such that $\bar Y_{l,T}\stackrel d=Y_{l,T}$, $l=1,\ldots,k$, and $\sum_{l=1}^kY_{l,T}-\sum_{l=1}^k\bar Y_{l,T}=o_p(1)$.
Proof: Let $c_T:=E[(Y_{1,T})^2]^{1/2}$ and $0<\delta_T<c_T$. From Bradley's Lemma [e.g. Bosq (1998), Lemma 1.2], there exist i.i.d. random variables $\bar Y_{l,T}$, $l=1,\ldots,k$, such that $\bar Y_{l,T}\stackrel d=Y_{l,T}$, $l=1,\ldots,k$, and:
$$P\left[\left|\bar Y_{l,T}-Y_{l,T}\right|>\delta_T\right]\le11\,(c_T/\delta_T)^{2/5}\,\alpha(q_T)^{4/5},\qquad l=1,\ldots,k,$$
where $\alpha(\cdot)$ are the mixing coefficients of process $(X_t',Y_t')'$. It will be proved below (see Lemma B.4) that $c_T=O\left((m/T)^{1/2}\right)$. Let $\varepsilon>0$ be given and let $\delta_T:=\varepsilon/k_T=o\left((m/T)^{1/2}\right)$. Thus, we have:
$$P\left[\left|\bar Y_{l,T}-Y_{l,T}\right|>\varepsilon/k_T\right]=O\left(k_T^{2/5}(m/T)^{1/5}\alpha(q_T)^{4/5}\right),\qquad l=1,\ldots,k.$$
We deduce:
$$P\left[\left|\sum_{l=1}^kY_{l,T}-\sum_{l=1}^k\bar Y_{l,T}\right|>\varepsilon\right]\le\sum_{l=1}^kP\left[\left|Y_{l,T}-\bar Y_{l,T}\right|>\varepsilon/k_T\right]=O\left(k_T^{7/5}(m/T)^{1/5}\alpha(q_T)^{4/5}\right).$$
Since $\alpha(\cdot)$ has geometric decay by Assumption A.5, $O\left(k_T^{7/5}(m/T)^{1/5}\alpha(q_T)^{4/5}\right)=o(1)$. The proof is concluded.
Thus, we have:
$$\sum_{l=1}^kY_{l,T}=\sum_{l=1}^k\bar Y_{l,T}+o_p(1).\qquad(B.10)$$
The asymptotic normality of $\sum_{l=1}^k\bar Y_{l,T}$:
$$\sum_{l=1}^k\bar Y_{l,T}\stackrel d\longrightarrow N(0,\sigma^2),$$
where $\sigma^2>0$ is given below, is proved by using the Liapunov CLT [Billingsley (1965)]. For this purpose we show that:
$$\sum_{l=1}^kE\left[\bar Y_{l,T}^2\right]\to\sigma^2,\qquad\sum_{l=1}^kE\left[\left|\bar Y_{l,T}\right|^3\right]\to0.$$
These two conditions are verified below in Lemma B.4 and Lemma B.5, respectively.
Lemma B.4: Under Assumptions A.1, A.5-A.9 and A.11-A.14, we have:
$$\sum_{l=1}^kE\left[\bar Y_{l,T}^2\right]\to\sigma^2:=\lambda'\Sigma\lambda,$$
where $\Sigma=(\Sigma_{ij})$ is the matrix with blocks:
$$\Sigma_{ij}=\begin{pmatrix}\sum_{k=-\infty}^\infty\operatorname{Cov}(g_1(X_t,Y_t;\theta_i),g_1(X_{t-k},Y_{t-k};\theta_j))&0\\ 0&w^2f(x_0)\,E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=x_0]\end{pmatrix},\qquad i,j=1,\ldots,n.$$
Proof: We have:
$$\sum_{l=1}^kE\left[\bar Y_{l,T}^2\right]=kE\left[Y_{1,T}^2\right]=kV(Y_{1,T})=kV\!\left(\frac1{\sqrt T}\sum_{i=1}^n\sum_{t=1}^m\lambda_i'v_{t,T}(\theta_i)\right)=\frac{km}T\sum_{i,j=1}^n\lambda_i'\operatorname{Cov}\!\left(\frac1{\sqrt m}\sum_{t=1}^mv_{t,T}(\theta_i),\frac1{\sqrt m}\sum_{t=1}^mv_{t,T}(\theta_j)\right)\lambda_j.$$
Since $km/T\to1$, it is sufficient to prove that:
$$\Gamma_{T,ij}:=\operatorname{Cov}\!\left(\frac1{\sqrt m}\sum_{t=1}^mv_{t,T}(\theta_i),\frac1{\sqrt m}\sum_{t=1}^mv_{t,T}(\theta_j)\right)\to\Sigma_{ij},\qquad i,j=1,\ldots,n.$$
Let us write:
$$\Gamma_{T,ij}=\begin{pmatrix}\Gamma^{11}_{T,ij}&\Gamma^{12}_{T,ij}\\ \Gamma^{21}_{T,ij}&\Gamma^{22}_{T,ij}\end{pmatrix},$$
where:
$$\Gamma^{11}_{T,ij}=\operatorname{Cov}\!\left(\frac1{\sqrt m}\sum_{t=1}^mg_1(X_t,Y_t;\theta_i),\frac1{\sqrt m}\sum_{t=1}^mg_1(X_t,Y_t;\theta_j)\right),$$
$$\Gamma^{12}_{T,ij}=\Gamma^{21\,\prime}_{T,ji}=\operatorname{Cov}\!\left(\frac1{\sqrt m}\sum_{t=1}^mg_1(X_t,Y_t;\theta_i),\frac1{\sqrt m}\sum_{t=1}^mh_T^{-d/2}\tilde g_2(Y_t;\theta_j)K\!\left(\frac{X_t-x_0}{h_T}\right)\right),$$
$$\Gamma^{22}_{T,ij}=\operatorname{Cov}\!\left(\frac1{\sqrt m}\sum_{t=1}^mh_T^{-d/2}\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right),\frac1{\sqrt m}\sum_{t=1}^mh_T^{-d/2}\tilde g_2(Y_t;\theta_j)K\!\left(\frac{X_t-x_0}{h_T}\right)\right),$$
and derive the limit of each term for $T\to\infty$.
i) For $\Gamma^{11}_{T,ij}$ we have:
$$\Gamma^{11}_{T,ij}=\operatorname{Cov}(g_1(X_t,Y_t;\theta_i),g_1(X_t,Y_t;\theta_j))+\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}m\right)\operatorname{Cov}(g_1(X_t,Y_t;\theta_i),g_1(X_{t-l},Y_{t-l};\theta_j)).$$
From Assumption A.11 and the Cauchy-Schwarz inequality, we get $E\|g_1(X_t,Y_t;\theta_i)\|^r<\infty$, for $r>2$. Then, by the Davydov inequality [Bosq (1998), Corollary 1.1] and Assumption A.5, we get:
$$\|\operatorname{Cov}(g_1(X_t,Y_t;\theta_i),g_1(X_{t-l},Y_{t-l};\theta_j))\|=O\left(\rho^l\,E[\|g_1(X_t,Y_t;\theta_i)\|^r]^{1/r}E[\|g_1(X_t,Y_t;\theta_j)\|^r]^{1/r}\right),$$
for some $0<\rho<1$. Thus, the cross-autocovariances are summable, and:
$$\lim_{T\to\infty}\Gamma^{11}_{T,ij}=\sum_{l=-\infty}^\infty\operatorname{Cov}(g_1(X_t,Y_t;\theta_i),g_1(X_{t-l},Y_{t-l};\theta_j)).$$
ii) Let us now consider $\Gamma^{22}_{T,ij}$. We have:
$$\Gamma^{22}_{T,ij}=\operatorname{Cov}\!\left(h_T^{-d/2}\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right),h_T^{-d/2}\tilde g_2(Y_t;\theta_j)K\!\left(\frac{X_t-x_0}{h_T}\right)\right)+\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}m\right)\operatorname{Cov}\!\left(h_T^{-d/2}\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right),h_T^{-d/2}\tilde g_2(Y_{t-l};\theta_j)K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right)=:\gamma^0_{T,ij}+\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}m\right)\gamma^l_{T,ij}.\qquad(B.11)$$
Let us first consider the covariance term $\gamma^0_{T,ij}$. The functions $E[\tilde g_2(Y_t;\theta_j)\,|\,X_t=\cdot\,]f(\cdot)$ and $E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=\cdot\,]f(\cdot)$ are Lebesgue-integrable. Indeed, by applying twice the Cauchy-Schwarz inequality, we get:
$$\int\left\|E[g_2^*(Y_t;\theta)g_2^*(Y_t;\theta^*)'\,|\,X_t=x]\right\|f(x)\,dx\le\int E[\|g_2^*(Y_t;\theta)\|^2|X_t=x]^{1/2}E[\|g_2^*(Y_t;\theta^*)\|^2|X_t=x]^{1/2}f(x)\,dx\le\left(\int E[\|g_2^*(Y_t;\theta)\|^2|X_t=x]f(x)\,dx\right)^{1/2}\left(\int E[\|g_2^*(Y_t;\theta^*)\|^2|X_t=x]f(x)\,dx\right)^{1/2}=E[\|g_2^*(Y_t;\theta)\|^2]^{1/2}E[\|g_2^*(Y_t;\theta^*)\|^2]^{1/2}<\infty,$$
by Assumption A.11, and similarly:
$$\int\|E[g_2^*(Y_t;\theta)\,|\,X_t=x]\|f(x)\,dx\le E[\|g_2^*(Y_t;\theta)\|^2]^{1/2}<\infty.$$
Since the functions $E[\tilde g_2(Y_t;\theta_j)\,|\,X_t=\cdot\,]f(\cdot)$ and $E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=\cdot\,]f(\cdot)$ are Lebesgue-integrable and continuous at $x=x_0$ (Assumptions A.12-A.13), we can apply Bochner's Lemma [e.g. Bosq, Lecoutre (1987), p. 61] to deduce that:
$$h_T^{-d}E\left[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right]=w^2E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=x_0]f(x_0)+o(1),$$
$$h_T^{-d}E\left[\tilde g_2(Y_t;\theta_j)K\!\left(\frac{X_t-x_0}{h_T}\right)\right]=E[\tilde g_2(Y_t;\theta_j)\,|\,X_t=x_0]f(x_0)+o(1).$$
Then:
$$\gamma^0_{T,ij}=h_T^{-d}E\left[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right]+O(h_T^d)=w^2f(x_0)\,E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=x_0]+o(1).$$
Let us now consider the term $\sum_{l:|l|=1}^{m-1}(1-|l|/m)\gamma^l_{T,ij}$ in equation (B.11). By repeating the argument used by Bosq (1998), proof of Theorem 2.3, in the case of the density estimator, it is possible to prove that Assumptions A.5-A.9, A.11 and A.14 imply (see Section B.1.4 for the detailed derivation):
$$\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}m\right)\gamma^l_{T,ij}=o(1).$$
Finally, we conclude:
$$\lim_{T\to\infty}\Gamma^{22}_{T,ij}=w^2f(x_0)\,E[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=x_0].$$
iii) Finally, let us consider $\Gamma^{12}_{T,ij}$. By a similar argument as for $\Gamma^{22}_{T,ij}$, the cross-terms are negligible. Thus, using Assumptions A.1, A.8, A.9, A.13 and Bochner's Lemma, we get:
$$\Gamma^{12}_{T,ij}=\operatorname{Cov}\!\left(g_1(X_t,Y_t;\theta_i),h_T^{-d/2}\tilde g_2(Y_t;\theta_j)K\!\left(\frac{X_t-x_0}{h_T}\right)\right)+o(1)=h_T^{-d/2}E\left[g_1(X_t,Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'K\!\left(\frac{X_t-x_0}{h_T}\right)\right]+o(1)=h_T^{d/2}f(x_0)\,E[g_1(X_t,Y_t;\theta_i)\tilde g_2(Y_t;\theta_j)'\,|\,X_t=x_0]+o(1)=o(1).$$
The proof is concluded.
Lemma B.5: Under the assumptions of Lemma B.4 and Assumption A.15, the Liapunov condition holds:
$$\sum_{l=1}^kE\left[\left|\bar Y_{l,T}\right|^3\right]\to0.$$
Proof: By Cauchy-Schwarz we have:
$$\sum_{l=1}^kE\left[\left|\bar Y_{l,T}\right|^3\right]=kE\left[\left|Y_{1,T}\right|^3\right]\le kE\left[Y_{1,T}^4\right]^{1/2}E\left[Y_{1,T}^2\right]^{1/2}.$$
Since $kE[Y_{1,T}^2]=O(1)$ by Lemma B.4, it is sufficient to prove $kE[Y_{1,T}^4]=o(1)$. Since:
$$kE\left[Y_{1,T}^4\right]\le\left(\sum_{i=1}^n\|\lambda_i\|\left(kE\left[\left\|\frac1{\sqrt T}\sum_{t=1}^m(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)])\right\|^4\right]\right)^{1/4}\right)^4,$$
we have to show:
$$kE\left[\left\|\frac1{\sqrt T}\sum_{t=1}^m(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)])\right\|^4\right]=o(1),\qquad\forall i=1,\ldots,n.\qquad(B.12)$$
We have:
$$kE\left[\left\|\frac1{\sqrt T}\sum_{t=1}^m(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)])\right\|^4\right]=\frac k{T^2}E\left[\left\|\sum_{t=1}^mV_{tT,i}\right\|^4\right]\le\frac k{T^2}E\left[\left(\sum_{t=1}^m\|V_{tT,i}\|\right)^4\right],\qquad(B.13)$$
where $V_{tT,i}:=v_{tT}(\theta_i)-E[v_{tT}(\theta_i)]$. Moreover,
$$E\left[\left(\sum_{t=1}^m\|V_{tT,i}\|\right)^4\right]=\sum_tE\left[\|V_{tT,i}\|^4\right]+\sum_{t_1\ne t_2}E\left[\|V_{t_1T,i}\|^2\|V_{t_2T,i}\|\left(\|V_{t_1T,i}\|+\|V_{t_2T,i}\|\right)\right]+\sum_{t_1\ne t_2\ne t_3}E\left[\|V_{t_1T,i}\|^2\|V_{t_2T,i}\|\|V_{t_3T,i}\|\right]+\sum_{t_1\ne t_2\ne t_3\ne t_4}E\left[\|V_{t_1T,i}\|\|V_{t_2T,i}\|\|V_{t_3T,i}\|\|V_{t_4T,i}\|\right],\qquad(B.14)$$
where summations are over $1,\ldots,m$. Let us now derive the orders of the different terms. Since:
$$\|V_{tT,i}\|\le\|v_{tT}(\theta_i)\|+E[\|v_{tT}(\theta_i)\|]\le\|g_1(X_t,Y_t;\theta_i)\|+h_T^{-d/2}\|\tilde g_2(Y_t;\theta_i)\|\left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|+O(1),$$
and by Assumption A.11, the leading terms are either of order $O(1)$ or the terms involving the highest power of $h_T^{-d/2}\|\tilde g_2(Y_t;\theta_i)\||K((X_t-x_0)/h_T)|$.
i) We have:
$$E\left[\|V_{tT,i}\|^4\right]=O\left(h_T^{-2d}E\left[\|\tilde g_2(Y_t;\theta_i)\|^4K\!\left(\frac{X_t-x_0}{h_T}\right)^4\right]\right)=O\left(h_T^{-2d}\|K\|_\infty^3E\left[\|\tilde g_2(Y_t;\theta_i)\|^4\left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|\right]\right).$$
From Assumption A.15:
$$h_T^{-d}E\left[\|\tilde g_2(Y_t;\theta_i)\|^4\left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|\right]=\int E\left[\|\tilde g_2(Y_t;\theta_i)\|^4\,|\,X_t=x_0+h_Tu\right]f(x_0+h_Tu)|K(u)|\,du=O(1).$$
Thus, we get:
$$E\left[\|V_{tT,i}\|^4\right]=O(h_T^{-d}).\qquad(B.15)$$
ii) We have:
$$E\left[\|V_{t_1T,i}\|^3\|V_{t_2T,i}\|\right]=O\left(h_T^{-2d}\|K\|_\infty^2E\left[\|\tilde g_2(Y_{t_1};\theta_i)\|^3\|\tilde g_2(Y_{t_2};\theta_i)\|\left|K\!\left(\frac{X_{t_1}-x_0}{h_T}\right)K\!\left(\frac{X_{t_2}-x_0}{h_T}\right)\right|\right]\right).$$
From Assumption A.15:
$$h_T^{-2d}E\left[\|\tilde g_2(Y_{t_1};\theta_i)\|^3\|\tilde g_2(Y_{t_2};\theta_i)\|\left|K\!\left(\frac{X_{t_1}-x_0}{h_T}\right)K\!\left(\frac{X_{t_2}-x_0}{h_T}\right)\right|\right]=\int E\left[\|\tilde g_2(Y_{t_1};\theta_i)\|^3\|\tilde g_2(Y_{t_2};\theta_i)\|\,|\,X_{t_1}=x_0+h_Tu,X_{t_2}=x_0+h_Tv\right]f_{t_1,t_2}(x_0+h_Tu,x_0+h_Tv)|K(u)K(v)|\,du\,dv=O(1).$$
Thus, we get:
$$E\left[\|V_{t_1T,i}\|^3\|V_{t_2T,i}\|\right]=O(1),\qquad t_1\ne t_2.\qquad(B.16)$$
iii) Similarly, we have:
$$E\left[\|V_{t_1T,i}\|^2\|V_{t_2T,i}\|^2\right]=O(1),\quad t_1\ne t_2,\qquad E\left[\|V_{t_1T,i}\|^2\|V_{t_2T,i}\|\|V_{t_3T,i}\|\right]=O(1),\quad t_1\ne t_2\ne t_3,\qquad E\left[\|V_{t_1T,i}\|\|V_{t_2T,i}\|\|V_{t_3T,i}\|\|V_{t_4T,i}\|\right]=O(1),\quad t_1\ne t_2\ne t_3\ne t_4.\qquad(B.17)$$
Therefore, from (B.13)-(B.17), we get:
$$kE\left[\left\|\frac1{\sqrt T}\sum_{t=1}^m(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)])\right\|^4\right]=O\left(\frac k{T^2}\left(mh_T^{-d}+m^4\right)\right)=o(1),$$
since $km/T\to1$, $m/T\to0$, $Th_T^d\to\infty$, and provided $m$ is chosen such that $m\to\infty$ and $m^3/T\to0$. Then (B.12) follows.
From (B.10), Lemma B.4 and Lemma B.5 we conclude that:
$$\sum_{l=1}^kY_{l,T}\stackrel d\longrightarrow N\left(0,\sigma^2\right).$$
b) The last two terms of the decomposition (B.9) are negligible
Lemma B.6: Under the assumptions of Lemma B.4,
$$\sum_{l=1}^kY'_{l,T}=o_p(1),\qquad Y''_T=o_p(1).$$
Proof: The proof is similar to the proof of Theorem 1.3.10 in Tenreiro (1995), p. 14.
i) We have:
$$Y'_{l,T}=\sum_{t=lm+(l-1)q+1}^{l(m+q)}Z_{t,T}=\sum_{i=1}^n\lambda_i'\left[\frac1{\sqrt T}\sum_{t=lm+(l-1)q+1}^{l(m+q)}(v_{t,T}(\theta_i)-E[v_{t,T}(\theta_i)])\right]=:\sum_{i=1}^n\lambda_i'U_{lT,i}.$$
Thus:
$$E\left[\left|\sum_{l=1}^kY'_{l,T}\right|\right]\le E\left[\left(\sum_{i=1}^n\lambda_i'\sum_{l=1}^kU_{lT,i}\right)^2\right]^{1/2}\le\sum_{i=1}^n\|\lambda_i\|\,E\left[\left\|\sum_{l=1}^kU_{lT,i}\right\|^2\right]^{1/2}.$$
Therefore, it is sufficient to prove:
$$E\left[\left\|\sum_{l=1}^kU_{lT,i}\right\|^2\right]=o(1),\qquad\forall i.\qquad(B.18)$$
We have:
$$E\left[\left\|\sum_{l=1}^kU_{lT,i}\right\|^2\right]=kE\left[U_{lT,i}'U_{lT,i}\right]+\sum_{|s|=1}^{k-1}(k-|s|)E\left[U_{lT,i}'U_{l-s,T,i}\right],$$
and:
$$kE\left[U_{lT,i}'U_{lT,i}\right]=k\operatorname{Tr}\left[V(U_{lT,i})\right]=\frac{kq}T\operatorname{Tr}\left[V\!\left(\frac1{\sqrt q}\sum_{t=1}^qv_{t,T}(\theta_i)\right)\right]=\frac{kq}T\operatorname{Tr}\left[\tilde\Gamma_{T,ii}\right].$$
From the proof of Lemma B.4, $\tilde\Gamma_{T,ii}=O(1)$. Since $kq/T=o(1)$, we get:
$$kE\left[U_{lT,i}'U_{lT,i}\right]=o(1).$$
Moreover,
$$\left|\sum_{|s|=1}^{k-1}(k-|s|)E\left[U_{lT,i}'U_{l-s,T,i}\right]\right|\le k\sum_{|s|=1}^{k-1}\left|E\left[U_{lT,i}'U_{l-s,T,i}\right]\right|\le\frac{2kq}T\sum_{s=1}^\infty\left\|E\left[v_{t,T}(\theta_i)v_{t-s,T}(\theta_i)'\right]-E\left[v_{t,T}(\theta_i)\right]E\left[v_{t-s,T}(\theta_i)\right]'\right\|=\frac{2kq}T\sum_{s=1}^\infty\left\|\operatorname{Cov}\left(v_{t,T}(\theta_i),v_{t-s,T}(\theta_i)\right)\right\|.$$
Using the same argument as in the proof of Lemma B.4 (see also Section B.1.4), we can show that $\sum_{s=1}^\infty\|\operatorname{Cov}(v_{t,T}(\theta_i),v_{t-s,T}(\theta_i))\|<\infty$. Thus,
$$\sum_{|s|=1}^{k-1}(k-|s|)E\left[U_{lT,i}'U_{l-s,T,i}\right]=o(1).$$
Then (B.18) follows.
ii) We have:
$$Y''_T=\sum_{t=k(m+q)+1}^TZ_{t,T}=\sum_{i=1}^n\lambda_i'\left[\frac1{\sqrt T}\sum_{t=k(m+q)+1}^T(v_{t,T}(\theta_i)-E[v_{t,T}(\theta_i)])\right]=:\sum_{i=1}^n\lambda_i'U_{T,i},$$
and:
$$E\left[\left|Y''_T\right|\right]\le E\left[(Y''_T)^2\right]^{1/2}\le\sum_{i=1}^n\|\lambda_i\|\,E\left[\|U_{T,i}\|^2\right]^{1/2}.$$
Therefore, it is sufficient to prove:
$$E\left[\|U_{T,i}\|^2\right]=o(1),\qquad\forall i.$$
We have:
$$E\left[U_{T,i}'U_{T,i}\right]=\frac{T-k(m+q)}T\operatorname{Tr}\left[V\!\left(\frac1{\sqrt{T-k(m+q)}}\sum_{t=1}^{T-k(m+q)}v_{t,T}(\theta_i)\right)\right]\to0,$$
and the proof is concluded.
B.1.3 Stochastic equicontinuity
Let us now prove the stochastic equicontinuity of empirical process $\nu_T(\theta)$ [condition ii) in Proposition B.2] along the lines of Theorem 1 in Andrews (1991). Let us introduce the matrix-valued triangular array:
$$W_{t,T}=\begin{pmatrix}Z_t&0\\ 0&h_T^{-d/2}K\!\left(\frac{X_t-x_0}{h_T}\right)Id_{K_2+L+1}\end{pmatrix},\qquad t\le T,\;T\ge1,$$
where $Z_t$ denotes the instrument. We can write:
$$v_{t,T}(\theta)=W_{t,T}\,\psi(Y_t;\theta),\qquad\theta\in\Theta,$$
where $\psi(y;\theta)=(g(y;\theta)',\tilde g_2(y;\theta)')'$, $\theta\in\Theta$. Let $\{\phi_j:j\in\mathbb N\}$ be the basis of $L^2(F_Y)$ introduced in Assumption A.16. Without loss of generality, we can set $\phi_1(y)=1$. Thus, from Assumption A.16, there exist sequences $\{c_j(\theta):j\in\mathbb N\}$, $\theta\in\Theta$, of vector coefficients such that:
$$\psi(y;\theta)=\sum_{j=1}^\infty c_j(\theta)\phi_j(y),\qquad y\in\mathcal Y,$$
for any $\theta\in\Theta$, where
$$\lim_{J\to\infty}\sup_{\theta\in\Theta}\sum_{j=J}^\infty\lambda_j^{-1}\|c_j(\theta)\|^2<\infty,$$
and $\{\lambda_j\}$ is the sequence of positive weights in Assumption A.16. Then:
$$\nu_T(\theta)-\nu_T(\theta^*)=\frac1{\sqrt T}\sum_{t=1}^T\left[W_{t,T}\psi(Y_t;\theta)-E\left(W_{t,T}\psi(Y_t;\theta)\right)-W_{t,T}\psi(Y_t;\theta^*)+E\left(W_{t,T}\psi(Y_t;\theta^*)\right)\right]=\sum_{j=1}^\infty\left(T^{-1/2}\sum_{t=1}^TX_{j,tT}\right)(c_j(\theta)-c_j(\theta^*)),$$
where $X_{j,tT}:=W_{t,T}\phi_j(Y_t)-E[W_{t,T}\phi_j(Y_t)]$. Thus, we have:
$$\|\nu_T(\theta)-\nu_T(\theta^*)\|\le\left(\sum_{j=1}^\infty\lambda_j\left\|T^{-1/2}\sum_{t=1}^TX_{j,tT}\right\|^2\right)^{1/2}\left(\sum_{j=1}^\infty\lambda_j^{-1}\|c_j(\theta)-c_j(\theta^*)\|^2\right)^{1/2}.\qquad(B.19)$$
Let $d(\cdot,\cdot)$ denote the metric on $\Theta$ defined by:
$$d(\theta,\theta^*)=\left(\sum_{j=1}^\infty\lambda_j^{-1}\|c_j(\theta)-c_j(\theta^*)\|^2\right)^{1/2},\qquad\theta,\theta^*\in\Theta.$$
For any $\eta,\delta>0$, we have:
$$\limsup_{T\to\infty}P^*\left[\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)\le\delta}\|\nu_T(\theta)-\nu_T(\theta^*)\|>\eta\right]\le\frac1{\eta^2}\limsup_{T\to\infty}E^*\left[\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)\le\delta}\|\nu_T(\theta)-\nu_T(\theta^*)\|^2\right]\le\frac{\delta^2}{\eta^2}\limsup_{T\to\infty}\sum_{j=1}^\infty\lambda_jE\left[\left\|T^{-1/2}\sum_{t=1}^TX_{j,tT}\right\|^2\right],\qquad(B.20)$$
using (B.19). Since:
$$E\left[\left\|T^{-1/2}\sum_{t=1}^TX_{j,tT}\right\|^2\right]=\operatorname{Tr}\left(E\left[\left(T^{-1/2}\sum_{t=1}^TX_{j,tT}\right)\left(T^{-1/2}\sum_{t=1}^TX_{j,tT}\right)'\right]\right)=\operatorname{Tr}\left(E\left[X_{j,tT}X_{j,tT}'\right]\right)+\sum_{|k|=1}^{T-1}\left(1-\frac{|k|}T\right)\operatorname{Tr}\left(E\left[X_{j,tT}X_{j,t-k,T}'\right]\right),\qquad(B.21)$$
we can study the asymptotic behaviour of the different terms in the decomposition.
Lemma B.7: Under Assumptions A.5, A.8-A.9 and A.17-A.18,
$$E\left[X_{j,tT}X_{j,tT}'\right]=\begin{pmatrix}V\left[Z_t\phi_j(Y_t)\right]&0\\ 0&w^2E\left[\phi_j(Y_t)^2\,|\,X_t=x_0\right]f(x_0)\,Id_{K_2+L+1}\end{pmatrix}+u_{j,T},$$
$$E\left[X_{j,tT}X_{j,t-k,T}'\right]=\begin{pmatrix}\operatorname{Cov}\left[Z_t\phi_j(Y_t),Z_{t-k}\phi_j(Y_{t-k})\right]&0\\ 0&0\end{pmatrix}+u_{j,kT},\qquad k\ne0,$$
where $V[Z_t\phi_j(Y_t)]:=E[Z_tZ_t'\phi_j(Y_t)^2]-E[Z_t\phi_j(Y_t)]E[Z_t\phi_j(Y_t)]'$, $\operatorname{Cov}[Z_t\phi_j(Y_t),Z_{t-k}\phi_j(Y_{t-k})]:=E[Z_tZ_{t-k}'\phi_j(Y_t)\phi_j(Y_{t-k})]-E[Z_t\phi_j(Y_t)]E[Z_{t-k}\phi_j(Y_{t-k})]'$, and:
$$\sup_j\|u_{j,T}\|=o(1),\qquad\sup_j\sum_{k=1}^{T-1}\|u_{j,kT}\|=o(1).$$
Proof: i) We have:
$$E\left[X_{j,tT}X_{j,tT}'\right]=\begin{pmatrix}E[Z_tZ_t'\phi_j(Y_t)^2]-E[Z_t\phi_j(Y_t)]E[Z_t\phi_j(Y_t)]'&*\\ *&h_T^{-d}V\left[\phi_j(Y_t)K\!\left(\frac{X_t-x_0}{h_T}\right)\right]Id_{K_2+L+1}\end{pmatrix}.$$
Let us consider the lower right block. The term:
$$h_T^{-d}E\left[\phi_j(Y_t)K\!\left(\frac{X_t-x_0}{h_T}\right)\right]=\int E\left[\phi_j(Y_t)\,|\,X_t=x_0+h_Tu\right]f(x_0+h_Tu)K(u)\,du,$$
is bounded uniformly in $j\in\mathbb N$ from Assumption A.17 and the Cauchy-Schwarz inequality. Moreover, from a standard bias expansion and Assumption A.17, with $\varphi_j(x):=E[\phi_j(Y_t)^2\,|\,X_t=x]f(x)$:
$$h_T^{-d}E\left[\phi_j(Y_t)^2K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right]=\int\varphi_j(x_0+h_Tu)K^2(u)\,du=w^2\varphi_j(x_0)+O\left(\sup_j\|D^2\varphi_j\|_\infty h_T^2\right).$$
Thus:
$$h_T^{-d}V\left[\phi_j(Y_t)K\!\left(\frac{X_t-x_0}{h_T}\right)\right]=w^2E\left[\phi_j(Y_t)^2\,|\,X_t=x_0\right]f(x_0)+o(1),$$
uniformly in $j\in\mathbb N$.
ii) We have:
$$E\left[X_{j,tT}X_{j,t-k,T}'\right]=\begin{pmatrix}\Gamma^{11}_{kT,j}&*\\ *&\Gamma^{22}_{kT,j}Id_{K_2+L+1}\end{pmatrix},$$
where:
$$\Gamma^{11}_{kT,j}=E\left[Z_tZ_{t-k}'\phi_j(Y_t)\phi_j(Y_{t-k})\right]-E\left[Z_t\phi_j(Y_t)\right]E\left[Z_{t-k}\phi_j(Y_{t-k})\right]',\qquad\Gamma^{22}_{kT,j}=h_T^{-d}\operatorname{Cov}\left(\phi_j(Y_t)K\!\left(\frac{X_t-x_0}{h_T}\right),\phi_j(Y_{t-k})K\!\left(\frac{X_{t-k}-x_0}{h_T}\right)\right).$$
Let us consider $\Gamma^{22}_{kT,j}$. We can use the same arguments as in the proof of Lemma B.4 to get bounds uniform in $j\in\mathbb N$ from Assumptions A.5, A.8, A.9 and A.18. Thus,
$$\sum_{k=1}^{T-1}h_T^{-d}\left|\operatorname{Cov}\left(\phi_j(Y_t)K\!\left(\frac{X_t-x_0}{h_T}\right),\phi_j(Y_{t-k})K\!\left(\frac{X_{t-k}-x_0}{h_T}\right)\right)\right|=o(1),$$
uniformly in $j\in\mathbb N$. The proof is concluded.
From the Davydov inequality, we have:
$$\left\|\operatorname{Cov}\left(Z_t\phi_j(Y_t),Z_{t-k}\phi_j(Y_{t-k})\right)\right\|\le\text{const}\cdot\rho^k\,E\left[\|Z_t\phi_j(Y_t)\|^r\right]^{2/r},\qquad(B.22)$$
uniformly in $j\in\mathbb N$, for some $0<\rho<1$ and $r>2$ as in Assumption A.16. Thus, from Lemma B.7 and equations (B.21), (B.22) we get:
$$\limsup_{T\to\infty}\sum_{j=1}^\infty\lambda_jE\left[\left\|T^{-1/2}\sum_{t=1}^TX_{j,tT}\right\|^2\right]\le\sum_{j=1}^\infty\lambda_j\left\{\operatorname{Tr}V\left[Z_t\phi_j(Y_t)\right]+C_1E\left[\|Z_t\phi_j(Y_t)\|^r\right]^{2/r}+C_2E\left[\phi_j(Y_t)^2\,|\,X_t=x_0\right]\right\},$$
for some constants $C_1,C_2<\infty$. Now:
$$\operatorname{Tr}V\left[Z_t\phi_j(Y_t)\right]=\operatorname{Tr}E\left[Z_tZ_t'\phi_j(Y_t)^2\right]-\operatorname{Tr}\left(E\left[Z_t\phi_j(Y_t)\right]E\left[Z_t\phi_j(Y_t)\right]'\right).$$
Since:
$$\operatorname{Tr}E\left[Z_tZ_t'\phi_j(Y_t)^2\right]=E\left[\|Z_t\phi_j(Y_t)\|^2\right],\qquad\operatorname{Tr}\left(E\left[Z_t\phi_j(Y_t)\right]E\left[Z_t\phi_j(Y_t)\right]'\right)=\left\|E\left[Z_t\phi_j(Y_t)\right]\right\|^2,$$
we have:
$$\operatorname{Tr}V\left[Z_t\phi_j(Y_t)\right]\le2E\left[\|Z_t\phi_j(Y_t)\|^2\right].$$
Thus, we get:
$$\limsup_{T\to\infty}\sum_{j=1}^\infty\lambda_jE\left[\left\|T^{-1/2}\sum_{t=1}^TX_{j,tT}\right\|^2\right]\le C_3\sum_{j=1}^\infty\lambda_j\left\{E\left[\|Z_t\phi_j(Y_t)\|^2\right]+E\left[\|Z_t\phi_j(Y_t)\|^r\right]^{2/r}+E\left[\phi_j(Y_t)^2\,|\,X_t=x_0\right]\right\}=:C_4<\infty,$$
for some constants $C_3,C_4<\infty$ from Assumption A.16. We deduce from (B.20):
$$\limsup_{T\to\infty}P^*\left[\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)\le\delta}\|\nu_T(\theta)-\nu_T(\theta^*)\|>\eta\right]\le\frac{C_4}{\eta^2}\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)\le\delta}\sum_{j=1}^\infty\lambda_j^{-1}\|c_j(\theta)-c_j(\theta^*)\|^2.$$
The conclusion follows from:
$$\lim_{\delta\to0}\sup_{\theta,\theta^*\in\Theta:\,d(\theta,\theta^*)\le\delta}\sum_{j=1}^\infty\lambda_j^{-1}\|c_j(\theta)-c_j(\theta^*)\|^2=0,$$
which is proved by Andrews (1991), Equation (2.6).
B.1.4 Bounds on covariance terms
In this section we derive a bound for the covariance term $\sum_{l:|l|=1}^{m-1}(1-|l|/m)\gamma^l_{T,ij}$ in equation (B.11). This is done by deriving two bounds for the covariance terms.
i) For this purpose, let us define the functions:
$$\varphi_i(x)=E\left[\tilde g_2(Y_t;\theta_i)\,|\,X_t=x\right]f(x),\qquad\varphi_{l,ij}(x,\xi)=E\left[\tilde g_2(Y_t;\theta_i)\tilde g_2(Y_{t-l};\theta_j)'\,|\,X_t=x,X_{t-l}=\xi\right]f_{t,t-l}(x,\xi).$$
We can write:
$$\operatorname{Cov}\left(\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right),\tilde g_2(Y_{t-l};\theta_j)K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right)=\int\!\!\int\varphi_{l,ij}(x,\xi)K\!\left(\frac{x-x_0}{h_T}\right)K\!\left(\frac{\xi-x_0}{h_T}\right)dx\,d\xi-\left(\int\varphi_i(x)K\!\left(\frac{x-x_0}{h_T}\right)dx\right)\left(\int\varphi_j(\xi)'K\!\left(\frac{\xi-x_0}{h_T}\right)d\xi\right)$$
$$=h_T^{2d}\left[\int\!\!\int\varphi_{l,ij}(x_0+h_Tu,x_0+h_Tv)K(u)K(v)\,du\,dv-\left(\int\varphi_i(x_0+h_Tu)K(u)\,du\right)\left(\int\varphi_j(x_0+h_Tv)'K(v)\,dv\right)\right].$$
From Assumptions A.6-A.7 and A.14, and by the Cauchy-Schwarz inequality, the functions $\varphi_i$ and $\varphi_{l,ij}$ are bounded uniformly in $l\in\mathbb N$. We get:
$$\left\|\gamma^l_{T,ij}\right\|\le\left(\sup_l\|\varphi_{l,ij}\|_\infty+\|\varphi_i\|_\infty\|\varphi_j\|_\infty\right)\|K\|_{L^1}^2\,h_T^d\le C_1h_T^d.\qquad(B.23)$$
ii) From the strong mixing property (Assumption A.5) and the Davydov inequality:
$$\left\|\operatorname{Cov}\left(\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right),\tilde g_2(Y_{t-l};\theta_j)K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right)\right\|\le\text{const}\cdot\rho^l\,E\left[\left\|\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right)\right\|^r\right]^{1/r}E\left[\left\|\tilde g_2(Y_{t-l};\theta_j)K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right\|^r\right]^{1/r},$$
for some $0<\rho<1$ and $r>2$. Moreover we have:
$$E\left[\left\|\tilde g_2(Y_t;\theta_i)K\!\left(\frac{X_t-x_0}{h_T}\right)\right\|^r\right]\le\|K\|_\infty^rE\left[\|\tilde g_2(Y_t;\theta_i)\|^r\right]=:\|K\|_\infty^r\,c_i<\infty,$$
from Assumptions A.8 and A.11. We deduce:
$$\left\|\gamma^l_{T,ij}\right\|\le C_2\,\rho^l\,h_T^{-d},\qquad(B.24)$$
for some constant $C_2<\infty$ (that depends on $i$ and $j$).
Let us now define $L_T=\lfloor h_T^{-d/2}\rfloor\to\infty$. From (B.23) and (B.24) we have:
$$\left\|\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}m\right)\gamma^l_{T,ij}\right\|\le2\sum_{l=1}^{m-1}\left\|\gamma^l_{T,ij}\right\|\le2\left(\sum_{l=1}^{L_T}C_1h_T^d+\sum_{l=L_T+1}^\infty C_2\rho^lh_T^{-d}\right)=2\left(C_1L_Th_T^d+C_2\frac{\rho^{L_T+1}}{1-\rho}h_T^{-d}\right)\le\text{const}\left(1/L_T+L_T^2\rho^{L_T+1}\right)\to0.$$
We deduce $\sum_{l:|l|=1}^{m-1}(1-|l|/m)\gamma^l_{T,ij}=o(1)$.
B.2 Proof of consistency
In this Section we prove that $P[\|\hat\theta_T-\theta_0\|\ge\varepsilon]\to0$, as $T\to\infty$, for any $\varepsilon>0$. We have:
$$P\left[\|\hat\theta_T-\theta_0\|\ge\varepsilon\right]\le P\left[\inf_{\theta\in B:\|\theta-\theta_0\|\ge\varepsilon}Q_T(\theta)\le Q_T(\hat\theta_T)\right]\le P\left[\inf_{\theta\in B:\|\theta-\theta_0\|\ge\varepsilon}Q_T(\theta)\le Q_T(\theta_0)\right].\qquad(B.25)$$
Let us derive the orders of the RHS term and the LHS term inside the probability. Write the criterion as:
$$Q_T(\theta)=\left[\hat\nu_T(\theta)+m_T(\theta)\right]'\Omega\left[\hat\nu_T(\theta)+m_T(\theta)\right],\qquad\theta\in B.\qquad(B.26)$$
Then, since $\hat\nu_T(\theta_0)=O_p(1)$ from Lemma A.1, and $m_T(\theta_0)=0$, we get:
$$Q_T(\theta_0)=O_p(1).\qquad(B.27)$$
Let us now derive the order of $\inf_{\theta\in B:\|\theta-\theta_0\|\ge\varepsilon}Q_T(\theta)$. From Lemma A.1 and the Continuous Mapping Theorem [CMT, Billingsley (1968)], we have:
$$\sup_{\theta\in B}\hat\nu_T(\theta)'\Omega\,\hat\nu_T(\theta)=O_p(1),\qquad\sup_{\theta\in B}\left|m_T(\theta)'\Omega\,\hat\nu_T(\theta)\right|=O_p(\sqrt T).$$
From (B.26) it follows that:
$$Q_T(\theta)=m_T(\theta)'\Omega\,m_T(\theta)+O_p(\sqrt T),$$
uniformly in $\theta\in B$. Now, let $\underline\mu>0$ be the smallest eigenvalue of $\Omega$ (Assumption A.20). We get:
$$m_T(\theta)'\Omega\,m_T(\theta)\ge\underline\mu\,T\left(\|E[g_1(Y_t,X_t;\theta)]\|^2+h_T^d\|E[g_2(Y_t;\theta)\,|\,X_t=x_0]\|^2+h_T^d\|E[a(Y_t;\theta)\,|\,X_t=x_0]-\beta\|^2\right)\ge\underline\mu\,Th_T^d\left(\|E[g_1(Y_t,X_t;\theta)]\|^2+\|E[g_2(Y_t;\theta)\,|\,X_t=x_0]\|^2+\|E[a(Y_t;\theta)\,|\,X_t=x_0]-\beta\|^2\right),$$
for $T$ large, and any $\theta\in B$. From continuity of the moment functions (Assumption A.19), compactness of $B$ (Assumption A.4) and global identification (Assumption A.2), we have:
$$\inf_{\theta\in B:\|\theta-\theta_0\|\ge\varepsilon}m_T(\theta)'\Omega\,m_T(\theta)\ge C\,Th_T^d,$$
for a constant $C=C_\varepsilon>0$. From bandwidth Assumption A.9, we have $\sqrt T=o(Th_T^d)$. Thus, we get:
$$\inf_{\theta\in B:\|\theta-\theta_0\|\ge\varepsilon}Q_T(\theta)\ge\frac12C\,Th_T^d,\qquad(B.28)$$
with probability approaching 1. Since $Th_T^d\to\infty$ from Assumption A.9, and by using (B.25), (B.27) and (B.28), the conclusion follows.
B.3 Proof of Lemma A.2
We use the following Lemma.
Lemma B.8: Under Assumptions A.1-A.20 and A.24: $\hat\theta_T-\theta_0=O_p\!\left(1/\sqrt{Th_T^d}\right)$.
Proof: We follow the approach in the proof of Lemma A1 in Stock, Wright (2000). Since $\hat\theta_T$ is the minimizer of $Q_T$ we have:
$$Q_T(\hat\theta_T)-Q_T(\theta_0)=\left[\hat\nu_T(\hat\theta_T)+m_T(\hat\theta_T)\right]'\Omega\left[\hat\nu_T(\hat\theta_T)+m_T(\hat\theta_T)\right]-\hat\nu_T(\theta_0)'\Omega\,\hat\nu_T(\theta_0)\le0,$$
that is,
$$m_T(\hat\theta_T)'\Omega\,m_T(\hat\theta_T)+2\hat\nu_T(\hat\theta_T)'\Omega\,m_T(\hat\theta_T)+d_{1,T}\le0,\qquad(B.29)$$
where $d_{1,T}=\hat\nu_T(\hat\theta_T)'\Omega\,\hat\nu_T(\hat\theta_T)-\hat\nu_T(\theta_0)'\Omega\,\hat\nu_T(\theta_0)$. By using:
$$m_T(\hat\theta_T)'\Omega\,m_T(\hat\theta_T)\ge\underline\mu\,\|m_T(\hat\theta_T)\|^2,\qquad\left|\hat\nu_T(\hat\theta_T)'\Omega\,m_T(\hat\theta_T)\right|\le\left[\hat\nu_T(\hat\theta_T)'\Omega\,\hat\nu_T(\hat\theta_T)\right]^{1/2}\left[m_T(\hat\theta_T)'\Omega\,m_T(\hat\theta_T)\right]^{1/2},$$
inequality (B.29) implies:
$$\|m_T(\hat\theta_T)\|^2-2d_{2,T}\|m_T(\hat\theta_T)\|+d_{3,T}\le0,$$
where:
$$d_{2,T}=\left[\hat\nu_T(\hat\theta_T)'\Omega\,\hat\nu_T(\hat\theta_T)\right]^{1/2}/\underline\mu,\qquad d_{3,T}=d_{1,T}/\underline\mu.$$
We deduce:
$$\|m_T(\hat\theta_T)\|\le d_{2,T}+\left(d_{2,T}^2-d_{3,T}\right)^{1/2}.$$
Let us now derive the order of the RHS. From Lemma A.1 and the CMT we have:
$$d_{2,T}\le\sup_{\theta\in\Theta}\left[\hat\nu_T(\theta)'\Omega\,\hat\nu_T(\theta)\right]^{1/2}/\underline\mu=O_p(1),\qquad|d_{3,T}|\le2\sup_{\theta\in\Theta}\left|\hat\nu_T(\theta)'\Omega\,\hat\nu_T(\theta)\right|/\underline\mu=O_p(1).$$
We get $\|m_T(\hat\theta_T)\|=O_p(1)$. Define:
$$G(\theta)=\left(E[g_1(X_t,Y_t;\theta)]',\;E[g_2(Y_t;\theta)\,|\,x_0]',\;E[a(Y_t;\theta)\,|\,x_0]'-\beta'\right)',$$
for $\theta\in B$. Since $\|m_T(\theta)\|^2\ge Th_T^d\,\|G(\theta)\|^2$, $\theta\in B$, we deduce $G(\hat\theta_T)=O_p\!\left(1/\sqrt{Th_T^d}\right)$. By the mean-value theorem we can write:
$$\frac{\partial G}{\partial\theta'}(\tilde\theta_T)\left(\hat\theta_T-\theta_0\right)=O_p\!\left(1/\sqrt{Th_T^d}\right),$$
where $\tilde\theta_T$ is between $\hat\theta_T$ and $\theta_0$. (More precisely, the mean-value theorem is applied separately for any component of function $G$, and the intermediary point $\tilde\theta_T$ can differ across components.) Since $\hat\theta_T$ converges to $\theta_0$ by consistency (Section B.2), and $\partial G/\partial\theta'(\theta)$ is continuous by Assumption A.24, we have:
$$\frac{\partial G}{\partial\theta'}(\tilde\theta_T)\stackrel p\longrightarrow\frac{\partial G}{\partial\theta'}(\theta_0),$$
where $\partial G/\partial\theta'(\theta_0)$ has full rank, by the local identification condition in Assumption A.3. The conclusion follows.
Let us now prove Lemma A.2. From Lemma B.8, it is enough to show that $\operatorname{plim}_{T\to\infty}\frac{\partial\hat g_T}{\partial\theta'}(\theta_T)R_T=J_0$, for any $\theta_T$ such that $\theta_T-\theta_0=O_p\!\left(1/\sqrt{Th_T^d}\right)$. We have:
$$\frac{\partial\hat g_T}{\partial\theta'}(\theta_T)R_T=\begin{pmatrix}\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right]R_{1,Z}&h_T^{d/2}\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right]R_{2,Z}&0\\ h_T^{d/2}\hat E\left[\frac{\partial g_2}{\partial\theta'}(\theta_T)\,\Big|\,x_0\right]R_{1,Z}&\hat E\left[\frac{\partial g_2}{\partial\theta'}(\theta_T)\,\Big|\,x_0\right]R_{2,Z}&0\\ h_T^{d/2}\hat E\left[\frac{\partial a}{\partial\theta'}(\theta_T)\,\Big|\,x_0\right]R_{1,Z}&\hat E\left[\frac{\partial a}{\partial\theta'}(\theta_T)\,\Big|\,x_0\right]R_{2,Z}&Id_L\end{pmatrix}.$$
Thus, we have to show:
i) $\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right]\stackrel p\to E\left[\frac{\partial g_1}{\partial\theta'}(\theta_0)\right]$;
ii) $\hat E\left[\frac{\partial g_2}{\partial\theta'}(\theta_T)\,\Big|\,x_0\right]\stackrel p\to E\left[\frac{\partial g_2}{\partial\theta'}(\theta_0)\,\Big|\,x_0\right]$, and similarly for function $a$;
iii) $h_T^{d/2}\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right]R_{2,Z}\stackrel p\to0$.
Let us now prove these results.
i) From Assumptions A.4, A.5, A.21 and A.22, the ULLN [see Pötscher, Prucha (1989), Corollary 1] implies that $\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right]\stackrel p\to E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right]$ uniformly in $\theta\in\Theta$. Moreover, $E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right]$ is continuous w.r.t. $\theta$ by Assumption A.24. Then i) follows.
ii) Let $g_{2,i}$ denote the $i$-th component of function $g_2$, for $i=1,\ldots,K_2+L$. We have:
$$\hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_T)\,\Big|\,x_0\right]=\hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,\Big|\,x_0\right]+\hat E\left[\frac{\partial^2g_{2,i}}{\partial\theta\partial\theta'}(\ddot\theta_T)\,\Big|\,x_0\right](\theta_T-\theta_0),$$
where $\ddot\theta_T$ is between $\theta_T$ and $\theta_0$. Under Assumptions A.5-A.9 and A.23 one can show that $\hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,\Big|\,x_0\right]\stackrel p\to E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,\Big|\,x_0\right]$ and $\hat E\left[\frac{\partial^2g_{2,i}}{\partial\theta\partial\theta'}(\ddot\theta_T)\,\Big|\,x_0\right]=O_p(1)$. Then ii) follows.
iii) Let $g_{1,i}$ denote the $i$-th component of function $g_1$, $i=1,\ldots,K_1$. We have:
$$h_T^{d/2}\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_T)\right]R_{2,Z}=\frac{h_T^{d/2}}{\sqrt T}\,\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right]R_{2,Z}+h_T^{d/2}(\theta_T-\theta_0)'\hat E\left[\frac{\partial^2g_{1,i}}{\partial\theta\partial\theta'}(\ddot\theta_T)\right]R_{2,Z},\qquad(B.30)$$
where $\ddot\theta_T$ is between $\theta_T$ and $\theta_0$. Let us derive the orders of the two terms in the RHS of (B.30). For the first one:
$$\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right]R_{2,Z}=\frac1{\sqrt T}\sum_{t=1}^T\frac{\partial g_{1,i}}{\partial\theta'}(Y_t,X_t;\theta_0)R_{2,Z},$$
where $E\left[\frac{\partial g_{1,i}}{\partial\theta'}(Y_t,X_t;\theta_0)\right]R_{2,Z}=0$. From Assumptions A.5 and A.22, the CLT for mixing processes [e.g. Herrndorf (1984), Corollary 1] implies:
$$\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right]R_{2,Z}=O_p(1).$$
Let us now consider the second term in (B.30). From Assumptions A.4, A.5, A.22 and A.24, the ULLN implies:
$$\hat E\left[\frac{\partial^2g_{1,i}}{\partial\theta\partial\theta'}(\ddot\theta_T)\right]=O_p(1).$$
Thus, from (B.30) we get:
$$h_T^{d/2}\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_T)\right]R_{2,Z}=O_p\!\left(\frac1{\sqrt{Th_T^d}}+\frac1{\sqrt T}\right)=o_p(1),$$
from the bandwidth condition in Assumption A.9. The proof is concluded.
B.4 Proof of Corollary 6
If the bandwidth is such that $c=\lim Th_T^{2m+d}=0$, from (A.6) the optimal weighting matrix for a given instrument is $\Omega^*=V_0^{-1}$. The proof that $Z^*=E\left[\frac{\partial g}{\partial\theta'}(Y;\theta_0)\,\Big|\,X\right]'W(X)$ is still an optimal instrument is similar to the proof of Proposition 3, replacing $M(Z,c,a)$ with $V(Z,a)=\frac{w^2}{f_X(x_0)}\,\tilde e'\left(J_{0,Z}'\Omega_0J_{0,Z}\right)^{-1}\tilde e$, which is the asymptotic variance of $\hat\beta_T$. Thus, the bias-free kernel nonparametric efficiency bound is $B(a,x_0)=\frac{w^2}{f_X(x_0)}\,\tilde e'\left(J_0'\Omega_0^{-1}J_0\right)^{-1}\tilde e$. Corollary 6 follows from the block inversion formula.
B.5 Proof of Lemma A.3
B.5.1 Asymptotic expansion of the concentrated objective function
Since the conditional moment restrictions are satisfied asymptotically, we have $\hat\lambda_T\stackrel p\to0$ when $T\to\infty$. Therefore, we can consider the second-order asymptotic expansion of function $L_T^c(\theta,\lambda)$ in a neighbourhood of $\theta=\theta_0$, $\lambda=0$. Let us first derive the expansion w.r.t. $\lambda$. We have:
$$\log\hat E\left[\exp\left(\lambda'g_2^*(\theta)\right)\,\big|\,x_0\right]\simeq\log\left(1+\lambda'\hat E\left(g_2^*(\theta)\,|\,x_0\right)+\frac12\lambda'\hat E\left(g_2^*(\theta)g_2^*(\theta)'\,|\,x_0\right)\lambda\right)\simeq\lambda'\hat E\left(g_2^*(\theta)\,|\,x_0\right)+\frac12\lambda'\hat V\left(g_2^*(\theta)\,|\,x_0\right)\lambda.$$
Therefore, we can asymptotically concentrate w.r.t. $\lambda$:
$$\lambda\simeq-\hat V\left(g_2^*(\theta)\,|\,x_0\right)^{-1}\hat E\left(g_2^*(\theta)\,|\,x_0\right),\qquad(B.31)$$
and the asymptotic expansion of the concentrated objective function becomes:
$$L_T^c(\theta)\simeq-\frac1T\sum_{t=1}^T\hat E\left(g(\theta)\,|\,x_t\right)'\hat V\left(g(\theta)\,|\,x_t\right)^{-1}\hat E\left(g(\theta)\,|\,x_t\right)-\frac12h_T^d\,\hat E\left(g_2^*(\theta)\,|\,x_0\right)'\hat V\left(g_2^*(\theta)\,|\,x_0\right)^{-1}\hat E\left(g_2^*(\theta)\,|\,x_0\right).$$
Criterion $L_T^c(\theta)$ multiplied by $T$ is asymptotically equivalent to the criterion of the kernel moment estimator (see Definition 4) with optimal instrument and weighting matrix.
B.5.2 Asymptotic expansion of $\hat\theta_T$
Let us now consider the expansion around $\theta=\theta_0$. We have:
$$\hat E\left(g(\theta)\,|\,x_t\right)\simeq\hat E\left(g(\theta_0)\,|\,x_t\right)+E\left(\frac{\partial g}{\partial\theta'}(\theta_0)\,\Big|\,x_t\right)(\theta-\theta_0),\qquad\hat V\left(g(\theta)\,|\,x_t\right)\simeq V\left(g(\theta_0)\,|\,x_t\right),$$
and similarly for the expectations of function $g_2^*$. Thus, we get:
$$L_T^c(\theta)\simeq-\frac1T\sum_{t=1}^T\left[\hat E(g|x_t)+E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)(\theta-\theta_0)\right]'V(g|x_t)^{-1}\left[\hat E(g|x_t)+E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)(\theta-\theta_0)\right]-\frac12h_T^d\left[\hat E(g_2^*|x_0)+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)(\theta-\theta_0)\right]'V(g_2^*|x_0)^{-1}\left[\hat E(g_2^*|x_0)+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)(\theta-\theta_0)\right],$$
where functions $g,g_2^*$ are evaluated at $\theta_0$. We have:
$$E\left(\frac{\partial g}{\partial\theta'}(\theta_0)\Big|x_t\right)(\theta-\theta_0)=E\left(\frac{\partial g}{\partial\theta'}(\theta_0)\Big|x_t\right)R_1(\theta_1-\theta_{1,0}).$$
We get:
$$L_T^c(\theta)\simeq-\frac1T\sum_{t=1}^T\left[\hat E(g|x_t)+E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)R_1(\theta_1-\theta_{1,0})\right]'V(g|x_t)^{-1}\left[\hat E(g|x_t)+E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)R_1(\theta_1-\theta_{1,0})\right]-\frac12h_T^d\left[\hat E(g_2^*|x_0)+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)R_1(\theta_1-\theta_{1,0})+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)R_2(\theta_2-\theta_{2,0})\right]'V(g_2^*|x_0)^{-1}\left[\hat E(g_2^*|x_0)+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)R_1(\theta_1-\theta_{1,0})+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)R_2(\theta_2-\theta_{2,0})\right].$$
The asymptotic expansion of $\hat\theta_{1,T}$ is obtained from the maximization of the first term in $L_T^c(\theta)$, since the contribution of the second term is asymptotically negligible. We get:
$$\sqrt T\left(\hat\theta_{1,T}-\theta_{1,0}\right)\simeq\left[\frac1T\sum_{t=1}^TR_1'E\left(\frac{\partial g'}{\partial\theta}\Big|x_t\right)V(g|x_t)^{-1}E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)R_1\right]^{-1}\frac1{\sqrt T}\sum_{t=1}^TR_1'E\left(\frac{\partial g'}{\partial\theta}\Big|x_t\right)V(g|x_t)^{-1}\int g(y;\theta_0)\hat f(y|x_t)\,dy.$$
Thus:
$$\sqrt T\left(\hat\theta_{1,T}-\theta_{1,0}\right)\simeq\left[R_1'E\left(E\left(\frac{\partial g'}{\partial\theta}\Big|x_t\right)V(g|x_t)^{-1}E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)\right)R_1\right]^{-1}\sqrt T\int\!\!\int R_1'E\left(\frac{\partial g'}{\partial\theta}\Big|x\right)V(g|x)^{-1}g(y;\theta_0)\hat f(y,x)\,dx\,dy\simeq\left[R_1'E\left(E\left(\frac{\partial g'}{\partial\theta}\Big|x_t\right)V(g|x_t)^{-1}E\left(\frac{\partial g}{\partial\theta'}\Big|x_t\right)\right)R_1\right]^{-1}\frac1{\sqrt T}\sum_{t=1}^TR_1'E\left(\frac{\partial g'}{\partial\theta}\Big|x_t\right)V(g|x_t)^{-1}g(y_t;\theta_0).$$
The bias term induced by the kernel estimator is asymptotically negligible since $Th_T^{d+2m}=o(1)$. The asymptotic expansion of $\hat\theta_{2,T}$ can be deduced from the maximization of the second component of $L_T^c(\theta)$. Estimator $\hat\theta_{2,T}$ converges at a nonparametric rate, and terms involving $\hat\theta_{1,T}-\theta_{1,0}$ can be neglected. We get:
$$\sqrt{Th_T^d}\left(\hat\theta_{2,T}-\theta_{2,0}\right)\simeq\left[R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}E\left(\frac{\partial g_2}{\partial\theta'}\Big|x_0\right)R_2\right]^{-1}R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}\sqrt{Th_T^d}\int g_2(y;\theta_0)\hat f(y|x_0)\,dy.$$
Then, point i) of Lemma A.3 is proved.
B.5.3 Asymptotic expansion of $\hat\lambda_T$
We have from (B.31):
$$\hat\lambda_T\simeq-\hat V\left(g_2^*(\hat\theta_T)\,|\,x_0\right)^{-1}\hat E\left(g_2^*(\hat\theta_T)\,|\,x_0\right)\simeq-V\left(g_2^*(\theta_0)\,|\,x_0\right)^{-1}\hat E\left(g_2^*(\hat\theta_T)\,|\,x_0\right).$$
Moreover,
$$\hat E\left(g_2^*(\hat\theta_T)\,|\,x_0\right)\simeq\int g_2^*(y;\theta_0)\hat f(y|x_0)\,dy+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)\left(\hat\theta_T-\theta_0\right)\simeq\int g_2^*(y;\theta_0)\hat f(y|x_0)\,dy+E\left(\frac{\partial g_2^*}{\partial\theta'}\Big|x_0\right)R_2\left(\hat\theta_{2,T}-\theta_{2,0}\right)$$
(since the contribution of $\hat\theta_{1,T}-\theta_{1,0}$ is asymptotically negligible)
$$\simeq(Id-M)\int g_2^*(y;\theta_0)\hat f(y|x_0)\,dy,$$
where $M$ is the matrix in (A.19). Then:
$$\hat\lambda_T\simeq-V\left(g_2^*(\theta_0)\,|\,x_0\right)^{-1}(Id-M)\int g_2^*(y;\theta_0)\hat f(y|x_0)\,dy,$$
and point ii) of Lemma A.3 is proved.
B.6 Proof of Corollary 8
From Appendix A.1.4, equation (A.5), the asymptotic distribution of the optimal kernel moment estimator of $\theta_0$ is such that:
$$\left(\sqrt T(\hat\theta_{1,T}-\theta_{1,0})',\;\sqrt{Th_T^d}(\hat\theta_{2,T}-\theta_{2,0})',\;\sqrt{Th_T^d}(\hat\beta_T-\beta_0)'\right)'=\left(J_0'\Omega_0J_0\right)^{-1}J_0'\Omega_0\,\hat g_T(\theta_0)+o_p(1),\qquad(B.32)$$
where $\Omega_0=V_0^{-1}$, matrix $V_0$ is given in (A.2), and:
$$J_0=\begin{pmatrix}E\left[\frac{\partial g_1}{\partial\theta'}\right]R_1&0&0\\ 0&E\left[\frac{\partial g_2}{\partial\theta'}\Big|x_0\right]R_2&0\\ 0&E\left[\frac{\partial a}{\partial\theta'}\Big|x_0\right]R_2&Id_L\end{pmatrix}=:\begin{pmatrix}E\left[\frac{\partial g_1}{\partial\theta'}\right]R_1&0\\ 0&\tilde J_0\end{pmatrix}.$$
For $Z^*=E\left[\frac{\partial g}{\partial\theta'}(Y;\theta_0)\,\Big|\,X\right]'V\left(g(Y;\theta_0)\,|\,X\right)^{-1}$, we have:
$$E\left[\frac{\partial g_1}{\partial\theta'}\right]=E\left[E\left(\frac{\partial g'}{\partial\theta}\Big|X\right)V(g|X)^{-1}E\left(\frac{\partial g}{\partial\theta'}\Big|X\right)\right]=V(g_1).$$
Thus:
$$\left(J_0'\Omega_0J_0\right)^{-1}J_0'\Omega_0\,\hat g_T(\theta_0)=\begin{pmatrix}\left(R_1'V(g_1)R_1\right)^{-1}R_1'\sqrt T\,\hat E[g_1]\\ \left(\tilde J_0'\tilde\Omega_0\tilde J_0\right)^{-1}\tilde J_0'\tilde\Omega_0\sqrt{Th_T^d}\,\hat E[g_2^*\,|\,x_0]\end{pmatrix},\qquad(B.33)$$
where $\tilde\Omega_0=V(g_2^*\,|\,x_0)^{-1}$. Let us now compute $\mu:=\left(\tilde J_0'\tilde\Omega_0\tilde J_0\right)^{-1}\tilde J_0'\tilde\Omega_0\,\hat E[g_2^*\,|\,x_0]$, where $\hat E[g_2^*\,|\,x_0]=\left(\hat E[g_2\,|\,x_0]',\hat E[a-\beta_0\,|\,x_0]'\right)'$. Let us denote $G:=E\left[\frac{\partial g_2}{\partial\theta'}\Big|x_0\right]R_2$, $A:=E\left[\frac{\partial a}{\partial\theta'}\Big|x_0\right]R_2$, and let $\tilde\Omega_0^{ij}$, $i,j=1,2$, denote the blocks of $\tilde\Omega_0$. Then $\mu=(\mu_1',\mu_2')'\in\mathbb R^s\times\mathbb R^L$ solves $\tilde J_0'\tilde\Omega_0\left(\tilde J_0\mu-\hat E[g_2^*\,|\,x_0]\right)=0$, that is:
$$G'\left[\tilde\Omega_0^{11}\left(G\mu_1-\hat E[g_2|x_0]\right)+\tilde\Omega_0^{12}\left(A\mu_1+\mu_2-\hat E[a-\beta_0|x_0]\right)\right]+A'\left[\tilde\Omega_0^{21}\left(G\mu_1-\hat E[g_2|x_0]\right)+\tilde\Omega_0^{22}\left(A\mu_1+\mu_2-\hat E[a-\beta_0|x_0]\right)\right]=0,$$
$$\tilde\Omega_0^{21}\left(G\mu_1-\hat E[g_2|x_0]\right)+\tilde\Omega_0^{22}\left(A\mu_1+\mu_2-\hat E[a-\beta_0|x_0]\right)=0.$$
Solving for $\mu_2$ in the second block equation, we get:
$$A\mu_1+\mu_2-\hat E[a-\beta_0|x_0]=-\left(\tilde\Omega_0^{22}\right)^{-1}\tilde\Omega_0^{21}\left(G\mu_1-\hat E[g_2|x_0]\right).$$
By replacing in the first block equation, and using $\Sigma_{0,11}^{-1}=\tilde\Omega_0^{11}-\tilde\Omega_0^{12}\left(\tilde\Omega_0^{22}\right)^{-1}\tilde\Omega_0^{21}$ from the formulas of the inverse of a block matrix, we get:
$$\mu_1=\left(G'\Sigma_{0,11}^{-1}G\right)^{-1}G'\Sigma_{0,11}^{-1}\hat E[g_2|x_0],\qquad\mu_2=\hat E[a-\beta_0|x_0]-A\mu_1+\Sigma_{0,21}\Sigma_{0,11}^{-1}\left(G\mu_1-\hat E[g_2|x_0]\right),$$
where $\Sigma_{0,11}$ and $\Sigma_{0,21}$ denote the corresponding blocks of $V(g_2^*\,|\,x_0)$. Thus, using (B.32), (B.33), $\Sigma_{0,11}=V(g_2\,|\,x_0)$ and $\Sigma_{0,21}=\operatorname{Cov}(a,g_2\,|\,x_0)$, we get:
$$\sqrt T\left(\hat\theta_{1,T}-\theta_{1,0}\right)=\left[R_1'E\left(E\left(\frac{\partial g'}{\partial\theta}\Big|X\right)V(g|X)^{-1}E\left(\frac{\partial g}{\partial\theta'}\Big|X\right)\right)R_1\right]^{-1}R_1'\sqrt T\,\hat E[g_1]+o_p(1),$$
$$\sqrt{Th_T^d}\left(\hat\theta_{2,T}-\theta_{2,0}\right)=\left[R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}E\left(\frac{\partial g_2}{\partial\theta'}\Big|x_0\right)R_2\right]^{-1}R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}\sqrt{Th_T^d}\,\hat E[g_2|x_0]+o_p(1),$$
and:
$$\sqrt{Th_T^d}\left(\hat\beta_T-\beta_0\right)=\sqrt{Th_T^d}\,\hat E[a-\beta_0|x_0]-\operatorname{Cov}(a,g_2|x_0)V(g_2|x_0)^{-1}\sqrt{Th_T^d}\,\hat E[g_2|x_0]-\left[E\left(\frac{\partial a}{\partial\theta'}\Big|x_0\right)R_2-\operatorname{Cov}(a,g_2|x_0)V(g_2|x_0)^{-1}E\left(\frac{\partial g_2}{\partial\theta'}\Big|x_0\right)R_2\right]\left[R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}E\left(\frac{\partial g_2}{\partial\theta'}\Big|x_0\right)R_2\right]^{-1}R_2'E\left(\frac{\partial g_2'}{\partial\theta}\Big|x_0\right)V(g_2|x_0)^{-1}\sqrt{Th_T^d}\,\hat E[g_2|x_0]+o_p(1).$$
The asymptotic expansions for $\hat\theta_{1,T}$, $\hat\theta_{2,T}$ correspond to the asymptotic expansions of the XMM estimators $\hat\theta_{1,T}$, $\hat\theta_{2,T}$ in Lemma A.3 (i). The conclusion follows.
B.7 Regularity conditions in the stochastic volatility model
In this Section, we discuss the technical regularity assumptions for the XMM estimator (see Appendix
A.1.1 in the paper) when the DGP P0 is compatible with the stochastic volatility model (3.6)-(3.8).
They concern the stationary distribution (Section B.7.1) and the existence of moments (Section
B.7.2).
B.7.1 Stationary distribution
Let us consider the process $X_t=(\tilde r_t,\sigma_t^2)'$, $t\in\mathbb Z$, where the dynamics of $\tilde r_t=r_t-r_{f,t}$ and $\sigma_t^2$ under the DGP $P_0$ are defined in Section 3.2. The Markov process $(X_t)$ is exponential affine:
$$E_0\left[e^{-z'X_{t+1}}\,\big|\,X_t\right]=E_0\left[e^{-u\tilde r_{t+1}-v\sigma^2_{t+1}}\,\big|\,X_t\right]=E_0\left[e^{-(\gamma_0u+v)\sigma^2_{t+1}}E_0\left(e^{-u\sigma_{t+1}\varepsilon_{t+1}}\,\big|\,\sigma^2_{t+1},X_t\right)\,\Big|\,X_t\right]=E_0\left[e^{-\left(\gamma_0u+v-\frac12u^2\right)\sigma^2_{t+1}}\,\Big|\,X_t\right]=\exp\left(-a_0\!\left(\gamma_0u+v-\tfrac12u^2\right)\sigma^2_t-b_0\!\left(\gamma_0u+v-\tfrac12u^2\right)\right)=\exp\left(-A(z)'X_t-B(z)\right),$$
where $A(z)=\left(0,\,a_0\!\left(\gamma_0u+v-\frac12u^2\right)\right)'$, $B(z)=b_0\!\left(\gamma_0u+v-\frac12u^2\right)$, for $z=(u,v)'\in\mathbb C^2$ such that $\operatorname{Re}\left(\gamma_0u+v-\frac12u^2\right)>-1/c_0$, and functions $a_0$ and $b_0$ are defined in Section 3.2.
i) Strict stationarity and geometric strong mixing
From Proposition 2 in Gouriéroux, Jasiak (2006), the ARG process $(\sigma^2_t)$ is stationary if $0\le\rho_0<1$, with marginal invariant distribution such that $[(1-\rho_0)/c_0]\,\sigma^2_t\sim\gamma(\delta_0)$, where $\gamma(\delta_0)$ denotes the gamma distribution with parameter $\delta_0$. Thus, when $\rho_0<1$, process $(X_t)$ admits the marginal invariant distribution:
$$f(x)=\frac1\sigma\,\phi\!\left(\frac{\tilde r-\gamma_0\sigma^2}\sigma\right)\frac{[(1-\rho_0)/c_0]^{\delta_0}}{\Gamma(\delta_0)}e^{-\frac{1-\rho_0}{c_0}\sigma^2}(\sigma^2)^{\delta_0-1},\qquad x=(\tilde r,\sigma^2)'\in\mathbb R\times\mathbb R^+=\mathcal X.\qquad(B.34)$$
To prove that $(X_t)$ is geometrically strongly mixing, we use Proposition 4.2 in Darolles, Gouriéroux, Jasiak (2006), and verify the condition:
$$\lim_{h\to\infty}\frac{\partial A^{\circ h}}{\partial z'}(0)=0.\qquad(B.35)$$
We have:
$$\frac{\partial A}{\partial z'}(0)=\begin{pmatrix}0&0\\ \rho_0\gamma_0&\rho_0\end{pmatrix}.$$
Condition (B.35) is satisfied if $\rho_0<1$. Thus, with $Y_t=(X_{t+1}',\ldots,X_{t+h}')'$ for given $h\in\mathbb N$, we conclude that Assumption A.5 is satisfied if $0\le\rho_0<1$.
ii) Smoothness of the marginal distribution
The stationary distribution $f$ in (B.34) is in $C^\infty(\mathcal X)$. Moreover, we have:
$$f(x)\le C_1\,e^{-\frac{1-\rho_0}{c_0}\sigma^2}(\sigma^2)^{\delta_0-3/2},\qquad x\in\mathcal X,$$
for a constant $C_1>0$. Thus, $\|f\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2$. Moreover, we have the following Lemma B.9.
Lemma B.9: $\|D^mf\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2+m$.
Proof: Let $m\in\mathbb N$. Since $f\in C^m(\mathcal X)$, to prove $\|D^mf\|_\infty<\infty$ it is sufficient to show that any partial derivative of order $m$ of function $f$ is bounded at the boundary of $\mathcal X$, that is, for $\tilde r\to\pm\infty$, $\sigma^2\to\infty$, $\sigma^2\to0$. From (B.34), let us write:
$$f(\tilde r,\sigma^2)=C\,\phi\!\left(h(\tilde r,\sigma^2)\right)e^{-\lambda\sigma^2}(\sigma^2)^{\delta_0-3/2},\qquad(\tilde r,\sigma^2)\in\mathcal X,$$
where $\lambda=(1-\rho_0)/c_0$, $C=\lambda^{\delta_0}/\Gamma(\delta_0)$, and:
$$h(\tilde r,\sigma^2):=\frac{\tilde r-\gamma_0\sigma^2}\sigma.$$
The function $h$ is such that:
$$\frac{\partial h}{\partial\tilde r}(\tilde r,\sigma^2)=\frac1\sigma,\qquad\frac{\partial h}{\partial\sigma^2}(\tilde r,\sigma^2)=-\frac{\gamma_0}\sigma-\frac1{2\sigma^2}h(\tilde r,\sigma^2).$$
We deduce that:
i) Any partial derivative of $f$ is a linear combination of functions of the type:
$$\phi^{(n)}\!\left(h(\tilde r,\sigma^2)\right)h(\tilde r,\sigma^2)^k\,e^{-\lambda\sigma^2}(\sigma^2)^l,\qquad n,k\in\mathbb N,\ l\in\mathbb R.$$
ii) Since the function $h\mapsto\phi^{(n)}(h)h^k$ is bounded, for any $n,k\in\mathbb N$, it is sufficient to prove that partial derivatives of order $m$ are bounded for $\sigma^2\to0$. This is the case if, and only if, the smallest power $l$ of $\sigma^2$, which occurs in partial derivatives of order $m$, is non-negative.
iii) The smallest power of $\sigma^2$ is featured by $\partial^mf/\partial(\sigma^2)^m$ and is $l=\delta_0-3/2-m$. We conclude that $\|D^mf\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2+m$.
Thus, Assumption A.6 is satisfied if $\delta_0\ge3/2+m$. For instance, for $m=2$, we get $\delta_0\ge7/2$.
B.7.2 Existence of moments
For expository purpose, let us assume that the actively traded derivatives at date $t_0$ have times-to-maturity $h_j=1$ (and moneyness strikes $k_j$), for $j=1,\ldots,n$, and that we are interested in estimating the price of the derivative with time-to-maturity $h=1$ and moneyness strike $k$. The moment function $g_2(y;\theta)$ is given by:
$$g_2(y_t;\theta)=e^{-r_{f,t+1}}\,e^{-\theta_1-\theta_2\sigma^2_{t+1}-\theta_3\sigma^2_t-\theta_4\tilde r_{t+1}}\begin{pmatrix}e^{r_{t+1}}-e^{r_{f,t+1}}\\ (e^{r_{t+1}}-k_1)^+\\ \vdots\\ (e^{r_{t+1}}-k_n)^+\\ (e^{r_{t+1}}-k)^+\end{pmatrix}-\begin{pmatrix}0\\ c_{t_0}(k_1,1)\\ \vdots\\ c_{t_0}(k_n,1)\\ 0\end{pmatrix},$$
where $y_t=(\tilde r_{t+1},\sigma^2_{t+1},\sigma^2_t)'$ and $r_{t+1}=\tilde r_{t+1}+r_{f,t+1}$. The relevant variables are $Y_t=(\tilde r_{t+1},\sigma^2_{t+1},\sigma^2_t)'$ and $X_t=(\tilde r_t,\sigma^2_t)'$, respectively. Note that function $g_2$ does not depend on $\tilde r_t$ and thus we have dropped this variable from $Y_t$. The following Lemma B.10 provides a condition for $E_0[\|g_2(Y_t;\theta_0)\|^4]<\infty$ (see Assumption A.11).
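The following minimal Python sketch evaluates this moment function on one observation, using the sdf specification $M_{t,t+1}(\theta)=e^{-r_{f,t+1}}\exp(-\theta_1-\theta_2\sigma^2_{t+1}-\theta_3\sigma^2_t-\theta_4\tilde r_{t+1})$ recalled in Section B.8. The function and argument names are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def g2_moment_function(r_tilde_next, sigma2_next, sigma2_t, r_f_next,
                       theta, strikes, observed_prices, new_strike):
    """Sketch of g2(y_t; theta): sdf-discounted payoffs minus observed prices.

    theta            : (theta1, theta2, theta3, theta4)
    strikes          : moneyness strikes k_j of the traded one-period options
    observed_prices  : prices c_{t0}(k_j, 1) of those options
    new_strike       : moneyness strike k of the derivative to be priced
    """
    theta1, theta2, theta3, theta4 = theta
    sdf = np.exp(-r_f_next) * np.exp(-theta1 - theta2 * sigma2_next
                                     - theta3 * sigma2_t - theta4 * r_tilde_next)
    r_next = r_tilde_next + r_f_next
    payoffs = np.concatenate((
        [np.exp(r_next) - np.exp(r_f_next)],                    # underlying asset
        np.maximum(np.exp(r_next) - np.asarray(strikes), 0.0),  # traded options
        [max(np.exp(r_next) - new_strike, 0.0)],                # derivative to price
    ))
    prices = np.concatenate(([0.0], np.asarray(observed_prices), [0.0]))
    return sdf * payoffs - prices
```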
Lemma B.10: The function $g_2(\cdot;\theta)$, $\theta=(\theta_1,\theta_2,\theta_3,\theta_4)'$, is such that $E_0[\|g_2(Y_t;\theta)\|^4]<\infty$ if, and only if:
$$\theta_3>-\frac1{4c_0},\qquad\theta_2>-\frac1{4c_0}\,\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}-\gamma_0\theta_4+2\theta_4^2+\max\left(0,\;2+\gamma_0-4\theta_4\right).$$
Proof: Since $(e^r-s)^+\le e^r$, for any $r,s\in\mathbb R$, condition $E_0[\|g_2(Y_t;\theta)\|^4]<\infty$ is satisfied if, and only if:
$$E_0\left[e^{-4\theta_1-4\theta_2\sigma^2_{t+1}-4\theta_3\sigma^2_t-4\theta_4\tilde r_{t+1}}\right]<\infty\quad\text{and}\quad E_0\left[e^{-4\theta_1-4\theta_2\sigma^2_{t+1}-4\theta_3\sigma^2_t+4(1-\theta_4)\tilde r_{t+1}}\right]<\infty.\qquad(B.36)$$
We have:
$$E_0\left[e^{-4\theta_1-4\theta_2\sigma^2_{t+1}-4\theta_3\sigma^2_t-4\theta_4\tilde r_{t+1}}\right]=e^{-4\theta_1}E_0\left[e^{-4\theta_2\sigma^2_{t+1}-4\theta_3\sigma^2_t}E_0\left(e^{-4\theta_4\tilde r_{t+1}}\,\big|\,\sigma^2_{t+1},\sigma^2_t\right)\right]=e^{-4\theta_1}E_0\left[e^{-4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\sigma^2_{t+1}-4\theta_3\sigma^2_t}\right]=e^{-4\theta_1}e^{-b_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)}E_0\left[e^{-\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)\right]\sigma^2_t}\right],$$
if:
$$1+4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)>0.$$
Moreover, since $[(1-\rho_0)/c_0]\,\sigma^2_t\sim\gamma(\delta_0)$, we have $E_0\left[e^{-\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)\right]\sigma^2_t}\right]<\infty$ if, and only if:
$$1+\frac{c_0}{1-\rho_0}\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)\right]>0.$$
The second expectation in (B.36) is computed similarly, replacing $\theta_4$ by $\theta_4-1$. We deduce that conditions (B.36) are satisfied if, and only if:
$$1+4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)>0,\qquad1+\frac{c_0}{1-\rho_0}\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)\right]>0,$$
$$1+4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)>0,\qquad1+\frac{c_0}{1-\rho_0}\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)\right)\right]>0.\qquad(B.37)$$
Since $\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2=\theta_2+\gamma_0\theta_4-2\theta_4^2+(4\theta_4-\gamma_0-2)$, and function $a_0$ is increasing, we can distinguish between two parameter regions to solve system (B.37).
i) First case: $4\theta_4-\gamma_0-2\ge0$, that is, $\theta_4\ge(2+\gamma_0)/4$. In this region system (B.37) is equivalent to:
$$1+4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)>0,\qquad1+\frac{c_0}{1-\rho_0}\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)\right]>0.$$
Let us introduce the new variable $x=4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)$. Then, this system becomes:
$$x>-1,\qquad\left(1-\rho_0+4c_0\theta_3\right)+\rho_0\frac x{1+x}>0.$$
By transforming the second equation, we get:
$$x>-1,\qquad\left(1+4c_0\theta_3\right)x>-\left(1-\rho_0+4c_0\theta_3\right).$$
There is no solution for which $1+4c_0\theta_3<0$. Indeed, the second equation becomes:
$$x<-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}=-1+\frac{\rho_0}{1+4c_0\theta_3}<-1,$$
which is incompatible with the first equation. Instead, for $1+4c_0\theta_3>0$, the second equation becomes:
$$x>-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}=-1+\frac{\rho_0}{1+4c_0\theta_3},$$
and implies the first equation. To summarize, a first region of solutions is:
$$\theta_4\ge(2+\gamma_0)/4,\qquad1+4c_0\theta_3>0,\qquad x>-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3},$$
that is,
$$\theta_4\ge(2+\gamma_0)/4,\qquad\theta_3>-\frac1{4c_0},\qquad\theta_2>-\frac1{4c_0}\,\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}-\gamma_0\theta_4+2\theta_4^2.$$
ii) Second case: $4\theta_4-\gamma_0-2<0$, that is, $\theta_4<(2+\gamma_0)/4$. In this region, system (B.37) is equivalent to:
$$1+4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)>0,\qquad1+\frac{c_0}{1-\rho_0}\left[4\theta_3+a_0\left(4\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)\right)\right]>0.$$
By introducing the new variable $y=4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)$, and repeating the same argument as above, we get the second region of solutions:
$$\theta_4<(2+\gamma_0)/4,\qquad1+4c_0\theta_3>0,\qquad y>-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3},$$
that is,
$$\theta_4<(2+\gamma_0)/4,\qquad\theta_3>-\frac1{4c_0},\qquad\theta_2>-\frac1{4c_0}\,\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}-\gamma_0\theta_4+2\theta_4^2+2+\gamma_0-4\theta_4.$$
From Lemma B.10, the condition $E_0[\|g_2(Y_t;\theta_0)\|^4]<\infty$ is satisfied whenever the risk premia parameters $\theta_2^0$ and $\theta_3^0$ for stochastic volatility are above some thresholds. In particular, the lower bound for $\theta_2^0$ depends on $\theta_3^0$ and $\theta_4^0$. Imposing the no-arbitrage restriction $\theta_4^0=\gamma_0+1/2$, the inequality constraints become:
$$\theta_3^0>-\frac1{4c_0},\qquad\theta_2^0>-\frac1{4c_0}\,\frac{1-\rho_0+4c_0\theta_3^0}{1+4c_0\theta_3^0}-\gamma_0\left(\gamma_0+\frac12\right)+2\left(\gamma_0+\frac12\right)^2.$$
These constraints are satisfied for the parameter values used in Section 3.4 iii).
B.8 ARG risk-neutral dynamics
In this section, we derive the dynamics of the ARG stochastic volatility model under the risk-neutral distribution $Q$ defined by the sdf $M_{t,t+1}(\theta_0)=e^{-r_{f,t+1}}\exp\left(-\theta_1^0-\theta_2^0\sigma^2_{t+1}-\theta_3^0\sigma^2_t-\theta_4^0\tilde r_{t+1}\right)$. In Section B.7.1, we derived the historical conditional moment generating function of $X_{t+1}=(\tilde r_{t+1},\sigma^2_{t+1})'$:
$$E_0\left[\exp\left(-u\tilde r_{t+1}-v\sigma^2_{t+1}\right)\,\big|\,x_t\right]=\exp\left(-a_0\!\left(\gamma_0u+v-\tfrac12u^2\right)\sigma^2_t-b_0\!\left(\gamma_0u+v-\tfrac12u^2\right)\right).\qquad(B.38)$$
Let us compute the risk-neutral conditional moment generating function of $(\tilde r_{t+1},\sigma^2_{t+1})$. We have:
$$E_0^Q\left[\exp\left(-u\tilde r_{t+1}-v\sigma^2_{t+1}\right)\,\big|\,x_t\right]=E_0\left[M_{t,t+1}(\theta_0)\exp\left(-u\tilde r_{t+1}-v\sigma^2_{t+1}\right)\,\big|\,x_t\right]/E_0\left[M_{t,t+1}(\theta_0)\,\big|\,x_t\right]$$
$$=E_0\left[\exp\left(-(u+\theta_4^0)\tilde r_{t+1}-(v+\theta_2^0)\sigma^2_{t+1}\right)\,\big|\,x_t\right]\Big/E_0\left[\exp\left(-\theta_4^0\tilde r_{t+1}-\theta_2^0\sigma^2_{t+1}\right)\,\big|\,x_t\right]$$
$$=\exp\left(-\left[a_0\!\left(\gamma_0(u+\theta_4^0)+v+\theta_2^0-\tfrac12(u+\theta_4^0)^2\right)-a_0\!\left(\gamma_0\theta_4^0+\theta_2^0-\tfrac12(\theta_4^0)^2\right)\right]\sigma^2_t-\left[b_0\!\left(\gamma_0(u+\theta_4^0)+v+\theta_2^0-\tfrac12(u+\theta_4^0)^2\right)-b_0\!\left(\gamma_0\theta_4^0+\theta_2^0-\tfrac12(\theta_4^0)^2\right)\right]\right),$$
by using $E_0[M_{t,t+1}(\theta_0)\,|\,x_t]=e^{-r_{f,t+1}}$ and (B.38). From equations (3.9) we have $\gamma^*:=\gamma_0-\theta_4^0=-1/2$, and:
$$\gamma_0(u+\theta_4^0)+v+\theta_2^0-\frac12(u+\theta_4^0)^2=\gamma^*u+v-\frac12u^2+\theta_2^*,\qquad\theta_2^*:=\theta_2^0+\gamma_0\theta_4^0-\frac12(\theta_4^0)^2=\theta_2^0+\frac{\gamma_0^2}2-\frac18.$$
Thus, we get:
$$E_0^Q\left[\exp\left(-u\tilde r_{t+1}-v\sigma^2_{t+1}\right)\,\big|\,x_t\right]=\exp\left(-a_0^*\!\left(-\tfrac12u+v-\tfrac12u^2\right)\sigma^2_t-b_0^*\!\left(-\tfrac12u+v-\tfrac12u^2\right)\right),\qquad(B.39)$$
where:
$$a_0^*(u)=a_0(u+\theta_2^*)-a_0(\theta_2^*)=\frac{\rho_0^*u}{1+c_0^*u},\qquad b_0^*(u)=b_0(u+\theta_2^*)-b_0(\theta_2^*)=\delta_0^*\log(1+c_0^*u),$$
with:
$$\rho_0^*=\frac{\rho_0}{\left[1+c_0\left(\theta_2^0+\gamma_0^2/2-1/8\right)\right]^2},\qquad\delta_0^*=\delta_0,\qquad c_0^*=\frac{c_0}{1+c_0\left(\theta_2^0+\gamma_0^2/2-1/8\right)}.$$
By comparing (B.38) and (B.39), we deduce that, under the risk-neutral distribution, the returns follow a stochastic volatility model with risk premium parameter $\gamma^*=-\frac12$ and ARG stochastic volatility with parameters $\delta_0^*,\rho_0^*,c_0^*$.
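As a numerical illustration of this parameter mapping, the following minimal Python sketch computes $(\rho_0^*,\delta_0^*,c_0^*)$ from the historical ARG parameters and the risk premium parameters, under the shift $\theta_2^*=\theta_2^0+\gamma_0^2/2-1/8$ derived above; the function name and interface are illustrative assumptions.

```python
def risk_neutral_arg_params(rho0, delta0, c0, theta2, gamma0):
    """Map historical ARG parameters (rho0, delta0, c0) to risk-neutral ones.

    theta2 : volatility risk premium parameter theta_2^0
    gamma0 : equity risk premium parameter gamma_0
    """
    theta2_star = theta2 + gamma0 ** 2 / 2.0 - 1.0 / 8.0   # shift entering a0*, b0*
    denom = 1.0 + c0 * theta2_star
    rho_star = rho0 / denom ** 2
    c_star = c0 / denom
    delta_star = delta0
    return rho_star, delta_star, c_star
```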
B.9 Proof of Lemma A.4
We have to show that:
$$P_0^Q\left[\sigma^2_{t+1}+\cdots+\sigma^2_{t+h}\ge z\;\big|\;\sigma^2_{t+h}=s,\,\sigma^2_t=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.$$
This condition is implied by:
$$P_0^Q\left[\sigma^2_{t+1}+\cdots+\sigma^2_{t+h-1}\ge z\;\big|\;\sigma^2_{t+h}=s,\,\sigma^2_t=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.\qquad(B.40)$$
Since the ARG process is time-reversible, condition (B.40) is equivalent to:
$$P_0^Q\left[\sigma^2_{t+1}+\cdots+\sigma^2_{t+h-1}\ge z\;\big|\;\sigma^2_t=s,\,\sigma^2_{t+h}=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.\qquad(B.41)$$
To show (B.41) we use the stochastic representation of Markov process $(\sigma^2_t)$:
$$\sigma^2_{t+1}=g\left(\sigma^2_t,u_{t+1}\right),\qquad(B.42)$$
where the innovation $u_{t+1}$ is independent of $\sigma^2_t$. By $l$-fold compounding of function $g$ w.r.t. the first argument, we have $\sigma^2_{t+l}=g_l(\sigma^2_t,u_{t+1},\ldots,u_{t+l})$, say, and $\sigma^2_{t+1}+\cdots+\sigma^2_{t+h-1}=G(\sigma^2_t,u_{t+1},\ldots,u_{t+h-1})$, where $G(\sigma^2_t,u_{t+1},\ldots,u_{t+h-1})=g(\sigma^2_t,u_{t+1})+\cdots+g_{h-1}(\sigma^2_t,u_{t+1},\ldots,u_{t+h-1})$. Condition (B.41) becomes:
$$P_0^Q\left[G(s,u_{t+1},\ldots,u_{t+h-1})\ge z\;\big|\;\sigma^2_{t+h}=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.$$
This condition is satisfied if function $G$ is increasing w.r.t. the first argument, that is, if the function $g$ in the stochastic representation (B.42) is increasing w.r.t. the first argument. The latter condition is equivalent to $\sigma^2_{t+1}$ being stochastically increasing in $\sigma^2_t$ under $Q$.
Finally, let us show that $\sigma^2_{t+1}$ is stochastically increasing in $\sigma^2_t$ under $Q$ for the ARG process. This follows from the gamma-Poisson mixture representation of the ARG process:
$$\sigma^2_{t+1}/c_0^*\;\big|\;\zeta_{t+1}\sim\gamma\left(\delta_0^*+\zeta_{t+1}\right),\qquad\zeta_{t+1}\;\big|\;\sigma^2_t\sim\mathcal P\left(\rho_0^*\sigma^2_t/c_0^*\right),$$
where $\gamma$ and $\mathcal P$ denote gamma and Poisson distributions, respectively. Then $\sigma^2_{t+1}$ is stochastically increasing in $\zeta_{t+1}$, and $\zeta_{t+1}$ is stochastically increasing in $\sigma^2_t$. The conclusion follows.
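The gamma-Poisson mixture representation also yields a direct way to simulate an ARG volatility path, which makes the stochastic monotonicity visible through the Poisson mean $\rho\,\sigma^2_t/c$. The following minimal Python sketch is an illustration only; the function name and interface are assumptions, and the parameters can be taken as the risk-neutral values $(\rho_0^*,\delta_0^*,c_0^*)$.

```python
import numpy as np

def simulate_arg(sigma2_0, rho, delta, c, T, rng=None):
    """Simulate an ARG path via the gamma-Poisson mixture:
    zeta_{t+1} | sigma2_t ~ Poisson(rho * sigma2_t / c),
    sigma2_{t+1} / c | zeta_{t+1} ~ Gamma(delta + zeta_{t+1}).
    """
    rng = rng or np.random.default_rng()
    path = np.empty(T + 1)
    path[0] = sigma2_0
    for t in range(T):
        zeta = rng.poisson(rho * path[t] / c)     # mixing variable, increasing in sigma2_t
        path[t + 1] = c * rng.gamma(delta + zeta)  # gamma draw, increasing in zeta
    return path
```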
B.10 Calibration of the parametric stochastic volatility model
In this Section we describe the computation by Fourier transform methods of the option prices in the parametric stochastic volatility model of Section 2.6 i) in the paper. These Fourier transform methods are used for the cross-sectional calibration of the model parameters. The risk-neutral distribution $Q$ is given in equations (2.15)-(2.16). The option price is such that:
$$c_t(h,k)=B(t,t+h)\,E^Q\left[\left(\exp R_{t,h}-k\right)^+\,\big|\,\sigma^2_t\right]=E^Q\left[\left(\exp\tilde R_{t,h}-\tilde k\right)^+\,\big|\,\sigma^2_t\right],$$
where $\tilde R_{t,h}=\tilde r_{t+1}+\cdots+\tilde r_{t+h}$ is the cumulated excess return of the underlying asset between $t$ and $t+h$ and $\tilde k=B(t,t+h)k$ is the discounted moneyness of the option. Let us introduce the variable $s:=\log(\tilde k)$ and define the function:
$$\psi(s)=e^{\alpha s}\,E^Q\left[\left(\exp\tilde R_{t,h}-e^s\right)^+\,\big|\,\sigma^2_t\right],\qquad s\in\mathbb R,$$
for a given $\alpha>0$ (for expository purpose we omit the dependence of function $\psi$ on time-to-maturity $h$ and current volatility value $\sigma^2_t$). Following Carr and Madan (1999), the Fourier transform of $\psi$ is (see below):
$$\hat\psi(u)=\int_{-\infty}^\infty e^{-ius}\psi(s)\,ds=\frac{\Phi(iu-\alpha-1)}{\alpha^2+\alpha-u^2-iu(2\alpha+1)},\qquad u\in\mathbb R,\qquad(B.43)$$
where
$$\Phi(z)=E^Q\left[\exp\left(-z\tilde R_{t,h}\right)\,\big|\,\sigma^2_t\right].$$
For the ARG model, function $\Phi$ is given by (see below):
$$\Phi(z)=\exp\left(-A_h\sigma^2_t-B_h\right),\qquad(B.44)$$
where $A_h=A_h(z)$ and $B_h=B_h(z)$ are defined recursively by:
$$A_h=a_0^*\left(w+A_{h-1}\right),\quad A_1=a_0^*(w),\qquad B_h=B_{h-1}+b_0^*\left(w+A_{h-1}\right),\quad B_1=b_0^*(w),$$
$w=-z(1+z)/2$, $a_0^*(u)=\frac{\rho_0^*u}{1+c_0^*u}$, $b_0^*(u)=\delta_0^*\log(1+c_0^*u)$. By inverse Fourier transform, we get the option price:
$$c_t(h,k)=\frac{e^{-\alpha s}}{2\pi}\int_{-\infty}^\infty e^{ius}\hat\psi(u)\,du,$$
where $s=\log(B(t,t+h)k)$ in the RHS. Since function $\psi(s)$ is real valued, we have $\hat\psi(-u)=\overline{\hat\psi(u)}$. It follows:
$$c_t(h,k)=\frac{e^{-\alpha s}}\pi\int_0^\infty\operatorname{Re}\left[e^{ius}\hat\psi(u)\right]du.\qquad(B.45)$$
To compute the integral (B.45), we introduce a finite upper integration boundary $\Lambda>0$ and we discretize the resulting integral over $[0,\Lambda]$. More precisely, let $\Lambda>0$ be such that $|\hat\psi(u)|$ is small for $u>\Lambda$. Define the grid $u_k=(\Lambda/N)(k-1)$, for $k=1,\ldots,N$, where $N\in\mathbb N$ is the number of grid points. Then we have:
$$c_t(h,k)\simeq\frac{e^{-\alpha s}}\pi\operatorname{Re}\int_0^\Lambda e^{ius}\hat\psi(u)\,du\simeq\frac{e^{-\alpha s}}\pi\operatorname{Re}\left[\frac\Lambda N\sum_{k=1}^Ne^{i\frac\Lambda Ns(k-1)}\hat\psi_k\right],$$
where $\hat\psi_k:=\hat\psi(u_k)$.
To summarize, the algorithm to compute $c_t(h,k)$ is as follows:
1. Compute the coefficients
$$\hat\psi_k:=\frac{\exp\left(-A_h\sigma^2_t-B_h\right)}{\alpha^2+\alpha-u_k^2-iu_k(2\alpha+1)},\qquad k=1,2,\ldots,N,$$
where
$$A_h=a_0^*\left(w_k+A_{h-1}\right),\quad A_1=a_0^*(w_k),\qquad B_h=B_{h-1}+b_0^*\left(w_k+A_{h-1}\right),\quad B_1=b_0^*(w_k),$$
$$w_k=-\frac12\left(\alpha^2+\alpha-u_k^2-iu_k(2\alpha+1)\right),\qquad u_k=(\Lambda/N)(k-1).$$
2. Compute the inverse Fourier transform of the coefficients:
$$c_t(h,k)=\frac{e^{-\alpha s}}\pi\operatorname{Re}\left[\frac\Lambda N\sum_{k=1}^Ne^{i\frac\Lambda Ns(k-1)}\hat\psi_k\right].$$
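A minimal Python sketch of this two-step algorithm is given below. It assumes the risk-neutral ARG parameters $(\rho_0^*,\delta_0^*,c_0^*)$ are available; the values of $\alpha$, $\Lambda$, $N$, and all function and argument names are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def arg_a(u, rho, c):
    """a0*(u) = rho*u / (1 + c*u), evaluated at (possibly complex) u."""
    return rho * u / (1.0 + c * u)

def arg_b(u, delta, c):
    """b0*(u) = delta * log(1 + c*u)."""
    return delta * np.log(1.0 + c * u)

def option_price(sigma2_t, h, strike, discount, rho, delta, c,
                 alpha=1.5, Lambda=200.0, N=4096):
    """Sketch of the discretized inverse-Fourier pricing formula (B.45).

    sigma2_t : current volatility state, h : time to maturity (periods)
    strike   : moneyness strike k,      discount : B(t, t+h)
    alpha, Lambda, N : dampening parameter, truncation bound, grid size (tuning choices)
    """
    s = np.log(discount * strike)                   # log discounted moneyness
    u = (Lambda / N) * np.arange(N)                 # grid u_k = (Lambda/N)(k-1)
    w = -0.5 * (alpha**2 + alpha - u**2 - 1j * u * (2 * alpha + 1))

    # recursions: A_h = a0*(w + A_{h-1}), B_h = B_{h-1} + b0*(w + A_{h-1})
    A, B = arg_a(w, rho, c), arg_b(w, delta, c)
    for _ in range(h - 1):
        A, B = arg_a(w + A, rho, c), B + arg_b(w + A, delta, c)

    psi_hat = np.exp(-A * sigma2_t - B) / (-2.0 * w)   # denominator equals -2*w_k
    integrand = np.exp(1j * u * s) * psi_hat
    return np.exp(-alpha * s) / np.pi * (Lambda / N) * np.real(integrand.sum())
```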
Proof of Equation (B.43): We have
$$\hat\psi(u)=\int_{-\infty}^\infty e^{-(iu-\alpha)s}E_t^Q\left[\left(e^{\tilde R_{t,h}}-e^s\right)^+\right]ds=E_t^Q\left[\int_{-\infty}^{\tilde R_{t,h}}e^{-(iu-\alpha)s}\left(e^{\tilde R_{t,h}}-e^s\right)ds\right]=E_t^Q\left[e^{\tilde R_{t,h}}\frac{e^{-(iu-\alpha)\tilde R_{t,h}}}{\alpha-iu}-\frac{e^{-(iu-\alpha-1)\tilde R_{t,h}}}{\alpha+1-iu}\right]=\frac{E_t^Q\left[e^{-(iu-\alpha-1)\tilde R_{t,h}}\right]}{(\alpha-iu)(\alpha+1-iu)}=\frac{\Phi(iu-\alpha-1)}{\alpha^2+\alpha-u^2-iu(2\alpha+1)},$$
where $E_t^Q[\cdot]=E^Q[\cdot\,|\,\sigma^2_t]$.
Proof of Equation (B.44): Under the risk-neutral distribution $Q$ we have $\tilde r_t=-\frac12\sigma^2_t+\sigma_t\varepsilon_t$, where $\varepsilon_t\sim IIN(0,1)$ and $\sigma^2_t$ follows an ARG process independent of $(\varepsilon_t)$ with parameters $\delta_0^*,\rho_0^*,c_0^*$. Thus:
$$\Phi(z)=E_t^Q\left[\exp\left(\frac z2\sigma^2_{t,t+h}-z\left(\sigma_{t+1}\varepsilon_{t+1}+\cdots+\sigma_{t+h}\varepsilon_{t+h}\right)\right)\right]=E_t^Q\left[\exp\left(\frac12\left(z+z^2\right)\sigma^2_{t,t+h}\right)\right],$$
where $\sigma^2_{t,t+h}:=\sigma^2_{t+1}+\cdots+\sigma^2_{t+h}$. From standard results for affine processes in discrete time [e.g., Darolles, Gouriéroux, Jasiak (2006)], equation (B.44) follows.
References
[1] Andrews, D. (1991): An Empirical Process Central Limit Theorem for Dependent Nonidentically Distributed Random Variables, Journal of Multivariate Analysis, 38, 187–203.
[2] Billingsley, P. (1965): Ergodic Theory and Information. New York: John Wiley.
[3] Billingsley, P. (1968): Convergence of Probability Measures. New York: John Wiley.
[4] Bosq, D. (1998): Nonparametric Statistics for Stochastic Processes: Estimation and Prediction.
New York, Springer.
[5] Bosq, D., and P. Lecoutre (1987): Theorie de l’estimation fonctionnelle. Paris: Economica.
[6] Carr, P., and D. Madan (1999): Option Pricing and the Fast Fourier Transform, Journal of
Computational Finance, 2, 61-73.
[7] Darolles, S., C. Gouriéroux, and J. Jasiak (2006): Structural Laplace Transform and Compound
Autoregressive Models, Journal of Time Series Analysis, 27, 477-504.
[8] Gouriéroux, C., and J. Jasiak (2006): Autoregressive Gamma Process, Journal of Forecasting,
25, 129-152.
[9] Herrndorf, N. (1984): A Functional Central Limit Theorem for Weakly Dependent Sequences
of Random Variables, Annals of Probability, 12, 141-153.
[10] Pollard, D. (1990): Empirical Processes: Theory and Applications. CBMS Conference Series in
Probability and Statistics, Vol. 2. Hayward, CA: Institute of Mathematical Statistics.
[11] Potscher, B., and I. Prucha (1989): A Uniform Law of Large Numbers for Dependent and
Heterogeneous Data Processes, Econometrica, 57, 675-683.
[12] Stock, J., and J. Wright (2000): GMM with Weak Identi…cation, Econometrica, 68, 1055-1096.
[13] Tenreiro, C. (1995): Estimation fonctionnelle: Applications aux tests d’adéquation et de
paramètre constant, Ph.D. Thesis, University of Paris VI.