Efficient Derivative Pricing by the Extended Method of Moments

Supplementary Material: Appendix B

In this Appendix we provide the proofs of theoretical results and technical Lemmas that have been omitted in the paper. We first give in Section B.1 the proof of Lemma A.1. Then, in Section B.2 we give a detailed proof of consistency of the XMM estimator (see Appendix A.1.3 in the paper). In Sections B.3-B.6 we prove Lemma A.2, Corollary 6, Lemma A.3 and Corollary 8, respectively. In Section B.7 we discuss regularity conditions for XMM estimation when the DGP is the stochastic volatility model of Section 3.2 of the paper. In Section B.8 we derive the risk-neutral distribution in the stochastic volatility model. In Section B.9 we prove Lemma A.4. Finally, in Section B.10 we provide the Fourier transform methods used for option pricing and cross-sectional calibration in the stochastic volatility model.

We use the following notation. We denote by $K_1$ and $K_2$ the dimensions of functions $g_1$ and $g_2$, respectively. Further, $\tilde g_2$ denotes the function $\tilde g_2 = (g_2^{*\prime}, 1)' = (g_2', a', 1)'$.

B.1 Proof of Lemma A.1

B.1.1 Conditions for weak convergence of the kernel empirical process

The process $\hat\Psi_T(\theta)$, $\theta \in \Theta$, can be written as:

$$\hat\Psi_T(\theta) = \begin{pmatrix} \sqrt{T}\,\big(\hat E[g_1(\theta)] - E[g_1(\theta)]\big) \\ \sqrt{T h_T^d}\,\big(\hat E[g_2^*(\theta)\,|\,x_0] - E[g_2^*(\theta)\,|\,x_0]\big) \end{pmatrix}, \quad \theta\in\Theta, \qquad (B.1)$$

where $g_2^*$ denotes the function $g_2^* = (g_2', a')'$. Let us rewrite the second component of (B.1). For $\theta\in\Theta$, let us define (see Assumption A.12):

$$\varphi(\theta) := \varphi(x_0,\theta) = E[g_2^*(\theta)\,|\,x_0]\, f(x_0),$$

and the corresponding kernel estimator:

$$\hat\varphi(\theta) = \frac{1}{T h_T^d} \sum_{t=1}^T g_2^*(y_t,\theta)\, K\!\left(\frac{x_t - x_0}{h_T}\right).$$

We have:

$$\sqrt{T h_T^d}\,\big(\hat E[g_2^*(\theta)\,|\,x_0] - E[g_2^*(\theta)\,|\,x_0]\big) = \sqrt{T h_T^d}\left(\frac{\hat\varphi(\theta)}{\hat f(x_0)} - \frac{\varphi(\theta)}{f(x_0)}\right).$$

This can be rewritten as:
$$\sqrt{T h_T^d}\,\big(\hat E[g_2^*(\theta)\,|\,x_0] - E[g_2^*(\theta)\,|\,x_0]\big) = \left[\frac{1}{f(x_0)}\sqrt{T h_T^d}\,\big(\hat\varphi(\theta)-\varphi(\theta)\big) - \frac{E[g_2^*(\theta)\,|\,x_0]}{f(x_0)}\sqrt{T h_T^d}\,\big(\hat f(x_0)-f(x_0)\big)\right]\left(1+\frac{\hat f(x_0)-f(x_0)}{f(x_0)}\right)^{-1}. \qquad (B.2)$$

Under Assumptions A.5-A.9, we have [see Bosq (1998), Theorem 2.3]:

$$\hat f(x_0) - f(x_0) = o_p(1). \qquad (B.3)$$

From (B.1)-(B.3) we get:

$$\hat\Psi_T(\theta) = H_0(\theta)\,\Psi_T(\theta)\,(1+o_p(1)),$$

where process $\Psi_T(\theta)$ is defined by:

$$\Psi_T(\theta) = \begin{pmatrix} \sqrt T\,\big(\hat E[g_1(\theta)]-E[g_1(\theta)]\big) \\ \sqrt{T h_T^d}\,\big(\hat\varphi(\theta)-\varphi(\theta)\big) \\ \sqrt{T h_T^d}\,\big(\hat f(x_0)-f(x_0)\big) \end{pmatrix}, \quad \theta\in\Theta, \qquad (B.4)$$

and matrix $H_0(\theta)$ is given by:

$$H_0(\theta) = \begin{pmatrix} Id_{K_1} & 0 & 0 \\ 0 & \dfrac{1}{f(x_0)}\, Id_{K_2+L} & -\dfrac{1}{f(x_0)}\, E[g_2^*(\theta)\,|\,x_0] \end{pmatrix}, \quad \theta\in\Theta, \qquad (B.5)$$

where $K_1 := \dim(g_1)$, $K_2 := \dim(g_2)$, and the $o_p(1)$ term is uniform in $\theta\in\Theta$. The following Lemma shows that process $\Psi_T(\theta)$ is asymptotically equivalent to a zero-mean empirical process plus a bias function.

Lemma B.1: Under Assumptions A.4, A.6, A.8, A.9 and A.12:

$$\begin{pmatrix} \sqrt{T h_T^d}\,\big(E[\hat\varphi(\theta)]-\varphi(\theta)\big) \\ \sqrt{T h_T^d}\,\big(E[\hat f(x_0)]-f(x_0)\big) \end{pmatrix} = \sqrt{\lim_{T\to\infty} T h_T^{d+2m}}\;\frac{w_m}{m!} \begin{pmatrix} \nabla^m \varphi(x_0;\theta) \\ \nabla^m f(x_0) \end{pmatrix} + o(1), \quad \text{uniformly in } \theta\in\Theta. \qquad (B.6)$$

Proof: From a standard bias expansion and Assumption A.8, we have

$$E[\hat\varphi(\theta)]-\varphi(\theta) = \frac{1}{h_T^d}\, E\left[\varphi(X,\theta)\, K\!\left(\frac{X-x_0}{h_T}\right)\right] - \varphi(\theta) = \int \big[\varphi(x_0+h_T u,\theta)-\varphi(x_0,\theta)\big]\, K(u)\,du = \frac{h_T^m}{m!} \sum_{\alpha:|\alpha|=m} \int \nabla^\alpha \varphi(x_0+h_T\tilde u,\theta)\, u^\alpha K(u)\,du,$$

where $\tilde u$ is an intermediary point (depending on $u$). Since $\nabla^\alpha\varphi$ is uniformly continuous on $\mathcal X\times\Theta$ (Assumption A.12), and $\Theta$ is compact (Assumption A.4), we have that $\int \nabla^\alpha\varphi(x_0+h_T\tilde u,\theta)\,u^\alpha K(u)\,du$ converges to $\int \nabla^\alpha\varphi(x_0,\theta)\,u^\alpha K(u)\,du$, uniformly in $\theta\in\Theta$, for any $\alpha\in\mathbb N^d$ with $|\alpha|=m$. A similar argument applies for $E[\hat f(x_0)]-f(x_0)$. Since $K$ is a product kernel of order $m$ (Assumption A.8), the conclusion follows.
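As a numerical illustration of the kernel quantities above (this sketch is not part of the paper: the Gaussian kernel, the toy data-generating process and the bandwidth $h = T^{-1/5}$ are illustrative choices), the estimators $\hat\varphi$, $\hat f(x_0)$ and their ratio $\hat E[g_2^*(\theta)\,|\,x_0]$ can be computed as:

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K (here d = 1)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_conditional_mean(x, y, x0, h):
    """Ratio estimator of E[g(Y) | X = x0]:
    phi_hat = (T h)^{-1} sum_t g(y_t) K((x_t - x0)/h)  estimates E[g(Y)|x0] f(x0),
    f_hat   = (T h)^{-1} sum_t K((x_t - x0)/h)          estimates f(x0)."""
    w = gaussian_kernel((x - x0) / h)
    phi_hat = np.mean(y * w) / h
    f_hat = np.mean(w) / h
    return phi_hat / f_hat

# Hypothetical toy DGP (not from the paper): Y = X^2 + noise, so E[Y | X = 1] = 1.
rng = np.random.default_rng(0)
T = 50_000
x = rng.normal(size=T)
y = x**2 + 0.1 * rng.normal(size=T)
est = kernel_conditional_mean(x, y, x0=1.0, h=T**(-1 / 5))
```

The division by $\hat f(x_0)$ mirrors the decomposition in (B.2): the sampling error of the ratio splits into the error of $\hat\varphi$ and the error of $\hat f(x_0)$.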
From (B.4)-(B.6) we deduce:

$$\hat\Psi_T(\theta) = \big[H_0(\theta)\,\nu_T(\theta) + b(\theta) + o(1)\big]\,(1+o_p(1)),$$

uniformly in $\theta\in\Theta$, where the empirical process $\nu_T(\theta)$ is defined as:

$$\nu_T(\theta) = \begin{pmatrix} \sqrt T\,\big(\hat E[g_1(\theta)]-E[g_1(\theta)]\big) \\ \sqrt{T h_T^d}\,\big(\hat\varphi(\theta)-E[\hat\varphi(\theta)]\big) \\ \sqrt{T h_T^d}\,\big(\hat f(x_0)-E[\hat f(x_0)]\big) \end{pmatrix}, \quad \theta\in\Theta. \qquad (B.7)$$

Lemma A.1 follows if the empirical process $\nu_T(\theta)$ converges weakly:

$$\nu_T(\theta) \Rightarrow \nu(\theta), \quad \theta\in\Theta, \qquad (B.8)$$

where $\nu(\theta)$ is a Gaussian process on $\Theta$ with covariance operator:

$$\Sigma_0(\theta,\rho) = \begin{pmatrix} S_0(\theta,\rho) & 0 & 0 \\ 0 & w^2 f(x_0)\, E[g_2^*(\theta) g_2^*(\rho)'\,|\,x_0] & w^2 f(x_0)\, E[g_2^*(\theta)\,|\,x_0] \\ 0 & w^2 f(x_0)\, E[g_2^*(\rho)'\,|\,x_0] & w^2 f(x_0) \end{pmatrix}, \quad \theta,\rho\in\Theta,$$

$$S_0(\theta,\rho) = \sum_{k=-\infty}^{\infty} \mathrm{Cov}\big[g_1(X_t,Y_t,\theta),\, g_1(X_{t-k},Y_{t-k},\rho)\big].$$

To prove the weak convergence (B.8) of empirical process $\nu_T(\theta)$, let us note that:

$$\nu_T(\theta) = T^{-1/2} \sum_{t=1}^T \big(v_{t,T}(\theta) - E[v_{t,T}(\theta)]\big), \quad \theta\in\Theta,$$

where

$$v_{t,T}(\theta) = \left(g_1(X_t,Y_t,\theta)',\; h_T^{-d/2}\, \tilde g_2(Y_t,\theta)'\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right)',$$

and $\tilde g_2$ denotes the function $\tilde g_2 = (g_2^{*\prime},1)'$. From Theorem 10.2 of Pollard (1990), the weak convergence of $\nu_T(\theta)$ to the Gaussian process $\nu(\theta)$ over the compact set $\Theta\subset\mathbb R^p$ is implied by conditions i) and ii) of Proposition B.2 below.

Proposition B.2: The following conditions are satisfied:

i) Under Assumptions A.1, A.5-A.15, for any $\theta_1,...,\theta_n\in\Theta\subset\mathbb R^p$, $n\in\mathbb N$, the vector $\big(\nu_T(\theta_1)',...,\nu_T(\theta_n)'\big)'$ is asymptotically normally distributed with mean zero, and asymptotic variance-covariance matrix such that:

$$\mathrm{AsCov}\big(\nu_T(\theta_i),\nu_T(\theta_j)\big) = \Sigma_0(\theta_i,\theta_j), \quad i,j=1,...,n.$$

ii) Under Assumptions A.4, A.5, A.8-A.9 and A.16-A.18, the empirical process $\nu_T(\theta)$ is stochastically equicontinuous, that is, $\forall\, \varepsilon,\eta>0\;\exists\, \delta>0$:

$$\limsup_{T\to\infty} P^*\left[\sup_{\theta,\rho\in\Theta:\; d(\theta,\rho)<\delta} \|\nu_T(\theta)-\nu_T(\rho)\| > \varepsilon\right] \le \eta,$$

where $d(\cdot,\cdot)$ is a metric on $\Theta$, and $P^*$ denotes the outer probability.

These conditions imply the weak convergence of empirical process $\nu_T$ (and, thus, the weak convergence of $\hat\Psi_T$). Conditions i) and ii) of the previous proposition are verified below in Sections B.1.2 and B.1.3, respectively.
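The factor $w^2 = \int K(u)^2\,du$ in $\Sigma_0$ comes from the second-moment scaling of the kernel weights (Bochner's lemma): $h^{-d}E\big[K\big((X-x_0)/h\big)^2\big] \to w^2 f(x_0)$ as $h\to 0$. A quick Monte Carlo check for $d=1$ (an illustrative sketch, not part of the paper; the Gaussian kernel and standard normal $X$ are assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1_000_000
x = rng.normal(size=T)              # X ~ N(0,1), so f(x0) = 1/sqrt(2*pi) at x0 = 0
x0 = 0.0

def K(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

w2 = 1.0 / (2.0 * np.sqrt(np.pi))   # integral of K(u)^2 du for the Gaussian kernel
limit = w2 / np.sqrt(2.0 * np.pi)   # w^2 * f(x0)

# h^{-1} E[K((X - x0)/h)^2], estimated by Monte Carlo for shrinking bandwidths
approx = {h: np.mean(K((x - x0) / h) ** 2) / h for h in (0.5, 0.2, 0.05)}
```

As the bandwidth shrinks, the empirical average approaches the limit $w^2 f(x_0)$, which is exactly the variance scale appearing in the lower-right blocks of $\Sigma_0$.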
B.1.2 Finite dimensional convergence

To prove condition i) of Proposition B.2, we use the Cramer-Wold device, and follow an approach similar to Bosq (1998), Theorems 2.3 and 3.4, and Tenreiro (1995), Theorem 1.3.10. Let $\lambda = (\lambda_1',...,\lambda_n')' \in \mathbb R^{n(K_1+K_2+L+1)}$, and define the zero-mean triangular array:

$$Z_{t,T} = \frac{1}{\sqrt T} \sum_{i=1}^n \lambda_i' \big(v_{t,T}(\theta_i) - E[v_{t,T}(\theta_i)]\big), \quad t \le T,\; T\ge 1.$$

Then, we can write:

$$\lambda'\big(\nu_T(\theta_1)',...,\nu_T(\theta_n)'\big)' = \sum_{t=1}^T Z_{t,T}.$$

Let $m=m_T$ and $q=q_T$ be sequences of integer numbers such that:

$$m_T = O(T^a), \quad q_T = O(T^b), \quad 0<b<a<1,$$

and let us define $k=k_T = \lfloor T/(m_T+q_T)\rfloor$. In particular $k_T = O(T^{1-a})$. Let us divide the sample into $2k+1$ blocks, whose length is equal to $m$ for blocks $1,3,...,2k-1$, equal to $q$ for blocks $2,4,...,2k$, and equal to $T-k(m+q)$ for the last block. More specifically, define:

$$Y_{1,T} = Z_{1,T}+...+Z_{m,T}, \quad Y'_{1,T} = Z_{m+1,T}+...+Z_{m+q,T},$$
$$Y_{2,T} = Z_{m+q+1,T}+...+Z_{2m+q,T}, \quad Y'_{2,T} = Z_{2m+q+1,T}+...+Z_{2m+2q,T}, \;...,$$
$$Y_{k,T} = Z_{(k-1)(m+q)+1,T}+...+Z_{km+(k-1)q,T}, \quad Y'_{k,T} = Z_{km+(k-1)q+1,T}+...+Z_{km+kq,T}.$$

Thus, we can write:

$$\lambda'\big(\nu_T(\theta_1)',...,\nu_T(\theta_n)'\big)' = \sum_{l=1}^k Y_{l,T} + \sum_{l=1}^k Y'_{l,T} + Y''_T, \qquad (B.9)$$

where $Y''_T = Z_{k(m+q)+1,T}+...+Z_{T,T}$. Then, we will prove that the first term in the decomposition is asymptotically normal, and that the last two terms are negligible.

a) The first term is asymptotically normal

Lemma B.3: Under Assumptions A.1, A.5-A.9, A.11-A.14, there exist i.i.d. random variables $Y^*_{l,T}$, $l=1,...,k$, such that $Y^*_{l,T} \overset{d}{=} Y_{l,T}$, $l=1,...,k$, and $\sum_{l=1}^k Y_{l,T} - \sum_{l=1}^k Y^*_{l,T} = o_p(1)$.

Proof: Let $c_T := E\big[(Y_{1,T})^2\big]^{1/2}$ and $0<\tau_T\le c_T$. From Bradley's Lemma [e.g. Bosq (1998), Lemma 1.2], there exist i.i.d. random variables $Y^*_{l,T}$, $l=1,...,k$, such that $Y^*_{l,T} \overset{d}{=} Y_{l,T}$, $l=1,...,k$, and:

$$P\big[\,|Y^*_{l,T}-Y_{l,T}| > \tau_T\,\big] \le 11\, (c_T/\tau_T)^{2/5}\, \alpha(q_T)^{4/5}, \quad l=1,...,k,$$

where $\alpha(\cdot)$ are the mixing coefficients of process $(X_t', Y_t')'$. It will be proved below (see Lemma B.4) that $c_T = O\big((m/T)^{1/2}\big)$.
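The big-block/small-block partition just defined can be made concrete as follows (an illustrative sketch, not from the paper; the exponents $a=0.3$ and $b=0.1$ are arbitrary choices satisfying $0<b<a<1$):

```python
def block_partition(T, a=0.3, b=0.1):
    """Index sets for the big-block/small-block scheme:
    big blocks of length m = floor(T^a), small blocks of length q = floor(T^b),
    k = floor(T / (m + q)) pairs, plus one remainder block of length T - k(m+q)."""
    m, q = int(T**a), int(T**b)
    k = T // (m + q)
    big, small = [], []
    for l in range(k):
        start = l * (m + q)
        big.append(range(start, start + m))          # block Y_{l+1,T}
        small.append(range(start + m, start + m + q))  # block Y'_{l+1,T}
    remainder = range(k * (m + q), T)                 # block Y''_T
    return big, small, remainder

big, small, rem = block_partition(1000)
```

The small blocks separate the big blocks by a gap of length $q_T$, so that the mixing coefficient $\alpha(q_T)$ controls the dependence between consecutive big blocks, which is what Bradley's Lemma exploits.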
Let $\varepsilon>0$ be given and let $\tau_T := \varepsilon/k_T = o\big((m/T)^{1/2}\big)$. Thus, we have:

$$P\big[\,|Y^*_{l,T}-Y_{l,T}| > \varepsilon/k_T\,\big] = O\big(k_T^{2/5}\, (m/T)^{1/5}\, \alpha(q_T)^{4/5}\big), \quad l=1,...,k.$$

We deduce:

$$P\left[\left|\sum_{l=1}^k Y_{l,T} - \sum_{l=1}^k Y^*_{l,T}\right| > \varepsilon\right] \le P\left[\sum_{l=1}^k |Y^*_{l,T}-Y_{l,T}| > \varepsilon\right] \le \sum_{l=1}^k P\big[\,|Y^*_{l,T}-Y_{l,T}| > \varepsilon/k_T\,\big] = O\big(k_T^{7/5}\, (m/T)^{1/5}\, \alpha(q_T)^{4/5}\big).$$

Since $\alpha(\cdot)$ has geometric decay by Assumption A.5, $O\big(k_T^{7/5}(m/T)^{1/5}\alpha(q_T)^{4/5}\big) = o(1)$. The proof is concluded.

Thus, we have:

$$\sum_{l=1}^k Y_{l,T} = \sum_{l=1}^k Y^*_{l,T} + o_p(1). \qquad (B.10)$$

The asymptotic normality of $\sum_{l=1}^k Y^*_{l,T}$:

$$\sum_{l=1}^k Y^*_{l,T} \overset{d}{\to} N(0,\sigma^2),$$

where $\sigma^2>0$ is given below, is proved by using the Liapunov CLT [Billingsley (1965)]. For this purpose we show that:

$$\sum_{l=1}^k E\big[(Y^*_{l,T})^2\big] \to \sigma^2, \qquad \sum_{l=1}^k E\big[|Y^*_{l,T}|^3\big] \to 0.$$

These two conditions are verified below in Lemma B.4 and Lemma B.5, respectively.

Lemma B.4: Under Assumptions A.1, A.5-A.9 and A.11-A.14, we have:

$$\sum_{l=1}^k E\big[(Y^*_{l,T})^2\big] \to \sigma^2 = \lambda'\Omega\lambda,$$

where $\Omega = (\Omega_{ij})$ is the matrix with blocks:

$$\Omega_{ij} = \begin{pmatrix} \sum_{k=-\infty}^\infty \mathrm{Cov}\big(g_1(X_t,Y_t,\theta_i),\, g_1(X_{t-k},Y_{t-k},\theta_j)\big) & 0 \\ 0 & w^2 f(x_0)\, E\big[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)' \,\big|\, X_t=x_0\big] \end{pmatrix}, \quad i,j=1,...,n.$$

Proof: We have:

$$\sum_{l=1}^k E\big[(Y^*_{l,T})^2\big] = k\, V[Y_{1,T}] = k\, V\left[\frac{1}{\sqrt T}\sum_{i=1}^n \sum_{t=1}^m \lambda_i'\, v_{t,T}(\theta_i)\right] = \frac{km}{T}\sum_{i,j=1}^n \lambda_i'\, \mathrm{Cov}\left[\frac{1}{\sqrt m}\sum_{t=1}^m v_{t,T}(\theta_i),\, \frac{1}{\sqrt m}\sum_{t=1}^m v_{t,T}(\theta_j)\right] \lambda_j.$$

Since $km/T\to 1$, it is sufficient to prove that:

$$\Gamma_{T,ij} := \mathrm{Cov}\left[\frac{1}{\sqrt m}\sum_{t=1}^m v_{t,T}(\theta_i),\, \frac{1}{\sqrt m}\sum_{t=1}^m v_{t,T}(\theta_j)\right] \to \Omega_{ij}, \quad i,j=1,...,n.$$

Let us write:

$$\Gamma_{T,ij} = \begin{pmatrix} \Gamma^{11}_{T,ij} & \Gamma^{12}_{T,ij} \\ \Gamma^{21}_{T,ij} & \Gamma^{22}_{T,ij} \end{pmatrix},$$

where:

$$\Gamma^{11}_{T,ij} = \mathrm{Cov}\left(\frac{1}{\sqrt m}\sum_{t=1}^m g_1(X_t,Y_t,\theta_i),\; \frac{1}{\sqrt m}\sum_{t=1}^m g_1(X_t,Y_t,\theta_j)\right),$$
$$\Gamma^{12}_{T,ij} = \Gamma^{21\prime}_{T,ij} = \mathrm{Cov}\left(\frac{1}{\sqrt m}\sum_{t=1}^m g_1(X_t,Y_t,\theta_i),\; \frac{1}{\sqrt m}\sum_{t=1}^m h_T^{-d/2}\,\tilde g_2(Y_t,\theta_j)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right),$$
$$\Gamma^{22}_{T,ij} = \mathrm{Cov}\left(\frac{1}{\sqrt m}\sum_{t=1}^m h_T^{-d/2}\,\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; \frac{1}{\sqrt m}\sum_{t=1}^m h_T^{-d/2}\,\tilde g_2(Y_t,\theta_j)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right),$$

and derive the limit of each term for $T\to\infty$.

i) For $\Gamma^{11}_{T,ij}$, we have:

$$\Gamma^{11}_{T,ij} = \mathrm{Cov}\big(g_1(X_t,Y_t,\theta_i),\, g_1(X_t,Y_t,\theta_j)\big) + \sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}{m}\right) \mathrm{Cov}\big(g_1(X_t,Y_t,\theta_i),\, g_1(X_{t-l},Y_{t-l},\theta_j)\big).$$

From Assumption A.11 and the Cauchy-Schwarz inequality, we get $E\|g_1(X_t,Y_t,\theta_i)\|^r < \infty$, for $r>2$.
Then, by Davidov's inequality [Bosq (1998), Corollary 1.1] and Assumption A.5, we get:

$$\big\|\mathrm{Cov}\big(g_1(X_t,Y_t,\theta_i),\, g_1(X_{t-l},Y_{t-l},\theta_j)\big)\big\| = O\Big(\delta^l\, \big(E\|g_1(X_t,Y_t,\theta_i)\|^r\big)^{1/r}\, \big(E\|g_1(X_t,Y_t,\theta_j)\|^r\big)^{1/r}\Big),$$

for some $0<\delta<1$. Thus, the cross-autocovariances are summable, and:

$$\lim_{T\to\infty} \Gamma^{11}_{T,ij} = \sum_{l=-\infty}^{\infty} \mathrm{Cov}\big(g_1(X_t,Y_t,\theta_i),\, g_1(X_{t-l},Y_{t-l},\theta_j)\big).$$

ii) Let us now consider $\Gamma^{22}_{T,ij}$. We have:

$$\Gamma^{22}_{T,ij} = \mathrm{Cov}\left(h_T^{-d/2}\,\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; h_T^{-d/2}\,\tilde g_2(Y_t,\theta_j)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right) + \sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}{m}\right)\gamma_{lT,ij} =: \gamma_{0T,ij} + \sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}{m}\right)\gamma_{lT,ij}, \qquad (B.11)$$

where $\gamma_{lT,ij} := \mathrm{Cov}\Big(h_T^{-d/2}\,\tilde g_2(Y_t,\theta_i)\, K\big(\frac{X_t-x_0}{h_T}\big),\; h_T^{-d/2}\,\tilde g_2(Y_{t-l},\theta_j)\, K\big(\frac{X_{t-l}-x_0}{h_T}\big)\Big)$.

Let us first consider the covariance term $\gamma_{0T,ij}$. The functions $E[\tilde g_2(Y_t,\theta_j)\,|\,X_t=\cdot]\,f(\cdot)$ and $E[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\,|\,X_t=\cdot]\,f(\cdot)$ are Lebesgue-integrable. Indeed, by applying twice the Cauchy-Schwarz inequality, we get:

$$\int \big\|E\big[g_2(Y_t,\theta)\, g_2(Y_t,\rho)'\,|\,X_t=x\big]\big\|\, f(x)\,dx \le \int E\big[\|g_2(Y_t,\theta)\|^2\,|\,X_t=x\big]^{1/2}\, E\big[\|g_2(Y_t,\rho)\|^2\,|\,X_t=x\big]^{1/2}\, f(x)\,dx$$
$$\le \left(\int E\big[\|g_2(Y_t,\theta)\|^2\,|\,X_t=x\big]\, f(x)\,dx\right)^{1/2} \left(\int E\big[\|g_2(Y_t,\rho)\|^2\,|\,X_t=x\big]\, f(x)\,dx\right)^{1/2} = E\big[\|g_2(Y_t,\theta)\|^2\big]^{1/2}\, E\big[\|g_2(Y_t,\rho)\|^2\big]^{1/2} < \infty,$$
by Assumption A.11, and similarly:

$$\int \big\|E[g_2(Y_t,\theta)\,|\,X_t=x]\big\|\, f(x)\,dx \le E\big[\|g_2(Y_t,\theta)\|^2\big]^{1/2} < \infty.$$

Since the functions $E[\tilde g_2(Y_t,\theta_j)\,|\,X_t=\cdot]\,f(\cdot)$ and $E[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\,|\,X_t=\cdot]\,f(\cdot)$ are Lebesgue-integrable and continuous at $x=x_0$ (Assumptions A.12-A.13), we can apply Bochner's Lemma [e.g. Bosq, Lecoutre (1987), p. 61] to deduce that:

$$h_T^{-d}\, E\left[\tilde g_2(Y_t,\theta_j)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right] = E[\tilde g_2(Y_t,\theta_j)\,|\,X_t=x_0]\, f(x_0) + o(1),$$
$$h_T^{-d}\, E\left[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\, K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right] = w^2\, E\big[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\,|\,X_t=x_0\big]\, f(x_0) + o(1).$$

Then:

$$\gamma_{0T,ij} = h_T^{-d}\, E\left[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\, K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right] + O(h_T^d) = w^2 f(x_0)\, E\big[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\,|\,X_t=x_0\big] + o(1).$$

Let us now consider the term $\sum_{l:|l|=1}^{m-1}\big(1-\frac{|l|}{m}\big)\gamma_{lT,ij}$ in equation (B.11). By repeating the argument used by Bosq (1998), proof of Theorem 2.3, in the case of the density estimator, it is possible to prove that Assumptions A.5-A.9, A.11 and A.14 imply (see Section B.1.4 for the detailed derivation):

$$\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}{m}\right)\gamma_{lT,ij} = o(1).$$

Finally, we conclude:

$$\lim_{T\to\infty}\Gamma^{22}_{T,ij} = w^2 f(x_0)\, E\big[\tilde g_2(Y_t,\theta_i)\tilde g_2(Y_t,\theta_j)'\,|\,X_t=x_0\big].$$

iii) Finally, let us consider $\Gamma^{12}_{T,ij}$. By a similar argument as for $\Gamma^{22}_{T,ij}$, the cross-terms are negligible. Thus, using Assumptions A.1, A.8, A.9, A.13 and Bochner's Lemma, we get:

$$\Gamma^{12}_{T,ij} = \mathrm{Cov}\left(g_1(X_t,Y_t,\theta_i),\; h_T^{-d/2}\,\tilde g_2(Y_t,\theta_j)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right) + o(1) = h_T^{-d/2}\, E\left[g_1(X_t,Y_t,\theta_i)\,\tilde g_2(Y_t,\theta_j)'\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right] + o(1)$$
$$= h_T^{d/2}\, w^2 f(x_0)\, E\big[g_1(X_t,Y_t,\theta_i)\,\tilde g_2(Y_t,\theta_j)'\,|\,X_t=x_0\big] + o(1) = o(1).$$

The proof is concluded.

Lemma B.5: Under the assumptions of Lemma B.4 and Assumption A.15, the Liapunov condition holds:

$$\sum_{l=1}^k E\big[|Y^*_{l,T}|^3\big] \to 0.$$

Proof: By Cauchy-Schwarz we have:

$$\sum_{l=1}^k E\big[|Y^*_{l,T}|^3\big] = k\, E\big[|Y_{1,T}|^3\big] \le k\, E\big[(Y_{1,T})^4\big]^{1/2}\, E\big[(Y_{1,T})^2\big]^{1/2} = \big(k\, E\big[(Y_{1,T})^4\big]\big)^{1/2}\, \big(k\, E\big[(Y_{1,T})^2\big]\big)^{1/2}.$$

Since $k\,E\big[(Y_{1,T})^2\big] = O(1)$ by Lemma B.4, it is sufficient to prove $k\,E\big[(Y_{1,T})^4\big] = o(1)$. Since $Y_{1,T} = \sum_{i=1}^n \lambda_i'\, \frac{1}{\sqrt T}\sum_{t=1}^m (v_{tT}(\theta_i)-E[v_{tT}(\theta_i)])$, it suffices to show:

$$k\, E\left[\left\|\frac{1}{\sqrt T}\sum_{t=1}^m \big(v_{tT}(\theta_i) - E[v_{tT}(\theta_i)]\big)\right\|^4\right] = o(1), \quad \forall\, i=1,...,n. \qquad (B.12)$$

We have:

$$k\, E\left[\left\|\frac{1}{\sqrt T}\sum_{t=1}^m \big(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)]\big)\right\|^4\right] = \frac{k}{T^2}\, E\left[\left\|\sum_{t=1}^m V_{tT,i}\right\|^4\right] \le \frac{k}{T^2}\, E\left[\left(\sum_{t=1}^m \|V_{tT,i}\|\right)^4\right], \qquad (B.13)$$

where $V_{tT,i} := v_{tT}(\theta_i) - E[v_{tT}(\theta_i)]$. Moreover,

$$E\left[\left(\sum_{t=1}^m \|V_{tT,i}\|\right)^4\right] = \sum_t E\big[\|V_{tT,i}\|^4\big] + \sum_{t_1\ne t_2} E\big[\|V_{t_1T,i}\|^2\, \|V_{t_2T,i}\|\, (\|V_{t_1T,i}\|+\|V_{t_2T,i}\|)\big] + \sum_{t_1\ne t_2\ne t_3} E\big[\|V_{t_1T,i}\|^2\, \|V_{t_2T,i}\|\, \|V_{t_3T,i}\|\big] + \sum_{t_1\ne t_2\ne t_3\ne t_4} E\big[\|V_{t_1T,i}\|\, \|V_{t_2T,i}\|\, \|V_{t_3T,i}\|\, \|V_{t_4T,i}\|\big], \qquad (B.14)$$

where summations are over $1,...,m$. Let us now derive the orders of the different terms.
Since:

$$\|V_{tT,i}\| \le \|v_{tT}(\theta_i)\| + E[\|v_{tT}(\theta_i)\|], \qquad \|v_{tT}(\theta_i)\| \le \|g_1(X_t,Y_t,\theta_i)\| + h_T^{-d/2}\,\|\tilde g_2(Y_t,\theta_i)\|\, \left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|, \qquad E[\|v_{tT}(\theta_i)\|] = O(1),$$

and by Assumption A.11, the leading terms are either of order $O(1)$ or the terms involving the highest power of $h_T^{-d/2}\,\|\tilde g_2(Y_t,\theta_i)\|\, |K((X_t-x_0)/h_T)|$.

i) We have:

$$E\big[\|V_{tT,i}\|^4\big] = O\left(h_T^{-2d}\, E\left[\|\tilde g_2(Y_t,\theta_i)\|^4\, K\!\left(\frac{X_t-x_0}{h_T}\right)^4\right]\right) = O\left(h_T^{-2d}\, \|K\|_\infty^3\, E\left[\|\tilde g_2(Y_t,\theta_i)\|^4\, \left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|\right]\right).$$

From Assumption A.15:

$$h_T^{-d}\, E\left[\|\tilde g_2(Y_t,\theta_i)\|^4\, \left|K\!\left(\frac{X_t-x_0}{h_T}\right)\right|\right] = \int E\big[\|\tilde g_2(Y_t,\theta_i)\|^4\,\big|\, X_t=x_0+h_Tu\big]\, f(x_0+h_Tu)\, |K(u)|\,du = O(1).$$

Thus, we get:

$$E\big[\|V_{tT,i}\|^4\big] = O(h_T^{-d}). \qquad (B.15)$$

ii) We have:

$$E\big[\|V_{t_1T,i}\|^3\,\|V_{t_2T,i}\|\big] = O\left(h_T^{-2d}\, \|K\|_\infty^2\, E\left[\|\tilde g_2(Y_{t_1},\theta_i)\|^3\, \|\tilde g_2(Y_{t_2},\theta_i)\|\, \left|K\!\left(\frac{X_{t_1}-x_0}{h_T}\right) K\!\left(\frac{X_{t_2}-x_0}{h_T}\right)\right|\right]\right).$$

From Assumption A.15:

$$h_T^{-2d}\, E\left[\|\tilde g_2(Y_{t_1},\theta_i)\|^3\, \|\tilde g_2(Y_{t_2},\theta_i)\|\, \left|K\!\left(\frac{X_{t_1}-x_0}{h_T}\right) K\!\left(\frac{X_{t_2}-x_0}{h_T}\right)\right|\right] = \int\!\!\int E\big[\|\tilde g_2(Y_{t_1},\theta_i)\|^3\,\|\tilde g_2(Y_{t_2},\theta_i)\|\,\big|\, X_{t_1}=x_0+h_Tu,\, X_{t_2}=x_0+h_Tv\big]\, f_{t_1,t_2}(x_0+h_Tu,\, x_0+h_Tv)\, |K(u)K(v)|\,du\,dv = O(1).$$

Thus, we get:

$$E\big[\|V_{t_1T,i}\|^3\,\|V_{t_2T,i}\|\big] = O(1), \quad t_1\ne t_2. \qquad (B.16)$$

iii) Similarly, we have:

$$E\big[\|V_{t_1T,i}\|^2\,\|V_{t_2T,i}\|^2\big] = O(1), \quad t_1\ne t_2,$$
$$E\big[\|V_{t_1T,i}\|^2\,\|V_{t_2T,i}\|\,\|V_{t_3T,i}\|\big] = O(1), \quad t_1\ne t_2\ne t_3,$$
$$E\big[\|V_{t_1T,i}\|\,\|V_{t_2T,i}\|\,\|V_{t_3T,i}\|\,\|V_{t_4T,i}\|\big] = O(1), \quad t_1\ne t_2\ne t_3\ne t_4. \qquad (B.17)$$

Therefore, from (B.13)-(B.17), we get:

$$k\, E\left[\left\|\frac{1}{\sqrt T}\sum_{t=1}^m \big(v_{tT}(\theta_i)-E[v_{tT}(\theta_i)]\big)\right\|^4\right] = O\left(\frac{k}{T^2}\big(m\,h_T^{-d} + m^4\big)\right) = O\left(\frac{1}{Th_T^d} + \frac{m^3}{T}\right),$$

since $km/T\to 1$, $m/T\to 0$, $Th_T^d\to\infty$. (B.12) follows if we choose $m$ such that $m\to\infty$ and $m^3/T\to 0$.

From (B.10), Lemma B.4 and Lemma B.5 we conclude that:

$$\sum_{l=1}^k Y^*_{l,T} \overset{d}{\to} N(0,\, \lambda'\Omega\lambda).$$

b) The last two terms of the decomposition (B.9) are negligible

Lemma B.6: Under the assumptions of Lemma B.4,

$$\sum_{l=1}^k Y'_{l,T} = o_p(1), \qquad Y''_T = o_p(1).$$

Proof: The proof is similar to the proof of Theorem 1.3.10 in Tenreiro (1995), p. 14.
i) We have:

$$Y'_{l,T} = \sum_{t=lm+(l-1)q+1}^{l(m+q)} Z_{t,T} = \sum_{i=1}^n \lambda_i'\left[\frac{1}{\sqrt T}\sum_{t=lm+(l-1)q+1}^{l(m+q)} \big(v_{t,T}(\theta_i)-E[v_{t,T}(\theta_i)]\big)\right] =: \sum_{i=1}^n \lambda_i'\, U_{lT,i}.$$

Thus:

$$E\left[\left(\sum_{l=1}^k Y'_{l,T}\right)^2\right]^{1/2} = E\left[\left(\sum_{i=1}^n \lambda_i' \sum_{l=1}^k U_{lT,i}\right)^2\right]^{1/2} \le \sum_{i=1}^n \|\lambda_i\|\, E\left[\left\|\sum_{l=1}^k U_{lT,i}\right\|^2\right]^{1/2}.$$

Therefore, it is sufficient to prove:

$$E\left[\left\|\sum_{l=1}^k U_{lT,i}\right\|^2\right] = o(1), \quad \forall\, i. \qquad (B.18)$$

We have:

$$E\left[\left\|\sum_{l=1}^k U_{lT,i}\right\|^2\right] = k\, E\big[U_{lT,i}'U_{lT,i}\big] + \sum_{|s|=1}^{k-1}(k-|s|)\, E\big[U_{lT,i}'U_{l-s,T,i}\big],$$

and:

$$k\, E\big[U_{lT,i}'U_{lT,i}\big] = k\,\mathrm{Tr}\big[V(U_{lT,i})\big] = \frac{kq}{T}\,\mathrm{Tr}\left[V\left(\frac{1}{\sqrt q}\sum_{t=1}^q v_{t,T}(\theta_i)\right)\right] =: \frac{kq}{T}\,\mathrm{Tr}\big[\tilde\Gamma_{T,ii}\big].$$

From the proof of Lemma B.4, $\tilde\Gamma_{T,ii}=O(1)$. Since $kq/T = o(1)$, we get:

$$k\, E\big[U_{lT,i}'U_{lT,i}\big] = o(1).$$

Moreover,

$$\left|\sum_{|s|=1}^{k-1}(k-|s|)\, E\big[U_{lT,i}'U_{l-s,T,i}\big]\right| \le k \sum_{|s|=1}^{k-1}\left|E\big[U_{lT,i}'U_{l-s,T,i}\big]\right| \le \frac{2kq}{T}\sum_{s=1}^\infty \big\|E\big[v_{t,T}(\theta_i)'\,v_{t-s,T}(\theta_i)\big] - E[v_{t,T}(\theta_i)]'\,E[v_{t-s,T}(\theta_i)]\big\| = \frac{2kq}{T}\sum_{s=1}^\infty \big\|\mathrm{Cov}\big(v_{t,T}(\theta_i),\, v_{t-s,T}(\theta_i)\big)\big\|.$$

Using the same argument as in the proof of Lemma B.4 (see also Section B.1.4), we can show that $\sum_{s=1}^\infty \|\mathrm{Cov}(v_{t,T}(\theta_i),\, v_{t-s,T}(\theta_i))\| < \infty$. Thus,

$$\sum_{|s|=1}^{k-1}(k-|s|)\, E\big[U_{lT,i}'U_{l-s,T,i}\big] = o(1).$$

Then (B.18) follows.

ii) We have:

$$Y''_T = \sum_{t=k(m+q)+1}^T Z_{t,T} = \sum_{i=1}^n \lambda_i'\left[\frac{1}{\sqrt T}\sum_{t=k(m+q)+1}^T \big(v_{t,T}(\theta_i)-E[v_{t,T}(\theta_i)]\big)\right] =: \sum_{i=1}^n \lambda_i'\, U_{T,i},$$

and:

$$E\big[(Y''_T)^2\big]^{1/2} \le \sum_{i=1}^n \|\lambda_i\|\, E\big[\|U_{T,i}\|^2\big]^{1/2}.$$

Therefore, it is sufficient to prove $E\big[\|U_{T,i}\|^2\big] = o(1)$, $\forall\, i$. We have:

$$E\big[U_{T,i}'U_{T,i}\big] = \frac{T-k(m+q)}{T}\,\mathrm{Tr}\left[V\left(\frac{1}{\sqrt{T-k(m+q)}}\sum_{t=1}^{T-k(m+q)} v_{t,T}(\theta_i)\right)\right] \to 0,$$

and the proof is concluded.

B.1.3 Stochastic equicontinuity

Let us now prove the stochastic equicontinuity of the empirical process $\nu_T(\theta)$ [condition ii) in Proposition B.2] along the lines of Theorem 1 in Andrews (1991). Let us introduce the matrix-valued triangular array:

$$W_{t,T} = \begin{pmatrix} Z_t & 0 \\ 0 & h_T^{-d/2}\, K\!\left(\frac{X_t-x_0}{h_T}\right) Id_{K_2+L+1} \end{pmatrix}, \quad t\le T,\; T\ge 1,$$

where $Z_t$ denotes the instrument. We can write:

$$v_{t,T}(\theta) = W_{t,T}\,\psi(Y_t,\theta), \quad \theta\in\Theta,$$

where $\psi(y,\theta) = \big(g(y,\theta)',\, \tilde g_2(y,\theta)'\big)'$, $\theta\in\Theta$. Let $\{\phi_j : j\in\mathbb N\}$ be the basis of $L^2(F_Y)$ introduced in Assumption A.16. Without loss of generality, we can set $\phi_1(y)=1$.
Thus, from Assumption A.16, there exist sequences $\{c_j(\theta) : j\in\mathbb N\}$, $\theta\in\Theta$, of vector coefficients such that:

$$\psi(y,\theta) = \sum_{j=1}^\infty c_j(\theta)\,\phi_j(y), \quad y\in\mathcal Y,$$

for any $\theta\in\Theta$, where $\limsup_{J\to\infty}\sup_{\theta\in\Theta}\sum_{j=J}^\infty j\,\|c_j(\theta)\|^2 < \infty$. Thus, we have:

$$\nu_T(\theta)-\nu_T(\rho) = \sum_{j=1}^\infty \left[\frac{1}{\sqrt T}\sum_{t=1}^T \big(W_{t,T}\,\phi_j(Y_t) - E[W_{t,T}\,\phi_j(Y_t)]\big)\right]\big(c_j(\theta)-c_j(\rho)\big) = \sum_{j=1}^\infty \left(T^{-1/2}\sum_{t=1}^T X_{j,tT}\right)\big(c_j(\theta)-c_j(\rho)\big),$$

where $X_{j,tT} := W_{t,T}\,\phi_j(Y_t) - E[W_{t,T}\,\phi_j(Y_t)]$. By the Cauchy-Schwarz inequality:

$$\|\nu_T(\theta)-\nu_T(\rho)\| \le \left(\sum_{j=1}^\infty \frac{1}{j}\left\|T^{-1/2}\sum_{t=1}^T X_{j,tT}\right\|^2\right)^{1/2} d(\theta,\rho), \qquad (B.19)$$

where $d(\cdot,\cdot)$ denotes the metric on $\Theta$ defined by:

$$d(\theta,\rho) = \left(\sum_{j=1}^\infty j\,\|c_j(\theta)-c_j(\rho)\|^2\right)^{1/2}.$$

For any $\varepsilon,\delta>0$, we have:

$$\limsup_{T\to\infty} P^*\left[\sup_{\theta,\rho\in\Theta:\, d(\theta,\rho)\le\delta}\|\nu_T(\theta)-\nu_T(\rho)\|>\varepsilon\right] \le \frac{\delta^2}{\varepsilon^2}\,\limsup_{T\to\infty}\sum_{j=1}^\infty \frac{1}{j}\, E\left[\left\|T^{-1/2}\sum_{t=1}^T X_{j,tT}\right\|^2\right], \qquad (B.20)$$

using (B.19). Since:

$$E\left[\left\|T^{-1/2}\sum_{t=1}^T X_{j,tT}\right\|^2\right] = \mathrm{Tr}\, E\big[X_{j,tT}X_{j,tT}'\big] + \sum_{|k|=1}^{T-1}\left(1-\frac{|k|}{T}\right)\mathrm{Tr}\, E\big[X_{j,tT}X_{j,t-k,T}'\big], \qquad (B.21)$$

we can study the asymptotic behaviour of the different terms in the decomposition.

Lemma B.7: Under Assumptions A.5, A.8-A.9 and A.17-A.18,

$$E\big[X_{j,tT}X_{j,tT}'\big] = \begin{pmatrix} V\big[Z_t\,\phi_j(Y_t)\big] & 0 \\ 0 & w^2\, E\big[\phi_j(Y_t)^2\,|\,X_t=x_0\big]\, f(x_0)\, Id_{K_2+L+1} \end{pmatrix} + u_{j,T},$$

$$E\big[X_{j,tT}X_{j,t-k,T}'\big] = \begin{pmatrix} \mathrm{Cov}\big[Z_t\,\phi_j(Y_t),\, Z_{t-k}\,\phi_j(Y_{t-k})\big] & 0 \\ 0 & 0 \end{pmatrix} + u_{j,kT}, \quad k\ne 0,$$

where $V\big[Z_t\phi_j(Y_t)\big] := E\big[Z_tZ_t'\,\phi_j(Y_t)^2\big] - E\big[Z_t\phi_j(Y_t)\big]E\big[Z_t\phi_j(Y_t)\big]'$, $\mathrm{Cov}\big[Z_t\phi_j(Y_t),\, Z_{t-k}\phi_j(Y_{t-k})\big] := E\big[Z_tZ_{t-k}'\,\phi_j(Y_t)\phi_j(Y_{t-k})\big] - E\big[Z_t\phi_j(Y_t)\big]E\big[Z_{t-k}\phi_j(Y_{t-k})\big]'$, and:

$$\sup_j \|u_{j,T}\| = o(1), \qquad \sup_j \left\|\sum_{k=1}^{T-1} u_{j,kT}\right\| = o(1).$$

Proof: i) We have:

$$E\big[X_{j,tT}X_{j,tT}'\big] = \begin{pmatrix} E\big[Z_tZ_t'\,\phi_j(Y_t)^2\big] - E\big[Z_t\phi_j(Y_t)\big]E\big[Z_t\phi_j(Y_t)\big]' & \cdot \\ \cdot & h_T^{-d}\, V\left[\phi_j(Y_t)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right] Id_{K_2+L+1} \end{pmatrix}.$$

Let us consider the lower right block. The term:

$$h_T^{-d}\, E\left[\phi_j(Y_t)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right] = \int E\big[\phi_j(Y_t)\,|\,X_t=x_0+h_Tu\big]\, f(x_0+h_Tu)\, K(u)\,du,$$

is bounded uniformly in $j\in\mathbb N$ from Assumption A.17 and the Cauchy-Schwarz inequality.
Moreover, from a standard bias expansion and Assumption A.17:

$$h_T^{-d}\, E\left[\phi_j(Y_t)^2\, K\!\left(\frac{X_t-x_0}{h_T}\right)^2\right] = \int \varphi_j(x_0+h_Tu)\, K^2(u)\,du = w^2\,\varphi_j(x_0) + O\!\left(\sup_j \|D^2\varphi_j\|_\infty\, h_T^2\right),$$

where $\varphi_j(x) := E[\phi_j(Y_t)^2\,|\,X_t=x]\,f(x)$. Thus:

$$h_T^{-d}\, V\left[\phi_j(Y_t)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right] = w^2\, E\big[\phi_j(Y_t)^2\,|\,X_t=x_0\big]\, f(x_0) + o(1),$$

uniformly in $j\in\mathbb N$.

ii) We have:

$$E\big[X_{j,tT}X_{j,t-k,T}'\big] = \begin{pmatrix} \Gamma^{11}_{kT,j} & \cdot \\ \cdot & \Gamma^{22}_{kT,j} \end{pmatrix},$$

where:

$$\Gamma^{11}_{kT,j} = E\big[Z_tZ_{t-k}'\,\phi_j(Y_t)\phi_j(Y_{t-k})\big] - E\big[Z_t\phi_j(Y_t)\big]E\big[Z_{t-k}\phi_j(Y_{t-k})\big]',$$
$$\Gamma^{22}_{kT,j} = h_T^{-d}\,\mathrm{Cov}\left(\phi_j(Y_t)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; \phi_j(Y_{t-k})\, K\!\left(\frac{X_{t-k}-x_0}{h_T}\right)\right) Id_{K_2+L+1}.$$

Let us consider $\Gamma^{22}_{kT,j}$. We can use the same arguments as in the proof of Lemma B.4 to get bounds uniform in $j\in\mathbb N$ from Assumptions A.5, A.8, A.9 and A.18. Thus,

$$\sum_{k=1}^{T-1} h_T^{-d}\,\mathrm{Cov}\left(\phi_j(Y_t)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; \phi_j(Y_{t-k})\, K\!\left(\frac{X_{t-k}-x_0}{h_T}\right)\right) = o(1),$$

uniformly in $j\in\mathbb N$. The proof is concluded.

From Davidov's inequality, we have:

$$\big\|\mathrm{Cov}\big(Z_t\,\phi_j(Y_t),\, Z_{t-k}\,\phi_j(Y_{t-k})\big)\big\| \le \text{const}\cdot \delta^k\, \big(E\big[\|Z_t\,\phi_j(Y_t)\|^r\big]\big)^{2/r}, \qquad (B.22)$$

uniformly in $j\in\mathbb N$, for some $0<\delta<1$ and $r>2$ as in Assumption A.16. Thus, from Lemma B.7 and equations (B.21), (B.22) we get:

$$\limsup_{T\to\infty}\sum_{j=1}^\infty \frac{1}{j}\, E\left[\left\|T^{-1/2}\sum_{t=1}^T X_{j,tT}\right\|^2\right] \le \sum_{j=1}^\infty \frac{1}{j}\left\{\mathrm{Tr}\, V\big[Z_t\phi_j(Y_t)\big] + C_1\, \big(E\big[\|Z_t\phi_j(Y_t)\|^r\big]\big)^{2/r} + C_2\, E\big[\phi_j(Y_t)^2\,|\,X_t=x_0\big]\right\},$$

for some constants $C_1, C_2<\infty$. Now:

$$\mathrm{Tr}\, V\big[Z_t\phi_j(Y_t)\big] = \mathrm{Tr}\, E\big[Z_tZ_t'\,\phi_j(Y_t)^2\big] - \big\|E\big[Z_t\phi_j(Y_t)\big]\big\|^2 \le E\big[\|Z_t\|^2\,\phi_j(Y_t)^2\big].$$

Thus, we get:

$$\limsup_{T\to\infty}\sum_{j=1}^\infty \frac{1}{j}\, E\left[\left\|T^{-1/2}\sum_{t=1}^T X_{j,tT}\right\|^2\right] \le C_3 \sum_{j=1}^\infty \frac{1}{j}\left\{\big(E\big[\|Z_t\phi_j(Y_t)\|^r\big]\big)^{2/r} + E\big[\phi_j(Y_t)^2\,|\,X_t=x_0\big]\right\} = C_4 < \infty,$$

for some constants $C_3, C_4<\infty$ from Assumption A.16. We deduce from (B.20):

$$\limsup_{T\to\infty} P^*\left[\sup_{\theta,\rho\in\Theta:\, d(\theta,\rho)\le\delta}\|\nu_T(\theta)-\nu_T(\rho)\|>\varepsilon\right] \le C_4\,\frac{\delta^2}{\varepsilon^2},$$

which can be made smaller than any $\eta>0$ by choosing $\delta$ small enough; this is the argument of Andrews (1991), Equation (2.6).
B.1.4 Bounds on covariance terms

In this section we derive a bound for the covariance term $\sum_{l:|l|=1}^{m-1}\big(1-\frac{|l|}{m}\big)\gamma_{lT,ij}$ in equation (B.11). This is done by deriving two bounds for the covariance terms.

i) For this purpose, let us define the functions:

$$\mu_i(x) = E[\tilde g_2(Y_t,\theta_i)\,|\,X_t=x]\,f(x), \qquad \mu_{l,ij}(x,\xi) = E\big[\tilde g_2(Y_t,\theta_i)\,\tilde g_2(Y_{t-l},\theta_j)'\,\big|\, X_t=x,\, X_{t-l}=\xi\big]\, f_{t,t-l}(x,\xi).$$

We can write:

$$\mathrm{Cov}\left(\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; \tilde g_2(Y_{t-l},\theta_j)\, K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right) = \int\!\!\int \mu_{l,ij}(x,\xi)\, K\!\left(\frac{x-x_0}{h_T}\right) K\!\left(\frac{\xi-x_0}{h_T}\right) dx\,d\xi - \left(\int \mu_i(x)\, K\!\left(\frac{x-x_0}{h_T}\right) dx\right)\left(\int \mu_j(\xi)\, K\!\left(\frac{\xi-x_0}{h_T}\right) d\xi\right)'$$
$$= h_T^{2d}\left[\int\!\!\int \mu_{l,ij}(x_0+h_Tu,\, x_0+h_Tv)\, K(u)K(v)\,du\,dv - \left(\int \mu_i(x_0+h_Tu)\, K(u)\,du\right)\left(\int \mu_j(x_0+h_Tv)\, K(v)\,dv\right)'\right].$$

From Assumptions A.6-A.7 and A.14, and by the Cauchy-Schwarz inequality, the functions $\mu_i$ and $\mu_{l,ij}$ are bounded uniformly in $l\in\mathbb N$. Since $\gamma_{lT,ij}$ includes the normalization $h_T^{-d}$, we get:

$$\|\gamma_{lT,ij}\| \le \left(\sup_l \|\mu_{l,ij}\|_\infty + \|\mu_i\|_\infty\,\|\mu_j\|_\infty\right)\|K\|_{L^1}^2\; h_T^d \le C_1\, h_T^d. \qquad (B.23)$$

ii) From the strong mixing property (Assumption A.5) and Davidov's inequality:

$$\left\|\mathrm{Cov}\left(\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right),\; \tilde g_2(Y_{t-l},\theta_j)\, K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right)\right\| \le \text{const}\cdot\delta^l\; E\left[\left\|\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right\|^r\right]^{1/r} E\left[\left\|\tilde g_2(Y_{t-l},\theta_j)\, K\!\left(\frac{X_{t-l}-x_0}{h_T}\right)\right\|^r\right]^{1/r},$$

for some $0<\delta<1$ and $r>2$. Moreover we have:

$$E\left[\left\|\tilde g_2(Y_t,\theta_i)\, K\!\left(\frac{X_t-x_0}{h_T}\right)\right\|^r\right] \le \|K\|_\infty^r\; E\big[\|\tilde g_2(Y_t,\theta_i)\|^r\big] =: \|K\|_\infty^r\, c_i < \infty,$$

from Assumptions A.8 and A.11. We deduce:

$$\|\gamma_{lT,ij}\| \le C_2\,\delta^l\, h_T^{-d}, \qquad (B.24)$$

for some constant $C_2<\infty$ (that depends on $i$ and $j$).

Let us now define $L_T = \lfloor h_T^{-d/2}\rfloor \to\infty$. From (B.23) and (B.24) we have:

$$\left\|\sum_{l:|l|=1}^{m-1}\left(1-\frac{|l|}{m}\right)\gamma_{lT,ij}\right\| \le 2\sum_{l=1}^{m-1}\|\gamma_{lT,ij}\| \le 2\left(\sum_{l=1}^{L_T} C_1\, h_T^d + \sum_{l=L_T+1}^{\infty} C_2\,\delta^l\, h_T^{-d}\right) = 2\left(C_1\, L_T\, h_T^d + C_2\,\frac{\delta^{L_T+1}}{1-\delta}\, h_T^{-d}\right) \le \text{const}\left(\frac{1}{L_T} + L_T^2\,\delta^{L_T+1}\right) \to 0.$$

We deduce $\sum_{l:|l|=1}^{m-1}\big(1-\frac{|l|}{m}\big)\gamma_{lT,ij} = o(1)$.

B.2 Proof of consistency

In this Section we prove that $P\big[\|\hat\beta_T-\beta_0\|\ge\varepsilon\big]\to 0$, as $T\to\infty$, for any $\varepsilon>0$. We have:

$$P\big[\|\hat\beta_T-\beta_0\|\ge\varepsilon\big] \le P\left[\inf_{\beta\in B:\,\|\beta-\beta_0\|\ge\varepsilon} Q_T(\beta) \le Q_T(\hat\beta_T)\right] \le P\left[\inf_{\beta\in B:\,\|\beta-\beta_0\|\ge\varepsilon} Q_T(\beta) \le Q_T(\beta_0)\right]. \qquad (B.25)$$

Let us derive the orders of the RHS term and the LHS term inside the probability.
Write the criterion as:

$$Q_T(\beta) = \big[\hat\Psi_T(\beta)+m_T(\beta)\big]'\,\Omega\,\big[\hat\Psi_T(\beta)+m_T(\beta)\big], \quad \beta\in B. \qquad (B.26)$$

Then, since $\hat\Psi_T(\beta_0) = O_p(1)$ from Lemma A.1, and $m_T(\beta_0)=0$, we get:

$$Q_T(\beta_0) = O_p(1). \qquad (B.27)$$

Let us now derive the order of $\inf_{\beta\in B:\,\|\beta-\beta_0\|\ge\varepsilon} Q_T(\beta)$. From Lemma A.1 and the Continuous Mapping Theorem [CMT, Billingsley (1968)], we have:

$$\sup_{\beta}\, \hat\Psi_T(\beta)'\,\Omega\,\hat\Psi_T(\beta) = O_p(1), \qquad \sup_{\beta\in B}\, \big|m_T(\beta)'\,\Omega\,\hat\Psi_T(\beta)\big| = O_p(\sqrt T).$$

From (B.26) it follows that:

$$Q_T(\beta) = m_T(\beta)'\,\Omega\, m_T(\beta) + O_p(\sqrt T),$$

uniformly in $\beta\in B$. Now, let $\underline\lambda>0$ be the smallest eigenvalue of $\Omega$ (Assumption A.20). We get:

$$m_T(\beta)'\,\Omega\, m_T(\beta) \ge \underline\lambda\left(T\,\|E[g_1(Y_t,X_t,\theta)]\|^2 + Th_T^d\,\|E[g_2(Y_t,\theta)\,|\,X_t=x_0]\|^2 + Th_T^d\,\|E[a(Y_t,\theta)\,|\,X_t=x_0]-\alpha\|^2\right)$$
$$\ge \underline\lambda\, Th_T^d\left(\|E[g_1(Y_t,X_t,\theta)]\|^2 + \|E[g_2(Y_t,\theta)\,|\,X_t=x_0]\|^2 + \|E[a(Y_t,\theta)\,|\,X_t=x_0]-\alpha\|^2\right),$$

for $T$ large, and any $\beta\in B$. From continuity of the moment functions (Assumption A.19), compactness of $B$ (Assumption A.4) and global identification (Assumption A.2), we have:

$$\inf_{\beta\in B:\,\|\beta-\beta_0\|\ge\varepsilon} m_T(\beta)'\,\Omega\, m_T(\beta) \ge C\,Th_T^d,$$

for a constant $C=C_\varepsilon>0$. From bandwidth Assumption A.9, we have $\sqrt T = o(Th_T^d)$. Thus, we get:

$$\inf_{\beta\in B:\,\|\beta-\beta_0\|\ge\varepsilon} Q_T(\beta) \ge \frac{1}{2}\,C\,Th_T^d, \qquad (B.28)$$

with probability approaching 1. Since $Th_T^d\to\infty$ from Assumption A.9, and by using (B.25), (B.27) and (B.28), the conclusion follows.

B.3 Proof of Lemma A.2

We use the following Lemma.

Lemma B.8: Under Assumptions A.1-A.20 and A.24: $\hat\theta_T - \theta_0 = O_p\big(1/\sqrt{Th_T^d}\big)$.

Proof: We follow the approach in the proof of Lemma A1 in Stock, Wright (2000). Since $\hat\theta_T$ is the minimizer of $Q_T$, we have $Q_T(\hat\theta_T) - Q_T(\theta_0) \le 0$, that is,

$$\big[\hat\Psi_T(\hat\theta_T) + m_T(\hat\theta_T)\big]'\,\Omega\,\big[\hat\Psi_T(\hat\theta_T)+m_T(\hat\theta_T)\big] - \hat\Psi_T(\theta_0)'\,\Omega\,\hat\Psi_T(\theta_0) \le 0,$$

that is,

$$m_T(\hat\theta_T)'\,\Omega\, m_T(\hat\theta_T) + 2\, m_T(\hat\theta_T)'\,\Omega\,\hat\Psi_T(\hat\theta_T) + d_{1,T} \le 0,$$

where $d_{1,T} = \hat\Psi_T(\hat\theta_T)'\,\Omega\,\hat\Psi_T(\hat\theta_T) - \hat\Psi_T(\theta_0)'\,\Omega\,\hat\Psi_T(\theta_0)$. By using

$$\big|m_T(\hat\theta_T)'\,\Omega\,\hat\Psi_T(\hat\theta_T)\big| \le \big(m_T(\hat\theta_T)'\,\Omega\, m_T(\hat\theta_T)\big)^{1/2}\, d_{2,T},$$

with $d_{2,T} := \sup_\theta \big(\hat\Psi_T(\theta)'\,\Omega\,\hat\Psi_T(\theta)\big)^{1/2}$ and $d_{3,T} := d_{1,T}$, we deduce from the resulting quadratic inequality:

$$\big(m_T(\hat\theta_T)'\,\Omega\, m_T(\hat\theta_T)\big)^{1/2} \le d_{2,T} + \big(d_{2,T}^2 + |d_{3,T}|\big)^{1/2}. \qquad (B.29)$$

Let us now derive the order of the RHS.
From Lemma A.1 and the CMT we have:

$$d_{2,T} = \sup_{\theta}\big(\hat\Psi_T(\theta)'\,\Omega\,\hat\Psi_T(\theta)\big)^{1/2} = O_p(1), \qquad |d_{3,T}| \le 2\sup_{\theta}\big|\hat\Psi_T(\theta)'\,\Omega\,\hat\Psi_T(\theta)\big| = O_p(1).$$

We get $m_T(\hat\theta_T)'\,\Omega\, m_T(\hat\theta_T) = O_p(1)$. Define:

$$G(\theta) = \big(E[g_1(X_t,Y_t,\theta)]',\; E[g_2(Y_t,\theta)\,|\,x_0]',\; E[a(Y_t,\theta)-\alpha_0\,|\,x_0]'\big)', \quad \theta\in B.$$

Since $\|m_T(\theta)\|^2 \ge Th_T^d\,\|G(\theta)\|^2$, for $\theta\in B$, we deduce:

$$G(\hat\theta_T) = O_p\big(1/\sqrt{Th_T^d}\big).$$

By the mean-value theorem we can write¹:

$$G(\hat\theta_T) = \frac{\partial G}{\partial\theta'}(\tilde\theta_T)\,(\hat\theta_T-\theta_0) = O_p\big(1/\sqrt{Th_T^d}\big),$$

where $\tilde\theta_T$ is between $\hat\theta_T$ and $\theta_0$. Since $\hat\theta_T$ converges to $\theta_0$ by consistency (Section B.2), and $\partial G/\partial\theta'(\theta)$ is continuous by Assumption A.24, we have:

$$\frac{\partial G}{\partial\theta'}(\tilde\theta_T) \overset{p}{\to} \frac{\partial G}{\partial\theta'}(\theta_0),$$

where $\partial G/\partial\theta'(\theta_0)$ has full rank, by the local identification condition in Assumption A.3. The conclusion follows.

¹ More precisely, the mean-value theorem is applied separately for any component of function $G$, and the intermediary point $\tilde\theta_T$ can differ across components.

Let us now prove Lemma A.2. From Lemma B.8, it is enough to show that $\mathrm{plim}_{T\to\infty}\, \frac{\partial\hat g_T}{\partial\theta'}(\theta_T)'\,R_T = J_0$, for any $\theta_T$ such that $\theta_T - \theta_0 = O_p\big(1/\sqrt{Th_T^d}\big)$. We have:

$$\frac{\partial\hat g_T}{\partial\theta'}(\theta_T)'\,R_T = \begin{pmatrix} \hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right] R_{1,Z} & \hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right] R_{2,Z} & 0 \\ h_T^{d/2}\,\hat E\left[\frac{\partial g_2}{\partial\theta'}(\theta_T)\,\big|\,x_0\right] R_{1,Z} & h_T^{d/2}\,\hat E\left[\frac{\partial g_2}{\partial\theta'}(\theta_T)\,\big|\,x_0\right] R_{2,Z} & 0 \\ h_T^{d/2}\,\hat E\left[\frac{\partial a}{\partial\theta'}(\theta_T)\,\big|\,x_0\right] R_{1,Z} & h_T^{d/2}\,\hat E\left[\frac{\partial a}{\partial\theta'}(\theta_T)\,\big|\,x_0\right] R_{2,Z} & -Id_L \end{pmatrix}.$$

Thus, we have to show:

i) $\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right] \overset{p}{\to} E\left[\frac{\partial g_1}{\partial\theta'}(\theta_0)\right]$;

ii) $\hat E\left[\frac{\partial g_2^*}{\partial\theta'}(\theta_T)\,\big|\,x_0\right] \overset{p}{\to} E\left[\frac{\partial g_2^*}{\partial\theta'}(\theta_0)\,\big|\,x_0\right]$;

iii) $h_T^{d/2}\,\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta_T)\right] R_{2,Z} \overset{p}{\to} 0$.

Let us now prove these results.

i) From Assumptions A.4, A.5, A.21 and A.22, the ULLN [see Potscher, Prucha (1989), Corollary 1] implies that $\hat E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right] \overset{p}{\to} E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right]$ uniformly in $\theta\in\Theta$. Moreover, $E\left[\frac{\partial g_1}{\partial\theta'}(\theta)\right]$ is continuous w.r.t. $\theta$ by Assumption A.24. Then i) follows.

ii) Let $g_{2,i}$ denote the $i$-th component of function $g_2^*$, for $i=1,...,K_2+L$. We have:

$$\hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_T)\,\Big|\,x_0\right] = \hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,\Big|\,x_0\right] + \hat E\left[\frac{\partial^2 g_{2,i}}{\partial\theta\,\partial\theta'}(\ddot\theta_T)\,\Big|\,x_0\right](\theta_T-\theta_0),$$

where $\ddot\theta_T$ is between $\theta_T$ and $\theta_0$. Under Assumptions A.5-A.9 and A.23 one can show that $\hat E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,|\,x_0\right]\overset{p}{\to} E\left[\frac{\partial g_{2,i}}{\partial\theta}(\theta_0)\,|\,x_0\right]$ and $\hat E\left[\frac{\partial^2 g_{2,i}}{\partial\theta\,\partial\theta'}(\ddot\theta_T)\,|\,x_0\right] = O_p(1)$. Then ii) follows.
iii) Let $g_{1,i}$ denote the $i$-th component of function $g_1$, $i=1,...,K_1$. We have:

$$h_T^{d/2}\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_T)\right] R_{2,Z} = \frac{1}{\sqrt{Th_T^{-d}}}\,\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right] R_{2,Z} + h_T^{d/2}\,(\theta_T-\theta_0)'\,\hat E\left[\frac{\partial^2 g_{1,i}}{\partial\theta\,\partial\theta'}(\ddot\theta_T)\right] R_{2,Z}, \qquad (B.30)$$

where $\ddot\theta_T$ is between $\theta_T$ and $\theta_0$. Let us derive the orders of the two terms in the RHS of (B.30). For the first one:

$$\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right] R_{2,Z} = \frac{1}{\sqrt T}\sum_{t=1}^T \frac{\partial g_{1,i}}{\partial\theta'}(Y_t,X_t,\theta_0)\, R_{2,Z},$$

where $E\left[\frac{\partial g_{1,i}}{\partial\theta'}(Y_t,X_t,\theta_0)\right] R_{2,Z} = 0$. From Assumptions A.5 and A.22, the CLT for mixing processes [e.g. Herrndorf (1984), Corollary 1] implies:

$$\sqrt T\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_0)\right] R_{2,Z} = O_p(1).$$

Let us now consider the second term in (B.30). From Assumptions A.4, A.5, A.22 and A.24, the ULLN implies:

$$\hat E\left[\frac{\partial^2 g_{1,i}}{\partial\theta\,\partial\theta'}(\ddot\theta_T)\right] = O_p(1).$$

Thus, from (B.30) we get:

$$h_T^{d/2}\,\hat E\left[\frac{\partial g_{1,i}}{\partial\theta'}(\theta_T)\right] R_{2,Z} = O_p\left(\frac{1}{\sqrt{Th_T^{-d}}}\right) + O_p\left(\frac{1}{\sqrt T}\right) = o_p(1),$$

from the bandwidth condition in Assumption A.9. The proof is concluded.

B.4 Proof of Corollary 6

If the bandwidth is such that $c = \lim Th_T^{2m+d} = 0$, from (A.6) the optimal weighting matrix for given instrument is $\Omega^* = V_0^{-1}$. The proof that $Z^* = E\left[\frac{\partial g}{\partial\theta'}(Y;\theta_0)\,\big|\,X\right]' W(X)$ is still an optimal instrument is similar to the proof of Proposition 3, replacing $M(Z,c,a)$ with

$$V(Z,a) = \frac{w^2}{f_X(x_0)}\;\tilde e'\,\big(J_{0,Z}'\,\Omega^*\,J_{0,Z}\big)^{-1}\,\tilde e,$$

which is the asymptotic variance of $\hat\alpha_T$. Thus, the bias-free kernel nonparametric efficiency bound is

$$B(a,x_0) = \frac{w^2}{f_X(x_0)}\;\tilde e'\,\big(J_0'\,\Omega^*\,J_0\big)^{-1}\,\tilde e.$$

Corollary 6 follows from the block inversion formula.

B.5 Proof of Lemma A.3

B.5.1 Asymptotic expansion of the concentrated objective function

Since the conditional moment restrictions are satisfied asymptotically, we have $\hat\theta_T \overset{p}{\to} \theta_0$, when $T\to\infty$. Therefore, we can consider the second-order asymptotic expansion of function $L_T^c(\theta,\lambda)$ in a neighbourhood of $\theta=\theta_0$, $\lambda=0$. Let us first derive the expansion w.r.t. $\lambda$.
We have:

$$\log \hat E\big[\exp\big(\lambda' g_2(\theta)\big)\,\big|\, x_0\big] \simeq \log\left(1 + \lambda'\,\hat E[g_2(\theta)\,|\,x_0] + \frac{1}{2}\,\lambda'\,\hat E\big[g_2(\theta)g_2(\theta)'\,|\,x_0\big]\,\lambda\right) \simeq \lambda'\,\hat E[g_2(\theta)\,|\,x_0] + \frac{1}{2}\,\lambda'\,\hat V(g_2(\theta)\,|\,x_0)\,\lambda.$$

Therefore, we can asymptotically concentrate w.r.t. $\lambda$:

$$\lambda \simeq -\,\hat V(g_2(\theta)\,|\,x_0)^{-1}\,\hat E(g_2(\theta)\,|\,x_0), \qquad (B.31)$$

and the asymptotic expansion of the concentrated objective function becomes:

$$L_T^c(\theta) \simeq \frac{1}{T}\sum_{t=1}^T \hat E(g(\theta)\,|\,x_t)'\,\hat V(g(\theta)\,|\,x_t)^{-1}\,\hat E(g(\theta)\,|\,x_t) + \frac{1}{2}\,h_T^d\,\hat E(g_2(\theta)\,|\,x_0)'\,\hat V(g_2(\theta)\,|\,x_0)^{-1}\,\hat E(g_2(\theta)\,|\,x_0).$$

Criterion $L_T^c(\theta)$ multiplied by $T$ is asymptotically equivalent to the criterion of the kernel moment estimator (see Definition 4) with optimal instrument and weighting matrix. Let us now consider the expansion around $\theta=\theta_0$. We have:

$$\hat E(g(\theta)\,|\,x_t) \simeq \hat E(g(\theta_0)\,|\,x_t) + E\left[\frac{\partial g}{\partial\theta'}(\theta_0)\,\Big|\,x_t\right](\theta-\theta_0), \qquad \hat V(g(\theta)\,|\,x_t) \simeq V(g(\theta_0)\,|\,x_t),$$

and similarly for the expectations of function $g_2$. Thus, with the decomposition $\theta-\theta_0 = R_1(\theta_1-\theta_{1,0}) + R_2(\theta_2-\theta_{2,0})$, we get:

$$L_T^c(\theta) \simeq \frac{1}{T}\sum_{t=1}^T\left[\hat E(g\,|\,x_t) + E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)R_1(\theta_1-\theta_{1,0})\right]' V(g\,|\,x_t)^{-1}\left[\hat E(g\,|\,x_t) + E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)R_1(\theta_1-\theta_{1,0})\right]$$
$$+ \frac{1}{2}\,h_T^d\left[\hat E(g_2\,|\,x_0) + E\left(\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right)\big(R_1(\theta_1-\theta_{1,0}) + R_2(\theta_2-\theta_{2,0})\big)\right]' V(g_2\,|\,x_0)^{-1}\left[\hat E(g_2\,|\,x_0) + E\left(\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right)\big(R_1(\theta_1-\theta_{1,0}) + R_2(\theta_2-\theta_{2,0})\big)\right],$$

where functions $g, g_2$ are evaluated at $\theta_0$.

B.5.2 Asymptotic expansion of $\hat\theta_T$

The asymptotic expansion of $\hat\theta_{1,T}$ is obtained from the minimization of the first term in $L_T^c(\theta)$, since the contribution of the second term is asymptotically negligible. We get:

$$\hat\theta_{1,T}-\theta_{1,0} \simeq -\left[\frac{1}{T}\sum_{t=1}^T R_1'\, E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)' V(g\,|\,x_t)^{-1}\, E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right) R_1\right]^{-1} \frac{1}{T}\sum_{t=1}^T R_1'\, E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)' V(g\,|\,x_t)^{-1}\int g(y,\theta_0)\,\hat f(y\,|\,x_t)\,dy.$$

Thus:
$$\sqrt T\,\big(\hat\theta_{1,T}-\theta_{1,0}\big) \simeq -\left(R_1'\, E\left[E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)' V(g\,|\,x_t)^{-1}\, E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)\right] R_1\right)^{-1} R_1'\, \frac{1}{\sqrt T}\sum_{t=1}^T E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,x_t\right)' V(g\,|\,x_t)^{-1}\, g(y_t,\theta_0).$$

The bias term induced by the kernel estimator is asymptotically negligible since $Th_T^{d+2m} = o(1)$. The asymptotic expansion of $\hat\theta_{2,T}$ can be deduced from the minimization of the second component of $L_T^c(\theta)$. Estimator $\hat\theta_{2,T}$ converges at a nonparametric rate, and terms involving $\hat\theta_{1,T}-\theta_{1,0}$ can be neglected. We get:

$$\sqrt{Th_T^d}\,\big(\hat\theta_{2,T}-\theta_{2,0}\big) \simeq -\left(R_2'\, E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right]' V(g_2\,|\,x_0)^{-1}\, E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right] R_2\right)^{-1} R_2'\, E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right]' V(g_2\,|\,x_0)^{-1}\,\sqrt{Th_T^d}\int g_2(y,\theta_0)\,\hat f(y\,|\,x_0)\,dy.$$

Then, point i) of Lemma A.3 is proved.

B.5.3 Asymptotic expansion of $\hat\lambda_T$

We have from (B.31):

$$\hat\lambda_T \simeq -\,\hat V\big(g_2(\hat\theta_T)\,|\,x_0\big)^{-1}\,\hat E\big(g_2(\hat\theta_T)\,|\,x_0\big) \simeq -\,V(g_2(\theta_0)\,|\,x_0)^{-1}\,\hat E\big(g_2(\hat\theta_T)\,|\,x_0\big).$$

Moreover,

$$\hat E\big(g_2(\hat\theta_T)\,|\,x_0\big) \simeq \int g_2(y,\theta_0)\,\hat f(y\,|\,x_0)\,dy + E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right](\hat\theta_T-\theta_0) \simeq \int g_2(y,\theta_0)\,\hat f(y\,|\,x_0)\,dy + E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right] R_2\,\big(\hat\theta_{2,T}-\theta_{2,0}\big)$$

(since the contribution of $\hat\theta_{1,T}-\theta_{1,0}$ is asymptotically negligible)

$$\simeq (Id - M)\int g_2(y,\theta_0)\,\hat f(y\,|\,x_0)\,dy,$$

where $M$ is the matrix in (A.19). Then:

$$\hat\lambda_T \simeq -\,V(g_2(\theta_0)\,|\,x_0)^{-1}\,(Id-M)\int g_2(y,\theta_0)\,\hat f(y\,|\,x_0)\,dy,$$

and point ii) of Lemma A.3 is proved.

B.6 Proof of Corollary 8

From Appendix A.1.4, equation (A.5), the asymptotic distribution of the optimal kernel moment estimator of $\theta_0$ is such that:

$$\left(\sqrt T\,(\hat\theta_{1,T}-\theta_{1,0})',\; \sqrt{Th_T^d}\,(\hat\theta_{2,T}-\theta_{2,0})',\; \sqrt{Th_T^d}\,(\hat\alpha_T-\alpha_0)'\right)' = -\,(J_0'\,\Omega^*\, J_0)^{-1}\, J_0'\,\Omega^*\,\hat g_T(\theta_0) + o_p(1), \qquad (B.32)$$

where $\Omega^* = V_0^{-1}$, matrix $V_0$ is given in (A.2), and:

$$J_0 = \begin{pmatrix} E\left[\frac{\partial g_1}{\partial\theta'}\right] R_1 & 0 & 0 \\ 0 & E\left[\frac{\partial g_2}{\partial\theta'}\,\Big|\,x_0\right] R_2 & 0 \\ 0 & E\left[\frac{\partial a}{\partial\theta'}\,\Big|\,x_0\right] R_2 & -Id_L \end{pmatrix} =: \begin{pmatrix} E\left[\frac{\partial g_1}{\partial\theta'}\right] R_1 & 0 \\ 0 & \tilde J_0 \end{pmatrix}.$$

For $Z^* = E\left[\frac{\partial g}{\partial\theta'}(Y;\theta_0)\,\Big|\,X\right]' V(g(Y;\theta_0)\,|\,X)^{-1}$, we have:

$$E\left[\frac{\partial g_1}{\partial\theta'}\right] = E\left[E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,X\right)' V(g\,|\,X)^{-1}\, E\left(\frac{\partial g}{\partial\theta'}\,\Big|\,X\right)\right] = V(g_1).$$

Let $\Sigma_0^{ij}$, $i,j=1,2$, denote the blocks of $\Omega^* = V_0^{-1}$, and let $G := E\left[\frac{\partial g_2}{\partial\theta'}\,\big|\,x_0\right] R_2$ and $A := E\left[\frac{\partial a}{\partial\theta'}\,\big|\,x_0\right] R_2$. The vector $\delta = (\delta_1',\delta_2',\delta_\alpha')' := (J_0'\,\Omega^*\, J_0)^{-1} J_0'\,\Omega^*\,\hat g_T(\theta_0)$ solves the first-order condition $J_0'\,\Omega^*\,\big(J_0\,\delta - \hat g_T(\theta_0)\big) = 0$.
Solving for 1 22 0 1 21 0 2 i @g @ 0 jX E 1 0 = G 0 0 1 J0 J0 0 1 0 3 5: 1 R1 ^ [g2 jx0 ] E ^ [g2 jx0 ] E ^ [a E 0 jx0 ] 22 0 +A +A 1 V (gjX) G A 1 1 0;11 G G 0 1 ^ 0;11 E J0 1 1 = 0, that is = 0; + 2 in the second block equation, we get: ^ [g2 jx0 ] E G : 1 1 1 12 22 By replacing in the …rst block equation, and using 0;11 = 11 0 0 0 1 0;21 0;11 from the formulas of the inverse of a block matrix, we get: 1 1 0 J0 RL solves J0 0 2 Rs i; j = 1; 2, denote the blocks of 2 i @g @ 0 jX 3 p ^ [g1 ] R10 T E 5: q 1 ^ [g jx0 ] J0 0 0 1 J0 J0 0 0 1 T hdT E 2 (B.33) 1 ^ [g jx0 ]. Let us denote G := E @g20 jx0 R2 , J0 0 0 1 E 2 @ h R10 E E := J0 0 jx0 R2 . Then V (gjX) 1 0 J00 g^T ( 0 ) = 4 G where E 2 Let us now compute A := E @g 0 @ jX 21 0 and 22 0 1 21 0 = [g2 jx0 ] ; and: ^ [a E = 2 0 jx0 ] + A Thus, using (B.32), (B.33) and p T (^1;T q T hdT (^2;T 1;0 ) 2;0 ) 0;21 0;11 R10 E E = 0 = R2 E 0 R2 E 0 + 0;21 1 0;11 G G = V (g2 jx0 ), 1 ^ 0;11 E 0 1 0;11 G 0;21 @g 0 jX V (gjX) @ ! 0 @g2 jx0 @ ! @g2 jx0 @ V (g2 jx0 ) V (g2 jx0 ) 22 [g2 jx0 ] 1 1 G 0 1 ^ 0;11 E [g2 jx0 ] : = Cov (a; g2 jx0 ), we get: 1 1 E E @g jX @ 0 1 R1 @g2 R2 0 jx0 @ ! 1 q ^ [g2 jx0 ] + op (1); T hdT E p ^ [g1 ] + op (1); R10 T E and: q T hdT ^ T = q ^ [a T hdT E 0 0 jx0 ] 1 Cov (a; g2 jx0 ) V (g2 jx0 ) q ^ [g2 jx0 ] T hdT E @a @g2 1 R2 Cov (a; g2 jx0 ) V (g2 jx0 ) E R2 0 jx0 0 jx0 @ @ ! 1 ! ! 0 0 0 0 @g2 @g2 @g2 1 R2 E R2 jx0 V (g2 jx0 ) E R2 E jx0 V (g2 jx0 ) 0 jx0 @ @ @ E 1 q ^ [g2 jx0 ] + op (1): T hdT E The asymptotic expansions for ^1;T , ^2;T correspond to the asymptotic expansions of the XMM estimators ^1;T , ^2;T in Lemma A.3 (i). The conclusion follows. B.7 Regularity conditions in the stochastic volatility model In this Section, we discuss the technical regularity assumptions for the XMM estimator (see Appendix A.1.1 in the paper) when the DGP P0 is compatible with the stochastic volatility model (3.6)-(3.8). 
They concern the stationary distribution (Section B.7.1) and the existence of moments (Section B.7.2).

B.7.1 Stationary distribution

Let us consider process $X_t=\left(\tilde r_t,\sigma_t^2\right)'$, $t\in\mathbb Z$, where the dynamics of $\tilde r_t=r_t-r_{f,t}$ and $\sigma_t^2$ under the DGP $P_0$ are defined in Section 3.2. Markov process $(X_t)$ is exponential affine:

$$
E_0\left[e^{-z'X_{t+1}}\mid X_t\right]=E_0\left[e^{-u\tilde r_{t+1}-v\sigma_{t+1}^2}\mid X_t\right]=E_0\left[e^{-(\gamma_0u+v)\sigma_{t+1}^2}E_0\left(e^{-u\sigma_{t+1}\varepsilon_{t+1}}\mid\sigma_{t+1}^2,X_t\right)\mid X_t\right]
$$
$$
=E_0\left[e^{-\left(\gamma_0u+v-\frac12u^2\right)\sigma_{t+1}^2}\mid\sigma_t^2\right]=\exp\left[-a_0\left(\gamma_0u+v-\tfrac12u^2\right)\sigma_t^2-b_0\left(\gamma_0u+v-\tfrac12u^2\right)\right]=\exp\left[-A(z)'X_t-B(z)\right],
$$

where $A(z)=\left(0,\ a_0\left(\gamma_0u+v-\tfrac12u^2\right)\right)'$ and $B(z)=b_0\left(\gamma_0u+v-\tfrac12u^2\right)$, for $z=(u,v)'\in\mathbb C^2$ such that $\operatorname{Re}\left(\gamma_0u+v-\tfrac12u^2\right)>-1/c_0$, and functions $a_0$ and $b_0$ are defined in Section 3.2.

i) Strict stationarity and geometric strong mixing

From Proposition 2 in Gouriéroux, Jasiak (2006), the ARG process $(\sigma_t^2)$ is stationary if $0\le\rho_0<1$, with marginal invariant distribution such that $\left[(1-\rho_0)/c_0\right]\sigma_t^2\sim\gamma(\delta_0)$, where $\gamma(\delta_0)$ denotes the gamma distribution with parameter $\delta_0$. Thus, when $\rho_0<1$, process $(X_t)$ admits the marginal invariant distribution:

$$
f(x)=\frac{1}{\sigma}\,\varphi\!\left(\frac{\tilde r-\gamma_0\sigma^2}{\sigma}\right)\frac{\left[(1-\rho_0)/c_0\right]^{\delta_0}}{\Gamma(\delta_0)}\,e^{-\frac{1-\rho_0}{c_0}\sigma^2}\left(\sigma^2\right)^{\delta_0-1},\qquad x=\left(\tilde r,\sigma^2\right)'\in\mathbb R\times\mathbb R_+=\mathcal X,\qquad(B.34)
$$

where $\varphi$ denotes the standard normal density. To prove that $(X_t)$ is geometrically strongly mixing, we use Proposition 4.2 in Darolles, Gouriéroux, Jasiak (2006), and verify the condition:

$$
\lim_{h\to\infty}\frac{\partial A^{\circ h}}{\partial z'}(0)=0.\qquad(B.35)
$$

We have:

$$
\frac{\partial A}{\partial z'}(0)=\begin{pmatrix}0&0\\\gamma_0\rho_0&\rho_0\end{pmatrix}.
$$

Condition (B.35) is satisfied if $\rho_0<1$. Thus, with $Y_t=\left(X_{t+1}',\dots,X_{t+h}'\right)'$ for given $h\in\mathbb N$, we conclude that Assumption A.5 is satisfied if $0\le\rho_0<1$.

ii) Smoothness of the marginal distribution

The stationary distribution $f$ in (B.34) is in $C^\infty(\mathcal X)$. Moreover, we have:

$$
f(x)\le C_1\,e^{-\frac{1-\rho_0}{c_0}\sigma^2}\left(\sigma^2\right)^{\delta_0-3/2},\qquad x\in\mathcal X,
$$

for a constant $C_1>0$. Thus, $\|f\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2$. Moreover, we have the following Lemma B.9.

Lemma B.9: $\|D^mf\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2+m$.

Proof: Let $m\in\mathbb N$. Since $f\in C^m(\mathcal X)$, to prove $\|D^mf\|_\infty<\infty$ it is sufficient to show that any partial derivative of order $m$ of function $f$ is bounded at the boundary of $\mathcal X$, that is, for $\tilde r\to\pm\infty$, $\sigma^2\to\infty$, $\sigma^2\to0$. From (B.34), let us write:

$$
f\left(\tilde r,\sigma^2\right)=C\,\varphi\!\left(h\left(\tilde r,\sigma^2\right)\right)e^{-\eta\sigma^2}\left(\sigma^2\right)^{\delta_0-3/2},\qquad\left(\tilde r,\sigma^2\right)\in\mathcal X,
$$

where $\eta=(1-\rho_0)/c_0$, $C=\left[(1-\rho_0)/c_0\right]^{\delta_0}\big/\Gamma(\delta_0)$, and $h\left(\tilde r,\sigma^2\right):=\left(\tilde r-\gamma_0\sigma^2\right)\big/\sqrt{\sigma^2}$. The function $h$ is such that:

$$
\frac{\partial h}{\partial\tilde r}\left(\tilde r,\sigma^2\right)=\frac{1}{\sqrt{\sigma^2}},\qquad
\frac{\partial h}{\partial\sigma^2}\left(\tilde r,\sigma^2\right)=-\frac{\gamma_0}{\sqrt{\sigma^2}}-\frac{1}{2\sigma^2}h\left(\tilde r,\sigma^2\right).
$$

We deduce that:

i) Any partial derivative of $f$ is a linear combination of functions of the type:

$$
\varphi^{(n)}\!\left(h\left(\tilde r,\sigma^2\right)\right)h\left(\tilde r,\sigma^2\right)^k\,e^{-\eta\sigma^2}\left(\sigma^2\right)^l,\qquad n,k\in\mathbb N,\ l\in\mathbb R.
$$

ii) Since function $h\mapsto\varphi^{(n)}(h)h^k$ is bounded, for any $n,k\in\mathbb N$, it is sufficient to prove that the partial derivatives of order $m$ are bounded for $\sigma^2\to0$. This is the case if, and only if, the smallest power $l$ of $\sigma^2$ which occurs in the partial derivatives of order $m$ is non-negative.

iii) The smallest power of $\sigma^2$ is featured by $\partial^mf/\partial\left(\sigma^2\right)^m$ and is $l=\delta_0-3/2-m$. We conclude that $\|D^mf\|_\infty<\infty$ if, and only if, $\delta_0\ge3/2+m$.

Thus, Assumption A.6 is satisfied if $\delta_0\ge3/2+m$. For instance, for $m=2$, we get $\delta_0\ge7/2$.

B.7.2 Existence of moments

For expository purpose, let us assume that the actively traded derivatives at date $t_0$ have times-to-maturity $h_j=1$ (and moneyness strikes $k_j$), for $j=1,\dots,n$, and that we are interested in estimating the price of the derivative with time-to-maturity $h=1$ and moneyness strike $k$. The moment function $g_2(y_t;\theta)$ is given by:

$$
g_2(y_t;\theta)=M_{t,t+1}(\theta)\begin{pmatrix}e^{r_{f,t+1}}\\ e^{r_{t+1}}\\ \left(e^{r_{t+1}}-k_1\right)^+\\ \vdots\\ \left(e^{r_{t+1}}-k_n\right)^+\end{pmatrix}-\begin{pmatrix}1\\1\\ c_{t_0}(k_1,1)\\ \vdots\\ c_{t_0}(k_n,1)\end{pmatrix},\qquad
a(y_t;\theta)=M_{t,t+1}(\theta)\left(e^{r_{t+1}}-k\right)^+,
$$

where $M_{t,t+1}(\theta)=e^{-r_{f,t+1}}\exp\left(-\theta_1-\theta_2\sigma_{t+1}^2-\theta_3\sigma_t^2-\theta_4\tilde r_{t+1}\right)$ is the sdf, $y_t=\left(\tilde r_{t+1},\sigma_{t+1}^2,\sigma_t^2\right)'$ and $r_{t+1}=\tilde r_{t+1}+r_{f,t+1}$. The relevant variables are $Y_t=\left(\tilde r_{t+1},\sigma_{t+1}^2,\sigma_t^2\right)'$ and $X_t=\left(\tilde r_t,\sigma_t^2\right)'$, respectively. Note that function $g_2$ does not depend on $\tilde r_t$, and thus we have dropped this variable from $Y_t$.
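For concreteness, these moment restrictions can be assembled in a few lines of code. The sketch below (Python) is an illustration only: the function name `g2`, the argument layout, and the stacking of $g_2$ together with the pricing component $a$ for the new derivative are our own choices, not the paper's implementation.

```python
import math

def g2(y, theta, rf, strikes, call_prices, k, a):
    """Stacked moment restrictions at date t0 (illustrative layout).

    y = (r_tilde, sig2_next, sig2): excess return r~_{t+1} and the volatilities
    (sigma^2_{t+1}, sigma^2_t); theta = (theta1, theta2, theta3, theta4).
    Returns the components of g_2 for the traded assets, followed by the
    component a(y; theta) - a for the derivative to be priced.
    """
    r_tilde, sig2_next, sig2 = y
    th1, th2, th3, th4 = theta
    # sdf times e^{r_f,t+1}: exp(-theta1 - theta2*sig2' - theta3*sig2 - theta4*r~)
    m = math.exp(-th1 - th2 * sig2_next - th3 * sig2 - th4 * r_tilde)
    r = r_tilde + rf                       # total return r_{t+1}
    disc = math.exp(-rf)                   # discount factor e^{-r_f,t+1}
    out = [m - 1.0,                        # pricing of the riskfree asset
           m * math.exp(r_tilde) - 1.0]    # pricing of the underlying asset
    for kj, cj in zip(strikes, call_prices):
        out.append(m * disc * max(math.exp(r) - kj, 0.0) - cj)  # observed calls
    out.append(m * disc * max(math.exp(r) - k, 0.0) - a)        # new derivative
    return out
```

At the point $y=0$ with $\theta=0$ and zero option prices every component vanishes, which provides a quick pointwise sanity check of the stacking.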
The following Lemma B.10 provides a condition for $E_0\left\|g_2(Y_t;\theta_0)\right\|^4<\infty$ (see Assumption A.11).

Lemma B.10: Drop the subscript $0$ on the components of $\theta_0=\left(\theta_1,\theta_2,\theta_3,\theta_4\right)'$. The function $g_2(\cdot;\theta_0)$ is such that $E_0\left\|g_2(Y_t;\theta_0)\right\|^4<\infty$ if, and only if, $\theta_3>-1/(4c_0)$ and:

$$
\theta_2>\frac{1}{4c_0}\left[\frac{\rho_0}{1+4c_0\theta_3}-1\right]-\gamma_0\theta_4+2\theta_4^2,\qquad\text{if }\theta_4\ge(2+\gamma_0)/4,
$$
$$
\theta_2>\frac{1}{4c_0}\left[\frac{\rho_0}{1+4c_0\theta_3}-1\right]-\gamma_0(\theta_4-1)+2(\theta_4-1)^2,\qquad\text{if }\theta_4<(2+\gamma_0)/4.
$$

Proof: Since $0\le\left(e^r-s\right)^+\le e^r$ for any $r\in\mathbb R$, $s\ge0$, condition $E_0\left\|g_2(Y_t;\theta_0)\right\|^4<\infty$ is satisfied if, and only if:

$$
E_0\left[e^{-4\theta_1-4\theta_2\sigma_{t+1}^2-4\theta_3\sigma_t^2+4(x-\theta_4)\tilde r_{t+1}}\right]<\infty,\qquad x=0,1.\qquad(B.36)
$$

We have, for $x=0,1$:

$$
E_0\left[e^{-4\theta_2\sigma_{t+1}^2-4\theta_3\sigma_t^2+4(x-\theta_4)\tilde r_{t+1}}\right]
=E_0\left[e^{-4\theta_2\sigma_{t+1}^2-4\theta_3\sigma_t^2}\,E_0\left(e^{4(x-\theta_4)\left(\gamma_0\sigma_{t+1}^2+\sigma_{t+1}\varepsilon_{t+1}\right)}\mid\sigma_{t+1}^2,\sigma_t^2\right)\right]
$$
$$
=E_0\left[e^{-4\vartheta(x)\sigma_{t+1}^2-4\theta_3\sigma_t^2}\right]
=E_0\left[e^{-\left[4\theta_3+a_0\left(4\vartheta(x)\right)\right]\sigma_t^2}\right]e^{-b_0\left(4\vartheta(x)\right)},
$$

where $\vartheta(x):=\theta_2+\gamma_0(\theta_4-x)-2(\theta_4-x)^2$, provided $1+4c_0\vartheta(x)>0$. Moreover, since $\left[(1-\rho_0)/c_0\right]\sigma_t^2\sim\gamma(\delta_0)$, the latter expectation is finite if, and only if:

$$
\frac{1-\rho_0}{c_0}+4\theta_3+a_0\left(4\vartheta(x)\right)>0.
$$

We deduce that conditions (B.36) are satisfied if, and only if:

$$
1+4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)>0,\qquad
\frac{1-\rho_0}{c_0}+4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)>0,
$$
$$
1+4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)>0,\qquad
\frac{1-\rho_0}{c_0}+4\theta_3+a_0\left(4\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)\right)>0.\qquad(B.37)
$$

Since $\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2=\theta_2+\gamma_0\theta_4-2\theta_4^2+\left(4\theta_4-2-\gamma_0\right)$, and function $a_0$ is increasing, we can distinguish between two parameter regions to solve system (B.37).

i) First case: $4\theta_4-2-\gamma_0\ge0$, that is, $\theta_4\ge(2+\gamma_0)/4$. In this region system (B.37) is equivalent to:

$$
1+4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)>0,\qquad
\frac{1-\rho_0}{c_0}+4\theta_3+a_0\left(4\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)\right)>0.
$$

Let us introduce the new variable $x=4c_0\left(\theta_2+\gamma_0\theta_4-2\theta_4^2\right)$. Then, using $a_0(u)=\rho_0u/(1+c_0u)$, this system becomes:

$$
x>-1,\qquad 1-\rho_0+4c_0\theta_3+\rho_0\frac{x}{1+x}>0.
$$

By transforming the second equation, we get:

$$
x>-1,\qquad\left(1+4c_0\theta_3\right)x>-\left(1-\rho_0+4c_0\theta_3\right).
$$

There is no solution for which $1+4c_0\theta_3<0$. Indeed, the second equation becomes:

$$
x<-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}=-1+\frac{\rho_0}{1+4c_0\theta_3}<-1,
$$

which is incompatible with the first equation.
Instead, for $1+4c_0\theta_3>0$, the second equation becomes:

$$
x>-\frac{1-\rho_0+4c_0\theta_3}{1+4c_0\theta_3}=-1+\frac{\rho_0}{1+4c_0\theta_3},
$$

and implies the first equation. To summarize, a first region of solutions is:

$$
\theta_4\ge(2+\gamma_0)/4,\qquad1+4c_0\theta_3>0,\qquad x>-1+\frac{\rho_0}{1+4c_0\theta_3},
$$

that is:

$$
\theta_4\ge(2+\gamma_0)/4,\qquad\theta_3>-\frac{1}{4c_0},\qquad
\theta_2>\frac{1}{4c_0}\left[\frac{\rho_0}{1+4c_0\theta_3}-1\right]-\gamma_0\theta_4+2\theta_4^2.
$$

ii) Second case: $4\theta_4-2-\gamma_0<0$, that is, $\theta_4<(2+\gamma_0)/4$. In this region, system (B.37) is equivalent to:

$$
1+4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)>0,\qquad
\frac{1-\rho_0}{c_0}+4\theta_3+a_0\left(4\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)\right)>0.
$$

By introducing the new variable $y=4c_0\left(\theta_2+\gamma_0(\theta_4-1)-2(\theta_4-1)^2\right)$, and repeating the same argument as above, we get the second region of solutions:

$$
\theta_4<(2+\gamma_0)/4,\qquad1+4c_0\theta_3>0,\qquad y>-1+\frac{\rho_0}{1+4c_0\theta_3},
$$

that is:

$$
\theta_4<(2+\gamma_0)/4,\qquad\theta_3>-\frac{1}{4c_0},\qquad
\theta_2>\frac{1}{4c_0}\left[\frac{\rho_0}{1+4c_0\theta_3}-1\right]-\gamma_0(\theta_4-1)+2(\theta_4-1)^2.
$$

From Lemma B.10, the condition $E_0\left\|g_2(Y_t;\theta_0)\right\|^4<\infty$ is satisfied whenever the risk premia parameters $\theta_{2,0}$ and $\theta_{3,0}$ for stochastic volatility are above some thresholds. In particular, the lower bound for $\theta_{2,0}$ depends on $\theta_{3,0}$ and $\theta_{4,0}$. Imposing the no-arbitrage restriction $\theta_{4,0}=\gamma_0+1/2$ (and assuming $\gamma_0\ge0$, so that the first region applies), the inequality constraints become:

$$
\theta_{3,0}>-\frac{1}{4c_0},\qquad
\theta_{2,0}>\frac{1}{4c_0}\left[\frac{\rho_0}{1+4c_0\theta_{3,0}}-1\right]+\left(\gamma_0+\frac12\right)\left(\gamma_0+1\right).
$$

These constraints are satisfied for the parameter values used in Section 3.4 iii).

B.8 ARG risk-neutral dynamics

In this Section, we derive the dynamics of the ARG stochastic volatility model under the risk-neutral distribution $Q$ defined by the sdf $M_{t,t+1}(\theta_0)=e^{-r_{f,t+1}}\exp\left(-\theta_{1,0}-\theta_{2,0}\sigma_{t+1}^2-\theta_{3,0}\sigma_t^2-\theta_{4,0}\tilde r_{t+1}\right)$. In Section B.7.1, we derived the historical conditional moment generating function of $X_{t+1}=\left(\tilde r_{t+1},\sigma_{t+1}^2\right)'$:

$$
E_0\left[\exp\left(-u\tilde r_{t+1}-v\sigma_{t+1}^2\right)\mid x_t\right]=\exp\left[-a_0\left(\gamma_0u+v-\tfrac12u^2\right)\sigma_t^2-b_0\left(\gamma_0u+v-\tfrac12u^2\right)\right].\qquad(B.38)
$$

Let us compute the risk-neutral conditional moment generating function of $\left(\tilde r_{t+1},\sigma_{t+1}^2\right)$. We have:

$$
E_0^Q\left[\exp\left(-u\tilde r_{t+1}-v\sigma_{t+1}^2\right)\mid x_t\right]
=\frac{E_0\left[M_{t,t+1}(\theta_0)\exp\left(-u\tilde r_{t+1}-v\sigma_{t+1}^2\right)\mid x_t\right]}{E_0\left[M_{t,t+1}(\theta_0)\mid x_t\right]},
$$

by using $E_0\left[M_{t,t+1}(\theta_0)\mid x_t\right]=e^{-r_{f,t+1}}$. Applying (B.38) with $u+\theta_{4,0}$ and $v+\theta_{2,0}$ in place of $u$ and $v$, the argument of functions $a_0$ and $b_0$ in the numerator is:

$$
\gamma_0\left(u+\theta_{4,0}\right)+v+\theta_{2,0}-\tfrac12\left(u+\theta_{4,0}\right)^2
=\tilde\theta_{2,0}+\left(\gamma_0-\theta_{4,0}\right)u+v-\tfrac12u^2
=\tilde\theta_{2,0}-\tfrac12u+v-\tfrac12u^2,
$$

by the no-arbitrage restriction $\theta_{4,0}=\gamma_0+1/2$, where:

$$
\tilde\theta_{2,0}:=\theta_{2,0}+\gamma_0\theta_{4,0}-\tfrac12\theta_{4,0}^2=\theta_{2,0}+\tfrac12\gamma_0^2-\tfrac18.
$$

Note also that $E_0\left[M_{t,t+1}(\theta_0)\mid x_t\right]=e^{-r_{f,t+1}}$ for any $x_t$ requires $\theta_{3,0}=-a_0\left(\tilde\theta_{2,0}\right)$ and $\theta_{1,0}=-b_0\left(\tilde\theta_{2,0}\right)$. Thus, we get:

$$
E_0^Q\left[\exp\left(-u\tilde r_{t+1}-v\sigma_{t+1}^2\right)\mid x_t\right]
=\exp\left[-a_0^*\left(-\tfrac12u+v-\tfrac12u^2\right)\sigma_t^2-b_0^*\left(-\tfrac12u+v-\tfrac12u^2\right)\right],\qquad(B.39)
$$

where, using the expressions $a_0(u)=\rho_0u/(1+c_0u)$ and $b_0(u)=\delta_0\log(1+c_0u)$ from equations (3.9):

$$
a_0^*(w)=a_0\left(w+\tilde\theta_{2,0}\right)-a_0\left(\tilde\theta_{2,0}\right)=\frac{\rho_0^*w}{1+c_0^*w},\qquad
b_0^*(w)=b_0\left(w+\tilde\theta_{2,0}\right)-b_0\left(\tilde\theta_{2,0}\right)=\delta_0^*\log\left(1+c_0^*w\right),
$$

with:

$$
\delta_0^*=\delta_0,\qquad
\rho_0^*=\frac{\rho_0}{\left(1+c_0\tilde\theta_{2,0}\right)^2},\qquad
c_0^*=\frac{c_0}{1+c_0\tilde\theta_{2,0}},\qquad
\tilde\theta_{2,0}=\theta_{2,0}+\tfrac12\gamma_0^2-\tfrac18.
$$

By comparing (B.38) and (B.39), we deduce that, under the risk-neutral distribution, the returns follow a stochastic volatility model with risk premium parameter $\gamma_0^*=-1/2$ and ARG stochastic volatility with parameters $\delta_0^*$, $\rho_0^*$, $c_0^*$.

B.9 Proof of Lemma A.4

We have to show that:

$$
P_0^Q\left[\sigma_{t+1}^2+\dots+\sigma_{t+h}^2\ge z\mid\sigma_t^2=s,\ \sigma_{t+h}^2=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.
$$

This condition is implied by:

$$
P_0^Q\left[\sigma_{t+1}^2+\dots+\sigma_{t+h-1}^2\ge z\mid\sigma_t^2=s,\ \sigma_{t+h}^2=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z,\qquad(B.40)
$$

since the two sums differ by the fixed value $\sigma_{t+h}^2=\sigma_0^2$. Since the ARG process is time-reversible, condition (B.40) is equivalent to:

$$
P_0^Q\left[\sigma_{t+1}^2+\dots+\sigma_{t+h-1}^2\ge z\mid\sigma_{t+h}^2=s,\ \sigma_t^2=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.\qquad(B.41)
$$

To show (B.41) we use the stochastic representation of Markov process $(\sigma_t^2)$:

$$
\sigma_{t+1}^2=g\left(\sigma_t^2,u_{t+1}\right),\qquad(B.42)
$$

where the innovation $u_{t+1}$ is independent of $\sigma_t^2$.
By $l$-fold compounding of function $g$ w.r.t. its first argument, we have $\sigma_{t+l}^2=g_l\left(\sigma_t^2;u_{t+1},\dots,u_{t+l}\right)$, say, and $\sigma_{t+1}^2+\dots+\sigma_{t+h-1}^2=G\left(\sigma_t^2;u_{t+1},\dots,u_{t+h-1}\right)$, where $G\left(\sigma_t^2;u_{t+1},\dots,u_{t+h-1}\right)=g\left(\sigma_t^2,u_{t+1}\right)+\dots+g_{h-1}\left(\sigma_t^2;u_{t+1},\dots,u_{t+h-1}\right)$. Condition (B.41) becomes:

$$
P_0^Q\left[G\left(s;u_{t+1},\dots,u_{t+h-1}\right)\ge z\mid\sigma_{t+h}^2=\sigma_0^2\right]\ \text{is increasing w.r.t. }s,\ \text{for any }z.
$$

This condition is satisfied if function $G$ is increasing w.r.t. its first argument, that is, if the function $g$ in the stochastic representation (B.42) is increasing w.r.t. its first argument. The latter condition is equivalent to $\sigma_{t+1}^2$ being stochastically increasing in $\sigma_t^2$ under $Q$.

Finally, let us show that $\sigma_{t+1}^2$ is stochastically increasing in $\sigma_t^2$ under $Q$ for the ARG process. This follows from the gamma-Poisson mixture representation of the ARG process:

$$
\sigma_{t+1}^2/c_0^*\mid z_{t+1}\sim\gamma\left(\delta_0^*+z_{t+1}\right),\qquad
z_{t+1}\mid\sigma_t^2\sim\mathcal P\left(\rho_0^*\sigma_t^2/c_0^*\right),
$$

where $\gamma$ and $\mathcal P$ denote gamma and Poisson distributions, respectively. Then $\sigma_{t+1}^2$ is stochastically increasing in $z_{t+1}$, and $z_{t+1}$ is stochastically increasing in $\sigma_t^2$. The conclusion follows.

B.10 Calibration of the parametric stochastic volatility model

In this Section we describe the computation by Fourier transform methods of the option prices in the parametric stochastic volatility model of Section 2.6 i) in the paper. These Fourier transform methods are used for the cross-sectional calibration of the model parameters. The risk-neutral distribution $Q$ is given in equations (2.15)-(2.16). The option price is such that:

$$
c_t(h,k)=B(t,t+h)\,E^Q\left[\left(\exp R_{t,h}-k\right)^+\mid\sigma_t^2\right]=E^Q\left[\left(\exp\tilde R_{t,h}-\tilde k\right)^+\mid\sigma_t^2\right],
$$

where $\tilde R_{t,h}=\tilde r_{t+1}+\dots+\tilde r_{t+h}$ is the cumulated excess return of the underlying asset between $t$ and $t+h$, and $\tilde k=B(t,t+h)k$ is the discounted moneyness of the option. Let us introduce the variable $s:=\log(\tilde k)$ and define the function:

$$
\psi(s)=e^{\gamma s}\,E^Q\left[\left(\exp\tilde R_{t,h}-e^s\right)^+\mid\sigma_t^2\right],\qquad s\in\mathbb R,
$$

for a given $\gamma>0$ (for expository purpose we omit the dependence of function $\psi$ on time-to-maturity $h$ and current volatility value $\sigma_t^2$).
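As a numerical cross-check of the Fourier transform approach developed in this section, the function $\psi(s)$ can also be evaluated by brute-force Monte Carlo under the risk-neutral dynamics, simulating the volatility through the gamma-Poisson mixture representation used in the proof of Lemma A.4 above. The sketch below (Python) is illustrative only; the function names, sampler choices and all parameter values are our own.

```python
import math, random

def poisson(lam, rng):
    """Knuth's inverse-transform Poisson sampler (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def arg_step(sig2, delta, rho, c, rng):
    """One ARG transition via the gamma-Poisson mixture:
    z | sig2 ~ P(rho * sig2 / c),  sig2' / c | z ~ gamma(delta + z)."""
    z = poisson(rho * sig2 / c, rng)
    return c * rng.gammavariate(delta + z, 1.0)

def psi_mc(s, h, sig2, delta, rho, c, gamma_damp, n_paths=20000, seed=0):
    """Monte Carlo evaluation of psi(s) = e^{gamma s} E^Q[(e^{R~} - e^s)^+ | sig2],
    with risk-neutral returns r~ = -sig2'/2 + sig' * eps (risk premium -1/2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        v, ret = sig2, 0.0
        for _ in range(h):
            v = arg_step(v, delta, rho, c, rng)
            ret += -0.5 * v + math.sqrt(v) * rng.gauss(0.0, 1.0)
        acc += max(math.exp(ret) - math.exp(s), 0.0)
    return math.exp(gamma_damp * s) * acc / n_paths
```

Setting `gamma_damp = 0` gives the undamped option price $E^Q[(\exp\tilde R_{t,h}-e^s)^+\mid\sigma_t^2]$ directly, which can be compared with the output of the transform-based algorithm below.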
Following Carr and Madan (1999), the Fourier transform of $\psi$ is (see below):

$$
\hat\psi(u)=\int_{-\infty}^{\infty}e^{ius}\psi(s)\,ds=\frac{\varphi_h\left(\gamma+1+iu\right)}{\gamma^2+\gamma-u^2+iu(2\gamma+1)},\qquad u\in\mathbb R,\qquad(B.43)
$$

where $\varphi_h(z)=E^Q\left[\exp\left(z\tilde R_{t,h}\right)\mid\sigma_t^2\right]$. For the ARG model, function $\varphi_h$ is given by (see below):

$$
\varphi_h(z)=\exp\left[-A_h\sigma_t^2-B_h\right],\qquad(B.44)
$$

where $A_h=A_h(z)$ and $B_h=B_h(z)$ are defined recursively by:

$$
A_h=a_0^*\left(w+A_{h-1}\right),\quad A_1=a_0^*(w);\qquad
B_h=B_{h-1}+b_0^*\left(w+A_{h-1}\right),\quad B_1=b_0^*(w);
$$

with $w=z(1-z)/2$, $a_0^*(u)=\dfrac{\rho_0^*u}{1+c_0^*u}$ and $b_0^*(u)=\delta_0^*\log\left(1+c_0^*u\right)$. By inverse Fourier transform, we get the option price:

$$
c_t(h,k)=\frac{e^{-\gamma s}}{2\pi}\int_{-\infty}^{\infty}e^{-ius}\hat\psi(u)\,du,
$$

where $s=\log\left(B(t,t+h)k\right)$ in the RHS. Since function $\psi(s)$ is real valued, we have $\hat\psi(-u)=\overline{\hat\psi(u)}$. It follows:

$$
c_t(h,k)=\frac{e^{-\gamma s}}{\pi}\,\mathrm{Re}\int_0^{\infty}e^{-ius}\hat\psi(u)\,du.\qquad(B.45)
$$

To compute the integral (B.45), we introduce a finite upper integration boundary $\Lambda>0$ and we discretize the resulting integral over $[0,\Lambda]$. More precisely, let $\Lambda>0$ be such that $\hat\psi(u)$ is small for $u>\Lambda$. Define the grid $u_k=(\Lambda/N)(k-1)$, for $k=1,\dots,N$, where $N\in\mathbb N$ is the number of grid points. Then we have:

$$
c_t(h,k)\simeq\frac{e^{-\gamma s}}{\pi}\,\mathrm{Re}\int_0^{\Lambda}e^{-ius}\hat\psi(u)\,du
\simeq\frac{e^{-\gamma s}}{\pi}\,\mathrm{Re}\left[\frac{\Lambda}{N}\sum_{k=1}^N e^{-iu_ks}\,\hat\psi_k\right],
$$

where $\hat\psi_k:=\hat\psi(u_k)$. To summarize, the algorithm to compute $c_t(h,k)$ is as follows:

1. Compute the coefficients:

$$
\hat\psi_k:=\frac{\exp\left[-A_h(z_k)\sigma_t^2-B_h(z_k)\right]}{\gamma^2+\gamma-u_k^2+iu_k(2\gamma+1)},\qquad k=1,2,\dots,N,
$$

where $z_k=\gamma+1+iu_k$,

$$
A_h=a_0^*\left(w_k+A_{h-1}\right),\quad A_1=a_0^*(w_k);\qquad
B_h=B_{h-1}+b_0^*\left(w_k+A_{h-1}\right),\quad B_1=b_0^*(w_k);
$$
$$
w_k=-\frac12\left[\gamma^2+\gamma-u_k^2+iu_k(2\gamma+1)\right],\qquad u_k=(\Lambda/N)(k-1).
$$

2. Compute the inverse Fourier transform of the coefficients:

$$
c_t(h,k)=\frac{e^{-\gamma s}}{\pi}\,\mathrm{Re}\left[\frac{\Lambda}{N}\sum_{k=1}^N e^{-iu_ks}\,\hat\psi_k\right].
$$

Proof of Equation (B.43): We have:

$$
\hat\psi(u)=\int_{-\infty}^{\infty}e^{(iu+\gamma)s}\,E_t^Q\left[\left(e^{\tilde R_{t,h}}-e^s\right)^+\right]ds
=E_t^Q\int_{-\infty}^{\tilde R_{t,h}}e^{(iu+\gamma)s}\left(e^{\tilde R_{t,h}}-e^s\right)ds
$$
$$
=E_t^Q\left[\frac{e^{(iu+\gamma+1)\tilde R_{t,h}}}{iu+\gamma}-\frac{e^{(iu+\gamma+1)\tilde R_{t,h}}}{iu+\gamma+1}\right]
=\frac{E_t^Q\left[e^{(iu+\gamma+1)\tilde R_{t,h}}\right]}{(\gamma+iu)(\gamma+1+iu)}
=\frac{\varphi_h(\gamma+1+iu)}{\gamma^2+\gamma-u^2+iu(2\gamma+1)},
$$

where $E_t^Q[\cdot]=E^Q\left[\cdot\mid\sigma_t^2\right]$.

Proof of Equation (B.44): Under the risk-neutral distribution $Q$ we have $\tilde r_t=-\frac12\sigma_t^2+\sigma_t\varepsilon_t$, where $\varepsilon_t\sim IIN(0,1)$ and $\sigma_t^2$ follows an ARG process independent of $(\varepsilon_t)$ with parameters $\delta_0^*$, $\rho_0^*$, $c_0^*$. Thus:

$$
\varphi_h(z)=E_t^Q\left[\exp\left(-\frac z2\sigma_{t,t+h}^2+z\left(\sigma_{t+1}\varepsilon_{t+1}+\dots+\sigma_{t+h}\varepsilon_{t+h}\right)\right)\right]
=E_t^Q\left[\exp\left(\left(-\frac z2+\frac{z^2}2\right)\sigma_{t,t+h}^2\right)\right],
$$

where $\sigma_{t,t+h}^2:=\sigma_{t+1}^2+\dots+\sigma_{t+h}^2$. From standard results for affine processes in discrete time [e.g., Darolles, Gouriéroux, Jasiak (2006)], equation (B.44) follows.

References

[1] Andrews, D. (1991): An Empirical Process Central Limit Theorem for Dependent Non-identically Distributed Random Variables, Journal of Multivariate Analysis, 38, 187-203.

[2] Billingsley, P. (1965): Ergodic Theory and Information. New York: John Wiley.

[3] Billingsley, P. (1968): Convergence of Probability Measures. New York: John Wiley.

[4] Bosq, D. (1998): Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. New York: Springer.

[5] Bosq, D., and P. Lecoutre (1987): Théorie de l'estimation fonctionnelle. Paris: Economica.

[6] Carr, P., and D. Madan (1999): Option Pricing and the Fast Fourier Transform, Journal of Computational Finance, 2, 61-73.

[7] Darolles, S., C. Gouriéroux, and J. Jasiak (2006): Structural Laplace Transform and Compound Autoregressive Models, Journal of Time Series Analysis, 27, 477-504.

[8] Gouriéroux, C., and J. Jasiak (2006): Autoregressive Gamma Processes, Journal of Forecasting, 25, 129-152.

[9] Herrndorf, N. (1984): A Functional Central Limit Theorem for Weakly Dependent Sequences of Random Variables, Annals of Probability, 12, 141-153.
[10] Pollard, D. (1990): Empirical Processes: Theory and Applications. CBMS Conference Series in Probability and Statistics, Vol. 2. Hayward, CA: Institute of Mathematical Statistics.

[11] Pötscher, B., and I. Prucha (1989): A Uniform Law of Large Numbers for Dependent and Heterogeneous Data Processes, Econometrica, 57, 675-683.

[12] Stock, J., and J. Wright (2000): GMM with Weak Identification, Econometrica, 68, 1055-1096.

[13] Tenreiro, C. (1995): Estimation fonctionnelle : applications aux tests d'adéquation et de paramètre constant, Ph.D. Thesis, University of Paris VI.
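For completeness, the two-step algorithm of Section B.10 can be sketched in code. The snippet below (Python) is a minimal illustration, not the authors' implementation: the damping parameter `gamma`, the truncation `Lam`, the grid size `N` and the parameter values used below are illustrative choices, and the discounting by $B(t,t+h)$ is omitted, so `k` plays the role of the discounted moneyness.

```python
import cmath, math

def a_star(u, rho, c):
    """a*(u) = rho u / (1 + c u), complex-safe."""
    return rho * u / (1.0 + c * u)

def b_star(u, delta, c):
    """b*(u) = delta log(1 + c u), principal branch."""
    return delta * cmath.log(1.0 + c * u)

def call_price(h, k, sig2, delta, rho, c, gamma=1.5, Lam=200.0, N=4096):
    """Damped-transform call price c_t(h, k): coefficients psi_k from the
    recursion for A_h, B_h, then a discretized inverse Fourier transform."""
    s = math.log(k)
    du = Lam / N
    total = 0.0 + 0.0j
    for j in range(N):
        u = du * j
        denom = gamma ** 2 + gamma - u ** 2 + 1j * u * (2 * gamma + 1)
        w = -denom / 2.0              # w = z(1 - z)/2 with z = gamma + 1 + iu
        A, B = a_star(w, rho, c), b_star(w, delta, c)
        for _ in range(h - 1):        # recursion A_h = a*(w + A_{h-1}), etc.
            A, B = a_star(w + A, rho, c), B + b_star(w + A, delta, c)
        psi_hat = cmath.exp(-A * sig2 - B) / denom
        total += cmath.exp(-1j * u * s) * psi_hat * du
    return math.exp(-gamma * s) / math.pi * total.real
```

For instance, `call_price(1, 1.0, 0.04, 1.2, 0.5, 0.02)` returns an at-the-money one-period price that can be cross-checked against a Monte Carlo evaluation of the same conditional expectation.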