LSE Research Online

Article (refereed)

Woocheol Kim and Oliver B. Linton, The live method for generalized additive volatility models. Originally published in Econometric Theory, 20 (6), pp. 1094-1139. © 2004 Cambridge University Press. You may cite this version as: Kim, Woocheol & Linton, Oliver B. (2004). The live method for generalized additive volatility models [online]. London: LSE Research Online. Available at: http://eprints.lse.ac.uk/archive/00000321. Available online: July 2005.

Econometric Theory, 20, 2004, 1094-1139. Printed in the United States of America. DOI: 10.1017/S026646660420603X

THE LIVE METHOD FOR GENERALIZED ADDITIVE VOLATILITY MODELS

WOOCHEOL KIM
Korea Institute of Public Finance and Humboldt University of Berlin

OLIVER LINTON
The London School of Economics

We investigate a new separable nonparametric model for time series, which includes many autoregressive conditional heteroskedastic (ARCH) models and autoregressive (AR)
models already discussed in the literature. We also propose a new estimation procedure called LIVE, or local instrumental variable estimation, that is based on a localization of the classical instrumental variable method. Our method has considerable computational advantages over the competing marginal integration or projection method. We also consider a more efficient two-step likelihood-based procedure and show that it yields both asymptotic and finite-sample performance gains.

This paper is based on Chapter 2 of the first author's Ph.D. dissertation from Yale University. We thank Wolfgang Härdle, Joel Horowitz, Peter Phillips, and Dag Tjøstheim for helpful discussions. We are also grateful to Donald Andrews and two anonymous referees for valuable comments. The second author thanks the National Science Foundation and the ESRC for financial support. Address correspondence to Woocheol Kim, Korea Institute of Public Finance, 79-6 Garak-Dong, Songpa-Gu, Seoul, Republic of Korea, 138-774; e-mail: wkim@kipf.re.kr. Oliver Linton, Department of Economics, The London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom; e-mail: lintono@lse.ac.uk.

1. INTRODUCTION

Volatility models are of considerable interest in empirical finance. There are many types of parametric volatility models, following the seminal work of Engle (1982). These models are typically nonlinear, which poses difficulties both in computation and in deriving useful tools for statistical inference. Parametric models are prone to misspecification, especially when there is no theoretical reason to prefer one specification over another. Nonparametric models can provide greater flexibility. However, the greater generality of these models comes at a cost: including a large number of lags requires estimation of a high-dimensional smooth, which is known to behave very badly (Silverman, 1986). The "curse of dimensionality" puts severe limits on the dynamic flexibility of
nonparametric models. Separable models offer an intermediate position between the complete generality of nonparametric models and the restrictiveness of parametric models. These models have been investigated in cross-sectional settings and also in time series settings. In this paper, we investigate a generalized additive nonlinear autoregressive conditional heteroskedastic model (GANARCH):

$y_t = m(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) + u_t, \qquad u_t = v^{1/2}(y_{t-1}, y_{t-2}, \ldots, y_{t-d})\,\varepsilon_t,$  (1.1)

$m(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) = F_m\Big(c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha})\Big),$  (1.2)

$v(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) = F_v\Big(c_v + \sum_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Big),$  (1.3)

where $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$ are smooth but unknown functions and $F_m(\cdot)$ and $F_v(\cdot)$ are known monotone transformations (whose inverses are $G_m(\cdot)$ and $G_v(\cdot)$, respectively).1 The error process $\{\varepsilon_t\}$ is assumed to be a martingale difference sequence with unit scale, that is, $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$ and $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = 1$, where $\mathcal{F}_t$ is the $\sigma$-algebra of events generated by $\{y_k\}_{k=-\infty}^{t}$. Under some weak assumptions, time series from nonlinear autoregressive models can be shown to be stationary and strongly mixing with mixing coefficients decaying exponentially fast. Auestad and Tjøstheim (1990) use $\alpha$-mixing or geometric ergodicity to identify their nonlinear time series model. Similar results are obtained for the additive nonlinear autoregressive conditional heteroskedastic (ARCH) process by Masry and Tjøstheim (1997); see also Cai and Masry (2000) and Carrasco and Chen (2002). We follow the same argument as Masry and Tjøstheim (1997) and will assume all the necessary conditions for stationarity and the mixing property of the process $\{y_t\}_{t=1}^{n}$ in (1.1).
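The data generating process (1.1)-(1.3) can be simulated in a few lines. The sketch below takes $d = 2$ lags, $F_m$ the identity, and $F_v = \exp$ (so $G_v = \log$ and the variance is automatically positive); the component functions, constants, and sample size are our own illustrative choices, not values from the paper. Bounded components keep the simulated series stochastically stable, consistent with the stationarity and mixing conditions assumed above.

```python
import numpy as np

# Illustrative simulation of the GANARCH model (1.1)-(1.3) with d = 2,
# F_m = identity and F_v = exp (log link G_v). The component functions
# m1, m2, v1, v2 and the constants c_m, c_v are hypothetical choices.
rng = np.random.default_rng(0)

m1 = lambda y: 0.3 * np.tanh(y)        # mean components (bounded for stability)
m2 = lambda y: -0.2 * np.sin(y)
v1 = lambda y: 0.3 * np.cos(y)         # variance components inside the exp link
v2 = lambda y: 0.2 * np.sin(y)
c_m, c_v = 0.0, -1.0

n = 2000
y = np.zeros(n + 2)
for t in range(2, n + 2):
    mean = c_m + m1(y[t - 1]) + m2(y[t - 2])            # F_m = identity
    var = np.exp(c_v + v1(y[t - 1]) + v2(y[t - 2]))     # F_v = exp keeps v > 0
    y[t] = mean + np.sqrt(var) * rng.standard_normal()  # u_t = v^{1/2} * eps_t
y = y[2:]
```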
The standard identification for the components of the mean and variance is made by

$E[m_\alpha(y_{t-\alpha})] = 0 \quad \text{and} \quad E[v_\alpha(y_{t-\alpha})] = 0$  (1.4)

for all $\alpha = 1, \ldots, d$. The notable aspect of the model is additivity via known links for the conditional mean and volatility functions. As will be shown later, (1.1)-(1.3) include a wide variety of time series models in the literature. See Horowitz (2001) for a discussion of generalized additive models in a cross-section context. In a much simpler univariate setup, Robinson (1983), Auestad and Tjøstheim (1990), and Härdle and Vieu (1992) study kernel estimation of the conditional mean function $m(\cdot)$ in (1.1). The so-called CHARN (conditionally heteroskedastic autoregressive nonlinear) model is the same as (1.1) except that $m(\cdot)$ and $v(\cdot)$ are univariate functions of $y_{t-1}$. Masry and Tjøstheim (1995) and Härdle and Tsybakov (1997) apply the Nadaraya-Watson and local linear smoothing methods, respectively, to jointly estimate $v(\cdot)$ together with $m(\cdot)$. Alternatively, Fan and Yao (1996) and Ziegelmann (2002) propose local linear least squares estimation of the volatility function, with an extension by Avramidis (2002) based on local linear maximum likelihood estimation. Also, in a nonlinear vector autoregressive (VAR) context, Härdle, Tsybakov, and Yang (1998) deal with estimation of the conditional mean in a multilagged extension similar to (1.1). Unfortunately, however, introducing more lags in nonparametric time series models has unpleasant consequences, more so than in the parametric approach. As is well known, smoothing methods in high dimensions suffer from a slower convergence rate: the "curse of dimensionality." Under twice differentiability of $m(\cdot)$, the optimal rate is $n^{-2/(4+d)}$
, which gets rapidly worse with dimension. In high dimensions it is also difficult to describe the function $m$ graphically. The additive structure has been proposed as a useful way to circumvent these problems in multivariate smoothing. By assuming the target function to be a sum of functions of the covariates, say $m(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) = c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha})$, we can effectively reduce the dimensionality of the regression problem and bring the implementability of multivariate smoothing up to that of the one-dimensional case. Stone (1985, 1986) shows that it is possible to estimate $m_\alpha(\cdot)$ and $m(\cdot)$ with the one-dimensional optimal rate of convergence, for example, $n^{-2/5}$ for twice differentiable functions, regardless of $d$. The estimates are easily illustrated and interpreted. For these reasons, since the 1980s, additive models have been fundamental to nonparametric regression among both econometricians and statisticians. Regarding estimation methods that achieve the one-dimensional optimal rate, the literature suggests two different approaches: backfitting and marginal integration. The former, originally suggested by Breiman and Friedman (1985), Buja, Hastie, and Tibshirani (1989), and Hastie and Tibshirani (1987, 1990), executes iterative one-dimensional smoothing steps until some convergence criterion is satisfied. Though intuitively appealing, the statistical properties of the backfitting algorithm were not clearly understood until the recent work of Opsomer and Ruppert (1997) and Mammen, Linton, and Nielsen (1999). They develop specific (linear)
backfitting procedures and establish the geometric convergence of their algorithms and the pointwise asymptotic distributions under some conditions. However, one disadvantage of these procedures is the time-consuming iteration required for implementation. Also, the proofs for the linear case cannot be easily generalized to nonlinear cases such as generalized additive models. A more recent approach, called marginal integration (MI), is theoretically more tractable: its statistical properties are easy to derive because it simply averages multivariate kernel estimates. Developed independently by Newey (1994), Tjøstheim and Auestad (1994), and Linton and Nielsen (1995), its simplicity inspired subsequent applications such as Linton, Wang, Chen, and Härdle (1995) for transformation models and Linton, Nielsen, and van de Geer (2003) for hazard models with censoring. In the time series models that are special cases of (1.1) and (1.2) with $F_m$ being the identity, Chen and Tsay (1993a, 1993b) and Masry and Tjøstheim (1997) apply backfitting and MI, respectively, to estimate the conditional mean function. Mammen et al. (1999) provide useful results for the same type of models by improving the previous backfitting method with some modification and successfully deriving the asymptotic properties under weak conditions. The separability assumption is also used in volatility estimation by Yang, Härdle, and Nielsen (1999), where the nonlinear ARCH model has additive mean and multiplicative volatility in the form of

$y_t = c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha}) + \Big(c_v \prod_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Big)^{1/2}$
$\varepsilon_t.$  (1.5)

To estimate (1.5), they rely on marginal integration with local linear fits as a pilot estimate and derive the asymptotic properties. This paper makes two contributions to the additive literature. The first concerns the theoretical development of a new estimation tool called the local instrumental variable estimator for the components of additive models (LIVE for CAM), which was outlined for simple additive cross-sectional regression in Kim, Linton, and Hengartner (1999). The novelty of the procedure lies in the simple definition of the estimator based on univariate smoothing combined with new kernel weights. That is, adjusting the kernel weights via the conditional density of the covariate enables a univariate kernel smoother to estimate the corresponding additive component function consistently. In many respects, the new estimator preserves the good properties of univariate smoothers. The instrumental variable method is analytically tractable for asymptotic theory: it is shown to attain the optimal one-dimensional rate. Furthermore, it is computationally more efficient than the two existing methods (backfitting and MI) in the sense that it reduces the computations by a factor of $n$ smoothings. The other contribution relates to the general coverage of the model we work with. The model in (1.1)-(1.3) extends ARCH models to a generalized additive framework in which both the mean and variance functions are additive after some known transformation (see Hastie and Tibshirani, 1990). All the time series models in our previous discussion can be regarded as a subclass of the data generating process for $\{y_t\}$ in (1.1)-(1.3). For example, setting $G_m$ to be the identity and $G_v$ a logarithmic function reduces our model to (1.5). Similar efforts to apply transformations have been made in parametric ARCH models. Nelson (1991) considers a model for the log of the conditional variance, the exponential (G)ARCH class, to embody the multiplicative effects of volatility. It has also been argued that one should use the Box-Cox transformation for volatility, which is intermediate between linear and logarithmic and which allows nonseparable news impact curves.
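The variance transformations just discussed can be written side by side. The sketch below, with our own parameterization, shows the identity link (linear ARCH-type variance), the log link underlying exponential (G)ARCH, and a Box-Cox-type link that interpolates between them; the inverse Box-Cox transform tends to the exponential as its parameter goes to zero.

```python
import numpy as np

# Three candidate variance transformations F_v (inverses of the link G_v).
# The Box-Cox parameterization below is an illustrative choice.
def F_identity(t):
    return t                          # linear ARCH-type variance

def F_exp(t):
    return np.exp(t)                  # log link: exponential (G)ARCH-type variance

def F_boxcox(t, lam=0.5):
    # inverse Box-Cox transform; approaches F_exp(t) as lam -> 0
    return np.power(1.0 + lam * t, 1.0 / lam)
```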
Because it is hard to tell a priori which structure of volatility is more realistic, and this should be determined by real data, our generalized additive model provides a usefully flexible specification for empirical work. Additionally, from the perspective of potential misspecification problems, the transformation used here alleviates the restriction imposed by the additivity assumption, which increases the approximating power of our model. Note that when the lagged variables in (1.1)-(1.3) are replaced by different covariates and the observations are independent and identically distributed (i.i.d.), the model becomes the cross-sectional additive model studied by Linton and Härdle (1996). Finally, we also consider more efficient estimation along the lines of Linton (1996, 2000). The rest of the paper is organized as follows. Section 2 describes the main estimation idea in a simple setting. In Section 3, we define the estimator for the full model. In Section 4 we give our main results, including the asymptotic normality of our estimators. Section 5 discusses more efficient estimation. Section 6 reports a small Monte Carlo study. The proofs are contained in the Appendix.

2. NONPARAMETRIC INSTRUMENTAL VARIABLES: THE MAIN IDEA

This section explains the basic idea behind the instrumental variable method and defines the estimation procedure. For ease of exposition, this is carried out using an example of a simple additive model with i.i.d. data. We then extend the definition to the generalized additive ARCH case in (1.1)-(1.3). Consider a bivariate additive regression model for i.i.d. data $(y, X_1, X_2)$,

$y = m_1(X_1)$
$+\, m_2(X_2) + \varepsilon,$

where $E(\varepsilon \mid X) = 0$ with $X = (X_1, X_2)$ and the components satisfy the identification conditions $E[m_\alpha(X_\alpha)] = 0$ for $\alpha = 1, 2$ (the constant term is assumed to be zero, for simplicity). Letting $\eta = m_2(X_2) + \varepsilon$, we rewrite the model as

$y = m_1(X_1) + \eta,$  (2.6)

which is a classical example of an "omitted variable" regression. That is, although (2.6) appears to take the form of a univariate nonparametric regression model, smoothing $y$ on $X_1$ will incur a bias due to the omitted variable $\eta$, because $\eta$ contains $X_2$, which in general depends on $X_1$. One solution is suggested by the classical econometric notion of an instrumental variable. That is, we look for an instrument $W$ such that

$E(W \mid X_1) \neq 0; \qquad E(W\eta \mid X_1) = 0$  (2.7)

with probability one.2 If such a random variable exists, we can write

$m_1(x_1) = \frac{E(Wy \mid X_1 = x_1)}{E(W \mid X_1 = x_1)}.$  (2.8)

This suggests that we estimate the function $m_1(\cdot)$ by nonparametric smoothing of $Wy$ on $X_1$ and of $W$ on $X_1$. In parametric models the choice of instrument is usually not obvious and requires some caution. However, our additive model has a natural class of instruments: $p_2(X_2)/p(X)$ times any measurable function of $X_1$ will do, where $p(\cdot)$, $p_1(\cdot)$, and $p_2(\cdot)$ are the density functions of the covariates $X$, $X_1$, and $X_2$, respectively. It follows that

$\frac{E(Wy \mid X_1)}{E(W \mid X_1)} = \frac{\int W(X)m(X)p(X)\, dX_2 \big/ p_1(X_1)}{\int W(X)p(X)\, dX_2 \big/ p_1(X_1)} = \frac{\int m(X)p_2(X_2)\, dX_2}{\int p_2(X_2)\, dX_2} = \int m(X)p_2(X_2)\, dX_2,$

as required. This formula shows what the instrumental variable estimator is estimating when $m$ is not additive: an average of the regression function over the $X_2$ direction, exactly the same as the target of the marginal integration estimator. For simplicity we will take

$W(X) = \frac{p_2(X_2)}{p(X)}$  (2.9)

throughout.3
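The ratio (2.8) with the weight (2.9) can be illustrated numerically. In the sketch below the data generating design, the Gaussian kernels, and the bandwidths are all our own illustrative assumptions; we estimate the densities by kernel smoothing, form the instrument $W = p_2(X_2)/p(X)$, and take the ratio of the smooths of $Wy$ and $W$ on $X_1$, which should track $m_1$ up to sampling error (here $E[\sin(X_1)] = 0$ and $E[X_2] = 0$, so the identification conditions hold).

```python
import numpy as np

# Minimal sketch of the LIVE idea (2.8)-(2.9) for y = m1(X1) + m2(X2) + eps
# with correlated covariates. All tuning choices are illustrative.
rng = np.random.default_rng(1)
n, h = 1000, 0.3
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)           # X2 depends on X1
y = np.sin(x1) + 0.5 * x2 + 0.1 * rng.standard_normal(n)

def kde1(u, data, h):
    # one-dimensional Gaussian kernel density estimate at the points u
    z = (u[:, None] - data[None, :]) / h
    return np.mean(np.exp(-0.5 * z**2), axis=1) / (h * np.sqrt(2 * np.pi))

# joint density p(x1, x2) at the sample points (product Gaussian kernel)
z1 = (x1[:, None] - x1[None, :]) / h
z2 = (x2[:, None] - x2[None, :]) / h
p_joint = np.mean(np.exp(-0.5 * (z1**2 + z2**2)), axis=1) / (2 * np.pi * h**2)
W = kde1(x2, x2, h) / p_joint                    # feasible instrument p2 / p

# ratio of kernel smooths of W*y and W on X1, evaluated on a grid
grid = np.linspace(-1.0, 1.0, 21)
Kg = np.exp(-0.5 * ((grid[:, None] - x1[None, :]) / h) ** 2)
m1_hat = (Kg @ (W * y)) / (Kg @ W)               # estimates m1 on the grid
```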
Up to now, it was implicitly assumed that the distributions of the covariates are known a priori. In practice this is rarely true, and we have to rely on estimates of these quantities. Let $\hat p(\cdot)$, $\hat p_1(\cdot)$, and $\hat p_2(\cdot)$ be kernel estimates of the densities $p(\cdot)$, $p_1(\cdot)$, and $p_2(\cdot)$, respectively. The feasible procedure is then defined by replacing the instrumental variable $W$ with $\hat W = \hat p_2(X_2)/\hat p(X)$ and taking sample averages instead of population expectations. Section 3 provides a rigorous statistical treatment of feasible instrumental variable estimators based on local linear estimation. See Kim et al. (1999) for a slightly different approach. Next, we come to the main advantage of the local instrumental variable method: its computational cost. The marginal integration method needs $n^2$ regression smoothings evaluated at the pairs $(X_{1i}, X_{2j})$, for $i, j = 1, \ldots, n$, whereas the backfitting method requires $nr$ operations, where $r$ is the number of iterations needed to achieve convergence. The instrumental variable procedure, in contrast, takes at most $2n$ kernel smoothing operations in a preliminary step for estimating the instrumental variable and another $n$ operations for the regressions. Thus, it can easily be combined with bootstrap methods, whose computational costs often become prohibitive in the case of marginal integration (see Kim et al., 1999). Finally, we show how the instrumental variable approach can be applied to generalized additive models. Let $F(\cdot)$ be the inverse of a known link function $G(\cdot)$ and let $m(X) = E(y \mid X)$. The model is defined as

$y = F(m_1(X_1) + m_2(X_2)) + \varepsilon,$  (2.10)

or equivalently $G(m(X)) = m_1(X_1) + m_2(X_2)$. We maintain the same identification condition, $E[m_\alpha(X_\alpha)] = 0$. Unlike in the simple additive model, there is no direct way to relate $Wy$ to $m_1(X_1)$ here, so (2.8) cannot be implemented.
However, under additivity,

$m_1(X_1) = \frac{E[W G(m(X)) \mid X_1]}{E[W \mid X_1]}$  (2.11)

for the $W$ defined in (2.9). Because $m(\cdot)$ is unknown, we need consistent estimates of $m(X)$ in a preliminary step, after which the calculation in (2.11) is feasible. In the next section we show how these ideas translate into estimators for the general time series setting.

3. INSTRUMENTAL VARIABLE PROCEDURE FOR GANARCH

We start with some simplifying notation that will be used repeatedly in what follows. Let $x_t$ be the vector of $d$ lagged variables up to $t - 1$, that is, $x_t = (y_{t-1}, \ldots, y_{t-d})$, or concisely $x_t = (y_{t-\alpha}, \tilde y_{t-\alpha})$, where $\tilde y_{t-\alpha} = (y_{t-1}, \ldots, y_{t-\alpha-1}, y_{t-\alpha+1}, \ldots, y_{t-d})$. Defining $m_{\alpha t}(\tilde y_{t-\alpha}) = \sum_{\beta=1, \beta\neq\alpha}^{d} m_\beta(y_{t-\beta})$ and $v_{\alpha t}(\tilde y_{t-\alpha}) = \sum_{\beta=1, \beta\neq\alpha}^{d} v_\beta(y_{t-\beta})$, we can reformulate (1.1)-(1.3) with a focus on the $\alpha$th components of the mean and variance as

$y_t = m(x_t) + v^{1/2}(x_t)\varepsilon_t,$
$m(x_t) = F_m(c_m + m_\alpha(y_{t-\alpha}) + m_{\alpha t}(\tilde y_{t-\alpha})),$
$v(x_t) = F_v(c_v + v_\alpha(y_{t-\alpha}) + v_{\alpha t}(\tilde y_{t-\alpha})).$

To save space we use the following abbreviations for the functions to be estimated:

$H_\alpha(y_{t-\alpha}) \equiv [m_\alpha(y_{t-\alpha}), v_\alpha(y_{t-\alpha})]^\top, \quad c \equiv [c_m, c_v]^\top, \quad H_{\alpha t}(\tilde y_{t-\alpha}) \equiv [m_{\alpha t}(\tilde y_{t-\alpha}), v_{\alpha t}(\tilde y_{t-\alpha})]^\top,$
$r_t \equiv H(x_t) = [G_m(m(x_t)), G_v(v(x_t))]^\top, \quad w_\alpha(y_\alpha) = c + H_\alpha(y_\alpha).$

Note that the components $[m_\alpha(\cdot), v_\alpha(\cdot)]^\top$ are identified, up to the constant $c$, by $w_\alpha(\cdot)$, which is our main object of interest in estimation. Subsequently, we examine in some detail each step involved in computing the feasible nonparametric instrumental variable estimator of $w_\alpha(\cdot)$. The set of observations is given by $\mathcal{Y} = \{y_t\}_{t=1}^{n'}$, where $n' = n + d$.

3.1. Step I: Preliminary Estimation of $r_t = H(x_t)$

Because $r_t$ is unknown, we start by computing pilot estimates of the regression surface by a local linear smoother. Let $\tilde m(x)$ be the first component of $(\tilde a, \tilde b^\top)^\top$
that solves

$\min_{a, b} \sum_{t=d+1}^{n'} K_h(x_t - x)\{y_t - a - b^\top(x_t - x)\}^2,$  (3.12)

where $K_h(x) = \prod_{i=1}^{d} K(x^i/h)/h^d$, $K$ is a one-dimensional kernel function, and $h = h(n)$ is a bandwidth sequence. In a similar way, we get the estimate $\tilde v(\cdot)$ of the volatility surface from (3.12) by replacing $y_t$ with the squared residuals $\tilde\varepsilon_t^2 = (y_t - \tilde m(x_t))^2$. Transforming $\tilde m$ and $\tilde v$ by the known links then leads to consistent estimates $\tilde r_t$,

$\tilde r_t = \tilde H(x_t) = [G_m(\tilde m(x_t)), G_v(\tilde v(x_t))]^\top.$

3.2. Step II: Instrumental Variable Estimation of Additive Components

This step involves the estimation of $w_\alpha(\cdot)$, which is equivalent to $[m_\alpha(\cdot), v_\alpha(\cdot)]^\top$ up to the constant $c$. Let $p(\cdot)$ and $p_{\alpha t}(\cdot)$ denote the density functions of the random variables $(y_{t-\alpha}, \tilde y_{t-\alpha})$ and $\tilde y_{t-\alpha}$, respectively. Define the feasible instrument as

$\hat W_t = \frac{\hat p_{\alpha t}(\tilde y_{t-\alpha})}{\hat p(y_{t-\alpha}, \tilde y_{t-\alpha})},$

where $\hat p_{\alpha t}(\cdot)$ and $\hat p(\cdot)$ are computed using the kernel function $L(\cdot)$; for example, $\hat p(x) = \sum_{t=1}^{n} \prod_{i=1}^{d} L_g(x_t^i - x^i)/n$ with $L_g(\cdot) \equiv L(\cdot/g)/g$, and $g = g(n)$ is a bandwidth sequence. The instrumental variable local linear estimates $\hat w_\alpha(y_\alpha)$ are given as $(a_1, a_2)^\top$ by minimizing the localized squared errors elementwise:

$\min_{a_j, b_j} \sum_{t=d+1}^{n'} K_h(y_{t-\alpha} - y_\alpha)\hat W_t\{\tilde r_{jt} - a_j - b_j(y_{t-\alpha} - y_\alpha)\}^2,$  (3.13)

where $\tilde r_{jt}$ is the $j$th element of $\tilde r_t$.4 The closed form of the solution is

$\hat w_\alpha(y_\alpha)^\top = e_1^\top(\mathbf{Y}_2^\top K \mathbf{Y}_2)^{-1}\mathbf{Y}_2^\top K \tilde R,$  (3.14)

where $e_1 = (1, 0)^\top$, $\mathbf{Y}_2 = [i, Y_2]$, $K = \mathrm{diag}[K_h(y_{d+1-\alpha} - y_\alpha)\hat W_{d+1}, \ldots, K_h(y_{n'-\alpha} - y_\alpha)\hat W_{n'}]$, and $\tilde R = (\tilde r_{d+1}, \ldots, \tilde r_{n'})^\top$, with $i = (1, \ldots, 1)^\top$ and $Y_2 = (y_{d+1-\alpha} - y_\alpha, \ldots, y_{n'-\alpha} - y_\alpha)^\top$.

4. MAIN RESULTS

Let $\mathcal{F}_a^b$ be the $\sigma$-algebra of events generated by $\{y_t\}_a^b$ and $\alpha(k)$ the strong mixing coefficient of $\{y_t\}$, defined by

$\alpha(k) \equiv \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_k^{\infty}} |P(A \cap B) - P(A)P(B)|.$
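The closed-form solve in (3.14) is simple enough to state in a few lines. The sketch below implements $e_1^\top(\mathbf{Y}_2^\top K \mathbf{Y}_2)^{-1}\mathbf{Y}_2^\top K \tilde R$ for one response column and one evaluation point; the Gaussian kernel and all names are our own choices. The usage example exploits the fact that a local linear fit reproduces an exactly linear response without error.

```python
import numpy as np

# Sketch of the closed-form IV local linear estimate (3.14) at a point y0:
# weighted least squares with weights K_h(y_lag - y0) * W_hat. Names are ours.
def live_local_linear(y_lag, r, w_hat, y0, h):
    u = y_lag - y0
    k = np.exp(-0.5 * (u / h) ** 2) * w_hat       # kernel times feasible instrument
    X = np.column_stack([np.ones_like(u), u])     # local linear design [1, u]
    XtK = X.T * k                                 # X' K  (K diagonal)
    coef = np.linalg.solve(XtK @ X, XtK @ r)      # (X'KX)^{-1} X'K r
    return coef[0]                                # e_1' picks the local level

# an exactly linear "response" r_t is recovered exactly at y0 = 0.2
y_lag = np.linspace(-1.0, 1.0, 50)
r = 2.0 + 3.0 * (y_lag - 0.2)
est = live_local_linear(y_lag, r, np.ones_like(y_lag), 0.2, h=0.25)
```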
Throughout the paper, we make the following assumptions.

Assumption A.

A1. $\{y_t\}_{t=1}^{\infty}$ is a stationary and strongly mixing process generated by (1.1)-(1.3), with a mixing coefficient such that $\sum_{k=0}^{\infty} k^a\{\alpha(k)\}^{1-2/\nu} < \infty$, for some $\nu > 2$ and $0 < a < (1 - 2/\nu)$.

As pointed out by Masry and Tjøstheim (1997), the condition on the mixing coefficient in A1 is milder than is assumed for the standard mixing process, where the coefficient decreases at a geometric rate, that is, $\alpha(k) = \rho^{-\beta k}$ (for some $\beta > 0$). Some technical regularity conditions are stated here. For simplicity, we assume that the process $\{y_t\}_{t=1}^{\infty}$ has a compact support.

A2. The additive component functions $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$, for $\alpha = 1, \ldots, d$, are continuous and twice differentiable on the compact support.

A3. The link functions $G_m$ and $G_v$ have bounded continuous second-order derivatives over any compact interval.

A4. The joint and marginal density functions $p(\cdot)$, $p_{\alpha t}(\cdot)$, and $p_\alpha(\cdot)$, for $\alpha = 1, \ldots, d$, are continuous, twice differentiable with bounded (partial) derivatives, and bounded away from zero on the compact support.

A5. The kernel functions $K(\cdot)$ and $L(\cdot)$ are real, bounded, nonnegative, symmetric (around zero) functions on a compact support satisfying $\int K(u)\, du = \int L(u)\, du = 1$ and $\int uK(u)\, du = \int uL(u)\, du = 0$. Assume also that the kernel functions are Lipschitz continuous, $|K(u) - K(v)| \le C|u - v|$.

A6. (i) $g \to 0$, $ng^d \to \infty$, $(\log n)^2\sqrt{h}/\sqrt{ng^d} \to 0$. (ii) $h \to 0$, $(\log n)^2/\sqrt{nh^{2d-1}} \to 0$. (iii) The bandwidth satisfies $\sqrt{n/h}\,\alpha(\tau(n)) \to 0$, where $\{\tau(n)\}$ is a sequence of positive integers, $\tau(n) \to \infty$, such that $\tau(n) = o(\sqrt{nh})$.

Conditions A2-A5 are standard in kernel estimation. The continuity assumptions in A2 and A4, together with the compact support, imply that the functions are bounded. The bandwidth conditions in A6(i) and A6(ii)
are necessary for showing the negligibility of the stochastic error terms arising from the preliminary estimation of $m$, $v$, and $p_\alpha(\cdot)$. Under twice differentiability of these functions as in A2-A4, the given side conditions are satisfied when $d \le 4$. Our asymptotic results can be extended to the more general case of $d > 4$, although we do not prove this in the paper. One way to extend them to higher dimensions is to strengthen the differentiability conditions in A2-A4 and use higher order polynomials (see Kim et al., 1999). The additional bandwidth condition in A6(iii) is needed to control the effects of the dependence of the mixing processes when showing the asymptotic normality of the instrumental variable estimates. The proof of consistency, however, does not require this condition. Define $D^2 f(x_1, \ldots, x_d) = \sum_{l=1}^{d} \partial^2 f/\partial x_l^2$ and $[\nabla G_m(t), \nabla G_v(t)] = [dG_m(t)/dt, dG_v(t)/dt]$. Let $(K * K)_i(u) = \int K(w)K(w + u)w^i\, dw$, a convolution of kernel functions, let $\mu_2^{K*K} = \int (K * K)_0(u)u^2\, du$, and let $\|K\|_2^2$ denote $\int K^2(u)\, du$. The asymptotic properties of the feasible instrumental variable estimates in (3.14) are summarized in the following theorem, whose proof is in the Appendix. Let $\kappa_3(y_\alpha, z_{\alpha t}) = E[\varepsilon_t^3 \mid x_t = (y_\alpha, z_{\alpha t})]$ and $\kappa_4(y_\alpha, z_{\alpha t}) = E[(\varepsilon_t^2 - 1)^2 \mid x_t = (y_\alpha, z_{\alpha t})]$. $A \odot B$ denotes the matrix Hadamard product.

THEOREM 1. Assume that conditions A1-A6 hold. Then,

$\sqrt{nh}\,[\hat w_\alpha(y_\alpha) - w_\alpha(y_\alpha) - B_\alpha] \stackrel{d}{\to} N[0, \Sigma_\alpha^*(y_\alpha)],$

where

$B_\alpha(y_\alpha) = \frac{h^2}{2}\mu_2^K D^2 w_\alpha(y_\alpha) + \frac{h^2}{2}\int[\mu_2^{K*K} D^2 w_\alpha(y_\alpha) + \mu_2^K D^2 w_{\alpha t}(z_{\alpha t})] \odot [\nabla G_m(m(y_\alpha, z_{\alpha t})), \nabla G_v(v(y_\alpha, z_{\alpha t}))]^\top p_{\alpha t}(z_{\alpha t})\, dz_{\alpha t} + \frac{g^2}{2}\mu_2^K \int\bigg[\frac{D^2 p_{\alpha t}(z_{\alpha t})}{p_{\alpha t}(z_{\alpha t})} - \frac{D^2 p(y_\alpha, z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\bigg] H_{\alpha t}(z_{\alpha t})\, dz_{\alpha t},$

$\Sigma_\alpha^*(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\begin{bmatrix} m_{\alpha t}^2(z_{\alpha t}) & m_{\alpha t}(z_{\alpha t})v_{\alpha t}(z_{\alpha t}) \\ m_{\alpha t}(z_{\alpha t})v_{\alpha t}(z_{\alpha t}) & v_{\alpha t}^2(z_{\alpha t}) \end{bmatrix} dz_{\alpha t} + \|(K * K)_0\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\begin{bmatrix} \nabla G_m(m)^2 v & (\nabla G_m \nabla G_v)(\kappa_3 v^{3/2}) \\ (\nabla G_m \nabla G_v)(\kappa_3 v^{3/2}) & \nabla G_v(v)^2 \kappa_4 v^2 \end{bmatrix}(y_\alpha, z_{\alpha t})\, dz_{\alpha t}.$

Remarks.

1. To estimate $[m_\alpha(y_\alpha), v_\alpha(y_\alpha)]^\top$ we can use the recentered estimates $\hat w_\alpha(y_\alpha) - \hat c$, where $\hat c = [\hat c_m, \hat c_v]^\top = (1/n)[\sum_t y_t, \sum_t \tilde\varepsilon_t^2]^\top$ and $\tilde\varepsilon_t = y_t - \tilde m(x_t)$. Because $\hat c = c + O_p(1/\sqrt{n})$, the bias and variance of $[\hat m_\alpha(y_\alpha), \hat v_\alpha(y_\alpha)]^\top$ are the same as those of $\hat w_\alpha(y_\alpha)$. For $y = (y_1, \ldots, y_d)$, the estimates of the conditional mean and volatility are defined by

$[\hat m(y), \hat v(y)] \equiv \bigg[F_m\Big(-(d-1)\hat c_m + \sum_{\alpha=1}^{d} \hat w_{\alpha 1}(y_\alpha)\Big),\; F_v\Big(-(d-1)\hat c_v + \sum_{\alpha=1}^{d} \hat w_{\alpha 2}(y_\alpha)\Big)\bigg].$

Let $\nabla F(y) \equiv [\nabla F_m(m(y)), \nabla F_v(v(y))]^\top$. Then, by Theorem 1 and the delta method, their asymptotic distribution satisfies

$\sqrt{nh}\,[\hat m(y) - m(y) - b_m(y), \hat v(y) - v(y) - b_v(y)]^\top \stackrel{d}{\to} N[0, \Sigma^*(y)],$

where $[b_m(y), b_v(y)]^\top = \nabla F(y) \odot \sum_{\alpha=1}^{d} B_\alpha(y_\alpha)$ and $\Sigma^*(y) = [\nabla F(y)\nabla F(y)^\top] \odot [\Sigma_1^*(y_1) + \cdots + \Sigma_d^*(y_d)]$. It is easy to see that $\hat w_\alpha(y_\alpha)$ and $\hat w_\beta(y_\beta)$ are asymptotically uncorrelated for any $\alpha$ and $\beta$, so the asymptotic variance of their sum is the sum of the variances of $\hat w_\alpha(y_\alpha)$ and $\hat w_\beta(y_\beta)$.

2. The first term of the bias is of standard form, depending only on the second derivatives, as in other local linear smoothing. The last term reflects the biases from using estimated density functions to construct the feasible instrumental variable, $\hat p_{\alpha t}(\tilde y_{t-\alpha})/\hat p(x_t)$. When the instrument consisting of known density functions, $p_{\alpha t}(\tilde y_{t-\alpha})/p(x_t)$, is used in (3.13), the asymptotic properties of the instrumental variable estimates are the same as in Theorem 1 except that the asymptotic bias then includes only the first two terms of $B_\alpha(y_\alpha)$.

3. The convolution kernel $(K * K)(\cdot)$ is the legacy of double smoothing in the instrumental variable estimation of "generalized" additive models, because we smooth $[G_m(\tilde m(\cdot)), G_v(\tilde v(\cdot))]$ with $\tilde m(\cdot)$ and $\tilde v(\cdot)$
given by (multivariate) local linear fits. When $G_m(\cdot)$ is the identity, we can directly smooth $y$ instead of $G_m(\tilde m(x_t))$ to estimate the components of the conditional mean function. Then, as the following theorem shows, the second term of the bias $B_\alpha$ does not arise, and the convolution kernel in the variance is replaced by the usual kernel function. Suppose that $F_m(t) = F_v(t) = t$ in (1.2) and (1.3). In this case, the instrumental variable estimates of $w_\alpha(y_\alpha)$ can be defined in a simpler way. For $w_\alpha(y_\alpha) = [M_\alpha(y_\alpha), V_\alpha(y_\alpha)] = [c_m + m_\alpha(y_\alpha), c_v + v_\alpha(y_\alpha)]$, we define $[\hat M_\alpha(y_\alpha), \hat V_\alpha(y_\alpha)]$ as the solution to the adjusted-kernel least squares in (3.13) with the modification that the $(2 \times 1)$ vector $\tilde r_t$ is replaced by $[y_t, \tilde\varepsilon_t^2]^\top$, where $\tilde\varepsilon_t$ is given in step I in Section 3.1. Theorem 2 states the asymptotic normality of these estimates. The proof is almost the same as that of Theorem 1 and thus is omitted.

THEOREM 2. Under the same conditions as Theorem 1,

(i) $\sqrt{nh}\,[\hat M_\alpha(y_\alpha) - M_\alpha(y_\alpha) - b_{\alpha m}] \stackrel{d}{\to} N[0, \sigma_{\alpha m}^2(y_\alpha)]$, where

$b_{\alpha m}(y_\alpha) = \frac{h^2}{2}\mu_2^K D^2 m_\alpha(y_\alpha) + \frac{g^2}{2}\mu_2^K \int\bigg[\frac{D^2 p_{\alpha t}(z_{\alpha t})}{p_{\alpha t}(z_{\alpha t})} - \frac{D^2 p(y_\alpha, z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\bigg] m_{\alpha t}(z_{\alpha t})\, dz_{\alpha t},$

$\sigma_{\alpha m}^2(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}[m_{\alpha t}^2(z_{\alpha t}) + v(y_\alpha, z_{\alpha t})]\, dz_{\alpha t},$

and

(ii) $\sqrt{nh}\,[\hat V_\alpha(y_\alpha) - V_\alpha(y_\alpha) - b_{\alpha v}] \stackrel{d}{\to} N[0, \sigma_{\alpha v}^2(y_\alpha)]$, where

$b_{\alpha v}(y_\alpha) = \frac{h^2}{2}\mu_2^K D^2 v_\alpha(y_\alpha) + \frac{g^2}{2}\mu_2^K \int\bigg[\frac{D^2 p_{\alpha t}(z_{\alpha t})}{p_{\alpha t}(z_{\alpha t})} - \frac{D^2 p(y_\alpha, z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\bigg] v_{\alpha t}(z_{\alpha t})\, dz_{\alpha t},$

$\sigma_{\alpha v}^2(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}[v_{\alpha t}^2(z_{\alpha t}) + \kappa_4(y_\alpha, z_{\alpha t})v^2(y_\alpha, z_{\alpha t})]\, dz_{\alpha t}.$

Although the instrumental variable estimators achieve the one-dimensional optimal convergence rate, there is room for improvement in terms of variance. For example, compared with the marginal integration estimators of Linton and Härdle (1996)
or Linton and Nielsen (1995), the asymptotic variances of the instrumental variable estimates of $m_1(\cdot)$ in Theorems 1 and 2 include an additional factor of $m_2^2(\cdot)$. This is because the instrumental variable approach treats $\eta = m_2(X_2) + \varepsilon$ in (2.6) as if it were the error term of the regression equation for $m_1(\cdot)$. Note that the second term of the asymptotic covariance in Theorem 2 is the same as that in Yang et al. (1999), where the authors considered only the case of additive mean and multiplicative volatility functions. The issue of efficiency in estimating an additive component was first addressed by Linton (1996), based on "oracle efficiency" bounds of infeasible estimators under knowledge of the other components. By this standard, both the instrumental variable and marginal integration estimators are inefficient, but they can attain the efficiency bounds through one simple additional step, following Linton (1996, 2000) and Kim et al. (1999).

5. MORE EFFICIENT ESTIMATION

5.1. Oracle Standard

In this section we define a standard of efficiency that could be achieved in the presence of certain information, and then we show how to achieve it in practice. There are several routes to efficiency here, depending on the assumptions one is willing to make about $\varepsilon_t$. We take an approach based on likelihood; that is, we assume that $\varepsilon_t$ is i.i.d. with known density function $f$, such as the normal or the $t$ with given degrees of freedom. It is easy to generalize this to the case where $f$ contains unknown parameters, but we do not do so here. It is also possible to build an efficiency standard based on the moment conditions in (1.1)-(1.3). We choose the likelihood approach because it leads to easy calculations, links with existing work, and is the most common method for estimating parametric ARCH/GARCH models in applied work. There are several standards that we could apply here. First, suppose that we know $(c_m, \{m_\beta(\cdot) : \beta \neq \alpha\})$
and $(c_v, \{v_\alpha(\cdot) : \alpha\})$; then what is the best estimator we can obtain for the function $m_\alpha$ within the local polynomial paradigm? Similarly, suppose that we know $(c_m, \{m_\alpha(\cdot) : \alpha\})$ and $(c_v, \{v_\beta(\cdot) : \beta \neq \alpha\})$; then what is the best estimator we can obtain for the function $v_\alpha$? It turns out that this standard is very high and cannot be achieved in practice. Instead we ask: suppose that we know $(c_m, \{m_\beta(\cdot) : \beta \neq \alpha\})$ and $(c_v, \{v_\beta(\cdot) : \beta \neq \alpha\})$; then what is the best estimator we can obtain for the functions $(m_\alpha, v_\alpha)$? It turns out that this standard can be achieved in practice. Let $\rho$ denote $-\log f(\cdot)$, where $f(\cdot)$ is the density function of $\varepsilon_t$. We use $z_t$ to denote $(x_t, y_t)$, where $x_t = (y_{t-1}, \ldots, y_{t-d}) = (y_{t-\alpha}, \tilde y_{t-\alpha})$. For $\theta = (\theta_a, \theta_b) = (a_m, a_v, b_m, b_v)$, we define

$l_t^*(\theta, g_\alpha) = l^*(z_t; \theta, g_\alpha) = \rho\bigg(\frac{y_t - F_m(g_{1\alpha}(\tilde y_{t-\alpha}) + a_m + b_m(y_{t-\alpha} - y_\alpha))}{F_v^{1/2}(g_{2\alpha}(\tilde y_{t-\alpha}) + a_v + b_v(y_{t-\alpha} - y_\alpha))}\bigg) + \frac{1}{2}\log F_v(g_{2\alpha}(\tilde y_{t-\alpha}) + a_v + b_v(y_{t-\alpha} - y_\alpha)),$

$l_t(\theta, g_\alpha) = l(z_t; \theta, g_\alpha) = K_h(y_{t-\alpha} - y_\alpha)l^*(z_t; \theta, g_\alpha),$  (5.15)

where $g_\alpha(\tilde y_{t-\alpha}) = (g_{1\alpha}(\tilde y_{t-\alpha}), g_{2\alpha}(\tilde y_{t-\alpha})) = (c_m + m_{\alpha t}(\tilde y_{t-\alpha}), c_v + v_{\alpha t}(\tilde y_{t-\alpha})) = (c_m + \sum_{\beta\neq\alpha}^{d} m_\beta(y_{t-\beta}), c_v + \sum_{\beta\neq\alpha}^{d} v_\beta(y_{t-\beta}))$. With $l_t(\theta, g_\alpha)$ being the (negative) conditional local log likelihood, the infeasible local likelihood estimator $\hat\theta = (\hat a_m, \hat a_v, \hat b_m, \hat b_v)$ is defined as the minimizer of

$Q_n(\theta) = \sum_{t=d+1}^{n'} l_t(\theta, g_\alpha^0),$

where $g_\alpha^0(\cdot) = (g_{1\alpha}^0(\cdot), g_{2\alpha}^0(\cdot)) = (c_m^0 + m_{\alpha t}^0(\cdot), c_v^0 + v_{\alpha t}^0(\cdot))$. From the definitions of the score functions

$s_t^*(\theta, g_\alpha) = s^*(z_t; \theta, g_\alpha) = \frac{\partial l^*(z_t; \theta, g_\alpha)}{\partial\theta}, \qquad s_t(\theta, g_\alpha) = s(z_t; \theta, g_\alpha) = \frac{\partial l(z_t; \theta, g_\alpha)}{\partial\theta},$

the first-order condition for $\hat\theta$ is given by

$0 = \bar s_n(\hat\theta, g_\alpha^0) = \frac{1}{n}\sum_{t=d+1}^{n'} s_t(\hat\theta, g_\alpha^0).$

The asymptotic distribution of the local maximum likelihood estimator has been studied by Avramidis (2002).
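For Gaussian errors, where $\rho(x) = x^2/2$ up to a constant, the local negative log likelihood in (5.15) can be written down directly. The sketch below evaluates $Q_n(\theta)$ for a univariate series (so $d = 1$ and $g_\alpha$ is absorbed into the intercepts) with $F_m$ the identity and $F_v = \exp$, then checks that perturbing the local mean or log-variance away from the truth increases the criterion; all data-generating choices are our own.

```python
import numpy as np

# Sketch of the local (negative) log likelihood Q_n(theta) from (5.15) with
# Gaussian errors (rho(x) = x^2/2), F_m = identity, F_v = exp, d = 1.
# theta = (a_m, a_v, b_m, b_v) localizes (mean, log-variance) around y0.
rng = np.random.default_rng(2)

n, y0, h = 4000, 0.0, 0.3
y = np.zeros(n + 1)
for t in range(1, n + 1):                      # AR(1)-type mean, smooth volatility
    mean, logvar = 0.4 * y[t - 1], -1.0 + 0.3 * np.cos(y[t - 1])
    y[t] = mean + np.exp(0.5 * logvar) * rng.standard_normal()
y_lag, y_t = y[:-1], y[1:]

def Q_n(theta):
    a_m, a_v, b_m, b_v = theta
    u = y_lag - y0
    k = np.exp(-0.5 * (u / h) ** 2)            # kernel weights K_h (unnormalized)
    mean = a_m + b_m * u                       # F_m = identity
    logvar = a_v + b_v * u                     # F_v = exp
    resid = (y_t - mean) / np.exp(0.5 * logvar)
    return np.sum(k * (0.5 * resid**2 + 0.5 * logvar))

# local level and slope of the true mean and log-variance at y0
truth = (0.4 * y0, -1.0 + 0.3 * np.cos(y0), 0.4, 0.0)
```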
$= \int V(y; \theta_0, g_a^0)\,p(y)\,d\bar y_a$ and $D_a = D(y_a) = \int D(y; \theta_0, g_a^0)\,p(y)\,d\bar y_a$, where

$$V(y; \theta, g_a) = E[s^*(z_t; \theta, g_a)s^*(z_t; \theta, g_a)^\top \mid x_t = y]; \qquad D(y; \theta, g_a) = E(\nabla_\theta s_t^*(z_t; \theta, g_a) \mid x_t = y).$$

With a minor generalization of the results of Avramidis (2002, Theorem 2), we obtain the following asymptotic properties of the infeasible estimator $\hat w_a^{\mathrm{inf}}(y_a) = [\hat m_a^{\mathrm{inf}}(y_a), \hat v_a^{\mathrm{inf}}(y_a)]^\top = [\hat\alpha_m, \hat\alpha_v]^\top$. Let $w_a^c(y_a) \equiv [m_a(y_a), v_a(y_a)]^\top$, that is, $w_a^c(y_a) = w_a(y_a) - c$, where $c = (c_m, c_v)$.

THEOREM 3. Under Assumption C in the Appendix, it holds that

$$\sqrt{nh}\,\big[\hat w_a^{\mathrm{inf}}(y_a) - w_a^c(y_a) - B_a\big] \stackrel{d}{\to} N[0, V_a^*(y_a)],$$

where $B_a = \frac{1}{2}h^2\mu_2^K\,[m_a''(y_a), v_a''(y_a)]^\top$ and $V_a^*(y_a) = \|K\|_2^2\,D_a^{-1}V_a D_a^{-1}$.

A more specific form of the asymptotic variance can be calculated. For example, suppose that the error density function $f(\cdot)$ is symmetric. Then the asymptotic variance of the volatility function is given by

$$v_{22}(y_a) = \frac{\displaystyle\int g^2(y)f(y)\,dy\;\int(\nabla F_v/F_v)^2(G_v(v(y)))\,p(y)\,d\bar y_a}{\displaystyle\Big[\int q(y)f(y)\,dy\;\int(\nabla F_v/F_v)^2(G_v(v(y)))\,p(y)\,d\bar y_a\Big]^2},$$

where $g(y) = f'(y)f^{-1}(y)y + 1$ and $q(y) = [y^2 f''(y)f(y) + yf'(y)f(y) - y^2 f'(y)^2]\,f^{-2}(y)$. When the error distribution is Gaussian, we can further simplify the asymptotic variance; that is,

$$v_{11}(y_a) = \Big[\int v^{-1}(y)\,\nabla F_m^2(G_m(m(y)))\,p(y)\,d\bar y_a\Big]^{-1}; \qquad v_{12} = v_{21} = 0;$$

$$v_{22}(y_a) = 2\Big[\int v^{-2}(y)\,\nabla F_v^2(G_v(v(y)))\,p(y)\,d\bar y_a\Big]^{-1}.$$

In this case, one can easily show that the infeasible estimator has lower asymptotic variance than the instrumental variable estimator. To see this, we note that $\nabla G_m = 1/\nabla F_m$ and $\|K\|_2^2 \le \|(K*K)_0\|_2^2$ and apply the Cauchy-Schwarz inequality to get

$$\|(K*K)_0\|_2^2\int\frac{p_{\underline a}^2(\bar y_a)}{p(y_a,\bar y_a)}\,\nabla G_m(m)^2\,v(y_a,\bar y_a)\,d\bar y_a \ge \|K\|_2^2\Big[\int v^{-1}(y_a,\bar y_a)\,\nabla F_m^2(G_m(m))\,p(y_a,\bar y_a)\,d\bar y_a\Big]^{-1}.$$

In a similar way, using the Gaussianity assumption on $\varepsilon$, it follows that

$$\|(K*K)_0\|_2^2\int\frac{p_{\underline a}^2(\bar y_a)}{p(y_a,\bar y_a)}\,\nabla G_v(v)^2\,\kappa_4\,v^2(y_a,\bar y_a)\,d\bar y_a \ge 2\Big[\int v^{-2}(y)\,\nabla F_v^2(G_v(v(y)))\,p(y)\,d\bar y_a\Big]^{-1}.$$

These, together with $\kappa_3 = 0$, imply that the second term of $\Sigma_a^*(y_a)$ in Theorem 1 is greater than $V_a^*(y_a)$ in the sense of positive definiteness, and hence $\Sigma_a^*(y_a) \ge V_a^*(y_a)$, because the first term of $\Sigma_a^*(y_a)$ is a nonnegative definite matrix. The infeasible estimator is more efficient than the instrumental variable estimator because the former uses more information concerning the mean-variance structure. We finally remark that the infeasible estimator is also more efficient than the marginal integration estimator in Yang et al. (1999), whose asymptotic variance corresponds to the second term of $\Sigma_a^*(y_a)$; see the discussion following Theorem 2.

5.2. Feasible Estimation

Let $(\tilde c_m, \{\breve m_b(\cdot) : b \neq a\})$ and $(\tilde c_v, \{\tilde v_b(\cdot) : b \neq a\})$ be the estimators from (3.12) and (3.13) in Section 3, with the common bandwidth parameter $h_0$ chosen for the kernel function $K(\cdot)$. We define the feasible local likelihood estimator $\hat\theta^* = (\hat\alpha_m^*, \hat\alpha_v^*, \hat\beta_m^*, \hat\beta_v^*)$ as the minimizer of

$$\tilde Q_n(\theta) = \sum_{t=d+1}^{n} \ell_t(\theta, \bar g_a),$$

where $\bar g_a(\cdot) = (\bar g_{1a}(\cdot), \bar g_{2a}(\cdot)) = (\tilde c_m + \breve m_{\underline a}(\cdot), \tilde c_v + \tilde v_{\underline a}(\cdot))$ and $\ell_t(\cdot)$ is given by (5.15), with the additional bandwidth parameter $h$, possibly different from $h_0$. Then the first-order condition for $\hat\theta^*$ is given by

$$0 = \bar s_n(\hat\theta^*, \bar g_a) = \frac{1}{n}\sum_{t=d+1}^{n} s_t(\hat\theta^*, \bar g_a). \qquad (5.16)$$

Let $\hat w_a^*(y_a) = (\hat m_a^*(y_a), \hat v_a^*(y_a))^\top = (\hat\alpha_m^*, \hat\alpha_v^*)^\top$. We have the following result.

THEOREM 4. Under Assumptions B and C in the Appendix, it holds that

$$\sqrt{nh}\,\big[\hat w_a^*(y_a) - \hat w_a^{\mathrm{inf}}(y_a)\big] \stackrel{p}{\to} 0.$$

This result shows that the oracle efficiency bound is achieved by the two-step estimator.
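The next section evaluates the two-step procedure on an additive nonlinear ARCH(2) design. For readers who wish to replicate that exercise, the data-generating process and the rule-of-thumb bandwidth $h = c_h\,\mathrm{std}(y_t)\,n^{-1/(4+d)}$ can be coded in a few lines. The following Python sketch is ours, not the authors' code: it only draws one realization of the design of Section 6 and computes the baseline bandwidth (with $c_h = 1$); the instrumental variable and local likelihood estimators themselves are not implemented here.

```python
import numpy as np
from math import erf, sqrt, log

def Phi(x):
    """Standard normal CDF, used in the first volatility component."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def v1(y):
    # v1(y) = 0.4 * Phi(|2y|) * [2 - Phi(y)] * y^2; nonnegative, v1(0) = 0
    return 0.4 * Phi(abs(2.0 * y)) * (2.0 - Phi(y)) * y ** 2

def v2(y):
    # v2(y) = 0.4 * {(1 + 0.1 y^2)^(-1/2) + ln(1 + 4 y^2) - 1}; v2(0) = 0
    return 0.4 * (1.0 / sqrt(1.0 + 0.1 * y ** 2) + log(1.0 + 4.0 * y ** 2) - 1.0)

def simulate_arch2(n, burn=200, seed=0):
    """Draw y_t = [0.2 + v1(y_{t-1}) + v2(y_{t-2})]^(1/2) * eps_t, eps_t i.i.d. N(0,1)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n + burn)
    for t in range(2, n + burn):
        var_t = 0.2 + v1(y[t - 1]) + v2(y[t - 2])  # conditional variance, always > 0
        y[t] = sqrt(var_t) * rng.standard_normal()
    return y[burn:]  # discard the burn-in so the draw is close to stationary

y = simulate_arch2(500)
d = 2  # number of lags in the ARCH(2) design
h = y.std() * len(y) ** (-1.0 / (4 + d))  # rule-of-thumb bandwidth with c_h = 1
```

With these pieces in place, a preliminary estimate of $v_1$, say, would be obtained by kernel-weighted regression of $y_t^2$ on the lagged values at bandwidth $h$, along the lines of (3.13).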
6. NUMERICAL EXAMPLES

A small-scale simulation is carried out to investigate the finite-sample properties of both the instrumental variable and two-step estimators. The design in our experiment is an additive nonlinear ARCH(2):

$$y_t = [0.2 + v_1(y_{t-1}) + v_2(y_{t-2})]^{1/2}\varepsilon_t,$$

$$v_1(y) = 0.4\,\Phi(|2y|)[2 - \Phi(y)]\,y^2, \qquad v_2(y) = 0.4\Big\{\frac{1}{\sqrt{1 + 0.1y^2}} + \ln(1 + 4y^2) - 1\Big\},$$

where $\Phi(\cdot)$ is the (cumulative) standard normal distribution function and $\varepsilon_t$ is i.i.d. $N(0,1)$. Figure 1 (solid lines) depicts the shapes of the volatility functions $v_1(\cdot)$ and $v_2(\cdot)$. Based on the preceding model, we simulate 500 samples of ARCH processes with sample size $n = 500$. For each realization of the ARCH process, we apply the instrumental variable estimation procedure in (3.13) with $\tilde r_t = y_t^2$ to get preliminary estimates of $v_1(\cdot)$ and $v_2(\cdot)$. These estimates are then used to compute the two-step estimates of the volatility functions based on the feasible local maximum likelihood estimator of Section 5.2, under the normality assumption on the errors. The infeasible oracle estimates are also provided for comparison. The Gaussian kernel is used for all the nonparametric estimates, and bandwidths are chosen according to the rule of thumb (Härdle, 1990), $h = c_h\,\mathrm{std}(y_t)\,n^{-1/(4+d)}$, where $\mathrm{std}(y_t)$ is the standard deviation of $y_t$. We fix $c_h = 1$ for both the density estimates (for computing the instruments, $\hat W$) and the instrumental variable estimates in (3.13), and $c_h = 1.5$ for the (feasible and infeasible) local maximum likelihood estimators. To evaluate the performance of the estimators, we calculate the mean squared error, together with the mean absolute deviation error, for each simulated data set; for $a = 1, 2$,

$$e_{a,\mathrm{MSE}} = \Big\{\frac{1}{50}\sum_{i=1}^{50}[v_a(y_i) - \hat v_a(y_i)]^2\Big\}^{1/2}, \qquad e_{a,\mathrm{MAE}} = \frac{1}{50}\sum_{i=1}^{50}|v_a(y_i) - \hat v_a(y_i)|,$$

where $\{y_1, \ldots, y_{50}\}$ are grid points on $[-1, 1)$. The grid range covers about 70% of the observations on average. Table 1 gives the averages of the $e_{a,\mathrm{MSE}}$'s and $e_{a,\mathrm{MAE}}$'s over the 500 repetitions.

Figure 1. Averages of volatility estimates (demeaned): (a) first lag; (b) second lag.

Figure 2. Volatility estimates (demeaned).

Table 1. Average MSE and MAE for three volatility estimators

              e(1,MSE)   e(2,MSE)   e(1,MAE)   e(2,MAE)
Oracle est.   0.07636    0.08310    0.06049    0.06816
IV est.       0.08017    0.11704    0.06660    0.09725
Two-step      0.08028    0.08524    0.06372    0.07026

Table 1 shows that the infeasible oracle estimator is the best of the three, as would be expected. The performance of the instrumental variable estimator is reasonably good compared with the local maximum likelihood estimators, at least in estimating the volatility function of the first lagged variable. However, the overall accuracy of the instrumental variable estimates is improved by the two-step procedure, which behaves almost as well as the infeasible one, confirming our theoretical results in Theorem 4. For further comparison, Figure 1 shows the averaged estimates of the volatility functions, where the averages are taken, at each grid point, over the 500 simulations. In Figure 2, we also illustrate the estimates for three typical (consecutive) realizations of the ARCH processes.

NOTES

1. The extension to allow the $F$ transformations to be of unknown functional form is considerably more complicated; see Horowitz (2001).
2. Note the contrast with the marginal integration or projection method. In that approach one defines $m_1$ by some unconditional expectation $m_1(x_1) = E[m(x_1, X_2)W(X_2)]$ for some weighting function $W$ that depends only on $X_2$ and that satisfies $E[W(X_2)] = 1$ and $E[W(X_2)m_2(X_2)] = 0$.
3. If instead we take $W(X) = p_1(X_1)p_2(X_2)/p(X)$, this satisfies $E(W \mid X_1) = 1$ and $E(W\eta \mid X_1) = 0$. However, the term $p_1(X_1)$
cancels out of the expression and is redundant.
4. For simplicity, we choose a common bandwidth parameter for the kernel function $K(\cdot)$ in (3.12) and (3.13), which amounts to undersmoothing (for our choice of $h$) for the purposes of estimating $m$. Undersmoothing in the preliminary estimation of step I allows us to control the biases from estimating $m$ and $v$. In addition, the convolution kernel appearing in the asymptotic variance of Theorem 1 relies on the condition that the same bandwidth is used for $K(\cdot)$.

REFERENCES

Andrews, D.W.K. (1994) Empirical process methods in econometrics. In R.F. Engle & D. McFadden (eds.), Handbook of Econometrics, vol. IV, pp. 2247-2294. North-Holland.
Auestad, B. & D. Tjøstheim (1990) Identification of nonlinear time series: First order characterization and order estimation. Biometrika 77, 669-687.
Avramidis, P. (2002) Local maximum likelihood estimation of volatility function. Manuscript, LSE.
Breiman, L. & J.H. Friedman (1985) Estimating optimal transformations for multiple regression and correlation (with discussion). Journal of the American Statistical Association 80, 580-619.
Buja, A., T. Hastie, & R. Tibshirani (1989) Linear smoothers and additive models (with discussion). Annals of Statistics 17, 453-555.
Cai, Z. & E. Masry (2000) Nonparametric estimation of additive nonlinear ARX time series: Local linear fitting and projections. Econometric Theory 16, 465-501.
Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17-39.
Chen, R. (1996) A nonparametric multi-step prediction estimator in Markovian structures. Statistica Sinica 6, 603-615.
Chen, R. & R.S. Tsay (1993a) Nonlinear additive ARX models. Journal of the American Statistical Association 88, 955-967.
Chen, R. & R.S. Tsay (1993b) Functional-coefficient autoregressive models. Journal of the American Statistical Association 88, 298-308.
Engle, R.F. (1982)
Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987-1008.
Fan, J. & Q. Yao (1996) Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645-660.
Gozalo, P. & O. Linton (2000) Local nonlinear least squares: Using parametric information in nonparametric regression. Journal of Econometrics 99(1), 63-106.
Hall, P. & C. Heyde (1980) Martingale Limit Theory and Its Application. Academic Press.
Härdle, W. (1990) Applied Nonparametric Regression. Econometric Monograph Series 19. Cambridge University Press.
Härdle, W. & A.B. Tsybakov (1997) Locally polynomial estimators of the volatility function. Journal of Econometrics 81, 223-242.
Härdle, W., A.B. Tsybakov, & L. Yang (1998) Nonparametric vector autoregression. Journal of Statistical Planning and Inference 68(2), 221-245.
Härdle, W. & P. Vieu (1992) Kernel regression smoothing of time series. Journal of Time Series Analysis 13, 209-232.
Hastie, T. & R. Tibshirani (1987) Generalized additive models: Some applications. Journal of the American Statistical Association 82, 371-386.
Hastie, T. & R. Tibshirani (1990) Generalized Additive Models. Chapman and Hall.
Horowitz, J. (2001) Estimating generalized additive models. Econometrica 69, 499-513.
Jones, M.C., S.J. Davies, & B.U. Park (1994) Versions of kernel-type regression estimators. Journal of the American Statistical Association 89, 825-832.
Kim, W., O. Linton, & N. Hengartner (1999) A computationally efficient oracle estimator of additive nonparametric regression with bootstrap confidence intervals. Journal of Computational and Graphical Statistics 8, 1-20.
Linton, O.B. (1996) Efficient estimation of additive nonparametric regression models. Biometrika 84, 469-474.
Linton, O.B. (2000) Efficient estimation of generalized additive nonparametric regression models. Econometric Theory 16, 502-523.
Linton, O.B. & W. Härdle (1996) Estimating additive regression models with known links. Biometrika 83, 529-540.
Linton, O.B. & J. Nielsen (1995) A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93-100.
Linton, O.B., J. Nielsen, & S. van de Geer (2003) Estimating multiplicative and additive hazard functions by kernel methods. Annals of Statistics 31, 464-492.
Linton, O.B., N. Wang, R. Chen, & W. Härdle (1995) An analysis of transformation for additive nonparametric regression. Journal of the American Statistical Association 92, 1512-1521.
Mammen, E., O.B. Linton, & J. Nielsen (1999) The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Annals of Statistics 27, 1443-1490.
Masry, E. (1996) Multivariate local polynomial regression for time series: Uniform strong consistency and rates. Journal of Time Series Analysis 17, 571-599.
Masry, E. & D. Tjøstheim (1995) Nonparametric estimation and identification of nonlinear ARCH time series: Strong convergence and asymptotic normality. Econometric Theory 11, 258-289.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates. Econometric Theory 13, 214-252.
Nelson, D.B. (1991) Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370.
Newey, W.K. (1994) Kernel estimation of partial means. Econometric Theory 10, 233-253.
Opsomer, J.D. & D. Ruppert (1997) Fitting a bivariate additive model by local polynomial regression. Annals of Statistics 25, 186-211.
Pollard, D. (1990) Empirical Processes: Theory and Applications. CBMS Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics.
Robinson, P.M. (1983) Nonparametric estimation for time series models. Journal of Time Series Analysis 4, 185-208.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Stein, E.M. (1970)
Singular Integrals and Differentiability Properties of Functions. Princeton University Press.
Stone, C.J. (1985) Additive regression and other nonparametric models. Annals of Statistics 13, 685-705.
Stone, C.J. (1986) The dimensionality reduction principle for generalized additive models. Annals of Statistics 14, 592-606.
Tjøstheim, D. & B. Auestad (1994) Nonparametric identification of nonlinear time series: Projections. Journal of the American Statistical Association 89, 1398-1409.
Volkonskii, V. & Y. Rozanov (1959) Some limit theorems for random functions. Theory of Probability and Its Applications 4, 178-197.
Yang, L., W. Härdle, & J. Nielsen (1999) Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 579-604.
Ziegelmann, F. (2002) Nonparametric estimation of volatility functions: The local exponential estimator. Econometric Theory 18, 985-992.

APPENDIX

A.1. Proofs for Section 4

The proof of Theorem 1 consists of three steps. Without loss of generality we deal with the case $a = 1$; here we will use the subscript 2, for expositional convenience, to denote the nuisance direction. That is, $p_2(\bar y_{k-1})$ stands for $p_{\underline 1}(\bar y_{k-1})$ in the case of the density function. For the component functions, $m_2(\bar y_{k-1})$, $v_2(\bar y_{k-1})$, and $H_2(\bar y_{k-1})$ will be used instead of $m_{\underline 1}(\bar y_{k-1})$, $v_{\underline 1}(\bar y_{k-1})$, and $H_{\underline 1}(\bar y_{k-1})$, respectively. We start by decomposing the estimation error, $\hat w_1(y_1) - w_1(y_1)$, into the main stochastic term and the bias. We use $X_n \simeq Y_n$ to denote $X_n = Y_n\{1 + o_p(1)\}$ in the following. Let $\mathrm{vec}(X)$ denote the vectorization of the elements of the matrix $X$ along its columns.

Proof of Theorem 1.

Step I. Decompositions and Approximations. Because $\hat w_1(y_1)$ is a column vector, the vectorization of equation (3.14) gives $\hat w_1(y_1)$
$$= [I_2 \otimes e_1^\top(Y_2^\top K Y_2)^{-1}](I_2 \otimes Y_2^\top K)\,\mathrm{vec}(\tilde R).$$

A similar form is obtained for the true function $w_1(y_1)$,

$$[I_2 \otimes e_1^\top(Y_2^\top K Y_2)^{-1}](I_2 \otimes Y_2^\top K)\,\mathrm{vec}(i\,w_1^\top(y_1) + Y_2\nabla w_1^\top(y_1)),$$

by the identity

$$w_1(y_1) = \mathrm{vec}\{e_1^\top(Y_2^\top K Y_2)^{-1}Y_2^\top K[i\,w_1^\top(y_1) + Y_2\nabla w_1^\top(y_1)]\},$$

because

$$e_1^\top(Y_2^\top K Y_2)^{-1}Y_2^\top K i = 1, \qquad e_1^\top(Y_2^\top K Y_2)^{-1}Y_2^\top K Y_2 = 0.$$

By defining $D_h = \mathrm{diag}(1, h)$ and $Q_n = D_h^{-1}Y_2^\top K Y_2 D_h^{-1}$, the estimation errors are

$$\hat w_1(y_1) - w_1(y_1) = [I_2 \otimes e_1^\top Q_n^{-1}]\tau_n, \qquad \tau_n = (I_2 \otimes D_h^{-1}Y_2^\top K)\,\mathrm{vec}[\tilde R - i\,w_1^\top(y_1) - Y_2\nabla w_1^\top(y_1)].$$

Observing that

$$\tau_n = \frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W_k}(y_{k-1} - y_1)\big[\tilde r_k - w_1(y_1) - (y_{k-1} - y_1)\nabla w_1(y_1)\big]\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top,$$

where $K_h^{\hat W_k}(y) = K_h(y)\hat W_k$, it follows by adding and subtracting $r_k = w_1(y_{k-1}) + H_2(\bar y_{k-1})$ that

$$\tau_n = \frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W_k}(y_{k-1} - y_1)\big[\tilde r_k - r_k + H_2(\bar y_{k-1})\big]\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top + \frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W_k}(y_{k-1} - y_1)\big[w_1(y_{k-1}) - w_1(y_1) - (y_{k-1} - y_1)\nabla w_1(y_1)\big]\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top.$$

As a result of the boundedness condition in Assumption A2, the Taylor expansion applied to $[G_m(\breve m(x_k)), G_v(\tilde v(x_k))]$ at $[m(x_k), v(x_k)]$ yields the first term of $\tau_n$ as

$$\tilde\tau_n \equiv \frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W_k}(y_{k-1} - y_1)\Big[\tilde u_k \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big],$$

where $\tilde u_k \equiv \tilde r_{k1} + \tilde r_{k2} + H_2(\bar y_{k-1})$,

$$\tilde r_{k1} \equiv \{\nabla G_m(m(x_k))[\breve m(x_k) - m(x_k)],\ \nabla G_v(v(x_k))[\tilde v(x_k) - v(x_k)]\}^\top,$$

$$\tilde r_{k2} \equiv \tfrac{1}{2}\{D^2G_m(m^*(x_k))[\breve m(x_k) - m(x_k)]^2,\ D^2G_v(v^*(x_k))[\tilde v(x_k) - v(x_k)]^2\}^\top,$$

and $m^*(x_k)$ [$v^*(x_k)$] lies between $\breve m(x_k)$ [$\tilde v(x_k)$] and $m(x_k)$ [$v(x_k)$], respectively. In a similar way, the Taylor expansion of $w_1(y_{k-1})$ at $y_1$ gives the second term of $\tau_n$ as

$$s_{0n} = \frac{h^2}{2}\frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W_k}(y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^2\Big[D^2w_1(y_1)\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big]\times(1 + o_p(1)).$$

The term $\tilde\tau_n$ can be simplified by some further approximations. Define the marginal expectations of the estimated density functions $\hat p_2(\cdot)$ and $\hat p(\cdot)$ as follows:

$$\bar p(y_{k-1}, \bar y_{k-2}) \equiv \int\!\!\int L_g(z_1 - y_{k-1})L_g(z_2 - \bar y_{k-2})\,p(z_1, z_2)\,dz_1\,dz_2, \qquad \bar p_2(\bar y_{k-2}) \equiv \int L_g(z_2 - \bar y_{k-2})\,p_2(z_2)\,dz_2.$$

In the first approximation, we replace the estimated instrument $\hat W$ by the ratio of the expectations of the kernel density estimates, $\bar p_2(\bar y_{k-1})/\bar p(x_k)$, and deal with the linear terms in the Taylor expansions. That is, $\tilde\tau_n$ is approximated, with an error of $o_p(1/\sqrt{nh})$, by $t_{1n} + t_{2n}$:

$$t_{1n} \equiv \frac{1}{n}\sum_{k=d+1}^{n}K_h(y_{k-1} - y_1)\frac{\bar p_2(\bar y_{k-1})}{\bar p(x_k)}\Big[\tilde r_{k1}\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big], \qquad t_{2n} \equiv \frac{1}{n}\sum_{k=d+1}^{n}K_h(y_{k-1} - y_1)\frac{\bar p_2(\bar y_{k-1})}{\bar p(x_k)}\Big[H_2(\bar y_{k-1})\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big],$$

based on the following results:

(i) $(1/n)\sum_{k=d+1}^{n}K_h(y_{k-1}-y_1)[\hat p_2(\bar y_{k-1})/\hat p(x_k)]\,[\tilde r_{k2}\otimes(1, (y_{k-1}-y_1)/h)^\top] = o_p(1/\sqrt{nh})$;
(ii) $(1/n)\sum_{k=d+1}^{n}K_h(y_{k-1}-y_1)[\hat p_2(\bar y_{k-1})/\hat p(x_k) - \bar p_2(\bar y_{k-1})/\bar p(x_k)]\,[H_2(\bar y_{k-1})\otimes(1, (y_{k-1}-y_1)/h)^\top] = o_p(1/\sqrt{nh})$;
(iii) $(1/n)\sum_{k=d+1}^{n}K_h(y_{k-1}-y_1)[\hat p_2(\bar y_{k-1})/\hat p(x_k) - \bar p_2(\bar y_{k-1})/\bar p(x_k)]\,[\tilde r_{k1}\otimes(1, (y_{k-1}-y_1)/h)^\top] = o_p(1/\sqrt{nh})$.

To show (i), consider the first two elements of the term, for example, which are bounded elementwise by

$$\max_k|\breve m(x_k) - m(x_k)|^2\,\frac{1}{2}\frac{1}{n}\sum_k K_h(y_{k-1} - y_1)\frac{\hat p_2(\bar y_{k-1})}{\hat p(x_k)}D^2G_m(m(x_k))\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top = o_p(1/\sqrt{nh}).$$

The last equality follows directly from the uniform convergence theorems in Masry (1996), which give

$$\max_t|\breve m(x_t) - m(x_t)| = O_p(\log n/\sqrt{nh^d}) \qquad (A.1)$$

and $(1/n)\sum_k K_h(y_{k-1}-y_1)[\hat p_2(\bar y_{k-1})/\hat p(x_k)]D^2G_m(m(x_k))(1, (y_{k-1}-y_1)/h)^\top = O_p(1)$. The claim (ii) is shown by applying Lemma A.1, which follows. The negligibility of (iii) follows in a similar way from (ii), in view of (A.1). Although the asymptotic properties of $s_{0n}$ and $t_{2n}$ are relatively easy to derive, an additional approximation is necessary to make $t_{1n}$ more tractable. Note that the estimation errors of the local linear fits, $\breve m(x_k)$
$-\,m(x_k)$, are decomposed into

$$\frac{1}{n}\sum_l\frac{K_h(x_l - x_k)}{p(x_l)}\,v^{1/2}(x_l)\,\varepsilon_l \;+\; \text{the remaining bias},$$

from the approximation results for the local linear smoother in Jones, Davies, and Park (1994). A similar expression holds for the volatility estimates, $\tilde v(x_k) - v(x_k)$, with stochastic term $(1/n)\sum_l[K_h(x_l - x_k)/p(x_l)]\,v(x_l)(\varepsilon_l^2 - 1)$. Define

$$J_{k,n}(x_l) \equiv \frac{1}{nh^d}\sum_k K\Big(\frac{y_{k-1} - y_1}{h}\Big)K\Big(\frac{x_l - x_k}{h}\Big)\frac{p_2(\bar y_{k-1})}{p(x_k)\,p(x_l)}\Big[\mathrm{diag}(\nabla G_m, \nabla G_v)\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big]$$

and let $\bar J(x_l)$ denote the marginal expectation of $J_{k,n}$ with respect to $x_k$. Then the stochastic term of $t_{1n}$, after rearranging its double sum, is approximated by

$$\tilde t_{1n} = \frac{1}{nh}\sum_l\bar J(x_l)\big[(v^{1/2}(x_l)\varepsilon_l,\ v(x_l)(\varepsilon_l^2 - 1))^\top\otimes I_2\big],$$

because the approximation error from $\bar J(X_l)$ is negligible, that is,

$$\frac{1}{nh}\sum_l(J_{k,n} - \bar J)\big[(v^{1/2}(X_l)\varepsilon_l,\ v(X_l)(\varepsilon_l^2 - 1))^\top\otimes I_2\big] = o_p(1/\sqrt{nh}),$$

applying the same method as in Lemma A.1. A straightforward calculation gives

$$\bar J(x_l) \simeq \frac{1}{h}\int\!\!\int K\Big(\frac{u_1 - y_1}{h}\Big)K\Big(\frac{u_1 - y_{l-1}}{h}\Big)\frac{1}{h^{d-1}}K\Big(\frac{\bar y_{l-1} - u_2}{h}\Big)\frac{p_2(u_2)}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(u), \nabla G_v(u))\otimes\Big(1, \frac{u_1 - y_1}{h}\Big)^\top\Big]du_2\,du_1$$

$$\simeq \frac{1}{h}\int K\Big(\frac{u_1 - y_1}{h}\Big)K\Big(\frac{u_1 - y_{l-1}}{h}\Big)\frac{p_2(\bar y_{l-1})}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(u_1, \bar y_{l-1}), \nabla G_v(u_1, \bar y_{l-1}))\otimes\Big(1, \frac{u_1 - y_1}{h}\Big)^\top\Big]du_1$$

$$\simeq \frac{p_2(\bar y_{l-1})}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(y_1, \bar y_{l-1}), \nabla G_v(y_1, \bar y_{l-1}))\otimes\Big((K*K)_0\Big(\frac{y_{l-1} - y_1}{h}\Big),\ (K*K)_1\Big(\frac{y_{l-1} - y_1}{h}\Big)\Big)^\top\Big],$$

where

$$(K*K)_i\Big(\frac{y_{l-1} - y_1}{h}\Big) = \int w_1^i\,K(w_1)\,K\Big(w_1 + \frac{y_{l-1} - y_1}{h}\Big)dw.$$

Observe that $(K*K)_i((y_{l-1} - y_1)/h)$ in $\bar J(X_l)$ is actually a convolution kernel and behaves just like a one-dimensional kernel function of $y_{l-1}$. This means that the standard methods (central limit theorem and law of large numbers) for univariate kernel estimates can be applied to derive the asymptotics of

$$\tilde t_{1n} = \frac{1}{nh}\sum_l\frac{p_2(\bar y_{l-1})}{p(x_l)}\left\{\begin{bmatrix}\nabla G_m(y_1, \bar y_{l-1})v^{1/2}(X_l)\varepsilon_l\\ \nabla G_v(y_1, \bar y_{l-1})v(X_l)(\varepsilon_l^2 - 1)\end{bmatrix}\otimes\begin{bmatrix}(K*K)_0\big(\frac{y_{l-1} - y_1}{h}\big)\\ (K*K)_1\big(\frac{y_{l-1} - y_1}{h}\big)\end{bmatrix}\right\}.$$

If we define $s_{1n}$ as the remaining bias term of $t_{1n}$, the estimation errors $\hat w_1(y_1) - w_1(y_1)$ consist of two stochastic terms, $[I_2\otimes e_1^\top Q_n^{-1}](\tilde t_{1n} + \tilde t_{2n})$, and three bias terms, $[I_2\otimes e_1^\top Q_n^{-1}](s_{0n} + s_{1n} + s_{2n})$, where

$$\tilde t_{2n} = \frac{1}{n}\sum_{k=d+1}^{n}K_h(y_{k-1} - y_1)\frac{p_2(\bar y_{k-1})}{p(X_k)}\Big[H_2(\bar y_{k-1})\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big], \qquad s_{2n} = t_{2n} - \tilde t_{2n}.$$

Step II. Computation of Variance and Bias. We start by deriving the order of the main stochastic term, $\tilde t_n^* = \tilde t_{1n} + \tilde t_{2n} = (1/n)\sum_k\xi_k$, where $\xi_k = \xi_{1k} + \xi_{2k}$,

$$\xi_{1k} = \frac{p_2(\bar y_{k-1})}{p(y_{k-1}, \bar y_{k-1})}\left\{\begin{bmatrix}\nabla G_m(y_1, \bar y_{k-1})v^{1/2}(X_k)\varepsilon_k\\ \nabla G_v(y_1, \bar y_{k-1})v(X_k)(\varepsilon_k^2 - 1)\end{bmatrix}\otimes\frac{1}{h}\begin{bmatrix}(K*K)_0\big(\frac{y_{k-1} - y_1}{h}\big)\\ (K*K)_1\big(\frac{y_{k-1} - y_1}{h}\big)\end{bmatrix}\right\},$$

$$\xi_{2k} = \frac{p_2(\bar y_{k-1})}{p(y_{k-1}, \bar y_{k-1})}\left\{\begin{bmatrix}m_2(\bar y_{k-1})\\ v_2(\bar y_{k-1})\end{bmatrix}\otimes\frac{1}{h}\begin{bmatrix}K\big(\frac{y_{k-1} - y_1}{h}\big)\\ K\big(\frac{y_{k-1} - y_1}{h}\big)\frac{y_{k-1} - y_1}{h}\end{bmatrix}\right\},$$

by calculating its asymptotic variance. Dividing the normalized variance of $\tilde t_n^*$ into the sums of variances and covariances gives

$$\mathrm{var}(\sqrt{nh}\,\tilde t_n^*) = \mathrm{var}\Big(\frac{\sqrt h}{\sqrt n}\sum_k\xi_k\Big) = \frac{h}{n}\sum_k\mathrm{var}(\xi_k) + \frac{h}{n}\sum\!\!\sum_{k\neq l}\mathrm{cov}(\xi_k, \xi_l) = h\,\mathrm{var}(\xi_k) + 2\sum_k\Big[\frac{n-k}{n}\Big]h\,\mathrm{cov}(\xi_d, \xi_{d+k}),$$

where the last equality comes from the stationarity assumption. We claim that (a) $h\,\mathrm{var}(\xi_k)\to\Sigma_1(y_1)$; (b) $\sum_k[1 - (k/n)]\,h\,\mathrm{cov}(\xi_d, \xi_{d+k}) = o(1)$; and (c) $nh\,\mathrm{var}(\tilde t_n^*)\to\Sigma_1(y_1)$, where

$$\Sigma_1(y_1) = \left\{\int\frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}\nabla G_m(y_1, z_2)^2v(y_1, z_2) & (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2)\\ (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2) & \nabla G_v(y_1, z_2)^2\kappa_4(y_1, z_2)v^2(y_1, z_2)\end{bmatrix}dz_2\right\}\otimes\begin{bmatrix}\|(K*K)_0\|_2^2 & 0\\ 0 & 0\end{bmatrix}$$

$$\quad + \left\{\int\frac{p_2^2(z_2)}{p(y_1, z_2)}H_2(z_2)H_2^\top(z_2)\,dz_2\right\}\otimes\begin{bmatrix}\|K\|_2^2 & 0\\ 0 & \int K^2(u)u^2\,du\end{bmatrix}.$$

Proof of (a). Noting $E(\xi_{1k}) = E(\xi_{2k}) = 0_{4\times 1}$ and $E(\xi_{1k}\xi_{2k}^\top) = 0_{4\times 4}$,

$$h\,\mathrm{var}(\xi_k) = hE(\xi_{1k}\xi_{1k}^\top) + hE(\xi_{2k}\xi_{2k}^\top)$$
by the stationarity assumption. Applying integration with a change of variables and a Taylor expansion, the expectation terms are

$$hE(\xi_{1k}\xi_{1k}^\top) = \left\{\int\frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}\nabla G_m(y_1, z_2)^2v(y_1, z_2) & (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2)\\ (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2) & \nabla G_v(y_1, z_2)^2\kappa_4(y_1, z_2)v^2(y_1, z_2)\end{bmatrix}dz_2\right\}\otimes\begin{bmatrix}\|(K*K)_0\|_2^2 & 0\\ 0 & 0\end{bmatrix} + o(1)$$

and

$$hE(\xi_{2k}\xi_{2k}^\top) = \left\{\int\frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}m_2^2(z_2) & m_2(z_2)v_2(z_2)\\ m_2(z_2)v_2(z_2) & v_2^2(z_2)\end{bmatrix}dz_2\right\}\otimes\begin{bmatrix}\|K\|_2^2 & 0\\ 0 & \int K^2(u)u^2\,du\end{bmatrix} + o(1),$$

where $\kappa_3(y_1, z_2) = E[\varepsilon_t^3 \mid x_t = (y_1, z_2)]$ and $\kappa_4(y_1, z_2) = E[(\varepsilon_t^2 - 1)^2 \mid x_t = (y_1, z_2)]$.

Proof of (b). Because $E(\xi_{1k}\xi_{1j}^\top)|_{j\neq k} = E(\xi_{1k}\xi_{2j}^\top)|_{j\neq k} = 0$, $\mathrm{cov}(\xi_{d+1}, \xi_{d+1+k}) = \mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})$. By setting $c(n)h \to 0$ as $n \to \infty$, we separate the covariance terms into two parts:

$$\sum_{k=1}^{c(n)}\Big[1 - \frac{k}{n}\Big]h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) + \sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big]h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}).$$

To show the negligibility of the first part of the covariances, note that the dominated convergence theorem, used after a Taylor expansion and integration with a change of variables, gives

$$|\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \simeq \left|\int H_2(\bar y_d)H_2^\top(\bar y_{d+k})\frac{p(y_1, \bar y_d, y_1, \bar y_{d+k})}{p(y_1\mid\bar y_d)\,p(y_1\mid\bar y_{d+k})}\,d(\bar y_d, \bar y_{d+k})\right|\otimes\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}.$$

Therefore, it follows from the boundedness condition in Assumption A2 that

$$|\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le E|H_2(\bar y_d)|\,E|H_2^\top(\bar y_{d+k})|\int\frac{p(y_1, \bar y_d, y_1, \bar y_{d+k})}{p(y_1\mid\bar y_d)\,p(y_1\mid\bar y_{d+k})}\,d(\bar y_d, \bar y_{d+k})\otimes\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} \equiv A^*,$$

where $A \le B$ means $a_{ij} \le b_{ij}$ for all elements of the matrices $A$ and $B$. By the construction of $c(n)$,

$$\sum_{k=1}^{c(n)}\Big[1 - \frac{k}{n}\Big]h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le 2c(n)|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le 2c(n)hA^* \to 0,$$

as $n \to \infty$. Next, we turn to the negligibility of the second part of the covariances, $\sum_{k=c(n)+1}^{n}[1 - k/n]\,h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})$. Let $\xi_{2k}^i$ be the $i$th element of $\xi_{2k}$, for $i = 1, \ldots, 4$. Using Davydov's lemma (in Hall and Heyde, 1980, Theorem A.5), we obtain

$$|h\,\mathrm{cov}(\xi_{2,d+1}^i, \xi_{2,d+1+k}^j)| = |\mathrm{cov}(\sqrt h\,\xi_{2,d+1}^i, \sqrt h\,\xi_{2,d+1+k}^j)| \le 8[\alpha(k)]^{1-2/\nu}\max_{i=1,\ldots,4}\big[E(\sqrt h\,|\xi_{2k}^i|)^\nu\big]^{2/\nu}$$

for some $\nu > 2$. The boundedness of $E(\sqrt h\,|\xi_{2k}^i|)^\nu$, for example, is evident from the direct calculation

$$E(|\sqrt h\,\xi_{2k}^1|^\nu) \simeq \frac{h^{\nu/2}}{h^{\nu-1}}\int\frac{p_2^\nu(z_2)}{p^{\nu-1}(y_1, z_2)}|m_2(z_2)|^\nu\,dz_2\cdot O(1) = O\Big(\frac{1}{h^{\nu/2-1}}\Big).$$

Thus the covariance is bounded by

$$|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le C\Big[\frac{1}{h^{\nu/2-1}}\Big]^{2/\nu}[\alpha(k)]^{1-2/\nu}.$$

This implies

$$\sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big]h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le 2\sum_{k=c(n)+1}^{\infty}|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le C'\frac{1}{h^{1-2/\nu}}\sum_{k=c(n)+1}^{\infty}[\alpha(k)]^{1-2/\nu} \le C'\sum_{k=c(n)+1}^{\infty}k^a[\alpha(k)]^{1-2/\nu},$$

if $a$ is such that $k^a \ge (c(n)+1)^a \ge c(n)^a = 1/h^{1-2/\nu}$, for example, $c(n)^a h^{1-2/\nu} = 1$, which implies $c(n) \to \infty$. If we further restrict $a$ such that $0 < a < 1 - 2/\nu$, then $c(n)^a h^{1-2/\nu} = 1$ implies $[c(n)h]^{1-2/\nu}c(n)^{-\delta} = 1$ for some $\delta > 0$. Thus $c(n)h \to 0$, as required. Therefore,

$$\sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big]h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le C'\sum_{k=c(n)+1}^{\infty}k^a[\alpha(k)]^{1-2/\nu} \to 0,$$

as $n$ goes to $\infty$. The proof of (c) is immediate from (a) and (b).

Next, we consider the asymptotic bias. Using the standard result on kernel-weighted sums of stationary series, we first get

$$s_{0n} \stackrel{p}{\to} \frac{h^2}{2}\big[D^2w_1(y_1)\otimes(\mu_2^K, 0)^\top\big],$$

because

$$\frac{1}{n}\sum_{k=d+1}^{n}K_h^{\hat W}(y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^2\Big[D^2w_1(y_1)\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big] \simeq \int K_h(z_1 - y_1)p_2(z_2)\Big(\frac{z_1 - y_1}{h}\Big)^2\Big[D^2w_1(y_1)\otimes\Big(1, \frac{z_1 - y_1}{h}\Big)^\top\Big]dz \to \big[D^2w_1(y_1)\otimes(\mu_2^K, 0)^\top\big].$$

For the asymptotic bias of $s_{1n}$, we again use the approximation results in Jones et al. (1994). The first component of $s_{1n}$, for example, is

$$\frac{1}{n}\sum_k K_h(y_{k-1} - y_1)\frac{p_2(\bar y_{k-1})}{p(x_k)}\nabla G_m(m(x_k))\cdot\frac{1}{2}\frac{1}{n}\sum_l\frac{K_h(x_l - x_k)}{p(x_l)}\sum_{a=1}^{d}(y_{l-a} - y_{k-a})^2\frac{\partial^2 m(x_k)}{\partial y_{k-a}^2}$$

and converges to

$$\frac{h^2}{2}\int p_2(z_2)\nabla G_m(m(y_1, z_2))\big[\mu_2^{K*K}D^2m_1(y_1) + \mu_2^K D^2m_2(z_2)\big]dz_2,$$

based on the argument for the convolution kernel given previously. A convolution of symmetric kernels is symmetric, so that $\int(K*K)_0(u)u\,du = 0$ and $\int(K*K)_1(u)u^2\,du = \int\!\!\int wK(w)K(w+u)u^2\,dw\,du = 0$. This implies that

$$\tilde s_{1n} \stackrel{p}{\to} \frac{h^2}{2}\int p_2(z_2)\big\{[\nabla G_m(m(y_1, z_2)), \nabla G_v(v(y_1, z_2))]^\top\odot\big[\mu_2^{K*K}D^2w_1(y_1) + \mu_2^K D^2w_2(z_2)\big]\big\}\otimes(1, 0)^\top dz_2,$$

where $\odot$ denotes the elementwise product. To calculate $s_{2n}$, we use the Taylor series expansion of $\bar p_2(\bar y_{k-1})/\bar p(X_k)$:

$$\frac{\bar p_2(\bar y_{k-1})}{\bar p(X_k)} - \frac{p_2(\bar y_{k-1})}{p(X_k)} = \big[\bar p_2(\bar y_{k-1}) - p_2(\bar y_{k-1})\big]\frac{1}{p(X_k)} - \big[\bar p(X_k) - p(X_k)\big]\frac{p_2(\bar y_{k-1})}{p^2(X_k)} + o_p(1)\text{-terms},$$

using $1/\bar p(X_k) = (1/p(X_k))[1 - (\bar p(X_k) - p(X_k))/p(X_k) + \cdots]$. Thus

$$s_{2n} = \frac{1}{n}\sum_{k=d+1}^{n}K_h(y_{k-1} - y_1)\Big[\frac{\bar p_2(\bar y_{k-1})}{\bar p(X_k)} - \frac{p_2(\bar y_{k-1})}{p(X_k)}\Big]\Big[H_2(\bar y_{k-1})\otimes\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top\Big]$$

$$\to \int K_h(z_1 - y_1)\Big[\frac{\bar p_2(z_2) - p_2(z_2)}{p(z)} - \frac{(\bar p(z) - p(z))\,p_2(z_2)}{p^2(z)}\Big]\Big[H_2(z_2)\otimes\Big(1, \frac{z_1 - y_1}{h}\Big)^\top\Big]p(z)\,dz$$

$$\simeq \frac{g^2}{2}\int D^2p_2(z_2)H_2(z_2)\,dz_2\otimes(\mu_2^K, 0)^\top - \frac{g^2}{2}\int\frac{p_2(z_2)}{p(y_1, z_2)}D^2p(y_1, z_2)H_2(z_2)\,dz_2\otimes(\mu_2^K, 0)^\top.$$

Finally, for the probability limit of $[I_2\otimes e_1^\top Q_n^{-1}]$, we note that

$$Q_n = D_h^{-1}Y_2^\top K Y_2 D_h^{-1} = [\hat q_{n,i+j-2}(y_1; h)]_{(i,j)=1,2}, \qquad \hat q_{ni} = \frac{1}{n}\sum_{k=d}^{n}K_h^{\hat W}(Y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^i,$$

and, for $i = 0, 1, 2$,

$$\hat q_{ni} \stackrel{p}{\to} \int K_h(z_1 - y_1)\Big(\frac{z_1 - y_1}{h}\Big)^i p_2(z_2)\,dz = \int K(u_1)u_1^i\,du_1\int p_2(z_2)\,dz_2 = \int K(u_1)u_1^i\,du_1 \equiv q_i,$$

where $q_0 = 1$, $q_1 = 0$, and $q_2 = \mu_2^K$. Thus

$$Q_n \stackrel{p}{\to}\begin{bmatrix}1 & 0\\ 0 & \mu_2^K\end{bmatrix}, \qquad Q_n^{-1}\stackrel{p}{\to}\frac{1}{\mu_2^K}\begin{bmatrix}\mu_2^K & 0\\ 0 & 1\end{bmatrix}, \qquad e_1^\top Q_n^{-1}\stackrel{p}{\to}e_1^\top.$$

Therefore,

$$B_{1n}(y_1) = [I_2\otimes e_1^\top Q_n^{-1}](s_{0n} + s_{1n} + s_{2n}) = \frac{h^2}{2}\mu_2^K D^2w_1(y_1) + \frac{h^2}{2}\int\big[\mu_2^{K*K}D^2w_1(y_1) + \mu_2^K D^2w_2(z_2)\big]\odot[\nabla G_m(m(y_1, z_2)), \nabla G_v(v(y_1, z_2))]^\top p_2(z_2)\,dz_2$$

$$\quad + \frac{g^2}{2}\mu_2^K\int D^2p_2(z_2)H_2(z_2)\,dz_2 - \frac{g^2}{2}\mu_2^K\int\frac{p_2(z_2)}{p(y_1, z_2)}D^2p(y_1, z_2)H_2(z_2)\,dz_2 + o_p(h^2) + o_p(g^2).$$

Step III. Asymptotic Normality of $\tilde t_n^*$. Applying the Cramér-Wold device, it is sufficient to show that

$$\Delta_n \equiv \frac{1}{\sqrt n}\sum_k\sqrt h\,\bar\xi_k \stackrel{d}{\to} N(0, b^\top\Sigma_1 b), \qquad\text{for all } b\in\mathbb R^4,$$

where $\bar\xi_k = b^\top\xi_k$. We use the small-block-large-block argument (see Masry and Tjøstheim, 1997). Partition the set $\{d, d+1, \ldots, n\}$ into $2k+1$ subsets with large blocks of size $r = r_n$ and small blocks of size $s = s_n$, where $k = [n/(r_n + s_n)]$ and $[x]$ denotes the integer part of $x$. Define

$$\eta_j = \sum_{t=j(r+s)}^{j(r+s)+r-1}\sqrt h\,\bar\xi_t, \qquad \nu_j = \sum_{t=j(r+s)+r}^{(j+1)(r+s)-1}\sqrt h\,\bar\xi_t, \quad 0 \le j \le k-1, \qquad \zeta_k = \sum_{t=k(r+s)}^{n}\sqrt h\,\bar\xi_t.$$

Then

$$\Delta_n = \frac{1}{\sqrt n}\Big(\sum_{j=0}^{k-1}\eta_j + \sum_{j=0}^{k-1}\nu_j + \zeta_k\Big) \equiv \frac{1}{\sqrt n}(S_n' + S_n'' + S_n''').$$

Because of Assumption A6, there exists a sequence $a_n \to \infty$ such that

$$a_n s_n = o(\sqrt{nh}) \quad\text{and}\quad a_n\sqrt{n/h}\,\alpha(s_n) \to 0, \quad\text{as } n \to \infty, \qquad (A.2)$$

and we define the large block size as

$$r_n = \Big[\frac{\sqrt{nh}}{a_n}\Big]. \qquad (A.3)$$

It is easy to show by (A.2) and (A.3) that, as $n \to \infty$,

$$\frac{s_n}{r_n} \to 0, \qquad \frac{r_n}{n} \to 0, \qquad \frac{r_n}{\sqrt{nh}} \to 0, \qquad (A.4)$$

and $\frac{n}{r_n}\alpha(s_n)$
r 0+ We first show that Sn'' and Sn''' are asymptotically negligible+ The same argument used in step II yields s21 ( var~vj ! 5 s 3 var~M h jD t ! 1 2s k51 S D 12 k s cov~M h jD d11 ,M h jD d111k ! (A.5) 5 sb Á S 1 b~1 1 o~1!!, which implies k21 nsn ( var~vj ! 5 O~ks! ; rn 1 sn ; j50 nsn rn 5 o~n!, from the condition ~A+4!+ Next, consider k21 k21 iÞj iÞj s s ( cov~vi , vj ! 5 i,( ( ( cov~M h jD N 1k ,M h jD N 1k !, i, j50, j50, k 51 k 51 i 1 1 j 2 2 where Nj 5 j~r 1 s! 1 r+ Because 6Ni 2 Nj 1 k 1 2 k 2 6 $ r, for i Þ j, the covariance term is bounded by n2r 2 n ( ( k 151 k 25k 11r 6cov~M h jD k1 ,M h jD k2 !6 n # 2n ( 6cov~M h jD d11,M h jD d111j !6 5 o~n!+ j5r11 The last equality also follows from step II+ Hence, ~10n!E $~Sn'' ! 2 % r 0, as n r `+ Repeating a similar argument for Sn''' , we get LIVE METHOD FOR VOLATILITY 1 n E $~Sn''' ! 2 % # 1 n @n 2 k~r 1 s!# var~M h jD d11 ! 12 # 1127 n 2 k~r 1 s! n2k~r1s! n j51 rn 1 sn n r 0, ( cov~M h jD d11 ,M h jD d111j ! b Á S 1 b 1 o~1! as n r `+ D k21 hj & N~0, b Á S 1 b!+ Now, it remains to show ~10M n !Sn' 5 ~10M n ! ( j50 j~r1s!1r21 j~r1s!1r21 Because hj is a function of $ jD t % t5j~r1s!11 that is Fj~r1s!112d -measurable, the Volkonskii and Rozanov’s lemma ~1959! in the appendix of Masry and Tjøstheim ~1997! implies that, with sI n 5 sn 2 d 1 1, * F SM E exp it 1 n k21 ( j50 hj DG ) E~exp~ithj !!* k21 2 # 16ka~ sI n 2 d 1 1! . j50 n rn 1 sn a~ sI n ! . n rn a~ sI n ! . o~1!, where the last two equalities follow from ~A+4!+ Thus, the summands $hj % in Sn' are asymptotically independent, because an operation similar to ~A+5! yields var~hj ! 5 rn b Á S 1 b~1 1 o~1!! and hence var SM D 1 n Sn' 5 1 n k21 ( E~hj2 ! 5 j50 k n rn n b Á S 1 b~1 1 o~1!! r b Á S * b+ Finally, because of the boundedness of density and kernel functions, the Lindeberg– Feller condition for the asymptotic normality of Sn' holds: 1 k21 n j50 ( E @hj2 I $6hj 6 . M n dM b Á S 1 b%# r 0 for every d . 
This completes the proof of Step III. From $e_1^\top Q_n^{-1}\overset{p}{\longrightarrow}e_1^\top$, the Slutsky theorem implies $\sqrt{nh}\,[I_2\otimes e_1^\top Q_n^{-1}]\tilde t_n^*\overset{d}{\longrightarrow}N(0,\Sigma_1^*)$, where $\Sigma_1^*=[I_2\otimes e_1^\top]\Sigma_1[I_2\otimes e_1^\top]^\top$. In sum, $\sqrt{nh}(\hat w_1(y_1)-w_1(y_1)-B_n)\overset{d}{\longrightarrow}N(0,\Sigma_1^*)$, with $\Sigma_1^*(y_1)$ given by
$$\|K*K\|_2^2\int\frac{p_2^2(z_2)}{p(y_1,z_2)}\begin{bmatrix}\nabla G_m(y_1,z_2)^2\,v(y_1,z_2)&(\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1,z_2)\\(\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1,z_2)&\nabla G_v(y_1,z_2)^2\,\kappa_4(y_1,z_2)\,v^2(y_1,z_2)\end{bmatrix}dz_2+\|K\|_2^2\int\frac{p_2^2(z_2)}{p(y_1,z_2)}\,H_2(z_2)H_2^\top(z_2)\,dz_2.$$

LEMMA A.1. Assume the conditions in Assumptions A1 and A4–A6. For a bounded function $F(\cdot)$, it holds that

(a) $r_{1n}=(\sqrt h/\sqrt n)\sum_{k=d}^n K_h(y_{k-1}-y_1)\big(\hat p_2(\tilde y_{k-2})-\bar p_2(\tilde y_{k-2})\big)F(x_k)=o_p(1)$,
(b) $r_{2n}=(\sqrt h/\sqrt n)\sum_{k=d}^n K_h(y_{k-1}-y_1)\big(\hat p(x_k)-p(x_k)\big)F(x_k)=o_p(1)$.

Proof. The proof of (b) is almost the same as that of (a); therefore we only show (a). By adding and subtracting $\bar L_{l|k}(y_{l-2}|y_{k-2})$, the conditional expectation of $L_g(\tilde y_{l-2}-\tilde y_{k-2})$ given $\tilde y_{k-2}$, in $r_{1n}$, we get $r_{1n}=\xi_{1n}+\xi_{2n}$, where
$$\xi_{1n}=\frac1{n^2}\sum_{k=d}^n\sum_{l=d}^n K_h(y_{k-1}-y_1)F(x_k)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(y_{l-2}|y_{k-2})\big],$$
$$\xi_{2n}=\frac1{n^2}\sum_k\sum_l K_h(y_{k-1}-y_1)F(x_k)\big[\bar L_{l|k}(y_{l-2}|y_{k-2})-\bar p_2(\tilde y_{k-2})\big].$$
Rewrite $\xi_{2n}$ as
$$\frac1{n^2}\sum_k\sum_{s<k^*(n)}K_h(y_{k-1}-y_1)F(x_k)\big[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2})-\bar p_2(\tilde y_{k-2})\big]+\frac1{n^2}\sum_k\sum_{s\ge k^*(n)}K_h(y_{k-1}-y_1)F(x_k)\big[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2})-\bar p_2(\tilde y_{k-2})\big],$$
where $k^*(n)$ is increasing to infinity as $n\to\infty$. Let $B=E\{K_h(y_{k-1}-y_1)F(x_k)[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2})-\bar p_2(\tilde y_{k-2})]\}$, which exists as a result of the boundedness of $F(x_k)$. Then, for large $n$, the first part of $\xi_{2n}$ is asymptotically equivalent to $(1/n)k^*(n)B$. The second part of $\xi_{2n}$ is bounded by
$$\sup_{s\ge k^*(n)}\big|p_{k+s|k}(y_{k+s-2}|y_{k-2})-p(y_{k-2})\big|\,\frac1n\sum_kK_h(y_{k-1}-y_1)|F(x_k)|\le\rho^{k^*(n)}O_p(1).$$
Therefore,
$$\sqrt{nh}\,\xi_{2n}\le O_p\big((\sqrt h/\sqrt n)\,k^*(n)\big)+O_p\big(\rho^{k^*(n)}\sqrt{nh}\big)=o_p(1),$$
for $k^*(n)=\log n$, for example. It remains to show that $\xi_{1n}=o_p(1/\sqrt{nh})$. Because $E(\xi_{1n})=0$ by the law of iterated expectations, we just compute
$$E(\xi_{1n}^2)=\frac1{n^4}\sum_{k\ne l}\sum_{i\ne j}E\big\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_l)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(\tilde y_{k-2})\big]\big[L_g(\tilde y_{j-2}-\tilde y_{i-2})-\bar L_{j|i}(\tilde y_{i-2})\big]\big\}.$$

(1) Consider the case $k=i$ and $l\ne j$:
$$\frac1{n^4}\sum_k\sum_{l\ne j}E\big\{K_h^2(y_{k-1}-y)F^2(x_k)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(\tilde y_{k-2})\big]\big[L_g(\tilde y_{j-2}-\tilde y_{k-2})-\bar L_{j|k}(\tilde y_{k-2})\big]\big\}=0,$$
because, by the law of iterated expectations and the definition of $\bar L_{j|k}(\tilde y_{k-2})$,
$$E_{|k,l}\big[L_g(\tilde y_{j-2}-\tilde y_{k-2})-\bar L_{j|k}(\tilde y_{k-2})\big]=E_{|k}\big[L_g(\tilde y_{j-2}-\tilde y_{k-2})\big]-\bar L_{j|k}(\tilde y_{k-2})=0.$$

(2) Consider the case $l=j$ and $k\ne i$:
$$\frac1{n^4}\sum_{k\ne i}\sum_lE\big\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_i)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(\tilde y_{k-2})\big]\big[L_g(\tilde y_{l-2}-\tilde y_{i-2})-\bar L_{l|i}(\tilde y_{i-2})\big]\big\}.$$
We only calculate
$$\frac1{n^4}\sum_{k\ne i}\sum_lE\big\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)L_g(\tilde y_{l-2}-\tilde y_{k-2})L_g(\tilde y_{l-2}-\tilde y_{i-2})F(x_k)F(x_i)\big\}\tag{A.6}$$
because the rest of the triple sum consists of expectations of standard kernel estimates and is $O(1/n)$. Note that
$$E_{|(i,k)}\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})L_g(\tilde y_{l-2}-\tilde y_{i-2})\big]\simeq(L*L)_g(\tilde y_{k-2}-\tilde y_{i-2})\,p_{l|(k,i)}(\tilde y_{k-2}|\tilde y_{k-2},\tilde y_{i-2}),$$
where $(L*L)_g(\cdot)=(1/g)\int L(u)L(u+\cdot/g)\,du$ is a convolution kernel. Thus, (A.6) is
$$\frac1{n^4}\sum_{k\ne i}\sum_lE\big[K_h(y_{k-1}-y)K_h(y_{i-1}-y)(L*L)_g(\tilde y_{k-2}-\tilde y_{i-2})F(x_k)F(x_i)\,p_{l|(k,i)}(\tilde y_{k-2}|\tilde y_{k-2},\tilde y_{i-2})\big]=O\Big(\frac1n\Big).$$

(3) Consider the case $i=k$, $j=l$:
$$\frac1{n^4}\sum_{k\ne l}E\big\{K_h^2(y_{k-1}-y)F^2(x_k)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(\tilde y_{k-2})\big]^2\big\}=O\Big(\frac1{n^2hg}\Big)=o\Big(\frac1{nh}\Big).$$

(4) Consider the case $k\ne i$, $l\ne j$:
$$\frac1{n^4}\sum_{k\ne l}\sum_{i\ne j}E\big\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_i)\big[L_g(\tilde y_{l-2}-\tilde y_{k-2})-\bar L_{l|k}(\tilde y_{k-2})\big]\big[L_g(\tilde y_{j-2}-\tilde y_{i-2})-\bar L_{j|i}(\tilde y_{i-2})\big]\big\}=0,$$
for the same reason as in (1). ∎
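Case (2) relies on the convolution kernel $(L*L)_g(x)=(1/g)\int L(u)L(u+x/g)\,du$ inheriting the properties of a kernel. A minimal numerical sketch (assuming an Epanechnikov $L$; the helper names are ours, not the paper's) checks that $(L*L)_g$ integrates to one and vanishes outside $[-2g,2g]$:

```python
def L(u):
    # Epanechnikov kernel: a common second-order kernel supported on [-1, 1]
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def LL_g(x, g, m=400):
    # (L*L)_g(x) = (1/g) * integral of L(u) L(u + x/g) du,
    # approximated by a midpoint rule on the support [-1, 1]
    du = 2.0 / m
    return du / g * sum(
        L(-1.0 + (i + 0.5) * du) * L(-1.0 + (i + 0.5) * du + x / g)
        for i in range(m)
    )
```

Since $\int (L*L)_g(x)\,dx = \int L(u)\,du\int L(w)\,dw = 1$, the convolution kernel behaves exactly like a kernel with bandwidth of order $g$, which is what makes (A.6) a standard kernel expectation of order $O(1/n)$.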
A.2. Proofs for Section 5. Recall that $x_t=(y_{t-1},\ldots,y_{t-d})=(y_{t-\alpha},\tilde y_{t-\alpha})$ and $z_t=(x_t,y_t)$. In a similar way, let $x=(y_1,\ldots,y_d)=(y_\alpha,\tilde y_\alpha)$ and $z=(x,y_0)$. For the score function $s^*(z,\theta,g_\alpha)=s^*(z,\theta,g_\alpha(\tilde y_\alpha))$, we define its first derivative with respect to the parameter $\theta$ by
$$\nabla_\theta s^*(z,\theta,g_\alpha)=\frac{\partial s^*(z,\theta,g_\alpha)}{\partial\theta}$$
and use $s^*(\theta,g_\alpha)$ and $\nabla_\theta s^*(\theta,g_\alpha)$ to denote $E[s^*(z_t,\theta,g_\alpha)]$ and $E[\nabla_\theta s^*(z_t,\theta,g_\alpha)]$, respectively. Also, the score function $s^*(z,\theta,\cdot)$ is said to be Fréchet differentiable (with respect to the sup norm $\|\cdot\|_\infty$) if there is $S^*(z,\theta,g_\alpha)$ such that, for all $g_\alpha$ with $\|g_\alpha-g_\alpha^0\|_\infty$ small enough,
$$\|s^*(z,\theta,g_\alpha)-s^*(z,\theta,g_\alpha^0)-S^*(z,\theta,g_\alpha^0(\tilde y_\alpha))(g_\alpha-g_\alpha^0)\|\le b(z)\,\|g_\alpha-g_\alpha^0\|^2,\tag{A.7}$$
for some bounded function $b(\cdot)$. The term $S^*(z,\theta,g_\alpha^0)$ is called the functional derivative of $s^*(z,\theta,g_\alpha)$ with respect to $g_\alpha$. In a similar way, we define $\nabla_gS^*(z,\theta,g_\alpha)$ to be the functional derivative of $S^*(z,\theta,g_\alpha)$ with respect to $g_\alpha$.

Assumption B. Suppose that (i) $\nabla_\theta s^*(\theta_0)$ is nonsingular; (ii) $S^*(z,\theta,g_\alpha(\tilde y_\alpha))$ and $\nabla_gS^*(z,\theta,g_\alpha(\tilde y_\alpha))$ exist and have square integrable envelopes $\bar S^*(\cdot)$ and $\overline{\nabla_gS}^*(\cdot)$, satisfying $\|S^*(z,\theta,g_\alpha(\tilde y_\alpha))\|\le\bar S^*(z)$ and $\|\nabla_gS^*(z,\theta,g_\alpha(\tilde y_\alpha))\|\le\overline{\nabla_gS}^*(z)$; and (iii) both $s^*(z,\theta,g_\alpha)$ and $S^*(z,\theta,g_\alpha)$ are continuously differentiable in $\theta$, with derivatives bounded by square integrable envelopes.

Note that the first condition is related to the identification condition for the component functions, whereas the second concerns Fréchet differentiability (up to the second order) of the score function and uniform boundedness of the functional derivatives. For the main results in Section 5, we need the following conditions; some of the assumptions are stronger than their counterparts in Assumption A in Section 4. Let $h_0$ and $h$ denote the bandwidth parameters used for the preliminary instrumental variable and the two-step estimates, respectively, and let $g$ denote the bandwidth parameter for the kernel density.

Assumption C.
1. $\{y_t\}_{t=1}^\infty$ is stationary and strongly mixing with a mixing coefficient $\alpha(k)=\rho^{-\beta k}$ for some $\beta>0$, and $E(\varepsilon_t^4|x_t)<\infty$, where $\varepsilon_t=y_t-E(y_t|x_t)$.
2. The joint density function $p(\cdot)$ is bounded away from zero and $q$-times continuously differentiable on the compact support $\mathcal X=\mathcal X_\alpha\times\mathcal X_{\bar\alpha}$, with Lipschitz continuous remainders; that is, there exists $C<\infty$ such that, for all $x,x'\in\mathcal X$, $|D_x^\mu p(x)-D_x^\mu p(x')|\le C\|x-x'\|$, for all vectors $\mu=(\mu_1,\ldots,\mu_d)$ with $\sum_{i=1}^d\mu_i\le q$.
3. The component functions $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$, for $\alpha=1,\ldots,d$, are $q$-times continuously differentiable on $\mathcal X_\alpha$ with Lipschitz continuous $q$th derivatives.
4. The link functions $G_m$ and $G_v$ are $q$-times continuously differentiable over any compact interval of the real line.
5. The kernel functions $K(\cdot)$ and $L(\cdot)$ are of bounded support, symmetric about zero, satisfy $\int K(u)\,du=\int L(u)\,du=1$, and are of order $q$; that is, $\int u^iK(u)\,du=\int u^iL(u)\,du=0$ for $i=1,\ldots,q-1$. Also, the kernel functions are $q$-times differentiable with Lipschitz continuous $q$th derivatives.
6. The true parameters $\theta_0=(m_\alpha(y_\alpha),v_\alpha(y_\alpha),m_\alpha'(y_\alpha),v_\alpha'(y_\alpha))$ lie in the interior of the compact parameter space $\Theta$.
7. (i) $g\to0$ and $ng^d\to\infty$; and (ii) $h_0\to0$ and $nh_0\to\infty$.
8. (i) $nh_0^2h/(\log n)^2\to\infty$ and $\sqrt{nh}\,h_0^q\to0$; for some integer $v>d/2$, (ii) $nh_0^{2v+1}h/\log n\to\infty$ and $h_0^{q-v}h^{-v-1/2}\to0$; and (iii) $nh_0^{d+4v+1}/\log n\to\infty$ and $q\ge2v+1$.
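Assumption C.5 asks the kernels to be of order $q$. For the leading case $q=2$, a short numerical check (Epanechnikov kernel as an example choice; the helper names are ours) verifies the moment conditions $\int K=1$, $\int uK=0$, and $\mu_2^K=\int u^2K=1/5$ that are used repeatedly in the expansions above and below:

```python
def kernel_moment(K, i, m=20000):
    # integral of u^i K(u) du over the support [-1, 1], by the midpoint rule
    du = 2.0 / m
    return sum(((-1.0 + (j + 0.5) * du) ** i) * K(-1.0 + (j + 0.5) * du)
               for j in range(m)) * du

def epanechnikov(u):
    # a symmetric, bounded-support, second-order (q = 2) kernel
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
```

For the Epanechnikov kernel, $\mu_2^K = 0.75\,(2/3 - 2/5) = 1/5$, which is the constant entering the bias term $B_{1n}$ and the limit matrix $Q$.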
Some facts about empirical processes will be useful in the discussion that follows. Define the $L_2$-Sobolev norm (of order $q$) on the class of real-valued functions with domain $W_0$:
$$\|\tau\|_{q,2,W_0}=\Big(\sum_{|\mu|\le q}\int_{W_0}(D_x^\mu\tau(x))^2\,dx\Big)^{1/2},$$
where, for $x\in W_0\subset\mathbb R^k$ and a $k$-vector $\mu=(\mu_1,\ldots,\mu_k)$ of nonnegative integers,
$$D_x^\mu\tau(x)=\frac{\partial^{\sum_{i=1}^k\mu_i}\tau(x)}{\partial^{\mu_1}x_1\cdots\partial^{\mu_k}x_k}$$
and $q\ge1$ is some positive integer. Let $\mathcal X_\alpha$ be an open set in $\mathbb R^1$ with minimally smooth boundary as defined by, for example, Stein (1970), and let $\mathcal X=\times_{\beta=1}^d\mathcal X_\beta$, with $\mathcal X_{\bar\alpha}=\times_{\beta=1,\ne\alpha}^d\mathcal X_\beta$. Define $\mathcal T_1$ as a class of smooth functions on $\mathcal X_{\bar\alpha}$ whose $L_2$-Sobolev norm is bounded by some constant, $\mathcal T_1=\{\tau:\|\tau\|_{q,2,\mathcal X_{\bar\alpha}}\le C\}$. In a similar way, $\mathcal T_2=\{\tau:\|\tau\|_{q,2,\mathcal X}\le C\}$. Define (i) an empirical process $\nu_{1n}(\cdot)$, indexed by $\tau_1\in\mathcal T_1$:
$$\nu_{1n}(\tau_1)=\frac1{\sqrt n}\sum_{t=1}^n[f_1(x_t;\tau_1)-Ef_1(x_t;\tau_1)],\tag{A.8}$$
with pseudometric $\rho_1(\cdot,\cdot)$ on $\mathcal T_1$:
$$\rho_1(\tau,\tau')=\Big[\int_{\mathcal X}\big(f_1(w;\tau(\tilde w_\alpha))-f_1(w;\tau'(\tilde w_\alpha))\big)^2p(w)\,dw\Big]^{1/2},$$
where $f_1(w;\tau_1)=h^{-1/2}K((w_\alpha-y_\alpha)/h)\,\bar s^*(w,g_\alpha^0)\,\tau_1(\tilde w_\alpha)$; and (ii) an empirical process $\nu_{2n}(\cdot,\cdot)$, indexed by $(y_\alpha,\tau_2)\in\mathcal X_\alpha\times\mathcal T_2$:
$$\nu_{2n}(y_\alpha,\tau_2)=\frac1{\sqrt n}\sum_{t=1}^n[f_2(x_t;y_\alpha,\tau_2)-Ef_2(x_t;y_\alpha,\tau_2)],\tag{A.9}$$
with pseudometric $\rho_2(\cdot,\cdot)$ on $\mathcal T_2$:
$$\rho_2((y_\alpha,\tau_2),(y_\alpha',\tau_2'))=\Big[\int_{\mathcal X}\big(f_2(w;y_\alpha,\tau_2)-f_2(w;y_\alpha',\tau_2')\big)^2p(w)\,dw\Big]^{1/2},$$
where $f_2(w;y_\alpha,\tau_2)=h_0^{-1/2}K[(w_\alpha-y_\alpha)/h_0]\,[p_{\bar\alpha}(\tilde w_\alpha)/p(w)]\,G_m'(m(w))\,\tau_2(w)$. We say that the processes $\{\nu_{1n}(\cdot)\}$ and $\{\nu_{2n}(\cdot,\cdot)\}$ are stochastically equicontinuous at $\tau_1^0$ and $(y_\alpha^0,\tau_2^0)$, respectively (with respect to the pseudometrics $\rho_1(\cdot,\cdot)$ and $\rho_2(\cdot,\cdot)$, respectively), if $\forall\,\varepsilon,\eta>0$, $\exists\,\delta>0$ such that
$$\lim_{T\to\infty}P^*\Big[\sup_{\rho_1(\tau,\tau^0)<\delta}|\nu_{1n}(\tau_1)-\nu_{1n}(\tau_1^0)|>\eta\Big]<\varepsilon\tag{A.10}$$
and
$$\lim_{T\to\infty}P^*\Big[\sup_{\rho_2((y_\alpha,\tau_2),(y_\alpha^0,\tau_2^0))<\delta}|\nu_{2n}(y_\alpha,\tau_2)-\nu_{2n}(y_\alpha^0,\tau_2^0)|>\eta\Big]<\varepsilon,\tag{A.11}$$
respectively, where $P^*$ denotes the outer measure of the corresponding probability measure.

Let $\mathcal F_1$ be the class of functions such as $f_1(\cdot)$ defined previously. Note that (A.10) follows if Pollard's entropy condition is satisfied by $\mathcal F_1$ with some square integrable envelope $\bar F_1$; see Pollard (1990) for more details. Because $f_1(w;\tau_1)=c_1(w)\tau_1(\tilde w_\alpha)$ is the product of smooth functions $\tau_1$ from an infinite-dimensional class (with uniformly bounded partial derivatives up to order $q$) and a single unbounded function $c_1(w)=h^{-1/2}K((w_\alpha-y_\alpha)/h)\bar s^*(w,g_\alpha^0)$, the entropy condition is verified by Theorem 2 in Andrews (1994) on a class of functions of type III. Square integrability of the envelope $\bar F_1$ comes from Assumption B(ii). In a similar way, we can show (A.11) by applying the "mix and match" argument of Theorem 3 in Andrews (1994) to $f_2(w;y_\alpha,\tau_2)=c_2(w)h_0^{-1/2}K((w_\alpha-y_\alpha)/h_0)\tau_2(w)$, where $K(\cdot)$ is Lipschitz continuous in $y_\alpha$, that is, a function of type II.

Proof of Theorem 4. We only give a sketch, because the whole proof is lengthy and relies on arguments similar to Andrews (1994) or Gozalo and Linton (2000) for the i.i.d. case. Expanding the first-order condition in (5.16) and solving for $(\hat\theta^*-\theta_0)$ yields
$$\hat\theta^*-\theta_0=-\Big[\frac1n\sum_{t=d+1}^{n'}\nabla_\theta s(z_t,\bar\theta,\tilde g_\alpha)\Big]^{-1}\frac1n\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha),$$
where $\bar\theta$ is the mean value between $\hat\theta$ and $\theta_0$ and $s(z_t,\tilde g_\alpha)=s(z_t,\theta_0,\tilde g_\alpha)$. By the uniform law of large numbers in Gozalo and Linton (1995), we have $\sup_{\theta\in\Theta}|Q_n(\theta)-E(Q_n(\theta))|\overset{p}{\longrightarrow}0$, which, together with (i) uniform convergence of $\tilde g_\alpha$ by Lemma A.3 and (ii) uniform continuity of the localized likelihood function $Q_n(\theta,g_\alpha)$ over $\Theta\times\mathcal G_\alpha$, yields $\sup_{\theta\in\Theta}|\tilde Q_n(\theta)-E(Q_n(\theta))|\overset{p}{\longrightarrow}0$.
Consistency of $\hat\theta^*$ then follows. Based on the ergodic theorem for the stationary time series and an argument similar to Theorem 1 in Andrews (1994), consistency of $\hat\theta^*$ and uniform convergence of $\tilde g_\alpha$ imply
$$\frac1n\sum_{t=d+1}^{n'}\nabla_\theta s(z_t,\bar\theta,\tilde g_\alpha)\overset{p}{\longrightarrow}E[\nabla_\theta s(z_t,\theta_0,g_\alpha^0)]\equiv D_\alpha(y_\alpha).\tag{A.12}$$
For the numerator, we first linearize the score function. Under Assumption B(ii), $s^*(z,\theta,g_\alpha)$ is Fréchet differentiable and (A.7) holds, which, because $\sqrt{nh}\,\|\tilde g_\alpha-g_\alpha^0\|_\infty^2\overset{p}{\longrightarrow}0$ (by Lemma A.3 and Assumption C.8(i)), yields a proper linearization of the score term:
$$\frac1n\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha)=\frac1n\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,s^*(z_t,g_\alpha^0)+\frac1n\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))\,[\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})]+o_p(1/\sqrt{nh}),$$
where $S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))=S^*(z_t,\theta_0,g_\alpha^0(\tilde y_{t-\alpha}))$. Equivalently, letting $\bar s^*(y,g_\alpha^0(\tilde y_\alpha))=E[S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))\,|\,x_t=y]$ and $u_t=S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))-\bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))$, we have
$$\sqrt{nh}\,\frac1n\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha)=\frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,s^*(z_t,g_\alpha^0)+\frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,\bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))\,[\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})]$$
$$\quad+\frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,u_t\,[\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})]+o_p(1)\equiv T_{1n}+T_{2n}+T_{3n}+o_p(1).$$
Note that the asymptotic expansion of the infeasible estimator is equivalent to the first term of the linearized score function premultiplied by the inverse Hessian matrix in (A.12). Because of the asymptotic boundedness of (A.12), it suffices to show the negligibility of the second and third terms. To calculate the asymptotic order of $T_{2n}$, we make use of the preceding stochastic equicontinuity results. For a real-valued function $\delta(\cdot)$ on $\mathcal X_{\bar\alpha}$ and $\mathcal T=\{\delta:\|\delta\|_{v,2,\mathcal X_{\bar\alpha}}\le C\}$, we define an empirical process
$$\nu_n(y_\alpha,\delta)=\frac1{\sqrt n}\sum_{t=d+1}^{n'}[f(x_t;y_\alpha,\delta)-E(f(x_t;y_\alpha,\delta))],$$
where $f(x_t;y_\alpha,\delta)=K((y_{t-\alpha}-y_\alpha)/h)\,h^v\,\bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))\,\delta(\tilde y_{t-\alpha})$, for some integer $v>d/2$. Let $\hat\delta=h^{-v-1/2}[\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})]$. From the uniform convergence rate in Lemma A.3 and the bandwidth condition C.8(ii), it follows that
$$\|\hat\delta\|_{v,2,\mathcal X_{\bar\alpha}}=O_p\Big(h^{-v-1/2}\Big[\sqrt{\frac{\log n}{nh_0^{2v+1}}}+h_0^{q-v}\Big]\Big)=o_p(1).$$
Because $\hat\delta$ is bounded uniformly over $\mathcal X_{\bar\alpha}$ with probability approaching one, it holds that $\Pr(\hat\delta\in\mathcal T)\to1$. Also, because, for some positive constant $C<\infty$, $\rho^2((y_\alpha,\hat\delta),(y_\alpha,0))\le Ch^{-(2v+1)}\|\tilde g_\alpha-g_\alpha^0\|_{v,2,\mathcal X_{\bar\alpha}}^2=o_p(1)$, we have $\rho((y_\alpha,\hat\delta),(y_\alpha,0))\overset{p}{\longrightarrow}0$. Hence, following Andrews (1994, p. 2257), the stochastic equicontinuity condition of $\nu_n(y_\alpha,\cdot)$ at $\delta^0=0$ implies that $|\nu_n(y_\alpha,\hat\delta)-\nu_n(y_\alpha,\delta^0)|=|\nu_n(y_\alpha,\hat\delta)|=o_p(1)$; that is, $T_{2n}$ is approximated (with an $o_p(1)$ error) by
$$T_{2n}^*=\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,[\tilde g_\alpha(\tilde y_\alpha)-g_\alpha^0(\tilde y_\alpha)]\,p(x)\,dx.$$
We proceed to show negligibility of $T_{2n}^*$. From the integrability condition on $S^*(z,g_\alpha^0(\tilde y_\alpha))$, it follows, by change of variables and the dominated convergence theorem, that
$$\int K_h(y_\alpha-y_\alpha^0)\,S^*(z,g_\alpha^0(\tilde y_\alpha))\,dF_0(z)=\int S^*[(y,y_\alpha^0,\tilde y_\alpha),g_\alpha^0(\tilde y_\alpha)]\,p(y,y_\alpha^0,\tilde y_\alpha)\,d(y,\tilde y_\alpha)<\infty,$$
which, together with $\sqrt n$-consistency of $\hat c=(\hat c_m,\hat c_v)^\top$, means that $(\hat c-c)\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)S^*(z,g_\alpha^0(\tilde y_\alpha))\,dF_0(z)=o_p(1)$. Because
$$\tilde g_\alpha(\tilde y_\alpha)-g_\alpha^0(\tilde y_\alpha)=\sum_{\beta=1,\ne\alpha}^d\big(\hat w_\beta(y_\beta)-w_\beta^0(y_\beta)\big)-(d-2)(\hat c-c),$$
this yields
$$T_{2n}^*=\sum_{\beta=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,\big(\hat w_\beta(y_\beta)-w_\beta^0(y_\beta)\big)\,p(x)\,dx+o_p(1).$$
From Lemma A.3,
$$\hat w_\beta(y_\beta)-w_\beta(y_\beta)=h_0^q\,\bar b_\beta(y_\beta)+\frac1n\sum_t(K*K)_{h_0}(y_{t-\beta}-y_\beta)\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\,\big(\nabla G(x_\beta,\tilde y_{t-\beta})\odot\xi_t\big)+\frac1n\sum_tK_{h_0}(y_{t-\beta}-y_\beta)\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\,g_\alpha^{*0}(\tilde y_{t-\beta})+O_p(r_n^2)+o_p(n^{-1/2}),$$
where $\xi_t=(\varepsilon_t,(\varepsilon_t^2-1))^\top$, $\nabla G(x_t)=[\nabla G_m(y_\beta,\tilde y_{t-\beta})\,v(x_t)^{1/2},\nabla G_v(y_\beta,\tilde y_{t-\beta})\,v(x_t)]^\top$, and $g_\alpha^{*0}(\tilde y_{t-\beta})=g_\alpha^0(\tilde y_{t-\beta})-c_0$.
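The "proper linearization" step above rests on the Fréchet differentiability bound (A.7): after subtracting the functional-derivative term, the remainder is quadratic in the perturbation. A scalar toy version (with exp standing in for a smooth score on a bounded range; all names are illustrative, not the paper's) exhibits the quadratic remainder bound numerically:

```python
import math

def score_toy(g):
    # toy "score" s*(g) = exp(g): smooth, with second derivative bounded by e on [-1, 1]
    return math.exp(g)

def linearization_remainder(g, g0):
    # |s*(g) - s*(g0) - s*'(g0)(g - g0)|, the analogue of the left side of (A.7)
    return abs(score_toy(g) - score_toy(g0) - math.exp(g0) * (g - g0))
```

By Taylor's theorem the remainder is at most $(e/2)\,|g-g_0|^2$ on $[-1,1]$, which is why a rate like $\sqrt{nh}\,\|\tilde g_\alpha-g_\alpha^0\|_\infty^2\to_p 0$ suffices to push the linearization error below the estimation noise.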
Under condition C.8(i), $\sqrt{nh}\,h_0^q=o(1)$, and integrability of the bias function $\bar b_\beta(y_\beta)$ and of $S^*(z,\theta_0,g_\alpha^0(\tilde y_\alpha))$ imply
$$T_{2n}^*=S_{1n}+S_{2n}+o_p(1),$$
where
$$S_{1n}=\sum_{\beta=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,\frac1n\sum_t(K*K)_{h_0}(y_{t-\beta}-y_\beta)\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\,\big(\nabla G(x_\beta,\tilde y_{t-\beta})\odot\xi_t\big)\,p(x)\,dx$$
and
$$S_{2n}=\sum_{\beta=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,\frac1n\sum_tK_{h_0}(y_{t-\beta}-y_\beta)\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\,g_\alpha^{*0}(\tilde y_{t-\beta})\,p(x)\,dx.$$
Let $S_{1n}^i$ and $S_{2n}^i$ be the $i$th elements of $S_{1n}$ and $S_{2n}$, respectively, with $\bar s_{ij}^*(\cdot)$ being the $(i,j)$ element of $\bar s^*(\cdot)$. By the dominated convergence theorem and the integrability condition, we have
$$S_{1n}^i=\frac{\sqrt h}{\sqrt n}\sum_tv(x_t)^{1/2}\varepsilon_t\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\Big[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i1}^*(x,g_\alpha^0)\sum_{\beta=1,\ne\alpha}^d(K*K)_{h_0}(y_\beta-y_{t-\beta})\,\nabla G_m(y_\beta,\tilde y_{t-\beta})\,p(x)\,dx\Big]$$
$$\quad+\frac{\sqrt h}{\sqrt n}\sum_tv(x_t)(\varepsilon_t^2-1)\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\Big[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i2}^*(x,g_\alpha^0)\sum_{\beta=1,\ne\alpha}^d(K*K)_{h_0}(y_\beta-y_{t-\beta})\,\nabla G_v(y_\beta,\tilde y_{t-\beta})\,p(x)\,dx\Big]$$
$$=\frac{\sqrt h}{\sqrt n}\sum_t\big[v(x_t)^{1/2}\tilde A_1^{i1}(x_t)\,\varepsilon_t+v(x_t)\,\tilde A_1^{i2}(x_t)\,(\varepsilon_t^2-1)\big]+o_p(1),$$
where
$$\tilde A_1^{ij}(x_t)=\nabla G_j(y_\alpha^0,\tilde y_{t-\alpha})\sum_{\beta=1,\ne\alpha}^d\int\bar s_{ij}^*[(y_\alpha^0,y_{t-\beta},\tilde y_{(\alpha,\beta)}),g_\alpha^0]\,p(y_\alpha^0,y_{t-\beta},\tilde y_{(\alpha,\beta)})\,d\tilde y_{(\alpha,\beta)}$$
and $\nabla G_j(\cdot)=\nabla G_m(\cdot)$ for $j=1$ and $\nabla G_v(\cdot)$ for $j=2$. Because $p_2(\cdot)/p(\cdot)$ and $\tilde A_1^{ij}(\cdot)$ are bounded under the condition of compact support, applying the law of large numbers for the i.i.d. errors $\xi_t=(\varepsilon_t,(\varepsilon_t^2-1))^\top$ leads to $S_{1n}^i=o_p(1)$ and consequently $S_{1n}=o_p(1)$. Likewise,
$$S_{2n}^i=\frac{\sqrt h}{\sqrt n}\sum_tg_{1\alpha}^{*0}(\tilde y_{t-\beta})\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\Big[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i1}^*(x,g_\alpha^0)\sum_{\beta=1,\ne\alpha}^dK_{h_0}(y_{t-\beta}-y_\beta)\,p(x)\,dx\Big]$$
$$\quad+\frac{\sqrt h}{\sqrt n}\sum_tg_{2\alpha}^{*0}(\tilde y_{t-\beta})\,\frac{p_2(\tilde y_{t-\beta})}{p(x_t)}\Big[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i2}^*(x,g_\alpha^0)\sum_{\beta=1,\ne\alpha}^dK_{h_0}(y_{t-\beta}-y_\beta)\,p(x)\,dx\Big]$$
$$=\frac{\sqrt h}{\sqrt n}\sum_t\big[\tilde A_2^{i1}(x_t)\,m_{\alpha t}(\tilde y_{t-\alpha})+\tilde A_2^{i2}(x_t)\,v_{\alpha t}(\tilde y_{t-\alpha})\big]+o_p(1),$$
where
$$\tilde A_2^{ij}(x_t)=\sum_{\beta=1,\ne\alpha}^d\int\bar s_{ij}^*[(y_\alpha^0,y_{t-\beta},\tilde y_{(\alpha,\beta)}),g_\alpha^0]\,p(y_\alpha^0,y_{t-\beta},\tilde y_{(\alpha,\beta)})\,d\tilde y_{(\alpha,\beta)},$$
and, for the same reason as before, we get $S_{2n}^i=o_p(1)$ and $S_{2n}=o_p(1)$, because $E(m_{\alpha t}(\tilde y_{t-\alpha}))=E(v_{\alpha t}(\tilde y_{t-\alpha}))=0$. We finally show negligibility of the last term,
$$T_{3n}=\frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,u_t\,[\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})].$$
Substituting the error decomposition for $\tilde g_\alpha(\tilde y_{t-\alpha})-g_\alpha^0(\tilde y_{t-\alpha})$ and interchanging the summations gives
$$T_{3n}=\sum_{\beta=1,\ne\alpha}^d\frac{\sqrt h}{n\sqrt n}\sum_t\sum_{s\ne t}K_h(y_{t-\alpha}-y_\alpha^0)\,K_{h_0}(y_{s-\beta}-y_\beta)\,\frac{p_2(\tilde y_{s-\beta})}{p(x_s)}\,u_t\,g_\alpha^{*0}(\tilde y_{s-\beta})$$
$$\quad+\sum_{\beta=1,\ne\alpha}^d\frac{\sqrt h}{n\sqrt n}\sum_t\sum_{s\ne t}K_h(y_{t-\alpha}-y_\alpha^0)\,(K*K)_{h_0}(y_{s-\beta}-y_\beta)\,\frac{p_2(\tilde y_{s-\beta})}{p(x_s)}\,\big(\nabla G(x_\beta,\tilde y_{s-\beta})\odot u_t\xi_s\big)+o_p(1),$$
where the $o_p(1)$ errors for the remaining bias terms hold under the assumption that $\sqrt{nh}\,h_0^q=o(1)$. For
$$p_n^{i,\beta}(z_t,z_s)=\frac{\sqrt h}{n\sqrt n}\,K_h(y_{t-\alpha}-y_\alpha^0)\,K_{h_0}(y_{s-\beta}-y_\beta)\,\frac{p_2(\tilde y_{s-\beta})}{p(x_s)}\,u_t\,g_\alpha^{*0}(\tilde y_{s-\beta}),$$
we can easily check that $E(p_n^{i,\beta}(z_t,z_s)|z_t)=E(p_n^{i,\beta}(z_t,z_s)|z_s)=0$, for $t\ne s$, implying that $\sum\sum_{t\ne s}p_n^{i,\beta}(z_t,z_s)$ is a degenerate second-order U-statistic. The same conclusion also holds for the second term. Hence, the two double sums are mean zero and have variance of the same order as
$$n^2\times\big\{E\,p_n^{i,\beta}(z_t,z_s)^2+E\,p_n^{i,\beta}(z_t,z_s)\,p_n^{i,\beta}(z_s,z_t)\big\},$$
which is of order $n^{-1}h^{-1}$. Therefore, $T_{3n}=o_p(1)$.

LEMMA A.2 (Masry, 1996). Suppose that Assumption C holds. Then, for any vector $\mu=(\mu_1,\ldots,\mu_d)^\top$ with $|\mu|=\sum_j\mu_j\le v$,

(a) $\sup_{x\in\mathcal X}|D_x^\mu\hat p(x)-D_x^\mu p(x)|=O_p\big(\sqrt{\log n/\,ng^{2|\mu|+d}}\big)+O_p(g^{q-|\mu|})$,
(b) $\sup_{x\in\mathcal X}|D_x^\mu\check m(x)-D_x^\mu m(x)|=O_p\big(\sqrt{\log n/\,nh_0^{2|\mu|+d}}\big)+O_p(h_0^{q-|\mu|})\equiv r_n(\mu)$,
(c) $\sup_{x\in\mathcal X}|\check m(x)-m(x)-\tilde L(x)|=O_p(r_n^2)$,

where
$$\tilde L(x)=\frac1n\sum_{s\ne t}\frac{K_{h_0}(x_s-x)}{p(x_s)}\,v^{1/2}(x_s)\,\varepsilon_s+h_0^q\,b_n(x).$$

LEMMA A.3. Suppose that Assumption C holds. Then, for any vector $\mu=(\mu_1,\ldots,\mu_d)^\top$ with $|\mu|=\sum_j\mu_j\le v$,

(a) $\sup_{y_\alpha\in\mathcal X_\alpha}|D^\mu\hat w_\alpha(y_\alpha)-D^\mu w_\alpha(y_\alpha)|=O_p\big(\sqrt{\log n/\,nh^{2|\mu|+1}}\big)+O_p(h^{q-|\mu|})+O_p(r_n^2(\mu))$,
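The negligibility of $T_{3n}$ comes from recognizing the double sums as degenerate second-order U-statistics: both conditional means vanish, so the sum is mean zero with variance of order $n^2\,E\,p_n^2$. A toy Monte Carlo (hypothetical kernel $p(a,b)=(a-\tfrac12)(b-\tfrac12)$ with uniform draws — not the paper's $p_n^{i,\beta}$) illustrates the mean-zero property and the variance order $2n(n-1)\,\mathrm{Var}(a-\tfrac12)^2$:

```python
import random

def degenerate_u_stat(n, seed=0):
    # toy degenerate kernel p(a, b) = (a - 1/2)(b - 1/2) for a, b ~ U(0, 1):
    # E[p(a, b) | a] = (a - 1/2) E(b - 1/2) = 0, so the double sum over t != s
    # is a degenerate second-order U-statistic with mean zero and
    # variance 2 n (n - 1) Var(a - 1/2)^2 = 2 n (n - 1) / 144.
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    return sum((z[t] - 0.5) * (z[s] - 0.5)
               for t in range(n) for s in range(n) if t != s)
```

The $n^2$ pairs only contribute variance at the rate of $2n(n-1)$ single-pair variances, which is the mechanism that makes $T_{3n}$ of order $o_p(1)$ after the $\sqrt h/(n\sqrt n)$ normalization.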
(b) $\sup_{y_\alpha\in\mathcal X_\alpha}|\hat w_\alpha(y_\alpha)-w_\alpha(y_\alpha)-\hat L_w(y_\alpha)|=O_p(r_n^2)+o_p(n^{-1/2})$,

where
$$\hat L_w(y_\alpha)=\frac1n\sum_t(K*K)_h(y_{t-\alpha}-y_\alpha)\,\frac{p_{\bar\alpha}(\tilde y_{t-\alpha})}{p(x_t)}\begin{bmatrix}G_m'(m(y_\alpha,\tilde y_{t-\alpha}))\,v(x_t)^{1/2}&0\\0&G_v'(v(y_\alpha,\tilde y_{t-\alpha}))\,v(x_t)\end{bmatrix}\begin{bmatrix}\varepsilon_t\\\varepsilon_t^2-1\end{bmatrix}+\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,\frac{p_2(\tilde y_{t-\alpha})}{p(x_t)}\,[m_{\alpha t}(\tilde y_{t-\alpha}),v_{\alpha t}(\tilde y_{t-\alpha})]^\top+h^q\,\bar b_\alpha(y_\alpha).$$

Proof. We first show (b). For notational simplicity, the bandwidth parameter $h$ (only in this proof) abbreviates $h_0$. From the decomposition results for the instrumental variable estimates,
$$\hat w_\alpha(y_\alpha)-w_\alpha(y_\alpha)=[I_2\otimes e_1^\top Q_n^{-1}]\,\tau_n,$$
where $Q_n=[\hat q_{n,i+j-2}(y_\alpha)]_{(i,j)=1,2}$, with
$$\hat q_{ni}=\frac1n\sum_{t=d}K_h(y_{t-\alpha}-y_\alpha)\,\frac{\hat p_{\bar\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)}\Big(\frac{y_{t-\alpha}-y_\alpha}{h}\Big)^i,\qquad i=0,1,2,$$
and
$$\tau_n=\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,\frac{\hat p_{\bar\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)}\,[\tilde r_t-w_\alpha(y_\alpha)-(y_{t-\alpha}-y_\alpha)\nabla w_\alpha(y_\alpha)]\otimes\Big(1,\frac{y_{t-\alpha}-y_\alpha}{h}\Big)^\top.$$
By the Cauchy–Schwarz inequality and Lemma A.2 applied with a Taylor expansion, it holds that
$$\sup_{y_\alpha\in\mathcal X_\alpha}\Big|\frac1n\sum_{t=d}K_h(y_{t-\alpha}-y_\alpha)\Big(\frac{\hat p_{\bar\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)}-\frac{p_{\bar\alpha}(\tilde y_{t-\alpha})}{p(x_t)}\Big)\Big(\frac{y_{t-\alpha}-y_\alpha}{h}\Big)^i\Big|\le\sup_{x\in\mathcal X}\Big|\frac{\hat p_{\bar\alpha}(\tilde y_\alpha)}{\hat p(x)}-\frac{p_{\bar\alpha}(\tilde y_\alpha)}{p(x)}\Big|\,\sup_{y_\alpha\in\mathcal X_\alpha}\frac1n\sum_{t=d}K_h(y_{t-\alpha}-y_\alpha)\Big|\frac{y_{t-\alpha}-y_\alpha}{h}\Big|^i=O_p\Big(\sup_{x\in\mathcal X}|\hat p(x)-p(x)|\Big)\equiv O_p(r_n),$$
where the boundedness condition of C.2 is used for the last line. Hence, the standard argument of Masry (1996) implies that $\sup_{y_\alpha\in\mathcal X_\alpha}|\hat q_{ni}-q_i|=o_p(1)$, where $q_i=\int K(u_1)u_1^i\,du_1$. From $q_0=1$, $q_1=0$, and $q_2=\mu_2^K$, we get the following uniform convergence result for the denominator term: $e_1^\top Q_n^{-1}\overset{p}{\longrightarrow}e_1^\top$, uniformly in $y_\alpha\in\mathcal X_\alpha$. For the numerator, we show the uniform convergence rate of the first element of $\tau_n$, because the other terms can be treated in the same way. Let $\tau_{n1}$ denote the first element of $\tau_n$, that is,
$$\tau_{n1}=\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,\frac{\hat p_{\bar\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)}\,[G_m(\check m(x_t))-M_\alpha(y_\alpha)-(y_{t-\alpha}-y_\alpha)m_\alpha'(y_\alpha)],$$
or, alternatively, $\tau_{n1}=(1/n)\sum_tK_h(y_{t-\alpha}-y_\alpha)\,r(x_t;\hat\gamma)$, where
$$r(x_t;\gamma)=\frac{\gamma_2(\tilde y_{t-\alpha})}{\gamma_3(x_t)}\,[G_m(\gamma_1(x_t))-M_\alpha(y_\alpha)-(y_{t-\alpha}-y_\alpha)m_\alpha'(y_\alpha)],$$
$$\gamma(x_t)=[\gamma_1(x_t),\gamma_2(\tilde y_{t-\alpha}),\gamma_3(x_t)]=[m(x_t),p_{\bar\alpha}(\tilde y_{t-\alpha}),p(x_t)],\qquad\hat\gamma=\hat\gamma(x_t)=[\check m(x_t),\hat p_{\bar\alpha}(\tilde y_{t-\alpha}),\hat p(x_t)].$$
Because $p_{\bar\alpha}(\cdot)/p(\cdot)$ is bounded away from zero and $G_m$ has a bounded second-order derivative, the functional $r(x_t;\gamma)$ is Fréchet differentiable in $\gamma$, with respect to the sup norm $\|\cdot\|_\infty$, with the (bounded) functional derivative $R(x_t;\gamma)=[\partial r(x_t;\gamma)/\partial\gamma]|_{\gamma=\gamma(x_t)}$. This implies that, for all $\gamma$ with $\|\gamma-\gamma^0\|_\infty$ small enough, there exists some bounded function $b(\cdot)$ such that
$$\|r(x_t;\gamma)-r(x_t;\gamma^0)-R(x_t;\gamma^0)(\gamma-\gamma^0)\|\le b(x_t)\,\|\gamma-\gamma^0\|_\infty^2.$$
By Lemma A.2, $\|\hat\gamma-\gamma^0\|_\infty^2=O_p(r_n^2)$, and consequently we can properly linearize $\tau_{n1}$ as
$$\tau_{n1}=\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,r(x_t;\gamma^0)+\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,R(x_t;\gamma^0)(\hat\gamma-\gamma^0)+O_p(r_n^2),$$
where the $O_p(r_n^2)$ error term is uniform in $y_\alpha$. After plugging $G_m(m(x_t))=c_m+\sum_{1\le\beta\le d}m_\beta(y_{t-\beta})$ into $r(x_t;\gamma^0)$, a straightforward calculation shows that
$$\tau_{n1}=\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,\varsigma_t\,[1+O_p(r_n)]+\frac1n\sum_tK_h(y_{t-\alpha}-y_\alpha)\,\frac{p_{\bar\alpha}(\tilde y_{t-\alpha})}{p(x_t)}\,G_m'(m(x_t))\,[\check m(x_t)-m(x_t)]+\frac{h^q}{q!}\,\mu_q(K)\,b_{1\alpha}(y_\alpha)+o_p(h^q),\tag{A.13}$$
where $\varsigma_t=[p_2(\tilde y_{t-\alpha})/p(x_t)]\,M_{\bar\alpha}(\tilde y_{t-\alpha})$ and $M_{\bar\alpha}(\tilde y_{t-\alpha})=\sum_{1\le\beta\le d,\,\beta\ne\alpha}m_\beta(y_{t-\beta})$. Note that, as a result of the identification condition, $E[\varsigma_t|y_{t-\alpha}]=0$, and consequently the first term is a standard stochastic term appearing in kernel estimates. For a further asymptotic expansion of the second term of $\tau_{n1}$, we use the stochastic equicontinuity argument for the empirical process $\{\nu_n(\cdot,\cdot)\}$, indexed by $(y_\alpha,\delta)\in\mathcal X_\alpha\times\mathcal T$, with $\mathcal T=\{\delta:\|\delta\|_{v,2,\mathcal X}\le C\}$, such that
$$\nu_n(y_\alpha,\delta)=\frac1{\sqrt n}\sum_{t=d+1}^{n'}[f(x_t;y_\alpha,\delta)-E(f(x_t;y_\alpha,\delta))],$$
where $f(x_t;y_\alpha,\delta)=K[(y_{t-\alpha}-y_\alpha)/h]\,h^v\,[p_{\bar\alpha}(\tilde y_{t-\alpha})/p(x_t)]\,G_m'(m(x_t))\,\delta(x_t)$, for some positive integer $v>d/2$. Let $\hat\delta=h^{-v-1/2}[\check m(x_t)-m(x_t)]$. From the uniform convergence rate in Lemma A.2 and the bandwidth condition in C.8(iii), it follows that $\|\hat\delta\|_{v,2,\mathcal X}=O_p\big(h^{-v-1/2}[\sqrt{\log n/\,nh^{2v+d}}+h^{q-v}]\big)=o_p(1)$, leading to (i) $\Pr(\hat\delta\in\mathcal T)\to1$ and (ii) $\rho((y_\alpha,\hat\delta),(y_\alpha,\delta^0))\overset{p}{\longrightarrow}0$, where $\delta^0=0$. These conditions and stochastic equicontinuity of $\nu_n(\cdot,\cdot)$ at $(y_\alpha,\delta^0)$ yield $\sup_{y_\alpha\in\mathcal X_\alpha}|\nu_n(y_\alpha,\hat\delta)-\nu_n(y_\alpha,\delta^0)|=\sup_{y_\alpha\in\mathcal X_\alpha}|\nu_n(y_\alpha,\hat\delta)|=o_p(1)$. Thus, the second term of $\tau_{n1}$ is approximated, with an $o_p(1/\sqrt n)$ error (uniform in $y_\alpha$), by
$$\int K_h(y_{t-\alpha}-y_\alpha)\,\frac{p_{\bar\alpha}(\tilde y_{t-\alpha})}{p(x_t)}\,G_m'(m(x_t))\,[\check m(x_t)-m(x_t)]\,p(x_t)\,dx_t,$$
which, by substituting $\tilde L(x_t)$ for $\check m(x_t)-m(x_t)$, is given by
$$\frac1n\sum_s(K*K)_h(y_{s-\alpha}-y_\alpha)\,\frac{p_{\bar\alpha}(\tilde y_{s-\alpha})\,G_m'(m(y_\alpha,\tilde y_{s-\alpha}))}{p(x_s)}\,v^{1/2}(x_s)\,\varepsilon_s+\frac{h^q}{q!}\,\mu_q(K)\,b_{2\alpha}(y_\alpha),\tag{A.14}$$
where $(K*K)(\cdot)$ is the convolution kernel defined before. Hence, by letting $\bar b_\alpha(y_\alpha)$ summarize the two bias terms appearing in (A.13) and (A.14), Lemma A.3(b) is shown. The uniform convergence results in part (a) then follow by the standard arguments of Masry (1996), because the two stochastic terms in the asymptotic expansion of $\hat w_\alpha(y_\alpha)-w_\alpha(y_\alpha)$ consist only of univariate kernels. ∎
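The proof of Lemma A.3(b) leans on the convergence of the local linear moments $\hat q_{ni}$ to $q_0=1$, $q_1=0$, $q_2=\mu_2^K$. A Monte Carlo sketch in a one-dimensional toy design (uniform covariates, so the density-ratio weight is constant; helper names are ours, not the paper's) illustrates those limits at an interior point:

```python
import random

def local_linear_moments(y0, h, n, seed=1):
    # q_hat_i = (1/n) * sum_t K_h(y_t - y0) ((y_t - y0)/h)^i with y_t ~ U(0, 1);
    # for interior y0 and small h these converge to q_0 = 1, q_1 = 0,
    # q_2 = mu_2^K = 1/5 for the Epanechnikov kernel.
    rng = random.Random(seed)
    K = lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
    q = [0.0, 0.0, 0.0]
    for _ in range(n):
        u = (rng.random() - y0) / h
        w = K(u) / h
        for i in range(3):
            q[i] += w * u ** i / n
    return q
```

The limit matrix $Q = \mathrm{diag}(1,\mu_2^K)$ is what makes $e_1^\top Q_n^{-1}\to e_1^\top$, so the intercept of the local linear fit is asymptotically unaffected by the slope component.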