ASYMPTOTIC NORMALITY OF CONDITIONAL INTEGRALS OF DIFFUSION PROCESSES

MONTSERRAT FUENTES (1)

North Carolina State University

NCSU Mimeo Series #2520, November 3, 1999

Consider predicting the integral of a diffusion process Z over a bounded interval A, based on the observations Z(t_{1n}), ..., Z(t_{nn}), where t_{1n}, ..., t_{nn} is a dense triangular array of points in the bounded interval (the step of discretization tends to zero as n increases). The best linear predictor is generally not asymptotically optimal. Instead, we predict \int_A Z(t)\,dt using the conditional expectation of the integral of the diffusion process given the observed values of the process, the optimal predictor in terms of minimizing the mean squared error. We obtain that, conditioning on the observed values, the mean squared prediction error converges to zero in probability at the rate O_p(n^{-2}). We prove that the standardized conditional prediction error is approximately Gaussian with mean zero and unit variance, even though the underlying diffusion is generally non-Gaussian. Because the optimal predictor is hard to calculate exactly for most diffusions, we present an easily computed approximation that is asymptotically optimal. This approximation is a function of the diffusion coefficient.

(1) M. Fuentes is an assistant professor in the Department of Statistics, North Carolina State University, Raleigh, NC 27695. Email address: [email protected].

AMS 1991 subject classifications: Primary 62M20; secondary 62M40, 41A25.

Key words and phrases: Diffusion process, fixed-domain asymptotics, infill asymptotics, numerical integration.

1. Introduction. Consider predicting \int_A Z(t)\,dt for a diffusion process Z based on observations Z_{(n)} = (Z(t_{0n}), ..., Z(t_{nn})), where A is a bounded interval [a, b] and a = t_{0n} < ... < t_{nn} = b, n = 1, 2, .... We assume the step of discretization tends to zero as n \to \infty: \lim_{n \to \infty} \max_{2 \le i \le n} (t_{in} - t_{i-1,n}) = 0. Let \hat Z_n(t) = E[Z(t) \mid Z_{(n)}] and C_n(t_1, t_2) = E[Z(t_1) Z(t_2) \mid Z_{(n)}] - \hat Z_n(t_1)\,\hat Z_n(t_2) be the conditional mean and covariance functions, respectively, of Z. For Z Gaussian, \hat Z_n(t) is a linear function of Z_{(n)} and C_n(t_1, t_2) is nonrandom. However, for diffusions, \hat Z_n(t) is generally not linear in Z_{(n)} and C_n(t_1, t_2) is random. Numerical integration of deterministic functions by linear rules is studied, for example, by Davis and Rabinowitz (1984), Ghizzetti and Ossicini (1970) and Chakravarti (1970). Numerical integration of random functions by linear functions of Z_{(n)} is done by Marnevskaya and Jarovich (1984), Stein (1993, 1995), Pitt, Robeva and Wang (1995), and Ritter (1995). Stein (1987) presents a nonlinear approximation to the integral of a transformation of a Brownian motion process; this is a special case of the central idea in this paper. Considering the close relationship between predicting integrals and estimating regression coefficients from stochastic processes described by Sacks and Ylvisaker (1966), the work by Sacks and Ylvisaker (1966, 1968, 1970), Eubank, Smith and Smith (1981) and Wahba (1971, 1974) on designs for estimating regression coefficients is also relevant for the prediction problem. Matheron (1985) developed a technique for approximating the unconditional distribution of \int_A Z(t)\,dt when the process is non-Gaussian, but we are interested in making inferences about \int_A Z(t)\,dt given Z_{(n)}, so the conditional distributions are relevant.
The conditional expectation of the integral of the diffusion process is the optimal predictor (in the sense of minimizing the mean squared error) of \int_A Z(t)\,dt given Z_{(n)}. Since this predictor is hard to calculate exactly for most diffusions, we present an approximation that still yields better results than the best linear predictor (BLP) and is asymptotically optimal as n \to \infty.

The primary purpose of this paper is to show that, under certain conditions, the standardized conditional prediction error

\left\{ \int_A Z(t)\,dt - \int_A \hat Z_n(t)\,dt \right\} \Big/ \left[ \mathrm{var}\left\{ \int_A Z(t)\,dt - \int_A \hat Z_n(t)\,dt \,\Big|\, Z_{(n)} \right\} \right]^{1/2}

is approximately N(0,1), even if the underlying process is non-Gaussian. We also obtain an easily computed approximation for the general nonlinear predictor, \int_A \hat Z_n(t)\,dt, that is a function of the diffusion coefficient.

First, in Section 2, we show that for large n the conditional distribution of the prediction error is approximately Gaussian. Next, in Section 3, we show that the BLP is generally not asymptotically optimal and give an easily computed asymptotically optimal predictor. Then, in Section 4, we give an approximation for the conditional standard error of the prediction. Finally, in Section 5, we present a simulation study to assess the accuracy of this asymptotic result in finite samples, and in Section 6 we give some final remarks.

2. Asymptotic Normality of the Prediction Error.

We predict the integral of a diffusion over the interval A = [0,1]; it is a straightforward calculation to generalize the results in this paper to any other bounded interval in R. Thus, for a homogeneous diffusion process Z on [0,1], consider the prediction of \int_0^1 Z(t)\,dt based on observing Z(t) at 0 = t_{0n} < t_{1n} < \cdots < t_{nn} = 1. Specifically, let Z_{(n)} = (Z(t_{0n}), \ldots, Z(t_{nn})) and predict the integral by the optimal predictor \int_0^1 \hat Z_n(t)\,dt. We will often write t_i to denote t_{in} for i = 0, \ldots, n to simplify the notation. In this section we show that the standardized error in predicting the integral of a diffusion process with the optimal predictor (in the mean squared error sense), given the observed values of the process, is asymptotically N(0,1).

We will make the following assumptions about the diffusion process Z(\cdot) throughout this paper:

(A.1) E\{ Z(t+s) - Z(t) \mid Z(t) = x \} = s\,\mu(x) + o(s),

(A.2) E\{ (Z(t+s) - Z(t))^2 \mid Z(t) = x \} = s\,\sigma^2(x) + o(s),

(A.3) E\{ |Z(t+s) - Z(t)|^3 \mid Z(t) = x \} = s^{1+1/2}\, k(x) + o(s^{1+1/2}),

where the remainder terms in (A.1)-(A.3) are uniform in x. Note that conditions (A.1)-(A.3) require the existence of finite conditional moments of orders 1, 2 and 3. In most practical examples of diffusions, the indicated moments exist. It would be preferable to have assumption (A.3) in terms of the diffusion parameters \mu(x) and \sigma^2(x). The parameter \mu(x) is frequently called the drift coefficient, and \sigma^2(x) the diffusion coefficient. The following assumption gives explicit hypotheses on the diffusion parameters and implies (A.3):

(A.4) The diffusion parameters \mu and \sigma^2 are bounded, i.e. \mu^2(x) + \sigma^2(x) \le P (see Karatzas and Shreve, 1991, p. 367), and the diffusion coefficient \sigma^2 is a strictly positive and uniformly bounded function with a continuous and uniformly bounded derivative.

We now prove that (A.4) implies (A.3). For a diffusion process B(t) with constant diffusion coefficient \sigma^2 we have

(2.1) E\{ |B(t+s) - B(t)|^3 \mid B(t) = x \} = O(s^{3/2}), uniformly in x.

We apply the transformation function g to the diffusion process Z, where g(x) = \int_0^x \sigma(y)^{-1}\,dy. Then g(Z(t)) = B(t) has constant diffusion coefficient and satisfies (2.1).
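To see why the transformed process has constant diffusion coefficient, a short Itô computation can be sketched here (this step is added as a reading aid and is not spelled out in the original): since g'(x) = \sigma(x)^{-1} and g''(x) = -\sigma'(x)/\sigma^2(x), Itô's formula gives

dg(Z_t) = g'(Z_t)\,dZ_t + \tfrac{1}{2}\, g''(Z_t)\,\sigma^2(Z_t)\,dt = \left\{ \frac{\mu(Z_t)}{\sigma(Z_t)} - \frac{\sigma'(Z_t)}{2} \right\} dt + dW_t,

so B(t) = g(Z(t)) has unit diffusion coefficient, and under (A.4) its drift is uniformly bounded.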
By (A.4) we can obtain the following Taylor expansion for Z(t+s) = g^{-1}(B(t+s)) centered at t, where g^{-1} is the inverse function of g:

Z(t+s) - Z(t) = \{ g^{-1}(B(t)) \}' \, [B(t+s) - B(t)] + \tfrac{1}{2} \{ g^{-1}(B(t+\epsilon)) \}'' \, [B(t+s) - B(t)]^2,

where \epsilon \in (0, s). Because B(t) satisfies (2.1), and the first and second derivatives of g^{-1} are uniformly bounded (by the definition of g and (A.4)),

\{ g^{-1}(z) \}' = \sigma(z), \qquad \{ g^{-1}(z) \}'' = \sigma(z)\,\sigma'(z) = \{ \sigma^2(z)/2 \}',

it follows that Z(t) satisfies (A.3).

We will sometimes require the following set of conditions for the dense triangular array of points in the bounded interval:

(B.1) t_{in} = F^{-1}(i/n), with t_{0n} = 0 and t_{nn} = 1,

(B.2) F is a continuous strictly monotone cdf on [0,1] with derivative f,

(B.3) f is a continuous function.

A sequence that satisfies (B.1)-(B.3) is called a regular sequence by Sacks and Ylvisaker (1966).

We define \pi_i(x, t) to be the conditional density that, from the state value x at time t, the sample path of Z(\cdot) satisfies Z(t_{i+1}) = z_{i+1} at time t_{i+1}, and make the following assumption:

(C.1) \pi_i(x, t) has two continuous derivatives with respect to x and is a differentiable function of t.

It would be preferable to have the previous assumption, (C.1), in terms of the diffusion coefficients. The following assumption, (C.2), has explicit hypotheses on \mu and \sigma and implies (C.1):

(C.2) \sigma(x) and \mu(x) are bounded and uniformly Hölder-continuous functions (see Karatzas and Shreve, 1991, p. 368).

In the following discussion O and o have the usual interpretation in terms of order-of-magnitude statements, while O_p and o_p mean that O and o hold, respectively, with a probability that can be chosen arbitrarily close to one. We now give the following definitions, which will be needed in the proof of the main theorem of this section.

(i) Equivalent measures. Let Z be a diffusion process with time parameter t \in T = [0,1] and values Z(t) = Z(\omega, t), \omega \in \Omega, on a probability space (\Omega, \mathcal{W}, P). We assume that the \sigma-algebra \mathcal{W} is generated by Z(t) = Z(\omega, t) as the parameter t runs through the set T. Let P_1 be another measure on the \sigma-algebra \mathcal{W}. It is said to be absolutely continuous with respect to P if P_1(A) = 0 whenever P(A) = 0 for A \in \mathcal{W}. Measures P_1 and P are said to be equivalent if they are mutually absolutely continuous. The relevant Radon-Nikodym derivatives are supplied by the Girsanov formula (see Karatzas and Shreve, 1991).

(ii) Transition density. Let p(t, x, y) denote the transition density of Z at time t; that is, p(t, x, y)\,dy = P\{ y < Z(t) \le y + dy \mid Z(0) = x \}. Then p(t, x, z_{i+1}) is the solution of the following backward equation:

(2.2) \frac{\partial p(t, x, z_{i+1})}{\partial t} = \mu(x)\, \frac{\partial p(t, x, z_{i+1})}{\partial x} + \frac{\sigma^2(x)}{2}\, \frac{\partial^2 p(t, x, z_{i+1})}{\partial x^2}.

(iii) Conditioned diffusion process. Let Z_i^* be a process on [t_i, t_{i+1}] such that

(2.3) \mathcal{L}(Z_i^*) = \mathcal{L}\left( Z \mid Z(t_i) = z_i,\, Z(t_{i+1}) = z_{i+1} \right),

where \mathcal{L}(X) denotes the distribution law of a random variable X. The conditioned diffusion process is itself a diffusion with a time-varying drift \mu^*(x, t) and the same diffusion coefficient, \sigma^2(x), as the original process. The transition probability density for the conditioned process is given by

p_i^*(t, x; s, y) = \frac{ p(s - t, x, y)\, p(t_{i+1} - s, y, z_{i+1}) }{ p(t_{i+1} - t, x, z_{i+1}) }, \qquad t_i \le t < s \le t_{i+1}

(note that \pi_i(x, t) = p(t_{i+1} - t, x, z_{i+1}), so this agrees with (2.4) below).

(iv) Prediction error. We define PE_n, the error in predicting \int_0^1 Z(t)\,dt given Z_{(n)} with the optimal predictor; that is,

PE_n = \int_0^1 Z(t)\,dt - \int_0^1 \hat Z_n(t)\,dt.

Furthermore, SPE_n is the standardized prediction error given Z_{(n)},

SPE_n = PE_n \Big/ \left[ \mathrm{var}\{ PE_n \mid Z_{(n)} \} \right]^{1/2}.

We will prove in Theorem 2.1 that SPE_n is asymptotically N(0,1).
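As a concrete illustration of definition (iii), the following sketch evaluates the conditioned transition density in the special case of Brownian motion (\mu = 0, \sigma \equiv 1), where p(t, x, y) is the N(x, t) density. The code is ours, not the paper's; function and argument names are illustrative.

```python
import numpy as np

def gauss_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def bridge_density(t, x, s, y, t_next, z_next):
    """p_i*(t, x; s, y) for Brownian motion: the density at y of Z(s) given
    Z(t) = x and Z(t_next) = z_next, for t < s < t_next, using the ratio
    formula of definition (iii) with p(t, x, y) the N(x, t) density."""
    return (gauss_pdf(y, x, s - t) * gauss_pdf(z_next, y, t_next - s)
            / gauss_pdf(z_next, x, t_next - t))
```

For general diffusions the same ratio applies with p replaced by the transition density of Z, which is rarely available in closed form; that is precisely why Section 3 develops a computable approximation.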
We present the following lemma, which will be used in the proof of Theorem 2.1. We assume that the diffusion process Z satisfies assumptions (A.1)-(A.3), (B.1)-(B.3) and (C.1). The transition density of Z_i^*, for t_i \le t < s \le t_{i+1}, is given by

(2.4) p_i^*(t, x; s, y)\,dy = P\left( y < Z(s) \le y + dy \mid Z(t_i) = z_i, Z(t) = x, Z(t_{i+1}) = z_{i+1} \right) = \frac{ p(s - t, x, y)\, \pi_i(y, s) }{ \pi_i(x, t) }\,dy.

The drift coefficient of Z_i^*, for t_i < t < t_{i+1}, is

\mu_i^*(x, t) = \lim_{h \downarrow 0} \frac{1}{h} \int_{-\infty}^{\infty} (y - x)\, p_i^*(t, x; t + h, y)\,dy.

LEMMA 2.1. The conditioned diffusion process Z_i^* has a non-homogeneous infinitesimal mean

(2.5) \mu_i^*(x, t) = \mu(x) + \sigma^2(x)\, \frac{ \partial \pi_i(x, t)/\partial x }{ \pi_i(x, t) }.

The diffusion coefficient of the process Z_i^* is \sigma^2, the diffusion coefficient of the process Z.

Proof of Lemma 2.1: Lemma 2.1 is a known result (see Karlin and Taylor, 1981, p. 267-268). The proof is straightforward once we have the following Taylor expansion for \pi_i(x, t):

(2.6) \pi_i(y, t + h) = \pi_i(x, t) + (y - x)\, \frac{\partial \pi_i}{\partial x}(x, t) + h\, \frac{\partial \pi_i}{\partial t}(x, t) + o(y - x) + o(h).

The Main Theorem.

THEOREM 2.1. Consider predicting the integral of a diffusion process Z over [0,1], based on the observations Z_{(n)} = (Z(t_{0n}), \ldots, Z(t_{nn})), by \int_0^1 \hat Z_n(t)\,dt. Suppose

(i) the parameters of the diffusion process Z satisfy relations (A.1)-(A.3),

(ii) conditions (B.1)-(B.3) are satisfied by the sequence of points in [0,1],

(iii) the conditional densities \pi_i satisfy condition (C.1).

Then, conditional on Z_{(n)},

(2.7) SPE_n \to_d N(0, 1)

with probability 1.

Proof of Theorem 2.1: Given Z_{(n)}, PE_n is a sum of independent, mean zero random variables, so this is a triangular array situation. We study the conditional cumulants of orders 1 to 3 of the prediction error because, applying Lyapounov's condition (Billingsley, 1995) for \delta = 1, we can prove the weak convergence of the distributions. Suppose Y_{1n}, \ldots, Y_{nn} are independent, mean zero random variables such that, with s_n^2 = \sum_{i=1}^n \mathrm{var}(Y_{in}),

s_n^{-(2+\delta)} \sum_{i=1}^n E|Y_{in}|^{2+\delta} \to 0

for some \delta > 0; then s_n^{-1} \sum_{i=1}^n Y_{in} \to_d N(0,1). Writing X_{in} = \int_{t_{i-1}}^{t_i} Z(t)\,dt, the Markov property gives the decomposition

(2.8) PE_n = \sum_{i=1}^{n} \left\{ X_{in} - E(X_{in} \mid Z_{(n)}) \right\}

into terms that are conditionally independent given Z_{(n)}. Therefore the problem is reduced to studying just the cumulants of \int_{t_i}^{t_{i+1}} Z_i^*(t)\,dt given z_i at time t_i, where the process Z_i^*(\cdot) is defined in (2.3). The infinitesimal mean of Z_i^* satisfies

(2.9) E\{ Z_i^*(t+h) - Z_i^*(t) \mid Z_i^*(t) = x \} = h\, \mu_i^*(x, t) + o(h);

recall that we obtained an expression for \mu_i^* in Lemma 2.1. In Lemma 2.1 we also showed that the infinitesimal variance of Z_i^* is that of Z,

(2.10) E\{ (Z_i^*(t+h) - Z_i^*(t))^2 \mid Z_i^*(t) = x \} = h\, \sigma^2(x) + o(h).

In addition to the infinitesimal relations (2.9) and (2.10), the following higher-order infinitesimal moment relation is satisfied when (A.3) holds:

(2.11) E\{ |Z_i^*(t+h) - Z_i^*(t)|^3 \mid Z_i^*(t) = x \} = O(h^{3/2}), uniformly in x and t.

All this means that

(2.12) \mathrm{var}\{ PE_n \mid Z_{(n)} \} = \sum_{i=0}^{n-1} \mathrm{var}\left\{ \int_{t_i}^{t_{i+1}} Z_i^*(t)\,dt \right\} = \sum_{i=0}^{n-1} (t_{i+1} - t_i)^3\, \{ \sigma^2(z_i)/12 \} + o(n^{-2}).

We get the first equality by applying (2.8), and the last expression by integrating the infinitesimal variance of Z_i^*(t) for t \in [t_i, t_{i+1}]; by assumptions (A.1)-(A.3), the remainder term in (2.12) is uniform in Z_{(n)}. Therefore the conditional standard error of the prediction error is of order O(n^{-1}) as n \to \infty. It follows from Eq. (2.12) and conditions (B.1)-(B.3) that n^2\, \mathrm{var}\{ PE_n \mid Z_{(n)} \} converges in probability, as n \to \infty, to the random variable

L = \int_0^1 \sigma^2(Z(t))\, w(t)\,dt, \qquad w(t) = \tfrac{1}{12} f(t)^{-3}.

Using the same argument as the one leading to Eq. (2.12), together with (2.11), we get that the third absolute conditional moments satisfy \sum_{i=1}^n E\{ |X_{in} - E(X_{in} \mid Z_{(n)})|^3 \mid Z_{(n)} \} = O_p(n^{-7/2}). Let s_n^2(Z_{(n)}) be the conditional variance of S_n(Z_{(n)}) = (X_{1n} + \cdots + X_{nn}) - E(X_{1n} + \cdots + X_{nn} \mid Z_{(n)}) given Z_{(n)}. By Eqs. (2.11) and (2.12),

s_n(Z_{(n)})^{-3} \sum_{i=1}^n E\{ |X_{in} - E(X_{in} \mid Z_{(n)})|^3 \mid Z_{(n)} \} = O_p(n^{-1/2}) \to 0,

which is Lyapounov's condition for \delta = 1. Since Lyapounov's condition holds, S_n(Z_{(n)})/s_n(Z_{(n)}) \to_d N(0,1). Writing the result in terms of the diffusion process Z, relation (2.7) holds and the conditional prediction error is asymptotically normal.

In the following corollary, whose proof is obvious, we reformulate the assumptions of Theorem 2.1 in terms of the diffusion parameters \mu(x) and \sigma^2(x).

COROLLARY 2.1. Theorem 2.1 still holds if (A.3) is replaced by (A.4) in (i) and (C.1) is replaced by (C.2) in (iii).
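As a computational aside, the variance approximation (2.12) is simple to evaluate from the data when the diffusion coefficient is known. The sketch below (function names are ours) just sums (t_{i+1} - t_i)^3 \sigma^2(z_i)/12 over the observation intervals:

```python
import numpy as np

def cond_prediction_variance(t, z, sigma2):
    """Evaluate the approximation (2.12) to var{PE_n | Z_(n)}:
    sum over i of (t_{i+1} - t_i)^3 * sigma^2(z_i) / 12.
    `t` and `z` hold the observation times and values; `sigma2` is the
    (assumed known, vectorized) diffusion coefficient."""
    dt = np.diff(t)
    return np.sum(dt ** 3 * sigma2(z[:-1]) / 12.0)
```

Its square root is the conditional standard error used to standardize the prediction error in Theorem 2.1.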
3. Approximating the Optimal Predictor.

The conditional expectation of the integral of the diffusion process is the optimal predictor of \int_0^1 Z(t)\,dt given Z_{(n)}. But because this predictor is hard to calculate exactly for most diffusions, we present the following approximation to it, which yields an asymptotically optimal predictor:

(3.1) \hat I_n(Z_{(n)}) = \sum_{i=1}^n (t_{in} - t_{i-1,n})\, Z_i + \frac{1}{2n} \sum_{i=1}^n (t_{in} - t_{i-1,n})\, \{ \sigma^2(Z_i) \}'.

Here we write Z_i to denote Z(t_{in}) to simplify the notation, and \{ \sigma^2(Z) \}' to denote the derivative of \sigma^2 with respect to Z. Approximation (3.1) is a linear function of Z_{(n)}, namely \sum_{i=1}^n (t_{in} - t_{i-1,n}) Z_i, plus a nonlinear term in Z_{(n)} that is a function of the derivative of the diffusion coefficient and can be thought of as the adjustment for the conditional bias of the BLP.
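For concreteness, a direct implementation of (3.1) might look as follows. This is our sketch: the constant 1/(2n) in the correction term follows the reconstruction of the display above, parts of which are hard to read in the source, and `dsigma2` stands for the (assumed known) derivative of \sigma^2 with respect to the state.

```python
import numpy as np

def approx_optimal_predictor(t, z, dsigma2):
    """Evaluate the approximation (3.1): a Riemann sum of the observations
    plus a small correction driven by the derivative of the diffusion
    coefficient.  `t` and `z` are observation times and values; `dsigma2`
    is a vectorized callable for {sigma^2}'(z)."""
    n = len(t) - 1
    dt = np.diff(t)                       # t_i - t_{i-1}, i = 1, ..., n
    linear = np.sum(dt * z[1:])           # the linear (BLP-like) part
    correction = np.sum(dt * dsigma2(z[1:])) / (2.0 * n)
    return linear + correction
```

The correction vanishes when \sigma^2 is constant, in which case the approximation reduces to a purely linear rule, consistent with the Gaussian case.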
We define \widetilde{PE}_n, the error in predicting \int_0^1 Z(t)\,dt with \hat I_n(Z_{(n)}):

\widetilde{PE}_n = \int_0^1 Z(t)\,dt - \hat I_n(Z_{(n)}).

In this section we show that the error \widetilde{PE}_n - PE_n, incurred in approximating \int_0^1 \hat Z_n(t)\,dt with \hat I_n(Z_{(n)}), is negligible compared to the conditional standard deviation of the prediction error PE_n. Then, by Theorem 2.1, the prediction error \widetilde{PE}_n is asymptotically normal.

THEOREM 3.1. Under assumptions (A.1)-(A.3), (B.1)-(B.3) and (C.1), conditional on Z_{(n)},

\widetilde{PE}_n \Big/ \left[ \mathrm{var}\{ PE_n \mid Z_{(n)} \} \right]^{1/2} \to_d N(0, 1)

with probability 1.

We now present a proposition that will be used in the proof of Theorem 3.1.

PROPOSITION 3.1. Under conditions (A.1)-(A.3), (B.1)-(B.3) and (C.1), conditional on Z_{(n)},

n^{1+\delta/2}\, (\widetilde{PE}_n - PE_n) \to_p 0.

Proof of Proposition 3.1: We prove that the remainder term when we approximate \int_0^1 \hat Z_n(t)\,dt with \hat I_n(Z_{(n)}) is of order \sum_{i=0}^{n-1} \{ F^{-1}(\frac{i+1}{n}) - F^{-1}(\frac{i}{n}) \}^{2+\delta} for some \delta > 0. By conditions (B.1)-(B.3), this sum is O(n^{-(1+\delta)}). Thus the error in the approximation, \widetilde{PE}_n - PE_n, is negligible compared to the conditional standard deviation of PE_n, which we prove in Proposition 4.1 to be O_p(n^{-1}).

By the Markov property of the diffusion process Z, letting z = (z_0, \ldots, z_n) be the observed values of Z(\cdot) at times t_0, \ldots, t_n, a straightforward calculation gives

(3.2) \int_0^1 E(Z(t) \mid Z_{(n)} = z)\,dt = \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} E\left( Z(t) - Z(t_i) \mid Z(t_i) = z_i, Z(t_{i+1}) = z_{i+1} \right)dt + \sum_{i=0}^{n-1} z_i\, (t_{i+1} - t_i).

In (2.4) we showed that \mu_i^*, the drift of the conditioned process, is a function of \pi_i (which satisfies (C.1)) and of p, the transition density of Z. So we need an explicit expression for p. We transform coordinates applying the transformation function g, where g(x) = \int_0^x \sigma(y)^{-1}\,dy; then g(Z) is a process with constant diffusion coefficient. Now, if P is the probability measure of the diffusion process Z, let \tilde P be a probability measure such that P and \tilde P are equivalent and, under \tilde P, Z has drift coefficient

(3.3) \tilde\mu(x) = \tfrac{1}{2}\, \sigma'(x)\, \sigma(x).

It is routine to verify that the process B = g(Z) is a Brownian motion under \tilde P. Therefore, by a change of variables we get an equation equivalent to (2.2),

(3.4) \frac{\partial \tilde p(t, b, b_{i+1})}{\partial t} = \frac{1}{2}\, \frac{\partial^2 \tilde p(t, b, b_{i+1})}{\partial b^2}.

The solution is a Gaussian density. By a change of variables again, we can express the solution in terms of the process Z (under \tilde P) and obtain

\tilde p(t, x, y) = \frac{1}{\sqrt{2\pi t}\, \sigma(y)} \exp\left\{ -\frac{ [g(y) - g(x)]^2 }{ 2t } \right\}.

Assumption (C.1) postulates sufficient regularity for \pi_i(x, t) to permit the use of the following Taylor expansion:

(3.5) \pi_i(y, t) = \pi_i(x, t) + (y - x)\, \frac{\partial \pi_i}{\partial x}(x, t) + \frac{(y - x)^2}{2}\, \frac{\partial^2 \pi_i}{\partial x^2}(\xi, t), \qquad \xi \in (x, y).

We define R_i to be the residual term when we approximate \int_{t_i}^{t_{i+1}} Z(t)\,dt with the conditionally optimal predictor restricted to the interval [t_i, t_{i+1}] \subset [0,1]. By the definition of p_i^*(t, x; s, y) in (2.4) and the Taylor expansion (3.5), we obtain an explicit expression for R_i. By a change of variables again, we can get an explicit expression for the second derivative of \pi_i(x, t) with respect to x, using the same argument as in (3.4). By assumption (A.3), R_i is of order O_p(n^{-(2+\delta)}), where \delta is the same as in (A.3).

Thus we obtain, under \tilde P,

(3.6) \int_0^1 \hat Z_n(t)\,dt = \sum_{i=0}^{n-1} Z_i\, (t_{i+1} - t_i) + \frac{1}{2} \sum_{i=0}^{n-1} (t_{i+1} - t_i)(Z_{i+1} - Z_i) - \sum_{i=0}^{n-1} \frac{ \sigma'(Z_i)\, \sigma(Z_i) }{ 4 } (t_{i+1} - t_i)^2 + O_p(n^{-(1+\delta)}).

Thus, under the probability measure \tilde P,

(3.7) n^{1+\delta/2} \left( \int_0^1 \hat Z_n(t)\,dt - \sum_{i=0}^{n-1} Z_i\, (t_{i+1} - t_i) - \frac{1}{2} \sum_{i=0}^{n-1} (t_{i+1} - t_i)(Z_{i+1} - Z_i) + \sum_{i=0}^{n-1} \frac{ \sigma'(Z_i)\, \sigma(Z_i) }{ 4 } (t_{i+1} - t_i)^2 \right) = n^{1+\delta/2}\, (\widetilde{PE}_n - PE_n) \to_p 0.

It is a straightforward consequence of the equivalence of P and \tilde P that (3.7) holds under P as well. In other words, n^{1+\delta/2}\, (\widetilde{PE}_n - PE_n) \to_p 0. Thus we approximate the optimal predictor \int_0^1 \hat Z_n(t)\,dt of the integral with \hat I_n(Z_{(n)}), and the error in this approximation, \widetilde{PE}_n - PE_n, is negligible compared to the standard deviation of PE_n.

Proof of Theorem 3.1: We proved in Theorem 2.1 that the standardized prediction error SPE_n is, conditional on Z_{(n)}, asymptotically N(0,1). By Propositions 3.1 and 4.1, \widetilde{PE}_n - PE_n is negligible compared to the conditional standard deviation of PE_n, so Theorem 3.1 follows.

A symmetric approximation for the optimal predictor. We present a symmetric approximation for the predictor that is invariant if the order of time for the diffusion on the interval [0,1] is reversed. We get this approximation as the average of two approximations for the optimal predictor. The first approximation is obtained by assuming that we observe the diffusion process Z on [0,1] starting at time 0, that is, t_1 = 0 and t_n = 1; for the second approximation, we assume the observations start at time 1, that is, t_1' = 1 and t_n' = 0. We obtain

(3.8) \bar I_n(Z_{(n)}) = \frac{1}{2} \sum_{i=2}^{n-1} Z_i\, (t_{i+1} - t_{i-1}) + \frac{1}{2} \left[ Z_1 (t_2 - t_1) + \sum_{i=2}^{n-1} (t_{i+1} - t_{i-1})(Z_{i+1} - Z_{i-1}) + Z_n (t_n - t_{n-1}) \right] + \frac{1}{2} \left[ (t_2 - t_1)(Z_2 - Z_1) + (t_n - t_{n-1})(Z_n - Z_{n-1}) \right] - \frac{ \sigma'(Z_1)\, \sigma(Z_1) }{ 4 } (t_2 - t_1)^2 - \frac{ \sigma'(Z_n)\, \sigma(Z_n) }{ 4 } (t_n - t_{n-1})^2.

Thus the symmetric approximation \bar I_n(Z_{(n)}) shares the asymptotic optimality of \hat I_n(Z_{(n)}).

4. The Variance of the Prediction Error.

In the following proposition we prove that the order of convergence to zero of the conditional variance of the prediction error is O_p(n^{-2}). Hence the error in approximating \int_0^1 \hat Z_n(t)\,dt by \hat I_n(Z_{(n)}) converges to zero faster than the conditional standard deviation of PE_n.

PROPOSITION 4.1. As n \to \infty,

(4.1) \mathrm{var}\{ PE_n \mid Z_{(n)} \} = O_p(n^{-2}).

Proof of Proposition 4.1: By the Markov property of the diffusion process and using the same argument as in (3.2), the variance of the prediction error, when conditions (A.1)-(A.3), (B.1)-(B.3) and (C.1) are satisfied, can be written as

(4.2) \mathrm{var}\{ PE_n \mid Z_{(n)} \} = \sum_{i=0}^{n-1} \mathrm{var}\left\{ \int_{t_i}^{t_{i+1}} Z_i^*(t)\,dt \right\} = \sum_{i=0}^{n-1} (t_{i+1} - t_i)^3\, \{ \sigma^2(Z(t_i))/12 \} + o_p(n^{-2}),

where the process Z_i^* is defined in (2.3). Then the order of the standard error of the prediction error is O_p(n^{-1}).

A symmetric approximation for the variance of the prediction error. Eq. (4.2) gives an approximation for the conditional variance, which can also be expressed in a symmetric manner,

(4.3) \mathrm{var}\{ PE_n \mid Z_{(n)} \} \approx \sum_{i=0}^{n-1} (t_{i+1} - t_i)^3\, \frac{ \sigma^2(Z(t_i)) + \sigma^2(Z(t_{i+1})) }{ 24 }.

5. Simulation Results with Known Diffusion Parameters.

To assess the accuracy of the proposed approximations to the integral of a diffusion process Z and to the variance of the prediction error in finite samples, we performed several simulation experiments, assuming that the diffusion coefficient was a known function. We simulated n values of a diffusion process Z at times t_0, \ldots, t_{n-1}, where t_i = i/(n-1) for i = 0, 1, \ldots, n-1, n = 100, and Z is a transformation of a Brownian motion B,

(5.1) Z(t) = \frac{ \exp\{B(t)\} }{ 1 + \exp\{B(t)\} }.

This diffusion process is uniformly bounded and, hence, trivially satisfies assumptions (A.4) and (C.2). The parameters of the diffusion Z, as defined in (5.1), can be written as functions of the Brownian motion process. The infinitesimal variance of Z is

\sigma^2(Z(t)) = \frac{ \exp\{2B(t)\} }{ [1 + \exp\{B(t)\}]^4 }.
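The unconditional simulation step is elementary; a sketch (illustrative code, names ours) simulates B on the grid by cumulating Gaussian increments and maps it through (5.1):

```python
import numpy as np

def simulate_logistic_diffusion(n, seed=None):
    """Simulate the process (5.1), Z(t) = exp{B(t)}/[1 + exp{B(t)}], at the
    equally spaced times t_i = i/(n-1), i = 0, ..., n-1, by cumulating
    independent N(0, 1/(n-1)) increments of the Brownian motion B."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    steps = rng.normal(0.0, np.sqrt(1.0 / (n - 1)), size=n - 1)
    b = np.concatenate(([0.0], np.cumsum(steps)))
    return t, b, np.exp(b) / (1.0 + np.exp(b))
```

Returning the underlying Brownian path b as well is convenient, since the conditional simulation in step (ii) below works on the Brownian scale.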
Neither \hat I_n(Z_{(n)}) nor the approximation for the standard error of the prediction error depends on the drift coefficient of Z, which can be an unknown function; so, for the asymptotics, we do not need an expression for the infinitesimal mean of Z. Clearly, \hat I_n(Z_{(n)}) is a function of the infinitesimal variance of Z and also of the derivative of that parameter, given by

\{ \sigma^2(Z(t)) \}' = \frac{ 2\, [1 - \exp\{B(t)\}] \exp\{B(t)\} }{ [1 + \exp\{B(t)\}]^3 }.

We conducted the following simulation experiment to assess the asymptotic normality of the conditional prediction error:

(i) We simulated n values of a Brownian motion and applied the transformation (5.1) to get n observations of the diffusion process Z. Figure 5.1 shows Z_{(n)}, the n simulated values of Z, for a single realization of Z. Using Eqs. (3.8) and (2.12) we get the value of the predictor and the conditional variance of the prediction error, because the diffusion coefficient is known.

(ii) In order to simulate the distribution of the prediction error, we need the value of the predictor obtained in (i), the conditional variance of the prediction error also obtained in (i), and \int_0^1 Z(t)\,dt, which is unknown; so we have to simulate \int_0^1 Z(t)\,dt conditional on Z_{(n)}. To obtain values of \int_0^1 Z(t)\,dt, we simulated m values of Z(t) at m equally spaced locations in time in every interval [t_i, t_{i+1}], conditioning on the observed values Z_{(n)} (a code sketch of this conditional simulation is given below, after the figure captions). We conducted several simulation experiments with different values of m, and m = 10 seemed to be sufficiently large in the sense that increasing m had only a negligible effect on the results. Figure 5.2 shows a single realization of the 10n simulated values of Z conditional on the n observed values Z_{(n)}. We considered the mean of the 10n simulated values of Z conditioning on Z_{(n)} to be the unknown quantity that we want to predict, \int_0^1 Z(t)\,dt.

(iii) We repeated step (ii) k times to obtain k simulated values of \int_0^1 Z(t)\,dt. We also obtained k simulated values of the prediction error PE_n and of the variance of PE_n, always conditioning on the n observed values Z_{(n)}.

(iv) We plotted the histogram of the k simulated values of the prediction error PE_n (see Figure 5.3(a)). Here k = 100.

Then we repeated steps (i)-(iv) 100 times to study the conditional asymptotic normality, conditioning now on other simulated values of Z_{(n)}.

Figure 5.1. Simulation of Z_{(n)}, the observed values of the diffusion process Z, with n = 100. (The plot shows the simulated values against time on [0,1].)

Figure 5.2. Simulation of \int_0^1 Z(t)\,dt conditional on Z_{(n)}. The vertical axis shows 1000 simulated values of Z conditional on the n = 100 observed values Z_{(n)}; the average of the 1000 simulated values approximates \int_0^1 Z(t)\,dt. The symbol + is used for the values of Z_{(n)}.
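The conditional simulation of step (ii) reduces, for the process (5.1), to Brownian-bridge interpolation of B followed by the logistic map. The sketch below is ours (names illustrative); each new point is drawn from its exact Gaussian distribution given the last simulated point and the right endpoint of the interval.

```python
import numpy as np

def conditional_fill(t, b, m, rng):
    """Given Brownian-motion values b[i] at the observation times t[i],
    draw m intermediate values of B in each interval [t[i], t[i+1]] from
    the Brownian bridge, sequentially, and map them through (5.1)."""
    z_fill = []
    for i in range(len(t) - 1):
        s, x = t[i], b[i]
        grid = np.linspace(t[i], t[i + 1], m + 2)[1:-1]  # m interior times
        for u in grid:
            w = (t[i + 1] - u) / (t[i + 1] - s)
            mean = w * x + (1.0 - w) * b[i + 1]
            var = (u - s) * (t[i + 1] - u) / (t[i + 1] - s)
            x = rng.normal(mean, np.sqrt(var))  # bridge draw at time u
            s = u
            z_fill.append(np.exp(x) / (1.0 + np.exp(x)))
    return np.array(z_fill)
```

With m = 10, np.mean(conditional_fill(t, b, 10, rng)) plays the role of the "unknown" \int_0^1 Z(t)\,dt in steps (ii)-(iv).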
Figure 5.3(a) is a histogram of the distribution of the prediction error using the proposed approximation for the optimal predictor, \hat I_n(Z_{(n)}), for the particular Z_{(n)} = z_{(n)} shown in Figure 5.1. Because the histogram is centered at zero and has a normal shape, this provides some evidence that the proposed predictor is conditionally unbiased and asymptotically normal. In this simulation study n = 100, m = 10 and k = 100. In the normality plot, Figure 5.4, we can see further evidence of this asymptotic normality. Figure 5.3(b) shows the distribution of the prediction error when the predictor is the classic linear combination of the observed values. The vertical line in the graph is located at the mean value of the sampling distribution. The histogram is not centered at zero, which reflects the fact that this naive predictor, a linear combination of the observed values, has nontrivial conditional bias in this case (the value of the conditional bias is .00087), which is approximately \frac{1}{2n} \int_0^1 \sigma(Z_t)\, \sigma'(Z_t)\,dt.

Now we compare the mse of the linear and nonlinear predictors, for the particular Z_{(n)} = z_{(n)} pictured in Figure 5.1, to show that the nonlinear predictor is better. The mse of the nonlinear predictor is 4.13 x 10^{-7} and the mse of the linear predictor is 8.01 x 10^{-7}, almost twice as large. To show that this result is not just due to a favorable value of Z_{(n)}, Table 5.1 compares the mse for more simulations conditioning on other values of Z_{(n)}. We also compared the mse of the nonlinear predictor from the simulation conditional on the z_{(n)} shown in Figure 5.1 with the one obtained using the asymptotic approximation presented in Theorem 3.1; the asymptotic mse is 3.97 x 10^{-7}, very close to the mse of the nonlinear predictor and roughly half the mse of the linear predictor.

Figure 5.3. Distribution of the approximated prediction error, \sigma^2 known. (a) Distribution of the conditional prediction error when predicting \int_0^1 Z(t)\,dt given Z_{(n)} = z_{(n)} with the optimal predictor; the vertical axis represents the frequencies out of 100 simulations. (b) Distribution of the prediction error given Z_{(n)} = z_{(n)} using the BLP; the conditional bias is .00087.

Figure 5.4. Normality plot for the approximated prediction error, \sigma^2 known: the standardized conditional prediction error when we predict \int_0^1 Z(t)\,dt given Z_{(n)} = z_{(n)} with the approximately optimal predictor, \hat I_n(Z_{(n)}), plotted against the quantiles of the standard normal.

We also obtained naive 95% prediction intervals. For the nonlinear predictor in the simulation shown in Figure 5.3, conditioning on the particular value of Z_{(n)} = z_{(n)} pictured in Figure 5.1, the 95% prediction interval, using the approximation (4.3) for the conditional variance presented in Section 4, is (0.572687, 0.575567). We calculated the 95% prediction interval for the linear predictor acting as if we had a Brownian motion, and estimated \sigma^2 with

(5.2) \hat\sigma^2 = \sum_{i=1}^{n} \left( Z(t_i) - Z(t_{i-1}) \right)^2.

This estimator is uniformly minimum variance unbiased for \sigma^2 under the assumption that Z is a Brownian motion. Thus we could use n^{-2}\hat\sigma^2 to build the prediction interval. For this simulation, using the value of Z_{(n)} = z_{(n)} plotted in Figure 5.1, the value of n^{-2}\hat\sigma^2 is 4.52 x 10^{-7}.
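The naive interval construction is a one-liner once (5.2) is in hand; the following sketch (names ours) centers a normal-approximation interval at a given predictor value:

```python
import numpy as np

def naive_prediction_interval(z, predictor_value, level_z=1.96):
    """Interval based on (5.2): sigma2_hat is the sum of squared increments
    of the observed values (UMVU for sigma^2 if Z is a Brownian motion);
    the half-width uses a normal approximation with variance
    n^{-2} * sigma2_hat."""
    n = len(z) - 1
    sigma2_hat = np.sum(np.diff(z) ** 2)
    half_width = level_z * np.sqrt(sigma2_hat) / n
    return predictor_value - half_width, predictor_value + half_width
```

For the nonlinear predictor, the same construction applies with the variance replaced by the conditional approximation (4.3).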
The prediction interval for the linear predictor using n^{-2}\hat\sigma^2 is (0.573507, 0.576137). Thus the interval for the linear predictor is approximately the interval for the nonlinear predictor shifted to the right by around .0006, or by approximately the conditional bias of the linear predictor.

We repeatedly simulated \int_0^1 Z(t)\,dt, conditioning on the same Z_{(n)} as in Figure 5.1, to see how well the obtained prediction intervals work; for this simulation we used the technique presented in part (ii) of this section, and a code sketch of this coverage check is given at the end of this section. In 450 out of 500 simulations (90% of the time), the prediction interval for the nonlinear predictor contained \int_0^1 Z(t)\,dt, whereas in 390 out of 500 simulations (78% of the time), the prediction interval for the linear predictor contained the simulated \int_0^1 Z(t)\,dt. The median value of \int_0^1 Z(t)\,dt is 0.574027, approximately the midpoint of the interval for the nonlinear predictor. All these results suggest that the prediction interval obtained when we use the nonlinear predictor and the approximate conditional standard prediction error has somewhat better coverage than the one obtained for the linear predictor (see Figure 5.5).

Figure 5.5. Prediction intervals for \int_0^1 Z(t)\,dt. Simulated distribution of the integral \int_0^1 Z(t)\,dt given a fixed Z_{(n)} = z_{(n)}. The solid vertical line is located at the sample mean of the simulated values of \int_0^1 Z(t)\,dt. The marks on the horizontal axis are the prediction intervals for the linear and nonlinear predictors. The vertical axis shows the frequencies out of 500 simulations.

Because the results obtained could be just due to a favorable realization of Z_{(n)}, we ran many other simulations. We present the mse of the linear predictor obtained by using two different approaches (using the conditional variance for the linear predictor and using n^{-2}\hat\sigma^2) and the mse of the nonlinear predictor using (4.3), for 10 simulated Z_{(n)}, with n = 500, assuming the diffusion coefficient is a known function.

Table 5.1. Conditional mse for the linear and nonlinear predictors, multiplied by 10^8.

    Linear    Nonlinear
    1.18      1.04
    1.09      0.31
    1.86      0.61
    2.07      0.52
    1.70      1.09
    2.23      1.42
    2.02      0.04
    1.99      0.61
    1.98      0.29
    1.92      0.79

Table 5.2. mse values multiplied by 10^8 for the linear predictor.

    Conditional mse    mse using n^{-2} sigma2_hat
    1.18               1.14
    2.09               2.10
    1.86               2.02
    2.07               2.14
    1.70               1.64
    2.23               2.26
    2.02               1.97
    1.99               2.02
    1.98               1.98
    1.92               2.03

Table 5.1 shows in the first column the mse for the linear predictor using the conditional variance for the linear predictor. The second column is the mse for the nonlinear predictor using the approximation (4.3) to the conditional variance presented in Section 4, assuming \sigma^2 is known. The first column has mse values that are larger than the mse values for the nonlinear predictor. Table 5.2 shows in the first column the mse, conditioning on the same values of Z_{(n)} as in Table 5.1, for the linear predictor using the conditional variance for the linear predictor, and the second column shows the mse for the linear predictor acting as if Z is a Brownian motion and using n^{-2}\hat\sigma^2. The two columns have similar mse values, which indicates that using n^{-2}\hat\sigma^2 to compute the mse for the linear predictor is an appropriate approach.
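The coverage check described above can be sketched as follows; `draw_integral` is a hypothetical callable (for instance, the mean of the `conditional_fill` output) returning one draw of the integral conditional on the fixed Z_{(n)}:

```python
import numpy as np

def empirical_coverage(interval, draw_integral, n_sims=500, seed=0):
    """Repeatedly draw the integral of Z conditional on the fixed Z_(n)
    and report the fraction of draws falling in the prediction interval."""
    rng = np.random.default_rng(seed)
    lo, hi = interval
    hits = sum(lo <= draw_integral(rng) <= hi for _ in range(n_sims))
    return hits / n_sims
```

Applied to the two intervals above, this is the computation behind the reported 90% and 78% coverage figures.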
6. Final Remarks.

In this paper we have established the asymptotic normality of the prediction error for predicting the integral of a diffusion process with the integral of the conditional expectation of the process, the optimal predictor. We have also obtained a simple asymptotically optimal approximation for the predictor and a simple asymptotically valid approximation for the variance of the prediction error, although in both cases we assumed that the diffusion coefficient is a known function.

We could generalize Theorem 2.1 to other additive functionals, for instance \int_0^1 g(Z_t)\,dt, because by the Itô transformation formula we easily obtain the diffusion parameters of Y_t = g(Z_t) as functions of Z_t through g. If g is smooth enough that assumptions (A.4) and (C.2) are satisfied by the new diffusion parameters, then Theorem 2.1 holds for the diffusion process g(Z_t), and we obtain the asymptotic normality of \int_0^1 \{ g(Z_t) - E[ g(Z_t) \mid g(Z_{(n)}) ] \}\,dt.
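For reference, the Itô transformation formula invoked here gives (a standard fact, recorded for the reader's convenience) the parameters of Y_t = g(Z_t):

\mu_Y(x) = g'(x)\,\mu(x) + \tfrac{1}{2}\, g''(x)\,\sigma^2(x), \qquad \sigma_Y^2(x) = \{ g'(x) \}^2\, \sigma^2(x),

evaluated at x = Z_t; assumptions (A.4) and (C.2) are then imposed on \mu_Y and \sigma_Y^2.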
Acknowledgments. The author is grateful to Michael L. Stein for his many valuable comments, insights and guidance in obtaining the results in this paper.

REFERENCES

BILLINGSLEY, P. (1995). Probability and Measure, third edition. Wiley, New York.

CHAKRAVARTI, P. C. (1970). Integrals and Sums: Some New Formulae for Their Numerical Evaluation. Oxford University Press, New York.

DAVIS, P. J. and RABINOWITZ, P. (1984). Numerical Integration, second edition. Academic Press, Orlando.

EUBANK, R. L., SMITH, P. L. and SMITH, P. W. (1981). Uniqueness and eventual uniqueness of optimal designs in some time series models. Annals of Statistics, 9, 486-493.

FUENTES, M. (1999). Asymptotic normality of conditional integrals of diffusion processes. Technical report, Department of Statistics, North Carolina State University.

GHIZZETTI, A. and OSSICINI, A. (1970). Quadrature Formulae. Academic Press, New York.

KARATZAS, I. and SHREVE, S. E. (1991). Brownian Motion and Stochastic Calculus, second edition. Springer-Verlag, New York.

KARLIN, S. and TAYLOR, H. M. (1981). A Second Course in Stochastic Processes. Academic Press, New York.

MARNEVSKAYA, L. A. and JAROVICH, L. A. (1984). Laplace method for Riemann integrals of random fields. Vesci Akademii Nauk Byelorussian SSR, Seryja Fizika-Matematycnyh Nauk, 4, 9-12.

MATHERON, G. (1985). Change of support for diffusion-type random functions. Journal of the International Association for Mathematical Geology, 17, 137-165.

PITT, L. D., ROBEVA, R. and WANG, D. Y. (1995). An error analysis for the numerical calculation of certain random integrals: Part 1. Annals of Applied Probability, 5, 171-197.

RITTER, K. (1995). Average Case Analysis of Numerical Problems. University of Erlangen.

SACKS, J. and YLVISAKER, D. (1966). Designs for regression problems with correlated errors. Annals of Mathematical Statistics, 37, 66-89.

SACKS, J. and YLVISAKER, D. (1968). Designs for regression problems with correlated errors: many parameters. Annals of Mathematical Statistics, 39, 49-69.

SACKS, J. and YLVISAKER, D. (1970). Designs for regression problems with correlated errors III. Annals of Mathematical Statistics, 41, 2057-2074.

STEIN, M. L. (1987). Gaussian approximations to conditional distributions for multi-Gaussian processes. Mathematical Geology, 19, 387-405.

STEIN, M. L. (1993). Asymptotic properties of centered systematic sampling for predicting integrals of spatial processes. Annals of Applied Probability, 3, 874-880.

STEIN, M. L. (1995). Predicting integrals of stochastic processes. Annals of Applied Probability, 5, 158-170.

WAHBA, G. (1971). On the regression design problem of Sacks and Ylvisaker. Annals of Mathematical Statistics, 42, 1035-1053.

WAHBA, G. (1974). Regression design for some equivalence classes of kernels. Annals of Statistics, 2, 925-934.