Discretely Observed Diffusions: Approximation of the Continuous-time Score Function

© Board of the Foundation of the Scandinavian Journal of Statistics 2001. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA. Vol. 28: 113–121, 2001.

HELLE SØRENSEN
University of Copenhagen
ABSTRACT. We discuss parameter estimation for discretely observed, ergodic diffusion processes where the diffusion coefficient does not depend on the parameter. We propose using an approximation of the continuous-time score function as an estimating function. The estimating function can be expressed in simple terms through the drift and the diffusion coefficient and is thus easy to calculate. Simulation studies show that the method performs well.

Key words: continuous-time score function, diffusion process, discrete observations, estimating function
1. Introduction

This paper is about parameter estimation for discretely observed diffusion models with known diffusion function. The idea is to use an approximation of the continuous-time score function as estimating function.

This idea is very much in the spirit of the early work by Le Breton (1976) and Florens-Zmirou (1989). They both studied the usual Riemann–Itô discretization of the continuous-time log-likelihood function, and Florens-Zmirou (1989) showed that the corresponding estimator is inconsistent when the length of the time interval between observations is constant.

More recently, various methods providing consistent estimators have been developed, e.g. methods based on approximations of the true, discrete-time likelihood function (Pedersen, 1995; Aït-Sahalia, 1998); methods based on auxiliary models (Gallant & Tauchen, 1996; Gourieroux et al., 1993); and methods based on estimating functions (Bibby & Sørensen, 1995; Hansen & Scheinkman, 1995; Kessler, 2000; Jacobsen, 2001).
The estimating function discussed in this paper is of the simple, explicit type discussed by Hansen & Scheinkman (1995) and Kessler (2000), that is, of the form $\sum_{i=1}^n A_\theta h(X_{t_{i-1}}, \theta)$ where $A_\theta$ is the diffusion generator. Hansen & Scheinkman (1995) focus on identifiability and asymptotic behaviour of the estimating function whereas Kessler (2000) focuses on asymptotic behaviour and efficiency of the estimator.
The main contribution of this paper is to recognize that, with a special choice of h, the estimating function can be interpreted as an approximation to the continuous-time score function. The approximating estimating function is unbiased, it is invariant to data transformations, it provides consistent and asymptotically normal estimators, and it can be explicitly expressed in terms of the drift and diffusion coefficient. The estimating function is also, at least in some cases, available for multi-dimensional processes.

The main objection against the method is the need for a completely known diffusion function. If the diffusion function depends on the parameter, the suggested estimating function is still unbiased and can thus in principle be used, but there is no longer justification for using it since the continuous-time likelihood function does not exist.
We present the model and the basic assumptions in section 2 and derive the estimating
function in section 3. Asymptotic results are given in section 4. Examples and simulation studies
are presented in section 5. Sections 2–5 discuss one-dimensional diffusion processes exclusively; we study the multi-dimensional case in section 6.
2. Model and notation
In this section we present the diffusion model, state the assumptions and introduce some
notation.
We consider a one-dimensional, time-homogeneous stochastic differential equation

$$dX_t = b(X_t, \theta)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 = x_0, \tag{1}$$

where $\theta$ is an unknown $p$-dimensional parameter from the parameter space $\Theta \subseteq \mathbb{R}^p$ and $W$ is a one-dimensional Brownian motion. The functions $b: \mathbb{R} \times \Theta \to \mathbb{R}$ and $\sigma: \mathbb{R} \to (0, \infty)$ are known, and the derivatives $\partial\sigma/\partial x$ and $\partial^2 b/\partial\theta_j\,\partial x$ are assumed to exist for all $j = 1, \ldots, p$. Note that $\sigma$ does not depend on $\theta$.
We assume that, for any $\theta$, (1) has a unique strong solution $X$, and that the range of $X$ does not depend on $\theta$. Assume furthermore that there exists a unique invariant distribution $\mu_\theta = \mu(x, \theta)\,dx$ such that a solution to (1) with $X_0 \sim \mu_\theta$ (instead of $X_0 = x_0$) is strictly stationary. Sufficient conditions for these assumptions to hold can be found in Karatzas & Shreve (1991).
The invariant density is given by

$$\mu(x, \theta) = \bigl(C(\theta)\,s(x, \theta)\,\sigma^2(x)\bigr)^{-1} \tag{2}$$

where $C(\theta)$ is a normalizing constant and $s(\cdot, \theta)$ is the density of the scale measure, i.e. $\log s(x, \theta) = -2\int^x b(y, \theta)/\sigma^2(y)\,dy$.

For all $\theta \in \Theta$, the distribution of $X$ is denoted $P_\theta$ if $X_0 = x_0$ (as in (1)) and $P_\theta^\mu$ if $X_0 \sim \mu_\theta$. Under $P_\theta^\mu$, all $X_t \sim \mu_\theta$. $E_\theta^\mu$ is the expectation w.r.t. $P_\theta^\mu$.
The objective is to estimate $\theta$ from observations of $X$ at discrete time-points $t_1, \ldots, t_n$. Define $t_0 = 0$ and $\Delta_i = t_i - t_{i-1}$, and let $\theta_0$ be the true parameter.
Finally, we need some matrix notation: vectors in $\mathbb{R}^p$ are considered as $p \times 1$ matrices, and $A^T$ is the transpose of $A$. For a function $f = (f_1, \ldots, f_q)^T: \mathbb{R} \times \Theta \to \mathbb{R}^q$ we let $f'(x, \theta)$ be the $q \times 1$ matrix of partial derivatives w.r.t. $x$ and $\dot f(x, \theta) = D_\theta f(x, \theta)$ be the $q \times p$ matrix of partial derivatives with respect to $\theta$, i.e. $\dot f_{jk}(x, \theta) = (\partial/\partial\theta_k) f_j(x, \theta)$.
3. The estimating function
In this section we derive a simple, unbiased estimating function as an approximation to the
continuous-time score function.
First a comment on the model: it is important that $\sigma$ does not depend on $\theta$. Otherwise the distributions of $(X_s)_{0 \le s \le t}$ corresponding to two different parameter values are typically singular for all $t > 0$. If $Y$ is the solution to $dY_t = b(Y_t, \theta)\,dt + \tilde\sigma(Y_t, \theta)\,dW_t$, then the process $\bigl(\int^{Y_t} 1/\tilde\sigma(y, \theta)\,dy\bigr)_{t \ge 0}$ is the solution to (1) with $\sigma \equiv 1$, but this is of no help for estimation purposes since the transformation depends on the (unknown) parameter.
When $\sigma$ is completely known as we have assumed, it follows from Liptser & Shiryayev (1977) that the likelihood function for a continuous observation $(X_s)_{0 \le s \le t}$ exists and that the corresponding score process $S^c$ is given by

$$S_t^c(\theta) = \int_0^t \frac{\dot b(X_s, \theta)}{\sigma^2(X_s)}\,dX_s - \int_0^t \frac{\dot b(X_s, \theta)\,b(X_s, \theta)}{\sigma^2(X_s)}\,ds.$$
Using (1) we find that

$$dS_t^c(\theta) = \frac{\dot b(X_t, \theta)}{\sigma(X_t)}\,dW_t. \tag{3}$$

This shows that $S^c(\theta)$ is a local martingale, and a genuine martingale if $E_\theta^\mu \int_0^t (\dot b_j(X_s, \theta)/\sigma(X_s))^2\,ds < \infty$ for all $t \ge 0$ and all $j = 1, \ldots, p$, i.e. if

$$E_\theta^\mu \left(\frac{\dot b_j(X_0, \theta)}{\sigma(X_0)}\right)^2 = \int \left(\frac{\dot b_j(x, \theta)}{\sigma(x)}\right)^2 \mu(x, \theta)\,dx < \infty \tag{4}$$

for all $j = 1, \ldots, p$. In particular, $E_\theta^\mu S_t^c = 0$ for all $t \ge 0$ if (4) holds.
If $X$ were observed continuously on the interval $[0, t_n]$, we would estimate $\theta$ by solving the equation $S_{t_n}^c(\theta) = 0$. For discrete observations at time-points $t_1, \ldots, t_n$, the idea is to use an approximation to $S_{t_n}^c$ as estimating function.

The most obvious approximation is obtained by simply replacing the integrals in (3) with the corresponding Riemann and Itô sums,

$$R_n(\theta) = \sum_{i=1}^n \frac{\dot b(X_{t_{i-1}}, \theta)}{\sigma^2(X_{t_{i-1}})}\,(X_{t_i} - X_{t_{i-1}}) - \sum_{i=1}^n \frac{\dot b(X_{t_{i-1}}, \theta)\,b(X_{t_{i-1}}, \theta)}{\sigma^2(X_{t_{i-1}})}\,\Delta_i. \tag{5}$$

Note that this would be the score function if the conditional distributions of the increments $X_{t_i} - X_{t_{i-1}}$, given the past, were Gaussian with expectation $\Delta_i b(X_{t_{i-1}}, \theta)$ and variance $\Delta_i \sigma^2(X_{t_{i-1}})$. However, usually $E_\theta^\mu R_n(\theta) \ne 0$, and $R_n$ provides inconsistent estimators unless $\sup_{i=1,\ldots,n} \Delta_i \to 0$ (Florens-Zmirou, 1989).
We now propose an unbiased approximation of $S_{t_n}^c$. Let $A_\theta$ denote the differential operator associated with the infinitesimal generator for $X$, that is

$$A_\theta f(x, \theta) = b(x, \theta) f'(x, \theta) + \tfrac{1}{2}\sigma^2(x) f''(x, \theta)$$

for functions $f: \mathbb{R} \times \Theta \to \mathbb{R}^p$ that are twice continuously differentiable w.r.t. $x$.

Recall that $\mu$ is the invariant density and assume that the derivatives

$$h^\circ = D_\theta \log \mu: \mathbb{R} \times \Theta \to \mathbb{R}^p$$

w.r.t. the coordinates of $\theta$ exist and are twice continuously differentiable w.r.t. $x$, such that $A_\theta h^\circ$ is well-defined. The connection between $h^\circ$ and $S^c$ is given in the following proposition.
Proposition 1
With respect to $P_\theta$ and $P_\theta^\mu$ it holds for all $t \ge 0$ that

$$2S_t^c(\theta) = h^\circ(X_t, \theta) - h^\circ(X_0, \theta) - \int_0^t A_\theta h^\circ(X_s, \theta)\,ds. \tag{6}$$

Proof. We show that

$$dh^\circ(X_t, \theta) = A_\theta h^\circ(X_t, \theta)\,dt + 2\,dS_t^c(\theta). \tag{7}$$

Then (6) follows immediately since $S_0^c = 0$. Using (2) we easily find the first derivative of $h^\circ = D_\theta \log \mu$ in terms of $b$ and $\sigma$:

$$h^{\circ\prime}(x, \theta) = D_x D_\theta \log \mu(x, \theta) = -D_\theta D_x \log s(x, \theta) = 2\,\frac{\dot b(x, \theta)}{\sigma^2(x)}. \tag{8}$$

Now simply apply Itô's formula on $h^\circ$.
The proposition suggests that we use

$$F_n(\theta) = \frac{1}{2}\sum_{i=1}^n \Delta_i\,A_\theta h^\circ(X_{t_{i-1}}, \theta)$$

as an approximation to $-S_{t_n}^c$ (since the term $h^\circ(X_{t_n}, \theta) - h^\circ(X_0, \theta)$ is negligible when $n$ is large) and hence solve the equation $F_n(\theta) = 0$ in order to find an estimator for $\theta$.
The r.h.s. of (6), with an arbitrary function $h \in C^2(I)$ substituted for $h^\circ$, is a martingale if $E_\theta^\mu (h'\sigma)^2 < \infty$. Hence,

$$E_\theta^\mu A_\theta h(X_0, \theta) = 0 \tag{9}$$

if furthermore $h$ and $A_\theta h$ are in $L^1(\mu_\theta)$. In particular, $F_n$ is unbiased, i.e. $E_\theta^\mu F_n(\theta) = 0$, if (4) holds and if $h^\circ$ and $A_\theta h^\circ$ are in $L^1(\mu_\theta)$.
The moment condition (9) was used by Hansen & Scheinkman (1995) to construct general method of moments estimators (their condition C1) and by Kessler (2000) and Jacobsen (2001) to construct unbiased estimating functions. Kessler particularly suggests choosing polynomials $h$ of low degree, regardless of the model. Instead, we suggest the model-dependent choice $h = h^\circ$. Intuitively, this should be good for small $\Delta_i$'s since $F_n \approx -S_{t_n}^c$. Indeed, for $\Delta_i \equiv \Delta$, $F_n$ is small $\Delta$-optimal in the sense of Jacobsen (2001).
It should be clear, though, that moment conditions like (9) cannot achieve asymptotically efficient estimators for a given $\Delta > 0$ since each term in the discrete-time score function involves pairs of observations. Note that if the observations were independent and identically $\mu_\theta$-distributed, then the score function would equal $\sum_{i=1}^n h^\circ(X_{t_i}, \theta)$, which is thus optimal for $\Delta \to \infty$. Kessler (2000) discussed this estimating function.
When $\Delta_i \equiv \Delta$, $F_n$ is a simple estimating function, i.e. a function of the form $\sum_{i=1}^n f(X_{t_{i-1}}, \theta)$ where $E_\theta^\mu f(X_0, \theta) = 0$ (Kessler, 2000). In general, the $\Delta_i$'s can be interpreted as weights compensating for the dependence between observations that are close in time: an observation is given much weight if it is far in time from the previous one, and little weight if it is close in time to the previous one.
Note that (9) holds so that $F_n$ is unbiased even if $\sigma$ depends on $\theta$. However, the interpretation of $F_n$ as an approximation of minus the continuous-time score function is of course no longer valid, and the method will be non-optimal even for small $\Delta_i$'s (Jacobsen, 2001).
A nice property of $F_n$ is that it is invariant to transformations of data; the estimator does not change if we observe $\phi(X_{t_1}), \ldots, \phi(X_{t_n})$ instead of $X_{t_1}, \ldots, X_{t_n}$. This is not the case for the polynomial martingale estimating functions discussed by Bibby & Sørensen (1995).

To prove the invariance, we need some further notation: for a diffusion process $Y$ satisfying a stochastic differential equation similar to (1), we write $\mu_Y$ and $A^Y$ for the corresponding invariant density and the differential operator, and define $h_Y^\circ = D_\theta \log \mu_Y$.
Proposition 2
Let $\phi: I \to J \subseteq \mathbb{R}$ be a bijection from $C^2(I)$ with inverse $\phi^{-1}$, and let $Y = \phi(X)$. Then

$$A^Y h_Y^\circ(y, \theta) = A^X h_X^\circ(\phi^{-1}(y), \theta). \tag{10}$$

Proof. By Itô's formula, $Y$ is the solution to

$$dY_t = b_Y(Y_t, \theta)\,dt + \sigma_Y(Y_t)\,dW_t$$

where, with obvious notation,

$$b_Y(y, \theta) = b_X(\phi^{-1}(y), \theta)\,\phi'(\phi^{-1}(y)) + \tfrac{1}{2}(\sigma_X^2\,\phi'')(\phi^{-1}(y)), \qquad \sigma_Y(y) = (\sigma_X\,\phi')(\phi^{-1}(y)).$$

One can now either check directly from (11) that (10) holds or argue as follows. The density for the invariant distribution of $Y = \phi(X)$ is given by $\mu_Y(y, \theta) = \mu_X(\phi^{-1}(y), \theta)\,|(\phi^{-1})'(y)|$ and thus

$$h_Y^\circ(y, \theta) = D_\theta \log \mu_Y(y, \theta) = D_\theta \log \mu_X(\phi^{-1}(y), \theta) = h_X^\circ(\phi^{-1}(y), \theta).$$

Finally, $A^Y(f \circ \phi^{-1})(y) = A^X f(\phi^{-1}(y))$ for all $f \in C^2(I)$, which concludes the proof.
In the following we write $f^\circ$ for $(A_\theta h^\circ)/2 = (A_\theta D_\theta \log \mu)/2$. It is important to note that we have an explicit expression for $f^\circ$, and thus for $F_n = \sum_{i=1}^n \Delta_i f^\circ(X_{t_{i-1}}, \cdot)$, in terms of $b$ and $\sigma$, even if we have no explicit expression for the normalizing constant $C(\theta)$: from (8) we get

$$f^\circ = A_\theta h^\circ/2 = \frac{\dot b\,b}{\sigma^2} + \frac{1}{2}\dot b' - \frac{\dot b\,\sigma'}{\sigma}. \tag{11}$$
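Formula (11) makes $F_n$ straightforward to evaluate numerically. The following sketch is our own illustration, not from the paper: it assumes a scalar parameter and user-supplied functions for $b$, $\dot b$, $\dot b'$, $\sigma$ and $\sigma'$ (all names hypothetical), and computes $F_n(\theta) = \sum_i \Delta_i f^\circ(X_{t_{i-1}}, \theta)$ with $f^\circ$ taken from (11).

```python
import numpy as np

def F_n(theta, X, dt, b, b_dot, b_dot_prime, sigma, sigma_prime):
    """Estimating function F_n(theta) = sum_i Delta_i * f°(X_{t_{i-1}}, theta),
    where f° = A_theta h°/2 is computed from the explicit formula (11):
        f° = b_dot * b / sigma^2 + b_dot_prime / 2 - b_dot * sigma_prime / sigma.
    Scalar parameter for simplicity; X holds the observations, dt the spacing(s)."""
    x = X[:-1]                      # X_{t_{i-1}}, i = 1, ..., n
    f = (b_dot(x, theta) * b(x, theta) / sigma(x) ** 2
         + 0.5 * b_dot_prime(x, theta)
         - b_dot(x, theta) * sigma_prime(x) / sigma(x))
    return np.sum(dt * f)
```

For the Ornstein–Uhlenbeck drift $b(x, \theta) = \theta x$ with $\sigma \equiv 1$ this reduces to $F_n(\theta) = \Delta(\theta \sum x^2 + n/2)$, whose root is the explicit estimator appearing in Example 1.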
4. Asymptotic properties

In this section we state the asymptotic results for $F_n$. We consider equidistant observations, $t_i = i\Delta$ where $\Delta$ does not depend on $n$, and let $n \to \infty$.

Under suitable regularity conditions, a solution $\hat\theta_n$ to $F_n(\theta) = 0$ exists with a $P_{\theta_0}$-probability tending to 1, and $\hat\theta_n$ is a consistent, asymptotically normal estimator for $\theta$. The asymptotic distribution of $\hat\theta_n$ is given by

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} N\bigl(0,\, A(\theta_0)^{-1} V(\theta_0) (A(\theta_0)^{-1})^T\bigr)$$

w.r.t. $P_{\theta_0}$ as $n \to \infty$, where $A(\theta_0) = E_{\theta_0}^\mu \dot f^\circ(X_0, \theta_0)$ and

$$V(\theta_0) = E_{\theta_0}^\mu f^\circ(X_0, \theta_0) f^\circ(X_0, \theta_0)^T + 2\sum_{k=1}^\infty E_{\theta_0}^\mu f^\circ(X_0, \theta_0) f^\circ(X_{k\Delta}, \theta_0)^T.$$

Conditions that ensure convergence of the above sum are given by Kessler (2000). If (9) holds for each $\partial h_j^\circ/\partial\theta_k$, then

$$A(\theta_0) = 2\,E_{\theta_0}^\mu \left(\frac{\dot b(X_0, \theta_0)}{\sigma(X_0)}\right)^T \left(\frac{\dot b(X_0, \theta_0)}{\sigma(X_0)}\right),$$

and $A(\theta_0)$ is symmetric and positive semidefinite; it must be positive definite. We will not go through the additional regularity conditions here but refer to Kessler (2000) and particularly to Sørensen (2000).
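For a scalar parameter, the sandwich variance $A^{-1} V A^{-1}$ can be estimated by plug-in from the observed $f^\circ$ values. The sketch below is our own: the truncation of the infinite sum at a finite lag and the uncentred products are choices we make for illustration, not prescriptions from the paper.

```python
import numpy as np

def sandwich_avar(f_vals, fdot_vals, max_lag=20):
    """Plug-in estimate of A^{-1} V A^{-1} for a scalar parameter.
    f_vals:    array of f°(X_{(i-1)Δ}, θ̂) evaluated at the estimate;
    fdot_vals: array of ∂f°/∂θ at the estimate.
    V is estimated by the empirical (uncentred) autocovariances up to max_lag,
    a truncation of the infinite sum in the displayed formula for V(θ0)."""
    A = np.mean(fdot_vals)               # estimates A(θ0)
    V = np.mean(f_vals * f_vals)         # lag-0 term of V(θ0)
    for k in range(1, max_lag + 1):
        V += 2 * np.mean(f_vals[:-k] * f_vals[k:])
    return V / A ** 2
```

Dividing the result by $n$ gives an approximate variance for $\hat\theta_n$ itself.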
5. Examples

As already mentioned, $F_n$ can always be expressed explicitly in terms of $b$ and $\sigma$. When $b$ is linear w.r.t. the parameter we even get explicit estimators. Assume that $b(x, \theta) = b_0(x) + \sum_{j=1}^p b_j(x)\theta_j$ for known functions $b_0, b_1, \ldots, b_p: \mathbb{R} \to \mathbb{R}$ such that the assumptions of sections 2 and 3 hold. From (11) we easily deduce that the $k$th coordinate of $f^\circ$ is given by

$$f_k^\circ(x, \theta) = \sum_{j=1}^p \frac{b_k(x)\,b_j(x)}{\sigma^2(x)}\,\theta_j + \frac{b_0(x)\,b_k(x)}{\sigma^2(x)} + \frac{1}{2}b_k'(x) - \frac{b_k(x)\,\sigma'(x)}{\sigma(x)}.$$
It follows that $F_n$ is linear in $\theta$, and it is easy to show that the estimating equation has a unique, explicit solution if and only if $b_1, \ldots, b_p$ are linearly independent.

The Ornstein–Uhlenbeck model and the Cox–Ingersoll–Ross model are special cases of this setup. Several authors have studied inference for these models, see Bibby & Sørensen (1995), Gourieroux et al. (1993), Kessler (2000), Jacobsen (2000), Overbeck & Rydén (1997), and Pedersen (1995), for example. From now on, we consider equidistant observations, $\Delta_i \equiv \Delta$.
Example 1. The Ornstein–Uhlenbeck process. Let $X$ be the solution to

$$dX_t = \theta X_t\,dt + dW_t, \qquad X_0 = x_0,$$

where $\theta < 0$. The estimator is given by $\hat\theta_n = -n/(2\sum_{i=1}^n X_{(i-1)\Delta}^2)$. Since $h^\circ$ is an eigenfunction for $A_\theta$, the simple estimating functions corresponding to $f = h^\circ$ and $f = f^\circ$ are proportional (and hence provide the same estimator). Kessler (2000) showed that, for all $\Delta$, $\hat\theta_n$ has the least asymptotic variance among estimators obtained from simple estimating equations. This also follows from results in Jacobsen (2000).
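As a quick numerical check of Example 1, the Ornstein–Uhlenbeck process can be simulated exactly, since its transition is Gaussian AR(1), and the explicit estimator evaluated. This sketch uses parameter values and a seed of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, Delta, n = -1.0, 1.0, 500

# Exact simulation: the OU transition over a step Δ is Gaussian with
# mean e^{θΔ} x and variance (e^{2θΔ} - 1)/(2θ).
a = np.exp(theta0 * Delta)
v = (np.exp(2 * theta0 * Delta) - 1) / (2 * theta0)
X = np.empty(n + 1)
X[0] = 0.0
for i in range(n):
    X[i + 1] = a * X[i] + np.sqrt(v) * rng.normal()

# Explicit estimator from F_n(θ) = 0:  θ̂_n = -n / (2 Σ X²_{(i-1)Δ})
theta_hat = -n / (2 * np.sum(X[:-1] ** 2))
print(theta_hat)   # should land close to θ0 = -1 at this sample size
```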
Example 2. The Cox–Ingersoll–Ross process. Consider the solution $X$ to

$$dX_t = (\alpha + \beta X_t)\,dt + \sqrt{X_t}\,dW_t, \qquad X_0 = x_0,$$

where $\beta < 0$ and $\alpha \ge 1/2$. The estimating function $F_n$ is given by

$$F_n(\alpha, \beta) = \begin{pmatrix} (\alpha - \tfrac{1}{2})\sum_{i=1}^n 1/X_{(i-1)\Delta} + n\beta \\[2pt] \beta \sum_{i=1}^n X_{(i-1)\Delta} + n\alpha \end{pmatrix}.$$
To see how the estimator performs we have compared it to three other estimators in a simulation study. We have simulated 500 processes on the interval [0, 500] by the Euler scheme with time-step 1/1000. The number of observations is $n = 500$ and $\Delta = 1$. For each simulation we have calculated four estimators: those obtained from $F_n$, $R_n$ given by (5), $H_n = \sum_{i=1}^n h^\circ(X_{t_{i-1}}, \cdot)$, and the martingale estimating function suggested by Bibby & Sørensen (1995).

The estimating function $H_n$ is given by

$$H_n(\theta) = 2\begin{pmatrix} \sum_{i=1}^n \log X_{(i-1)\Delta} - n\Psi(2\alpha) + n\log(-2\beta) \\[2pt] \sum_{i=1}^n X_{(i-1)\Delta} + n\alpha/\beta \end{pmatrix}$$

where $\Psi$ is the Digamma function, $\Psi = \partial \log\Gamma/\partial x$. Note that the second coordinates of $F_n$ and $H_n$ are equivalent and that $H_n(\theta) = 0$ cannot be solved explicitly.
The empirical means and standard errors of the four estimators are listed in Table 1. The true parameter values are $\alpha_0 = 10$ and $\beta_0 = -1$.

The estimator from $R_n$ is biased (as we knew). The estimating functions $F_n$ and $H_n$ seem to be almost equally good and they are both better than the martingale estimating function.
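Because the second coordinate of $F_n$ gives $\alpha = -\beta \sum X_{(i-1)\Delta}/n$, substituting into the first coordinate yields an explicit solution of $F_n(\alpha, \beta) = 0$. A sketch of the simulation design (we use a coarser Euler step of 1/100 instead of the paper's 1/1000 to keep the sketch fast; seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha0, beta0, Delta, n = 10.0, -1.0, 1.0, 500
m = 100                          # Euler sub-steps per observation interval
h = Delta / m

x = alpha0 / -beta0              # start at the stationary mean -α/β = 10
X = np.empty(n + 1)
X[0] = x
for i in range(n):
    for _ in range(m):
        x += (alpha0 + beta0 * x) * h + np.sqrt(max(x, 0.0) * h) * rng.normal()
    X[i + 1] = x

# Solve F_n(α, β) = 0: the second coordinate gives α = -β Σ X / n; inserting
# this into the first coordinate yields an explicit β.
S = np.sum(X[:-1])
S_inv = np.sum(1.0 / X[:-1])
beta_hat = (S_inv / 2) / (n - S * S_inv / n)
alpha_hat = -beta_hat * S / n
print(alpha_hat, beta_hat)       # should land near (α0, β0) = (10, -1)
```

By the arithmetic–harmonic mean inequality $S \cdot S_{\mathrm{inv}} \ge n^2$, so the denominator is non-positive and $\hat\beta_n < 0$ automatically.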
Finally, we consider an example where the parameter of interest enters as an exponent in the drift function.

Example 3. A generalized Cox–Ingersoll–Ross model. Let $X$ be the solution to

$$dX_t = (\alpha + \beta X_t^\theta)\,dt + \sqrt{X_t}\,dW_t$$

where $\alpha \ge \tfrac{1}{2}$ and $\beta < 0$ are known, and $\theta > 0$ is the unknown parameter. Note that $X$ is a Cox–Ingersoll–Ross process if $\theta = 1$; for $\theta \ne 1$ the mean reverting force is stronger or weaker.
From (11) it follows that $f^\circ = (A_\theta h^\circ)/2$ is given by
Table 1. Empirical means and standard errors for 500 realizations of various estimators for $(\alpha, \beta)$ in the Cox–Ingersoll–Ross model. The number of observations is $n = 500$ and $\Delta = 1$. The true value is $(\alpha_0, \beta_0) = (10, -1)$.

| Estimating function | $\hat\alpha_n$ mean | $\hat\alpha_n$ s.e. | $\hat\beta_n$ mean | $\hat\beta_n$ s.e. |
|---|---|---|---|---|
| $F_n$ | 10.1271 | 0.7218 | -1.0126 | 0.0737 |
| $H_n$ | 10.1543 | 0.7151 | -1.0154 | 0.0729 |
| $R_n$ | 6.3691 | 0.4279 | -0.6368 | 0.0430 |
| Martingale | 10.2000 | 1.1900 | -1.0200 | 0.1200 |
$$f^\circ(x, \theta) = \beta x^{\theta-1} \log x \left(\alpha + \beta x^\theta + \frac{\theta}{2} - \frac{1}{2}\right) + \frac{1}{2}\beta x^{\theta-1}.$$
The estimating equation must be solved numerically. For comparison we have also considered the simple estimating function corresponding to

$$\tilde f(x, \theta) = x - E_\theta^\mu X_0 = x - \frac{\Gamma((2\alpha + 1)/\theta)}{\Gamma(2\alpha/\theta)} \left(-\frac{\theta}{2\beta}\right)^{1/\theta}. \tag{12}$$
As above, we have simulated 500 processes by means of the Euler scheme; $n = 500$ and $\Delta = 1$. The true value of $\theta$ is $\theta_0 = 1.5$, and $\alpha = 2$, $\beta = -1$. The means (standard errors) of the estimators are 1.5028 (0.0508) when using $F_n$ and 1.5001 (0.0590) when using $\sum \tilde f(X_{(i-1)\Delta}, \cdot)$. Both estimators are very precise. The estimator obtained from (12) is closer to the true value but has larger standard error than the estimator obtained from $F_n$.
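Solving $\sum_i f^\circ(X_{(i-1)\Delta}, \theta) = 0$ here is a one-dimensional root search; any standard routine will do. A bisection sketch of our own (the bracket, the tiny hypothetical data sample and all names are illustrations, not from the paper; with so few points the resulting value only demonstrates the numerics, not the estimator's accuracy):

```python
import numpy as np

def f_circ(x, theta, alpha=2.0, beta=-1.0):
    """f° = (A_θ h°)/2 for the generalized CIR drift α + βx^θ, diffusion √x."""
    return (beta * x ** (theta - 1) * np.log(x)
            * (alpha + beta * x ** theta + (theta - 1) / 2)
            + 0.5 * beta * x ** (theta - 1))

def estimate_theta(X, lo=0.5, hi=3.0, iters=60):
    """Solve Σ f°(X_{(i-1)Δ}, θ) = 0 by bisection (the common Δ cancels)."""
    F = lambda th: np.sum(f_circ(X[:-1], th))
    assert F(lo) * F(hi) < 0, "no sign change in the bracket"
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(lo) * F(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# A hypothetical sample around the stationary level; in the paper the data
# come from an Euler-simulated path with n = 500.
X = np.array([1.2, 1.6, 1.8, 1.5, 1.4, 1.7, 2.0, 1.3, 1.6])
theta_hat = estimate_theta(X)
```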
6. Multi-dimensional processes

So far, we have only studied one-dimensional diffusion processes. In this section we discuss to what extent the ideas carry over to the multi-dimensional case.

We consider a $d$-dimensional stochastic differential equation

$$dX_t = b(X_t, \theta)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 = x_0. \tag{13}$$

The parameter $\theta$ is still $p$-dimensional, $\theta \in \Theta \subseteq \mathbb{R}^p$, but $X$ and $W$ are now $d$-dimensional. The functions $b: \mathbb{R}^d \times \Theta \to \mathbb{R}^d$ and $\sigma: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ are known, $\sigma(x)$ is regular for all $x \in \mathbb{R}^d$, and $x_0 \in \mathbb{R}^d$.
Let $\dot b$ be the $d \times p$ matrix of derivatives, $\dot b_{ij} = \partial b_i/\partial\theta_j$, and let $D_i g = \partial g/\partial x_i$ and $D_{ij}^2 g = \partial^2 g/\partial x_i\,\partial x_j$ for functions $g: \mathbb{R}^d \times \Theta \to \mathbb{R}$ with $g(\cdot, \theta)$ in $C^2(\mathbb{R}^d)$. With this notation, the analogue of $A_\theta$ is given by

$$A_\theta g(x, \theta) = \sum_{i=1}^d b_i(x, \theta)\,D_i g(x, \theta) + \frac{1}{2}\sum_{i,j=1}^d (\sigma(x)\sigma^T(x))_{ij}\,D_{ij}^2 g(x, \theta),$$

and the score process is given by

$$S_t^c(\theta) = \int_0^t \dot b^T(X_s, \theta)\,\Sigma(X_s)\,dX_s - \int_0^t \dot b^T(X_s, \theta)\,\Sigma(X_s)\,b(X_s, \theta)\,ds,$$

where $\Sigma(x) = (\sigma(x)\sigma^T(x))^{-1}$, see Liptser & Shiryayev (1977). Using (13) we find $dS_t^c(\theta) = \dot b^T(X_t, \theta)\,(\sigma^{-1})^T(X_t)\,dW_t$.
Now, similarly to (7) we look for functions $h_1, \ldots, h_p: \mathbb{R}^d \times \Theta \to \mathbb{R}$ such that for each $k$, $dh_k(X_t, \theta) = A_\theta h_k(X_t, \theta)\,dt + 2\,dS_{k,t}^c(\theta)$. Arguing as above, this leads to the equations

$$D_i h_k(x, \theta) = 2\,[\dot b^T(x, \theta)\,\Sigma(x)]_{ki} = 2\sum_{r=1}^d \dot b_{rk}(x, \theta)\,\Sigma_{ir}(x), \qquad i = 1, \ldots, d, \tag{14}$$

and thus

$$A_\theta h_k = 2\sum_{i,r=1}^d \dot b_{rk}\,\Sigma_{ir}\,b_i + \sum_{j=1}^d \frac{\partial \dot b_{jk}}{\partial x_j} + \sum_{i,j,r=1}^d (\sigma\sigma^T)_{ij}\,\dot b_{rk}\,\frac{\partial \Sigma_{ir}}{\partial x_j}. \tag{15}$$
The equations (14) may, however, not have solutions; differentiation w.r.t. $x_j$ yields

$$D_{ij}^2 h_k = 2\sum_{r=1}^d \left(\frac{\partial^2 b_r}{\partial\theta_k\,\partial x_j}\,\Sigma_{ir} + \frac{\partial b_r}{\partial\theta_k}\,\frac{\partial \Sigma_{ir}}{\partial x_j}\right),$$

but the r.h.s. is not necessarily symmetric w.r.t. $i$ and $j$, see the example below.

If there are solutions, then (15) has expectation zero and the simple estimating function with $f = (A_\theta h_1, \ldots, A_\theta h_p)^T$ may be used. Otherwise, the r.h.s. of (15) is typically biased.
Example 4. Homogeneous Gaussian diffusions. Let $B$ be a $2 \times 2$ matrix with eigenvalues with strictly negative real parts, and let $A$ be an arbitrary $2 \times 1$ matrix. Consider the stochastic differential equation

$$dX_t = (A + BX_t)\,dt + \sigma\,dW_t, \qquad X_0 = x_0,$$

where $\sigma > 0$ is known, $W$ is a two-dimensional Brownian motion, and $x_0 \in \mathbb{R}^2$.
Solutions to all the equations (14) exist if and only if $B$ is symmetric. Let $\alpha_1, \alpha_2, \beta_{11}, \beta_{22}, \beta_{12}$ denote the entries of $A$ and $B$, and let $S_1 = \sum X_{1,(i-1)\Delta}$, $S_2 = \sum X_{2,(i-1)\Delta}$, $S_{11} = \sum X_{1,(i-1)\Delta}^2$, $S_{22} = \sum X_{2,(i-1)\Delta}^2$, and $S_{12} = \sum X_{1,(i-1)\Delta} X_{2,(i-1)\Delta}$; all sums are from 1 to $n$. Then the estimating equation is given by

$$\frac{1}{\sigma^2}\begin{pmatrix} n & 0 & S_1 & 0 & S_2 \\ 0 & n & 0 & S_2 & S_1 \\ S_1 & 0 & S_{11} & 0 & S_{12} \\ 0 & S_2 & 0 & S_{22} & S_{12} \\ S_2 & S_1 & S_{12} & S_{12} & S_{11} + S_{22} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \beta_{11} \\ \beta_{22} \\ \beta_{12} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -n/2 \\ -n/2 \\ 0 \end{pmatrix}.$$
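The five-dimensional linear system can be assembled and solved directly from the observed sums. The sketch below is ours: a fine Euler grid stands in for the paper's exact simulation, and the step size, seed and variable names are our choices.

```python
import numpy as np

rng = np.random.default_rng(3)
A0 = np.array([4.0, 1.0])
B0 = np.array([[-2.0, 1.0], [1.0, -3.0]])   # symmetric, eigenvalues < 0
sigma = np.sqrt(2.0)
n, Delta, m = 500, 1.0, 100
h = Delta / m

# Euler scheme for dX = (A + B X)dt + σ dW, started at the stationary mean.
x = np.linalg.solve(B0, -A0)
X = np.empty((n + 1, 2))
X[0] = x
for i in range(n):
    for _ in range(m):
        x = x + (A0 + B0 @ x) * h + sigma * np.sqrt(h) * rng.normal(size=2)
    X[i + 1] = x

# Sums over the observations X_{(i-1)Δ}, i = 1, ..., n
x1, x2 = X[:-1, 0], X[:-1, 1]
S1, S2 = x1.sum(), x2.sum()
S11, S22, S12 = (x1 ** 2).sum(), (x2 ** 2).sum(), (x1 * x2).sum()

# Linear estimating equation for (α1, α2, β11, β22, β12)
M = np.array([
    [n,  0,  S1,  0,   S2],
    [0,  n,  0,   S2,  S1],
    [S1, 0,  S11, 0,   S12],
    [0,  S2, 0,   S22, S12],
    [S2, S1, S12, S12, S11 + S22],
])
rhs = sigma ** 2 * np.array([0, 0, -n / 2, -n / 2, 0])
a1, a2, b11, b22, b12 = np.linalg.solve(M, rhs)
print(a1, a2, b11, b22, b12)    # typically lands near the entries of A0 and B0
```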
We have simulated 500 processes (by exact simulation), each of length 500 with $\Delta = 1$ and $\sigma = \sqrt{2}$. The true matrices are

$$A_0 = \begin{pmatrix} 4 \\ 1 \end{pmatrix} \quad\text{and}\quad B_0 = \begin{pmatrix} -2 & 1 \\ 1 & -3 \end{pmatrix}.$$

The means and the standard errors (to the right) are

$$\hat A_n = \begin{pmatrix} 4.0349 \\ 1.0035 \end{pmatrix} \begin{pmatrix} 0.2904 \\ 0.2891 \end{pmatrix} \quad\text{and}\quad \hat B_n = \begin{pmatrix} -2.0155 & 1.0078 \\ 1.0078 & -3.0247 \end{pmatrix} \begin{pmatrix} 0.1248 & 0.1177 \\ 0.1177 & 0.1978 \end{pmatrix}.$$

The estimators are satisfactory.
Acknowledgements
I am grateful to my supervisor Martin Jacobsen for valuable discussions and suggestions,
and to Jens Lund, Martin Richter and the referees for useful comments on an earlier
version of the manuscript.
References

Aït-Sahalia, Y. (1998). Maximum likelihood estimation of discretely sampled diffusions: a closed-form approach. Revised version of Working Paper 467, Graduate School of Business, University of Chicago.
Bibby, B. M. & Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1, 17–39.
Florens-Zmirou, D. (1989). Approximate discrete-time schemes for statistics of diffusion processes. Statistics 20, 547–557.
Gallant, A. R. & Tauchen, G. (1996). Which moments to match? Econom. Theory 12, 657–681.
Gourieroux, C., Monfort, A. & Renault, E. (1993). Indirect inference. J. Appl. Econom. 8, 85–118.
Hansen, L. P. & Scheinkman, J. A. (1995). Back to the future: generating moment implications for continuous-time Markov processes. Econometrica 63, 767–804.
Jacobsen, M. (2001). Discretely observed diffusions: classes of estimating functions and small Δ-optimality. Scand. J. Statist. 28, 123–149.
Karatzas, I. & Shreve, S. E. (1991). Brownian motion and stochastic calculus, 2nd edn. Springer-Verlag, New York.
Kessler, M. (2000). Simple and explicit estimating functions for a discretely observed diffusion process. Scand. J. Statist. 27, 65–82.
Le Breton, A. (1976). On continuous and discrete sampling for parameter estimation in diffusion type processes. Math. Program. Stud. 5, 124–144.
Liptser, R. S. & Shiryayev, A. N. (1977). Statistics of random processes, Vol. 1. Springer-Verlag, New York.
Overbeck, L. & Rydén, T. (1997). Estimation in the Cox–Ingersoll–Ross model. Econom. Theory 13, 430–461.
Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scand. J. Statist. 22, 55–71.
Sørensen, M. (2000). On asymptotics of estimating functions. Brazilian Journal of Probability, in press.
Received July 1998, in final form February 2000

Helle Sørensen, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark.