The role of initial values in conditional sum-of

QED
Queen’s Economics Department Working Paper No. 1300
The role of initial values in conditional sum-of-squares
estimation of nonstationary fractional time series models
SÃÿren Johansen
University of Copenhagen and CREATES
Morten ßrregaard Nielsen
Queen’s University and CREATES
Department of Economics
Queen’s University
94 University Avenue
Kingston, Ontario, Canada
K7L 3N6
11-2012
The role of initial values in conditional sum-of-squares estimation
of nonstationary fractional time series models
Søren Johanseny
University of Copenhagen
and CREATES
Morten Ørregaard Nielsenz
Queen’s University
and CREATES
February 25, 2015
Abstract
In this paper we analyze the in‡uence of observed and unobserved initial values on
the bias of the conditional maximum likelihood or conditional sum-of-squares (CSS, or
least squares) estimator of the fractional parameter, d, in a nonstationary fractional
time series model. The CSS estimator is popular in empirical work due, at least in part,
to its simplicity and its feasibility, even in very complicated nonstationary models.
We consider a process, Xt , for which data exist from some point in time, which we
call N0 + 1, but we only start observing it at a later time, t = 1. The parameter
(d; ; 2 ) is estimated by CSS based on the model d0 (Xt ) = "t ; t = N +1; : : : ; N +T ,
conditional on X1 ; : : : ; XN . We derive an expression for the second-order bias of d^ as a
function of the initial values, Xt ; t = N0 + 1; : : : ; N , and we investigate the e¤ect on
the bias of setting aside the …rst N observations as initial values. We compare d^ with
an estimator, d^c , derived similarly but by choosing = C. We …nd, both theoretically
and using a data set on voting behavior, that in many cases, the estimation of the
parameter picks up the e¤ect of the initial values even for the choice N = 0.
If N0 = 0, we show that the second-order bias can be completely eliminated by a
simple bias correction. If, on the other hand, N0 > 0, it can only be partly eliminated
because the second-order bias term due to the initial values can only be diminished by
increasing N .
Keywords: Asymptotic expansion, bias, conditional inference, fractional integration,
initial values, likelihood inference.
JEL Classi…cation: C22.
We are grateful to three anonymous referees, the co-editor, and seminar participants at various universities and conferences for comments and to James Davidson for sharing the opinion poll data set with
us. We would like to thank the Canada Research Chairs program, the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Center for Research in Econometric Analysis of Time Series
(CREATES - DNRF78, funded by the Danish National Research Foundation) for research support.
y
Email: [email protected]
z
Corresponding author. Email: [email protected]
1
Initial values in CSS estimation of fractional models
1
2
Introduction
One of the most commonly applied inference methods in nonstationary autoregressive (AR)
models, and indeed in all time series analysis, is based on the conditional sum-of-squares
(CSS, or least squares) estimator, which is obtained by minimizing the sum of squared residuals. The estimator is derived from the Gaussian likelihood conditional on initial values and
is often denoted the conditional maximum likelihood estimator. For example, in the AR(k)
model we set aside k observations as initial values, and conditioning on these implies that
Gaussian maximum likelihood estimation is equivalent to CSS estimation. This methodology
was applied in classical work on ARIMA models by, e.g., Box and Jenkins (1970), and was
introduced for fractional time series models by Li and McLeod (1986) and Robinson (1994),
in the latter case for hypothesis testing purposes. The CSS estimator has been widely applied in the literature, also for fractional time series models. In these models, the initial
values have typically been assumed to be zero, and as remarked by Hualde and Robinson
(2011, p. 3154) a more appropriate name for the estimator may thus be the truncated sumof-squares estimator. Despite the widespread use of the CSS estimator in empirical work,
very little is known about its properties related to the initial values and speci…cally related
to the assumption of zero initial values.
Recently, inference conditional on (non-zero) initial values has been advocated in theoretical work for univariate nonstationary fractional time series models by Johansen and
Nielsen (2010) and for multivariate models by Johansen and Nielsen (2012a)— henceforth
JN (2010, 2012a)— and Tschernig, Weber, and Weigand (2013). In empirical work, these
methods have recently been applied by, for example, Carlini, Manzoni, and Mosconi (2010)
and Bollerslev, Osterrieder, Sizova, and Tauchen (2013) to high-frequency stock market data,
Hualde and Robinson (2011) to aggregate income and consumption data, Osterrieder and
Schotman (2011) to real estate data, and Rossi and Santucci de Magistris (2013) to futures
prices.
In this paper we assume the process Xt exists for t
N0 +1, and we derive the properties
of the process from the model given by the truncated fractional …lter d0N0 (Xt
0 ) = "t with
"t i:i:d:(0; 2 ), for some d0 > 1=2. However, we only observe Xt for t = 1; : : : ; T0 = N + T;
and so we estimate (d; ; 2 ) from the conditional Gaussian likelihood for XN +1 ; : : : ; XN +T
^ Our …rst result is to prove consistency
given X1 ; : : : ; XN , which de…nes the CSS estimator d.
and asymptotic normality of the estimator of d. This is of interest in its own right, not only
because of the usual issue of non-uniform convergence of the objective function, but also
because the estimator of is in fact not consistent when d0 > 1=2. We then proceed to
derive an analytical expression for the asymptotic second-order bias of d^ via a higher-order
stochastic expansion of the estimator. We apply this to investigate the magnitude of the
in‡uence of observed and unobserved initial values, and to discuss the e¤ect on the bias of
setting aside a number of observations as initial values, i.e., of splitting a given sample of
size T0 = N + T into N initial values and T observations for estimation. We compare d^ with
an estimator, d^c , derived from centering the data at C by restricting = C. We …nd, both
theoretically and using a data set on voting behavior as illustration, that in many cases, the
parameter picks up the e¤ect of the initial values even for the choice N = 0.
Finally, in a number of relevant cases, we show that the second-order bias can be eliminated, either partially or completely, by a bias correction. In the most general case, however,
Initial values in CSS estimation of fractional models
3
it can only be partly eliminated, and in particular the second-order bias term due to the
initial values can only be diminished by increasing the number of initial values, N .
In the stationary case, 0 < d < 1=2, there is a literature on Edgeworth expansions of
the distribution of the (unconditional) Gaussian maximum likelihood estimator based on
the joint density of the data, (X1 ; : : : ; XT ) in the model (1). In particular, Lieberman and
Phillips (2004) …nd expressions for the second-order term, from which we can derive the
main term of the bias in that case. We have not found any results on the nonstationary case,
d > 1=2, for the estimator based on conditioning on initial values.
The remainder of the paper is organized as follows. In the next section we present the
fractional models and in Section 3 our main results. In Section 4 we give an application of
our theoretical results to a data set of Gallup opinion polls. Section 5 concludes. Proofs of
our main results and some mathematical details are given in the appendices.
2
The fractional models and their interpretations
A simple model for fractional data is
d
(Xt
) = "t ; " t
i:i:d:(0;
2
);
t = 1; : : : ; T;
(1)
where d
0, 2 R, and 2 > 0. The fractional …lter d X
Pt 1is de…nednin terms of the
u
fractional coe¢ cients n (u) from an expansion of (1 z) = n=0 n (u)z , i.e.,
n (u) =
u(u + 1) : : : (u + n
n!
1)
=
(u + n)
(u) (n + 1)
nu 1
as n ! 1;
(u)
(2)
where (u) denotes the Gamma function and “ ” denotes that the ratio of the left- and
right-hand sides converges to one. More results are collected
Appendix A.
Pin
1
2
For a given value of d such that
P1 0 < d < 1=2, we have n=0 n (d) < 1. In this case,
d
process with a …nite
the in…nite sum Xt =
"t = n=0 n (d)"t n exists as a stationary
P
(
d) = 0.
variance, and gives a solution to equation (1) because d = 1
n=0 n
When d > 1=2, the solution to (1) is nonstationary. In that case, we discuss below two
interpretations of equation (1) as a statistical model. First as an unconditional (joint) model
of the stationary process X1 ; : : : ; XT when 1=2 < d < 3=2, and then as a conditional
model for the nonstationary process XN +1 ; : : : ; XN +T given initial values when d > 1=2. In
the latter case we call Xt an initial value if t N and denote the initial values Xn ; n N;
and we assume, see Section 2.2, that the variables we are measuring started at some point
N0 + 1 in the past, and we truncate the fractional …lter accordingly.
2.1
The unconditional fractional model and its estimation
One approach to the estimation of d from model (1) with nonstationary data is the di¤erenceand-add-back approach based on Gaussian estimation for stationary processes. If we have the
a priori information that 1=2 < d < 3=2, say, then we could transform the data X0 ; : : : ; XT
to XT = ( X1 ; : : : ; XT )0 and note that (1) can be written
d 1
(Xt
) = "t ;
so that Xt is stationary and fractional of order 1=2 < d 1 < 1=2. Note that
= 0; so
the parameter does not enter. To calculate the unconditional Gaussian likelihood function,
Initial values in CSS estimation of fractional models
4
we then need to calculate the T T variance matrix = (d; 2 ) = V ar( XT ), its inverse,
1
, and its determinant, det . This gives the Gaussian likelihood function,
1
log det
2
1
X0T
2
1
XT :
(3)
A general optimization algorithm can then be applied to …nd the maximum likelihood estimator, d^stat , if
can be calculated. This is possible by the algorithm in Sowell (1992).
^
The estimator dstat is not a CSS estimator, which is the class of estimators we study in this
paper, but it was applied by Byers, Davidson, and Peel (1997) and Dolado, Gonzalo, and
Mayoral (2002) in the analysis of the voting data, and by Davidson and Hashimzade (2009)
to the Nile data.
The estimator d^stat was analyzed by Phillips and Lieberman (2004) for true value d0 <
1=2. They derived an asymptotic expansion of the distribution function of T 1=2 (d^stat d0 );
from which a second-order bias correction of the estimator can be derived, see Section 3.2.
In more complicated models than (1), the calculation of
may be computationally
di¢ cult. This is certainly the case in, say, the fractionally cointegrated vector autoregressive
model of JN (2012a). However, even in much simpler models such as the usual autoregressive
model, a conditional approach has been advocated for its computational simplicity, e.g.,
Box and Jenkins (1970), because conditional maximum likelihood estimation simpli…es the
calculation of estimators by reducing the numerical problem to least squares. For this reason,
the conditional estimator has been very widely applied to many models, including (1). For
a discussion and comparison of the numerical complexity of Gaussian maximum likelihood
as in (3) and the CSS estimator, see e.g. Doornik and Ooms (2003).
2.2
The observations and initial values
It is di¢ cult to imagine a situation where fXs gT 1 is available, so that (1) could be applied.
In general, we assume data could potentially be available from some (typically unknown)
time in theP
past, N0 + 1, say. We therefore truncate the …lter at time N0 ; that is, de…ne
t+N0 1
d
n ( d)Xt n , and consider
N0 Xt =
n=0
d
N0 (Xt
) = "t ; t = 1; : : : ; T0 :
(4)
as the model for the data we actually observe, namely Xt for t = 1; : : : ; N + T = T0 . In
practice, when N0 > 0, we do not observe all the data, and so we have to decide how to split
a given sample of size T0 = N + T into (observed) initial values fXn gN
n=1 and observations
T
fXt gt=N +1 to be modeled, and then calculate the likelihood based on the truncated …lter
d
0 , as an approximation to the conditional likelihood based on (4). In the special case with
N0 = 0; the equations in (4) become
X1 =
X2 =
+ "1 ;
1 ( d)X1 +
(5)
+
1(
d) + "2 ;
etc., and can thus be interpreted as the initial mean or level of the observations. Clearly, if
is not included in the model, the …rst observation is X1 = "1 with mean zero and variance
2
. The lag length builds up as more observations become available.
As an example we take (an updated version of) the Gallup poll data from Byers et al.
(1997) to be analyzed in Section 4. The data is monthly from January 1951 to November
Initial values in CSS estimation of fractional models
5
2000 for a total of 599 observations. In this case the data is not available for all t simply
because the Labour party was founded in 1900, and the Gallup company was founded in
1935, and in fact the regular Gallup polls only started in January 1951, which is denoted
N0 + 1.
As a second example, consider the paper by Andersen, Bollerslev, Diebold, and Ebens
(2001) which analyzes log realized volatility for companies in the Dow Jones Industrial
Average from January 2, 1993, to May 28, 1998. For each of these companies there is an
earlier date, which we call N0 + 1, where the company became publicly traded and such
measurements were made for the …rst time. The data analyzed in Andersen et al. (2001) was
not from N0 +1, but only from the later date when the data became available on CD-ROM,
which was January 2, 1993, which we denote t = 1. We thus do not have observations from
N0 + 1 to 0.
We summarize this in the following display, which we think is representative for most, if
not all, data in economics:
:::;X
| {z
N0
}
Data does not exist
;
X
|
N0 +1 ; : : : ; X0
{z
}
Data exists
but is not observed
Thus, we consider estimation of
d
0 (Xt
;
X ;:::;X
| 1 {z N}
Data is observed
(initial values)
;
(6)
X
; : : : ; X T0
| N +1 {z
}
Data is observed
(estimation)
(7)
) = "t ; t = 1; : : : ; T0 ;
as an approximation to model (4). Unlike for (4), the conditional likelihood for (7) can
be calculated based on available data from 1 to T0 . For a fast algorithm to calculate the
fractional di¤erence, see Jensen and Nielsen (2014).
In summary, we use d N0 (Xt
) = "t as the model we would like to analyze. However,
because we only have data for t = 1; : : : ; T0 , we base the likelihood on the model d0 (Xt ) =
"t , for which an approximation to the conditional likelihood from (4) can be calculated with
the available data. We then try to mitigate the e¤ect of the unobserved initial values by
conditioning on X1 ; : : : ; XN .
2.3
The conditional fractional model
Let parameter subscript zero denote true values. In the conditional approach we interpret
equation (4) as a model for Xt given the past Ft 1 = (X N0 +1 ; : : : ; Xt 1 ) and therefore
solve the equation for Xt as a function of initial values, errors, and the initial level, 0 . The
solution to (1) is given in JN (2010, Lemma 1) under the assumption of bounded initial
values, and we give here the solution of (4).
Lemma 1 The solution of model (4) for XN +1 ; : : : ; XT0 ; conditional on initial values Xn ; N0 <
n N , is, for t = N + 1; : : : ; T0 ; given by
Xt =
d0
N "t
d0
N
t+N
0 1
X
n(
d0 )Xt
n
+
d0
t+N0 1 (
N
(8)
d0 + 1) 0 :
n=t N
d
We …nd the conditional mean and variance by writing model (4) as Xt
= (1
d
) + "t . Because (1
) is a function only of the past, we …nd
N0 )(Xt
N0 )(Xt
E(Xt
jFt 1 ) = (1
d
N0 )(Xt
) and V ar(Xt jFt 1 ) = V ar("t ) =
2
:
Initial values in CSS estimation of fractional models
6
As an example we get, for d = 1 and = 0, the well-known result from the autoregressive
model that E(Xt jFt 1 ) = Xt 1 and V ar(Xt jFt 1 ) = 2 . In model (4) this implies that the
prediction error decomposition given Xn ; N0 < n N , is the conditional sum of squares,
T0
X
(Xt E(Xt jFt 1 ))2
=
V
ar(X
t jFt 1 )
t=N +1
2
T0
X
(
d
))2 ;
N0 (Xt
t=N +1
which is used in the conditional Gaussian likelihood function (9) below.
2.4
Estimation of the conditional fractional model
We would like to consider the conditional (Gaussian) likelihood of fXt ; N + 1
given initial values fXn ; N0 + 1 n N g, which is given by
T
log
2
T0
X
1
2
2
2
d
(
N0 (Xt
T0 g
t
))2 :
(9)
t=N +1
If in fact we have observed all available data, such that N0 = 0 as in, e.g., the Gallup poll
data we can use (9) for N0 = 0. More commonly, however, data is not available all the way
back to inception at time N0 + 1, so we consider the situation that the series exists for
t > N0 , but we only have observations for t
1, as in the volatility data example. We
d
d
therefore replace the truncated …lter
0 and suggest using the (quasi) likelihood
N0 by
conditional on fXn ; 1 n N g,
L(d; ;
2
T
log
2
)=
1
2
2
2
T0
X
(
d
0 (Xt
))2 :
(10)
t=N +1
That is, (10) is an approximation to the conditional likelihood (9), where (10) has the
advantage that it can be calculated based on available data from t = 1 to T0 = N + T . It is
clear from (10) that we can equivalently …nd the (quasi) maximum likelihood estimators of
d and by minimizing
T0
1 X
( d (Xt
))2
(11)
L(d; ) =
2 t=N +1 0
with respect to d and .
We …nd from (46) in Lemma A.4 that
d
0 (Xt
)=
d
0 Xt
t 1
X
n(
d) =
d
0 Xt
t 1(
d + 1) =
d
0 Xt
0t (d)
;
n=0
where we have introduced
d + 1). The estimator of
PT0
d
+1 ( 0 Xt ) 0t (d)
^(d) = t=N
;
PT0
2
(d)
0t
t=N +1
0t (d)
=
t 1(
for …xed d is
P 0
2
provided Tt=N
+1 0t (d) > 0. The conditional quasi-maximum likelihood estimator of d can
then be found by minimizing the concentrated objective function
PT0
T0
d
2
1 X
1 ( t=N
+1 ( 0 Xt ) 0t (d))
2
d
( 0 Xt )
;
(12)
L (d) =
PT0
2
2 t=N +1
2
(d)
0t
t=N +1
Initial values in CSS estimation of fractional models
7
P 0
2
which has no singularities at the points where Tt=N
+1 0t (d) = 0, see Theorem 1. Thus,
the conditional quasi-maximum likelihood estimator d^ can be de…ned by
d^ = arg min L (d)
(13)
d2D
for a parameter space D to be de…ned below.
This is a type of conditional-sum-of-squares (CSS) estimator for d. The …rst term of (12)
is standard, and the second takes into account the estimation of the unknown initial level
at the inception of the series at time N0 + P
1.
0
2
For d = d0 and = 0 we …nd, provided Tt=N
+1 0t (d0 ) > 0, that
PT0
t=N +1 "t 0t (d0 )
;
^(d0 )
0 = PT0
2
(d
)
0t
0
t=N +1
P
T
0
2 1
which has mean zero and variance 02 ( t=N
that does not go to zero when
+1 0t (d0 ) )
P
T
2
0
2
d0 > 1=2 because then 0
t=N +1 0t (d0 ) is bounded in T0 , see (59) in Lemma B.1. Thus
we have that, even if d = d0 , ^(d0 ) is not consistent.
In the following we also analyze another estimator, d^c , constructed by choosing to center
the observations by a known value rather than estimating as above. The known value,
say C, used for centering, could be one of the observed initial values, e.g. the …rst one, or
an average of these, or it could be any known constant. This can be formulated as choosing
= C in the likelihood function (10) and de…ning
d^c = arg min Lc (d);
(14)
d2D
T0
1 X
(
Lc (d) =
2 t=N +1
d
0 (Xt
C))2 ;
(15)
which is also a CSS estimator. A commonly applied estimator is the one obtained by not
centering the observations, i.e. by setting C = 0. In that case, an initial non-zero level of
the process is therefore not taken into account.
The introduction of centering and of the parameter , interpreted as the initial level
of the process, thus allows analysis of the e¤ects of centering the observations in di¤erent
ways (and avoid the, possibly unrealistic, phenomenon described immediately after (5) when
^ where the initial
= 0). We analyze the conditional maximum likelihood estimator, d,
level is estimated by maximum likelihood jointly with the fractional parameter, and we also
analyze the more traditional CSS estimator, d^c , where the initial level is “estimated” using
a known value C, e.g. zero or the …rst available observation, X1 .
In practice we split a given sample of size T0 = N + T into (observed) initial values
N +T
fXn gN
n=1 and observations fXt gt=N +1 to be modeled, and then calculate the likelihood (12)
based on the truncated …lter d0 as an approximation to the model (4) starting at N0 + 1.
In order to discuss the error implied by using this choice in the likelihood function, we
derive in Theorem 2 a computable expression for the asymptotic second-order bias term in
the estimator of d via a higher-order stochastic expansion of the estimator. This bias term
depends on all observed and unobserved initial values and the parameters. In Corollary 1
and Theorems 3 and 4 we further investigate the e¤ect on the bias of setting aside the data
from t = 1 to N as initial values.
Initial values in CSS estimation of fractional models
2.5
8
A relation to the ARFIMA model
The simple model (1) is a special case of the well-known ARFIMA model,
A(L)
d
Xt = B(L)"t ; t = 1; : : : ; T;
where A(L) and B(L) depend on a parameter vector and B(z) 6= 0 for jzj
model, the conditional likelihood depends on the residuals
"t (d; ) = B(L) 1 A(L)
d
d
Xt = b( ; L)
1. For this
Xt ;
and when b( ; L) = 1 we obtain model (1) as a special case.
For the ARFIMA model the analysis would depend on the derivatives of the conditional
likelihood function, which would in turn be functions of the derivatives of the residuals.
Again, to focus on estimation of d we consider the remaining parameter
…xed at the
true value 0 . For a function f (d) we denote the derivative of f with respect to d as
@
f (d) (Euler’s notation), and the relevant derivatives are
Df (d) = @d
Dm "t (d; )jd0 ;
0
= b(
m
0 ; L)D
d
Xt jd0 = (log )m b(
0 ; L)
d0
Xt = (log )m "t :
Thus, for this more general model, the derivatives of the conditional likelihood with respect
to d, when evaluated at the true values, are identical to those of the residuals from the simpler
model (1). We can therefore apply the results from the simpler model more generally, but
only if we know the parameter 0 . If has to be estimated, the analysis becomes much
more complicated. We therefore focus our analysis on the simple model.
3
Main results
Our main results hold only for the true value d0 > 1=2, that is, for nonstationary processes,
which is therefore assumed in the remainder of the paper. However, we maintain a large
compact parameter set D for d in the statistical model, which does not assume a priori
knowledge that d0 > 1=2, see Assumption 2.
3.1
First-order asymptotic properties
The …rst-order asymptotic properties of the CSS estimators d^ and d^c derived from the likelihood functions L (d) and Lc (d) in (12) and (15), respectively, are given in the following
theorem, based on results of JNP(2012a) and Nielsen (2015). To describe the results, we use
s
Riemann’s zeta function, s = 1
j=1 j ; s > 1, and speci…cally
2
=
1
X
j=1
2
j
2
=
6
' 1:6449 and
3
=
1
X
j
j=1
3
' 1:2021:
(16)
We formulate two assumptions that will be used throughout.
Assumption 1 The errors "t are i.i.d.(0;
2
0)
with …nite fourth moment.
Assumption 2 The parameter set for (d; ; 2 ) is D R R+ , where D = [d; d]; 0 < d <
d < 1. The true value is (d0 ; 0 ; 02 ), where d0 > 1=2 is in the interior of D.
Theorem 1 Let the model for the data Xt ; t = 1; : : : ; N + T; be given by (4) and let Assumptions 1 and 2 be satis…ed. Then the functions L (d) in (12) and Lc (d) in (15) have no
singularities forpd > 0, and the estimators d^ and d^c derived from L (d) and Lc (d), respectively, are both T -consistent and asymptotically distributed as N (0; 2 1 ).
Initial values in CSS estimation of fractional models
3.2
9
Higher-order expansions and asymptotic bias
To analyze the asymptotic bias of the CSS estimators for d, and in particular how initial
values in‡uence the bias, we need to examine higher-order terms in a stochastic expansion
of the estimators, see Lawley (1956). The conditional (negative pro…le log) likelihoods L (d)
and Lc (d) are given in (12) and (15). We …nd, see Lemma B.4, that the derivatives satisfy
DL (d0 ) = OP (T 1=2 ), D2 L (d0 ) = OP (T ), and D3 L (d) = OP (T ) uniformly in a neighborhood
^ = 0 around d0 gives
of d0 , and a Taylor series expansion of DL (d)
1
d0 )D2 L (d0 ) + (d^
2
^ = DL (d0 ) + (d^
0 = DL (d)
d0 )2 D3 L (d );
P
where d is an intermediate value satisfying jd
d0 j
jd^ d0 j ! 0. We then insert
1=2 ~
1~
3=2
^
~
d d0 = T
G1T + T G2T + OP (T
) and …nd G1T = T 1=2 DL (d0 )=D2 L (d0 ) and
~ 2T = 1 T (DL (d0 ))2 D3 L (d )=(D2 L (d0 ))3 , which we write as
G
2
T 1=2 (d^
1=2
DL (d0 )
1 D2 L (d )
0
T
T
d0 ) =
1
T
2
1=2
(
T
T
1=2
DL (d0 ) 2 T
)
1 D2 L (d )
T
0
1
D3 L (d )
+ OP (T
1 D2 L (d )
0
1
): (17)
Based on this expansion, we …nd another expansion T 1=2 (d^ d0 ) = G1T + T 1=2 G2T +
D
oP (T 1=2 ) with the property that (G1T ; G2T ) ! (G1 ; G2 ) and E(G1T ) = E(G1 ) = 0. Then
the zero- and …rst-order terms of the bias are zero, and the second-order asymptotic bias
term is de…ned as T 1 E(G2 ).
^ In order to formulate
We next present the main result on the asymptotic bias of d.
the results, we de…ne some coe¢ cients that depend on N; N0 ; T , and on initial values and
( 0 ; 02 ; d) (we suppress some of these dependencies for notational convenience),
0t (d)
0
X
=
t n(
d)(Xn
(18)
0 );
n= N0 +1
1t (d)
=
tX
1 N
k
=
t 1(
N
X
t n k(
d)(Xn
d + 1); and
0)
N
X
D
t n(
d)(Xn
0 );
(19)
n=1
n= N0 +1
k=1
0t (d)
1
1t (d)
=
D
t 1(
(20)
d + 1):
PT0
2
For two sequences fut ; vt g1
t=1 , we de…ne the product moment hu; viT = 0
t=N +1 ut vt ,
see e.g. Lemma B.1. The main contributions to the bias are expressed for d = d0 in terms of
h 0 ; 0 iT
h 0 ; 0 i2T
(h 0 ; 1 iT + h 1 ; 0 iT ) +
h 1;
h 0 ; 0 iT
h 0 ; 0 i2T
C
2
(C
0 )(h 0 ; 1 iT + h 1 ; 0 iT ) + (C
0) h 1;
N;T (d) = h 0 ; 1 iT
X
2
(t s) 1 t ( d + 1) s ( d + 1)=h 0 ; 0 iT :
N;T (d) = 0
N;T (d) = h 0 ;
1 iT
0 iT ;
(21)
0 iT ;
(22)
(23)
N s<t N +T 1
Note that (21)–(23) are all invariant to scale because of the normalization by 02 . Also note
that, even if h 0 ; 0 iT = 0, the ratio h 0 ; 0 iT =h 0 ; 0 iT as well as N;T (d) are well de…ned,
see Theorem 1 and Appendix C.1.
Initial values in CSS estimation of fractional models
10
Theorem 2 Let the model for the data Xt ; t = 1; : : : ; N + T; be given by (4) and let Assumptions 1 and 2 be satis…ed. Then the asymptotic biases of d^ and d^c are
^ =
bias(d)
bias(d^c ) =
where limT !1 j
N;T (d0 )j
(T 2 ) 1 [3
3 2
(T 2 ) 1 [3
3 2
< 1, limT !1 j
1
+
N;T (d0 )
1
+
C
N;T (d0 )]
N;T (d0 )j
+
N;T (d0 )]
+ o(T
1
+ o(T
1
(24)
);
(25)
);
< 1, and limT !1 j
C
N;T (d0 )j
< 1.
The leading bias terms in (24) and (25) are of the same order of magnitude in T , namely
O(T 1 ). First, the …xed term, 3 3 2 2 , derives from correlations of derivatives of the likelihood
and does not depend on initial values or d0 . The second term in (24), N;T (d0 ), is a function
of initial values and d0 , and can be made smaller by including more initial values (larger N )
as shown in Corollary 1 below. The third term in (24), N;T (d0 ), only depends on (N; T; d0 ).
C
If we center the data by C, and do not correct for , we get the term N;T
(d0 ) in (25).
However, if we estimate we get N;T (d0 ) + N;T (d0 ) in (24), where N;T (d0 ) is independent
of initial values and only depends on (N; T; d0 ). The coe¢ cients 0t (d) and 1t (d) are linear
C
(d) are quadratic in initial
in the initial values, and hence the bias terms N;T (d) and N;T
values scaled by 0 .
The …xed bias term, 3 3 2 2 , is the same as the bias derived by Lieberman and Phillips
(2004) for the estimator d^stat , based on the unconditional likelihood (3) in the stationary
1=2
case, 0 < d0 < 1=2. They showed that the distribution function of 2 T 1=2 (d^stat d0 ) is
FT (x) = P (
1=2 1=2 ^
(dstat
2 T
d0 )
x) = (x) + T
1=2
3=2
3 2
(x)(2 + x2 ) + O(T
1
);
where (x) and (x) denote the standard normal distribution and density functions, respectively. Using D (x)(2 + x2 ) =
(x)x3 , we …nd that an approximation to the expectation
1=2
of 2 T 1=2 (d^stat d0 ), based on the …rst two terms, is given by
Z
3=2
3=2
1=2
T
x (x)x3 dx = T 1=2 3 3 2 ;
3 2
which shows that the second-order bias of d^stat , derived for 0 < d0 < 1=2, is the same as the
the second-order …xed bias term of d^ derived for d0 > 1=2 in Theorem 2.
The dependence of the bias in Theorem 2 on the number of observed initial values, N ,
is explored in the following corollary.
Corollary 1 Under the assumptions of Theorem 2, we obtain the following bounds for the
components of the bias terms for d^ and d^c when d > 1=2,
max(j
C
N;T (d)j; j N;T (d)j)
c(1 + N )
min(d;2d 1)+
for any 0 < < min(d; 2d
1):
(26)
The result in Corollary 1 shows how the bias term arising from not observing all initial
values decays as a function of the number of observed values set aside as initial values, N .
More generally, the results in this section shows that a partial bias correction is possible.
^ the second-order bias in
That is, by adding the terms (T 2 ) 1 3 3 2 1 and (T 2 ) 1 N;T (d),
d^ and d^c can be partly eliminated, but the bias due to (T 2 ) 1 N;T (d0 ) can only be made
smaller by increasing N .
Initial values in CSS estimation of fractional models
11
A di¤erent type of bias correction was used by Davidson and Hashimzade (2009, eqn.
(4.4)) in an analysis of the Nile data. They considered the CSS estimator when all initial
values are set to zero in the stationary case. To capture the e¤ect of the left-out initial
values, they introduce a few extra regressors that are found
components
P as the …rst principal
n
of the variance matrix of the n = 150 variables x = f 1
(
d)X
g
.
s k s=1
k=s k
3.3
Further results for special cases
C
The expressions for N;T (d), N;T
(d), and N;T (d) in (21)–(23) show that they depend on
C
(N; T; d) and, in the case of N;T (d) and N;T
(d), also on all initial values. In order to get an
impression of this dependence, we derive simple expressions for various special cases.
C
First, when d is an integer, we …nd simple results for N;T (d), N;T
(d), and N;T (d), and
hence the asymptotic bias, as follows.
C
N;T (d)
Theorem 3 Under the assumptions of Theorem 2 it holds that
following two cases:
(i) If d = k for an integer k such that 1 k N ,
(ii) If d = 1 and N 0.
In either case, the asymptotic biases of d^ and d^c are given by
^ =
bias(d)
bias(d^c ) =
(iii) If d0 = N +1 then
N;T (d0 )
(T 2 ) 1 (3
(T 2 ) 1 3
3 2
1
3 2
1
+
N;T (d0 ))
+ o(T
^ =
= 0 and bias(d)
1
+ o(T
1
=
N;T (d)
= 0 in the
);
):
(T 2 ) 1 (3
3 2
1
+
N;T (N
+1))+o(T
1
):
It follows from Theorem 3(i) that for d = 1 we need one initial value (N
1) and for
C
d = 2 we need two initial values (N 2), etc., to obtain N;T
(d) = N;T (d) = 0. Alternatively,
for d0 = 1, Theorem 3(ii) shows that there will be no contribution from initial values to the
second-order asymptotic bias even if N = 0, and Theorem 3(iii) shows that when N = 0, it
^ = (T 2 ) 1 3 3 1 +o(T 1 ). Since the bias term
also holds that 0;T (1) = 0 such that bias(d)
2
is continuous in d0 , the same is approximately true for a (small) neighborhood of d0 = 1.
Note that the results in Theorem 3 show that in the cases considered, the estimators d^
and d^c can be bias corrected to have second-order bias equal to zero.
We …nally consider the special case with N0 = 0, where all available data is observed.
We use the notation (d) = D log (d) to denote the Digamma function.
Theorem 4 If N0 = 0 and N
^ =
bias(d)
bias(d^c ) =
where
N;T (d0 )
0 then
(T 2 ) 1 [3
3 2
(T 2 ) 1 [3
3 2
is de…ned in (23) and
C
N;T (d0 )
=
N;T (d0 )
(C
C
N;T (d0 )
= 0 and the biases of d^ and d^c are given by
1
+
N;T (d0 )]
+ o(T
1
);
(27)
1
+
C
N;T (d0 )]
+ o(T
1
);
(28)
1 iT :
(29)
simpli…es to
0 )h 0 ; 1 iT
+ (C
2
0) h 0;
In particular, for N0 = N = 0 we get the analytical expressions
^ =
bias(d)
(T 2 ) 1 [3
3 2
bias(d^c ) =
(T 2 ) 1 [3
3 2
1
1
( (2d0 1)
(d0 ))] + o(T
2
2d0 2
(C
0)
( (2d0
2
d0 1
0
1
(30)
);
1)
(d0 ))] + o(T
1
): (31)
Initial values in CSS estimation of fractional models
12
It follows from Theorem 4 that if we have observed all possible data, that is N0 = 0,
then we get a bias of d^ in (27) and of d^c in (28) and (29). The bias of d^ comes from the
estimation of and the bias of d^c depends on the distance C
0.
^
With N0 = 0 as in Theorem 4, we note that the biases of d and d^c do not depend on
unobserved initial values. It follows that (27) can be used to bias correct the estimator d^
and (28) to bias correct the estimator d^c . For d^ this bias correction gives a second-order bias
of zero, but for d^c the correction is only partial due to (29).
Although the asymptotic bias of d^ is of order O(T 1 ), we note that the asymptotic
standard deviation of d^ is (T 2 ) 1=2 , see Theorem 1. That is, for testing purposes or for
calculating con…dence intervals for d0 , the relevant quantity is in fact the bias relative to the
asymptotic standard deviation, and this is of order O(T 1=2 ). To quantify the distortion of
the quantiles (critical values), we therefore focus on the magnitude of the relative bias, for
which we obtain the following corollary by tabulation.
Corollary 2 Letting T0 = N + T be …xed and tabulating the relative bias,
^ =
(T 2 )1=2 bias(d)
((T0
N ) 2)
1=2
[3
3 2
1
+
N;T0 N (d0 )];
see (27), for N = 0; : : : ; T0 2 and d0 > 1=2, the minimum value is attained for N = 0.
Thus, we achieve the smallest relative (and also absolute) bias of d^ by choosing N = 0.
4
Application to Gallup opinion poll data
As an application and illustration of the results, we consider the monthly Gallup opinion
poll data on support for the Conservative and Labour parties in the United Kingdom. They
cover the period from January 1951 to November 2000, for a total of 599 months. The two
series have been logistically transformed, so that, if Yt denotes an observation on the original
series, it is mapped into Xt = log(Yt =(100 Yt )). A shorter version of this data set was
analyzed by Byers et al. (1997) and Dolado et al. (2002), among others.
Using an aggregation argument and a model of voter behavior, Byers et al. (1997) show
that aggregate opinion poll data may be best modeled using fractional time series methods.
The basic …nding of Byers et al. (1997) and Dolado et al. (2002) is that the ARFIMA(0,d,0)
model, i.e. model (1), appears to …t both data series well and they obtain values of the
integration parameter d in the range of 0.6–0.8.
4.1
Analysis of the voting data
In light of the discussion in Section 2.2, we work throughout under the assumption that Xt
was not observed prior to January 1951 because the data series did not exist, and we truncate
the …lter correspondingly, i.e., we consider model (4). Because we observe all available data,
we estimate d (and ; ) by the estimator d^ setting N = N0 = 0 and take T = 599 following
Theorem 4 and Corollary 2.
The results are presented in Table 1. Since we have assumed that N = N0 = 0, we can
bias correct the estimator using (30) in Theorem 4, and the resulting estimate is reported in
Table 1 as d^bc . Two conclusions emerge from the table. First, the estimates of d (and ) are
quite similar for the two data series, but the estimates of are quite di¤erent. Second, the
bias corrections to the estimates are small. More generally, the estimates obtained in Table
1 are in line with those from the literature cited above.
Initial values in CSS estimation of fractional models
13
Table 1: Estimation results for Gallup opinion poll data
Conservative
Labour
^
d
0:7718
0:6940
d^b c
0:7721
0:6914
^
0:0097
0:4313
^
0:1098
0:1212
Note: The table presents parameter estimates for the Gallup opinion poll data with T = 599 and N0 = N = 0.
The subscript ‘bc’denotes the bias corrected estimator, c.f. (30). The asymptotic standard deviation of d^ is
given in Theorem 1 as (T 2 =6) 1=2 ' 0:032.
4.2
An experiment with unobserved initial values
We next use this data to conduct a small experiment with the purpose of investigating how
the choice of N in‡uences the bias of the estimators of d, if there were unobserved initial
values. For this purpose, we assume that the econometrician only observes data starting
in January 1961. That is, January 1951 through December 1960 are N0 = 120 unobserved
initial values. We then split the given sample of T0 = 479 observations into initial values
(N ) and observations used for estimation (T ), such that N + T = 479. We can now ask the
questions (i) what is the consequence in terms of bias of ignoring initial values, i.e. of setting
N = 0, and (ii) how sensitive is the bias to the choice of N for this particular data set.
To answer these questions we apply (24) and (25) from Theorem 2. We note that N;T (d)
C
(d) depend on the unobserved initial values, i.e. on Xn ; N0 < n 0, which in this
and N;T
example are the 120 observations from January 1951 to December 1960. To apply Theorem
2 we need (estimates of) d0 ; 0 ; 0 . For this purpose we use (d^bc ; ^; ^ ) from Table 1.
The results are shown in Figure 1. The top panels show the logistically transformed
opinion poll data for the Conservative (left) and Labour (right) parties. The shaded areas
mark the unobserved initial values January 1951 to December 1960. The bottom panels
show the relative bias in the estimators of d as a function of N 2 [0; 24], and the starred
3=2
straight line denotes the value of the …xed (relative) bias term, (T0 N ) 1=2 3 3 2 .
The estimators are d^ in (13) and d^c in (14) either with C chosen as the average of the T0
observations, denoted d^c in the graph, or with C = 0, denoted d^0 in the graph. That is, for
d^0 the series have not been centered, and for d^c the series have been centered by the average
of the T0 observed values. The latter two estimators are the usual CSS estimators with and
without centering of the series.
In Figure 1 we note that the relative bias of d^0 is larger for the Labour party series because
the last unobserved initial values are larger in absolute value than those of the Conservative
party series. In particular, if one does not condition on initial values and uses N = 0, the
relative bias of d^0 is 0:45 for the Labour party series and 0:05 for the Conservative party
series. It is clear from the …gure that the relative bias of d^0 for the Labour party series can
be reduced substantially and be made much closer to the …xed bias value by conditioning
on just a few initial values. The same conclusions can be drawn for d^c but reversing the
roles of the two series. The reason is that, after centering the series by the average of the
T0 observations, it is now for the Conservative party series that the last unobserved initial
values are di¤erent from zero, while those of the Labour party series are close to zero.
^ where the initial level or centering parameter, , is estimated jointly with
Finally, for d,
Initial values in CSS estimation of fractional models
14
Figure 1: Application to Gallup opinion poll data
Conservative party data
Labour party data
0.5
0.5
0.0
0.0
-0.5
-0.5
-1.0
-1.0
1960
1970
1980
Relative bias as function of N
d^
0.4
d^c
1990
2000
d^0
1960
1970
1980
Relative bias as function of N
0.2
0.2
0.0
0.0
-0.2
-0.2
0
4
8
12
16
d^
0.4
20
24
0
4
8
d^c
12
1990
2000
d^0
16
20
24
Note: The top panels show (logistically transformed) opinion poll time series and the bottom panels show
the relative bias for three estimators of d as a function of the number of chosen initial values, N , when the
…rst N0 = 120 observations have been reserved as unobserved initial values (shaded area). The estimators
are d^ in (13) and d^c in (14) either with C chosen as the average of the T0 observations, denoted d^c in
the graph, or with C = 0, denoted d^0 in the graph. The starred line denotes the …xed (relative) bias,
3=2
(T0 N ) 1=2 3 3 2
.
d, we …nd that the relative bias is increasing in N . The reason for this is that N;T (d)
dominates N;T (d), at least for this particular data series. With N = 0 the relative bias is
very small and the estimator d^ is better than the other two estimators.
5
Conclusion
In this paper we have analyzed the e¤ect of unobserved initial values on the asymptotic bias
of the CSS estimators, d^ and d^c , of the fractional parameter in a simple fractional model, for
d0 > 1=2. We assume that we have data Xt for t = 1; : : : ; T0 = N + T; and model Xt by the
truncated …lter d0N0 (Xt
0. We derive estimators from
0 ) = "t for t = 1; : : : ; T0 and N0
d
d
) = "t or 0 (Xt C) = "t by maximizing the respective conditional
the models 0 (Xt
Gaussian likelihoods of XN +1 ; : : : ; XT0 given X1 ; : : : ; XN .
^ consisting of
We give in Theorem 2 an explicit formula for the second-order bias of d,
three terms. The …rst is a constant, the second, N;T (d0 ), depends on initial values and
decreases with N , and the third, N;T (d0 ), does not depend on initial values. The …rst and
third terms can thus be used in general for a (partial) bias correction. In Theorem 4 we
Initial values in CSS estimation of fractional models
15
simplify the expressions for the case when N0 = 0, so that all data are observed. In this
^ at least to second order. We further
case we can completely bias correct the estimator d,
^
…nd that for d the smallest bias appears for the choice N = 0. This choice is used for the
analysis of the voting data in Section 4.1 where the bias correction is also illustrated.
In Section 4.2 we illustrate the general results with unobserved initial values, again using
the voting data. Here we show that, when keeping N0 = 120 observations for unobserved
initial values, the estimator d^ with N = 0 has the smallest bias. Thus, the idea of letting the
parameter capture the initial level of the process eliminates the e¤ect of the unobserved
initial values, at least in this example.
Appendix A
The fractional coe¢ cients
In this section we …rst give some results of Karamata. Because they are well known we
sometimes apply them in the remainder without special reference.
Lemma A.1 For m
N
X
0 and c < 1;
(1 + log n)m n
c(1 + log N )m N
+1
if
>
1;
(32)
(1 + log n)m n
c(1 + log N )m N
+1
if
<
1:
(33)
n=1
1
X
n=N
Proof. See Theorems 1.5.8–1.5.10 of Bingham, Goldie, and Teugels (1987).
We next present some useful results for the fractional coe¢ cients (2) and their derivatives.
P
Lemma A.2 De…ne the coe¢ cient aj = 1fj 1g jk=1 k 1 , where 1fAg denotes the indicator
function for the event A. The derivatives of j ( ) are
m
D log
j (u)
m+1
= ( 1)
j 1
X
i=0
u)!(j + u
j!
2
D j (u) = 2D j (u)(aj+u 1 a
D j (u) = ( 1)
u(
1
for u 6= 0; 1; : : : ; j + 1 and m
(i + u)m
1)!
u)
for u = 0; 1; : : : ; j + 1 and j
for u = 0; 1; : : : ; j + 1 and j
(34)
1;
(35)
2;
(36)
2:
Proof of Lemma A.2. The result (34) follows by taking derivatives in (2) for u 6=
0; 1; : : : ; j + 1. For u = i and i = 0; 1; : : : ; j 1 we …rst de…ne
P (u) = u(u + 1) : : : (u + j
noting that
j (u)
1); Pk (u) =
P (u)
P (u)
; Pkl (u) =
for k 6= l:
u+k
(u + k)(u + l)
= P (u)=j!, see (2). We then …nd
X
DP (u) =
Pk (u) and D2 P (u) =
0 k j 1
X
Pkl (u);
0 k6=l j 1
which we evaluate at u = i for i = 0; 1; : : : ; j 1. However, for such i we …nd Pk ( i) = 0
unless k = i and Pkl ( i) = 0 unless k = i or l = i. Thus,
DP (u)ju=
i
= Pi ( i) = ( i)( i + 1) : : : ( 1)
(1)(2) : : : (j
1
i) = ( 1)i i!(j
i
1)!
Initial values in CSS estimation of fractional models
16
and (35) follows because D j (u) = DP (u)=j!, see (2). Similarly (36) follows from
X
X
X
D2 P (u)ju= i =
Pki ( i) +
Pil ( i) = 2
Pki ( i)
k6=i
l6=i
k6=i
X 1
X Pi ( i)
= 2Pi ( i)
= 2Pi ( i)(aj
=2
k i
k i
k6=i
k6=i
For u = 0; 1; 2; : : :, we note that
non-zero even for such values of j where
j (u)
=
j
Y
i+u
i
i=1
Qj
1
=
N (u)
j
Y
N , then
(1 + (u
jDm j (u)j
jDm N;j (u)j
0 and j
1j (u)j
N;j (u)
(37)
= 1 for j = N .
c(1 + log j)m j u 1 ;
c(1 + log j)m j u 1 :
2j (u)j
ju 1
(1 +
(u)
(38)
(39)
=
(40)
1j (u));
! 0 as j ! 1 for any compact set K
N;j (u)
where supv2K j
N (u) N;j (u)
1 we also have the more precise evaluations
j (u) =
where supu2K j
1)=i) =
i=N +1
with N;j (u) = i=N +1 (1 + (u 1)=i) for j > N and
For m 0 and j 1 it holds that
For m
ai ):
u + 1, but Dm j (u) remains
= 0 for j
j (u) = 0.
j (u)
Lemma A.3 Let N be an integer and assume j
i 1
N!
j u 1 (1 +
(u + N )
Rnf0; 1; : : : g; and
(41)
2j (u));
! 0 as j ! 1 for any compact set K
Rnf N; (N + 1); : : : g:
Proof. To show (37), we …rst note
QN that for j = N the result is trivial. For j > N we
factor out the …rst N coe¢ cients, i=1 (i + u 1)=i = N (u). The product of the remaining
coe¢ cients is denoted N;j (u). The results (38) and (40) for j (u) can be found in JN (2012a,
Lemma A.5), and the results
Pj (39) and (41) for N;j (u) can be found in the same way from
a Taylor’s expansion of i=j0 log(1 + (u 1)=i) for j > j0 1 u:
P
Lemma A.4 Let aj = 1fj 1g jk=1 k 1 . Then,
= 1 and 1 (u) = u for any u;
D 0 (u) = 0 and Dm 1 (u) = 1fm=1g for m 1 and any u;
D j (0) = j 1 1fj 1g and D2 j (0) = 2j 1 aj 1 1fj 2g ;
(42)
(43)
(44)
0 (u)
m
k
X
n=j
jDm j (0)j
Dm
n(
cj
u) = Dm
1
(1 + log j)m 1 1fj
k(
u + 1)
Dm
1+
for m
1g
cj
j 1(
u + 1) for m
1 and any
> 0;
0 and any u;
(45)
(46)
Initial values in CSS estimation of fractional models
1
X
Dm
n(
u) =
Dm
u + 1) for m
j 1(
17
0 and u > 0;
(47)
n=j
k
X
n (u) k n (v)
=
k (u
+ v) for any u; v:
(48)
n=0
Proof of Lemma A.4. Result (42) is well known and follows trivially from (2), and (43)
follows by taking derivatives in (42). Next, (44) and (45) follow from Lemmas A.2 and A.3.
To prove (46) with m = 0 multiply the identity nu = u n 1 + nu 11 by ( 1)n to get
u 1
u 1
u
:
( 1)n 1
= ( 1)n
n 1
n
n
Summation from n = j to n = k yields a telescoping sum such that
( 1)n
k
X
u
( 1)n
n
n=j
u
= ( 1)k
1
( 1)k
k
u
j
1
1
;
1
which in terms of the coe¢ cients n ( ) gives the result. Take derivatives to …nd (46) with
m 1. From (38) of Lemma A.3, Dm k ( u + 1) c(1 + log k)m k u ! 0 as k ! 1 when
u > 0 which shows (47). Finally, (48) follows from the Chu-Vandermonde identity, see Askey
(1975, pp. 59–60).
Lemma A.5 For any ;
t 1
X
n
it holds that
1
(t
n)
1
c(1 + log t)tmax(
+
1;
+
1
1;
1)
(49)
:
n=1
For
+
< 1 and > 0 it holds that
1
X
(k + h) 1 k 1 (1 + log(k + h))n
ch
(1 + log h)n :
(50)
k=1
Proof of Lemma A.5. (49): See JN (2010, Lemma B.4).
(50): We …rst consider the summation from k = 1 to h:
h1
h
X
(k + h)
1
k
1
(1 + log(k + h))n
c(1 + log 2h)n h
k=1
n
c(1 + log h)
Z
1
1
h
X
k
( + 1)
h
k=1
1
(1 + u)
u
1
1
k
( )
h
1
du:
0
The integral is …nite for > 0 and all because 1 1 + u 2.
To evaluate the summation from k = h+1 to 1 we note that log(k+h) log(2k)
for h k: This gives the bound
1
1
X
X
(k + h) 1 k 1 (1 + log(k + h))n c
(h + k) 1 k 1 (1 + log k)n
k=h+1
c
k=h+1
1
X
k
k=h
see (33) of Lemma A.1.
+
2
(1 + log k)n
ch
+
1
c log k
(1 + log h)n ;
Initial values in CSS estimation of fractional models
Lemma A.6 For d > 1=2 and 2d
1
X
d
n=0
1
X
d
n=0
1
1
n
1
n
n
@
@u
d
u > 0 it holds that
1
d
1
n
18
u
(2d 1
(d) (d
=
u
u)
=
u)
2d
d
2d 2
( (2d
d 1
=
u=0
2
u
1
1)
;
(d)):
Proof of Lemma A.6. With the notation a(n) = a(a + 1) (a + n 1), Gauss’s Hypergeometric Theorem, see Abramowitz and Stegun (1964, p. 556, eqn. 15.1.20), shows that
1
X
a(n) b(n)
n=0
For a =
b)
for c > a + b:
b)
d + 1 + u, and c = 1, we have c
d + 1, b =
1
X
d
n=0
c(n) n!
(c) (c a
(c a) (c
=
1
d
1
n
n
a
b = 2d
u > 0 so that
1
1
X
( d + 1)(n) ( d + 1 + u)(n)
=
n!
n!
n=0
u
(1) (2d 1 u)
=
(d) (d u)
=
2d
2
d
u
1
with derivative with respect to u as given, using @ log (d + u)=@uju=0 =
Appendix B
(d).
Asymptotic analysis of the derivatives
We …rst analyze d0 (Xt C) and introduce some notation. From Lemma 1 we have an
expression for Xt ; t = 1; : : : ; N + T; and we insert that into d0 Xt and …nd, using d0 Xt =
P
t 1
N + 1 we have
n=0 n ( d)Xt n and (46), that for t
d
0 (Xt
d
N Xt
C) =
t 1
X
+
n(
d)Xt
n
d d0
"t
N
+
t 1
X
n=t N
d d0
"t
N
=
d d0
f
N
n(
+
n(
d)C
n=0
n=t N
=
t 1
X
t+N
0 1
X
d0 )Xt
t+N0 1 (
n
n=t N
d)Xt
t (d)
n(
n
t 1(
0t (d)(C
d0 + 1) 0 g
d + 1)C
(51)
0 );
where
t (d) =
tX
1 N
k (d0
d)
+
k=0
k (d0
t n
k ( d0 )Xn +
d)
t+N0 k 1 (
N
X
t n(
d)Xn
n=1
n= N0 +1
k=0
tX
1 N
N
X
d0 + 1)
0
t 1(
d + 1) 0 :
Initial values in CSS estimation of fractional models
d
0 (Xt
The derivatives of
C) with respect to d, evaluated at d = d0 , are of the form
d0
0 (Xt
Dm
19
+
C) = Smt
+
mt (d0 )
mt (d0 )(C
(52)
0 );
where
mt (d)
= ( 1)m Dm
+
and the stochastic term Smt
is de…ned, for t
Smt = ( 1)m
+
= ( 1)m
Smt
1
X
Dm
k=0
tX
1 N
d + 1)
N + 1, as
+
= Smt
+ Smt ;
k (0)"t k
Dm
t 1(
k (0)"t
k
and Smt = ( 1)m
1
X
Dm
k (0)"t k :
Dm
t n(
k=t N
k=0
The main deterministic term is
mt (d)
m+1
= ( 1)
N
X
[
n= N0 +1
tX
1 N
Dm
tX
1 N
m
D
k (0) t k n (
d)Xn
N
X
d)Xn
(53)
n=1
k=0
k (0) t+N0 k 1 (
d + 1)
0
+ Dm
t 1(
d + 1) 0 ]:
k=0
P +T
P1
2
We use the notation hu; viT = 0 2 N
0
t=N +1 ut vt !
t=N +1 ut vt = hu; vi, if the limit
exists.
We …rst give the order of magnitude of the deterministic terms and product moments
containing these.
Lemma B.1 The functions
j
j
mt (d)
satisfy
ct d ;
0t (d)j
mt (d)j
c(t
(54)
min(1;d)+
N)
for m
For d > 1=2 it follows that, for any 0 < < min(d; 2d
h
max(jh 0 ;
jh
m ; n iT
m;
1 iT j; jh 1 ;
n iT j
!h
0 iT j)
N + 1, and any
1, t
m; ni
c(1 + N )
c(1 + N )
> 0:
(55)
1),
< 1; m; n
(56)
0;
min(d;2d 1)+
; m; n
min(d;2d 1)+
:
(57)
0;
(58)
If N = 0 it holds that
h 0;
0 iT
!
2
0
2d 2
d 1
and h 0 ;
1 iT
2
!
0
2d 2
( (2d
d 1
1)
(d)) :
(59)
If Assumption 1 holds then
+
hSm
;
+
hSm
;
+
where E(hSm
;
n iT )
+
= E(hSm
;
n iT )
n iT
n iT
P
+
! hSm
;
P
+
! hSm
;
+
= E(hSm
;
n i;
m; n
0;
(60)
n i;
m; n
0;
(61)
n i)
+
= E(hSm
;
n i)
= 0.
Initial values in CSS estimation of fractional models
Proof of Lemma B.1. (54): The expression for
0
X
0t (d) =
t n(
d)Xn +
t n(
d)(Xn
is
0t (d)
t+N0 1 (
20
d + 1)
t 1(
0
d + 1)
0
n= N0 +1
0
X
=
(62)
0 );
n= N0 +1
P
see (46) of Lemma A.4. Using the bound j t n ( d)j c(t n) d 1 we …nd 0n= N0 +1 j t n ( d)j
ct d for n 0, see (38) of Lemma A.3, and the result follows.
(55): The remaining deterministic terms with m 1 are evaluated using j( 1)m+1 Dm k (0)j
ck 1+ 1fk 1g for > 0, see (45) of Lemma A.4, and we …nd, for t N + 1,
j
mt (d)j
1 tX
1 N
X
c
n= N
+c
c[
1+
k
(t
k + n)
d 1
+c
k=1
tX
1 N
k
k
1+
1+
(t
n)
d 1+
n=1
k=1
tX
1 N
N
X
(t + N0
(t
k
k
N)
d
d
1)
+ c(t
+ (t
N)
1)
d+
d+
]
k=1
c[(1 + log(t
N ))(t
N)
min(1;d)+
+ (t
N)
d+
]
c(t
N)
min(1;d)+2
;
where we have used (49) of Lemma A.5.
(56): From (55) we …nd j mt (d) nt (d)j
c(t N ) 2 min(1;d)+2 so that jh m ; m ij < 1 by
choosing 2 < 2 min(1; d) 1 = min(1;
1), which is possible for d > 1=2.
P2d
1
t min(1;d)+ (t + N 1) d . If 1=2 < d < 1 we apply
(57): Similarly we …nd h n ; m iT c t=1
P1
c(1 + N )1 2d+ , and if d 1
(50) of Lemma A.5 to obtain the result t=1 (t + N ) d t d+
d
d+2
2
we use (t + N )
(1 + N )
t
for 2 < d and …nd
1
X
(t + N ) d t
1+
(1 + N )
d+2
t=1
1
X
t
1
c(1 + N )
d+2
t=1
(58): The proofs for h 0 ; 1 iT and h 1 ; 0 iT are the same as for (57).
P
P
2
(59): For N = 0 we …nd h 0 ; 0 iT = Tn=01 d n 1 and h 0 ; 1 iT = 21 D Tt=1
the result follows from Lemma A.6.
(60): We have
1
X
+
Smt
nt (d)
:
=
t=N +T +1
=
1
X
t=N +T +1
1
X
[
nt (d)(
1)
m+1
tX
1 N
Dm
2
0t (d)
t k (0)"k
k=1
1
X
k=1 t=max(T;k)+N +1
nt (d)(
1)m+1 Dm
t k (0)]"k :
such that
(63)
Initial values in CSS estimation of fractional models
21
For some small > 0 to be chosen subsequently, we use the evaluations j nt (d)j
c(t
min(1;d)+
m
1+
min(1;d)+
min(1;d)+
N)
, jD t k (0)j c(t k)
1ft k 1g , and t
= (t k + k)
(t k) 2 k min(1;d)+3 . Then
V ar(
1
X
+
Smt
nt (d))
c
t=N +T +1
c
1
X
[
1
X
min(1;d)+
t
k=1 t=max(T;k)+1
1
X
2 min(1;d)+6
k
1
X
[
k=1
(t + N
(t
k)
1
k)
1+ 2
]
]2 :
t=max(T;k)+1
P
P
2 min(1;d)+6
For T ! 1 we have 1
k) 1 ! 0, and because 1
<1
t=max(T;k)+1 (t
k=1 k
P1
+
we get by dominated convergence that V ar( t=N +T +1 Smt nt (d)) ! 0. This shows that
P1
P
+
+
+
; ni = P
; n iT ! hSm
hSm
t=N +1 Smt nt (d).
P1 P1
+
m+1 m
(61): We use (63) and …nd 1
D t k (0)]"k ,
t=N +T +1 Smt nt (d) =
k=1 [
t=max(T;k)+1 nt (d)( 1)
and the proof is completed as for (60).
We next de…ne the (centered) product moments of the stochastic terms,
+
MmnT
2
=
0
T
1=2
N
+T
X
+ +
(Smt
Snt
+ +
E(Smt
Snt ));
(64)
t=N +1
as well as the product moments derived from the corresponding stationary processes,
2
MmnT =
0
T
1=2
N
+T
X
(Smt Snt
E(Smt Snt )):
t=N +1
The next two lemmas give their asymptotic behavior, where we note that
+ +
E(S0t
Smt ) = E(S0t Smt ) = 0 for m 1:
P1
Lemma B.2 Suppose Assumption 1 holds and let 2 =
j=1 j
P1
3
' 1:2021, see (16). Then
3 =
j=1 j
2
E(M01T
)
=
2
0
T
1
N
+T
X
2
E(S1t
)=
(65)
2
=
2
=6 ' 1:6449 and
(66)
2;
t=N +1
E(M01T M02T ) =
4
0
T
N
+T
X
1
E(S0t S1t S0s S2s ) =
2
0
s;t=N +1
E(M01T M11T ) =
4
0
T
1
N
+T
X
T
1
N
+T
X
E(S1t S2t ) =
2 3;
t=N +1
2
E(S0t S1t S1s
)=
4 3;
(67)
(68)
s;t=N +1
E(hS0+ ;
+
0 iT hS1 ;
0 iT =
=
4
0
N
+T
X
+
E(S0s
+
0s (d)S1t 0t (d))
s;t=N +1
2
0
X
N s<t N +T 1
(t
s)
1
t(
d + 1) s ( d + 1):
(69)
Initial values in CSS estimation of fractional models
22
It follows that, for N = 0 and T ! 1;
0;T (d)
E(hS0+ ; 0 iT hS1+ ;
h 0 ; 0 iT
=
0 iT )
!
( (2d
1)
(70)
(d)):
Furthermore, for T …xed and N ! 1; see also (37),
N;T (d) =
P
N s<t N +T 1 (t
P
1
s)
N;t (
N;t
N +1 t N +T
T 1
X
d + 1) N;s ( d + 1)
!
t
2
1 ( d + 1)
t=1
P1
Proof of Lemma B.2. (66): From S0t = "t , S1t =
2
E(M01T
)
=
4
0
E[T
1=2
N
+T
X
"t
t=N +1
1
X
1
2
2
k "t k ] =
0
T
1
k 1 "t k , and (65) we …nd
N
+T
X
1
1
X
X
1
2
E[
k "t k ] =
k
(67): We …nd using the expressions for S0t ; S1t ; and S2t = 2
together with (65) that
E(M01T M02T ) =
2
4
0
T
1
E[
N
+T
X
"t
t=N +1
1
X
1
k "t k ][
N
+T
X
k=1
P1
j=2
"s
1
X
1
j
(j
1
2
=
2:
k=1
aj 1 "t j , aj = 1fj
aj 1 )"s j ] =
2
0
1g
1
T
j=2
s=N +1
k=1
1)=T: (71)
(T
k=1
t=N +1
k=1
1
Pj
k=1
N
+T
X
t=N +1
and
2
0
T
1
N
+T
X
E(S1t S2t ) =
2
2
0
T
t=N +1
1
N
+T
X
E[
t=N +1
=
2T
1
N
+T
X
1
X
t=N +1 j=2
1
X
k 1 "t
k
j 1
j
X
(j
1
aj 1 )"t j ]
j=2
k=1
2
1
X
k
k=1
1
=
2
1
X
j
j=2
2
j 1
X
k
1
=
2
3
(72)
k=1
Pj 1 1
P
2
= 1
k=1 k . We thus need to show thatp 3 = 3 .
j=2 j
1 is the imaginary unit, c( ) =
Let f ( ) = log(1 ei ) = 21 c( ) + i ( ), where i =
i
log(2(1 cos( )), ( ) = arg(1 e ) = (
)=2 for 0 < < , and ( ) =
( ). The
transfer function of Smt is Dm (1 ei )d d0 jd=d0 = f ( )m , so that the cross spectral density
2 R
between Smt and Snt is f ( )m f ( )n and E(Smt Snt ) = 2 0
f ( )m f ( )n d . Because
c( ) is symmetric around zero and ( ) is anti-symmetric around zero, i.e. ( ) =
( ),
it follows that
for
3
c( )3 = (f ( ) + f (
))3 = f ( )3 + 3f ( )2 f (
) + 3f ( )f (
)2 + f (
)3 :
Noting that the transfer function of S0t = "t is f ( )0 = 1 and integrating both sides we …nd
2 Z
0
c( )3 d = E(S3t S0t ) + 3E(S2t S1t ) + 3E(S1t S2t ) + E(S0t S3t ):
2
The left-hand side is given as 12 02 3 in Lieberman and Phillips (2004, p. 478) and the
right-hand side is 12 02 3 from (65) and (72), which proves the result.
k 1,
E(S1t S2t )
Initial values in CSS estimation of fractional models
23
(68): We …nd, using the expressions for Smt and (65), that E(M01T M11T ) is
4
0
T
1
N
+T
X
2
E(S0t S1t S1s
)=
N
+T
X
1
T
s;t=N +1
s;t=N +1
t 1
X
E["t
s 1
X
(t k) 1 "k
(s j) 1 "j
j= 1
k= 1
s 1
X
(s i) 1 "i ]:
i= 1
The only contribution comes for t = j > k = i or t = i > k = j and therefore t < s. These
two contributions are equal, so we …nd, using s k = s t + t k,
2T
1
N
+T
X
N
+T
X
t 1
X
1
1
(t k) (s t) (s k)
1
= 2T
t=N +1 s=t+1 k= 1
2
[v
k[
1
2] and v = t
1
+ (u
v) ]u
2
=4
1
X
2
u
u=2
which proves (68).
+
+
(69): From S0s
= "s , S1t
=
t 1 ( d + 1) we get
+
0 iT hS1 ;
t 1
X
[(t k) 1 +(s t) 1 ](s k) 2 :
v < u] and …nd
k [1
u=2 v=1
EhS0+ ;
N
+T
X
t=N +1 s=t+1 k= 1
Next we introduce u = s
1 X
u 1
X
N
+T
X
1
0 iT =
Pt
1 N
k=1
4
0
N
+T
X
k 1 "t
+
E(S0s
k
u 1
X
v
1
=4
3
= 4 3;
v=1
Pt
1
k=N +1 (t
=
k) 1 "k , and
0t (d)
=
+
0s (d)S1t (d) 0t (d)
s;t=N +1
2
=
0
X
(t
s)
1
t 1(
d + 1)
s 1(
d + 1):
N +1 s<t N +T
(70): For N = 0 we use D
X
D
0 s<t<1
1
X
D
=
t=1
t s (u)ju=0
t s (u)ju=0 s (
= (t
s) 1 1ft
d + 1) t ( d + 1) =
s 1g
1
X
and …nd the limit
D t ( d + 1 + u)ju=0 t ( d + 1)
t=1
d
1
t
u
ju=0
d
1
t
=
2d 2
( (2d
d 1
1)
(d))
using (48) and Lemma A.6. From (59) we …nd the limit of h 0 ; 0 iT .
(71): From (37) weP
…nd the representation in (71), where we have cancelled the factor
2
2
( d+1) . Note that N +1 t N +T N;t 1 ( d+1)2
N;N ( d+1) = 1 and N;t ( d+1) =
QNt
P
d=i) ! 1 for N ! 1 and t N + 1, so that N;T (d) ! T 1 N s<t N +T 1 (t
i=N +1 (1
PT 1 1
s) 1 = i=1
i
(T 1)=T .
Lemma B.3 Suppose Assumption 1 holds. Then, for T ! 1, it holds that fMmnT g0 m;n 3
are jointly asymptotically normal with mean zero, and some variances and covariances can
be calculated from (66), (67), and (68) in Lemma B.2. It follows that the same holds for
+
fMmnT
g0 m;n 3 with the same variances and covariances.
Initial values in CSS estimation of fractional models
24
Proof of Lemma B.3. fMmnT g: We apply a result by Giraitis and Taqqu (1998) on limit
distributions of quadratic forms of linear processes. We de…ne the cross covariance function
rmn (t) = E(Sm0 Snt ) =
1)m+n
2
0(
1
X
Dm
n
k (0)D
t+k (0)
k=0
and …nd r00 (t) = 02 1ft=0g , rm0 (t) = 02 ( 1)m Dm t (0)1ft
For m; n 1 we …nd that jrmn (t)j is bounded for a small
c
1
X
(1 + log(t + k))m 1 (1 + log k)n 1 (t + k) 1 k
1
c
k=1
2
0(
1g ,
and r0n (t) =
> 0 by
1
X
(t + k)
1+
k
1+
1)n Dn t (0).
ct
1+3
;
k=1
P1
using the bound (t + k) 1+
k 2 t 1+3 . Thus t= 1 rmn (t)2 < 1, and joint asymptotic
normality of fMmnT g0 m;n 3 then follows from Theorem 5.1 of Giraitis and Taqqu (1998).
The asymptotic variances and covariances can be calculated as in (66), (67), and (68) in
Lemma B.2.
+
+
fMmnT
g: We show that E(MmnT MmnT
)2 ! 0. We …nd
MmnT
+
MmnT
2
=
0
T
N
+T
X
1=2
+
+
(Smt
Snt + Smt Snt
+ Smt Snt
+
+
E(Smt
Snt + Smt Snt
+ Smt Snt ));
t=N +1
(73)
and show that the expectation term converges to zero and that each of the stochastic terms
has a variance
PN +Ttending+to zero.
+
T 1=2 t=N
+1 E(Smt Snt + Smt Snt + Smt Snt ) ! 0: The …rst two terms are zero because
+
Smt
and Snt are independent. For the last term we …nd using (45) of Lemma A.4 that
jE(Smt Snt )j =
1
X
2
0
m
k=t N
so that
T
1=2
n
jD
k (0)D
k (0)j
c
1
X
k
2+
c(t
N)
1+
;
k=t N
N
+T
X
E(Smt Snt )
cT
1=2+
t=N +1
! 0:
(74)
PN +T
+
V ar(T 1=2 t=N
+1 Smt Snt ) ! 0: The …rst two terms of (73) are handled the same way.
+
+
We …nd because (Smt ; Sns
) is independent of (Smt ; Sns ) that
V ar(T
1=2
N
+T
X
+
Snt )
Smt
1
=T
t=N +1
N
+T
X
+
+
Sns )
E(Smt
Snt Sms
tX
1 N
= jE(
m
D
t
i (0)"i
X
i=1
sX
1 N
(t
i)
1+
(s
N
+T
X
+ +
E(Smt
Sms )E(Snt Sns ):
s;t=N +1
> 0, we …nd for jDm
t i (0)j
c(t
min(s;t) 1 N
m
D
s
j (0)"j )j =
j=1
i=1
min(s;t) 1 N
c
=T
s;t=N +1
Then replacing the log factors by a small power,
i) 1 (1 + log(t i))m c(t i) 1+ that
+ +
Sms )j
jE(Smt
1
2
0
X
i=1
i)
1+
:
jDm
m
t i (0)D
s i (0)j
Initial values in CSS estimation of fractional models
Now take s > t and evaluate (s
+ +
jE(Smt
Sms )j
i)
c(s
1+
t)
= (s
tX
1 N
1+3
E(Snt Sns ) = E(
n
D
jE(Snt Sns )j
c
N
X
t i (0)"i
i= 1
we …nd
i)
(t
1+
1
i)
(s
t)
c(s
t)
1+3
1+3
(t
2
i)
and
:
i=1
Similarly for
N
X
t+t
25
n
D
s j (0)"j )
2
0
=
j= 1
N
X
(t
1+
i)
i= 1
c(s
N
X
Dn
n
t i (0)D
s i (0)
i= 1
(s
i)
1+
=c
1
X
1+
(t + i)
1+
(s + i)
i= N
t)
1
X
1+3
(t + i)
1
1+3
c(s
t)
t)
1+3
(t
cT
1
(t
N) :
i= N
Finally, we can evaluate the variance as
1=2
V ar(T
N
+T
X
+
Smt
Snt )
t=N +1
1=2
PN +T
t=N +1
1=2
V ar(T
1
T 1
X
T
N
+T
X
N ) (s
t)
1+3
T
Xh
t
T1
t=1
! 0:
Smt Snt ) ! 0: The last term of (73) has variance
Smt Snt ) = T
1
E([
t=N +1
1
h
2+6
h=1
N
+T
X
and the …rst term is T
(s
N +1 t<s N +T
= cT
V ar(T
X
1
cT
N
+T
X
Smt Snt ]2 )
T
1
[
s;t=N +1
N
X
E(Smt Snt )]2
t=N +1
t=N +1
PN +T
1
N
+T
X
E(Smt Snt Sms Sns ) which equals
E(Dm
n
m
n
t i (0)"i D
t j (0)"j D
s k (0)"k D
s p (0)"p ):
s;t=N +1 i;j;k;p= 1
There are contributions from E("i "j "k "p ) in four cases which we treat in turn.
P +T
2
Case 1, i = j 6= k = p: This gives the expectation squared, T 1 [ N
t=N +1 E(Smt Snt )] ;
which is subtracted to form the variance.
Cases 2 and 3, i = k 6= j = p and i = p 6= j = k: These are treated the same way. We
…nd for Case 2 the contribution
N
+T
1
X
X
1
A1 cT
(1 + log(t + i))m (1 + log(s + i))m (t + i) 1 (s + i) 1
s;t=N +1 i= N
1
X
(1 + log(t + j))n (1 + log(s + j))n (s + j) 1 (t + j)
1
j= N
cT
1
N
+T
X
[
1
X
s;t=N +1 i= N
(t + i)
1+
(s + i)
1+ 2
]
cT
1
X
[
1
X
N +1 t<s N +T i= N
(t + i)
1+
(s + i)
1+ 2
]:
Initial values in CSS estimation of fractional models
1+
We evaluate (s + i)
1
X
(t + i)
1+
= (s
1
X
1+
(s + i)
1+
t + t + i)
i= N
(s
(s
1+3
t)
1+3
t)
(t + i)
2
(t + i)
1
(s
26
so that
1+3
t)
(t
N)
i= N
and hence
A1
cT
X
1
(s
t)
2+6
(t
N)
2
= cT
N +1 t<s N +T
1
T 1
X
h
T
Xh
2+6
t
2
cT
1
T1
2
! 0:
2 2
! 0:
t=1
h=1
Case 4, i = j = p = k: This gives in the same way the bound
T
1
N
+T
X
1
X
2+
(t + i)
(s + i)
2+
cT
1
1
N
+T
X
X
[
(t + i)
2
2
]
cT
1
X
i=1
i= N t=N +1
s;t=N +1 i= N
1
i
We now apply the previous Lemmas B.1, B.2, and B.3, and …nd asymptotic results for
the derivatives Dm L (d0 ).
Lemma B.4 Let the model for the data Xt ; t = 1; : : : ; N +T; be given by (4) and let Assumptions 1 and 2 be satis…ed. Then the (normalized) derivatives of the concentrated likelihood
function L (d), see (12), satisfy
2
0
T
1=2
2
T
1
D L (d0 ) = B0 + T
2
T
1
D3 L (d0 ) = C0 + OP (T
0
0
DL (d0 ) = A0 + T
2
1=2
1=2
A1 + OP (T
1
);
(75)
B1 + OP (T
1
(log T ));
(76)
1=2
(77)
);
where
+
A0 = M01T
; E(A1 ) = N;T (d0 ) +
+
+
B0 = 2 ; B1 = M11T
+ M02T
;
C0 = 6 3 :
N;T (d0 );
(78)
(79)
(80)
+
Here N;T (d0 ); N;T (d0 ); and MmnT
; are given in (21), (23), and (64), respectively, and
2
=
=6
and
'
1:2021;
see
(16).
2
3
The (normalized) derivatives of Lc (d), see (15), satisfy (75)–(77) and (79)–(80), but (78)
is replaced by
+
C
; E(A1 ) = N;T
(d0 );
(81)
A0 = M01T
where
C
N;T (d0 )
is given by (22).
Proof of Lemma B.4. The concentrated sum of squared residuals is given in (12). We
note that the …rst term is OP (T ), and from Lemmas B.1 and B.2 the next is OP (1), so the
^ However, for the bias we
second term has no in‡uence on the asymptotic distribution of d.
need to analyze it further.
Initial values in CSS estimation of fractional models
27
We need an expression for the derivatives of the concentrated likelihood, i.e., Dm L (d).
Recall L(d; ) from (11) and denote derivatives with respect to d and by subscripts. Then
L (d) = L(d; (d)) and therefore
DL (d) = Ld (d; (d)) + L (d; (d)) d (d)
D2 L (d) = Ldd (d; (d)) + 2Ld (d; (d)) d (d) + L (d; (d)) d (d)2 + L (d; (d))
dd (d);
but ^ is determined from L (d; (d)) = 0, which implies Ld (d; (d))+L (d; (d)) d (d) = 0,
and hence
DL (d) = Ld (d; (d));
(82)
2
Ld (d; (d))
:
L (d; (d))
D2 L (d) = Ldd (d; (d))
(83)
We …nd the derivatives for d = d0 and suppress the dependence on d0 in the following. Thus
0t = 0t (d0 ) and 1t = 1t (d0 ), etc.
+
(75) and (78): We …nd from (52) that Dm d00 (Xt
) = Smt
+ mt
mt (
0 ), and
therefore from (82),
2
0
T
1=2
2
DL =
0
N
+T
X
1=2
T
+
(S0t
+
0t
+
0 ) 0t )(S1t
(^
+
1t
(^
0 ) 1t );
t=N +1
= (hS0+ ; 0 iT + h 0 ; 0 iT )=h 0 ;
0 ) = h 0 ; 0 iT =h 0 ; 0 iT we get
where ^
0 = ^ (d0 )
Cov(X; Y ) and E(^
E(
2
0
T
1=2
0
2
DL ) =
0
1=2
T
N
+T
X
(
0t
t=N +1
+
2
0
T
1=2
N
+T
X
t=N +1
The …rst term is T 1=2
equal to T 1=2 times
N;T ,
+
0 iT hS0 ;
1 iT
EhS0+ ;
h
h 0;
=
h 0;
h 0;
h 0;
+
Cov((S0t
0 iT
0 iT
0 iT .
Since E(XY ) = E(X)E(Y ) +
0t )( 1t
hS0+ ; 0 iT
h 0 ; 0 iT
h 0;
h 0;
0 iT
0 iT
+
0t ); (S1t
1t )
hS0+ ; 0 iT
h 0 ; 0 iT
1t )):
+
+
see (21). The second term is, using Cov(S0t
; S1t
) = 0, see (65),
EhS1+ ; 0 iT hS0+ ; 0 iT
EhS0+ ; 0 i2T h 0 ; 1 iT
+
h 0 ; 0 iT
h 0 ; 0 i2T
0 ; 0 iT
EhS1+ ; 0 iT hS0+ ; 0 iT
h 0 ; 1 iT
EhS1+ ; 0 iT hS0+ ;
1 iT
+
=
h 0 ; 0 iT
h 0 ; 0 iT
h 0 ; 0 iT
0 iT
0 iT
=
N;T ;
see (69) and (23).
(76) and (79): The …rst term of T 1 D2 L in (83) is analyzed below and is of the order of 1
and T 1=2 . In the second term of (83) we …nd L (d0 ; (d0 )) = 02 h 0 ; 0 iT = O(1) and
Ld (d0 ; (d0 )) = T
1
N
+T
X
+
+ 0t
(S0t
t=N +1
(^
0 ) 0t ) 1t +T
1
N
+T
X
t=N +1
+
0t (S1t + 1t
(^
0 ) 1t )
= OP (1);
Initial values in CSS estimation of fractional models
1
and hence T
2
0
1
T
Ld (d0 ; (d0 ))2 =L (d0 ; (d0 )) = OP (T
2
2
DL =
0
N
+T
X
1
T
+
(S1t
+
(^
1t
0 ) 1t )
1
28
) and can be ignored. Thus we get
2
t=N +1
2
+
0
N
+T
X
1
T
+
+
(S0t
(^
0t
+
0 ) 0t )(S2t
+
(^
2t
0 ) 2t )
+ OP (T
1
):
t=N +1
By Lemma B.1 it holds that h
2
0
1
T
2
2
DL =
0
+
= O(1) and hSm
;
m ; n iT
1
T
N
+T
X
+ 2
E(S1t
) +T
1=2
n iT
= OP (1) such that
+
+
(M11T
+ M02T
) + OP (T
1
)
t=N +1
=
2
+T
1=2
+
+
(M11T
+ M02T
) + OP (T
1
(log T ))
using also (66) and (74).
(77) and (80): For the third derivative it can be shown that the extra terms involving
derivatives d (d0 ) and dd (d0 ) can be ignored and we …nd
2
0
1
T
3
2
DL =
0
N
+T
X
1
3T
+
(S1t
+
+
0 ) 1t )(S2t
(^
1t
+
2t
(^
0 ) 2t )
t=N +1
2
+
0
N
+T
X
1
T
+
+
(S0t
0t
(^
+
0 ) 0t )(S3t
+
(^
3t
0 ) 3t )
+ OP (T
1
)
t=N +1
1=2
= 3T
+
M12T
+3
2
0
1
T
N
+T
X
+ +
E(S1t
S2t ) + T
1=2
+
M03T
+ OP (T
1
)=
6
3
+ OP (T
t=N +1
where the second-to-last equality uses Lemma B.1 and the last equality uses Lemmas B.2
and B.3, (67), and (74).
(81): For the function Lc (d), see (15), we …nd
2
0
1=2
T
2
DLc =
0
T
1=2
N
+T
X
+
(S0t
+
0t
(C
+
0 ) 0t )(S1t
+
1t
(C
0 ) 1t );
t=N +1
with expectation given by
2
0
T
1=2
N
+T
X
(
0t
(C
0 ) 0t )( 1t
(C
0 ) 1t )
+
2
0
T
t=N +1
=
2
0
T
1=2
N
+T
X
1=2
N
+T
X
+
+
Cov(S0t
; S1t
)
t=N +1
(
0t
(C
0 ) 0t )( 1t
(C
0 ) 1t )
=T
1=2 C
N;T (d0 ):
t=N +1
The remaining derivatives give the same results as for L . Notice that the two factors in the
sum in the score are independent so there is no term corresponding to N;T .
1=2
);
Initial values in CSS estimation of fractional models
Appendix C
C.1
29
Proofs of main results
Proof of Theorem 1
We …rst show that the likelihood functions have no singularities. When t N + 1 we can
use the decomposition t 1 ( d + 1) = N ( d + 1) N;t 1 ( d + 1), see (37). We …nd in the
second term of L (d) in (12) that the factor N ( d + 1)2 cancels and
P +T
PN +T
2
d
d
2
( t=N
[ N
+1 ( 0 Xt ) 0t (d))
t=N +1 ( 0 Xt ) N;t 1 ( d + 1)]
=
:
PN +T
PN +T
2
2
(d)
(
d
+
1)
0t
N;t
1
t=N +1
t=N +1
P +T
2
2
This is a di¤erentiable function of d because N
N;N ( d + 1) = 1,
t=N +1 N;t 1 ( d + 1)
see (37). Note, however, that ^(d) has singularities at the points d = 1; 2; : : : ; N .
^ The proof for d^c is similar, but simpler because in
We next discuss the estimator d.
that case ^(d) = C does not depend on d. The arguments generally follow those of JN
(2012a, Theorem 4) and Nielsen (2015, Theorem 1). To conserve space we only describe the
di¤erences in detail.
C.1.1
Existence and consistency of the estimator
The function L (d) in (12) is the sum of squares of
d
0 (Xt
^(d)) =
d d0
"t
N
+
t (d)
(^(d)
0 ) 0t (d);
see (51), so that we need to analyze product moments of the terms on the right-hand side,
appropriately normalized. The deterministic term t (d) was analyzed under the assumption
of bounded initial values in JN (2012a, Lemma A.8(i)) as Dit ( ) with b = d, i = k = 0, and
0 = 0 = 0, where it was shown that
sup
1=2
d d0 d d0
j t (d)j ! 0 and
sup
d d0 d d0
1=2
max jtd
d0 +1=2
1 t T
t (d)j
! 0 as t ! 1:
d d0
This shows that t (d) is uniformly smaller than N
"t (appropriately normalized on the
intervals 1=2
d d0 d d0 and d d0 d d0
1=2
), and is enough to show
that in the calculation of product moments we can
t (d), which will be done below.
Pignore
N +T
The product moment of the stochastic term, t=N +1 ( dN d0 "t )2 , is analyzed in Nielsen
(2015) under Assumption 1 of …nite fourth moment. Following that analysis, for some
0 < < 1=2 to be determined, we divide the parameter space into intervals where dN d0 "t is
nonstationary, “near critical”, or (asymptotically) stationary according to d d0
1=2
; 1=2
d d0
1=2 + , or 1=2 +
d d0 .
Clearly, d0 is contained in the interval 1=2 +
d d0 , and we show that on this
interval the contribution from the second term in the objective function
RT (d) = T
1
N
+T
X
(
d d0
"t )2
N
T
=T
N
+T
X
(
t=N +1
P +T
d
[ N
N
t=N +1
PN +T
t=N +1
t=N +1
1
1
d d0
"t )2
N
T
1 AT (d)
d0
"t
N;t 1 (1
N;t 1 (1
d)]2
d)2
2
BT (d)
;
(84)
Initial values in CSS estimation of fractional models
30
say, is negligible. It then follows that the objective function is only negligibly di¤erent
from the objective function obtained without the parameter , see e.g. Nielsen (2015), and
existence and consistency of d^ follows for the interval d d0
1=2 + .
The two intervals covering d d0
1=2 + require a more careful analysis, which is
given subsequently. Following the strategy of JN (2012a) and Nielsen (2015), we show that
for any K > 0 there exists a (…xed) > 0 such that, for these intervals,
P (inf RT (d) > K) ! 1 as T ! 1:
This implies that P (d^ 2 fd : d d0
parameter space is reduced to fd : d
has already been shown.
C.1.2
(85)
1=2 + g) ! 1 as T ! 1, so that the relevant
d0
1=2 + g on which existence and consistency
Tightness of product moments
We want to show that the remainder term, T 1 AT (d)2 =BT (d), in (84) is dominated by the
…rst term on various compact intervals. The function BT (d) is discussed below, and we want
to …nd the supremum of the suitably normalized product moment MT (d) = T + d (log T ) AT (d)
by considering it a continuous process on a compact interval K; that is, we consider it a
process in C(K), the space of continuous functions on K endowed with the uniform topology.
The usual technique is then to prove that the process MT is tight in C(K), which implies
that also supd2K jMT (d)j is tight, by the continuity of the mapping f 7! supu2K jf (u)j, that
is supd2K MT (d) = OP (1).
Tightness of MT can be proved by applying Billingsley (1968, Theorem 12.3), which
states that it is enough to verify the two conditions
EMT (d0 )2
E(MT (d1 ) MT (d2 ))2
c;
c(d1
2
d2 ) for d1 ; d2 2 K:
(86)
(87)
In one case we will also need the weak limit of the process MT , and in that case we apply
Billingsley (1968, Theorem 8.1), which states that if MT is tight then convergence of the
…nite dimensional distributions implies weak convergence. Thus, instead of working with the
processes themselves, we need only evaluate their second moments and …nite dimensional
distributions.
Speci…cally, by a Taylor series expansion of the coe¢ cients we …nd
m (d0
=
d1 ) N;t+m 1 (1
(d1 d2 )fD m (d0
d1 )
dm;t )
m (d0
N;t+m
d2 ) N;t+m 1 (1 d2 )
dm;t ) + m (d0 dm;t )D
1 (1
N;t+m 1 (1
dm;t )g
for some dm;t between d1 and d2 . It follows that if d1 and d2 are in the interval K, then also
dm;t 2 K, so that any uniform bound we …nd for EDMT (d)2 for d 2 K will also be valid for
dm;t . This shows that to prove tightness of MT (d), it is enough to verify
sup EMT (d)2
c and sup E(DMT (d))2
d2K
C.1.3
c:
(88)
d2K
Evaluation of product moments
We evaluate product moments on intervals of the form d 1=2
or d 1=2
; as well as
d d0
1=2
or d d0
1=2
. Some of these intervals may be empty, depending
on d and d, in which case the proof simpli…es easily, so we proceed assuming all intervals are
non-empty.
Initial values in CSS estimation of fractional models
P +T
The product moment BT (d) = N
d)2 : We …rst …nd that
t=N +1 N;t 1 (1
inf BT (d)
31
(89)
1
d 0
because BT (d)
d) = 1:
N;N (1
Next there are constants c1 ; c2 such that
0 < c1
T 2d 1 BT (d)
sup
c2 < 1:
d d 1=2
(90)
This follows from (41) because
T
2d 1
N
+T
X
N!
)2 T
d) = (
(1 d + N )
N;j (1
t=N +1
inf
d 1=2+
T 2d 1 BT (d)
T
BT (d)
cT
1
N
+T
X
t
( )2
T
t=N +1
The product moment AT (d) =
AT (d) =
N
+T
X
"t
1
c
2d
which again follows from (41) because (t=T )
2d 1
t
( )
T
t=N +1
Z
cT
1
(1 +
2t (d));
T
u2
1
PN +T
d d0
"t N;t 1 (1
N
t=N +1
N;t (d) =
NX
+T t
(91)
which implies that
du = c
1
N +1
N;t (d);
2d
((N + 1)=T )2
;
2
1
(t=T )2
2
((N + 1)=T )2
:
2
d): We …nd that
m (d0
d)
N;t+m 1 (1
d):
m=0
t=N +1
From (38) and (39) we …nd j
EAT (d)2 =
N
+T
X
] to (N != (1 d+N ))2 =(1 2d) which is bounded
1 2d:
which converges uniformly in d 2 [d; 1=2
between c1 and c2 because 2
1 2d
Finally,
1=2
1
2
N;t (d)j
N
+T
X
2
0
c
PN +T
t
m=0
2
N;t (d)
t=N +1
c
md0
N
+T
X
(t + m) d ; and
NX
+T t
f
t=N +1
d 1
md0
m=0
d 1
(t + m) d g2 ;
while DAT (d) contains an extra factor log(m(t + m)).
We give in Table 2 the bounds for EAT (d)2 for various intervals and normalizations.
These follow from …rst using the inequalities (t + m) d (t + m) 1=2+ when d 1=2
and T d (m + t) d ((m + t)=T ) 1=2+ when d 1=2
; and similarly for d d0 . We then
apply the result that
T
1
N
+T
X
fT
t=N +1
1
NX
+T t
(
m=0
because the left-hand side converges to
m
)
T
1=2+
R1 R1
f 0
0
v
(
u
t+m
)
T
1=2+
1=2+
(u + v)
g2 = O(1)
1=2+
dug2 dv:
Initial values in CSS estimation of fractional models
Second moment
EAT (d)2
ET 2d AT (d)2
ET 2(d d0 +1) AT (d)2
ET 4d 2d0 +2 AT (d)2
32
Table 2: Bounds for AT (d)
d d0
Upper bound on second moment
P P
1=2
( m 1=2+ (t + m) 1=2+ )2
P P
1=2
( m 1=2+ ( t+m ) 1=2+ )2
P P m 1=2+ T
1=2
( (T )
(t + m) 1=2+ )2
P P m
1=2
( ( T ) 1=2+ ( t+m
) 1=2+ )2
T
d
1=2
1=2
1=2
1=2
Order
T 1+2 +2
T 2+2
T 2+2
T3
Note: Uniform upper bounds on the normalized second moment of AT (d) for di¤erent restrictions on d and
d d0 . The bounds are also valid if we replace by
or by
.
Table 3: Bounds for CT;M (d)
d d0
Upper bound on second moment Order
PP
1=2+
1=2
+ m) 1=2+ )2
M 1+2 T 2
P(P m 1=2+ (tt+m
1=2
( m
( T ) 1=2+ )2
M 1+2 T
Second moment d
ECT;M (d)2
1=2
2d
2
ET CT;M (d)
1=2
Note: Uniform upper bounds on the normalized second moment of CT;M (d) for di¤erent restrictions on d
and d d0 .
PM 1
PN +T
d)"t n g N;t 1 (1
f
The product moment CM;T =
n (d0
n=0
t=N +M +1
we analyze another product moment, which we …nd by truncating the sum
Pt N 1
d)"t n at M = T for < 1; and de…ne
n (d0
n=0
CT;M (d) =
N
+T
X
d): Now
d d0
"t =
N
min(M 1;N +T t)
"t
N;M;t (d);
N;M;t (d)
X
=
t=N +2
m (d0
d)
N;t+m 1 (1
d): (92)
m=max(N +M +1 t;0)
The coe¢ cients are the same as for AT (d), but the sum N;M;t (d) only contains at most M
terms. We give in Table 3 the bounds for the second moment of CT;M (d), which are derived
using the same methods as for AT (d).
We now apply the above evaluations to study the objective function in the three intervals
d d0
1=2 + ; 1=2
d d0
1=2 + ; and 1=2
d d0 :
C.1.4
The stationarity interval: fd
d0
We want to show that
sup
fd d0
1=2+ g\D
jT
1=2 + g \ D
2
1 AT (d)
BT (d)
j = oP (1);
and consider two cases because of the di¤erent behavior of BT (d).
Case 1: If d
1=2
we let K = fd
1=2
; d d0
1=2 + g \ D and use
(89) to eliminate BT (d) and focus on AT (d). From Table 2 we …nd using ( ;
) that
1
2
2 2
supK E(T AT (d) ) = O(T
). For the derivative we get an extra factor log T in the
coe¢ cients and …nd supK E(T 1 (DAT (d))2 ) = O((log T )2 T 2 2 ).
It then follows from (86) and (87) that MT (d) = T 1=2+ (log T ) 1 AT (d) is tight. Because convergence in probability and tightness implies uniform convergence in probability it
follows that
sup T 1 AT (d)2 = OP (T 2 +2 (log T )2 ) = oP (1) for < :
d2K
Initial values in CSS estimation of fractional models
33
Case 2: If d 1=2
we de…ne K = fd 1=2
; d d0
1=2 + g \ D. From (90)
1
2
2d 2
2
we …nd that supK E(T AT (d) =BT (d))
c supK E(T
AT (d) ). From Table 2 we then
…nd for ( ;
) that supK E(T 2d 2 AT (d)2 ) = O(T 2 ). For the derivative we get an extra
factor log T . Thus, supK E(DT d 1=2 AT (d))2 = O((log T )2 T 2 ) and (log T ) 1 T T d 1 AT (d)
is tight, so that
2
1 AT (d)
sup jT
BT (d)
d2K
C.1.5
c sup T 2d 2 AT (d)2 = OP ((log T )2 T
j
2
) = oP (1):
d2K
The critical interval: f 1=2
d
1=2 + g \ D
d0
For this interval we show that (85) holds by setting su¢ ciently small. As in JN (2012a)
and Nielsen (2015) we apply a truncation argument. With M = T , for some > 0 to be
chosen below, let
d d0
"t
N
=
M
X1
n (d0
d)"t
n
+
n=0
tX
n 1
n (d0
d)"t
n
= w1t + w2t ; t
M + N + 1;
n=M
such that the objective function (84) is
RT (d) = T
1
N
+T
X
(
d d0
"t
N
N;t 1 (1
t=N +1
where vt = w2t
RT (d)
N;t 1 (1
T
1
AT (d) 2
)
d)
BT (d)
T
1
N
+T
X
(w1t + vt )2 ;
(93)
AT (d)
;
BT (d)
(94)
t=N +M +1
AT (d)
. We further …nd that
d) B
T (d)
N
+T
X
2
w1t
1
+ 2T
t=N +M +1
N
+T
X
w1t w2t
2T
1
CT;M (d)
t=N +M +1
where CT;M (d) is given by (92). The …rst two terms in (94) are analyzed in Nielsen (2015),
where it is shown that by setting su¢ ciently small, the …rst term can be made arbitrarily
large while the second is oP (1), uniformly on jd d0 + 1=2j
1 for some …xed 1 > . Thus
it remains to be shown that the third term of (94) is asymptotically negligible, uniformly on
the critical interval, that is,
sup
jd d0 +1=2j
1
jT
1
CT;M (d)
AT (d)
j = oP (1):
BT (d)
We consider two cases depending on d:
Case 1: Let K = f1=2
d; 1=2
d d0
1=2 + 1 g \ D. From (89) we have
1
1
BT (d)
1 and from Table 2 we …nd for ( ; 1 ) that supK EAT (d)2 = O(T 1+2 1 +2 ) and
supK E(DAT (d))2 = O((log T )2 T 1+2 1 +2 ) such that supK jAT (d)j = OP ((log T )T 1=2+ 1 + ):
From Table 3 for ( ; 1 ) we then …nd supK ECT;M (d)2 = O(M 1+2 1 T 2 ) = O(T (1+2 1 )+2 )
and also supK E(DCT;M (d))2 = O((log T )2 M 1+2 1 T 2 ) = O((log T )2 T (1+2 1 )+2 ); such that
supK jCT;M (d)j = OP ((log T )T (1=2+ 1 )+ ). This shows that
sup jT
d2K
1
CT;M (d)
AT (d)
j = OP ((log T )2 T
BT (d)
(1=2+
1)
(1=2
1
2 )
) = oP (1)
Initial values in CSS estimation of fractional models
34
for
< (1=2
2 )=(1=2 + 1 ).
1
Case 2: Let K = fd
1=2
; 1=2
d d0
1=2 + 1 g \ D. From (90)
1
we …nd supK jT 1 2d BT (d) 1 j
c, and we …nd from Table 2 that supK E(T d 1 AT (d))2 =
O(T 2 1 ) and therefore supK E(DT d 1 AT (d))2 = O((log T )2 T 2 1 ). From Table 3 we get
supK E(T d 1 CT;M (d))2 = O(M 1+2 1 T 1 ) = O(T (1+2 1 ) 1 ) and supK E(DT d 1 CT;M (d))2 =
O((log T )2 M 1+2 1 T 1 ) = O((log T )2 T (1+2 1 ) 1 ). Hence
sup jT
1
CT (d)
d2K
for
< (1=2
C.1.6
1 )=(1=2
+
AT (d)
j = OP ((log T )2 T
BT (d)
(1=2+
1)
(1=2
1)
) = oP (1)
1 ).
The nonstationarity interval: fd
d0
1=2
g\D
We give di¤erent arguments for di¤erent intervals of d, and we distinguish three cases.
Case 1: Let K = f1=2 +
d; d d0
1=2
g \ D. For this interval the main term of
RT (d) in (84) has been shown by Nielsen (2015) to satisfy (85), and it is su¢ cient to show,
with the normalization relevant to the nonstationarity interval, that
sup T 2(d
2
d0 ) AT (d)
BT (d)
d2K
(95)
= oP (1):
We use (89) to evaluate BT (d) 1 1 and …nd from Table 2 for ( ; ) that supK E(T 2(d
O(T 2 ) so that E(DT d d0 AT (d))2 = O((log T )2 T 2 ), which shows that
sup T 2(d
2
d0 ) AT (d)
BT (d)
d2K
= OP ((log T )2 T
2
d0 )
AT (d)2 ) =
) = oP (1):
Case 2: Let K = f1=2
d 1=2 + ; d d0
1=2
g \ D. Again the main term
of RT (d) in (84) has been shown by Nielsen (2015) to satisfy (85), and we therefore want to
show that
AT (d)2
supd2K T 2(d d0 ) T 2d 1 AT (d)2
sup jT 2(d d0 )
j
= OP (1);
(96)
BT (d)
inf d2K jT 2d 1 BT (d)j
d2K
but can be made arbitrarily small by choosing su¢ ciently small.
It follows from (91) that the denominator T 2d 1 BT (d) of (96) can be made arbitrarily
large by choosing su¢ ciently small, because (1 ((N + 1)=T )2 )=2 ! log(T =(N + 1))
for ! 0. We next prove that the numerator of (96) is uniformly OP (1), which proves the
result for Case 2. From Table 2 for ( ; ) we …nd supK E(T 2(d d0 ) T 2d 1 AT (d)2 ) = O(1). The
derivative of T d d0 T d N;t (d) is bounded by
N +T [T v]
T
1
X
m=N +1
(
m d0
)
T
d 1
(
[T v] + m
)
T
d
log(
m m + [T v]
(
));
T
T
R1 v
which converges to 0 ud0 d 1 (v + u) d log(u(u + v))du < 1 for d 2 K. Thus no extra
log T factor is needed in this case, and we …nd that T d d0 T d 1=2 AT (d) is tight, which proves
that supK jT d d0 T d 1=2 AT (d)j = OP (1).
Case 3: Finally, we assume d
d
1=2
and d d0
d d0
1=2
. We
1 2d
1
note that on this set the term T
BT (d) is uniformly bounded and uniformly bounded
Initial values in CSS estimation of fractional models
35
away from zero, see (90), so we factor it out of the objective function. We thus analyze the
objective function
RT (d) = T 2d
2
N
+T
X
(
d d0
"t )2 N;s 1 (1
N
d)2
(
d d0
"s ) N;s 1 (1
N
d d0
"t ) N;t 1 (1
N
d)(
d) :
t;s=N +1
The most straightforward approach would be to obtain the weak limit of T 2(d d0 +1=2) RT
from the weak convergence of T d d0 +1=2 dN d0 "t on d d0 1=2
and the uniform conver!
d
gence of T d N;[T u] 1 (1 d) ! (1 Nd+N
u
.
However,
the
former
would
require the existence
)
q
of Ej"t j for q > 1=(d d0 1=2)
1= with arbitrarily small, see JN (2012b), which
we have not assumed in Assumption 1. We therefore introduce dN d0 1 "t ; the cumulad d0
tion of N
"t ; to increase the fractional order su¢ ciently far away from the critical value
d d0 = 1=2, so the number of moments needed is q > 1=(1 + ). To this end we …rst
prove the following.
P
P
Lemma C.1 Let at ; bt ; t = 1; : : : ; T , be real numbers and At = ts=1 as ; Bt = ts=1 bs . Then
2
T (T
1)
T
X
(a2t b2s
t;s=1
at bs as bt )
(
2
T (T
1)
T 1
X
(bt AT
bt A t
bt+1 At ))2 :
t=1
Proof. We …rst …nd
T X
T
X
(a2t b2s
at b s as b t ) =
t=1 s=1
X
(a2t b2s + a2s b2t
2at bs as bt ) =
X
1 s<t T
1 s<t T
The proof is then completed by using the Cauchy-Schwarz inequality,
X
X
2
2
(
(at bs as bt ))2
(at bs
T (T 1) 1 s<t T
T (T 1) 1 s<t T
as bt )2 ;
PT
P
P
together with 1 s<t T (at bs as bt ) = Ts=11 bs (AT As )
t=2 bt At 1 .
2
2 2d
Applying Lemma C.1 to T
R (d) we …nd that for at =
T (T 1) T
d) it holds that RT (d) 2T 2(d0 d) 1 QT (d)2 where
N;t 1 (1
QT (d) = T
2d d0 1=2
T
1
NX
+T 1
(
N;t 1 (1
as b t ) 2 :
(at bs
d)(
d d0 1
"N +T 1 )
N
(
N;t 1 (1
d d0
"t
N
d)+
and bt =
N;t (1
d))(
d d0 1
"t )):
N
t=N +1
(97)
Following the arguments in JN (2012a) and Nielsen (2015), we show that QT (d) converges
weakly (in the space of continuous functions of d) to a random variable that is positive
almost surely.
Let K = fd d0 1
3=2
;d
1=2
g \ D. Assumption 1 ensures that we
have enough moments, q > max(2; 1=(1 + )), to apply the fractional functional central limit
theorem, e.g. Marinucci and Robinson (2000, Theorem 1), and …nd for each d 2 K that
Td
N;[T u] 1 (1
d)T d
d0 1=2
d d0 1
"[T u]
N
)
N!
(N + 1
d)
u d Wd0 d (u) as T ! 1 on D[0; 1];
Initial values in CSS estimation of fractional models
36
Ru
where “)”denotes weak convergence and Wd0 d (u) = ( (d0 d + 1)) 1 0 (u s)d0 d dW (s)
denotes fractional Brownian motion (of type II) and W denotes Brownian motion generated
by "t .
Because the integral is a continuous mapping of D[0; 1] to R it holds that
Z 1
N!
u d (Wd0 d (1) 2Wd0 d (u))du as T ! 1
(98)
QT (d) ) Q(d) =
(N + 1 d) 0
for any …xed d 2 K. We can establish tightness of the continuous process QT (d) by evaluating
the second moment, using the methods above. For all terms we see that it has the same
form as AT (d) except that (d d0 ; d) is replaced by (d d0 1; d) and hence the result
follows as the results for AT (d). This establishes tightness of QT (d) and hence strengthens
the convergence in (98) to weak convergence in the space of continuous functions of d on K
endowed with the uniform topology.
It thus holds that
inf RT (d)
d2K
2 inf T 2(d0
d) 1
d2K
QT (d)2 + oP (1)
where inf d2K QT (d)2 > 0 almost surely and
2T 2 inf QT (d)2 + oP (1);
d2K
> 0. It follows that, for any K > 0,
P (inf RT (d) > K) ! 1 as T ! 1;
d2K
which shows (85) and hence proves the result for Case 3.
C.1.7
Asymptotic normality of the estimator
To show asymptotic normality of d^ we apply the usual expansion of the score function,
^ = DL (d0 ) + (d^
0 = DL (d)
d0 )D2 L (d );
P
where d is an intermediate value satisfying jd
d0 j jd^ d0 j ! 0. The product moments
in D2 L (d) are shown in JN (2010, Lemma C.4) and JN (2012a, Lemma A.8(i)) to be
tight, or equicontinuous, in a neighborhood of d0 , so that we can apply JN (2010, Lemma
A.3) to conclude that D2 L (d ) = D2 L (d0 ) + oP (1), and we therefore analyze DL (d0 )
+
and D2 L (d0 ). From Lemma B.4 we …nd that 0 2 T 1=2 DL (d0 ) = M01T
+ OP (T 1=2 ) and
2
1 2
D L (d0 ) = 2 + OP (T 1=2 ) = 2 =6 + OP (T 1=2 ), and the result follows from Lemmas
0 T
B.2 and B.3.
C.2
Proof of Theorem 2
First we note that, as in the proof of Theorem 1 in Appendix C.1.7, we can apply JN (2010,
Lemma A.3) to conclude that D3 L (d ) = D3 L (d0 ) + oP (1). We thus insert the expressions
(75), (76), and (77) into the expansion (17) and …nd
T 1=2 (d^
d0 ) =
A0 + T
B0 + T
1=2
A1
1=2 B
1
1
T
2
1=2
d0 ) =
A0
B0
T
1=2
A0 + T
B0 + T
1=2
A1 2
)
1=2 B
B0
1
C0
+ oP (T
+ T 1=2 B1
z + z 2 + : : :, reduces to
which, using the expansion 1=(1 + z) = 1
T 1=2 (d^
(
(
A1
B0
A0 B1 1 A20 C0
+
) + oP (T
B02
2 B03
1=2
):
1=2
);
Initial values in CSS estimation of fractional models
+
We …nd that E(A0 ) = E(M01T
) = 0, so the bias of T (d^
E(A1 ) E(A0 B1 )
+
B0
B02
N;T (d0 ) + N;T (d0 )
= (
(
1 E(A20 )C0
) + o(1)
2 B03
+
+
E(M01T
(M11T
+
37
d0 ) is, from (78)–(80),
(99)
+
+2
M02T
)) + 3E(M01T
)
3 2
1
2
2
2
) + o(1):
From Lemma B.2,
+
+
+
+2
E(M01T
(M11T
+ M02T
)) + 3E(M01T
)
3 2
1
=
4
3
2
3
+3
3
=
3 3;
see (66)–(68), so that we get the …nal result ( N;T (d0 ) + N;T (d0 ) + 3 3 2 1 ) 2 1 + o(1).
For the estimator d^c we get the expansion (99), but use (81) instead of (80).
C.3
Proof of Corollary 1
C
We suppress the argument d and want to evaluate N;T and N;T
, see (21) and (22). From
(57) and (58) we …nd that h 0 ; 1 iT , h 0 ; 1 iT , h 1 ; 0 iT , and h 1 ; 0 iT are all bounded by
C
(1 + N ) min(d;2d 1)+ , which shows the result for N;T
. To …nd the result for N;T , it only
remains
P 0 1 to be shown that h 0 ; 0 iT =h 0 ; 0 iT is bounded. We …nd from (62) that j 0t (d)j
c N
j=0 j t+j ( d)j. We apply (37) and note that, for a given d and t > N > d, the
Q
(d + 1)=i) are all of the
coe¢ cients t+j ( d) = N ( d) N;t+j ( d) = N ( d) t+j
i=N +1 (1
same sign for j 0. If this is positive, we have, see (47),
N
0 1
X
j=0
j
t+j (
d)j
1
X
t+j (
d) =
t 1(
d + 1) > 0
j=0
because t 1
N , and a similar relation holds if the coe¢ cients are negative. Thus,
j 0t (d)j cj 0t (d)j and therefore
jh 0 ;
C.4
0 iT j =
2
0
j
N
+T
X
0t (d)
0t (d)j
c
t=N +1
2
0
N
+T
X
2
0t (d)
= ch 0 ;
t=N +1
0 iT :
Proof of Theorem 3
(i): We note that, because t N + 1, we have 0t (d) =
Similarly, because t + n N + 1 for n 0, we have
0t (d)
=
N
0 1
X
t+n (
d)(
0
X
n)
t 1(
d + 1) = 0 for d = 1; : : : ; N .
= 0 for d = 0; 1; : : : ; N + 1;
n=0
and hence h 0 ; 1 iT = h 0 ; 1 iT = h 1 ; 0 iT = h 1 ; 0 iT = 0 for d = 1; : : : ; N . This implies
C
that N;T and N;T
are zero.
(ii): Next assume d = 1. The case with N 1 is covered by part (i), so we only need to show
the result for N = 0. For N = 0 we have 0t (1) = t 1 (0) = 1(t=1) and 1t (1) = D t 1 (0) =
P0
(t 1) 1 1ft 1 1g , see (44). From (18) we …nd 0t (1) =
0) =
n= N0 +1 1(t n=1) (Xn
1(t=1) (X0
2 because otherwise the summation
0 ), whereas 1t (1) is non-zero only for t
over k in (19) is empty. Thus, 0t (1) and 0t (1) are non-zero only if t = 1, but 1t (1) and
C
2, and therefore 0;T
(1) = 0;T (1) = 0.
1t (1) are non-zero only if t
Q
(iii): From (37) it follows that N;t ( d + 1)jd=N +1 = ti=N +1 (i N 1)=i = 0 for t N + 1
and therefore (71) shows that N;T (d) = 0 for d = N + 1.
Initial values in CSS estimation of fractional models
C.5
38
Proof of Theorem 4
P 1
(27): For N0 = 0 we …nd from (18) that 0t (d0 ) = n=0
t+n ( d0 )( 0
is enough to show that N;T (d0 ) = 0, see (21).
C
(28) and (29): We also …nd for 0t (d0 ) = 0 that N;T
(d0 ) simpli…es to
C
N;T (d0 )
=
2
0
N
+T
X
( (C
t=N +1
0 ) 0t )( 1t
(C
0 ) 1t )
=
(C
X
0 )(h 0 ; 1 iT
n)
(C
= 0, and that
0 )h 0 ;
1 iT ):
(30): The result follows from (27) and (70).
(31): If further N = 0, then both summations over n in (19) are empty, and hence zero,
C
2
such that 1t (d0 ) = 0. It then follows from (28) that N;T
(d0 ) = (C
0 ) h 0 ; 1 iT ; which
can be replaced by its limit, see (59).
References
1. Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions, National Bureau of Standards, Washington D.C.
2. Andersen, T. G., Bollerslev, T., Diebold, F. X., and Ebens, H. (2001). The distribution
of daily realized stock return volatility. Journal of Financial Economics 61, pp. 43–76.
3. Askey, R. (1975). Orthogonal Polynomials and Special Functions, SIAM, Philadelphia.
4. Bingham, N. H., Goldie, C. M., and Teugels, J. L. (1987). Regular Variation, Cambridge University Press, Cambridge.
5. Bollerslev, T., Osterrieder, D., Sizova, N., and Tauchen, G. (2013) Risk and return:
long-run relationships, fractional cointegration, and return predictability. Journal of
Financial Economics 108, pp. 409–424.
6. Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis, Forecasting, and
Control, Holden-Day, San Francisco.
7. Byers, J. D., Davidson, J., and Peel, D. (1997). Modelling political popularity: an
analysis of long-range dependence in opinion poll series. Journal of the Royal Statistical
Society, Series A 160, pp. 471–490.
8. Carlini, F., Manzoni, M., and Mosconi, R. (2010). The impact of supply and demand
imbalance on stock prices: an analysis based on fractional cointegration using Borsa
Italiana ultra high frequency data. Working paper, Politecnico di Milano.
9. Davidson, J. and Hashimzade, N. (2009). Type I and type II fractional Brownian
motions: a reconsideration. Computational Statistics & Data Analysis 53, pp. 2089–
2106.
10. Dolado, J. J., Gonzalo, J., and Mayoral, L. (2002). A fractional Dickey-Fuller test for
unit roots. Econometrica 70, pp. 1963–2006.
11. Doornik, J. A. and Ooms, M. (2003). Computational aspects of maximum likelihood
estimation of autoregressive fractionally integrated moving average models. Computational Statistics & Data Analysis 42, pp. 333–348.
12. Giraitis, L. and Taqqu, M. (1998). Central limit theorems for quadratic forms with
time domain conditions. Annals of Probability 26, pp. 377–398.
13. Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Application, Academic Press, New York.
Initial values in CSS estimation of fractional models
39
14. Hualde, J. and Robinson, P. M. (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models. Annals of Statistics 39, pp. 3152–3181.
15. Jensen, A. N. and Nielsen, M. Ø. (2014). A fast fractional di¤erence algorithm. Journal
of Time Series Analysis 35, pp. 428–436.
16. Johansen, S. (2008). A representation theory for a class of vector autoregressive models
for fractional processes. Econometric Theory 24, pp. 651–676.
17. Johansen, S. and Nielsen, M. Ø. (2010). Likelihood inference for a nonstationary
fractional autoregressive model. Journal of Econometrics 158, pp. 51–66.
18. Johansen, S. and Nielsen, M. Ø. (2012a). Likelihood inference for a fractionally cointegrated vector autoregressive model. Econometrica 80, pp. 2667–2732.
19. Johansen, S. and Nielsen, M. Ø. (2012b). A necessary moment condition for the
fractional functional central limit theorem. Econometric Theory 28, pp. 671–679.
20. Lawley, D. N. (1956). A general method for approximating to the distribution of
likelihood ratio criteria. Biometrika 43, pp. 295–303.
21. Li, W. K. and McLeod, A. I. (1986). Fractional time series modelling. Biometrika 73,
pp. 217–221.
22. Lieberman, O. and Phillips, P. C. B. (2004). Expansions for the distribution of the
maximum likelihood estimator of the fractional di¤erence parameter. Econometric
Theory 20, pp. 464–484.
23. Marinucci, D. and Robinson, P. M. (2000). Weak convergence of multivariate fractional
processes. Stochastic Processes and their Applications 86, pp. 103–120.
24. Nielsen, M. Ø. (2014). Asymptotics for the conditional-sum-of-squares estimator in
multivariate fractional time series models. Journal of Time Series Analysis 36, pp.
154–188.
25. Osterrieder, D. and Schotman, P. C. (2011). Predicting returns with a co-fractional
VAR model. CREATES research paper, Aarhus University.
26. Robinson, P. M. (1994). E¢ cient tests of nonstationary hypotheses, Journal of the
American Statistical Association 89, pp. 1420–1437.
27. Rossi, E. and Santucci de Magistris, P. (2013). A no-arbitrage fractional cointegration
model for futures and spot daily ranges. Journal of Futures Markets 33, pp. 77–102.
28. Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally
integrated time series models, Journal of Econometrics 53, pp. 165–188.
29. Tschernig, R., Weber, E., and Weigand, R. (2013). Long-run identi…cation in a fractionally integrated system. Journal of Business and Economic Statistics 31, pp. 438450.