QED Queen’s Economics Department Working Paper No. 1300 The role of initial values in conditional sum-of-squares estimation of nonstationary fractional time series models SÃÿren Johansen University of Copenhagen and CREATES Morten ßrregaard Nielsen Queen’s University and CREATES Department of Economics Queen’s University 94 University Avenue Kingston, Ontario, Canada K7L 3N6 11-2012 The role of initial values in conditional sum-of-squares estimation of nonstationary fractional time series models Søren Johanseny University of Copenhagen and CREATES Morten Ørregaard Nielsenz Queen’s University and CREATES February 25, 2015 Abstract In this paper we analyze the in‡uence of observed and unobserved initial values on the bias of the conditional maximum likelihood or conditional sum-of-squares (CSS, or least squares) estimator of the fractional parameter, d, in a nonstationary fractional time series model. The CSS estimator is popular in empirical work due, at least in part, to its simplicity and its feasibility, even in very complicated nonstationary models. We consider a process, Xt , for which data exist from some point in time, which we call N0 + 1, but we only start observing it at a later time, t = 1. The parameter (d; ; 2 ) is estimated by CSS based on the model d0 (Xt ) = "t ; t = N +1; : : : ; N +T , conditional on X1 ; : : : ; XN . We derive an expression for the second-order bias of d^ as a function of the initial values, Xt ; t = N0 + 1; : : : ; N , and we investigate the e¤ect on the bias of setting aside the …rst N observations as initial values. We compare d^ with an estimator, d^c , derived similarly but by choosing = C. We …nd, both theoretically and using a data set on voting behavior, that in many cases, the estimation of the parameter picks up the e¤ect of the initial values even for the choice N = 0. If N0 = 0, we show that the second-order bias can be completely eliminated by a simple bias correction. If, on the other hand, N0 > 0, it can only be partly eliminated because the second-order bias term due to the initial values can only be diminished by increasing N . Keywords: Asymptotic expansion, bias, conditional inference, fractional integration, initial values, likelihood inference. JEL Classi…cation: C22. We are grateful to three anonymous referees, the co-editor, and seminar participants at various universities and conferences for comments and to James Davidson for sharing the opinion poll data set with us. We would like to thank the Canada Research Chairs program, the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Center for Research in Econometric Analysis of Time Series (CREATES - DNRF78, funded by the Danish National Research Foundation) for research support. y Email: [email protected] z Corresponding author. Email: [email protected] 1 Initial values in CSS estimation of fractional models 1 2 Introduction One of the most commonly applied inference methods in nonstationary autoregressive (AR) models, and indeed in all time series analysis, is based on the conditional sum-of-squares (CSS, or least squares) estimator, which is obtained by minimizing the sum of squared residuals. The estimator is derived from the Gaussian likelihood conditional on initial values and is often denoted the conditional maximum likelihood estimator. For example, in the AR(k) model we set aside k observations as initial values, and conditioning on these implies that Gaussian maximum likelihood estimation is equivalent to CSS estimation. 
This methodology was applied in classical work on ARIMA models by, e.g., Box and Jenkins (1970), and was introduced for fractional time series models by Li and McLeod (1986) and Robinson (1994), in the latter case for hypothesis testing purposes. The CSS estimator has been widely applied in the literature, also for fractional time series models. In these models, the initial values have typically been assumed to be zero, and as remarked by Hualde and Robinson (2011, p. 3154) a more appropriate name for the estimator may thus be the truncated sumof-squares estimator. Despite the widespread use of the CSS estimator in empirical work, very little is known about its properties related to the initial values and speci…cally related to the assumption of zero initial values. Recently, inference conditional on (non-zero) initial values has been advocated in theoretical work for univariate nonstationary fractional time series models by Johansen and Nielsen (2010) and for multivariate models by Johansen and Nielsen (2012a)— henceforth JN (2010, 2012a)— and Tschernig, Weber, and Weigand (2013). In empirical work, these methods have recently been applied by, for example, Carlini, Manzoni, and Mosconi (2010) and Bollerslev, Osterrieder, Sizova, and Tauchen (2013) to high-frequency stock market data, Hualde and Robinson (2011) to aggregate income and consumption data, Osterrieder and Schotman (2011) to real estate data, and Rossi and Santucci de Magistris (2013) to futures prices. In this paper we assume the process Xt exists for t N0 +1, and we derive the properties of the process from the model given by the truncated fractional …lter d0N0 (Xt 0 ) = "t with "t i:i:d:(0; 2 ), for some d0 > 1=2. However, we only observe Xt for t = 1; : : : ; T0 = N + T; and so we estimate (d; ; 2 ) from the conditional Gaussian likelihood for XN +1 ; : : : ; XN +T ^ Our …rst result is to prove consistency given X1 ; : : : ; XN , which de…nes the CSS estimator d. and asymptotic normality of the estimator of d. This is of interest in its own right, not only because of the usual issue of non-uniform convergence of the objective function, but also because the estimator of is in fact not consistent when d0 > 1=2. We then proceed to derive an analytical expression for the asymptotic second-order bias of d^ via a higher-order stochastic expansion of the estimator. We apply this to investigate the magnitude of the in‡uence of observed and unobserved initial values, and to discuss the e¤ect on the bias of setting aside a number of observations as initial values, i.e., of splitting a given sample of size T0 = N + T into N initial values and T observations for estimation. We compare d^ with an estimator, d^c , derived from centering the data at C by restricting = C. We …nd, both theoretically and using a data set on voting behavior as illustration, that in many cases, the parameter picks up the e¤ect of the initial values even for the choice N = 0. Finally, in a number of relevant cases, we show that the second-order bias can be eliminated, either partially or completely, by a bias correction. In the most general case, however, Initial values in CSS estimation of fractional models 3 it can only be partly eliminated, and in particular the second-order bias term due to the initial values can only be diminished by increasing the number of initial values, N . 
In the stationary case, 0 < d < 1=2, there is a literature on Edgeworth expansions of the distribution of the (unconditional) Gaussian maximum likelihood estimator based on the joint density of the data, (X1 ; : : : ; XT ) in the model (1). In particular, Lieberman and Phillips (2004) …nd expressions for the second-order term, from which we can derive the main term of the bias in that case. We have not found any results on the nonstationary case, d > 1=2, for the estimator based on conditioning on initial values. The remainder of the paper is organized as follows. In the next section we present the fractional models and in Section 3 our main results. In Section 4 we give an application of our theoretical results to a data set of Gallup opinion polls. Section 5 concludes. Proofs of our main results and some mathematical details are given in the appendices. 2 The fractional models and their interpretations A simple model for fractional data is d (Xt ) = "t ; " t i:i:d:(0; 2 ); t = 1; : : : ; T; (1) where d 0, 2 R, and 2 > 0. The fractional …lter d X Pt 1is de…nednin terms of the u fractional coe¢ cients n (u) from an expansion of (1 z) = n=0 n (u)z , i.e., n (u) = u(u + 1) : : : (u + n n! 1) = (u + n) (u) (n + 1) nu 1 as n ! 1; (u) (2) where (u) denotes the Gamma function and “ ” denotes that the ratio of the left- and right-hand sides converges to one. More results are collected Appendix A. Pin 1 2 For a given value of d such that P1 0 < d < 1=2, we have n=0 n (d) < 1. In this case, d process with a …nite the in…nite sum Xt = "t = n=0 n (d)"t n exists as a stationary P ( d) = 0. variance, and gives a solution to equation (1) because d = 1 n=0 n When d > 1=2, the solution to (1) is nonstationary. In that case, we discuss below two interpretations of equation (1) as a statistical model. First as an unconditional (joint) model of the stationary process X1 ; : : : ; XT when 1=2 < d < 3=2, and then as a conditional model for the nonstationary process XN +1 ; : : : ; XN +T given initial values when d > 1=2. In the latter case we call Xt an initial value if t N and denote the initial values Xn ; n N; and we assume, see Section 2.2, that the variables we are measuring started at some point N0 + 1 in the past, and we truncate the fractional …lter accordingly. 2.1 The unconditional fractional model and its estimation One approach to the estimation of d from model (1) with nonstationary data is the di¤erenceand-add-back approach based on Gaussian estimation for stationary processes. If we have the a priori information that 1=2 < d < 3=2, say, then we could transform the data X0 ; : : : ; XT to XT = ( X1 ; : : : ; XT )0 and note that (1) can be written d 1 (Xt ) = "t ; so that Xt is stationary and fractional of order 1=2 < d 1 < 1=2. Note that = 0; so the parameter does not enter. To calculate the unconditional Gaussian likelihood function, Initial values in CSS estimation of fractional models 4 we then need to calculate the T T variance matrix = (d; 2 ) = V ar( XT ), its inverse, 1 , and its determinant, det . This gives the Gaussian likelihood function, 1 log det 2 1 X0T 2 1 XT : (3) A general optimization algorithm can then be applied to …nd the maximum likelihood estimator, d^stat , if can be calculated. This is possible by the algorithm in Sowell (1992). 
^ The estimator dstat is not a CSS estimator, which is the class of estimators we study in this paper, but it was applied by Byers, Davidson, and Peel (1997) and Dolado, Gonzalo, and Mayoral (2002) in the analysis of the voting data, and by Davidson and Hashimzade (2009) to the Nile data. The estimator d^stat was analyzed by Phillips and Lieberman (2004) for true value d0 < 1=2. They derived an asymptotic expansion of the distribution function of T 1=2 (d^stat d0 ); from which a second-order bias correction of the estimator can be derived, see Section 3.2. In more complicated models than (1), the calculation of may be computationally di¢ cult. This is certainly the case in, say, the fractionally cointegrated vector autoregressive model of JN (2012a). However, even in much simpler models such as the usual autoregressive model, a conditional approach has been advocated for its computational simplicity, e.g., Box and Jenkins (1970), because conditional maximum likelihood estimation simpli…es the calculation of estimators by reducing the numerical problem to least squares. For this reason, the conditional estimator has been very widely applied to many models, including (1). For a discussion and comparison of the numerical complexity of Gaussian maximum likelihood as in (3) and the CSS estimator, see e.g. Doornik and Ooms (2003). 2.2 The observations and initial values It is di¢ cult to imagine a situation where fXs gT 1 is available, so that (1) could be applied. In general, we assume data could potentially be available from some (typically unknown) time in theP past, N0 + 1, say. We therefore truncate the …lter at time N0 ; that is, de…ne t+N0 1 d n ( d)Xt n , and consider N0 Xt = n=0 d N0 (Xt ) = "t ; t = 1; : : : ; T0 : (4) as the model for the data we actually observe, namely Xt for t = 1; : : : ; N + T = T0 . In practice, when N0 > 0, we do not observe all the data, and so we have to decide how to split a given sample of size T0 = N + T into (observed) initial values fXn gN n=1 and observations T fXt gt=N +1 to be modeled, and then calculate the likelihood based on the truncated …lter d 0 , as an approximation to the conditional likelihood based on (4). In the special case with N0 = 0; the equations in (4) become X1 = X2 = + "1 ; 1 ( d)X1 + (5) + 1( d) + "2 ; etc., and can thus be interpreted as the initial mean or level of the observations. Clearly, if is not included in the model, the …rst observation is X1 = "1 with mean zero and variance 2 . The lag length builds up as more observations become available. As an example we take (an updated version of) the Gallup poll data from Byers et al. (1997) to be analyzed in Section 4. The data is monthly from January 1951 to November Initial values in CSS estimation of fractional models 5 2000 for a total of 599 observations. In this case the data is not available for all t simply because the Labour party was founded in 1900, and the Gallup company was founded in 1935, and in fact the regular Gallup polls only started in January 1951, which is denoted N0 + 1. As a second example, consider the paper by Andersen, Bollerslev, Diebold, and Ebens (2001) which analyzes log realized volatility for companies in the Dow Jones Industrial Average from January 2, 1993, to May 28, 1998. For each of these companies there is an earlier date, which we call N0 + 1, where the company became publicly traded and such measurements were made for the …rst time. The data analyzed in Andersen et al. 
(2001) was not from N0 +1, but only from the later date when the data became available on CD-ROM, which was January 2, 1993, which we denote t = 1. We thus do not have observations from N0 + 1 to 0. We summarize this in the following display, which we think is representative for most, if not all, data in economics: :::;X | {z N0 } Data does not exist ; X | N0 +1 ; : : : ; X0 {z } Data exists but is not observed Thus, we consider estimation of d 0 (Xt ; X ;:::;X | 1 {z N} Data is observed (initial values) ; (6) X ; : : : ; X T0 | N +1 {z } Data is observed (estimation) (7) ) = "t ; t = 1; : : : ; T0 ; as an approximation to model (4). Unlike for (4), the conditional likelihood for (7) can be calculated based on available data from 1 to T0 . For a fast algorithm to calculate the fractional di¤erence, see Jensen and Nielsen (2014). In summary, we use d N0 (Xt ) = "t as the model we would like to analyze. However, because we only have data for t = 1; : : : ; T0 , we base the likelihood on the model d0 (Xt ) = "t , for which an approximation to the conditional likelihood from (4) can be calculated with the available data. We then try to mitigate the e¤ect of the unobserved initial values by conditioning on X1 ; : : : ; XN . 2.3 The conditional fractional model Let parameter subscript zero denote true values. In the conditional approach we interpret equation (4) as a model for Xt given the past Ft 1 = (X N0 +1 ; : : : ; Xt 1 ) and therefore solve the equation for Xt as a function of initial values, errors, and the initial level, 0 . The solution to (1) is given in JN (2010, Lemma 1) under the assumption of bounded initial values, and we give here the solution of (4). Lemma 1 The solution of model (4) for XN +1 ; : : : ; XT0 ; conditional on initial values Xn ; N0 < n N , is, for t = N + 1; : : : ; T0 ; given by Xt = d0 N "t d0 N t+N 0 1 X n( d0 )Xt n + d0 t+N0 1 ( N (8) d0 + 1) 0 : n=t N d We …nd the conditional mean and variance by writing model (4) as Xt = (1 d ) + "t . Because (1 ) is a function only of the past, we …nd N0 )(Xt N0 )(Xt E(Xt jFt 1 ) = (1 d N0 )(Xt ) and V ar(Xt jFt 1 ) = V ar("t ) = 2 : Initial values in CSS estimation of fractional models 6 As an example we get, for d = 1 and = 0, the well-known result from the autoregressive model that E(Xt jFt 1 ) = Xt 1 and V ar(Xt jFt 1 ) = 2 . In model (4) this implies that the prediction error decomposition given Xn ; N0 < n N , is the conditional sum of squares, T0 X (Xt E(Xt jFt 1 ))2 = V ar(X t jFt 1 ) t=N +1 2 T0 X ( d ))2 ; N0 (Xt t=N +1 which is used in the conditional Gaussian likelihood function (9) below. 2.4 Estimation of the conditional fractional model We would like to consider the conditional (Gaussian) likelihood of fXt ; N + 1 given initial values fXn ; N0 + 1 n N g, which is given by T log 2 T0 X 1 2 2 2 d ( N0 (Xt T0 g t ))2 : (9) t=N +1 If in fact we have observed all available data, such that N0 = 0 as in, e.g., the Gallup poll data we can use (9) for N0 = 0. More commonly, however, data is not available all the way back to inception at time N0 + 1, so we consider the situation that the series exists for t > N0 , but we only have observations for t 1, as in the volatility data example. 
We d d therefore replace the truncated …lter 0 and suggest using the (quasi) likelihood N0 by conditional on fXn ; 1 n N g, L(d; ; 2 T log 2 )= 1 2 2 2 T0 X ( d 0 (Xt ))2 : (10) t=N +1 That is, (10) is an approximation to the conditional likelihood (9), where (10) has the advantage that it can be calculated based on available data from t = 1 to T0 = N + T . It is clear from (10) that we can equivalently …nd the (quasi) maximum likelihood estimators of d and by minimizing T0 1 X ( d (Xt ))2 (11) L(d; ) = 2 t=N +1 0 with respect to d and . We …nd from (46) in Lemma A.4 that d 0 (Xt )= d 0 Xt t 1 X n( d) = d 0 Xt t 1( d + 1) = d 0 Xt 0t (d) ; n=0 where we have introduced d + 1). The estimator of PT0 d +1 ( 0 Xt ) 0t (d) ^(d) = t=N ; PT0 2 (d) 0t t=N +1 0t (d) = t 1( for …xed d is P 0 2 provided Tt=N +1 0t (d) > 0. The conditional quasi-maximum likelihood estimator of d can then be found by minimizing the concentrated objective function PT0 T0 d 2 1 X 1 ( t=N +1 ( 0 Xt ) 0t (d)) 2 d ( 0 Xt ) ; (12) L (d) = PT0 2 2 t=N +1 2 (d) 0t t=N +1 Initial values in CSS estimation of fractional models 7 P 0 2 which has no singularities at the points where Tt=N +1 0t (d) = 0, see Theorem 1. Thus, the conditional quasi-maximum likelihood estimator d^ can be de…ned by d^ = arg min L (d) (13) d2D for a parameter space D to be de…ned below. This is a type of conditional-sum-of-squares (CSS) estimator for d. The …rst term of (12) is standard, and the second takes into account the estimation of the unknown initial level at the inception of the series at time N0 + P 1. 0 2 For d = d0 and = 0 we …nd, provided Tt=N +1 0t (d0 ) > 0, that PT0 t=N +1 "t 0t (d0 ) ; ^(d0 ) 0 = PT0 2 (d ) 0t 0 t=N +1 P T 0 2 1 which has mean zero and variance 02 ( t=N that does not go to zero when +1 0t (d0 ) ) P T 2 0 2 d0 > 1=2 because then 0 t=N +1 0t (d0 ) is bounded in T0 , see (59) in Lemma B.1. Thus we have that, even if d = d0 , ^(d0 ) is not consistent. In the following we also analyze another estimator, d^c , constructed by choosing to center the observations by a known value rather than estimating as above. The known value, say C, used for centering, could be one of the observed initial values, e.g. the …rst one, or an average of these, or it could be any known constant. This can be formulated as choosing = C in the likelihood function (10) and de…ning d^c = arg min Lc (d); (14) d2D T0 1 X ( Lc (d) = 2 t=N +1 d 0 (Xt C))2 ; (15) which is also a CSS estimator. A commonly applied estimator is the one obtained by not centering the observations, i.e. by setting C = 0. In that case, an initial non-zero level of the process is therefore not taken into account. The introduction of centering and of the parameter , interpreted as the initial level of the process, thus allows analysis of the e¤ects of centering the observations in di¤erent ways (and avoid the, possibly unrealistic, phenomenon described immediately after (5) when ^ where the initial = 0). We analyze the conditional maximum likelihood estimator, d, level is estimated by maximum likelihood jointly with the fractional parameter, and we also analyze the more traditional CSS estimator, d^c , where the initial level is “estimated” using a known value C, e.g. zero or the …rst available observation, X1 . In practice we split a given sample of size T0 = N + T into (observed) initial values N +T fXn gN n=1 and observations fXt gt=N +1 to be modeled, and then calculate the likelihood (12) based on the truncated …lter d0 as an approximation to the model (4) starting at N0 + 1. 
In order to discuss the error implied by using this choice in the likelihood function, we derive in Theorem 2 a computable expression for the asymptotic second-order bias term in the estimator of d via a higher-order stochastic expansion of the estimator. This bias term depends on all observed and unobserved initial values and the parameters. In Corollary 1 and Theorems 3 and 4 we further investigate the e¤ect on the bias of setting aside the data from t = 1 to N as initial values. Initial values in CSS estimation of fractional models 2.5 8 A relation to the ARFIMA model The simple model (1) is a special case of the well-known ARFIMA model, A(L) d Xt = B(L)"t ; t = 1; : : : ; T; where A(L) and B(L) depend on a parameter vector and B(z) 6= 0 for jzj model, the conditional likelihood depends on the residuals "t (d; ) = B(L) 1 A(L) d d Xt = b( ; L) 1. For this Xt ; and when b( ; L) = 1 we obtain model (1) as a special case. For the ARFIMA model the analysis would depend on the derivatives of the conditional likelihood function, which would in turn be functions of the derivatives of the residuals. Again, to focus on estimation of d we consider the remaining parameter …xed at the true value 0 . For a function f (d) we denote the derivative of f with respect to d as @ f (d) (Euler’s notation), and the relevant derivatives are Df (d) = @d Dm "t (d; )jd0 ; 0 = b( m 0 ; L)D d Xt jd0 = (log )m b( 0 ; L) d0 Xt = (log )m "t : Thus, for this more general model, the derivatives of the conditional likelihood with respect to d, when evaluated at the true values, are identical to those of the residuals from the simpler model (1). We can therefore apply the results from the simpler model more generally, but only if we know the parameter 0 . If has to be estimated, the analysis becomes much more complicated. We therefore focus our analysis on the simple model. 3 Main results Our main results hold only for the true value d0 > 1=2, that is, for nonstationary processes, which is therefore assumed in the remainder of the paper. However, we maintain a large compact parameter set D for d in the statistical model, which does not assume a priori knowledge that d0 > 1=2, see Assumption 2. 3.1 First-order asymptotic properties The …rst-order asymptotic properties of the CSS estimators d^ and d^c derived from the likelihood functions L (d) and Lc (d) in (12) and (15), respectively, are given in the following theorem, based on results of JNP(2012a) and Nielsen (2015). To describe the results, we use s Riemann’s zeta function, s = 1 j=1 j ; s > 1, and speci…cally 2 = 1 X j=1 2 j 2 = 6 ' 1:6449 and 3 = 1 X j j=1 3 ' 1:2021: (16) We formulate two assumptions that will be used throughout. Assumption 1 The errors "t are i.i.d.(0; 2 0) with …nite fourth moment. Assumption 2 The parameter set for (d; ; 2 ) is D R R+ , where D = [d; d]; 0 < d < d < 1. The true value is (d0 ; 0 ; 02 ), where d0 > 1=2 is in the interior of D. Theorem 1 Let the model for the data Xt ; t = 1; : : : ; N + T; be given by (4) and let Assumptions 1 and 2 be satis…ed. Then the functions L (d) in (12) and Lc (d) in (15) have no singularities forpd > 0, and the estimators d^ and d^c derived from L (d) and Lc (d), respectively, are both T -consistent and asymptotically distributed as N (0; 2 1 ). 
Initial values in CSS estimation of fractional models 3.2 9 Higher-order expansions and asymptotic bias To analyze the asymptotic bias of the CSS estimators for d, and in particular how initial values in‡uence the bias, we need to examine higher-order terms in a stochastic expansion of the estimators, see Lawley (1956). The conditional (negative pro…le log) likelihoods L (d) and Lc (d) are given in (12) and (15). We …nd, see Lemma B.4, that the derivatives satisfy DL (d0 ) = OP (T 1=2 ), D2 L (d0 ) = OP (T ), and D3 L (d) = OP (T ) uniformly in a neighborhood ^ = 0 around d0 gives of d0 , and a Taylor series expansion of DL (d) 1 d0 )D2 L (d0 ) + (d^ 2 ^ = DL (d0 ) + (d^ 0 = DL (d) d0 )2 D3 L (d ); P where d is an intermediate value satisfying jd d0 j jd^ d0 j ! 0. We then insert 1=2 ~ 1~ 3=2 ^ ~ d d0 = T G1T + T G2T + OP (T ) and …nd G1T = T 1=2 DL (d0 )=D2 L (d0 ) and ~ 2T = 1 T (DL (d0 ))2 D3 L (d )=(D2 L (d0 ))3 , which we write as G 2 T 1=2 (d^ 1=2 DL (d0 ) 1 D2 L (d ) 0 T T d0 ) = 1 T 2 1=2 ( T T 1=2 DL (d0 ) 2 T ) 1 D2 L (d ) T 0 1 D3 L (d ) + OP (T 1 D2 L (d ) 0 1 ): (17) Based on this expansion, we …nd another expansion T 1=2 (d^ d0 ) = G1T + T 1=2 G2T + D oP (T 1=2 ) with the property that (G1T ; G2T ) ! (G1 ; G2 ) and E(G1T ) = E(G1 ) = 0. Then the zero- and …rst-order terms of the bias are zero, and the second-order asymptotic bias term is de…ned as T 1 E(G2 ). ^ In order to formulate We next present the main result on the asymptotic bias of d. the results, we de…ne some coe¢ cients that depend on N; N0 ; T , and on initial values and ( 0 ; 02 ; d) (we suppress some of these dependencies for notational convenience), 0t (d) 0 X = t n( d)(Xn (18) 0 ); n= N0 +1 1t (d) = tX 1 N k = t 1( N X t n k( d)(Xn d + 1); and 0) N X D t n( d)(Xn 0 ); (19) n=1 n= N0 +1 k=1 0t (d) 1 1t (d) = D t 1( (20) d + 1): PT0 2 For two sequences fut ; vt g1 t=1 , we de…ne the product moment hu; viT = 0 t=N +1 ut vt , see e.g. Lemma B.1. The main contributions to the bias are expressed for d = d0 in terms of h 0 ; 0 iT h 0 ; 0 i2T (h 0 ; 1 iT + h 1 ; 0 iT ) + h 1; h 0 ; 0 iT h 0 ; 0 i2T C 2 (C 0 )(h 0 ; 1 iT + h 1 ; 0 iT ) + (C 0) h 1; N;T (d) = h 0 ; 1 iT X 2 (t s) 1 t ( d + 1) s ( d + 1)=h 0 ; 0 iT : N;T (d) = 0 N;T (d) = h 0 ; 1 iT 0 iT ; (21) 0 iT ; (22) (23) N s<t N +T 1 Note that (21)–(23) are all invariant to scale because of the normalization by 02 . Also note that, even if h 0 ; 0 iT = 0, the ratio h 0 ; 0 iT =h 0 ; 0 iT as well as N;T (d) are well de…ned, see Theorem 1 and Appendix C.1. Initial values in CSS estimation of fractional models 10 Theorem 2 Let the model for the data Xt ; t = 1; : : : ; N + T; be given by (4) and let Assumptions 1 and 2 be satis…ed. Then the asymptotic biases of d^ and d^c are ^ = bias(d) bias(d^c ) = where limT !1 j N;T (d0 )j (T 2 ) 1 [3 3 2 (T 2 ) 1 [3 3 2 < 1, limT !1 j 1 + N;T (d0 ) 1 + C N;T (d0 )] N;T (d0 )j + N;T (d0 )] + o(T 1 + o(T 1 (24) ); (25) ); < 1, and limT !1 j C N;T (d0 )j < 1. The leading bias terms in (24) and (25) are of the same order of magnitude in T , namely O(T 1 ). First, the …xed term, 3 3 2 2 , derives from correlations of derivatives of the likelihood and does not depend on initial values or d0 . The second term in (24), N;T (d0 ), is a function of initial values and d0 , and can be made smaller by including more initial values (larger N ) as shown in Corollary 1 below. The third term in (24), N;T (d0 ), only depends on (N; T; d0 ). C If we center the data by C, and do not correct for , we get the term N;T (d0 ) in (25). 
However, if we estimate we get N;T (d0 ) + N;T (d0 ) in (24), where N;T (d0 ) is independent of initial values and only depends on (N; T; d0 ). The coe¢ cients 0t (d) and 1t (d) are linear C (d) are quadratic in initial in the initial values, and hence the bias terms N;T (d) and N;T values scaled by 0 . The …xed bias term, 3 3 2 2 , is the same as the bias derived by Lieberman and Phillips (2004) for the estimator d^stat , based on the unconditional likelihood (3) in the stationary 1=2 case, 0 < d0 < 1=2. They showed that the distribution function of 2 T 1=2 (d^stat d0 ) is FT (x) = P ( 1=2 1=2 ^ (dstat 2 T d0 ) x) = (x) + T 1=2 3=2 3 2 (x)(2 + x2 ) + O(T 1 ); where (x) and (x) denote the standard normal distribution and density functions, respectively. Using D (x)(2 + x2 ) = (x)x3 , we …nd that an approximation to the expectation 1=2 of 2 T 1=2 (d^stat d0 ), based on the …rst two terms, is given by Z 3=2 3=2 1=2 T x (x)x3 dx = T 1=2 3 3 2 ; 3 2 which shows that the second-order bias of d^stat , derived for 0 < d0 < 1=2, is the same as the the second-order …xed bias term of d^ derived for d0 > 1=2 in Theorem 2. The dependence of the bias in Theorem 2 on the number of observed initial values, N , is explored in the following corollary. Corollary 1 Under the assumptions of Theorem 2, we obtain the following bounds for the components of the bias terms for d^ and d^c when d > 1=2, max(j C N;T (d)j; j N;T (d)j) c(1 + N ) min(d;2d 1)+ for any 0 < < min(d; 2d 1): (26) The result in Corollary 1 shows how the bias term arising from not observing all initial values decays as a function of the number of observed values set aside as initial values, N . More generally, the results in this section shows that a partial bias correction is possible. ^ the second-order bias in That is, by adding the terms (T 2 ) 1 3 3 2 1 and (T 2 ) 1 N;T (d), d^ and d^c can be partly eliminated, but the bias due to (T 2 ) 1 N;T (d0 ) can only be made smaller by increasing N . Initial values in CSS estimation of fractional models 11 A di¤erent type of bias correction was used by Davidson and Hashimzade (2009, eqn. (4.4)) in an analysis of the Nile data. They considered the CSS estimator when all initial values are set to zero in the stationary case. To capture the e¤ect of the left-out initial values, they introduce a few extra regressors that are found components P as the …rst principal n of the variance matrix of the n = 150 variables x = f 1 ( d)X g . s k s=1 k=s k 3.3 Further results for special cases C The expressions for N;T (d), N;T (d), and N;T (d) in (21)–(23) show that they depend on C (N; T; d) and, in the case of N;T (d) and N;T (d), also on all initial values. In order to get an impression of this dependence, we derive simple expressions for various special cases. C First, when d is an integer, we …nd simple results for N;T (d), N;T (d), and N;T (d), and hence the asymptotic bias, as follows. C N;T (d) Theorem 3 Under the assumptions of Theorem 2 it holds that following two cases: (i) If d = k for an integer k such that 1 k N , (ii) If d = 1 and N 0. In either case, the asymptotic biases of d^ and d^c are given by ^ = bias(d) bias(d^c ) = (iii) If d0 = N +1 then N;T (d0 ) (T 2 ) 1 (3 (T 2 ) 1 3 3 2 1 3 2 1 + N;T (d0 )) + o(T ^ = = 0 and bias(d) 1 + o(T 1 = N;T (d) = 0 in the ); ): (T 2 ) 1 (3 3 2 1 + N;T (N +1))+o(T 1 ): It follows from Theorem 3(i) that for d = 1 we need one initial value (N 1) and for C d = 2 we need two initial values (N 2), etc., to obtain N;T (d) = N;T (d) = 0. 
Alternatively, for d0 = 1, Theorem 3(ii) shows that there will be no contribution from initial values to the second-order asymptotic bias even if N = 0, and Theorem 3(iii) shows that when N = 0, it ^ = (T 2 ) 1 3 3 1 +o(T 1 ). Since the bias term also holds that 0;T (1) = 0 such that bias(d) 2 is continuous in d0 , the same is approximately true for a (small) neighborhood of d0 = 1. Note that the results in Theorem 3 show that in the cases considered, the estimators d^ and d^c can be bias corrected to have second-order bias equal to zero. We …nally consider the special case with N0 = 0, where all available data is observed. We use the notation (d) = D log (d) to denote the Digamma function. Theorem 4 If N0 = 0 and N ^ = bias(d) bias(d^c ) = where N;T (d0 ) 0 then (T 2 ) 1 [3 3 2 (T 2 ) 1 [3 3 2 is de…ned in (23) and C N;T (d0 ) = N;T (d0 ) (C C N;T (d0 ) = 0 and the biases of d^ and d^c are given by 1 + N;T (d0 )] + o(T 1 ); (27) 1 + C N;T (d0 )] + o(T 1 ); (28) 1 iT : (29) simpli…es to 0 )h 0 ; 1 iT + (C 2 0) h 0; In particular, for N0 = N = 0 we get the analytical expressions ^ = bias(d) (T 2 ) 1 [3 3 2 bias(d^c ) = (T 2 ) 1 [3 3 2 1 1 ( (2d0 1) (d0 ))] + o(T 2 2d0 2 (C 0) ( (2d0 2 d0 1 0 1 (30) ); 1) (d0 ))] + o(T 1 ): (31) Initial values in CSS estimation of fractional models 12 It follows from Theorem 4 that if we have observed all possible data, that is N0 = 0, then we get a bias of d^ in (27) and of d^c in (28) and (29). The bias of d^ comes from the estimation of and the bias of d^c depends on the distance C 0. ^ With N0 = 0 as in Theorem 4, we note that the biases of d and d^c do not depend on unobserved initial values. It follows that (27) can be used to bias correct the estimator d^ and (28) to bias correct the estimator d^c . For d^ this bias correction gives a second-order bias of zero, but for d^c the correction is only partial due to (29). Although the asymptotic bias of d^ is of order O(T 1 ), we note that the asymptotic standard deviation of d^ is (T 2 ) 1=2 , see Theorem 1. That is, for testing purposes or for calculating con…dence intervals for d0 , the relevant quantity is in fact the bias relative to the asymptotic standard deviation, and this is of order O(T 1=2 ). To quantify the distortion of the quantiles (critical values), we therefore focus on the magnitude of the relative bias, for which we obtain the following corollary by tabulation. Corollary 2 Letting T0 = N + T be …xed and tabulating the relative bias, ^ = (T 2 )1=2 bias(d) ((T0 N ) 2) 1=2 [3 3 2 1 + N;T0 N (d0 )]; see (27), for N = 0; : : : ; T0 2 and d0 > 1=2, the minimum value is attained for N = 0. Thus, we achieve the smallest relative (and also absolute) bias of d^ by choosing N = 0. 4 Application to Gallup opinion poll data As an application and illustration of the results, we consider the monthly Gallup opinion poll data on support for the Conservative and Labour parties in the United Kingdom. They cover the period from January 1951 to November 2000, for a total of 599 months. The two series have been logistically transformed, so that, if Yt denotes an observation on the original series, it is mapped into Xt = log(Yt =(100 Yt )). A shorter version of this data set was analyzed by Byers et al. (1997) and Dolado et al. (2002), among others. Using an aggregation argument and a model of voter behavior, Byers et al. (1997) show that aggregate opinion poll data may be best modeled using fractional time series methods. The basic …nding of Byers et al. (1997) and Dolado et al. 
(2002) is that the ARFIMA(0,d,0) model, i.e. model (1), appears to …t both data series well and they obtain values of the integration parameter d in the range of 0.6–0.8. 4.1 Analysis of the voting data In light of the discussion in Section 2.2, we work throughout under the assumption that Xt was not observed prior to January 1951 because the data series did not exist, and we truncate the …lter correspondingly, i.e., we consider model (4). Because we observe all available data, we estimate d (and ; ) by the estimator d^ setting N = N0 = 0 and take T = 599 following Theorem 4 and Corollary 2. The results are presented in Table 1. Since we have assumed that N = N0 = 0, we can bias correct the estimator using (30) in Theorem 4, and the resulting estimate is reported in Table 1 as d^bc . Two conclusions emerge from the table. First, the estimates of d (and ) are quite similar for the two data series, but the estimates of are quite di¤erent. Second, the bias corrections to the estimates are small. More generally, the estimates obtained in Table 1 are in line with those from the literature cited above. Initial values in CSS estimation of fractional models 13 Table 1: Estimation results for Gallup opinion poll data Conservative Labour ^ d 0:7718 0:6940 d^b c 0:7721 0:6914 ^ 0:0097 0:4313 ^ 0:1098 0:1212 Note: The table presents parameter estimates for the Gallup opinion poll data with T = 599 and N0 = N = 0. The subscript ‘bc’denotes the bias corrected estimator, c.f. (30). The asymptotic standard deviation of d^ is given in Theorem 1 as (T 2 =6) 1=2 ' 0:032. 4.2 An experiment with unobserved initial values We next use this data to conduct a small experiment with the purpose of investigating how the choice of N in‡uences the bias of the estimators of d, if there were unobserved initial values. For this purpose, we assume that the econometrician only observes data starting in January 1961. That is, January 1951 through December 1960 are N0 = 120 unobserved initial values. We then split the given sample of T0 = 479 observations into initial values (N ) and observations used for estimation (T ), such that N + T = 479. We can now ask the questions (i) what is the consequence in terms of bias of ignoring initial values, i.e. of setting N = 0, and (ii) how sensitive is the bias to the choice of N for this particular data set. To answer these questions we apply (24) and (25) from Theorem 2. We note that N;T (d) C (d) depend on the unobserved initial values, i.e. on Xn ; N0 < n 0, which in this and N;T example are the 120 observations from January 1951 to December 1960. To apply Theorem 2 we need (estimates of) d0 ; 0 ; 0 . For this purpose we use (d^bc ; ^; ^ ) from Table 1. The results are shown in Figure 1. The top panels show the logistically transformed opinion poll data for the Conservative (left) and Labour (right) parties. The shaded areas mark the unobserved initial values January 1951 to December 1960. The bottom panels show the relative bias in the estimators of d as a function of N 2 [0; 24], and the starred 3=2 straight line denotes the value of the …xed (relative) bias term, (T0 N ) 1=2 3 3 2 . The estimators are d^ in (13) and d^c in (14) either with C chosen as the average of the T0 observations, denoted d^c in the graph, or with C = 0, denoted d^0 in the graph. That is, for d^0 the series have not been centered, and for d^c the series have been centered by the average of the T0 observed values. 
The latter two estimators are the usual CSS estimators with and without centering of the series. In Figure 1 we note that the relative bias of d^0 is larger for the Labour party series because the last unobserved initial values are larger in absolute value than those of the Conservative party series. In particular, if one does not condition on initial values and uses N = 0, the relative bias of d^0 is 0:45 for the Labour party series and 0:05 for the Conservative party series. It is clear from the …gure that the relative bias of d^0 for the Labour party series can be reduced substantially and be made much closer to the …xed bias value by conditioning on just a few initial values. The same conclusions can be drawn for d^c but reversing the roles of the two series. The reason is that, after centering the series by the average of the T0 observations, it is now for the Conservative party series that the last unobserved initial values are di¤erent from zero, while those of the Labour party series are close to zero. ^ where the initial level or centering parameter, , is estimated jointly with Finally, for d, Initial values in CSS estimation of fractional models 14 Figure 1: Application to Gallup opinion poll data Conservative party data Labour party data 0.5 0.5 0.0 0.0 -0.5 -0.5 -1.0 -1.0 1960 1970 1980 Relative bias as function of N d^ 0.4 d^c 1990 2000 d^0 1960 1970 1980 Relative bias as function of N 0.2 0.2 0.0 0.0 -0.2 -0.2 0 4 8 12 16 d^ 0.4 20 24 0 4 8 d^c 12 1990 2000 d^0 16 20 24 Note: The top panels show (logistically transformed) opinion poll time series and the bottom panels show the relative bias for three estimators of d as a function of the number of chosen initial values, N , when the …rst N0 = 120 observations have been reserved as unobserved initial values (shaded area). The estimators are d^ in (13) and d^c in (14) either with C chosen as the average of the T0 observations, denoted d^c in the graph, or with C = 0, denoted d^0 in the graph. The starred line denotes the …xed (relative) bias, 3=2 (T0 N ) 1=2 3 3 2 . d, we …nd that the relative bias is increasing in N . The reason for this is that N;T (d) dominates N;T (d), at least for this particular data series. With N = 0 the relative bias is very small and the estimator d^ is better than the other two estimators. 5 Conclusion In this paper we have analyzed the e¤ect of unobserved initial values on the asymptotic bias of the CSS estimators, d^ and d^c , of the fractional parameter in a simple fractional model, for d0 > 1=2. We assume that we have data Xt for t = 1; : : : ; T0 = N + T; and model Xt by the truncated …lter d0N0 (Xt 0. We derive estimators from 0 ) = "t for t = 1; : : : ; T0 and N0 d d ) = "t or 0 (Xt C) = "t by maximizing the respective conditional the models 0 (Xt Gaussian likelihoods of XN +1 ; : : : ; XT0 given X1 ; : : : ; XN . ^ consisting of We give in Theorem 2 an explicit formula for the second-order bias of d, three terms. The …rst is a constant, the second, N;T (d0 ), depends on initial values and decreases with N , and the third, N;T (d0 ), does not depend on initial values. The …rst and third terms can thus be used in general for a (partial) bias correction. In Theorem 4 we Initial values in CSS estimation of fractional models 15 simplify the expressions for the case when N0 = 0, so that all data are observed. In this ^ at least to second order. We further case we can completely bias correct the estimator d, ^ …nd that for d the smallest bias appears for the choice N = 0. 
This choice is used for the analysis of the voting data in Section 4.1 where the bias correction is also illustrated. In Section 4.2 we illustrate the general results with unobserved initial values, again using the voting data. Here we show that, when keeping N0 = 120 observations for unobserved initial values, the estimator d^ with N = 0 has the smallest bias. Thus, the idea of letting the parameter capture the initial level of the process eliminates the e¤ect of the unobserved initial values, at least in this example. Appendix A The fractional coe¢ cients In this section we …rst give some results of Karamata. Because they are well known we sometimes apply them in the remainder without special reference. Lemma A.1 For m N X 0 and c < 1; (1 + log n)m n c(1 + log N )m N +1 if > 1; (32) (1 + log n)m n c(1 + log N )m N +1 if < 1: (33) n=1 1 X n=N Proof. See Theorems 1.5.8–1.5.10 of Bingham, Goldie, and Teugels (1987). We next present some useful results for the fractional coe¢ cients (2) and their derivatives. P Lemma A.2 De…ne the coe¢ cient aj = 1fj 1g jk=1 k 1 , where 1fAg denotes the indicator function for the event A. The derivatives of j ( ) are m D log j (u) m+1 = ( 1) j 1 X i=0 u)!(j + u j! 2 D j (u) = 2D j (u)(aj+u 1 a D j (u) = ( 1) u( 1 for u 6= 0; 1; : : : ; j + 1 and m (i + u)m 1)! u) for u = 0; 1; : : : ; j + 1 and j for u = 0; 1; : : : ; j + 1 and j (34) 1; (35) 2; (36) 2: Proof of Lemma A.2. The result (34) follows by taking derivatives in (2) for u 6= 0; 1; : : : ; j + 1. For u = i and i = 0; 1; : : : ; j 1 we …rst de…ne P (u) = u(u + 1) : : : (u + j noting that j (u) 1); Pk (u) = P (u) P (u) ; Pkl (u) = for k 6= l: u+k (u + k)(u + l) = P (u)=j!, see (2). We then …nd X DP (u) = Pk (u) and D2 P (u) = 0 k j 1 X Pkl (u); 0 k6=l j 1 which we evaluate at u = i for i = 0; 1; : : : ; j 1. However, for such i we …nd Pk ( i) = 0 unless k = i and Pkl ( i) = 0 unless k = i or l = i. Thus, DP (u)ju= i = Pi ( i) = ( i)( i + 1) : : : ( 1) (1)(2) : : : (j 1 i) = ( 1)i i!(j i 1)! Initial values in CSS estimation of fractional models 16 and (35) follows because D j (u) = DP (u)=j!, see (2). Similarly (36) follows from X X X D2 P (u)ju= i = Pki ( i) + Pil ( i) = 2 Pki ( i) k6=i l6=i k6=i X 1 X Pi ( i) = 2Pi ( i) = 2Pi ( i)(aj =2 k i k i k6=i k6=i For u = 0; 1; 2; : : :, we note that non-zero even for such values of j where j (u) = j Y i+u i i=1 Qj 1 = N (u) j Y N , then (1 + (u jDm j (u)j jDm N;j (u)j 0 and j 1j (u)j N;j (u) (37) = 1 for j = N . c(1 + log j)m j u 1 ; c(1 + log j)m j u 1 : 2j (u)j ju 1 (1 + (u) (38) (39) = (40) 1j (u)); ! 0 as j ! 1 for any compact set K N;j (u) where supv2K j N (u) N;j (u) 1 we also have the more precise evaluations j (u) = where supu2K j 1)=i) = i=N +1 with N;j (u) = i=N +1 (1 + (u 1)=i) for j > N and For m 0 and j 1 it holds that For m ai ): u + 1, but Dm j (u) remains = 0 for j j (u) = 0. j (u) Lemma A.3 Let N be an integer and assume j i 1 N! j u 1 (1 + (u + N ) Rnf0; 1; : : : g; and (41) 2j (u)); ! 0 as j ! 1 for any compact set K Rnf N; (N + 1); : : : g: Proof. To show (37), we …rst note QN that for j = N the result is trivial. For j > N we factor out the …rst N coe¢ cients, i=1 (i + u 1)=i = N (u). The product of the remaining coe¢ cients is denoted N;j (u). The results (38) and (40) for j (u) can be found in JN (2012a, Lemma A.5), and the results Pj (39) and (41) for N;j (u) can be found in the same way from a Taylor’s expansion of i=j0 log(1 + (u 1)=i) for j > j0 1 u: P Lemma A.4 Let aj = 1fj 1g jk=1 k 1 . 
Then, = 1 and 1 (u) = u for any u; D 0 (u) = 0 and Dm 1 (u) = 1fm=1g for m 1 and any u; D j (0) = j 1 1fj 1g and D2 j (0) = 2j 1 aj 1 1fj 2g ; (42) (43) (44) 0 (u) m k X n=j jDm j (0)j Dm n( cj u) = Dm 1 (1 + log j)m 1 1fj k( u + 1) Dm 1+ for m 1g cj j 1( u + 1) for m 1 and any > 0; 0 and any u; (45) (46) Initial values in CSS estimation of fractional models 1 X Dm n( u) = Dm u + 1) for m j 1( 17 0 and u > 0; (47) n=j k X n (u) k n (v) = k (u + v) for any u; v: (48) n=0 Proof of Lemma A.4. Result (42) is well known and follows trivially from (2), and (43) follows by taking derivatives in (42). Next, (44) and (45) follow from Lemmas A.2 and A.3. To prove (46) with m = 0 multiply the identity nu = u n 1 + nu 11 by ( 1)n to get u 1 u 1 u : ( 1)n 1 = ( 1)n n 1 n n Summation from n = j to n = k yields a telescoping sum such that ( 1)n k X u ( 1)n n n=j u = ( 1)k 1 ( 1)k k u j 1 1 ; 1 which in terms of the coe¢ cients n ( ) gives the result. Take derivatives to …nd (46) with m 1. From (38) of Lemma A.3, Dm k ( u + 1) c(1 + log k)m k u ! 0 as k ! 1 when u > 0 which shows (47). Finally, (48) follows from the Chu-Vandermonde identity, see Askey (1975, pp. 59–60). Lemma A.5 For any ; t 1 X n it holds that 1 (t n) 1 c(1 + log t)tmax( + 1; + 1 1; 1) (49) : n=1 For + < 1 and > 0 it holds that 1 X (k + h) 1 k 1 (1 + log(k + h))n ch (1 + log h)n : (50) k=1 Proof of Lemma A.5. (49): See JN (2010, Lemma B.4). (50): We …rst consider the summation from k = 1 to h: h1 h X (k + h) 1 k 1 (1 + log(k + h))n c(1 + log 2h)n h k=1 n c(1 + log h) Z 1 1 h X k ( + 1) h k=1 1 (1 + u) u 1 1 k ( ) h 1 du: 0 The integral is …nite for > 0 and all because 1 1 + u 2. To evaluate the summation from k = h+1 to 1 we note that log(k+h) log(2k) for h k: This gives the bound 1 1 X X (k + h) 1 k 1 (1 + log(k + h))n c (h + k) 1 k 1 (1 + log k)n k=h+1 c k=h+1 1 X k k=h see (33) of Lemma A.1. + 2 (1 + log k)n ch + 1 c log k (1 + log h)n ; Initial values in CSS estimation of fractional models Lemma A.6 For d > 1=2 and 2d 1 X d n=0 1 X d n=0 1 1 n 1 n n @ @u d u > 0 it holds that 1 d 1 n 18 u (2d 1 (d) (d = u u) = u) 2d d 2d 2 ( (2d d 1 = u=0 2 u 1 1) ; (d)): Proof of Lemma A.6. With the notation a(n) = a(a + 1) (a + n 1), Gauss’s Hypergeometric Theorem, see Abramowitz and Stegun (1964, p. 556, eqn. 15.1.20), shows that 1 X a(n) b(n) n=0 For a = b) for c > a + b: b) d + 1 + u, and c = 1, we have c d + 1, b = 1 X d n=0 c(n) n! (c) (c a (c a) (c = 1 d 1 n n a b = 2d u > 0 so that 1 1 X ( d + 1)(n) ( d + 1 + u)(n) = n! n! n=0 u (1) (2d 1 u) = (d) (d u) = 2d 2 d u 1 with derivative with respect to u as given, using @ log (d + u)=@uju=0 = Appendix B (d). Asymptotic analysis of the derivatives We …rst analyze d0 (Xt C) and introduce some notation. 
From Lemma 1 we have an expression for Xt ; t = 1; : : : ; N + T; and we insert that into d0 Xt and …nd, using d0 Xt = P t 1 N + 1 we have n=0 n ( d)Xt n and (46), that for t d 0 (Xt d N Xt C) = t 1 X + n( d)Xt n d d0 "t N + t 1 X n=t N d d0 "t N = d d0 f N n( + n( d)C n=0 n=t N = t 1 X t+N 0 1 X d0 )Xt t+N0 1 ( n n=t N d)Xt t (d) n( n t 1( 0t (d)(C d0 + 1) 0 g d + 1)C (51) 0 ); where t (d) = tX 1 N k (d0 d) + k=0 k (d0 t n k ( d0 )Xn + d) t+N0 k 1 ( N X t n( d)Xn n=1 n= N0 +1 k=0 tX 1 N N X d0 + 1) 0 t 1( d + 1) 0 : Initial values in CSS estimation of fractional models d 0 (Xt The derivatives of C) with respect to d, evaluated at d = d0 , are of the form d0 0 (Xt Dm 19 + C) = Smt + mt (d0 ) mt (d0 )(C (52) 0 ); where mt (d) = ( 1)m Dm + and the stochastic term Smt is de…ned, for t Smt = ( 1)m + = ( 1)m Smt 1 X Dm k=0 tX 1 N d + 1) N + 1, as + = Smt + Smt ; k (0)"t k Dm t 1( k (0)"t k and Smt = ( 1)m 1 X Dm k (0)"t k : Dm t n( k=t N k=0 The main deterministic term is mt (d) m+1 = ( 1) N X [ n= N0 +1 tX 1 N Dm tX 1 N m D k (0) t k n ( d)Xn N X d)Xn (53) n=1 k=0 k (0) t+N0 k 1 ( d + 1) 0 + Dm t 1( d + 1) 0 ]: k=0 P +T P1 2 We use the notation hu; viT = 0 2 N 0 t=N +1 ut vt ! t=N +1 ut vt = hu; vi, if the limit exists. We …rst give the order of magnitude of the deterministic terms and product moments containing these. Lemma B.1 The functions j j mt (d) satisfy ct d ; 0t (d)j mt (d)j c(t (54) min(1;d)+ N) for m For d > 1=2 it follows that, for any 0 < < min(d; 2d h max(jh 0 ; jh m ; n iT m; 1 iT j; jh 1 ; n iT j !h 0 iT j) N + 1, and any 1, t m; ni c(1 + N ) c(1 + N ) > 0: (55) 1), < 1; m; n (56) 0; min(d;2d 1)+ ; m; n min(d;2d 1)+ : (57) 0; (58) If N = 0 it holds that h 0; 0 iT ! 2 0 2d 2 d 1 and h 0 ; 1 iT 2 ! 0 2d 2 ( (2d d 1 1) (d)) : (59) If Assumption 1 holds then + hSm ; + hSm ; + where E(hSm ; n iT ) + = E(hSm ; n iT ) n iT n iT P + ! hSm ; P + ! hSm ; + = E(hSm ; n i; m; n 0; (60) n i; m; n 0; (61) n i) + = E(hSm ; n i) = 0. Initial values in CSS estimation of fractional models Proof of Lemma B.1. (54): The expression for 0 X 0t (d) = t n( d)Xn + t n( d)(Xn is 0t (d) t+N0 1 ( 20 d + 1) t 1( 0 d + 1) 0 n= N0 +1 0 X = (62) 0 ); n= N0 +1 P see (46) of Lemma A.4. Using the bound j t n ( d)j c(t n) d 1 we …nd 0n= N0 +1 j t n ( d)j ct d for n 0, see (38) of Lemma A.3, and the result follows. (55): The remaining deterministic terms with m 1 are evaluated using j( 1)m+1 Dm k (0)j ck 1+ 1fk 1g for > 0, see (45) of Lemma A.4, and we …nd, for t N + 1, j mt (d)j 1 tX 1 N X c n= N +c c[ 1+ k (t k + n) d 1 +c k=1 tX 1 N k k 1+ 1+ (t n) d 1+ n=1 k=1 tX 1 N N X (t + N0 (t k k N) d d 1) + c(t + (t N) 1) d+ d+ ] k=1 c[(1 + log(t N ))(t N) min(1;d)+ + (t N) d+ ] c(t N) min(1;d)+2 ; where we have used (49) of Lemma A.5. (56): From (55) we …nd j mt (d) nt (d)j c(t N ) 2 min(1;d)+2 so that jh m ; m ij < 1 by choosing 2 < 2 min(1; d) 1 = min(1; 1), which is possible for d > 1=2. P2d 1 t min(1;d)+ (t + N 1) d . If 1=2 < d < 1 we apply (57): Similarly we …nd h n ; m iT c t=1 P1 c(1 + N )1 2d+ , and if d 1 (50) of Lemma A.5 to obtain the result t=1 (t + N ) d t d+ d d+2 2 we use (t + N ) (1 + N ) t for 2 < d and …nd 1 X (t + N ) d t 1+ (1 + N ) d+2 t=1 1 X t 1 c(1 + N ) d+2 t=1 (58): The proofs for h 0 ; 1 iT and h 1 ; 0 iT are the same as for (57). P P 2 (59): For N = 0 we …nd h 0 ; 0 iT = Tn=01 d n 1 and h 0 ; 1 iT = 21 D Tt=1 the result follows from Lemma A.6. 
(60): We have 1 X + Smt nt (d) : = t=N +T +1 = 1 X t=N +T +1 1 X [ nt (d)( 1) m+1 tX 1 N Dm 2 0t (d) t k (0)"k k=1 1 X k=1 t=max(T;k)+N +1 nt (d)( 1)m+1 Dm t k (0)]"k : such that (63) Initial values in CSS estimation of fractional models 21 For some small > 0 to be chosen subsequently, we use the evaluations j nt (d)j c(t min(1;d)+ m 1+ min(1;d)+ min(1;d)+ N) , jD t k (0)j c(t k) 1ft k 1g , and t = (t k + k) (t k) 2 k min(1;d)+3 . Then V ar( 1 X + Smt nt (d)) c t=N +T +1 c 1 X [ 1 X min(1;d)+ t k=1 t=max(T;k)+1 1 X 2 min(1;d)+6 k 1 X [ k=1 (t + N (t k) 1 k) 1+ 2 ] ]2 : t=max(T;k)+1 P P 2 min(1;d)+6 For T ! 1 we have 1 k) 1 ! 0, and because 1 <1 t=max(T;k)+1 (t k=1 k P1 + we get by dominated convergence that V ar( t=N +T +1 Smt nt (d)) ! 0. This shows that P1 P + + + ; ni = P ; n iT ! hSm hSm t=N +1 Smt nt (d). P1 P1 + m+1 m (61): We use (63) and …nd 1 D t k (0)]"k , t=N +T +1 Smt nt (d) = k=1 [ t=max(T;k)+1 nt (d)( 1) and the proof is completed as for (60). We next de…ne the (centered) product moments of the stochastic terms, + MmnT 2 = 0 T 1=2 N +T X + + (Smt Snt + + E(Smt Snt )); (64) t=N +1 as well as the product moments derived from the corresponding stationary processes, 2 MmnT = 0 T 1=2 N +T X (Smt Snt E(Smt Snt )): t=N +1 The next two lemmas give their asymptotic behavior, where we note that + + E(S0t Smt ) = E(S0t Smt ) = 0 for m 1: P1 Lemma B.2 Suppose Assumption 1 holds and let 2 = j=1 j P1 3 ' 1:2021, see (16). Then 3 = j=1 j 2 E(M01T ) = 2 0 T 1 N +T X 2 E(S1t )= (65) 2 = 2 =6 ' 1:6449 and (66) 2; t=N +1 E(M01T M02T ) = 4 0 T N +T X 1 E(S0t S1t S0s S2s ) = 2 0 s;t=N +1 E(M01T M11T ) = 4 0 T 1 N +T X T 1 N +T X E(S1t S2t ) = 2 3; t=N +1 2 E(S0t S1t S1s )= 4 3; (67) (68) s;t=N +1 E(hS0+ ; + 0 iT hS1 ; 0 iT = = 4 0 N +T X + E(S0s + 0s (d)S1t 0t (d)) s;t=N +1 2 0 X N s<t N +T 1 (t s) 1 t( d + 1) s ( d + 1): (69) Initial values in CSS estimation of fractional models 22 It follows that, for N = 0 and T ! 1; 0;T (d) E(hS0+ ; 0 iT hS1+ ; h 0 ; 0 iT = 0 iT ) ! ( (2d 1) (70) (d)): Furthermore, for T …xed and N ! 1; see also (37), N;T (d) = P N s<t N +T 1 (t P 1 s) N;t ( N;t N +1 t N +T T 1 X d + 1) N;s ( d + 1) ! t 2 1 ( d + 1) t=1 P1 Proof of Lemma B.2. (66): From S0t = "t , S1t = 2 E(M01T ) = 4 0 E[T 1=2 N +T X "t t=N +1 1 X 1 2 2 k "t k ] = 0 T 1 k 1 "t k , and (65) we …nd N +T X 1 1 X X 1 2 E[ k "t k ] = k (67): We …nd using the expressions for S0t ; S1t ; and S2t = 2 together with (65) that E(M01T M02T ) = 2 4 0 T 1 E[ N +T X "t t=N +1 1 X 1 k "t k ][ N +T X k=1 P1 j=2 "s 1 X 1 j (j 1 2 = 2: k=1 aj 1 "t j , aj = 1fj aj 1 )"s j ] = 2 0 1g 1 T j=2 s=N +1 k=1 1)=T: (71) (T k=1 t=N +1 k=1 1 Pj k=1 N +T X t=N +1 and 2 0 T 1 N +T X E(S1t S2t ) = 2 2 0 T t=N +1 1 N +T X E[ t=N +1 = 2T 1 N +T X 1 X t=N +1 j=2 1 X k 1 "t k j 1 j X (j 1 aj 1 )"t j ] j=2 k=1 2 1 X k k=1 1 = 2 1 X j j=2 2 j 1 X k 1 = 2 3 (72) k=1 Pj 1 1 P 2 = 1 k=1 k . We thus need to show thatp 3 = 3 . j=2 j 1 is the imaginary unit, c( ) = Let f ( ) = log(1 ei ) = 21 c( ) + i ( ), where i = i log(2(1 cos( )), ( ) = arg(1 e ) = ( )=2 for 0 < < , and ( ) = ( ). The transfer function of Smt is Dm (1 ei )d d0 jd=d0 = f ( )m , so that the cross spectral density 2 R between Smt and Snt is f ( )m f ( )n and E(Smt Snt ) = 2 0 f ( )m f ( )n d . Because c( ) is symmetric around zero and ( ) is anti-symmetric around zero, i.e. 
( ) = ( ), it follows that for 3 c( )3 = (f ( ) + f ( ))3 = f ( )3 + 3f ( )2 f ( ) + 3f ( )f ( )2 + f ( )3 : Noting that the transfer function of S0t = "t is f ( )0 = 1 and integrating both sides we …nd 2 Z 0 c( )3 d = E(S3t S0t ) + 3E(S2t S1t ) + 3E(S1t S2t ) + E(S0t S3t ): 2 The left-hand side is given as 12 02 3 in Lieberman and Phillips (2004, p. 478) and the right-hand side is 12 02 3 from (65) and (72), which proves the result. k 1, E(S1t S2t ) Initial values in CSS estimation of fractional models 23 (68): We …nd, using the expressions for Smt and (65), that E(M01T M11T ) is 4 0 T 1 N +T X 2 E(S0t S1t S1s )= N +T X 1 T s;t=N +1 s;t=N +1 t 1 X E["t s 1 X (t k) 1 "k (s j) 1 "j j= 1 k= 1 s 1 X (s i) 1 "i ]: i= 1 The only contribution comes for t = j > k = i or t = i > k = j and therefore t < s. These two contributions are equal, so we …nd, using s k = s t + t k, 2T 1 N +T X N +T X t 1 X 1 1 (t k) (s t) (s k) 1 = 2T t=N +1 s=t+1 k= 1 2 [v k[ 1 2] and v = t 1 + (u v) ]u 2 =4 1 X 2 u u=2 which proves (68). + + (69): From S0s = "s , S1t = t 1 ( d + 1) we get + 0 iT hS1 ; t 1 X [(t k) 1 +(s t) 1 ](s k) 2 : v < u] and …nd k [1 u=2 v=1 EhS0+ ; N +T X t=N +1 s=t+1 k= 1 Next we introduce u = s 1 X u 1 X N +T X 1 0 iT = Pt 1 N k=1 4 0 N +T X k 1 "t + E(S0s k u 1 X v 1 =4 3 = 4 3; v=1 Pt 1 k=N +1 (t = k) 1 "k , and 0t (d) = + 0s (d)S1t (d) 0t (d) s;t=N +1 2 = 0 X (t s) 1 t 1( d + 1) s 1( d + 1): N +1 s<t N +T (70): For N = 0 we use D X D 0 s<t<1 1 X D = t=1 t s (u)ju=0 t s (u)ju=0 s ( = (t s) 1 1ft d + 1) t ( d + 1) = s 1g 1 X and …nd the limit D t ( d + 1 + u)ju=0 t ( d + 1) t=1 d 1 t u ju=0 d 1 t = 2d 2 ( (2d d 1 1) (d)) using (48) and Lemma A.6. From (59) we …nd the limit of h 0 ; 0 iT . (71): From (37) weP …nd the representation in (71), where we have cancelled the factor 2 2 ( d+1) . Note that N +1 t N +T N;t 1 ( d+1)2 N;N ( d+1) = 1 and N;t ( d+1) = QNt P d=i) ! 1 for N ! 1 and t N + 1, so that N;T (d) ! T 1 N s<t N +T 1 (t i=N +1 (1 PT 1 1 s) 1 = i=1 i (T 1)=T . Lemma B.3 Suppose Assumption 1 holds. Then, for T ! 1, it holds that fMmnT g0 m;n 3 are jointly asymptotically normal with mean zero, and some variances and covariances can be calculated from (66), (67), and (68) in Lemma B.2. It follows that the same holds for + fMmnT g0 m;n 3 with the same variances and covariances. Initial values in CSS estimation of fractional models 24 Proof of Lemma B.3. fMmnT g: We apply a result by Giraitis and Taqqu (1998) on limit distributions of quadratic forms of linear processes. We de…ne the cross covariance function rmn (t) = E(Sm0 Snt ) = 1)m+n 2 0( 1 X Dm n k (0)D t+k (0) k=0 and …nd r00 (t) = 02 1ft=0g , rm0 (t) = 02 ( 1)m Dm t (0)1ft For m; n 1 we …nd that jrmn (t)j is bounded for a small c 1 X (1 + log(t + k))m 1 (1 + log k)n 1 (t + k) 1 k 1 c k=1 2 0( 1g , and r0n (t) = > 0 by 1 X (t + k) 1+ k 1+ 1)n Dn t (0). ct 1+3 ; k=1 P1 using the bound (t + k) 1+ k 2 t 1+3 . Thus t= 1 rmn (t)2 < 1, and joint asymptotic normality of fMmnT g0 m;n 3 then follows from Theorem 5.1 of Giraitis and Taqqu (1998). The asymptotic variances and covariances can be calculated as in (66), (67), and (68) in Lemma B.2. + + fMmnT g: We show that E(MmnT MmnT )2 ! 0. We …nd MmnT + MmnT 2 = 0 T N +T X 1=2 + + (Smt Snt + Smt Snt + Smt Snt + + E(Smt Snt + Smt Snt + Smt Snt )); t=N +1 (73) and show that the expectation term converges to zero and that each of the stochastic terms has a variance PN +Ttending+to zero. + T 1=2 t=N +1 E(Smt Snt + Smt Snt + Smt Snt ) ! 
Lemma B.3 Suppose Assumption 1 holds. Then, for $T\to\infty$, it holds that $\{M_{mnT}\}_{0\le m,n\le 3}$ are jointly asymptotically normal with mean zero, and some variances and covariances can be calculated from (66), (67), and (68) in Lemma B.2. It follows that the same holds for $\{M^{+}_{mnT}\}_{0\le m,n\le 3}$ with the same variances and covariances.

Proof of Lemma B.3. $\{M_{mnT}\}$: We apply a result of Giraitis and Taqqu (1998) on limit distributions of quadratic forms of linear processes. We define the cross covariance function

$$r_{mn}(t)=E(S_{m0}S_{nt})=\sigma_{0}^{2}(-1)^{m+n}\sum_{k=0}^{\infty}D^{m}\pi_{k}(0)D^{n}\pi_{t+k}(0)$$

and find $r_{00}(t)=\sigma_{0}^{2}1_{\{t=0\}}$, $r_{m0}(t)=\sigma_{0}^{2}(-1)^{m}D^{m}\pi_{-t}(0)1_{\{t\le -1\}}$, and $r_{0n}(t)=\sigma_{0}^{2}(-1)^{n}D^{n}\pi_{t}(0)1_{\{t\ge 1\}}$. For $m,n\ge 1$ we find that $|r_{mn}(t)|$ is bounded for a small $\epsilon>0$ by

$$c\sum_{k=1}^{\infty}(1+\log(t+k))^{m-1}(1+\log k)^{n-1}(t+k)^{-1}k^{-1}\le c\sum_{k=1}^{\infty}(t+k)^{-1+\epsilon}k^{-1+\epsilon}\le ct^{-1+3\epsilon},$$

using the bound $(t+k)^{-1+\epsilon}k^{-1+\epsilon}\le k^{-1-\epsilon}t^{-1+3\epsilon}$. Thus $\sum_{t=-\infty}^{\infty}r_{mn}(t)^{2}<\infty$, and joint asymptotic normality of $\{M_{mnT}\}_{0\le m,n\le 3}$ then follows from Theorem 5.1 of Giraitis and Taqqu (1998). The asymptotic variances and covariances can be calculated as in (66), (67), and (68) in Lemma B.2.

$\{M^{+}_{mnT}\}$: We show that $E(M_{mnT}-M^{+}_{mnT})^{2}\to 0$. Writing $S_{mt}=S^{+}_{mt}+S^{-}_{mt}$, where $S^{-}_{mt}=(-1)^{m}\sum_{k=t-N}^{\infty}D^{m}\pi_{k}(0)\varepsilon_{t-k}$ depends only on $\varepsilon_{j}$, $j\le N$, we find

$$M_{mnT}-M^{+}_{mnT}=\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\big(S^{+}_{mt}S^{-}_{nt}+S^{-}_{mt}S^{+}_{nt}+S^{-}_{mt}S^{-}_{nt}-E(S^{+}_{mt}S^{-}_{nt}+S^{-}_{mt}S^{+}_{nt}+S^{-}_{mt}S^{-}_{nt})\big), \qquad (73)$$

and show that the expectation term converges to zero and that each of the stochastic terms has a variance tending to zero.

$T^{-1/2}\sum_{t=N+1}^{N+T}E(S^{+}_{mt}S^{-}_{nt}+S^{-}_{mt}S^{+}_{nt}+S^{-}_{mt}S^{-}_{nt})\to 0$: The first two terms are zero because $S^{+}_{mt}$ and $S^{-}_{nt}$ are independent. For the last term we find, using (45) of Lemma A.4, that

$$|E(S^{-}_{mt}S^{-}_{nt})|=\sigma_{0}^{2}\Big|\sum_{k=t-N}^{\infty}D^{m}\pi_{k}(0)D^{n}\pi_{k}(0)\Big|\le c\sum_{k=t-N}^{\infty}k^{-2+\epsilon}\le c(t-N)^{-1+\epsilon},$$

so that

$$T^{-1/2}\sum_{t=N+1}^{N+T}|E(S^{-}_{mt}S^{-}_{nt})|\le cT^{-1/2+\epsilon}\to 0. \qquad (74)$$

$\mathrm{Var}(T^{-1/2}\sum_{t=N+1}^{N+T}S^{+}_{mt}S^{-}_{nt})\to 0$: The first two terms of (73) are handled the same way. Because $(S^{+}_{mt},S^{+}_{ms})$ is independent of $(S^{-}_{nt},S^{-}_{ns})$, we find

$$\mathrm{Var}\Big(T^{-1/2}\sum_{t=N+1}^{N+T}S^{+}_{mt}S^{-}_{nt}\Big)=T^{-1}\sum_{s,t=N+1}^{N+T}E(S^{+}_{mt}S^{+}_{ms})E(S^{-}_{nt}S^{-}_{ns}).$$

Then, replacing the log factors by a small power $\epsilon>0$, we find for $|D^{m}\pi_{t-i}(0)|\le c(t-i)^{-1}(1+\log(t-i))^{m}\le c(t-i)^{-1+\epsilon}$ that

$$|E(S^{+}_{mt}S^{+}_{ms})|=\sigma_{0}^{2}\Big|\sum_{i=1}^{\min(s,t)-1-N}D^{m}\pi_{t-i}(0)D^{m}\pi_{s-i}(0)\Big|\le c\sum_{i=1}^{\min(s,t)-1-N}(t-i)^{-1+\epsilon}(s-i)^{-1+\epsilon}.$$

Now take $s>t$ and evaluate $(s-i)^{-1+\epsilon}=(s-t+t-i)^{-1+\epsilon}\le(s-t)^{-1+3\epsilon}(t-i)^{-2\epsilon}$, to find

$$|E(S^{+}_{mt}S^{+}_{ms})|\le c(s-t)^{-1+3\epsilon}\sum_{i=1}^{t-1-N}(t-i)^{-1-\epsilon}\le c(s-t)^{-1+3\epsilon}.$$

Similarly, for $E(S^{-}_{nt}S^{-}_{ns})=\sigma_{0}^{2}\sum_{i=-\infty}^{N}D^{n}\pi_{t-i}(0)D^{n}\pi_{s-i}(0)$ we find

$$|E(S^{-}_{nt}S^{-}_{ns})|\le c\sum_{i=-N}^{\infty}(t+i)^{-1+\epsilon}(s+i)^{-1+\epsilon}\le c(s-t)^{-1+3\epsilon}\sum_{i=-N}^{\infty}(t+i)^{-1-\epsilon}\le c(s-t)^{-1+3\epsilon}(t-N)^{-\epsilon}.$$

Finally, we can evaluate the variance as

$$\mathrm{Var}\Big(T^{-1/2}\sum_{t=N+1}^{N+T}S^{+}_{mt}S^{-}_{nt}\Big)\le cT^{-1}\sum_{N+1\le t<s\le N+T}(s-t)^{-2+6\epsilon}(t-N)^{-\epsilon}\le cT^{-1}\sum_{t=1}^{T}t^{-\epsilon}\sum_{h=1}^{\infty}h^{-2+6\epsilon}\le cT^{-\epsilon}\to 0.$$

$\mathrm{Var}(T^{-1/2}\sum_{t=N+1}^{N+T}S^{-}_{mt}S^{-}_{nt})\to 0$: The last term of (73) has variance

$$T^{-1}E\Big[\sum_{t=N+1}^{N+T}S^{-}_{mt}S^{-}_{nt}\Big]^{2}-T^{-1}\Big[\sum_{t=N+1}^{N+T}E(S^{-}_{mt}S^{-}_{nt})\Big]^{2},$$

and the first term is $T^{-1}\sum_{s,t=N+1}^{N+T}E(S^{-}_{mt}S^{-}_{nt}S^{-}_{ms}S^{-}_{ns})$, which equals

$$T^{-1}\sum_{s,t=N+1}^{N+T}\sum_{i,j,k,p=-\infty}^{N}D^{m}\pi_{t-i}(0)D^{n}\pi_{t-j}(0)D^{m}\pi_{s-k}(0)D^{n}\pi_{s-p}(0)E(\varepsilon_{i}\varepsilon_{j}\varepsilon_{k}\varepsilon_{p}).$$

There are contributions from $E(\varepsilon_{i}\varepsilon_{j}\varepsilon_{k}\varepsilon_{p})$ in four cases, which we treat in turn.

Case 1, $i=j\ne k=p$: This gives the expectation squared, $T^{-1}[\sum_{t=N+1}^{N+T}E(S^{-}_{mt}S^{-}_{nt})]^{2}$, which is subtracted to form the variance.

Cases 2 and 3, $i=k\ne j=p$ and $i=p\ne j=k$: These are treated the same way. We find for Case 2 the contribution

$$A_{1}\le cT^{-1}\sum_{s,t=N+1}^{N+T}\Big[\sum_{i=-N}^{\infty}(t+i)^{-1+\epsilon}(s+i)^{-1+\epsilon}\Big]^{2}\le cT^{-1}\sum_{N+1\le t<s\le N+T}\Big[\sum_{i=-N}^{\infty}(t+i)^{-1+\epsilon}(s+i)^{-1+\epsilon}\Big]^{2},$$

where the log factors have again been replaced by a small power. We evaluate $(s+i)^{-1+\epsilon}=(s-t+t+i)^{-1+\epsilon}\le(s-t)^{-1+3\epsilon}(t+i)^{-2\epsilon}$, so that

$$\sum_{i=-N}^{\infty}(t+i)^{-1+\epsilon}(s+i)^{-1+\epsilon}\le(s-t)^{-1+3\epsilon}\sum_{i=-N}^{\infty}(t+i)^{-1-\epsilon}\le c(s-t)^{-1+3\epsilon}(t-N)^{-\epsilon},$$

and hence

$$A_{1}\le cT^{-1}\sum_{N+1\le t<s\le N+T}(s-t)^{-2+6\epsilon}(t-N)^{-2\epsilon}\le cT^{-1}\sum_{h=1}^{T}h^{-2+6\epsilon}\sum_{t=1}^{T}t^{-2\epsilon}\le cT^{-2\epsilon}\to 0.$$

Case 4, $i=j=k=p$: Using the finite fourth moment of Assumption 1, this gives in the same way the bound

$$cT^{-1}\sum_{i=-N}^{\infty}\Big[\sum_{t=N+1}^{N+T}(t+i)^{-2+\epsilon}\Big]^{2}\le cT^{-1}\sum_{i=1}^{\infty}i^{-2+2\epsilon}\le cT^{-1}\to 0,$$

which completes the proof.
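Throughout Appendix B the basic ingredients are the fractional coefficients $\pi_{k}(u)$ of $(1-z)^{-u}$, with $\pi_{k}(0)=0$ and $D\pi_{k}(0)=1/k$ for $k\ge 1$. A short Python sketch (ours, illustrative) computes the coefficients by the standard recursion and verifies the derivative by a central finite difference:

```python
# Fractional coefficients pi_k(u) of (1-z)^{-u} via the recursion
# pi_0(u) = 1, pi_k(u) = pi_{k-1}(u) * (u + k - 1) / k,
# and a numerical check that D pi_k(u)|_{u=0} = 1/k for k >= 1.
def pi_coeffs(u, K):
    out = [1.0]
    for k in range(1, K + 1):
        out.append(out[-1] * (u + k - 1) / k)
    return out

h, K = 1e-6, 6
deriv = [(a - b) / (2 * h)        # central difference in u at u = 0
         for a, b in zip(pi_coeffs(h, K), pi_coeffs(-h, K))]
print(deriv[1:])                  # approx [1, 1/2, 1/3, 1/4, 1/5, 1/6]
```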
We now apply the previous Lemmas B.1, B.2, and B.3 to find asymptotic results for the derivatives $D^{m}L^{*}(d_{0})$.

Lemma B.4 Let the model for the data $X_{t}$, $t=1,\dots,N+T$, be given by (4) and let Assumptions 1 and 2 be satisfied. Then the (normalized) derivatives of the concentrated likelihood function $L^{*}(d)$, see (12), satisfy

$$\sigma_{0}^{-2}T^{-1/2}DL^{*}(d_{0})=A_{0}+T^{-1/2}A_{1}+O_{P}(T^{-1}), \qquad (75)$$
$$\sigma_{0}^{-2}T^{-1}D^{2}L^{*}(d_{0})=B_{0}+T^{-1/2}B_{1}+O_{P}(T^{-1}\log T), \qquad (76)$$
$$\sigma_{0}^{-2}T^{-1}D^{3}L^{*}(d_{0})=C_{0}+O_{P}(T^{-1/2}), \qquad (77)$$

where

$$A_{0}=M^{+}_{01T},\quad E(A_{1})=\xi_{N,T}(d_{0})+\zeta_{N,T}(d_{0}), \qquad (78)$$
$$B_{0}=\sigma_{2},\quad B_{1}=M^{+}_{11T}+M^{+}_{02T}, \qquad (79)$$
$$C_{0}=-6\sigma_{3}. \qquad (80)$$

Here $\xi_{N,T}(d_{0})$, $\zeta_{N,T}(d_{0})$, and $M^{+}_{mnT}$ are given in (21), (23), and (64), respectively, and $\sigma_{2}=\pi^{2}/6\simeq 1.6449$ and $\sigma_{3}\simeq 1.2021$, see (16). The (normalized) derivatives of $L^{*}_{c}(d)$, see (15), satisfy (75)–(77) and (79)–(80), but (78) is replaced by

$$A_{0}=M^{+}_{01T},\quad E(A_{1})=\xi^{C}_{N,T}(d_{0}), \qquad (81)$$

where $\xi^{C}_{N,T}(d_{0})$ is given by (22).

Proof of Lemma B.4. The concentrated sum of squared residuals is given in (12). We note that the first term is $O_{P}(T)$, and from Lemmas B.1 and B.2 the next is $O_{P}(1)$, so the second term has no influence on the asymptotic distribution of $\hat d$. However, for the bias we need to analyze it further.

We need an expression for the derivatives of the concentrated likelihood, i.e., $D^{m}L^{*}(d)$. Recall $L(d,\mu)$ from (11) and denote derivatives with respect to $d$ and $\mu$ by subscripts. Then $L^{*}(d)=L(d,\mu(d))$ and therefore

$$DL^{*}(d)=L_{d}(d,\mu(d))+L_{\mu}(d,\mu(d))\mu_{d}(d),$$
$$D^{2}L^{*}(d)=L_{dd}(d,\mu(d))+2L_{d\mu}(d,\mu(d))\mu_{d}(d)+L_{\mu\mu}(d,\mu(d))\mu_{d}(d)^{2}+L_{\mu}(d,\mu(d))\mu_{dd}(d),$$

but $\hat\mu(d)$ is determined from $L_{\mu}(d,\mu(d))=0$, which implies $L_{d\mu}(d,\mu(d))+L_{\mu\mu}(d,\mu(d))\mu_{d}(d)=0$, and hence

$$DL^{*}(d)=L_{d}(d,\mu(d)), \qquad (82)$$
$$D^{2}L^{*}(d)=L_{dd}(d,\mu(d))-\frac{L_{d\mu}(d,\mu(d))^{2}}{L_{\mu\mu}(d,\mu(d))}. \qquad (83)$$

We find the derivatives for $d=d_{0}$ and suppress the dependence on $d_{0}$ in the following; thus $\mu_{0t}=\mu_{0t}(d_{0})$, $\mu_{1t}=\mu_{1t}(d_{0})$, etc.

(75) and (78): We find from (52) that $D^{m}\Delta_{0}^{d}(X_{t}-\mu)|_{d=d_{0}}=S^{+}_{mt}+\tau_{mt}-(\mu-\mu_{0})\mu_{mt}$, and therefore from (82),

$$\sigma_{0}^{-2}T^{-1/2}DL^{*}=\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\big(S^{+}_{0t}+\tau_{0t}-(\hat\mu_{0}-\mu_{0})\mu_{0t}\big)\big(S^{+}_{1t}+\tau_{1t}-(\hat\mu_{0}-\mu_{0})\mu_{1t}\big),$$

where $\hat\mu_{0}=\hat\mu(d_{0})$ satisfies $\hat\mu_{0}-\mu_{0}=(\langle S^{+}_{0},\mu_{0}\rangle_{T}+\langle\tau_{0},\mu_{0}\rangle_{T})/\langle\mu_{0},\mu_{0}\rangle_{T}$, so that $E(\hat\mu_{0}-\mu_{0})=\langle\tau_{0},\mu_{0}\rangle_{T}/\langle\mu_{0},\mu_{0}\rangle_{T}$. Since $E(XY)=E(X)E(Y)+\mathrm{Cov}(X,Y)$, we get

$$E\big(\sigma_{0}^{-2}T^{-1/2}DL^{*}\big)=\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\big(\tau_{0t}-E(\hat\mu_{0}-\mu_{0})\mu_{0t}\big)\big(\tau_{1t}-E(\hat\mu_{0}-\mu_{0})\mu_{1t}\big)+\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\mathrm{Cov}\big(S^{+}_{0t}-(\hat\mu_{0}-\mu_{0})\mu_{0t},\,S^{+}_{1t}-(\hat\mu_{0}-\mu_{0})\mu_{1t}\big).$$

The first term is $T^{-1/2}$ times $\xi_{N,T}$, see (21). The second term reduces, using $\mathrm{Cov}(S^{+}_{0t},S^{+}_{1t})=0$, see (65), and $E(\langle S^{+}_{0},\mu_{0}\rangle_{T}^{2})=\langle\mu_{0},\mu_{0}\rangle_{T}$, to $T^{-1/2}\zeta_{N,T}$, see (69) and (23).

(76) and (79): The first term of $T^{-1}D^{2}L^{*}$ in (83) is analyzed below and is of the orders 1 and $T^{-1/2}$. In the second term of (83) we find $L_{\mu\mu}(d_{0},\mu(d_{0}))=\sigma_{0}^{2}\langle\mu_{0},\mu_{0}\rangle_{T}=O(1)$ and

$$L_{d\mu}(d_{0},\mu(d_{0}))=-\sum_{t=N+1}^{N+T}\big[\mu_{1t}\big(S^{+}_{0t}+\tau_{0t}-(\hat\mu_{0}-\mu_{0})\mu_{0t}\big)+\mu_{0t}\big(S^{+}_{1t}+\tau_{1t}-(\hat\mu_{0}-\mu_{0})\mu_{1t}\big)\big]=O_{P}(1),$$

and hence $T^{-1}L_{d\mu}(d_{0},\mu(d_{0}))^{2}/L_{\mu\mu}(d_{0},\mu(d_{0}))=O_{P}(T^{-1})$ and can be ignored. Thus we get

$$\sigma_{0}^{-2}T^{-1}D^{2}L^{*}=\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}\big(S^{+}_{1t}+\tau_{1t}-(\hat\mu_{0}-\mu_{0})\mu_{1t}\big)^{2}+\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}\big(S^{+}_{0t}+\tau_{0t}-(\hat\mu_{0}-\mu_{0})\mu_{0t}\big)\big(S^{+}_{2t}+\tau_{2t}-(\hat\mu_{0}-\mu_{0})\mu_{2t}\big)+O_{P}(T^{-1}).$$

By Lemma B.1 it holds that $\langle\tau_{m},\tau_{n}\rangle_{T}=O(1)$ and $\langle S^{+}_{m},\tau_{n}\rangle_{T}=O_{P}(1)$, such that

$$\sigma_{0}^{-2}T^{-1}D^{2}L^{*}=\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}E(S^{+2}_{1t})+T^{-1/2}\big(M^{+}_{11T}+M^{+}_{02T}\big)+O_{P}(T^{-1})=\sigma_{2}+T^{-1/2}\big(M^{+}_{11T}+M^{+}_{02T}\big)+O_{P}(T^{-1}\log T),$$

using also (66) and (74).
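Before turning to the third derivative, a remark on (82)–(83): these are the standard envelope-theorem reductions for a profile criterion, and they can be verified symbolically. The following sympy sketch (ours; the concrete criterion is an arbitrary smooth choice made purely for illustration) checks both identities on an explicit example:

```python
# Symbolic spot-check of the profile-criterion identities (82)-(83).
# Any L(d, m) quadratic in m works; the functions below are arbitrary.
import sympy as sp

d, m = sp.symbols('d m')
L = (sp.sin(d) - m * sp.exp(d))**2 + (d**2 - m)**2   # toy criterion L(d, mu)
mu_hat = sp.solve(sp.diff(L, m), m)[0]               # concentrating value mu(d)

Lstar = L.subs(m, mu_hat)                            # profile criterion L*(d)
# (82): D L*(d) = L_d(d, mu(d))
print(sp.simplify(sp.diff(Lstar, d) - sp.diff(L, d).subs(m, mu_hat)))    # 0
# (83): D^2 L*(d) = L_dd - L_dm^2 / L_mm, evaluated at mu(d)
rhs = (sp.diff(L, d, 2) - sp.diff(L, d, m)**2 / sp.diff(L, m, 2)).subs(m, mu_hat)
print(sp.simplify(sp.diff(Lstar, d, 2) - rhs))                           # 0
```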
(77) and (80): For the third derivative it can be shown that the extra terms involving the derivatives $\mu_{d}(d_{0})$ and $\mu_{dd}(d_{0})$ can be ignored, and we find

$$\sigma_{0}^{-2}T^{-1}D^{3}L^{*}=3\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}\big(S^{+}_{1t}+\tau_{1t}-(\hat\mu_{0}-\mu_{0})\mu_{1t}\big)\big(S^{+}_{2t}+\tau_{2t}-(\hat\mu_{0}-\mu_{0})\mu_{2t}\big)+\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}\big(S^{+}_{0t}+\tau_{0t}-(\hat\mu_{0}-\mu_{0})\mu_{0t}\big)\big(S^{+}_{3t}+\tau_{3t}-(\hat\mu_{0}-\mu_{0})\mu_{3t}\big)+O_{P}(T^{-1})$$
$$=3T^{-1/2}M^{+}_{12T}+3\sigma_{0}^{-2}T^{-1}\sum_{t=N+1}^{N+T}E(S^{+}_{1t}S^{+}_{2t})+T^{-1/2}M^{+}_{03T}+O_{P}(T^{-1})=-6\sigma_{3}+O_{P}(T^{-1/2}),$$

where the second-to-last equality uses Lemma B.1 and the last equality uses Lemmas B.2 and B.3, (67), and (74).

(81): For the function $L^{*}_{c}(d)$, see (15), we find

$$\sigma_{0}^{-2}T^{-1/2}DL^{*}_{c}=\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\big(S^{+}_{0t}+\tau_{0t}-(C-\mu_{0})\mu_{0t}\big)\big(S^{+}_{1t}+\tau_{1t}-(C-\mu_{0})\mu_{1t}\big),$$

with expectation given by

$$\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\big(\tau_{0t}-(C-\mu_{0})\mu_{0t}\big)\big(\tau_{1t}-(C-\mu_{0})\mu_{1t}\big)+\sigma_{0}^{-2}T^{-1/2}\sum_{t=N+1}^{N+T}\mathrm{Cov}(S^{+}_{0t},S^{+}_{1t})=T^{-1/2}\xi^{C}_{N,T}(d_{0}).$$

The remaining derivatives give the same results as for $L^{*}$. Notice that the two factors in the sum in the score are independent, so there is no term corresponding to $\zeta_{N,T}$.

Appendix C Proofs of main results

C.1 Proof of Theorem 1

We first show that the likelihood functions have no singularities. When $t\ge N+1$ we can use the decomposition $\pi_{t-1}(1-d)=\pi_{N}(1-d)\pi_{N,t-1}(1-d)$, see (37). We find in the second term of $L^{*}(d)$ in (12) that the factor $\pi_{N}(1-d)^{2}$ cancels, so that

$$\frac{\big[\sum_{t=N+1}^{N+T}(\Delta_{0}^{d}X_{t})\,\mu_{0t}(d)\big]^{2}}{\sum_{t=N+1}^{N+T}\mu_{0t}(d)^{2}}=\frac{\big[\sum_{t=N+1}^{N+T}(\Delta_{0}^{d}X_{t})\,\pi_{N,t-1}(1-d)\big]^{2}}{\sum_{t=N+1}^{N+T}\pi_{N,t-1}(1-d)^{2}}.$$

This is a differentiable function of $d$ because $\sum_{t=N+1}^{N+T}\pi_{N,t-1}(1-d)^{2}\ge\pi_{N,N}(1-d)^{2}=1$, see (37). Note, however, that $\hat\mu(d)$ has singularities at the points $d=1,2,\dots,N$.

We next discuss the estimator $\hat d$. The proof for $\hat d_{c}$ is similar, but simpler, because in that case $\hat\mu(d)=C$ does not depend on $d$. The arguments generally follow those of JN (2012a, Theorem 4) and Nielsen (2015, Theorem 1). To conserve space we only describe the differences in detail.
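Before the proof details, a minimal numerical sketch (ours) of the estimator being analyzed may be useful: the CSS profile objective with the level parameter $\mu$ concentrated out by least squares, in the spirit of (12). The function names, the zero-initial-values setup, and the grid optimizer are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the CSS estimator: profile objective in d after
# concentrating out the level mu by least squares.
import numpy as np

def frac_diff(x, d):
    # truncated fractional difference: Delta^d x_t = sum_k pi_k(-d) x_{t-k},
    # with pi_k(u) the coefficients of (1-z)^{-u} computed recursively
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for k in range(1, n):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    return np.array([pi[:t + 1][::-1] @ x[:t + 1] for t in range(n)])

def css_objective(d, x):
    u = frac_diff(x, d)                   # Delta^d X_t
    v = frac_diff(np.ones_like(x), d)     # Delta^d 1, the regressor for mu
    mu_hat = (v @ u) / (v @ v)            # concentrated level, cf. hat{mu}(d)
    return np.mean((u - mu_hat * v) ** 2)

# toy run: Delta^{d0} X_t = eps_t with d0 = 0.8 and zero initial values
rng = np.random.default_rng(42)
T, d0 = 300, 0.8
x = frac_diff(rng.standard_normal(T), -d0)        # X = Delta^{-d0} eps
grid = np.arange(0.5, 1.2, 0.02)
d_hat = grid[np.argmin([css_objective(d, x) for d in grid])]
print(d_hat)   # should be near d0 = 0.8
```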
C.1.1 Existence and consistency of the estimator

The function $L^{*}(d)$ in (12) is the sum of squares of

$$\Delta_{0}^{d}\big(X_{t}-\hat\mu(d)\big)=\Delta_{N}^{d-d_{0}}\varepsilon_{t}+\tau_{t}(d)-(\hat\mu(d)-\mu_{0})\mu_{0t}(d),$$

see (51), so that we need to analyze product moments of the terms on the right-hand side, appropriately normalized. The deterministic term $\tau_{t}(d)$ was analyzed under the assumption of bounded initial values in JN (2012a, Lemma A.8(i)) as $D_{it}(\cdot)$ with $b=d$ and $i=k=0$, where it was shown that

$$\sup_{-1/2\le d-d_{0}\le\overline d-d_{0}}|\tau_{t}(d)|\to 0 \quad\text{and}\quad \sup_{\underline d-d_{0}\le d-d_{0}\le -1/2}\max_{1\le t\le T}|t^{d-d_{0}+1/2}\tau_{t}(d)|\to 0 \quad\text{as } t\to\infty.$$

This shows that $\tau_{t}(d)$ is uniformly smaller than $\Delta_{N}^{d-d_{0}}\varepsilon_{t}$ (appropriately normalized on the intervals $-1/2\le d-d_{0}\le\overline d-d_{0}$ and $\underline d-d_{0}\le d-d_{0}\le -1/2$), and is enough to show that in the calculation of product moments we can ignore $\tau_{t}(d)$, which will be done below.

The product moment of the stochastic term, $\sum_{t=N+1}^{N+T}(\Delta_{N}^{d-d_{0}}\varepsilon_{t})^{2}$, is analyzed in Nielsen (2015) under Assumption 1 of finite fourth moment. Following that analysis, for some $0<\eta<1/2$ to be determined, we divide the parameter space into intervals where $\Delta_{N}^{d-d_{0}}\varepsilon_{t}$ is nonstationary, "near critical", or (asymptotically) stationary, according to

$$d-d_{0}\le -1/2-\eta,\qquad -1/2-\eta\le d-d_{0}\le -1/2+\eta,\qquad\text{or}\qquad -1/2+\eta\le d-d_{0}.$$

Clearly, $d_{0}$ is contained in the interval $-1/2+\eta\le d-d_{0}$, and we show that on this interval the contribution from the second term in the objective function

$$R_{T}(d)=T^{-1}\sum_{t=N+1}^{N+T}(\Delta_{N}^{d-d_{0}}\varepsilon_{t})^{2}-T^{-1}\frac{\big[\sum_{t=N+1}^{N+T}(\Delta_{N}^{d-d_{0}}\varepsilon_{t})\,\pi_{N,t-1}(1-d)\big]^{2}}{\sum_{t=N+1}^{N+T}\pi_{N,t-1}(1-d)^{2}}=T^{-1}\sum_{t=N+1}^{N+T}(\Delta_{N}^{d-d_{0}}\varepsilon_{t})^{2}-T^{-1}\frac{A_{T}(d)^{2}}{B_{T}(d)}, \qquad (84)$$

say, is negligible. It then follows that the objective function is only negligibly different from the objective function obtained without the parameter $\mu$, see, e.g., Nielsen (2015), and existence and consistency of $\hat d$ follows for the interval $d-d_{0}\ge -1/2+\eta$. The two intervals covering $d-d_{0}\le -1/2+\eta$ require a more careful analysis, which is given subsequently. Following the strategy of JN (2012a) and Nielsen (2015), we show that for any $K>0$ there exists a (fixed) $\eta>0$ such that, for these intervals,

$$P\big(\inf R_{T}(d)>K\big)\to 1 \text{ as } T\to\infty. \qquad (85)$$

This implies that $P(\hat d\in\{d:d-d_{0}\ge -1/2+\eta\})\to 1$ as $T\to\infty$, so that the relevant parameter space is reduced to $\{d:d-d_{0}\ge -1/2+\eta\}$, on which existence and consistency has already been shown.

C.1.2 Tightness of product moments

We want to show that the remainder term, $T^{-1}A_{T}(d)^{2}/B_{T}(d)$, in (84) is dominated by the first term on various compact intervals. The function $B_{T}(d)$ is discussed below, and we want to find the supremum of the suitably normalized product moment $M_{T}(d)=T^{-\kappa}(\log T)^{-1}A_{T}(d)$, for a normalization $T^{\kappa}$ chosen for each interval below, by considering it a continuous process on a compact interval $K$; that is, we consider it a process in $C(K)$, the space of continuous functions on $K$ endowed with the uniform topology.

The usual technique is then to prove that the process $M_{T}$ is tight in $C(K)$, which implies that also $\sup_{d\in K}|M_{T}(d)|$ is tight, by continuity of the mapping $f\mapsto\sup_{u\in K}|f(u)|$; that is, $\sup_{d\in K}|M_{T}(d)|=O_{P}(1)$. Tightness of $M_{T}$ can be proved by applying Billingsley (1968, Theorem 12.3), which states that it is enough to verify the two conditions

$$EM_{T}(d_{0})^{2}\le c, \qquad (86)$$
$$E\big(M_{T}(d_{1})-M_{T}(d_{2})\big)^{2}\le c(d_{1}-d_{2})^{2} \quad\text{for } d_{1},d_{2}\in K. \qquad (87)$$

In one case we will also need the weak limit of the process $M_{T}$, and in that case we apply Billingsley (1968, Theorem 8.1), which states that if $M_{T}$ is tight, then convergence of the finite dimensional distributions implies weak convergence. Thus, instead of working with the processes themselves, we need only evaluate their second moments and finite dimensional distributions. Specifically, by a Taylor series expansion of the coefficients we find

$$\pi_{m}(d_{0}-d_{1})\pi_{N,t+m-1}(1-d_{1})-\pi_{m}(d_{0}-d_{2})\pi_{N,t+m-1}(1-d_{2})=-(d_{1}-d_{2})\big\{D\pi_{m}(d_{0}-d^{*}_{m,t})\pi_{N,t+m-1}(1-d^{*}_{m,t})+\pi_{m}(d_{0}-d^{*}_{m,t})D\pi_{N,t+m-1}(1-d^{*}_{m,t})\big\}$$

for some $d^{*}_{m,t}$ between $d_{1}$ and $d_{2}$. It follows that if $d_{1}$ and $d_{2}$ are in the interval $K$, then also $d^{*}_{m,t}\in K$, so that any uniform bound we find for $E(DM_{T}(d))^{2}$ for $d\in K$ will also be valid at $d^{*}_{m,t}$. This shows that to prove tightness of $M_{T}(d)$ it is enough to verify

$$\sup_{d\in K}EM_{T}(d)^{2}\le c \quad\text{and}\quad \sup_{d\in K}E\big(DM_{T}(d)\big)^{2}\le c. \qquad (88)$$

C.1.3 Evaluation of product moments

We evaluate product moments on intervals of the form $d\le 1/2-\eta$ or $d\ge 1/2-\eta$, as well as $d-d_{0}\le -1/2-\eta$ or $d-d_{0}\ge -1/2-\eta$. Some of these intervals may be empty, depending on $\underline d$ and $\overline d$, in which case the proof simplifies easily, so we proceed assuming all intervals are non-empty.
The product moment $B_{T}(d)=\sum_{t=N+1}^{N+T}\pi_{N,t-1}(1-d)^{2}$: We first find that

$$\inf_{d}B_{T}(d)\ge 1 \qquad (89)$$

because $B_{T}(d)\ge\pi_{N,N}(1-d)^{2}=1$. Next there are constants $c_{1},c_{2}$ such that

$$0<c_{1}\le\inf_{\underline d\le d\le 1/2-\eta}T^{2d-1}B_{T}(d)\le\sup_{\underline d\le d\le 1/2-\eta}T^{2d-1}B_{T}(d)\le c_{2}<\infty. \qquad (90)$$

This follows from (41) because

$$T^{2d-1}B_{T}(d)=\Big(\frac{N!}{\Gamma(1-d+N)}\Big)^{2}T^{-1}\sum_{t=N+1}^{N+T}\Big(\frac{t}{T}\Big)^{-2d}\big(1+\epsilon_{t}(d)\big),$$

where $\max_{t}|\epsilon_{t}(d)|\to 0$, and $T^{-1}\sum_{t=N+1}^{N+T}(t/T)^{-2d}$ converges uniformly in $d\in[\underline d,1/2-\eta]$ to $\int_{0}^{1}u^{-2d}du=(1-2d)^{-1}$, which is bounded between $c_{1}$ and $c_{2}$ because $2\eta\le 1-2d\le 1-2\underline d$. Finally, again from (41),

$$T^{2d-1}B_{T}(d)\ge cT^{-1}\sum_{t=N+1}^{N+T}\Big(\frac{t}{T}\Big)^{-2d}\ge c\int_{(N+1)/T}^{1}u^{-2d}du=c\,\frac{1-((N+1)/T)^{1-2d}}{1-2d}, \qquad (91)$$

which for $|1-2d|\le 2\eta$ implies the lower bound $c\big(1-((N+1)/T)^{2\eta}\big)/(2\eta)$.

The product moment $A_{T}(d)=\sum_{t=N+1}^{N+T}(\Delta_{N}^{d-d_{0}}\varepsilon_{t})\pi_{N,t-1}(1-d)$: Interchanging summations, we find that

$$A_{T}(d)=\sum_{t=N+1}^{N+T}\varepsilon_{t}\,\kappa_{N,t}(d),\qquad \kappa_{N,t}(d)=\sum_{m=0}^{N+T-t}\pi_{m}(d_{0}-d)\pi_{N,t+m-1}(1-d).$$

From (38) and (39) we find $|\kappa_{N,t}(d)|\le c\sum_{m=0}^{N+T-t}m^{d_{0}-d-1}(t+m)^{-d}$, so that

$$EA_{T}(d)^{2}=\sigma_{0}^{2}\sum_{t=N+1}^{N+T}\kappa_{N,t}(d)^{2}\le c\sum_{t=N+1}^{N+T}\Big\{\sum_{m=0}^{N+T-t}m^{d_{0}-d-1}(t+m)^{-d}\Big\}^{2},$$

while $D\kappa_{N,t}(d)$ contains an extra factor $\log(m(t+m))$.

We give in Table 2 the bounds for $EA_{T}(d)^{2}$ for various intervals and normalizations. These follow from first using the inequalities $(t+m)^{-d}\le(t+m)^{-1/2+\eta}$ when $d\ge 1/2-\eta$ and $T^{d}(m+t)^{-d}\le((m+t)/T)^{-1/2+\eta}$ when $d\le 1/2-\eta$, and similarly for $d-d_{0}$. We then apply the result that

$$T^{-1}\sum_{t=N+1}^{N+T}\Big\{T^{-1}\sum_{m=0}^{N+T-t}\Big(\frac{m}{T}\Big)^{-1/2+\eta}\Big(\frac{t+m}{T}\Big)^{-1/2+\eta}\Big\}^{2}=O(1),$$

because the left-hand side converges to $\int_{0}^{1}\big\{\int_{0}^{1}u^{-1/2+\eta}(u+v)^{-1/2+\eta}du\big\}^{2}dv$.

Table 2: Bounds for $A_{T}(d)$

| Second moment | $d$ | $d-d_{0}$ | Upper bound on second moment | Order |
|---|---|---|---|---|
| $EA_{T}(d)^{2}$ | $\ge 1/2-\eta$ | $\ge -1/2-\eta_{1}$ | $\sum_{t}\big(\sum_{m}m^{-1/2+\eta_{1}+\epsilon}(t+m)^{-1/2+\epsilon}\big)^{2}$ | $T^{1+2\eta_{1}+2\epsilon}$ |
| $E\,T^{2d}A_{T}(d)^{2}$ | $\le 1/2-\eta$ | $\ge -1/2-\eta_{1}$ | $\sum_{t}\big(\sum_{m}m^{-1/2+\eta_{1}+\epsilon}(\frac{t+m}{T})^{-1/2+\epsilon}\big)^{2}$ | $T^{2+2\eta_{1}+2\epsilon}$ |
| $E\,T^{2(d-d_{0}+1)}A_{T}(d)^{2}$ | $\ge 1/2-\eta$ | $\le -1/2+\eta_{1}$ | $\sum_{t}\big(\sum_{m}(\frac{m}{T})^{-1/2+\eta_{1}+\epsilon}(t+m)^{-1/2+\epsilon}\big)^{2}$ | $T^{2+2\eta_{1}+2\epsilon}$ |
| $E\,T^{4d-2d_{0}+2}A_{T}(d)^{2}$ | $\le 1/2-\eta$ | $\le -1/2+\eta_{1}$ | $\sum_{t}\big(\sum_{m}(\frac{m}{T})^{-1/2+\eta_{1}+\epsilon}(\frac{t+m}{T})^{-1/2+\epsilon}\big)^{2}$ | $T^{3}$ |

Note: Uniform upper bounds on the normalized second moment of $A_{T}(d)$ for different restrictions on $d$ and $d-d_{0}$. The bounds are also valid if we replace $\eta_{1}$ by $-\eta$ or $\eta$ by $\eta_{1}$.

Table 3: Bounds for $C_{T,M}(d)$

| Second moment | $d$ | Upper bound on second moment | Order |
|---|---|---|---|
| $EC_{T,M}(d)^{2}$ | $\ge 1/2-\eta$ | $\sum_{t}\big(\sum_{m}m^{-1/2+\eta_{1}+\epsilon}(t+m)^{-1/2+\epsilon}\big)^{2}$ | $M^{1+2\eta_{1}}T^{2\epsilon}$ |
| $E\,T^{2d}C_{T,M}(d)^{2}$ | $\le 1/2-\eta$ | $\sum_{t}\big(\sum_{m}m^{-1/2+\eta_{1}+\epsilon}(\frac{t+m}{T})^{-1/2+\epsilon}\big)^{2}$ | $M^{1+2\eta_{1}}T$ |

Note: Uniform upper bounds on the normalized second moment of $C_{T,M}(d)$ for different restrictions on $d$, with $d-d_{0}$ in the critical interval $|d-d_{0}+1/2|\le\eta_{1}$; the inner sums contain at most $M$ terms.

The product moment $C_{T,M}(d)=\sum_{t=N+M+1}^{N+T}\big\{\sum_{n=0}^{M-1}\pi_{n}(d_{0}-d)\varepsilon_{t-n}\big\}\pi_{N,t-1}(1-d)$: Now we analyze another product moment, which we find by truncating the sum $\Delta_{N}^{d-d_{0}}\varepsilon_{t}=\sum_{n=0}^{t-N-1}\pi_{n}(d_{0}-d)\varepsilon_{t-n}$ at $M=T^{\alpha}$ for $\alpha<1$, and define

$$C_{T,M}(d)=\sum_{t=N+2}^{N+T}\varepsilon_{t}\,\chi_{N,M,t}(d),\qquad \chi_{N,M,t}(d)=\sum_{m=\max(N+M+1-t,0)}^{\min(M-1,N+T-t)}\pi_{m}(d_{0}-d)\pi_{N,t+m-1}(1-d). \qquad (92)$$

The coefficients are the same as for $A_{T}(d)$, but the sum $\chi_{N,M,t}(d)$ only contains at most $M$ terms. We give in Table 3 the bounds for the second moment of $C_{T,M}(d)$, which are derived using the same methods as for $A_{T}(d)$.

We now apply the above evaluations to study the objective function in the three intervals $d-d_{0}\ge -1/2+\eta$, $-1/2-\eta\le d-d_{0}\le -1/2+\eta$, and $d-d_{0}\le -1/2-\eta$.

C.1.4 The stationarity interval: $\{d-d_{0}\ge -1/2+\eta\}\cap D$

We want to show that

$$\sup_{\{d-d_{0}\ge -1/2+\eta\}\cap D}\Big|T^{-1}\frac{A_{T}(d)^{2}}{B_{T}(d)}\Big|=o_{P}(1),$$

and consider two cases because of the different behavior of $B_{T}(d)$.

Case 1: If $d\ge 1/2-\eta$ we let $K=\{d\ge 1/2-\eta,\ d-d_{0}\ge -1/2+\eta\}\cap D$ and use (89) to eliminate $B_{T}(d)$ and focus on $A_{T}(d)$. From Table 2 we find, using $(\epsilon,-\eta)$ in place of $(\epsilon,\eta_{1})$, that $\sup_{K}E(T^{-1}A_{T}(d)^{2})=O(T^{-2\eta+2\epsilon})$. For the derivative we get an extra factor $\log T$ in the coefficients and find $\sup_{K}E(T^{-1}(DA_{T}(d))^{2})=O((\log T)^{2}T^{-2\eta+2\epsilon})$. It then follows from (86) and (87) that $M_{T}(d)=T^{-1/2+\eta-\epsilon}(\log T)^{-1}A_{T}(d)$ is tight.
Because convergence in probability and tightness implies uniform convergence in probability, it follows that

$$\sup_{d\in K}T^{-1}A_{T}(d)^{2}=O_{P}\big(T^{-2\eta+2\epsilon}(\log T)^{2}\big)=o_{P}(1)\quad\text{for }\epsilon<\eta.$$

Case 2: If $d\le 1/2-\eta$ we define $K=\{d\le 1/2-\eta,\ d-d_{0}\ge -1/2+\eta\}\cap D$. From (90) we find that $\sup_{K}E(T^{-1}A_{T}(d)^{2}/B_{T}(d))\le c\sup_{K}E(T^{2d-2}A_{T}(d)^{2})$. From Table 2 we then find for $(\epsilon,-\eta)$ that $\sup_{K}E(T^{2d-2}A_{T}(d)^{2})=O(T^{-2\eta+2\epsilon})$. For the derivative we get an extra factor $\log T$. Thus, $\sup_{K}E(DT^{d-1}A_{T}(d))^{2}=O((\log T)^{2}T^{-2\eta+2\epsilon})$ and $(\log T)^{-1}T^{\eta-\epsilon}T^{d-1}A_{T}(d)$ is tight, so that

$$\sup_{d\in K}\Big|T^{-1}\frac{A_{T}(d)^{2}}{B_{T}(d)}\Big|\le c\sup_{d\in K}T^{2d-2}A_{T}(d)^{2}=O_{P}\big((\log T)^{2}T^{-2\eta+2\epsilon}\big)=o_{P}(1).$$

C.1.5 The critical interval: $\{-1/2-\eta\le d-d_{0}\le -1/2+\eta\}\cap D$

For this interval we show that (85) holds by setting $\eta$ sufficiently small. As in JN (2012a) and Nielsen (2015) we apply a truncation argument. With $M=T^{\alpha}$, for some $\alpha>0$ to be chosen below, let

$$\Delta_{N}^{d-d_{0}}\varepsilon_{t}=\sum_{n=0}^{M-1}\pi_{n}(d_{0}-d)\varepsilon_{t-n}+\sum_{n=M}^{t-N-1}\pi_{n}(d_{0}-d)\varepsilon_{t-n}=w_{1t}+w_{2t},\qquad t\ge M+N+1,$$

such that the objective function (84) satisfies

$$R_{T}(d)=T^{-1}\sum_{t=N+1}^{N+T}\Big(\Delta_{N}^{d-d_{0}}\varepsilon_{t}-\pi_{N,t-1}(1-d)\frac{A_{T}(d)}{B_{T}(d)}\Big)^{2}\ge T^{-1}\sum_{t=N+M+1}^{N+T}(w_{1t}+v_{t})^{2}, \qquad (93)$$

where $v_{t}=w_{2t}-\pi_{N,t-1}(1-d)A_{T}(d)/B_{T}(d)$. We further find that

$$R_{T}(d)\ge T^{-1}\sum_{t=N+M+1}^{N+T}w_{1t}^{2}+2T^{-1}\sum_{t=N+M+1}^{N+T}w_{1t}w_{2t}-2T^{-1}C_{T,M}(d)\frac{A_{T}(d)}{B_{T}(d)}, \qquad (94)$$

where $C_{T,M}(d)$ is given by (92). The first two terms in (94) are analyzed in Nielsen (2015), where it is shown that by setting $\alpha$ sufficiently small, the first term can be made arbitrarily large while the second is $o_{P}(1)$, uniformly on $|d-d_{0}+1/2|\le\eta_{1}$ for some fixed $\eta_{1}>\eta$. Thus it remains to be shown that the third term of (94) is asymptotically negligible, uniformly on the critical interval, that is,

$$\sup_{|d-d_{0}+1/2|\le\eta_{1}}\Big|T^{-1}C_{T,M}(d)\frac{A_{T}(d)}{B_{T}(d)}\Big|=o_{P}(1).$$

We consider two cases depending on $d$.

Case 1: Let $K=\{d\ge 1/2-\eta,\ |d-d_{0}+1/2|\le\eta_{1}\}\cap D$. From (89) we have $B_{T}(d)^{-1}\le 1$, and from Table 2 we find for $(\epsilon,\eta_{1})$ that $\sup_{K}EA_{T}(d)^{2}=O(T^{1+2\eta_{1}+2\epsilon})$ and $\sup_{K}E(DA_{T}(d))^{2}=O((\log T)^{2}T^{1+2\eta_{1}+2\epsilon})$, such that $\sup_{K}|A_{T}(d)|=O_{P}((\log T)T^{1/2+\eta_{1}+\epsilon})$. From Table 3 for $(\epsilon,\eta_{1})$ we then find $\sup_{K}EC_{T,M}(d)^{2}=O(M^{1+2\eta_{1}}T^{2\epsilon})=O(T^{\alpha(1+2\eta_{1})+2\epsilon})$ and also $\sup_{K}E(DC_{T,M}(d))^{2}=O((\log T)^{2}T^{\alpha(1+2\eta_{1})+2\epsilon})$, such that $\sup_{K}|C_{T,M}(d)|=O_{P}((\log T)T^{\alpha(1/2+\eta_{1})+\epsilon})$. This shows that

$$\sup_{d\in K}\Big|T^{-1}C_{T,M}(d)\frac{A_{T}(d)}{B_{T}(d)}\Big|=O_{P}\big((\log T)^{2}T^{\alpha(1/2+\eta_{1})-(1/2-\eta_{1}-2\epsilon)}\big)=o_{P}(1)$$

for $\alpha<(1/2-\eta_{1}-2\epsilon)/(1/2+\eta_{1})$.

Case 2: Let $K=\{d\le 1/2-\eta,\ |d-d_{0}+1/2|\le\eta_{1}\}\cap D$. From (90) we find $\sup_{K}|T^{1-2d}B_{T}(d)^{-1}|\le c$, and we find from Table 2 that $\sup_{K}E(T^{d-1}A_{T}(d))^{2}=O(T^{2\eta_{1}})$ and therefore $\sup_{K}E(DT^{d-1}A_{T}(d))^{2}=O((\log T)^{2}T^{2\eta_{1}})$. From Table 3 we get $\sup_{K}E(T^{d-1}C_{T,M}(d))^{2}=O(M^{1+2\eta_{1}}T^{-1})=O(T^{\alpha(1+2\eta_{1})-1})$ and $\sup_{K}E(DT^{d-1}C_{T,M}(d))^{2}=O((\log T)^{2}T^{\alpha(1+2\eta_{1})-1})$. Hence

$$\sup_{d\in K}\Big|T^{-1}C_{T,M}(d)\frac{A_{T}(d)}{B_{T}(d)}\Big|=O_{P}\big((\log T)^{2}T^{\alpha(1/2+\eta_{1})-(1/2-\eta_{1})}\big)=o_{P}(1)$$

for $\alpha<(1/2-\eta_{1})/(1/2+\eta_{1})$.

C.1.6 The nonstationarity interval: $\{d-d_{0}\le -1/2-\eta\}\cap D$

We give different arguments for different intervals of $d$, and we distinguish three cases.

Case 1: Let $K=\{d\ge 1/2+\eta,\ d-d_{0}\le -1/2-\eta\}\cap D$.
For this interval the main term of $R_{T}(d)$ in (84) has been shown by Nielsen (2015) to satisfy (85), and it is sufficient to show, with the normalization relevant to the nonstationarity interval, that

$$\sup_{d\in K}T^{2(d-d_{0})}\frac{A_{T}(d)^{2}}{B_{T}(d)}=o_{P}(1). \qquad (95)$$

We use (89) to evaluate $B_{T}(d)^{-1}\le 1$ and find from Table 2 for $(\epsilon,-\eta)$ that $\sup_{K}E(T^{2(d-d_{0})}A_{T}(d)^{2})=O(T^{-2\eta+2\epsilon})$, so that $E(DT^{d-d_{0}}A_{T}(d))^{2}=O((\log T)^{2}T^{-2\eta+2\epsilon})$, which shows that

$$\sup_{d\in K}T^{2(d-d_{0})}\frac{A_{T}(d)^{2}}{B_{T}(d)}=O_{P}\big((\log T)^{2}T^{-2\eta+2\epsilon}\big)=o_{P}(1).$$

Case 2: Let $K=\{1/2-\eta\le d\le 1/2+\eta,\ d-d_{0}\le -1/2-\eta\}\cap D$. Again the main term of $R_{T}(d)$ in (84) has been shown by Nielsen (2015) to satisfy (85), and we therefore want to show that

$$\sup_{d\in K}T^{2(d-d_{0})}\frac{A_{T}(d)^{2}}{B_{T}(d)}\le\frac{\sup_{d\in K}T^{2(d-d_{0})}T^{2d-1}A_{T}(d)^{2}}{\inf_{d\in K}T^{2d-1}B_{T}(d)} \qquad (96)$$

is not just $O_{P}(1)$, but can be made arbitrarily small by choosing $\eta$ sufficiently small. It follows from (91) that the denominator $T^{2d-1}B_{T}(d)$ of (96) can be made arbitrarily large by choosing $\eta$ sufficiently small, because $(1-((N+1)/T)^{2\eta})/(2\eta)\to\log(T/(N+1))$ for $\eta\to 0$. We next prove that the numerator of (96) is uniformly $O_{P}(1)$, which proves the result for Case 2. From Table 2 we find $\sup_{K}E(T^{2(d-d_{0})}T^{2d-1}A_{T}(d)^{2})=O(1)$. The derivative of $T^{d-d_{0}+d}\kappa_{N,[Tv]}(d)$ is bounded by

$$T^{-1}\sum_{m=N+1}^{N+T-[Tv]}\Big(\frac{m}{T}\Big)^{d_{0}-d-1}\Big(\frac{[Tv]+m}{T}\Big)^{-d}\log\Big(\frac{m}{T}\cdot\frac{m+[Tv]}{T}\Big),$$

which converges to $\int_{0}^{1-v}u^{d_{0}-d-1}(v+u)^{-d}\log(u(u+v))du<\infty$ for $d\in K$. Thus no extra $\log T$ factor is needed in this case, and we find that $T^{d-d_{0}}T^{d-1/2}A_{T}(d)$ is tight, which proves that $\sup_{K}|T^{d-d_{0}}T^{d-1/2}A_{T}(d)|=O_{P}(1)$.

Case 3: Finally, we assume $\underline d\le d\le 1/2-\eta$ and $\underline d-d_{0}\le d-d_{0}\le -1/2-\eta$. We note that on this set the term $T^{1-2d}B_{T}(d)^{-1}$ is uniformly bounded and uniformly bounded away from zero, see (90), so we factor it out of the objective function. We thus analyze the objective function

$$\tilde R_{T}(d)=T^{2d-2}\sum_{t,s=N+1}^{N+T}\big[(\Delta_{N}^{d-d_{0}}\varepsilon_{t})^{2}\pi_{N,s-1}(1-d)^{2}-(\Delta_{N}^{d-d_{0}}\varepsilon_{s})\pi_{N,s-1}(1-d)(\Delta_{N}^{d-d_{0}}\varepsilon_{t})\pi_{N,t-1}(1-d)\big].$$

The most straightforward approach would be to obtain the weak limit of $T^{-2(d-d_{0}+1/2)}\tilde R_{T}$ from the weak convergence of $T^{d-d_{0}+1/2}\Delta_{N}^{d-d_{0}}\varepsilon_{[Tu]}$ on $d-d_{0}\le -1/2-\eta$ and the uniform convergence of $T^{d}\pi_{N,[Tu]-1}(1-d)\to\frac{N!}{\Gamma(1-d+N)}u^{-d}$. However, the former would require the existence of $E|\varepsilon_{t}|^{q}$ for $q>1/(d_{0}-d-1/2)$, which can be arbitrarily large, see JN (2012b), and which we have not assumed in Assumption 1. We therefore introduce $\Delta_{N}^{d-d_{0}-1}\varepsilon_{t}$, the cumulation of $\Delta_{N}^{d-d_{0}}\varepsilon_{t}$, to increase the fractional order sufficiently far away from the critical value $d-d_{0}=-1/2$, so the number of moments needed is $q>1/(1+\eta)$. To this end we first prove the following.

Lemma C.1 Let $a_{t},b_{t}$, $t=1,\dots,T$, be real numbers and $A_{t}=\sum_{s=1}^{t}a_{s}$, $B_{t}=\sum_{s=1}^{t}b_{s}$. Then

$$\frac{2}{T(T-1)}\sum_{t,s=1}^{T}\big(a_{t}^{2}b_{s}^{2}-a_{t}b_{s}a_{s}b_{t}\big)\ge\Big(\frac{2}{T(T-1)}\sum_{t=1}^{T-1}\big(b_{t}A_{T}-b_{t}A_{t}-b_{t+1}A_{t}\big)\Big)^{2}.$$

Proof. We first find

$$\sum_{t=1}^{T}\sum_{s=1}^{T}\big(a_{t}^{2}b_{s}^{2}-a_{t}b_{s}a_{s}b_{t}\big)=\sum_{1\le s<t\le T}\big(a_{t}^{2}b_{s}^{2}+a_{s}^{2}b_{t}^{2}-2a_{t}b_{s}a_{s}b_{t}\big)=\sum_{1\le s<t\le T}\big(a_{t}b_{s}-a_{s}b_{t}\big)^{2}.$$

The proof is then completed by using the Cauchy–Schwarz inequality,

$$\Big(\frac{2}{T(T-1)}\sum_{1\le s<t\le T}(a_{t}b_{s}-a_{s}b_{t})\Big)^{2}\le\frac{2}{T(T-1)}\sum_{1\le s<t\le T}(a_{t}b_{s}-a_{s}b_{t})^{2},$$

together with $\sum_{1\le s<t\le T}(a_{t}b_{s}-a_{s}b_{t})=\sum_{s=1}^{T-1}b_{s}(A_{T}-A_{s})-\sum_{t=2}^{T}b_{t}A_{t-1}$.
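A quick numerical check (ours, purely illustrative) of the inequality in Lemma C.1 on randomly drawn sequences:

```python
# Numerical check of Lemma C.1 on random data.
import numpy as np

rng = np.random.default_rng(1)
T = 40
a = rng.standard_normal(T)
b = rng.standard_normal(T)
A = np.cumsum(a)                  # A[t] = a_1 + ... + a_{t+1} (0-based)

c = 2.0 / (T * (T - 1))
lhs = c * sum(a[t]**2 * b[s]**2 - a[t] * b[s] * a[s] * b[t]
              for t in range(T) for s in range(T))
inner = sum(b[t] * A[T - 1] - b[t] * A[t] - b[t + 1] * A[t]
            for t in range(T - 1))
rhs = (c * inner) ** 2
print(lhs >= rhs, lhs, rhs)       # the inequality holds
```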
Applying Lemma C.1 to $\tilde R_{T}(d)$ with $a_{t}=\Delta_{N}^{d-d_{0}}\varepsilon_{t}$ and $b_{t}=\pi_{N,t-1}(1-d)$, it holds that $\tilde R_{T}(d)\ge 2T^{2(d_{0}-d)-1}Q_{T}(d)^{2}$, where

$$Q_{T}(d)=T^{2d-d_{0}-1/2}T^{-1}\sum_{t=N+1}^{N+T-1}\Big(\pi_{N,t-1}(1-d)\big(\Delta_{N}^{d-d_{0}-1}\varepsilon_{N+T}\big)-\big(\pi_{N,t-1}(1-d)+\pi_{N,t}(1-d)\big)\big(\Delta_{N}^{d-d_{0}-1}\varepsilon_{t}\big)\Big). \qquad (97)$$

Following the arguments in JN (2012a) and Nielsen (2015), we show that $Q_{T}(d)$ converges weakly (in the space of continuous functions of $d$) to a random variable that is positive almost surely. Let $K=\{\underline d-d_{0}\le d-d_{0}\le -1/2-\eta,\ \underline d\le d\le 1/2-\eta\}\cap D$. Assumption 1 ensures that we have enough moments, $q>\max(2,1/(1+\eta))$, to apply the fractional functional central limit theorem, e.g. Marinucci and Robinson (2000, Theorem 1), and find for each $d\in K$ that

$$T^{d}\pi_{N,[Tu]-1}(1-d)\,T^{d-d_{0}-1/2}\Delta_{N}^{d-d_{0}-1}\varepsilon_{[Tu]}\Rightarrow\frac{N!}{\Gamma(N+1-d)}u^{-d}\,\sigma_{0}W_{d_{0}-d}(u)\quad\text{as } T\to\infty \text{ on } D[0,1],$$

where "$\Rightarrow$" denotes weak convergence, $W_{d_{0}-d}(u)=\Gamma(d_{0}-d+1)^{-1}\int_{0}^{u}(u-s)^{d_{0}-d}dW(s)$ denotes fractional Brownian motion (of type II), and $W$ denotes the Brownian motion generated by $\varepsilon_{t}$. Because the integral is a continuous mapping of $D[0,1]$ to $\mathbb{R}$, it holds that

$$Q_{T}(d)\Rightarrow Q(d)=\frac{N!}{\Gamma(N+1-d)}\int_{0}^{1}u^{-d}\big(W_{d_{0}-d}(1)-2W_{d_{0}-d}(u)\big)du\quad\text{as } T\to\infty \qquad (98)$$

for any fixed $d\in K$. We can establish tightness of the continuous process $Q_{T}(d)$ by evaluating the second moment, using the methods above. For all terms we see that it has the same form as $A_{T}(d)$, except that $(d-d_{0},d)$ is replaced by $(d-d_{0}-1,d)$, and hence the result follows as the results for $A_{T}(d)$. This establishes tightness of $Q_{T}(d)$ and hence strengthens the convergence in (98) to weak convergence in the space of continuous functions of $d$ on $K$ endowed with the uniform topology. It thus holds that

$$\inf_{d\in K}R_{T}(d)\ge 2\inf_{d\in K}T^{2(d_{0}-d)-1}Q_{T}(d)^{2}+o_{P}(1)\ge 2T^{2\eta}\inf_{d\in K}Q_{T}(d)^{2}+o_{P}(1),$$

where $\inf_{d\in K}Q_{T}(d)^{2}>0$ almost surely and $2(d_{0}-d)-1\ge 2\eta>0$. It follows that, for any $K>0$,

$$P\big(\inf R_{T}(d)>K\big)\to 1 \text{ as } T\to\infty,$$

which shows (85) and hence proves the result for Case 3.

C.1.7 Asymptotic normality of the estimator

To show asymptotic normality of $\hat d$ we apply the usual expansion of the score function,

$$0=DL^{*}(\hat d)=DL^{*}(d_{0})+(\hat d-d_{0})D^{2}L^{*}(d^{*}),$$

where $d^{*}$ is an intermediate value satisfying $|d^{*}-d_{0}|\le|\hat d-d_{0}|\overset{P}{\to}0$. The product moments in $D^{2}L^{*}(d)$ are shown in JN (2010, Lemma C.4) and JN (2012a, Lemma A.8(i)) to be tight, or equicontinuous, in a neighborhood of $d_{0}$, so that we can apply JN (2010, Lemma A.3) to conclude that $D^{2}L^{*}(d^{*})=D^{2}L^{*}(d_{0})+o_{P}(1)$, and we therefore analyze $DL^{*}(d_{0})$ and $D^{2}L^{*}(d_{0})$. From Lemma B.4 we find that $\sigma_{0}^{-2}T^{-1/2}DL^{*}(d_{0})=M^{+}_{01T}+O_{P}(T^{-1/2})$ and $\sigma_{0}^{-2}T^{-1}D^{2}L^{*}(d_{0})=\sigma_{2}+O_{P}(T^{-1/2})=\pi^{2}/6+O_{P}(T^{-1/2})$, and the result follows from Lemmas B.2 and B.3.
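The key stochastic ingredient in Case 3 is the type II fractional Brownian motion limit in (98). A simulation sketch (ours; $\sigma_{0}=1$ is assumed, and the function name is an illustration, not the authors' code) generates an approximate path by cumulating i.i.d. innovations with the fractional filter and applying the normalization of the fCLT above:

```python
# Simulation sketch of type II fractional Brownian motion:
# T^{-(kappa+1/2)} Delta^{-(kappa+1)} eps_{[Tu]} converges weakly to
# W_kappa(u) = Gamma(kappa+1)^{-1} int_0^u (u-s)^kappa dW(s),
# where kappa plays the role of d0 - d.
import numpy as np

def type2_fbm_path(kappa, T, rng):
    pi = np.empty(T)
    pi[0] = 1.0
    for j in range(1, T):                    # pi_j(kappa + 1) recursively
        pi[j] = pi[j - 1] * (kappa + j) / j
    eps = rng.standard_normal(T)
    x = np.array([pi[:t + 1][::-1] @ eps[:t + 1] for t in range(T)])
    return T ** -(kappa + 0.5) * x           # approx W_kappa on a [0,1] grid

rng = np.random.default_rng(7)
path = type2_fbm_path(kappa=0.3, T=1000, rng=rng)
print(path[-1])                              # approx W_kappa(1)
```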
C.2 Proof of Theorem 2

First we note that, as in the proof of Theorem 1 in Appendix C.1.7, we can apply JN (2010, Lemma A.3) to conclude that $D^{3}L^{*}(d^{*})=D^{3}L^{*}(d_{0})+o_{P}(1)$. We thus insert the expressions (75), (76), and (77) into the expansion (17) and find

$$T^{1/2}(\hat d-d_{0})=-\frac{A_{0}+T^{-1/2}A_{1}}{B_{0}+T^{-1/2}\big(B_{1}-\frac{1}{2}\frac{A_{0}}{B_{0}}C_{0}\big)}+o_{P}(T^{-1/2}),$$

which, using the expansion $1/(1+z)=1-z+z^{2}+\cdots$, reduces to

$$T^{1/2}(\hat d-d_{0})=-\frac{A_{0}}{B_{0}}-T^{-1/2}\Big(\frac{A_{1}}{B_{0}}-\frac{A_{0}B_{1}}{B_{0}^{2}}+\frac{1}{2}\frac{A_{0}^{2}C_{0}}{B_{0}^{3}}\Big)+o_{P}(T^{-1/2}).$$

We find that $E(A_{0})=E(M^{+}_{01T})=0$, so the bias of $T(\hat d-d_{0})$ is, from (78)–(80),

$$-\Big(\frac{E(A_{1})}{B_{0}}-\frac{E(A_{0}B_{1})}{B_{0}^{2}}+\frac{1}{2}\frac{E(A_{0}^{2})C_{0}}{B_{0}^{3}}\Big)+o(1)=-\Big(\xi_{N,T}(d_{0})+\zeta_{N,T}(d_{0})-\frac{E\big(M^{+}_{01T}(M^{+}_{11T}+M^{+}_{02T})\big)+3E(M^{+2}_{01T})\sigma_{3}\sigma_{2}^{-1}}{\sigma_{2}}\Big)\sigma_{2}^{-1}+o(1). \qquad (99)$$

From Lemma B.2,

$$E\big(M^{+}_{01T}(M^{+}_{11T}+M^{+}_{02T})\big)+3E(M^{+2}_{01T})\sigma_{3}\sigma_{2}^{-1}=-4\sigma_{3}-2\sigma_{3}+3\sigma_{3}=-3\sigma_{3},$$

see (66)–(68), so that we get the final result $-\big(\xi_{N,T}(d_{0})+\zeta_{N,T}(d_{0})+3\sigma_{3}\sigma_{2}^{-1}\big)\sigma_{2}^{-1}+o(1)$. For the estimator $\hat d_{c}$ we get the expansion (99), but use (81) instead of (78).

C.3 Proof of Corollary 1

We suppress the argument $d_{0}$ and want to evaluate $\xi_{N,T}$ and $\xi^{C}_{N,T}$, see (21) and (22). From (57) and (58) we find that $\langle\tau_{0},\tau_{1}\rangle_{T}$, $\langle\tau_{0},\mu_{1}\rangle_{T}$, $\langle\tau_{1},\mu_{0}\rangle_{T}$, and $\langle\mu_{0},\mu_{1}\rangle_{T}$ are all bounded by $c(1+N)^{-\min(d,2d-1)+\epsilon}$, which shows the result for $\xi^{C}_{N,T}$. To find the result for $\xi_{N,T}$, it only remains to be shown that $\langle\tau_{0},\mu_{0}\rangle_{T}/\langle\mu_{0},\mu_{0}\rangle_{T}$ is bounded. We find from (62) that $|\tau_{0t}(d)|\le c\sum_{j=0}^{N_{0}-1}|\pi_{t+j}(-d)|$. We apply (37) and note that, for given $d$ and $t>N>d$, the coefficients $\pi_{t+j}(-d)=\pi_{N}(-d)\pi_{N,t+j}(-d)=\pi_{N}(-d)\prod_{i=N+1}^{t+j}\big(1-(d+1)/i\big)$ are all of the same sign for $j\ge 0$. If this sign is positive, we have, see (47),

$$\sum_{j=0}^{N_{0}-1}|\pi_{t+j}(-d)|\le\sum_{j=0}^{\infty}\pi_{t+j}(-d)=\pi_{t-1}(1-d)>0$$

because $t-1\ge N$, and a similar relation holds if the coefficients are negative. Thus $|\tau_{0t}(d)|\le c|\mu_{0t}(d)|$ and therefore

$$|\langle\tau_{0},\mu_{0}\rangle_{T}|=\sigma_{0}^{-2}\Big|\sum_{t=N+1}^{N+T}\tau_{0t}(d)\mu_{0t}(d)\Big|\le c\,\sigma_{0}^{-2}\sum_{t=N+1}^{N+T}\mu_{0t}(d)^{2}=c\,\langle\mu_{0},\mu_{0}\rangle_{T}.$$

C.4 Proof of Theorem 3

(i): We note that, because $t\ge N+1$, we have $\mu_{0t}(d)=\pi_{t-1}(1-d)=0$ for $d=1,\dots,N$. Similarly, because $t+n\ge N+1$ for $n\ge 0$, we have

$$\tau_{0t}(d)=\sum_{n=0}^{N_{0}-1}\pi_{t+n}(-d)(\mu_{0}-X_{-n})=0 \quad\text{for } d=0,1,\dots,N,$$

and hence $\langle\tau_{0},\tau_{1}\rangle_{T}=\langle\tau_{0},\mu_{1}\rangle_{T}=\langle\tau_{1},\mu_{0}\rangle_{T}=\langle\mu_{0},\mu_{1}\rangle_{T}=0$ for $d=1,\dots,N$. This implies that $\xi_{N,T}$ and $\xi^{C}_{N,T}$ are zero.

(ii): Next assume $d_{0}=1$. The case with $N\ge 1$ is covered by part (i), so we only need to show the result for $N=0$. For $N=0$ we have $\mu_{0t}(1)=\pi_{t-1}(0)=1_{\{t=1\}}$ and $\mu_{1t}(1)=-D\pi_{t-1}(0)=-(t-1)^{-1}1_{\{t-1\ge 1\}}$, see (44). From (18) we find $\tau_{0t}(1)=\sum_{n=-N_{0}+1}^{0}1_{\{t-n=1\}}(X_{n}-\mu_{0})=1_{\{t=1\}}(X_{0}-\mu_{0})$, whereas $\tau_{1t}(1)$ is non-zero only for $t\ge 2$ because otherwise the summation over $k$ in (19) is empty. Thus, $\mu_{0t}(1)$ and $\tau_{0t}(1)$ are non-zero only if $t=1$, but $\mu_{1t}(1)$ and $\tau_{1t}(1)$ are non-zero only if $t\ge 2$, and therefore $\xi_{0,T}(1)=\xi^{C}_{0,T}(1)=0$.

(iii): From (37) it follows that $\pi_{N,t}(1-d)|_{d=N+1}=\prod_{i=N+1}^{t}(i-N-1)/i=0$ for $t\ge N+1$, and therefore (71) shows that $\zeta_{N,T}(d)=0$ for $d=N+1$.

C.5 Proof of Theorem 4

(27): For $N_{0}=0$ we find from (18) that $\tau_{0t}(d_{0})=0$, because the summation over $n$ is empty, so that $\xi_{N,T}(d_{0})=0$, see (21), which shows (27).

(28) and (29): We also find, for $\tau_{0t}(d_{0})=0$, that $\xi^{C}_{N,T}(d_{0})$ simplifies to

$$\xi^{C}_{N,T}(d_{0})=\sigma_{0}^{-2}\sum_{t=N+1}^{N+T}\big(-(C-\mu_{0})\mu_{0t}\big)\big(\tau_{1t}-(C-\mu_{0})\mu_{1t}\big)=-(C-\mu_{0})\big(\langle\mu_{0},\tau_{1}\rangle_{T}-(C-\mu_{0})\langle\mu_{0},\mu_{1}\rangle_{T}\big).$$

(30): The result follows from (27) and (70).

(31): If further $N=0$, then both summations over $n$ in (19) are empty, and hence zero, such that $\tau_{1t}(d_{0})=0$. It then follows from (28) that $\xi^{C}_{N,T}(d_{0})=(C-\mu_{0})^{2}\langle\mu_{0},\mu_{1}\rangle_{T}$, which can be replaced by its limit, see (59).
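For orientation, the proof of Theorem 2 implies that when the initial-values terms $\xi_{N,T}(d_{0})$ and $\zeta_{N,T}(d_{0})$ vanish, the bias of $T(\hat d-d_{0})$ reduces to the constant $-3\sigma_{3}/\sigma_{2}^{2}$, cf. (99). The following sketch (ours; the helper function is hypothetical, not the authors' procedure) evaluates this constant and applies the corresponding simple correction:

```python
# Second-order bias constant for the case where the initial-values terms
# xi and zeta vanish: bias of T*(hat{d} - d0) is -3*sigma_3/sigma_2^2.
import math

sigma2 = math.pi**2 / 6           # = 1.6449...
sigma3 = 1.2020569031595943       # zeta(3)
const = 3 * sigma3 / sigma2**2
print(const)                      # approx 1.333, so bias(hat{d}) ~ -1.333/T

def bias_corrected(d_hat, T):
    # hypothetical helper: removes the leading bias term in this case
    return d_hat + const / T
```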
References

1. Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions, National Bureau of Standards, Washington D.C.
2. Andersen, T. G., Bollerslev, T., Diebold, F. X., and Ebens, H. (2001). The distribution of daily realized stock return volatility. Journal of Financial Economics 61, pp. 43–76.
3. Askey, R. (1975). Orthogonal Polynomials and Special Functions, SIAM, Philadelphia.
4. Billingsley, P. (1968). Convergence of Probability Measures, Wiley, New York.
5. Bingham, N. H., Goldie, C. M., and Teugels, J. L. (1987). Regular Variation, Cambridge University Press, Cambridge.
6. Bollerslev, T., Osterrieder, D., Sizova, N., and Tauchen, G. (2013). Risk and return: long-run relationships, fractional cointegration, and return predictability. Journal of Financial Economics 108, pp. 409–424.
7. Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis, Forecasting, and Control, Holden-Day, San Francisco.
8. Byers, J. D., Davidson, J., and Peel, D. (1997). Modelling political popularity: an analysis of long-range dependence in opinion poll series. Journal of the Royal Statistical Society, Series A 160, pp. 471–490.
9. Carlini, F., Manzoni, M., and Mosconi, R. (2010). The impact of supply and demand imbalance on stock prices: an analysis based on fractional cointegration using Borsa Italiana ultra high frequency data. Working paper, Politecnico di Milano.
10. Davidson, J. and Hashimzade, N. (2009). Type I and type II fractional Brownian motions: a reconsideration. Computational Statistics & Data Analysis 53, pp. 2089–2106.
11. Dolado, J. J., Gonzalo, J., and Mayoral, L. (2002). A fractional Dickey–Fuller test for unit roots. Econometrica 70, pp. 1963–2006.
12. Doornik, J. A. and Ooms, M. (2003). Computational aspects of maximum likelihood estimation of autoregressive fractionally integrated moving average models. Computational Statistics & Data Analysis 42, pp. 333–348.
13. Giraitis, L. and Taqqu, M. (1998). Central limit theorems for quadratic forms with time domain conditions. Annals of Probability 26, pp. 377–398.
14. Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Application, Academic Press, New York.
15. Hualde, J. and Robinson, P. M. (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models. Annals of Statistics 39, pp. 3152–3181.
16. Jensen, A. N. and Nielsen, M. Ø. (2014). A fast fractional difference algorithm. Journal of Time Series Analysis 35, pp. 428–436.
17. Johansen, S. (2008). A representation theory for a class of vector autoregressive models for fractional processes. Econometric Theory 24, pp. 651–676.
18. Johansen, S. and Nielsen, M. Ø. (2010). Likelihood inference for a nonstationary fractional autoregressive model. Journal of Econometrics 158, pp. 51–66.
19. Johansen, S. and Nielsen, M. Ø. (2012a). Likelihood inference for a fractionally cointegrated vector autoregressive model. Econometrica 80, pp. 2667–2732.
20. Johansen, S. and Nielsen, M. Ø. (2012b). A necessary moment condition for the fractional functional central limit theorem. Econometric Theory 28, pp. 671–679.
21. Lawley, D. N. (1956). A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43, pp. 295–303.
22. Li, W. K. and McLeod, A. I. (1986). Fractional time series modelling. Biometrika 73, pp. 217–221.
23. Lieberman, O. and Phillips, P. C. B. (2004). Expansions for the distribution of the maximum likelihood estimator of the fractional difference parameter. Econometric Theory 20, pp. 464–484.
24. Marinucci, D. and Robinson, P. M. (2000). Weak convergence of multivariate fractional processes. Stochastic Processes and their Applications 86, pp. 103–120.
25. Nielsen, M. Ø. (2015). Asymptotics for the conditional-sum-of-squares estimator in multivariate fractional time series models. Journal of Time Series Analysis 36, pp. 154–188.
26. Osterrieder, D. and Schotman, P. C. (2011). Predicting returns with a co-fractional VAR model. CREATES research paper, Aarhus University.
27. Robinson, P. M. (1994). Efficient tests of nonstationary hypotheses. Journal of the American Statistical Association 89, pp. 1420–1437.
28. Rossi, E. and Santucci de Magistris, P. (2013). A no-arbitrage fractional cointegration model for futures and spot daily ranges. Journal of Futures Markets 33, pp. 77–102.
29. Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics 53, pp. 165–188.
30. Tschernig, R., Weber, E., and Weigand, R. (2013). Long-run identification in a fractionally integrated system. Journal of Business and Economic Statistics 31, pp. 438–450.