HAC Estimation and Strong Linearity Testing in Weak ARMA Models

Christian Francq
Université Lille III, GREMARS
E-mail: [email protected]

Jean-Michel Zakoïan
Université Lille III, GREMARS and CREST
E-mail: [email protected]

Running head: HAC Estimation in Weak ARMA Models.

Corresponding author: Jean-Michel Zakoïan, CREST, MK1, 3 Avenue Pierre Larousse, 92245 Malakoff Cedex, France. E-mail: [email protected]. Phone: 33(0)141176840. Fax: 33(0)141176840.

Abstract

The paper develops a procedure for testing the null hypothesis that the errors of an ARMA model are independent and identically distributed against the alternative that they are uncorrelated but not independent. The test statistic is based on the difference between a conventional estimator of the asymptotic covariance matrix of the least-squares estimator of the ARMA coefficients and its robust HAC-type version. The asymptotic distribution of the HAC estimator is established under the null hypothesis of independence, and under a large class of alternatives. The asymptotic distribution of the proposed statistic is shown to be a standard chi-square under the null, and a noncentral chi-square under the alternatives. The choice of the HAC estimator is discussed through a local power analysis. An automatic bandwidth selection method is also considered. The finite sample properties of the test are analyzed via Monte Carlo simulation.

1 Introduction

Standard statistical inference for time series relies on methods for fitting ARMA models by model selection and estimation, followed by model criticism through significance tests and diagnostic checks on the adequacy of the fitted model. The large sample distributions of the statistics involved in the so-called Box-Jenkins methodology have been established under the assumption of an independent white noise. Departures from this assumption can severely alter the large sample distributions of standard statistics, such as the empirical autocovariances (Romano and Thombs (1996), Berlinet and Francq (1999)), the estimators of the ARMA parameters (Francq and Zakoïan (1998)) or the portmanteau statistics (Lobato (2002), Francq, Roy and Zakoïan (2004)). Unfortunately, ARMA models endowed with an independent white noise sequence, referred to as strong ARMA models, are often found to be unrealistic in economic applications. On the other hand, weak ARMA models (i.e. models in which the noise is required neither to be independent nor to be a martingale difference) have attracted increasing interest in the recent statistical and econometric literatures. As we will see, relaxing the strong assumptions on the noise allows for great generality: a large variety of well-known nonlinear processes admit (weak) ARMA representations, and other examples of weak ARMA processes are obtained by usual transformations (such as aggregation) of strong ARMA processes. It is therefore crucial, at the model criticism stage, to be able to detect departures from the key assumption that the white noise disturbances are independent. Apart from the technical reasons already mentioned, one important motivation for testing this assumption is to find out whether or not the estimated model captures the whole dynamics of the series. Only optimal linear predictions can be deduced from non-strong ARMA models; thus, when the strong ARMA assumption is rejected, improvements can be expected from using nonlinear models. In this paper we propose a test for white noise independence in ARMA models.
An informal presentation of the test is as follows; precise notations and assumptions are introduced in the next section. Let $\theta$ denote the vector of the ARMA coefficients, and let $\hat\theta_n$ denote the standard Least-Squares Estimator (LSE) of $\theta$ for a sample of size $n$. Then, under appropriate conditions, the $\sqrt n$-scaled difference between $\hat\theta_n$ and the true parameter value converges in distribution to $\mathcal N(0,\Sigma^{(1)})$ when the ARMA is strong. When the noise is only uncorrelated, the asymptotic distribution turns out to be of the form $\mathcal N(0,\Sigma^{(2)})$. The aim of the paper is to test
\[ H_0:\ \Sigma^{(1)}=\Sigma^{(2)} \quad\text{against}\quad H_1:\ \Sigma^{(1)}\neq\Sigma^{(2)}. \]
The Asymptotic Covariance Matrices (ACM) $\Sigma^{(1)}$ and $\Sigma^{(2)}$ coincide for a strong ARMA process, but they will in general differ in the weak situation. It should be noted that $H_0$ is not equivalent to noise independence: $H_0$ is the consequence of independence that matters as far as the asymptotic precision of the LS estimator is concerned. The test proposed in this paper is built on the difference between consistent estimators of the two ACM. Suitably scaled, this difference has two different behaviors: it converges to a nondegenerate distribution if the ARMA process is strong, and to infinity otherwise.

The matrix $\Sigma^{(2)}$ is a function of a matrix of the form $\lim_{n\to\infty}\frac1n\sum_{s,t=1}^n\mathrm{Cov}(V_t,V_s)$, which is also $2\pi$ times the spectral density matrix of the multivariate process $(V_t)$ evaluated at frequency zero. Many papers in the statistical and econometric literatures deal with estimating such 'long-run variance' matrices. Examples include the estimation of the optimal weighting matrix for Generalized Method of Moments estimators (Hansen, 1982), the estimation of the covariance matrix of the error term in unit root tests (Phillips, 1987), and the estimation of the asymptotic variance of sample autocovariances of nonlinear processes (Berlinet and Francq, 1999); other important econometric contributions include Newey and West (1987), Gallant and White (1988), Andrews (1991), Hansen (1992), de Jong and Davidson (2000). However, the asymptotic distribution of the ACM estimators is seldom considered in the literature dealing with HAC estimation; see Phillips, Sun and Jin (2003, Theorem 2) for the asymptotic distribution of a HAC estimator in the framework of robust regression. The main difficulty of the present paper is to derive the asymptotic distributions of the HAC estimator of $\Sigma^{(2)}$ under both the weak and strong ARMA assumptions. Those distributions are needed to construct the asymptotic critical region of our test and to derive its local asymptotic power.

The organization of the paper is as follows. Section 2 introduces the notion of weak ARMA representations; the examples treated explicitly are the first component of a strong bivariate MA(1) model and a chaotic process. Section 3 presents notations and briefly reviews results concerning the asymptotic behaviour of the LSE in the weak ARMA framework. Section 4 establishes the asymptotic distribution of the above-mentioned HAC estimator. Section 5 introduces the test statistic and derives its asymptotic properties under the null of a strong ARMA model, and under the alternative of a weak ARMA model. In Section 6, the choice of the HAC estimator is discussed through a local power analysis and an automatic bandwidth selection method, and the finite sample performance of the tests is studied. Proofs and additional notations are in Section 7. Section 8 concludes.
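To fix ideas on the 'long-run variance' object discussed above, the following minimal sketch (an illustration of ours, not taken from the paper) checks numerically that a kernel-weighted sum of sample autocovariances approximates $\sum_h\mathrm{Cov}(V_t,V_{t+h})$, which is known in closed form for a scalar AR(1); the helper name and the bandwidth choice are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar AR(1): V_t = rho*V_{t-1} + e_t.  Its long-run variance
# sum_h Cov(V_t, V_{t+h}) equals sigma_e^2 / (1 - rho)^2.
rho, sig_e, n = 0.5, 1.0, 100_000
e = rng.normal(scale=sig_e, size=n)
v = np.zeros(n)
for t in range(1, n):
    v[t] = rho * v[t - 1] + e[t]

def bartlett_lrv(x, b_n):
    """Kernel-weighted sum of sample autocovariances (Bartlett weights)."""
    x = x - x.mean()
    n = len(x)
    lrv = np.dot(x, x) / n                    # lag-0 term
    for i in range(1, int(1 / b_n) + 1):      # weights vanish once i*b_n > 1
        w = 1 - i * b_n
        lrv += 2 * w * np.dot(x[:-i], x[i:]) / n
    return lrv

print("theoretical:", sig_e**2 / (1 - rho) ** 2)     # 4.0
print("estimated  :", bartlett_lrv(v, n ** (-1 / 3)))  # close to 4.0
```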
2 Weak ARMA models

It is not very surprising that a number of nonlinear processes admit an ARMA representation. Indeed, the Wold theorem states that any purely nondeterministic, second-order stationary process $(X_t)$ admits an infinite MA representation. When the infinite MA polynomial is obtained as the ratio of two finite-order polynomials, the nonlinear process also admits a finite-order ARMA representation. This representation is weak because the noise is only the linear innovation of $(X_t)$ (otherwise $(X_t)$ would be a linear process). This distinction has important consequences in terms of prediction. The predictors obtained from a weak ARMA model are only optimal in the linear sense. They are particularly useful when the nonlinear dynamics of $(X_t)$ is difficult to identify, which is often the case given the variety of nonlinear models. It is also crucial to take the differences between weak and strong ARMA models into account in the different steps of the Box-Jenkins methodology. When the noise is only uncorrelated, the tools provided by the standard Box-Jenkins methodology can be quite misleading. Recent papers cited in the introduction have been devoted to the correction of such tools and have attempted to develop tests and methods that work for a broad class of ARMA models. It is, however, important to know, in practical situations, when these new methods are required and when the classical ones are reliable. Needless to say, the latter procedures are simpler to use, in particular because they are widely implemented in standard statistical software. It is precisely the purpose of the present paper to develop a test of the reliability of such procedures.

Examples of ARMA representations of bilinear processes, Markov-switching processes, threshold processes, and processes obtained by temporal aggregation of a strong ARMA or by linear combination of independent ARMA processes can be found in Romano and Thombs (1996) and Francq and Zakoïan (1998, 2000). We give two new illustrations which have their own interest.

To construct a simple example of a weak ARMA model obtained by transformation of a strong ARMA process, consider the following bivariate MA(1) model
\[
\begin{pmatrix} X_{1t}\\ X_{2t}\end{pmatrix}
=\begin{pmatrix}\eta_{1t}\\ \eta_{2t}\end{pmatrix}
+\begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{pmatrix}
\begin{pmatrix}\eta_{1,t-1}\\ \eta_{2,t-1}\end{pmatrix},
\]
where $\{(\eta_{1t},\eta_{2t})'\}_t$ is a centered iid sequence with covariance matrix $(\xi_{ij})$. It is easy to see that the first component of the strong bivariate MA(1) model satisfies a univariate MA(1) model of the form $X_{1t}=\epsilon_t+\theta\epsilon_{t-1}$, where $\theta\in(-1,1)$ is such that
\[
\frac{\theta}{1+\theta^2}=\frac{EX_{1t}X_{1,t-1}}{EX_{1t}^2}
=\frac{\xi_{11}b_{11}+\xi_{12}b_{12}}{\xi_{11}(1+b_{11}^2)+\xi_{22}b_{12}^2+2\xi_{12}b_{11}b_{12}}.
\]
From $\epsilon_t+\theta\epsilon_{t-1}=\eta_{1t}+b_{11}\eta_{1,t-1}+b_{12}\eta_{2,t-1}$, we find
\[
\epsilon_t=(b_{11}-\theta)(\eta_{1,t-1}-\theta\eta_{1,t-2})+b_{12}(\eta_{2,t-1}-\theta\eta_{2,t-2})+R_t,
\]
where $R_t$ is centered and independent of $X_{1,t-1}$. Therefore
\[
EX_{1,t-1}^2\epsilon_t=(b_{11}-\theta)\left\{(1-\theta b_{11}^2)E\eta_{1t}^3-\theta b_{12}^2E\eta_{1t}\eta_{2t}^2-2\theta b_{11}b_{12}E\eta_{1t}^2\eta_{2t}\right\}
+b_{12}\left\{(1-\theta b_{11}^2)E\eta_{1t}^2\eta_{2t}-\theta b_{12}^2E\eta_{2t}^3-2\theta b_{11}b_{12}E\eta_{1t}\eta_{2t}^2\right\}.
\]
The random variable $X_{1,t-1}^2$ belongs to $\sigma\{\epsilon_s,\ s<t\}$. It can be seen that, in general, $EX_{1,t-1}^2\epsilon_t=E[X_{1,t-1}^2\,E(\epsilon_t\mid\epsilon_{t-1},\dots)]\neq0$. Thus $(\epsilon_t)$ is not a martingale difference, so the MA(1) for $X_{1t}$ is only weak.
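The failure of the martingale difference property can be checked by simulation. The following minimal sketch (ours; the parameter values are borrowed from model iv) of Section 6, with skewed shocks so that the third-order moments above are nonzero) recovers the linear innovations $\epsilon_t$ and verifies that $EX_{1,t-1}^2\epsilon_t$ is significantly away from zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# First component of a strong bivariate MA(1); b21, b22 do not enter X_1t.
b11, b12, n = 0.8, -0.9, 500_000
eta1 = rng.chisquare(1, n + 1) - 1          # centered, skewed iid shocks
eta2 = rng.chisquare(1, n + 1) - 1
x1 = eta1[1:] + b11 * eta1[:-1] + b12 * eta2[:-1]

# Invertible MA(1) coefficient: solve theta/(1+theta^2) = gamma(1)/gamma(0).
c = np.dot(x1[1:], x1[:-1]) / np.dot(x1, x1)
theta = (1 - np.sqrt(1 - 4 * c * c)) / (2 * c)

# Linear innovations eps_t = X_1t - theta * eps_{t-1}.
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = x1[t] - theta * eps[t - 1]

# Sample mean of X_{1,t-1}^2 * eps_t and its standard error: the mean is
# many standard errors from 0, so (eps_t) is not a martingale difference.
z = x1[:-1] ** 2 * eps[1:]
print(z.mean(), z.std() / np.sqrt(len(z)))
```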
Now we give an example based on a chaotic process (see May (1976)). Let
\[
\epsilon_t=u_t+\eta_t,\qquad u_t=4u_{t-1}(1-u_{t-1}),\qquad t\ge1, \tag{1}
\]
where $u_0$ has the arcsine density $f(x)=\pi^{-1}\{x(1-x)\}^{-1/2}$ on the interval $[0,1]$, and $(\eta_t)_{t\ge1}$ is an iid sequence, independent of $u_0$, with mean $-1/2$ and finite variance. Since $f$ is the invariant density of $(u_t)$, this process is stationary. We have $E\epsilon_t=0$ and, since $u_t$ and $1-u_t$ have the same law,
\[
\mathrm{Cov}(\epsilon_t,\epsilon_{t-1})=\mathrm{Cov}(u_t,u_{t-1})=\mathrm{Cov}\{4u_{t-1}(1-u_{t-1}),u_{t-1}\}=\mathrm{Cov}(u_t,1-u_{t-1})=0.
\]
The same symmetry argument shows that $\mathrm{Cov}(\epsilon_t,\epsilon_{t-h})=0$ for all $h\neq0$. Therefore $(\epsilon_t)$ is a white noise. Consequently, given $\{\epsilon_u,\ u\le t\}$, the best linear predictor of $\epsilon_{t+h}$ is equal to zero for any horizon $h$. However, the best (nonlinear) predictor is in general quite different. For illustration, Figure 1 displays the scatter plot of the pairs $(\epsilon_t,\epsilon_{t-2})$, for $t=1,\dots,1{,}000$, obtained by simulation, together with the nonlinear regression obtained by tedious computation, in the case where $\eta_t$ is uniformly distributed over $[-0.6,-0.4]$.

Figure 1: Scatter plot of 1,000 pairs $(\epsilon_t,\epsilon_{t-2})$ simulated from (1), with $\eta_t$ uniformly distributed over $[-0.6,-0.4]$. The full line is the theoretical (nonlinear) regression of $\epsilon_t$ on $\epsilon_{t-2}$.

This example illustrates the, possibly dramatic, differences between linear and nonlinear predictions of a given weak ARMA process.

3 Notations and preliminary asymptotic results

In this section we introduce the main notations and recall some results established in Francq and Zakoïan (1998, 2000). Let $X=(X_t)_{t\in\mathbb Z}$ be a real second-order stationary ARMA($p,q$) process such that, for all $t\in\mathbb Z$,
\[
X_t+\sum_{i=1}^{p}\phi_iX_{t-i}=\epsilon_t+\sum_{i=1}^{q}\psi_i\epsilon_{t-i}. \tag{2}
\]
We consider the estimation of the parameter $\theta=(\theta_1,\dots,\theta_{p+q})'\in\mathbb R^{p+q}$ with true value $\theta_0=(\phi_1,\dots,\phi_p,\psi_1,\dots,\psi_q)'$. Let $\Phi_\theta(z)=1+\theta_1z+\dots+\theta_pz^p$ and $\Psi_\theta(z)=1+\theta_{p+1}z+\dots+\theta_{p+q}z^q$ be the AR and MA polynomials. For any $\delta>0$, let the compact set
\[
\Theta_\delta=\{\theta\in\mathbb R^{p+q}:\ \text{the zeros of the polynomials }\Phi_\theta(z)\text{ and }\Psi_\theta(z)\text{ have moduli }\ge1+\delta\}.
\]
We make the following assumptions.

A1. $\epsilon=(\epsilon_t)$ is a strictly stationary sequence of uncorrelated random variables with zero mean and variance $\sigma^2>0$, defined on some probability space $(\Omega,\mathcal A,P)$.

A2. $\theta_0$ belongs to the interior of $\Theta_\delta$, and the polynomials $\Phi_{\theta_0}(z)$ and $\Psi_{\theta_0}(z)$ have no zero in common.

A3. $\phi_p$ and $\psi_q$ are not both equal to zero (by convention $\phi_0=\psi_0=1$).

For all $\theta\in\Theta$, let $\epsilon_t(\theta)=\Psi_\theta^{-1}(B)\Phi_\theta(B)X_t$, where $B$ denotes the backshift operator. Given a realization $X_1,X_2,\dots,X_n$ of $X$, the $\epsilon_t(\theta)$ can be approximated, for $0<t\le n$, by $e_t(\theta)=\Psi_\theta^{-1}(B)\Phi_\theta(B)(X_t\mathbb 1_{1\le t\le n})$. The random variable $\hat\theta_n$ is called the Least Squares Estimator (LSE) if it satisfies, almost surely,
\[
Q_n(\hat\theta_n)=\min_{\theta\in\Theta_\delta}Q_n(\theta),\qquad\text{where}\qquad Q_n(\theta)=\frac{1}{2n}\sum_{t=1}^{n}e_t^2(\theta). \tag{3}
\]
Define the strong ARMA assumption:

A4. The process $X$ is a solution of model (2) where the random variables $\epsilon_t$ are independent and identically distributed (iid).

In the statistical literature on ARMA models, most results are obtained under A4. Less restrictive hypotheses rely on the $\alpha$-mixing (strong mixing) coefficients $\{\alpha_\epsilon(h)\}_{h\ge0}$ of $(\epsilon_t)$, or $\{\alpha_X(h)\}_{h\ge0}$ of $(X_t)$. Let us consider the assumptions:

A5. The process $X$ is a solution of model (2) where $\sum_{h=0}^{\infty}\{\alpha_\epsilon(h)\}^{\nu/(2+\nu)}<\infty$, for some $\nu>0$.

A5′. The process $X$ is a solution of model (2) where $\sum_{h=0}^{\infty}\{\alpha_X(h)\}^{\nu/(2+\nu)}<\infty$, for some $\nu>0$.

It is well known that A5 and A5′ are not equivalent. Pham (1986) and Carrasco and Chen (2002) have shown that, for a wide class of processes, the mixing conditions A5 and/or A5′ are satisfied. Let
\[
J(\theta)=\lim_{n\to\infty}\frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta)\ \text{a.s.},\qquad
I(\theta)=\lim_{n\to\infty}\mathrm{Var}\left(\sqrt n\,\frac{\partial}{\partial\theta}Q_n(\theta)\right).
\]
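Before stating the asymptotic results, a minimal sketch (ours, not the authors' code) of the LSE of (3) for an ARMA(1,1) may be useful; it builds the truncated residuals $e_t(\theta)$ recursively and minimizes $Q_n(\theta)$ numerically, ignoring the $\Theta_\delta$ constraint for simplicity.

```python
import numpy as np
from scipy.optimize import minimize

def residuals(theta, x):
    """Truncated ARMA(1,1) residuals e_t(theta):
    model (2) reads X_t + phi*X_{t-1} = e_t + psi*e_{t-1}, so
    e_t = x_t + phi*x_{t-1} - psi*e_{t-1}, with x_0 = e_0 = 0."""
    phi, psi = theta
    e = np.zeros_like(x)
    for t in range(len(x)):
        xlag = x[t - 1] if t > 0 else 0.0
        elag = e[t - 1] if t > 0 else 0.0
        e[t] = x[t] + phi * xlag - psi * elag
    return e

def Q_n(theta, x):
    e = residuals(theta, x)
    return 0.5 * np.mean(e ** 2)     # Q_n(theta) of equation (3)

# Toy data from a strong ARMA(1,1) with theta_0 = (-0.5, 0.7):
rng = np.random.default_rng(2)
n = 2000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + eps[t] + 0.7 * eps[t - 1]

fit = minimize(Q_n, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
print(fit.x)   # close to (-0.5, 0.7) in this parameterization
```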
The following theorem gives already-established results concerning the asymptotic behaviour of the LSE. The symbol $\overset{d}{\to}$ denotes convergence in distribution as the sample size $n$ goes to infinity.

Theorem 1. Assume that A1-A3 hold. If $E\epsilon_t^2<\infty$ and A4 holds, then $\hat\theta_n$ is strongly consistent and
\[
\sqrt n(\hat\theta_n-\theta_0)\overset{d}{\to}\mathcal N(0,\Sigma^{(1)}),\qquad\text{where}\qquad\Sigma^{(1)}=\sigma^2J^{-1}(\theta_0). \tag{4}
\]
If $E\epsilon_t^{4+2\nu}<\infty$ and either A4, A5 or A5′ holds, then $\hat\theta_n$ is strongly consistent and
\[
\sqrt n(\hat\theta_n-\theta_0)\overset{d}{\to}\mathcal N(0,\Sigma^{(2)}),\qquad\text{where}\qquad\Sigma^{(2)}=J^{-1}(\theta_0)I(\theta_0)J^{-1}(\theta_0). \tag{5}
\]

Obviously the moment assumptions on $\epsilon$ could be replaced by the same assumptions on $X$. The convergence under A4 is standard (see e.g. Brockwell and Davis (1991)). Francq and Zakoïan (1998) established (5) under A5′; it can be shown that the result remains valid under A5. Note that the finiteness of $E\epsilon_t^4$ is required for the existence of $\Sigma^{(2)}$.

Remarks

(a) Straightforward computations show that
\[
I(\theta)=\sum_{i=-\infty}^{+\infty}\Delta_i(\theta),\qquad\text{where}\qquad \Delta_i(\theta)=E\,u_t(\theta)u'_{t+i}(\theta),\qquad u_t(\theta)=\epsilon_t(\theta)\frac{\partial}{\partial\theta}\epsilon_t(\theta).
\]

(b) Under A4 the asymptotic variances $\Sigma^{(1)}$ and $\Sigma^{(2)}$ are clearly equal. However it may also be the case that $\Sigma^{(1)}=\Sigma^{(2)}$ under A5 or A5′. More precisely we have
\[
\Sigma^{(1)}=\Sigma^{(2)}\iff I(\theta_0)=\sigma^2J(\theta_0)\iff \sum_{i=-\infty}^{+\infty}E\,u_t(\theta_0)u'_{t+i}(\theta_0)=\sigma^2E\frac{\partial\epsilon_t}{\partial\theta}(\theta_0)\frac{\partial\epsilon_t}{\partial\theta'}(\theta_0),
\]
where all quantities are taken at $\theta_0$. In particular, the asymptotic variances are equal when the process $\left(\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}\right)$ is a noise and $\epsilon_t^2$ is uncorrelated with $\frac{\partial\epsilon_t}{\partial\theta}\frac{\partial\epsilon_t}{\partial\theta'}$. In the case of a martingale difference the former condition holds but, in general, the latter condition does not. More precisely, when $(\epsilon_t)$ is a martingale difference,
\[
\Sigma^{(2)}-\Sigma^{(1)}=J^{-1}\left\{E\left(\epsilon_t^2\frac{\partial\epsilon_t}{\partial\theta}\frac{\partial\epsilon_t}{\partial\theta'}\right)-E\epsilon_t^2\,E\left(\frac{\partial\epsilon_t}{\partial\theta}\frac{\partial\epsilon_t}{\partial\theta'}\right)\right\}J^{-1}.
\]
For example, if the model is an AR(1) with true value $\theta_0=0$ and $\sigma^2=1$, then $\partial\epsilon_t/\partial\theta=X_{t-1}=\epsilon_{t-1}$, so that $\Sigma^{(2)}-\Sigma^{(1)}=\mathrm{Cov}(\epsilon_t^2,\epsilon_{t-1}^2)$, which is not equal to zero in the presence of ARCH-type conditional heteroskedasticity. When $(\epsilon_t)$ is not a martingale difference, the sequence $\left(\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}\right)$ is in general not uncorrelated, and the difference between the matrices $\Sigma^{(1)}$ and $\Sigma^{(2)}$ can be substantial, as illustrated in Figure 2. Interestingly, it is also seen on this example that $\Sigma^{(2)}-\Sigma^{(1)}$ is not always positive definite. Therefore, for some linear combinations of the ARMA parameters, a better asymptotic accuracy may be obtained when the noise is weak than when it is strong. The same remark was made by Romano and Thombs (1996) on another example.

Figure 2: $\Sigma^{(2)}(1,1)/\Sigma^{(1)}(1,1)$ as a function of $b\in[-0.9,0.9]$ and $p\in[0.1,1]$ for the model
\[
X_t+aX_{t-1}=\epsilon_t+b\epsilon_{t-1},\qquad \epsilon_t=\eta_t+(c-2c\Delta_t)\eta_{t-1},\qquad \forall t\in\mathbb Z,
\]
where $a=-0.5$ and $c=1$, $(\eta_t)$ is an iid $\mathcal N(0,1)$ sequence, and $(\Delta_t)$ is a stationary Markov chain, independent of $(\eta_t)$, with state space $\{0,1\}$ and transition probabilities $p=P(\Delta_t=1\mid\Delta_{t-1}=0)=P(\Delta_t=0\mid\Delta_{t-1}=1)\in(0,1)$. It can be shown that $(\epsilon_t)$ is a white noise.

Consistent estimation of the matrix $J=J(\theta_0)=E\frac{\partial}{\partial\theta}\epsilon_t(\theta_0)\frac{\partial}{\partial\theta'}\epsilon_t(\theta_0)$ involved in (4) and (5) is straightforward, for example by taking
\[
\hat J=\hat J_n(\hat\theta_n),\qquad \hat J_n(\theta)=\frac1n\sum_{t=1}^{n}\frac{\partial}{\partial\theta}e_t(\theta)\frac{\partial}{\partial\theta'}e_t(\theta). \tag{6}
\]
Estimation of the matrix $I=I(\theta_0)$ is a much more intricate problem and is the object of the next section.
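Continuing the earlier sketch, estimator (6) can be computed from finite-difference gradients of the residuals; the helper names are ours and `residuals`, `fit`, `x` come from the previous sketch.

```python
import numpy as np

def residual_grads(theta_hat, x, h=1e-5):
    """Central finite-difference gradients d e_t(theta) / d theta."""
    k, n = len(theta_hat), len(x)
    grads = np.zeros((n, k))
    for j in range(k):
        tp = np.array(theta_hat, dtype=float); tp[j] += h
        tm = np.array(theta_hat, dtype=float); tm[j] -= h
        grads[:, j] = (residuals(tp, x) - residuals(tm, x)) / (2 * h)
    return grads

e = residuals(fit.x, x)
grads = residual_grads(fit.x, x)
J_hat = grads.T @ grads / len(x)     # equation (6)
sigma2_hat = np.mean(e ** 2)         # estimates sigma^2
```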
4 Asymptotic distribution of the HAC estimator of the covariance matrix I

The formula displayed in Remark (a) of Theorem 1 motivates the introduction of a HAC estimator of $I$ of the general form
\[
\hat I=\hat I_n(\hat\theta_n)=\sum_{i=-\infty}^{+\infty}\omega(ib_n)\hat\Delta_i(\hat\theta_n), \tag{7}
\]
involving the ARMA residuals $e_t(\hat\theta_n)$ through the functions
\[
\hat\Delta_i(\theta)=\hat\Delta'_{-i}(\theta)=
\begin{cases}
\dfrac1n\displaystyle\sum_{t=1}^{n-i}e_t(\theta)\dfrac{\partial}{\partial\theta}e_t(\theta)\,e_{t+i}(\theta)\dfrac{\partial}{\partial\theta'}e_{t+i}(\theta), & 0\le i<n,\\[2mm]
0, & i\ge n.
\end{cases}
\]
In (7), $\omega(\cdot)$ is a kernel function belonging to the set $\mathcal K$ defined below, and $b_n$ is a size-dependent bandwidth parameter. When $i$ is large relative to $n$, $\Delta_i(\theta_0)$ is in general poorly estimated by $\hat\Delta_i(\hat\theta_n)$, which is based on too few observations. Consistency of $\hat I$ therefore requires that the weights $\omega(ib_n)$ be close to one for small $i$ and close to zero for large $i$. In particular, the naive estimator $\sum_{i=-\infty}^{\infty}\hat\Delta_i(\hat\theta_n)$ is inconsistent. The set $\mathcal K$ is defined by
\[
\mathcal K=\left\{\omega:\mathbb R\to\mathbb R\ \middle|\ \omega(0)=1,\ \omega\text{ is bounded, even, has a compact support }[-a,a]\text{ and is continuous on }[-a,a]\right\}. \tag{8}
\]
Various kernels $\omega(\cdot)$ belonging to $\mathcal K$ are available for use in (7). Standard examples are the rectangular kernel $\omega(x)=\mathbb 1_{[-1,1]}(x)$, the Bartlett kernel $\omega(x)=(1-|x|)\mathbb 1_{[-1,1]}(x)$, the Parzen kernel $\omega(x)=(1-6x^2+6|x|^3)\mathbb 1_{[0,1/2]}(|x|)+2(1-|x|)^3\mathbb 1_{(1/2,1]}(|x|)$, and the Tukey-Hanning kernel $\omega(x)=\{1/2+\cos(\pi x)/2\}\mathbb 1_{[-1,1]}(x)$. The properties of kernel functions have been extensively studied in the time series literature (see e.g. Priestley (1981)). For a given kernel $\omega(\cdot)$ in $\mathcal K$, let $\varpi^2=\int\omega^2(x)\,dx$. Our asymptotic normality result on $\hat I$ requires
\[
\lim_{n\to\infty}b_n=0,\qquad \lim_{n\to\infty}nb_n^4=+\infty. \tag{9}
\]
We denote by $A\otimes B$ the Kronecker product of two matrices $A$ and $B$; $\mathrm{vec}\,A$ denotes the vector obtained by stacking the columns of $A$, and $\mathrm{vech}\,A$ the vector obtained by stacking the diagonal and subdiagonal elements of $A$ (see e.g. Harville (1997) for more details about these matrix operators). Let $D_m$ denote the $m^2\times m(m+1)/2$ duplication matrix, and let $D_m^+=(D_m'D_m)^{-1}D_m'$. Thus $D_m^+\mathrm{vec}(A)=\mathrm{vech}(A)$ when $A=A'$.

4.1 Strong ARMA

Our first result is stated in the following theorem.

Theorem 2. Let A1-A3 hold and let (9) be satisfied. If $E\epsilon_t^{4+\nu}<\infty$ and A4 holds, then $\hat I$ defined by (7), with $\omega(\cdot)\in\mathcal K$, is consistent and
\[
\sqrt{nb_n}\,\mathrm{vech}\left(\hat I-I\right)\overset{d}{\to}\mathcal N\left(0,\,2\varpi^2D_{p+q}^{+}(I\otimes I)D_{p+q}^{+\prime}\right).
\]

Next, we give a corresponding result for the estimator of the difference between the ACM under the two sets of assumptions, upon which the test of the next section will be based. Let
\[
\hat\Sigma^{(1)}=\hat\sigma^2\hat J^{-1},\qquad \hat\Sigma^{(2)}=\hat J^{-1}\hat I\hat J^{-1},\qquad \hat\sigma^2=2Q_n(\hat\theta_n)=\frac1n\sum_{t=1}^{n}e_t^2(\hat\theta_n).
\]

Theorem 3. Let the assumptions of Theorem 2 hold. Then
\[
\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\overset{d}{\to}\mathcal N\left(0,\,2\varpi^2D_{p+q}^{+}(\Sigma^{(1)}\otimes\Sigma^{(1)})D_{p+q}^{+\prime}\right):=\mathcal N(0,\Lambda).
\]

The asymptotic distribution of $\sqrt{nb_n}\,\mathrm{vech}(\hat\Sigma^{(2)}-\hat\Sigma^{(1)})$ is nondegenerate: the nonsingularity of $\Sigma^{(1)}$ results from Theorem 1, and the determinant of $\Lambda$ is, up to a constant, equal to the determinant of $\Sigma^{(1)}$ raised to the power $(p+q)+1$ (see Magnus and Neudecker, 1988, Theorem 3.14). Moreover, we have
\[
\Lambda^{-1}=(2\sigma^4\varpi^2)^{-1}D'_{p+q}(J\otimes J)D_{p+q}=(2\varpi^2)^{-1}D'_{p+q}(\Sigma^{(1)}\otimes\Sigma^{(1)})^{-1}D_{p+q}. \tag{10}
\]
Explicit expressions for the asymptotic covariance matrices appearing in Theorems 1 and 3 can be obtained for the MA(1) and AR(1) models.

Corollary 1. Let $X_t=\epsilon_t+\psi_1\epsilon_{t-1}$, where $\epsilon_t$ is iid $(0,\sigma^2)$, $\sigma^2>0$ and $|\psi_1|<1$. Then Theorem 1 holds with $\Sigma^{(1)}=\Sigma^{(2)}=1-\psi_1^2$, and
\[
\sqrt{nb_n}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\overset{d}{\to}\mathcal N\left(0,\,2\varpi^2(1-\psi_1^2)^2\right).
\]
The same results hold for the stationary solution of the AR(1) model $X_t+\phi_1X_{t-1}=\epsilon_t$ with $\phi_1=\psi_1$, under the same noise assumptions.
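As an illustration of estimator (7), the following minimal sketch (ours, not the paper's; it reuses `e`, `grads`, `J_hat` and `sigma2_hat` from the earlier sketches) computes $\hat I$ for the three named kernels and then the two ACM estimators. The bandwidth $b_n=n^{-1/5}$ is an arbitrary choice of ours satisfying (9).

```python
import numpy as np

def bartlett(x):
    return (1 - abs(x)) * (abs(x) <= 1)

def parzen(x):
    ax = abs(x)
    if ax <= 0.5:
        return 1 - 6 * ax ** 2 + 6 * ax ** 3
    return 2 * (1 - ax) ** 3 if ax <= 1 else 0.0

def tukey_hanning(x):
    return (0.5 + 0.5 * np.cos(np.pi * x)) * (abs(x) <= 1)

def I_hat(e, grads, b_n, kernel=bartlett):
    """HAC estimator (7): kernel-weighted sum of the Delta_hat_i,
    built from v_t = e_t * (d e_t / d theta)."""
    n, k = grads.shape
    v = e[:, None] * grads            # n x (p+q) array of the v_t
    I = v.T @ v / n                   # i = 0 term
    for i in range(1, n):
        w = kernel(i * b_n)
        if w == 0:                    # compact support: stop summing
            break
        D = v[:-i].T @ v[i:] / n      # Delta_hat_i
        I += w * (D + D.T)            # terms i and -i
    return I

b_n = len(x) ** (-1 / 5)              # b_n -> 0 and n*b_n^4 -> oo, as in (9)
Jinv = np.linalg.inv(J_hat)
Sigma1_hat = sigma2_hat * Jinv                     # strong-ARMA ACM, (4)
Sigma2_hat = Jinv @ I_hat(e, grads, b_n) @ Jinv    # robust ACM, (5)
```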
4.2 Weak ARMA

Next we state additional assumptions, on the kernel at zero and on $u_t(\theta_0)$, needed to obtain asymptotic results for the HAC estimator when the ARMA is not strong. Following Parzen (1957), define
\[
\omega^{(r)}=\lim_{x\to0}\frac{1-\omega(x)}{|x|^r}\qquad\text{for }r\in[0,+\infty).
\]
The largest exponent $r$ such that $\omega^{(r)}$ exists and is finite characterizes the smoothness of $\omega(\cdot)$ at zero. Let the matrix
\[
I^{(r)}=\sum_{i=-\infty}^{\infty}|i|^r\Delta_i(\theta_0)\qquad\text{for }r\in[0,+\infty),
\]
and let
\[
u_t=u_t(\theta_0)=\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}(\theta_0):=(u_t(1),\dots,u_t(p+q))'.
\]
Denote by $\kappa_{\ell_1,\dots,\ell_8}(0,j_1,\dots,j_7)$ the eighth-order cumulant of $(u_t(\ell_1),u_{t+j_1}(\ell_2),\dots,u_{t+j_7}(\ell_8))$ (see Brillinger, 1981, p. 19), where $\ell_1,\dots,\ell_8$ are positive integers no greater than $p+q$ and $j_1,\dots,j_7$ are integers. We consider the following assumptions.

A6. $E(\epsilon_t^{16})<\infty$ and, for all $\ell_1,\dots,\ell_8$,
\[
\sum_{j_1=-\infty}^{+\infty}\cdots\sum_{j_7=-\infty}^{+\infty}\left|\kappa_{\ell_1,\dots,\ell_8}(0,j_1,\dots,j_7)\right|<\infty.
\]

A7. For some $r_0\in(0,+\infty)$, $\lim_{n\to\infty}nb_n^{2r_0+1}=\gamma\in[0,+\infty)$, $\omega^{(r_0)}<\infty$ and $\|I^{(r_0)}\|<\infty$.

The following result is a consequence of Andrews (1991, Theorem 1) (and partly of Francq and Zakoïan (1998, 2000)).

Theorem 4. Let A1-A3 and A6-A7 hold with $\gamma>0$. Then for $\omega(\cdot)\in\mathcal K$,
\[
\lim_{n\to\infty}nb_n\,E\left\{\mathrm{vec}\left(\hat I-I\right)\right\}'\left\{\mathrm{vec}\left(\hat I-I\right)\right\}
=\left(\omega^{(r_0)}\right)^2\gamma\left\{\mathrm{vec}\,I^{(r_0)}\right\}'\left\{\mathrm{vec}\,I^{(r_0)}\right\}+2\varpi^2\,\mathrm{tr}\left\{D_{p+q}^{+}D_{p+q}^{+\prime}(I\otimes I)\right\}. \tag{11}
\]

Note that this result does not contradict Theorem 2: under A4, $I^{(r)}=0$ for $r>0$, so the first term on the right-hand side of (11) vanishes.
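Parzen's smoothness exponent can be checked numerically for the kernels of Section 4 (reusing the kernel helpers from the earlier sketch); this small illustration of ours confirms that the Bartlett kernel has $r_0=1$ while the Parzen and Tukey-Hanning kernels are quadratic at zero ($r_0=2$), which is what the smoothness conditions below turn on.

```python
def omega_r(kernel, r, x=1e-4):
    """Numerical version of Parzen's (1 - omega(x)) / |x|^r for small x."""
    return (1 - kernel(x)) / abs(x) ** r

print(omega_r(bartlett, 1))        # ~ 1.0      (r0 = 1)
print(omega_r(parzen, 2))          # ~ 6.0      (r0 = 2)
print(omega_r(tukey_hanning, 2))   # ~ pi^2/4   (r0 = 2)
```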
As noted by Andrews (1991), it seems likely that Assumption A6 could be replaced by mixing plus moment conditions, such as Assumption A5 or A5′ and the moment condition $E\epsilon_t^{16+\nu}<\infty$. We will do so in the next theorem. We introduce the following mixing and moment assumptions on the process $u=(u_t)$.

A5′′. A7 holds with $\gamma=0$, $\|u_t\|_{8+\nu}<\infty$ and $\sum_{h=0}^{\infty}h^r\{\alpha_u(h)\}^{\nu/(8+\nu)}<\infty$, for some $\nu>0$ and some $r$ such that $r\ge2$, $r>(3\nu-8)/(8+\nu)$ and $r\ge r_0$.

The extensions of Theorems 2 and 3 to weak ARMA models can be formulated as follows.

Theorem 5. Let A1-A3 and A5′′ hold. Assume there exists $\kappa<1/6$ such that $\liminf_{n\to\infty}nb_n^{1/\kappa}>0$. Then the convergence in distribution of Theorem 2 holds, and that of Theorem 3 becomes
\[
\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}-\Sigma^{(2)}+\Sigma^{(1)}\right)\overset{d}{\to}\mathcal N\left(0,\,2\varpi^2D_{p+q}^{+}(\Sigma^{(2)}\otimes\Sigma^{(2)})D_{p+q}^{+\prime}\right).
\]

To prove this theorem we searched for a CLT for a triangular array $(x_{n,t})$ (see Equation (41) below) such that $x_{n,t}$ is a measurable function of $u_{t-T_n},u_{t-T_n+1},\dots,u_{t+T_n}$, for some mixing process $(u_t)$, with $T_n=[a/b_n]\to\infty$. Denote by $\alpha_n(\cdot)$ the mixing coefficients of $(x_{n,t})_t$. It seems that the existing CLTs (see e.g. Withers (1981)) require conditions on $\sup_n\alpha_n(\cdot)$ or other conditions which are difficult to check in our framework. We therefore establish the following lemma, which can be viewed as a direct extension of the CLT given by Herrndorf (1984) to the case of a nonstationary sequence $(x_t)$, and may have its own interest.

Lemma 1. Let $(x_{n,t})_{n\ge1,1\le t\le n}$ be a triangular array of centered real-valued random variables. For each $n\ge2$, let $\alpha_n(h)$, $h=1,\dots,n-1$, be the strong mixing coefficients of $x_{n,1},\dots,x_{n,n}$, defined by
\[
\alpha_n(h)=\sup_{1\le t\le n-h}\ \sup_{A\in\mathcal A_{n,t},\,B\in\mathcal B_{n,t+h}}|P(A\cap B)-P(A)P(B)|,
\]
where $\mathcal A_{n,t}=\sigma(x_{n,u}:1\le u\le t)$ and $\mathcal B_{n,t}=\sigma(x_{n,u}:t\le u\le n)$. By convention, we set $\alpha_n(h)=1/4$ for $h\le0$ and $\alpha_n(h)=0$ for $h\ge n$. Put $S_n=\sum_{t=1}^{n}x_{n,t}$ and assume that
\[
\sup_{n\ge1}\sup_{1\le t\le n}\|x_{n,t}\|_{2+\nu^*}<\infty\quad\text{for some }\nu^*\in(0,\infty], \tag{12}
\]
\[
\lim_{n\to\infty}n^{-1}\mathrm{Var}\,S_n=\sigma^2>0, \tag{13}
\]
there exists a sequence of integers $(T_n)$ such that
\[
T_n=O(n^\kappa)\quad\text{for some }\kappa\in[0,\,\nu^*/\{4(1+\nu^*)\}), \tag{14}
\]
and a sequence $\{\alpha(h)\}_{h\ge1}$ such that
\[
\alpha_n(h)\le\alpha(h-T_n)\quad\text{for all }h>T_n, \tag{15}
\]
\[
\sum_{h=1}^{\infty}h^{r^*}\{\alpha(h)\}^{\nu^*/(2+\nu^*)}<\infty\quad\text{for some }r^*>\frac{2\kappa(1+\nu^*)}{\nu^*-2\kappa(1+\nu^*)}. \tag{16}
\]
Then $n^{-1/2}S_n\overset{d}{\to}\mathcal N(0,\sigma^2)$.

Theorem 5 applies to kernels that are very smooth at zero: indeed, the conditions $\lim_{n\to\infty}nb_n^{2r_0+1}=0$ and $\liminf_{n\to\infty}nb_n^6\neq0$ imply $r_0>5/2$. The following theorem shows that this smoothness condition can be weakened when moment assumptions are added. The proof is similar to that of Theorem 5 and is therefore skipped.

Theorem 6. Let A1-A3 and A5′′ hold with $\nu=\infty$. Assume there exists $\kappa<1/4$ such that $\liminf_{n\to\infty}nb_n^{1/\kappa}>0$. Then the convergences in distribution of Theorem 5 hold.

The results of this section will now be used to derive the asymptotic level of our test statistic.

5 Testing adequacy of the standard asymptotic distribution

The results of Section 3 show that the asymptotic variances of the LSE under strong and weak assumptions can be dramatically different. Standard statistical routines estimate the asymptotic variance corresponding to strong ARMA models, and it is important to know whether the resulting tests (or confidence intervals) are reliable. The aim of the present section is therefore to test the hypotheses presented in the introduction, which we recall for convenience:
\[
H_0:\ \Sigma^{(1)}=\Sigma^{(2)}\quad\text{against}\quad H_1:\ \Sigma^{(1)}\neq\Sigma^{(2)}.
\]
It should be clear that under both hypotheses the ARMA model is well specified; in particular, the case of serial correlation of $(\epsilon_t)$ is not considered. A statistic derived from Theorem 3 is
\[
\Upsilon_n=nb_n\left\{\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\right\}'\hat\Lambda^{-1}\left\{\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\right\}, \tag{17}
\]
where $\hat\Lambda^{-1}$ is any consistent estimator of $\Lambda^{-1}$. In view of (10) we can take
\[
\hat\Lambda^{-1}=(2\hat\sigma^4\varpi^2)^{-1}D'_{p+q}(\hat J\otimes\hat J)D_{p+q},
\]
which does not require any matrix inversion. Then we have
\[
\Upsilon_n=nb_n(2\hat\sigma^4\varpi^2)^{-1}\left\{\mathrm{vec}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\right\}'(\hat J\otimes\hat J)\,\mathrm{vec}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right).
\]
But since
\[
\left\{\mathrm{vec}\,\hat\Sigma^{(1)}\right\}'(\hat J\otimes\hat J)\,\mathrm{vec}\,\hat\Sigma^{(1)}=\hat\sigma^4\left\{\mathrm{vec}\,\hat J^{-1}\right\}'\mathrm{vec}\,\hat J=\hat\sigma^4\,\mathrm{tr}(\hat J^{-1}\hat J)=\hat\sigma^4(p+q),
\]
\[
\left\{\mathrm{vec}\,\hat\Sigma^{(1)}\right\}'(\hat J\otimes\hat J)\,\mathrm{vec}\,\hat\Sigma^{(2)}=\hat\sigma^2\left\{\mathrm{vec}\,\hat J^{-1}\right\}'\mathrm{vec}\,\hat I=\hat\sigma^2\,\mathrm{tr}(\hat J^{-1}\hat I),
\]
\[
\left\{\mathrm{vec}\,\hat\Sigma^{(2)}\right\}'(\hat J\otimes\hat J)\,\mathrm{vec}\,\hat\Sigma^{(2)}=\left\{\mathrm{vec}\,\hat J^{-1}\hat I\hat J^{-1}\right\}'\mathrm{vec}\,\hat I=\mathrm{tr}\left\{(\hat J^{-1}\hat I)^2\right\},
\]
we get
\[
\Upsilon_n=nb_n(2\hat\sigma^4\varpi^2)^{-1}\left[\hat\sigma^4(p+q)-2\hat\sigma^2\,\mathrm{tr}(\hat J^{-1}\hat I)+\mathrm{tr}\left\{(\hat J^{-1}\hat I)^2\right\}\right],
\]
and therefore, denoting by $I_{p+q}$ the $(p+q)\times(p+q)$ identity matrix, we obtain
\[
\Upsilon_n=\frac{nb_n}{2\varpi^2}\,\mathrm{tr}\left(I_{p+q}-\frac{1}{\hat\sigma^2}\hat J^{-1}\hat I\right)^2. \tag{18}
\]
Note that when the ARMA is strong, $\hat\sigma^{-2}\hat J^{-1}\hat I$ converges to the identity matrix. The next result, which is a straightforward consequence of Theorem 3 and (17), provides a critical region of asymptotic level $\alpha\in(0,1)$.

Theorem 7. Let the assumptions of Theorem 2, in particular $H_0$, hold. Then
\[
\lim_{n\to\infty}P\left\{\Upsilon_n>\chi^2_{(p+q)(p+q+1)/2,\,1-\alpha}\right\}=\alpha,
\]
where $\chi^2_{r,\alpha}$ denotes the $\alpha$-quantile of the $\chi^2_r$ distribution.

The following theorem gives conditions for the consistency of our test.

Theorem 8. Assume that A1-A3 and A5 (or A5′) hold and that $E|X_t|^{4+2\nu}<\infty$. Let $\omega(\cdot)\in\mathcal K$, and let $(b_n)$ satisfy $\lim_{n\to\infty}b_n=0$ and $\lim_{n\to\infty}nb_n^{4+10/\nu}=+\infty$. Then, under $H_1$, we have
\[
\lim_{n\to\infty}P\left\{\Upsilon_n>\chi^2_{(p+q)(p+q+1)/2,\,1-\alpha}\right\}=1
\]
for any $\alpha\in(0,1)$.
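The trace form (18) makes the statistic easy to compute; the following sketch of ours (reusing `I_hat`, `e`, `grads` and `b_n` from the earlier sketches) implements it for the Bartlett kernel, for which $\varpi^2=\int\omega^2=2/3$.

```python
import numpy as np
from scipy.stats import chi2

def upsilon_test(e, grads, b_n, alpha=0.05, varpi2=2 / 3):
    """Statistic (18) and the decision of Theorem 7.
    varpi2 is the integral of omega^2 (2/3 for the Bartlett kernel,
    the default kernel in I_hat)."""
    n, k = grads.shape
    sigma2 = np.mean(e ** 2)
    J = grads.T @ grads / n
    M = np.eye(k) - np.linalg.solve(J, I_hat(e, grads, b_n)) / sigma2
    stat = n * b_n / (2 * varpi2) * np.trace(M @ M)   # tr(M^2)
    df = k * (k + 1) // 2                             # (p+q)(p+q+1)/2
    return stat, stat > chi2.ppf(1 - alpha, df)

stat, reject = upsilon_test(e, grads, b_n)
print(stat, reject)
```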
The test procedure is quite simple. For a given time series it consists in: (i) fitting an ARMA($p,q$) model (after an identification step for the orders $p$ and $q$, which is not the subject of this paper); (ii) estimating the matrices $I$ and $J$ by (7) and (6); (iii) computing the statistic $\Upsilon_n$ by (18) and rejecting $H_0$ when $\Upsilon_n>\chi^2_{(p+q)(p+q+1)/2,\,1-\alpha}$. The choice of the bandwidth and kernel used to define the estimator of the matrix $I$ will be discussed in the next section.

Remarks

(a) To our knowledge, this is the first test designed for the purpose of distinguishing between weak and strong ARMA models. In other words, this statistic makes it possible to test whether the error term of an ARMA model is independent or simply uncorrelated.

(b) Our test is related to other tests recently introduced in the time series literature. Some of them are goodness-of-fit tests (which ours is not, since under both $H_0$ and $H_1$ the ARMA model (2) is well specified). Let us mention Hong (1996), who proposes a test, based on a kernel spectral density estimator, for uncorrelatedness of the residuals of AR models with exogenous variables; its asymptotic distribution is established under the null hypothesis of independent errors, and it is consistent against serial correlation of unknown form. In the framework of ARMA models, Francq, Roy and Zakoïan (2004) propose a modification of the standard portmanteau test for serial correlation when the errors are only assumed to be uncorrelated (not independent) under the null hypothesis. The test of the present paper can be viewed as complementary to those goodness-of-fit tests. Another approach, very closely related to ours, is taken by Hong (1999), who proposes a nonparametric test for serial independence based on a generalization of the spectral density. Hong's test has power against various types of pairwise dependence, including cases of absence of correlation, and is designed to detect any departure from independence. Our test has a more limited scope, since it is devoted to ARMA models; moreover, it is aimed at one consequence of independence, the one that matters for the reliability of the standard routines. Another difference between the two approaches is that Hong's test is designed to detect departures from independence of the observed process. It cannot be straightforwardly applied to our framework because, even under the null hypothesis of independence of the noise, the residuals are dependent.

6 Choice of the bandwidth and kernel, and finite sample performance of the test

To make the test procedure fully operational, it is necessary to specify how to choose the kernel and bandwidth parameters. To this aim we first consider two standard asymptotic local efficiency criteria, derived from the Bahadur and Pitman approaches respectively. The reader is referred to Van der Vaart (1998) for details concerning the asymptotic local efficiency of tests.

6.1 Bahadur's approach

In view of (18), under the assumptions of Theorem 8,
\[
\frac{1}{nb_n}\Upsilon_n\to\frac{1}{2\varpi^2}\,\mathrm{tr}\left(I_{p+q}-\frac{1}{\sigma^2}J^{-1}I\right)^2 \tag{19}
\]
in probability as $n\to\infty$. Let $\Upsilon_n^{(1)}$ and $\Upsilon_n^{(2)}$ be two test statistics of the form $\Upsilon_n$, with respective kernels $\omega_1$ and $\omega_2$. The p-value of the tests can be approximated by $1-F_{\chi^2_{(p+q)(p+q+1)/2}}(\Upsilon_n^{(i)})$, where $F_{\chi^2_k}$ denotes the cumulative distribution function of the $\chi^2_k$ distribution.
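In practice this approximate p-value is read directly off the chi-square distribution; a one-line sketch of ours, where the orders and the observed statistic are placeholders:

```python
from scipy.stats import chi2

p, q, stat = 1, 1, 7.3                 # hypothetical orders and Upsilon_n
df = (p + q) * (p + q + 1) // 2
print(chi2.sf(stat, df))               # approximate p-value of the test
```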
Assume we are under the alternative. Let $\{n_1(n)\}_n$ and $\{n_2(n)\}_n$ be two sequences of integers tending to infinity such that
\[
\lim_{n\to\infty}\frac{\log\left\{1-F_{\chi^2_{(p+q)(p+q+1)/2}}\left(\Upsilon^{(1)}_{n_1(n)}\right)\right\}}{\log\left\{1-F_{\chi^2_{(p+q)(p+q+1)/2}}\left(\Upsilon^{(2)}_{n_2(n)}\right)\right\}}=1\quad\text{a.s.}
\]
One can say that, for large $n$, the two tests require respectively $n_1(n)$ and $n_2(n)$ observations to reach the same (log) p-value. The Bahadur asymptotic relative efficiency (ARE) of $\Upsilon_n^{(1)}$ with respect to $\Upsilon_n^{(2)}$ is defined by $\mathrm{ARE}(\Upsilon_n^{(1)},\Upsilon_n^{(2)})=\lim_{n\to\infty}n_2(n)/n_1(n)$. To make the comparison meaningful we use the same bandwidth for the two tests. Assume that $b_n=cn^{-\nu}$, $c>0$, $\nu\in(0,1)$. Using $\log\{1-F_{\chi^2_k}(x)\}\sim-x/2$ as $x\to\infty$, we obtain
\[
\mathrm{ARE}(\Upsilon_n^{(1)},\Upsilon_n^{(2)})=\left(\varpi_2^2/\varpi_1^2\right)^{1/(1-\nu)},\qquad\text{where }\varpi_i^2=\int\omega_i^2(x)\,dx.
\]
Thus, one can consider that the first test is asymptotically superior to the second if $\varpi_1^2<\varpi_2^2$. In this sense, it is easy to see that the tests based on the truncated, Bartlett, Tukey-Hanning and Parzen kernels are ranked in increasing order of asymptotic efficiency. A similar argument shows that, when the kernel is fixed and $\nu$ varies, the Bahadur efficiency decreases as $\nu$ increases. Thus, in the Bahadur sense, there is no optimal choice of $b_n$: the slower $b_n$ tends to zero, the asymptotically more efficient the tests are. Unfortunately, this result gives no indication on how to choose the bandwidth parameter for finite samples. If $b_n$ tends to zero too slowly and/or if $\varpi_i^2$ is too small, the finite sample bias of $\hat I$ is likely to be important, and the rate of convergence in (19) is likely to be very slow.

6.2 Pitman's approach

Another popular approach to comparing the asymptotic local powers of tests is that of Pitman. Consider local alternatives of the form $H_{1n}:\ I=\sigma^2J+\Delta/\sqrt{nb_n}$, where $\Delta\neq0$ is a symmetric positive definite matrix. Alternatively, one could formulate these alternatives as $H_{1n}:\ \Sigma^{(2)}=\Sigma^{(1)}+J^{-1}\Delta J^{-1}/\sqrt{nb_n}$. Under standard assumptions, $\hat\Sigma^{(2)}-\hat\Sigma^{(1)}$ is a regular estimator of $\Sigma^{(2)}-\Sigma^{(1)}$ (see Van der Vaart (1998), Section 8.5). Therefore, in view of Theorem 5, under $H_{1n}$,
\[
\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\overset{d}{\to}\mathcal N\left(\mathrm{vech}\,J^{-1}\Delta J^{-1},\,\Lambda\right).
\]
It follows that, under $H_{1n}$,
\[
\Upsilon_n\overset{d}{\to}\chi^2_{(p+q)(p+q+1)/2}\left(\left\{\mathrm{vech}\,J^{-1}\Delta J^{-1}\right\}'\Lambda^{-1}\left\{\mathrm{vech}\,J^{-1}\Delta J^{-1}\right\}\right),
\]
where $\chi^2_k(\delta)$ denotes the noncentral $\chi^2_k$ distribution with noncentrality parameter $\delta$. The Pitman asymptotic local power therefore increases with the noncentrality parameter. In view of (10), we draw the same conclusion as with the Bahadur approach: tests with small $\varpi_i^2$ are preferred. Interestingly, the asymptotic distribution of $\Upsilon_n$ does not depend on the asymptotic behaviour of $b_n$. However, the slower $b_n$ tends to zero, the faster $H_{1n}$ tends to $H_0$. It is therefore preferable to choose $b_n$ as large as possible, which was also our conclusion with the previous approach.
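The asymptotic local power under $H_{1n}$ can be evaluated directly from the noncentral chi-square distribution; a minimal sketch of ours, where the noncentrality values are placeholders:

```python
from scipy.stats import chi2, ncx2

p, q, alpha = 1, 1, 0.05
df = (p + q) * (p + q + 1) // 2
crit = chi2.ppf(1 - alpha, df)

# Asymptotic local power P{chi2_df(delta) > crit} for a grid of
# noncentrality parameters delta = (vech J^-1 D J^-1)' Lambda^-1 (...).
for delta in (0.5, 1.0, 2.0, 5.0):
    print(delta, ncx2.sf(crit, df, delta))
```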
6.3 Automatic bandwidth estimators

Since the approaches of the previous sections do not allow one to choose $b_n$ in practice, we now turn to an automatic bandwidth method. Andrews (1991) obtained asymptotically optimal data-dependent automatic bandwidth parameters for HAC estimators; we apply his results to our framework. It should be noted, however, that optimal HAC estimators do not necessarily provide asymptotically optimal test statistics. This issue is left for further investigation.

Andrews showed that, under the assumption $\|I^{(r)}\|>0$ and other regularity assumptions, the asymptotically optimal bandwidth parameter (leading to an estimator with the best bias-variance trade-off) is given by
\[
b_n^*=c^{-1}\{\alpha(r)\,n\}^{-1/(2r+1)},\qquad
\alpha(r)=\frac{\sum_{i=1}^{p+q}\left(e_i'I^{(r)}e_i\right)^2}{\sum_{i=1}^{p+q}\left(e_i'Ie_i\right)^2},
\]
where $(c,r)$ is equal to $(1.1447,1)$ for the Bartlett kernel, to $(2.6614,2)$ for the Parzen kernel, and to $(1.7462,2)$ for the Tukey-Hanning kernel, and where $e_i$ denotes the $i$-th vector of the canonical basis of $\mathbb R^{p+q}$. Approximating, for $j=1,\dots,p+q$, the dynamics of $\epsilon_t\,\partial\epsilon_t(\theta_0)/\partial\theta_j$ by a simple AR(1) model with autoregressive parameter $\hat a_j$ and variance parameter $\hat\sigma_j^2$, Andrews obtained the data-dependent estimate $\hat b_n^*=c^{-1}\{\hat\alpha(r)\,n\}^{-1/(2r+1)}$ of $b_n^*$, by setting
\[
\hat\alpha(1)=\frac{\sum_{j=1}^{p+q}4\hat a_j^2\hat\sigma_j^4(1-\hat a_j)^{-6}(1+\hat a_j)^{-2}}{\sum_{j=1}^{p+q}\hat\sigma_j^4(1-\hat a_j)^{-4}},\qquad
\hat\alpha(2)=\frac{\sum_{j=1}^{p+q}4\hat a_j^2\hat\sigma_j^4(1-\hat a_j)^{-8}}{\sum_{j=1}^{p+q}\hat\sigma_j^4(1-\hat a_j)^{-4}}.
\]
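A sketch of this AR(1)-based automatic bandwidth, under our reading of the formulas above (the helper names are ours, and `e`, `grads` come from the earlier sketches):

```python
import numpy as np

def andrews_bandwidth(e, grads, kernel="bartlett"):
    """Data-dependent b_n*: fit an AR(1) to each component of
    v_t = e_t * (d e_t / d theta_j), then plug a_j, sigma_j^2 into
    alpha_hat(1) or alpha_hat(2)."""
    c, r = {"bartlett": (1.1447, 1),
            "parzen": (2.6614, 2),
            "tukey-hanning": (1.7462, 2)}[kernel]
    n, k = grads.shape
    v = e[:, None] * grads
    num = den = 0.0
    for j in range(k):
        a = np.dot(v[1:, j], v[:-1, j]) / np.dot(v[:-1, j], v[:-1, j])
        s2 = np.mean((v[1:, j] - a * v[:-1, j]) ** 2)  # innovation variance
        den += s2 ** 2 / (1 - a) ** 4
        if r == 1:
            num += 4 * a ** 2 * s2 ** 2 / ((1 - a) ** 6 * (1 + a) ** 2)
        else:
            num += 4 * a ** 2 * s2 ** 2 / (1 - a) ** 8
    return (num / den * n) ** (-1 / (2 * r + 1)) / c

print(andrews_bandwidth(e, grads))
```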
6.4 Finite sample performance

To assess the finite sample performance of the tests proposed in this paper, we first simulated 1,000 replications of several strong ARMA models of size $n=200$, $n=500$ and $n=800$. We consider tests with nominal level $\alpha=5\%$. We use the estimated optimal bandwidth given by Andrews, as described previously, and three different kernels. The relative rejection frequencies are given in Table 1. All the empirical sizes are less than the nominal 5% level. It seems that the tests are slightly conservative and that, in terms of control of the type I error, the performance of the three kernels is very similar.

Now we turn to experiments under the alternative of non-independent errors. Five models were considered: i) an AR(1) with GARCH(1,1) errors; ii) the square of a GARCH(1,1); iii) an ARMA(1,1) with a Markov-switching white noise; iv) the first component of a strong bivariate MA(1); v) an AR(1) with a chaotic noise. Precise specifications are displayed in Table 2. All these examples have been shown to provide ARMA models with non-independent errors: an ARMA(1,1) for models ii) and iii), an AR(1) for models i) and v), and an MA(1) for model iv). It should be emphasized that we only need to estimate the ARMA representation, not the DGP. Andrews (1991) showed that, for HAC estimation, the Bartlett kernel is less efficient than the two other kernels. In these examples, the power of the test does not appear to be very sensitive to the kernel choice, and no particular kernel is the most satisfactory in all cases. For each sample size, the best performance is obtained for the Markov-switching model: for this model, the tests almost always take the right decision, at least when $n\ge500$. Slower convergence to 1 is obtained for the powers in models i), iv) and v). The very slow power convergence in model ii) can be explained as follows. The weak ARMA(1,1) representation of $\epsilon_t^2$ is $\epsilon_t^2-0.97\epsilon_{t-1}^2=1+u_t-0.85u_{t-1}$ for some white noise $u_t$: the AR and MA parts have close roots, making statistical inference difficult, and, as continuous functions of the ARMA estimators, the estimators of $I$ and $J$ inherit this poor accuracy. Another explanation, which also holds for the relatively poor performance in model i), is that, in models based on GARCH errors, the noise is a martingale difference. For this reason, departure from the strong assumption can be more difficult to detect than in cases where the noise is only uncorrelated (as in models iii)-v)).

Table 1: Size (in % of relative rejection frequencies) of the $\Upsilon_n$-tests with estimated optimal bandwidth. The nominal significance level is $\alpha=5\%$. The number of replications is $N=1{,}000$.

  Model                    n     Bartlett   Parzen   Tukey-Hanning
  Strong AR(1) [1]         200   2.3        1.5      2.2
                           500   2.0        3.8      2.3
                           800   2.9        2.1      2.6
  Strong MA(1) [2]         200   1.3        3.1      1.3
                           500   1.9        2.7      2.1
                           800   2.2        2.5      3.0
  Strong ARMA(1,1) [3]     200   1.9        3.0      3.3
                           500   3.2        4.8      3.8
                           800   3.6        4.4      4.0

1: $X_t=0.5X_{t-1}+\epsilon_t$, $\epsilon_t$ iid Student with $\nu=5$ degrees of freedom
2: $X_t=\epsilon_t+0.7\epsilon_{t-1}$, $\epsilon_t$ iid with centered exponential density
3: $X_t-0.5X_{t-1}=\epsilon_t+0.7\epsilon_{t-1}$, $\epsilon_t$ iid $\mathcal N(0,1)$

Table 2: Power (in % of relative rejection frequencies) of the $\Upsilon_n$-tests with estimated optimal bandwidth. The nominal significance level is $\alpha=5\%$. The number of replications is $N=1{,}000$.

  Model                        n     Bartlett   Parzen   Tukey-Hanning
  AR(1)-GARCH(1,1) [4]         200   14.1       18.9     19.7
                               500   53.5       50.0     51.0
                               800   74.6       72.2     70.4
  Square of a GARCH(1,1) [5]   200    9.4        9.4      9.7
                               500   23.6       24.9     26.8
                               800   38.8       39.0     36.1
  MS-ARMA(1,1) [6]             200   75.8       81.1     79.8
                               500   98.4       98.4     98.3
                               800   99.9       99.9     99.9
  MA(1) marginal [7]           200   32.6       30.5     36.2
                               500   70.4       73.1     78.8
                               800   86.0       89.4     94.1
  AR(1) [8]                    200   22.5       25.8     21.0
                               500   42.1       46.2     48.9
                               800   53.9       63.3     62.2

4: $X_t=0.5X_{t-1}+\epsilon_t$, $\epsilon_t=\sqrt{h_t}\,\eta_t$, $h_t=1+0.12\epsilon_{t-1}^2+0.85h_{t-1}$, $\eta_t$ iid $\mathcal N(0,1)$
5: $X_t=\epsilon_t^2$, where $\epsilon_t$ is as in 4
6: $X_t-0.5X_{t-1}=\epsilon_t+0.7\epsilon_{t-1}$, $\epsilon_t=\eta_t+(1-2\Delta_t)\eta_{t-1}$, where $(\Delta_t)$ is a Markov chain with state space $\{0,1\}$ and transition probabilities $P(\Delta_t=1\mid\Delta_{t-1}=0)=P(\Delta_t=0\mid\Delta_{t-1}=1)=0.01$, $\eta_t$ iid $\mathcal N(0,1)$
7: $X_{1t}=\epsilon_{1t}+0.8\epsilon_{1,t-1}-0.9\epsilon_{2,t-1}$, where $\epsilon_{1t}=\eta_{1t}^2-1$, $\epsilon_{2t}=\eta_{2t}^2-1$, and $(\eta_{1t},\eta_{2t})'\sim\mathcal N\left(0,\begin{pmatrix}1&0.9\\0.9&1\end{pmatrix}\right)$
8: $X_t=0.5X_{t-1}+\epsilon_t$, where $(\epsilon_t)$ is the noise defined by (1) with $\eta_t\sim\mathcal N(-0.5,0.05^2)$
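For concreteness, the following sketch of ours simulates the Markov-switching noise of model 6 and checks empirically the property that drives the test: the noise is uncorrelated at all lags, yet products of consecutive values are strongly autocorrelated, so it is not independent.

```python
import numpy as np

rng = np.random.default_rng(3)

# eps_t = eta_t + (1 - 2*Delta_t)*eta_{t-1}, Delta_t a {0,1} Markov chain
# with switching probability 0.01 (model 6 of Table 2).
n = 100_000
eta = rng.standard_normal(n + 1)
delta = np.zeros(n, dtype=int)
for t in range(1, n):
    delta[t] = 1 - delta[t - 1] if rng.random() < 0.01 else delta[t - 1]
eps = eta[1:] + (1 - 2 * delta) * eta[:-1]

for h in (1, 2):                                      # white noise: ~ 0
    print(h, np.corrcoef(eps[:-h], eps[h:])[0, 1])
z = eps[1:] * eps[:-1]                                # but not independent:
print(np.corrcoef(z[:-1], z[1:])[0, 1])               # clearly nonzero
```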
7 Proofs

7.1 Additional notations and scheme of proof for Theorem 2

Throughout this section, the letter $K$ (resp. $\rho$) denotes positive constants (resp. constants in $(0,1)$) whose values are unimportant and may vary. We use the norm $\|A\|=\sum|a_{ij}|$ for any matrix $A=(a_{ij})$. Let
\[
u_t(\theta)=\epsilon_t(\theta)\frac{\partial}{\partial\theta}\epsilon_t(\theta),\quad u_t=u_t(\theta_0),\qquad
v_t(\theta)=e_t(\theta)\frac{\partial}{\partial\theta}e_t(\theta),\quad v_t=v_t(\theta_0).
\]
Hence, for $0\le i<n$ and for $i\ge0$, respectively,
\[
\hat\Delta_i(\theta)=\hat\Delta'_{-i}(\theta)=\frac1n\sum_{t=1}^{n-i}v_t(\theta)v'_{t+i}(\theta),\qquad
\Delta_i(\theta)=\Delta'_{-i}(\theta)=E\{u_t(\theta)u'_{t+i}(\theta)\}.
\]
We set, for $0\le i<n$,
\[
\hat\Delta_i^u(\theta)=\hat\Delta_{-i}^{u\prime}(\theta)=\frac1n\sum_{t=1}^{n-i}u_t(\theta)u'_{t+i}(\theta),\qquad
\hat I_n^u(\theta)=\sum_{i=-[a/b_n]}^{[a/b_n]}\omega(ib_n)\hat\Delta_i^u(\theta),
\]
assuming without loss of generality that $a/b_n\le n$. Similarly, we write $\hat\Delta_i(\theta)=\hat\Delta_i^v(\theta)$ and $\hat I_n(\theta)=\hat I_n^v(\theta)$. In the notation of all these quantities, the parameter $\theta$ is omitted when it is equal to $\theta_0$.

It will also be convenient to modify the number of terms taken into account in the definition of $\hat\Delta_i^u(\theta_0)=\hat\Delta_i^u$. Define, for $i\in\mathbb N=\{0,1,\dots\}$,
\[
\Delta_i^u=(\Delta_{-i}^u)'=\frac1n\sum_{t=1}^{n}u_tu'_{t+i},\qquad
I_n^u=\sum_{i=-[a/b_n]}^{[a/b_n]}\omega(ib_n)\Delta_i^u,
\]
and the centered matrix
\[
\bar I_n^u=I_n^u-\Delta_0^u=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\Delta_i^u=\sum_{0<i\le[a/b_n]}\omega(ib_n)\{\Delta_i^u+(\Delta_i^u)'\}.
\]
It will be shown that $\sqrt{nb_n}\,(\hat I-I)=\sqrt{nb_n}\,(\hat I_n^v(\hat\theta_n)-I)$ and $\sqrt{nb_n}\,\bar I_n^u$ have the same asymptotic distribution (see Lemma 9 below).

Define the sequences $(c_{k,\ell})_k$ by $\partial\epsilon_t/\partial\theta_\ell=\sum_{k=1}^{\infty}c_{k,\ell}\epsilon_{t-k}$. Since a central limit theorem for $r$-dependent sequences will be used, it is useful to consider the following truncated variables. For any positive integer $r$, let
\[
{}_r\frac{\partial}{\partial\theta_\ell}\epsilon_t=\sum_{k=1}^{r}c_{k,\ell}\epsilon_{t-k},\qquad
{}_ru_t=\epsilon_t\,{}_r\frac{\partial}{\partial\theta}\epsilon_t,\qquad
{}^ru_t=u_t-{}_ru_t, \tag{20}
\]
and, for $m<k$,
\[
{}_r\Delta^u_{i,(m,k)}={}_r\Delta^{u\prime}_{-i,(m,k)}=\frac{1}{k-m+1}\sum_{t=m}^{k}{}_ru_t\,{}_ru'_{t+i}\quad(i\in\mathbb N),\qquad
{}_r\bar I^u_{n,(m,k)}=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\,{}_r\Delta^u_{i,(m,k)}.
\]
When $m=1$ and $k=n$ we write ${}_r\Delta^u_{i,(m,k)}={}_r\Delta^u_i$ and ${}_r\bar I^u_{n,(m,k)}={}_r\bar I^u_n$. By a standard argument, we will obtain the limit distribution of $\sqrt{nb_n}\,\bar I^u_n$ by taking the limit as $r\to\infty$ of the asymptotic distribution, as $n\to\infty$, of $\sqrt{nb_n}\,{}_r\bar I^u_n$. We also denote
\[
{}_rJ=E\left({}_r\frac{\partial}{\partial\theta}\epsilon_t\right)\left({}_r\frac{\partial}{\partial\theta'}\epsilon_t\right).
\]
Now note that ${}_r\bar I^u_n=\sum_{t=1}^{n}Z_{n,t}$, where
\[
Z_{n,t}=\frac1n\sum_{0<i\le[a/b_n]}\omega(ib_n)\left({}_ru_t\,{}_ru'_{t+i}+{}_ru_{t+i}\,{}_ru'_t\right).
\]
The process $(Z_{n,t})_t$ is $m_n$-dependent, with $m_n=[ab_n^{-1}]+r$. Next we split the sum $\sum_{t=1}^{n}Z_{n,t}$ into alternate blocks of length $k_n-m_n$ and $m_n$ (with a remaining block of size $n-p_nk_n+m_n$):
\[
{}_r\bar I^u_n={}_rS_n+\sum_{\ell=0}^{p_n-2}\left(Z_{n,(\ell+1)k_n-m_n+1}+\cdots+Z_{n,(\ell+1)k_n}\right)+Z_{n,p_nk_n-m_n+1}+\cdots+Z_{n,n}, \tag{21}
\]
\[
{}_rS_n=\sum_{\ell=0}^{p_n-1}\left(Z_{n,\ell k_n+1}+\cdots+Z_{n,(\ell+1)k_n-m_n}\right)
=\frac{k_n-m_n}{n}\sum_{\ell=0}^{p_n-1}{}_r\bar I^u_{n,(\ell k_n+1,(\ell+1)k_n-m_n)},
\]
where $k_n$ is an integer in $(m_n,n)$ to be specified later, and $p_n=[n/k_n]$ is the integer part of $n/k_n$ (assuming that $n$ is sufficiently large, so that $m_n\le n$). It will be shown that, when $m_n=o(k_n-m_n)$ and $p_n\to\infty$, the asymptotic distributions of $\sqrt{nb_n}\,{}_r\bar I^u_n$ and $\sqrt{nb_n}\,{}_rS_n$ are identical (see part (b) of Lemma 13).

To avoid moment assumptions of excessive order, we then introduce variables that are truncated in level. For any positive constant $\kappa$ and for $m<k$, let
\[
\epsilon_t^\kappa=\epsilon_t\mathbb 1_{\{|\epsilon_t|\le\kappa\}}-E\epsilon_t\mathbb 1_{\{|\epsilon_t|\le\kappa\}},\qquad
{}_r\frac{\partial}{\partial\theta_\ell}\epsilon_t^\kappa=\sum_{k=1}^{r}c_{k,\ell}\epsilon^\kappa_{t-k},\qquad
{}_ru^\kappa_t=\epsilon^\kappa_t\,{}_r\frac{\partial}{\partial\theta}\epsilon^\kappa_t,\qquad
{}^\kappa_ru_t={}_ru_t-{}_ru^\kappa_t;
\]
let ${}_r\Delta^{u,\kappa}_{i,(m,k)}$ (resp. ${}_r\bar I^{u,\kappa}_{n,(m,k)}$) be the matrix obtained by replacing the variables ${}_ru_t\,{}_ru'_{t+i}$ by ${}_ru^\kappa_t\,{}_ru^{\kappa\prime}_{t+i}$ in ${}_r\Delta^u_{i,(m,k)}$ (resp. ${}_r\bar I^u_{n,(m,k)}$), and let
\[
{}_rS^\kappa_n=\frac{k_n-m_n}{n}\sum_{\ell=0}^{p_n-1}{}_r\bar I^{u,\kappa}_{n,(\ell k_n+1,\ell k_n+k_n-m_n)}.
\]
We will show that, when $\kappa_n\to\infty$, the asymptotic distributions of $\sqrt{nb_n}\,{}_rS_n$ and $\sqrt{nb_n}\,{}_rS^{\kappa_n}_n$ are identical (see part (c) of Lemma 13). The Lindeberg central limit theorem will then be used to show the asymptotic normality of $\sqrt{nb_n}\,{}_rS^{\kappa_n}_n$.

7.2 Lemmas and proofs for Theorem 2

In all subsequent lemmas, the assumptions of Theorem 2 are supposed to be satisfied. The first four lemmas are concerned with fourth-order properties of the process $(u_t)$.

Lemma 2. Let $k\in\mathbb N^*=\{1,2,\dots\}$, $t_1,t_2,\dots,t_k\in\mathbb Z$ and $i_1,i_2,\dots,i_k\in\mathbb N^*$. If the indices $t_1,t_2,\dots,t_k$ are all distinct and the indices $i_1,i_2,\dots,i_k$ are all less than or equal to 4, then
\[
E\left|\prod_{j=1}^{k}\epsilon^{i_j}_{t_j}\right|\le M_k<\infty.
\]

Proof. Arguing by induction, it suffices to note that $E|X\epsilon^i_t|=E|X|\,E|\epsilon^i_t|\le E|X|\max_{j\in\{1,\dots,4\}}E|\epsilon_t|^j<\infty$ when $X$ and $\epsilon_t$ are independent, $E|X|<\infty$, $i\in\{1,\dots,4\}$ and $E\epsilon^4_t<\infty$. $\square$

Lemma 3. For all $i,j,h\in\mathbb Z$, the matrix $\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+j}\}$ is well defined in $\mathbb R^{(p+q)^2}\times\mathbb R^{(p+q)^2}$ and is bounded in norm by a constant independent of $i$, $j$ and $h$.

Proof. For $\ell\in\{1,\dots,p+q\}$, recall that $u_t(\ell)=\epsilon_t\,\partial\epsilon_t(\theta_0)/\partial\theta_\ell$ is the $\ell$-th element of $u_t$. Assumption A2 entails that $\partial\epsilon_t(\theta_0)/\partial\theta_\ell=\sum_{k\ge1}c_{k,\ell}\epsilon_{t-k}$, where $|c_{k,\ell}|<K\rho^k$. Therefore we have, for all $\ell_1,\ell_2,\ell_3,\ell_4\in\{1,\dots,p+q\}$,
\[
E\left|u_1(\ell_1)u_{1+i}(\ell_2)u_{1+h}(\ell_3)u_{1+h+j}(\ell_4)\right|\le K\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}\,E\prod_{j=1}^{8}|\epsilon_{t_j}| \tag{22}
\]
with $t_1:=1\neq t_2:=1-i_1$, $t_3:=1+i\neq t_4:=1+i-i_2$, $t_5:=1+h\neq t_6:=1+h-i_3$ and $t_7:=1+h+j\neq t_8:=1+h+j-i_4$. Since at most four of the indices $t_1,\dots,t_8$ are equal, Lemma 2 shows that the right-hand side of (22) is bounded. The proof follows. $\square$
Lemma 4. For $i,h\in\mathbb Z$,
\[
\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+i}\}=
\begin{cases}
0 & \text{when }ih\neq0,\\
O(\rho^{|h|}) & \text{when }i=0,\\
\sigma^4J\otimes J+O(\rho^{|i|}) & \text{when }h=0.
\end{cases}
\]

Proof. We keep the notations of the proof of Lemma 3. First note that, for $\ell_1,\ell_2,\ell_3,\ell_4\in\{1,\dots,p+q\}$, the $\{(p+q)(\ell_1-1)+\ell_2\}$-th row and $\{(p+q)(\ell_3-1)+\ell_4\}$-th column element of the above covariance matrix (resp. of $J\otimes J$) is
\[
\mathrm{Cov}\{u_1(\ell_1)u_{1+i}(\ell_2),\,u_{1+h}(\ell_3)u_{1+h+i}(\ell_4)\} \tag{23}
\]
(resp. $J(\ell_1,\ell_3)J(\ell_2,\ell_4)$). The case $ih\neq0$ is obvious by noting that, in (23), one of the $u$'s has an index strictly greater than the other indices. To deal with the case $i=0$, suppose without loss of generality that $h>0$. The covariance in (23) is given by
\[
\sigma^2\sum_{i_1,i_2,i_3,i_4\ge1}c_{i_1,\ell_1}c_{i_2,\ell_2}c_{i_3,\ell_3}c_{i_4,\ell_4}\,\mathrm{Cov}\left(\epsilon_1^2\epsilon_{1-i_1}\epsilon_{1-i_2},\,\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}\right).
\]
When $i_4<h$, $\mathrm{Cov}\{\epsilon_1^2\epsilon_{1-i_1}\epsilon_{1-i_2},\,\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}\}=0$ because, when $i_3<h$, $\epsilon_1^2\epsilon_{1-i_1}\epsilon_{1-i_2}$ and $\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}$ are independent, and when $i_3\ge h$ this covariance is given by $E\epsilon_1^2\epsilon_{1-i_1}\epsilon_{1-i_2}\epsilon_{1+h-i_3}\,E\epsilon_{1+h-i_4}=0$. Therefore
\[
\left|\mathrm{Cov}\{u_1(\ell_1)u_1(\ell_2),\,u_{1+h}(\ell_3)u_{1+h}(\ell_4)\}\right|\le K\sum_{i_4\ge h}|c_{i_4,\ell_4}|\le K\rho^h.
\]
Now consider the case $h=0$. For $i\neq0$, (23) is given by
\[
E\,u_1(\ell_1)u_1(\ell_3)u_{1+i}(\ell_2)u_{1+i}(\ell_4)=E\,u_1(\ell_1)u_1(\ell_3)\,E\,u_{1+i}(\ell_2)u_{1+i}(\ell_4)+\nabla=\sigma^4J(\ell_1,\ell_3)J(\ell_2,\ell_4)+\nabla,
\]
where $\nabla=\mathrm{Cov}\{u_1(\ell_1)u_1(\ell_3),\,u_{1+i}(\ell_2)u_{1+i}(\ell_4)\}=O(\rho^{|i|})$, as already proven. $\square$

Lemma 5. For $|i|\neq|j|$,
\[
\sum_{h=-\infty}^{+\infty}\left\|\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+j}\}\right\|\le K\rho^{|i|}\rho^{|j|}. \tag{24}
\]

Proof. Without loss of generality, assume that $i\ge0$ and $j\ge0$ ($i\neq j$). For $i>0$ and $j>0$, all norms in (24) vanish, except perhaps one, which is the sum (over the $\ell_i$'s) of
\[
\left|\mathrm{Cov}\{u_1(\ell_1)u_{1+i}(\ell_2),\,u_{1-j+i}(\ell_3)u_{1+i}(\ell_4)\}\right|
\le\sum_{i_1,i_2,i_3,i_4\ge1}|c_{i_1,\ell_1}c_{i_2,\ell_2}c_{i_3,\ell_3}c_{i_4,\ell_4}|\,\left|E\,\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon^2_{1+i}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|
\]
\[
\le K\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}\left|E\,\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|.
\]
Let $S_i=\{(i_1,i_2,i_3,i_4):\max(i_1,i_2,i_3,i_4)\ge i\}$. Clearly,
\[
\sum_{S_i}\rho^{i_1+i_2+i_3+i_4}\left|E\,\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|\le K\rho^i.
\]
Now, for $(i_1,i_2,i_3,i_4)$ not belonging to $S_i$, the indices of the $\epsilon$'s in the expectation can be ranked as follows:
\[
1-i_1<1<\min(1+i-i_2,\,1+i-i_4)\quad\text{and}\quad 1-j+i-i_3<1-j+i. \tag{25}
\]
It is therefore clear that at least one of the $\epsilon$'s in the previous expectations has an index different from all the others, making these expectations equal to 0. We conclude that the left-hand side of (24) is bounded by $K\rho^i$ uniformly in $j>0$, and, by symmetry, by $K\rho^j$ uniformly in $i>0$. For $i=0$ and $j>0$, the left-hand side of (24) reduces to a sum (over the $\ell_i$'s) of
\[
\sum_{h=-\infty}^{-j}\left|\mathrm{Cov}\{u_1(\ell_1)u_1(\ell_2),\,u_{1+h}(\ell_3)u_{1+h+j}(\ell_4)\}\right|
\le K\sum_{h=-\infty}^{-j}\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}\left|E\,\epsilon_{1-i_1}\epsilon^2_1\epsilon_{1-i_2}\epsilon_{1+h-i_3}\epsilon_{1+h}\epsilon_{1+h+j-i_4}\epsilon_{1+h+j}\right|.
\]
In this sum all terms vanish, except when $i_1=-h$ or $i_2=-h$ or $i_4=j$ (in which case at least two indices are equal to $1+h$) and when $i_1=-h-j$ or $h=-j$ or $i_2=-h-j$ (in which case at least two indices are equal to $1+h+j$). Therefore, it can be seen that the left-hand side of (24) is also bounded by $K\rho^j$ when $i=0$. The case $j=0$ is handled in the same way, by symmetry. The conclusion follows. $\square$

Lemma 6.
\[
\lim_{n\to\infty}nb_n\,\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=2\sigma^4\varpi^2D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}.
\]

Proof.
By stationarity of the process $(u_t)$, and by the elementary relations $\mathrm{vech}(A)=D^+_{p+q}\mathrm{vec}(A)$ for any $(p+q)\times(p+q)$ symmetric matrix $A$, and $\mathrm{vec}(u_iu'_j)=u_j\otimes u_i$, we have
\[
nb_n\,\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=\sum_{k=1}^{5}\sum_{(i,j,h)\in I_k}A_n(i,j,h),
\]
where
\[
A_n(i,j,h)=\frac{b_n}{n}\,\omega(ib_n)\omega(jb_n)(n-|h|)\,D^+_{p+q}\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+j}\}D^{+\prime}_{p+q},
\]
and the $I_k$'s are the subsets of $\mathbb Z^2\times\{-n+1,\dots,n-1\}$ defined by
\[
I_1=\{i=j,\,h=0\},\quad I_2=\{i=-j=h\neq0\},\quad I_3=\{|i|\neq|j|\},\quad I_4=\{i=j,\,h\neq0\},\quad I_5=\{i=-j,\,h\neq i\neq0\}.
\]
In view of Lemma 4,
\[
\sum_{(i,j,h)\in I_1}A_n(i,j,h)=\sigma^4b_n\sum_{i=-\infty}^{+\infty}\omega^2(ib_n)\left\{D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}+O(\rho^{|i|})\right\}
\underset{n\to\infty}{\longrightarrow}\sigma^4\varpi^2D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}.
\]
We have $\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+i}\otimes u_1\}=\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_1\otimes u_{1+i}\}K_{p+q}$, where $K_{p+q}$ is the (symmetric) commutation matrix such that $K_{p+q}\mathrm{vec}\,A=\mathrm{vec}\,A'$ for any $(p+q)\times(p+q)$ matrix $A$. Thus
\[
\sum_{(i,j,h)\in I_2}A_n(i,j,h)=\sigma^4b_n\sum_{i=-n+1}^{n-1}\frac{n-|i|}{n}\,\omega^2(ib_n)\left\{D^+_{p+q}(J\otimes J)K_{p+q}D^{+\prime}_{p+q}+O(\rho^{|i|})\right\}
\underset{n\to\infty}{\longrightarrow}\sigma^4\varpi^2D^+_{p+q}(J\otimes J)K_{p+q}D^{+\prime}_{p+q}.
\]
From Lemma 5, $\sum_{(i,j,h)\in I_3}A_n(i,j,h)=O(b_n)\to0$. From Lemma 4,
\[
\sum_{(i,j,h)\in I_4}A_n(i,j,h)=b_n\sum_{0<|h|<n}O(\rho^{|h|})\underset{n\to\infty}{\longrightarrow}0,\qquad
\sum_{(i,j,h)\in I_5}A_n(i,j,h)=0.
\]
Therefore
\[
\lim_{n\to\infty}nb_n\,\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=\sigma^4\varpi^2D^+_{p+q}(J\otimes J)\left(I_{(p+q)^2}+K_{p+q}\right)D^{+\prime}_{p+q}.
\]
The conclusion follows from the relation
\[
\left(I_{(p+q)^2}+K_{p+q}\right)D^{+\prime}_{p+q}=2D_{p+q}D^+_{p+q}D^{+\prime}_{p+q}=2D^{+\prime}_{p+q}
\]
(see Magnus and Neudecker, 1988, Theorem 3.12). $\square$

Lemma 7.
\[
E\left\|\frac{\partial}{\partial\theta'}\left(\mathrm{vec}\,\hat\Delta^u_i\right)\right\|<K\rho^{|i|}+\frac{K}{\sqrt n}.
\]

Proof. Because $\hat\Delta^u_i=\hat\Delta^{u\prime}_{-i}$, we only consider the case $0\le i<n$. We have
\[
\frac{\partial}{\partial\theta'}\left(\mathrm{vec}\,\hat\Delta^u_i\right)=\frac1n\sum_{t=1}^{n-i}\left(\frac{\partial u_{t+i}}{\partial\theta'}\otimes u_t+u_{t+i}\otimes\frac{\partial u_t}{\partial\theta'}\right)
\]
\[
=\frac1n\sum_{t=1}^{n-i}\epsilon_t\left(\frac{\partial\epsilon_{t+i}}{\partial\theta}\frac{\partial\epsilon_{t+i}}{\partial\theta'}\right)\otimes\frac{\partial\epsilon_t}{\partial\theta}
+\frac1n\sum_{t=1}^{n-i}\epsilon_t\epsilon_{t+i}\frac{\partial^2\epsilon_{t+i}}{\partial\theta\partial\theta'}\otimes\frac{\partial\epsilon_t}{\partial\theta}
+\frac1n\sum_{t=1}^{n-i}\epsilon_{t+i}\frac{\partial\epsilon_{t+i}}{\partial\theta}\otimes\left(\frac{\partial\epsilon_t}{\partial\theta}\frac{\partial\epsilon_t}{\partial\theta'}\right)
+\frac1n\sum_{t=1}^{n-i}\epsilon_t\epsilon_{t+i}\frac{\partial\epsilon_{t+i}}{\partial\theta}\otimes\frac{\partial^2\epsilon_t}{\partial\theta\partial\theta'}.
\]
Considering the first sum on the right-hand side, we will prove that, for any $\ell_1,\ell_2,\ell_3\in\{1,\dots,p+q\}$,
\[
E\left|\frac1n\sum_{t=1}^{n-i}\epsilon_t\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right|<K\rho^i+\frac{K}{\sqrt n}. \tag{26}
\]
The other sums can be handled in a similar fashion. For $i=0$ or $i=1$, (26) holds straightforwardly because the $L^1$ norm of the term inside the sum exists. Now, for $i>1$, write
\[
\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell}={}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell}+{}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell},
\]
with the notation of (20). Note that the truncated derivative, i.e. the first term on the right-hand side of the previous equality, is independent of $\epsilon_{t-j}$, $j\ge0$. We first prove that (26) holds when $\partial\epsilon_{t+i}/\partial\theta_{\ell_1}$ and $\partial\epsilon_{t+i}/\partial\theta_{\ell_2}$ are replaced by the truncated derivatives. It is sufficient to show that
\[
E\left\{\frac1n\sum_{t=1}^{n-i}\epsilon_t\left({}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right\}^2<\frac{K}{n}. \tag{27}
\]
By stationarity, the left-hand side of (27) is bounded by
\[
\frac1n\sum_{h=-\infty}^{+\infty}\left|E\left\{\epsilon_1\left({}_{i-1}\frac{\partial\epsilon_{1+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{1+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_1}{\partial\theta_{\ell_3}}
\times\epsilon_{1+|h|}\left({}_{i-1}\frac{\partial\epsilon_{1+|h|+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{1+|h|+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_{1+|h|}}{\partial\theta_{\ell_3}}\right\}\right|
\]
\[
\le\frac1n\sum_{h=-\infty}^{+\infty}\ \sum_{k_1,k_2,k_4,k_5=1}^{i-1}\ \sum_{k_3,k_6=1}^{\infty}\left|c_{k_1,\ell_1}c_{k_2,\ell_2}c_{k_3,\ell_3}c_{k_4,\ell_1}c_{k_5,\ell_2}c_{k_6,\ell_3}\right|
\times\left|E\,\epsilon_1\epsilon_{1+i-k_1}\epsilon_{1+i-k_2}\epsilon_{1-k_3}\epsilon_{1+|h|}\epsilon_{1+|h|+i-k_4}\epsilon_{1+|h|+i-k_5}\epsilon_{1+|h|-k_6}\right|.
\]
It is easily seen that in the last expectation at most four indices can be equal, which by Lemma 2 ensures its existence. Moreover, when $h\neq0$ the expectation vanishes. Therefore (27) holds. It remains to show that (26) holds when $\partial\epsilon_{t+i}/\partial\theta_{\ell_1}$ and/or $\partial\epsilon_{t+i}/\partial\theta_{\ell_2}$ are replaced by the complements of the truncated derivatives. For instance, we have
\[
E\left|\frac1n\sum_{t=1}^{n-i}\epsilon_t\left({}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right)\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right|
\le\|\epsilon_t\|_4\left\|{}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right\|_4\left\|\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\right\|_4\left\|\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right\|_4
<K\times K\rho^i\times K\times K.
\]
The proof of (26) is complete, and hence the lemma is proved. $\square$
Lemma 8.
\[
\sqrt{nb_n}\,E\left\{\sup_{\theta\in\Theta_\delta}\left\|\hat I_n(\theta)-\hat I^u_n(\theta)\right\|\right\}\underset{n\to\infty}{\longrightarrow}0.
\]

Proof. The matrix norm being multiplicative, the supremum inside the brackets is bounded by
\[
\frac2n\sum_{i=0}^{[a/b_n]}\omega(ib_n)\sum_{t=1}^{n-i}\lambda_{i,t},
\]
where
\[
\lambda_{i,t}=\sup_{\theta\in\Theta_\delta}\left\{\|u_t(\theta)-v_t(\theta)\|\,\|u_{t+i}(\theta)\|+\|u_{t+i}(\theta)-v_{t+i}(\theta)\|\,\|v_t(\theta)\|\right\}.
\]
Note that
\[
\max\left\{\sup_{\theta\in\Theta_\delta}|\epsilon_t(\theta)-e_t(\theta)|,\ \sup_{\theta\in\Theta_\delta}\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)-\frac{\partial}{\partial\theta}e_t(\theta)\right\|\right\}\le K\sum_{m\ge1}\rho^{t+m}|\epsilon_{1-m}|.
\]
Hence
\[
\sup_{\theta\in\Theta_\delta}\|u_t(\theta)-v_t(\theta)\|
\le\sup_{\theta\in\Theta_\delta}\left\{|\epsilon_t(\theta)-e_t(\theta)|\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)\right\|+|e_t(\theta)|\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)-\frac{\partial}{\partial\theta}e_t(\theta)\right\|\right\}
\le K\sum_{m_1,m_2\ge1}\rho^{t+m_1+m_2}|\epsilon_{1-m_1}|\,|\epsilon_{t-m_2}|.
\]
It follows that, for $i\ge0$,
\[
\lambda_{i,t}\le K\sum_{m_1,m_2,m_3,m_4\ge1}\rho^{t+m_1+m_2+m_3+m_4}|\epsilon_{1-m_1}|\,|\epsilon_{t-m_2}|\,|\epsilon_{t+i-m_3}|\,\left(|\epsilon_{t+i-m_4}|+|\epsilon_{t-m_4}|\right).
\]
Therefore, in view of $E\epsilon_t^4<\infty$, we get $E(\lambda_{i,t})\le K\rho^t$, from which we deduce
\[
\sqrt{nb_n}\,E\sup_{\theta\in\Theta_\delta}\left\|\hat I_n(\theta)-\hat I^u_n(\theta)\right\|\le K\sqrt{\frac{b_n}{n}}\sum_{i=0}^{[a/b_n]}\omega(ib_n)\sum_{t=1}^{\infty}\rho^t\le\frac{K}{\sqrt{nb_n}}=o(1).\ \square
\]

Lemma 9.
\[
\sqrt{nb_n}\left(\hat I-I-\bar I^u_n\right)=o_P(1).
\]

Proof. We prove this lemma by showing that:
i) $\sqrt{nb_n}\left(\hat I-\hat I^u_n(\hat\theta_n)\right)=o_P(1)$;
ii) $\sqrt{nb_n}\left(\hat I^u_n(\hat\theta_n)-\hat I^u_n\right)=o_P(1)$;
iii) $\sqrt{nb_n}\left(\hat I^u_n-I^u_n\right)=o_P(1)$;
iv) $\sqrt{nb_n}\left(I^u_n-I-\bar I^u_n\right)=o_P(1)$.

Result i) is a straightforward consequence of Lemma 8. To prove ii) we apply the mean-value theorem to the $(\ell_1,\ell_2)$-th component of $\hat I^u_n$: for some $\bar\theta$ between $\hat\theta_n$ and $\theta_0$,
\[
\hat I^u_n(\hat\theta_n)(\ell_1,\ell_2)-\hat I^u_n(\ell_1,\ell_2)=(\hat\theta_n-\theta_0)'\frac{\partial}{\partial\theta}\hat I^u_n(\bar\theta)(\ell_1,\ell_2).
\]
Since $\|\hat\theta_n-\theta_0\|=O_P(n^{-1/2})$, it is sufficient to prove that
\[
\sup_{\theta\in\Theta_\delta}\left\|\frac{\partial}{\partial\theta}\hat I^u_n(\theta)(\ell_1,\ell_2)\right\|=O_P(1). \tag{28}
\]
Straightforward algebra shows that
\[
\frac{\partial}{\partial\theta'}\mathrm{vec}\left\{\frac{\partial}{\partial\theta'}\mathrm{vec}\,\hat\Delta^u_i(\theta)\right\}
=\frac1n\sum_{t=1}^{n-i}\mathrm{vec}\left(\frac{\partial}{\partial\theta'}\frac{\partial u_{t+i}}{\partial\theta'}\right)\otimes u_t(\theta)
+\frac1n\sum_{t=1}^{n-i}\mathrm{vec}\left(\frac{\partial u_{t+i}}{\partial\theta'}\right)\otimes\frac{\partial u_t}{\partial\theta'}(\theta)
+\frac1n\sum_{t=1}^{n-i}\frac{\partial u_{t+i}}{\partial\theta'}\otimes\mathrm{vec}\left(\frac{\partial u_t}{\partial\theta'}\right)(\theta)
+\frac1n\sum_{t=1}^{n-i}u_{t+i}\otimes\mathrm{vec}\left(\frac{\partial}{\partial\theta'}\frac{\partial u_t}{\partial\theta'}\right)(\theta). \tag{29}
\]
Using the Cauchy-Schwarz inequality and the ergodic theorem, we have, almost surely,
\[
\limsup_{n\to\infty}\sup_i\sup_{\theta\in\Theta_\delta}\left\|\frac1n\sum_{t=1}^{n-i}\mathrm{vec}\left(\frac{\partial}{\partial\theta'}\frac{\partial u_{t+i}}{\partial\theta'}\right)\otimes u_t(\theta)\right\|
\le\lim_{n\to\infty}\left\{\frac1n\sum_{t=1}^{n}\sup_{\theta\in\Theta_\delta}\left\|\mathrm{vec}\frac{\partial}{\partial\theta'}\frac{\partial u_t}{\partial\theta'}(\theta)\right\|^2\right\}^{1/2}\left\{\frac1n\sum_{t=1}^{n}\sup_{\theta\in\Theta_\delta}\|u_t(\theta)\|^2\right\}^{1/2}<\infty.
\]
Treating the three other sums of (29) in the same way, we deduce
\[
\limsup_{n\to\infty}\sup_i\sup_{\theta\in\Theta_\delta}\left\|\frac{\partial}{\partial\theta'}\mathrm{vec}\left\{\frac{\partial}{\partial\theta'}\mathrm{vec}\,\hat\Delta^u_i(\theta)\right\}\right\|<\infty\quad\text{a.s.} \tag{30}
\]
Now a Taylor expansion gives, for any $\ell_1,\ell_2,\ell_3$,
\[
\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\theta)(\ell_1,\ell_2)=\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\ell_1,\ell_2)
+(\theta-\theta_0)'\frac{\partial}{\partial\theta}\left\{\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\theta^*)(\ell_1,\ell_2)\right\}, \tag{31}
\]
where $\theta^*$ is between $\theta$ and $\theta_0$. From (31), (30), Lemma 7 and the Cesàro lemma we obtain
\[
\left\|\frac{\partial}{\partial\theta}\hat I^u_n(\theta)(\ell_1,\ell_2)\right\|
\le K\sum_{i=-[ab_n^{-1}]}^{[ab_n^{-1}]}\left\|\frac{\partial}{\partial\theta}\hat\Delta^u_i(\ell_1,\ell_2)\right\|
+K\|\theta-\theta_0\|\sum_{i=-[ab_n^{-1}]}^{[ab_n^{-1}]}\sup_i\sup_{\theta^*\in\Theta_\delta}\left\|\frac{\partial^2}{\partial\theta\partial\theta'}\hat\Delta^u_i(\theta^*)(\ell_1,\ell_2)\right\|
=O_P(1)+O_P(n^{-1/2}b_n^{-1})+O_P(b_n^{-1}\|\theta-\theta_0\|).
\]
Hence (28), and thus ii), is proved. Next we prove iii). We have
\[
\mathrm{vec}\left(\hat I^u_n-I^u_n\right)=\sum_{i=1}^{[a/b_n]}\omega(ib_n)\frac1n\sum_{t=n-i+1}^{n}\left(u_{t+i}\otimes u_t+u_t\otimes u_{t+i}\right). \tag{32}
\]
Hence, by Lemma 3,
\[
nb_n\,\mathrm{Var}\left\{\mathrm{vec}\left(\hat I^u_n-I^u_n\right)\right\}
=\frac{b_n}{n}\sum_{i,j=1}^{[a/b_n]}\omega(ib_n)\omega(jb_n)\sum_{t=n-i+1}^{n}\sum_{s=n-j+1}^{n}\mathrm{Cov}\left\{u_{t+i}\otimes u_t+u_t\otimes u_{t+i},\,u_{s+j}\otimes u_s+u_s\otimes u_{s+j}\right\}
\le\frac{Kb_n}{n}\sum_{i,j=1}^{[a/b_n]}ij=O\left(\frac{1}{nb_n^3}\right)=o(1),
\]
which establishes iii). Note that, under A4, we have $I=E(I^u_n-\bar I^u_n)=E\Delta^u_0$. To prove iv) it therefore suffices to show that
\[
\sqrt n\,\mathrm{vec}\left(I^u_n-I-\bar I^u_n\right)=\frac{1}{\sqrt n}\sum_{t=1}^{n}\left\{u_t\otimes u_t-E(u_t\otimes u_t)\right\}
\]
is bounded in probability. This is straightforward from Lemma 4. $\square$

Lemma 10.
\[
\lim_{n\to\infty}\mathrm{Var}\left[\sqrt{nb_n}\,\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)\right]=O(\rho^r)\quad\text{as }r\to\infty.
\]

Proof. We start by writing
\[
\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\frac1n\sum_{t=1}^{n}\left({}^ru_{t+i}\otimes u_t+u_{t+i}\otimes{}^ru_t-{}^ru_{t+i}\otimes{}^ru_t\right).
\]
ω(ibn ) n t=1 This double sum can obviously be split into six parts (distinguishing i > 0 and i < 0), and it will be sufficient to show for instance that [a/bn ] n X p 1X r lim Var nbn ω(ibn ) ut+i ⊗ ut = O(ρr ) n→∞ n t=1 i=1 42 (33) since the other terms can be treated in precisely the same way. The variance in (33) can be written as [a/bn ] X bn X ω(ibn )ω(jbn ) (n − |h|)Cov { r u1+i ⊗ u1 , r u1+h+j ⊗ u1+h } . (34) n i,j=1 |h|<n The existence of these covariances is obtained by a straightforward extension of Lemma 3. Proceeding as in the proofs of Lemmas 4 and 5 we find, for i, j > 0 0 when h 6= 0, r r Cov { u1+i ⊗ u1 , u1+h+i ⊗ u1+h } = O(ρr ) when h = 0, uniformly in i, and +∞ X h=−∞ kCov { r u1+i ⊗ u1 , r u1+h+j ⊗ u1+h }k ≤ Kρi ρj ρr when i 6= j. It follows from (34) that (33) holds, which concludes the proof of this lemma. 2 Lemma 11 For m < k, hp i u u,κ Var (k − m + 1)bn vec ( r I n,(m,k) − r I n,(m,k) ) ≤ Kκ−ν/2 , where K is independent of m, k, n, κ. Proof. We have u u,κ vec ( r I n,(m,k) − r I n,(m,k) ) = X k X 1 κ ω(ibn ) ut+i ⊗ r ut + r ut+i ⊗ κr ut − κr ut+i ⊗ κr ut . k − m + 1 t=m r 0<|i|≤[a/bn ] 43 Again, this double sum can be split into six parts and, as in the above lemma, it will be sufficient to show for instance that r [a/bn ] k X X bn κ −ν/4 Var ω(ibn ) . r ut+i ⊗ r ut ≤ Kκ k − m + 1 i=1 t=m (35) Let κ ǫt = ǫt − ǫκt . Recall that E κ ǫt = Eǫt = Eǫκt = 0. Note that the ℓ-th components of κr ut+i and r ut are κ r ut+i (ℓ) = r X ck1 ,ℓ κ ǫt+i ǫt+i−k1 + ǫκt+i κ ǫt+i−k1 k1 =1 and r ut (ℓ) = r X ck2 ,ℓ ǫt ǫt−k2 . k2 =1 It is clear that in Lemma 3, some or all of the ǫt ’s can be replaced by the truncated variables κ ǫt or ǫκt . Thus the variance in (35) is well defined. In addition, by κ k ǫt k4 ≤ ǫt 1{|ǫt |>κ} 4 +Eǫt 1{|ǫt |≤κ} ≤ 1 E |ǫt |4+ν κν and by the Hölder inequality we get 1/4 K +Eǫt 1{|ǫt |>κ} ≤ ν/4 κ |Cov (κ ǫ1+i ǫ1+i−k1 ǫ1 ǫ1−k2 , κ ǫ1+i ǫ1+i−k3 ǫ1 ǫ1−k4 )| ≤ kκ ǫt k24 kǫt k64 ≤ K . κν/2 The same inequality holds when the indexes are permuted and/or when the ǫt ’s are replaced by the ǫκt ’s. Therefore we have, for i, j > 0 0 when h 6= 0, κ κ Cov { r u1+i ⊗ r u1 , r u1+i+h ⊗ r u1+h } = O(κ−ν/2 ) when h = 0, +∞ X h=−∞ kCov { κr u1+i ⊗ r u1 , κr u1+j+h ⊗ r u1+h }k ≤ Kρi ρj κ−ν/2 when i 6= j and the conclusion follows as in Lemma 10. 44 2 2 Lemma 12 For any λ ∈ R(p+q) , λ 6= 0 p bn λ′ vec u,κ 2 −1/4 I ) r n,(m,k) = O(κ bn 4 uniformly in m and k. as κ → ∞ Proof. The variable in the left-hand side can be written as 1 k−m+1 where Ut = p X bn ω(ibn )λ′ κ r ut+i 0<i≤[a/bn ] Pk t=m Ut ⊗ r uκt + r uκt ⊗ r uκt+i . −1/2 It is clear that kr uκt k ≤ Kκ2 . Hence |Ut | ≤ Kκ4 bn and EUt4 ≤ Kκ8 EUt2 b−1 n . By arguments used in the proof of Lemma 4, Cov κ r ut ⊗ r uκt+i , r uκt ⊗ r uκt+j = 0 for i, j > 0 and i 6= j. Therefore VarUt = bn X ω 2 (ibn )λ′ Var κ r ut+i 0<i≤[a/bn ] ⊗ r uκt + r uκt ⊗ r uκt+i = O(1) uniformly in κ and r. The conclusion follows. 2 Lemma 13 The following hold limr→∞ limn→∞ Var (a) (b) if kn bn → ∞ and kn n−1 → 0, (c) if, moreover, κn → ∞, u u nbn vec I n − r I n = 0; √ u nbn vec r I n − r Sn = oP (1); √ nbn vec ( r Sn − r Snκn ) = oP (1); √ Proof. Part (a) is a direct consequence of Lemma 10. 45 Next we turn to (b). Observe that, in view of (21), p u nbn vec r I n − r Sn (ℓ+1)kn pn −2 √ X bn X X √ = ω(ibn ) r ut+i ⊗ r ut + r ut ⊗ r ut+i n ℓ=0 0<i≤[a/bn ] t=(ℓ+1)kn −mn +1 √ n X bn X +√ ω(ibn ) r ut+i ⊗ r ut + r ut ⊗ r ut+i n t=p k −m +1 0<i≤[a/bn ] n n n is a sum of pn − 1 independent random matrices (for n large enough so that kn − mn > mn ). 
Lemma 13 The following hold:
(a) $\displaystyle\lim_{r\to\infty}\lim_{n\to\infty}\mathrm{Var}\left[\sqrt{nb_n}\,\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)\right]=0$;
(b) if $k_nb_n\to\infty$ and $k_nn^{-1}\to0$, then $\sqrt{nb_n}\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=o_P(1)$;
(c) if, moreover, $\kappa_n\to\infty$, then $\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)=o_P(1)$.

Proof. Part (a) is a direct consequence of Lemma 10.

Next we turn to (b). Observe that, in view of (21),
$$\sqrt{nb_n}\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=\frac{\sqrt{b_n}}{\sqrt n}\sum_{\ell=0}^{p_n-2}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=(\ell+1)k_n-m_n+1}^{(\ell+1)k_n}\left({}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right)+\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=p_nk_n-m_n+1}^{n}\left({}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right)$$
is a sum of $p_n-1$ independent random matrices (for $n$ large enough so that $k_n-m_n>m_n$). Now, by the arguments of the proofs of Lemmas 4 and 5, we have for $i,j>0$
$$\mathrm{Cov}\left({}_ru_t\otimes{}_ru_{t+i},\ {}_ru_s\otimes{}_ru_{s+j}\right)=\begin{cases}0&\text{when }t+i\ne s+j,\\O(\rho^i\rho^j)&\text{when }t+i=s+j.\end{cases}$$
Therefore
$$\mathrm{Var}\left[\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=k}^{m}\left({}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right)\right]=O\left(\frac{b_n(m-k+1)}{n}\right)=O\left(\frac{m-k+1}{n}\right).$$
Then
$$nb_n\,\mathrm{Var}\left\{\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)\right\}=O\left(\frac{(p_n-1)m_n}{n}\right)+O\left(\frac{n-p_nk_n+m_n}{n}\right)=o(1).$$
Hence, since $E\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=0$, (b) is proved.

For part (c), we note that $\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)$ is a sum of $p_n$ i.i.d. variables whose common variance is
$$\frac{k_n-m_n}{n}\,\mathrm{Var}\left[\sqrt{(k_n-m_n)b_n}\,\mathrm{vec}\left({}_r\bar I^u_{n,(1,k_n-m_n)}-{}_r\bar I^{u,\kappa_n}_{n,(1,k_n-m_n)}\right)\right].$$
Thus, in view of Lemma 11,
$$\mathrm{Var}\left\{\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)\right\}=O\left(\frac{p_n(k_n-m_n)}{n\,\kappa_n^{\nu/2}}\right)=O\left(\frac{1}{\kappa_n^{\nu/2}}\right)=o(1)\qquad(36)$$
when $n\to\infty$. This establishes (c) and completes the proof of Lemma 13. □

Lemma 14
$$\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n\ \stackrel{d}{\longrightarrow}\ \mathcal{N}\left(0,\ 2\sigma^4\varpi^2\,D^+_{p+q}\left({}_rJ\otimes{}_rJ\right)D^{+\prime}_{p+q}\right).$$

Proof. The random matrices $\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n$ are centered. By a trivial extension of part iv) of the proof of Lemma 9, $\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n$ and $\sqrt{nb_n}\,\mathrm{vech}\left({}_rI^u_n-E\,{}_rI^u_n\right)$ have the same asymptotic distribution. It is easy to see that Lemmas 4 and 5 still hold when $(u_t)$ and $J$ are replaced by $({}_ru_t)$ and ${}_rJ$. Therefore, by the proof of Lemma 6,
$$\lim_{n\to\infty}\mathrm{Var}\left\{\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n\right\}=2\sigma^4\varpi^2\,D^+_{p+q}\left({}_rJ\otimes{}_rJ\right)D^{+\prime}_{p+q}.$$
By virtue of Lemma 13 (b) and (c), $\mathrm{vech}\,{}_r\bar I^u_n$, $\mathrm{vech}\,{}_rS_n$ and $\mathrm{vech}\,{}_rS^{\kappa_n}_n$ have the same asymptotic distribution for appropriately chosen sequences $(k_n)$ and $(\kappa_n)$. To establish the asymptotic normality of $\sqrt{nb_n}\,\mathrm{vech}\,{}_rS^{\kappa_n}_n$, we will use the Cramér–Wold device. Therefore we will show the asymptotic normality of
$$X_n:=\sqrt{nb_n}\,\lambda'\,\mathrm{vec}\,{}_rS^{\kappa_n}_n=\frac{k_n-m_n}{\sqrt n}\sqrt{b_n}\,\lambda'\sum_{\ell=0}^{p_n-1}\mathrm{vec}\,{}_r\bar I^{u,\kappa_n}_{n,(\ell k_n+1,\,\ell k_n+k_n-m_n)}:=\sum_{\ell=0}^{p_n-1}X_{n\ell},$$
for any non-trivial $\lambda\in\mathbb{R}^{(p+q)^2}$.

Observe that $X_n$ is a sum of $p_n$ i.i.d. centered variables with common variance
$$v^{\kappa_n}_n=\mathrm{Var}\left[\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=1}^{k_n-m_n}\lambda'\left({}_ru^{\kappa_n}_{t+i}\otimes{}_ru^{\kappa_n}_t+{}_ru^{\kappa_n}_t\otimes{}_ru^{\kappa_n}_{t+i}\right)\right].$$
By (36), $v^{\kappa_n}_n$ is asymptotically equivalent, when $\kappa_n\to\infty$, to
$$v_n=\mathrm{Var}\left[\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=1}^{k_n-m_n}\lambda'\left({}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right)\right].$$
The arguments used to prove Lemmas 4 and 5 show that $v_n=O\left(\frac{k_n-m_n}{n}\right)$.

Next, we will verify the Lindeberg condition. For any $\varepsilon>0$,
$$\sum_{\ell=0}^{p_n-1}\frac{1}{p_nv^{\kappa_n}_n}\int_{\left\{|X_{n\ell}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n\ell}\,dP=\frac{1}{v^{\kappa_n}_n}\int_{\left\{|X_{n1}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n1}\,dP\le\frac{EX^4_{n1}}{\varepsilon^2p_n\left(v^{\kappa_n}_n\right)^2}.$$
Now Lemma 12 implies that $EX^4_{n1}=O\left(\frac{k_n^4\kappa_n^8}{n^2b_n}\right)$. Therefore
$$\sum_{\ell=0}^{p_n-1}\frac{1}{p_nv^{\kappa_n}_n}\int_{\left\{|X_{n\ell}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n\ell}\,dP=O\left(\frac{k_n^3\kappa_n^8}{nb_n}\right).$$
To fulfil the Lindeberg condition, it therefore suffices that $k_n^3\kappa_n^8/(nb_n)\to0$. From Lemma 13 (c), it is also required that $\kappa_n\to\infty$. Therefore, in view of Lemma 13 (b), it remains to show that we can find $k_n$ such that $k_n^3/(nb_n)\to0$ and $k_nb_n\to\infty$. This is obvious because
$$\frac{k_n^3}{nb_n}=\frac{(k_nb_n)^3}{nb_n^4}$$
and it is supposed that $nb_n^4\to\infty$. □

Proof of Theorem 2. It follows from Lemma 9, Lemma 14, Lemma 13 (a), and a standard argument (see e.g. Billingsley (1995), Theorem 25.5). □
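The duplication matrix $D_{p+q}$ appearing in Lemma 14 and below (see footnote 5) is easy to build and to test. The sketch below is our own illustration, not code from the paper: it constructs $D_m$, its Moore–Penrose inverse $D^+_m$, and checks numerically the Magnus–Neudecker identity $D_mD^+_m(A\otimes A)D_m=(A\otimes A)D_m$ invoked in the proof of Theorem 3.

```python
import numpy as np

def duplication(m):
    """D_m maps vech(A) (columnwise stack of the lower triangle) to
    vec(A) (column-major) for any symmetric m x m matrix A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[i + j * m, col] = 1.0        # entry (i, j) of A
            if i != j:
                D[j + i * m, col] = 1.0    # its symmetric twin (j, i)
            col += 1
    return D

m = 3
D = duplication(m)
D_plus = np.linalg.pinv(D)
A = np.random.default_rng(1).normal(size=(m, m))
# Magnus & Neudecker (1988, Theorem 3.13): D D^+ (A x A) D = (A x A) D
print(np.allclose(D @ D_plus @ np.kron(A, A) @ D, np.kron(A, A) @ D))  # True
```

The identity holds because $(A\otimes A)D_m$ maps $\mathrm{vech}(B)$ to $\mathrm{vec}(ABA')$, which is the vec of a symmetric matrix, and $D_mD^+_m$ is the orthogonal projector onto the range of $D_m$.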
7.3 Lemmas and proofs for Theorem 3

In all subsequent lemmas, the assumptions of Theorem 2 are supposed to be satisfied.

Lemma 15 Under the assumptions of Theorem 3,
$$\sqrt{nb_n}\,\mathrm{vec}\left(\hat J-J\right)=o_P(1).$$

Proof. By arguments already employed, we have
$$\mathrm{Var}\left\{\sqrt n\,\mathrm{vec}\left(\hat J-J\right)\right\}=\frac1n\sum_{|h|<n}(n-|h|)\,\mathrm{Cov}\left(\frac{\partial\epsilon_1}{\partial\theta}\otimes\frac{\partial\epsilon_1}{\partial\theta},\ \frac{\partial\epsilon_{1+h}}{\partial\theta}\otimes\frac{\partial\epsilon_{1+h}}{\partial\theta}\right)+o(1)=O(1).$$
Since $b_n\to0$, the conclusion follows. □

Proof of Theorem 3. From Lemma 15 and straightforward algebra, we have
$$\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)=\sqrt{nb_n}\,\mathrm{vech}\left\{J^{-1}\left(\hat I-I\right)J^{-1}\right\}+o_P(1)=D^+_{p+q}\left(J^{-1}\otimes J^{-1}\right)D_{p+q}\,\mathrm{vech}\left\{\sqrt{nb_n}\left(\hat I-I\right)\right\}+o_P(1).$$
The first convergence follows from Theorem 2, by application of the relation
$$D_{p+q}D^+_{p+q}\left(J\otimes J\right)D_{p+q}=\left(J\otimes J\right)D_{p+q}$$
(see Magnus and Neudecker, 1988, Theorem 3.13). The second convergence is deduced in the same way. □

Proof of Theorem 8. Under the assumptions of Theorem 8, Francq and Zakoïan (2000, Theorems 2 and 3) have shown that $\hat I$ and $\hat J$ are weakly consistent estimators of $I$ and $J$. We deduce that $\hat\Sigma^{(1)}$, $\hat\Sigma^{(2)}$ and $\hat\Lambda^{-1}$ are weakly consistent estimators of $\Sigma^{(1)}$, $\Sigma^{(2)}$ and $\Lambda^{-1}$. Therefore
$$\Upsilon_n=nb_n\left(c+o_P(1)\right),\quad\text{with}\quad c=\left\{\mathrm{vech}\left(\Sigma^{(2)}-\Sigma^{(1)}\right)\right\}'\Lambda^{-1}\,\mathrm{vech}\left(\Sigma^{(2)}-\Sigma^{(1)}\right).$$
Since $\Sigma^{(1)}\ne\Sigma^{(2)}$ and $\Lambda^{-1}$ is positive definite, we have $c>0$. Because $nb_n$ tends to infinity, we have
$$\lim_{n\to\infty}P\left\{\Upsilon_n>\chi^2_{(p+q)(p+q+1)/2}(1-\alpha)\right\}=\lim_{n\to\infty}P\left\{nb_n\left(c+o_P(1)\right)>\chi^2_{(p+q)(p+q+1)/2}(1-\alpha)\right\}=1$$
for any $\alpha\in(0,1)$. □
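Operationally, the proof above corresponds to the following decision rule. The sketch is our reading of the test statistic of Section 5, namely $\Upsilon_n=nb_n\{\mathrm{vech}(\hat\Sigma^{(2)}-\hat\Sigma^{(1)})\}'\hat\Lambda^{-1}\mathrm{vech}(\hat\Sigma^{(2)}-\hat\Sigma^{(1)})$, as reconstructed from the display $\Upsilon_n=nb_n(c+o_P(1))$; the estimators are assumed computed elsewhere and the function name is ours.

```python
import numpy as np
from scipy.stats import chi2

def strong_linearity_test(Sigma1_hat, Sigma2_hat, Lambda_hat, n, b_n,
                          alpha=0.05):
    """Rejects H0 (strong ARMA) when Upsilon_n exceeds the chi-square
    quantile with (p+q)(p+q+1)/2 degrees of freedom.  Lambda_hat must
    use the same vech ordering as the one chosen below."""
    m = Sigma1_hat.shape[0]                               # m = p + q
    d = (Sigma2_hat - Sigma1_hat)[np.tril_indices(m)]     # vech of the diff.
    upsilon = n * b_n * d @ np.linalg.solve(Lambda_hat, d)
    dof = m * (m + 1) // 2
    return upsilon, upsilon > chi2.ppf(1 - alpha, dof)
```

Under the null, $\Upsilon_n$ is asymptotically $\chi^2$ with $(p+q)(p+q+1)/2$ degrees of freedom; under the alternative it is of order $nb_n$, which is why the rejection probability in the proof above tends to one.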
Proof of Lemma 1. Let $\kappa_1$ and $\kappa_2$ be constants such that
$$0<\kappa_1<\kappa_2<1,\quad\kappa<\kappa_1,\quad\kappa_1+\kappa<\kappa_2,\quad\kappa_2>1-\frac{(1+r^*)(2+\nu^*)}{\nu^*}\kappa_1,\quad\kappa_2<\frac{\nu^*}{2(1+\nu^*)}.\qquad(37)$$
Figure 3 shows that these inequalities are compatible. Define sequences of integers $(k_n)$, $(m_n)$, $(q_n)$ and $(p_n)$ by
$$k_n=[n^{\kappa_2}],\qquad m_n=[n^{\kappa_1}],\qquad q_n=k_n-m_n,\qquad p_n=\left[\frac{n}{k_n}\right].$$
Note that
$$q_n\sim k_n\sim n^{\kappa_2},\quad m_n\sim n^{\kappa_1},\quad p_n\sim n^{1-\kappa_2}\quad\text{as }n\to\infty.\qquad(38)$$
Employing a standard technique, we split the sum $S_n=\sum_{t=1}^nx_{n,t}$ into $p_n$ alternate "big" blocks of length $q_n$ and "small" blocks of length $m_n$. More precisely, write $S_n=S'_n+S''_n$, where
$$S'_n=\sum_{\ell=0}^{p_n-1}\xi_{n,\ell},\qquad\xi_{n,\ell}=x_{n,\ell k_n+1}+\cdots+x_{n,\ell k_n+q_n},$$
$$S''_n=\sum_{\ell=0}^{p_n}\zeta_{n,\ell},\qquad\zeta_{n,\ell}=x_{n,\ell k_n+q_n+1}+\cdots+x_{n,(\ell+1)k_n}\ \text{ for }\ell\le p_n-1,\qquad\zeta_{n,p_n}=x_{n,p_nk_n+1}+\cdots+x_{n,n}.$$

[Figure 3 near here. Caption: Inequalities (37). The admissible region for $(\kappa_1,\kappa_2)$ is the triangle with vertices $A=\left(\frac{\nu^*}{2(1+r^*)(1+\nu^*)},\frac{\nu^*}{2(1+\nu^*)}\right)$, $B=\left(\frac{\nu^*}{2(1+\nu^*)}-\kappa,\frac{\nu^*}{2(1+\nu^*)}\right)$ and $C=\left(\frac{\nu^*(1-\kappa)}{2+2r^*+2\nu^*+r^*\nu^*},\kappa+\frac{\nu^*(1-\kappa)}{2+2r^*+2\nu^*+r^*\nu^*}\right)$, delimited by the lines $\kappa_2=\kappa_1+\kappa$ and $\kappa_2=1-\frac{(1+r^*)(2+\nu^*)}{\nu^*}\kappa_1$. We have $x_B:=\frac{\nu^*}{2(1+\nu^*)}-\kappa>\kappa$ iff $\kappa<\frac{\nu^*}{4(1+\nu^*)}$, and $y_C:=\kappa+\frac{\nu^*(1-\kappa)}{2+2r^*+2\nu^*+r^*\nu^*}<\frac{\nu^*}{2(1+\nu^*)}$ iff $r^*>\frac{2\kappa(1+\nu^*)}{\nu^*-2\kappa(1+\nu^*)}$.]

To prove the CLT it suffices to show that, for $n\to\infty$,
i) $n^{-1}E\left(S''_n\right)^2\to0$;
ii) $E\exp\left(itn^{-1/2}S'_n\right)\sim\prod_{\ell=0}^{p_n-1}E\exp\left(itn^{-1/2}\xi_{n,\ell}\right)$, with $i^2=-1$;
iii) $p_n\to\infty$ and $n^{-1}\sum_{\ell=0}^{p_n-1}E\xi^2_{n,\ell}\mathbf{1}_{\{|\xi_{n,\ell}|\ge n^{1/2}\epsilon\}}\to0$ for every $\epsilon>0$.

Indeed, i) implies that $n^{-1/2}S_n$ and $n^{-1/2}S'_n$ have the same asymptotic distribution, ii) implies that the characteristic function of $n^{-1/2}S'_n$ is asymptotically equivalent to that of $n^{-1/2}\sum_{\ell=0}^{p_n-1}\xi'_{n,\ell}$, where the $\xi'_{n,\ell}$ are independent and distributed like the $\xi_{n,\ell}$, and iii) is simply the Lindeberg condition ensuring the central limit theorem for independent, but not necessarily identically distributed, random variables.

Using the Davydov (1968) inequality, there exists a universal constant $K_0$ (from Rio (1993), one can take $K_0=4$) such that, for $\nu^*<\infty$,
$$\sum_{t=1}^n\sum_{s=t+q+1}^n\left|Ex_{n,t}x_{n,s}\right|\le K_0\sum_{t=1}^n\sum_{s=t+q+1}^n\|x_{n,t}\|_{2+\nu^*}\|x_{n,s}\|_{2+\nu^*}\left\{\alpha_n(s-t)\right\}^{\frac{\nu^*}{2+\nu^*}}\le K_0\,n\,\sup_t\|x_{n,t}\|^2_{2+\nu^*}\sum_{k>q}\left\{\alpha_n(k)\right\}^{\frac{\nu^*}{2+\nu^*}}$$
and
$$E\left(\sum_{t=t_0}^{t_0+m-1}x_{n,t}\right)^2\le K_0\,\sup_t\|x_{n,t}\|^2_{2+\nu^*}\sum_{k=0}^{m-1}(m-|k|)\left\{\alpha_n(k)\right\}^{\frac{\nu^*}{2+\nu^*}}.$$
Using inequality (1.4) in Ibragimov (1962), the same inequalities hold when $\nu^*=\infty$ (with $2+\nu^*=\infty$ and $\nu^*/(2+\nu^*)=1$). Thus
$$E\left(S''_n\right)^2\le\sum_{\ell=0}^{p_n-1}E\zeta^2_{n,\ell}+E\zeta^2_{n,p_n}+2\sum_{t=1}^n\sum_{s=t+q_n+1}^n\left|Ex_{n,t}x_{n,s}\right|\le Kp_nm_n\sum_{k=0}^{m_n-1}\left\{\alpha_n(k)\right\}^{\frac{\nu^*}{2+\nu^*}}+Kk_n\sum_{k=0}^{k_n-1}\left\{\alpha_n(k)\right\}^{\frac{\nu^*}{2+\nu^*}}+Kn\sum_{k>q_n}\left\{\alpha_n(k)\right\}^{\frac{\nu^*}{2+\nu^*}}\le Kp_nm_n(T_n+1)+Kk_n^2+Kn\sum_{k>q_n-T_n}\left\{\alpha(k)\right\}^{\frac{\nu^*}{2+\nu^*}}$$
for some positive constant $K$. Hence i) holds since, in view of (37), $p_nm_n(T_n+1)/n\sim n^{-\kappa_2+\kappa_1+\kappa}\to0$, $k_n^2/n\sim n^{2\kappa_2-1}\to0$ and $q_n-T_n\sim n^{\kappa_2}\to\infty$.

Using the Ibragimov (1962) inequality and the fact that $u_n=O(n^{-r^*})$ when $u_n\downarrow0$ and $\sum_nn^{r^*}u_n<\infty$, we obtain
$$\left|E\prod_{\ell=0}^{p_n-1}\exp\left(itn^{-1/2}\xi_{n,\ell}\right)-\prod_{\ell=0}^{p_n-1}E\exp\left(itn^{-1/2}\xi_{n,\ell}\right)\right|\le4p_n\alpha_n(m_n)\le4p_n\,\frac{1}{(m_n-T_n)^{(1+r^*)(2+\nu^*)/\nu^*}}\sim n^{1-\kappa_2-\frac{(1+r^*)(2+\nu^*)}{\nu^*}\kappa_1}\to0,$$
which establishes ii).

When $\nu^*<\infty$, by the Hölder and Markov inequalities, we have
$$n^{-1}\sum_{\ell=0}^{p_n-1}E\xi^2_{n,\ell}\mathbf{1}_{\{|\xi_{n,\ell}|\ge n^{1/2}\epsilon\}}\le n^{-1}p_n\left(n^{1/2}\epsilon\right)^{-\nu^*}E|\xi_{n,1}|^{2+\nu^*}\le n^{-1}p_n\left(n^{1/2}\epsilon\right)^{-\nu^*}q_n^{2+\nu^*}\sup_t\|x_{n,t}\|^{2+\nu^*}_{2+\nu^*}\sim n^{(1+\nu^*)\kappa_2-\nu^*/2}\to0,$$
which establishes iii) in the case $\nu^*<\infty$. Finally, when $M:=\sup_{n\ge1}\sup_{1\le t\le n}\|x_{n,t}\|_\infty<\infty$, we have $\|\xi_{n,\ell}\|_\infty\le Mq_n<\sqrt n\,\epsilon$, and the sum in iii) equals 0, for sufficiently large $n$. The proof is complete. □
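The big-block/small-block construction used in the proof of Lemma 1 is mechanical and can be sketched in a few lines. The exponents below are placeholders of our own choosing; in the proof they must satisfy the inequalities (37), with $\kappa_2<\nu^*/(2(1+\nu^*))<1/2$ in particular.

```python
def bernstein_blocks(n, kappa1=0.2, kappa2=0.4):
    """0-based index ranges for the decomposition S_n = S'_n + S''_n:
    p_n 'big' blocks of length q_n = k_n - m_n, separated by 'small'
    blocks of length m_n, plus a final remainder block."""
    k = int(n ** kappa2)                     # k_n = [n^kappa2]
    m = int(n ** kappa1)                     # m_n = [n^kappa1]
    q, p = k - m, n // k                     # q_n, p_n = [n/k_n]
    big = [range(l * k, l * k + q) for l in range(p)]
    small = [range(l * k + q, (l + 1) * k) for l in range(p)]
    small.append(range(p * k, n))            # remainder zeta_{n, p_n}
    return big, small

big, small = bernstein_blocks(10_000)
print(len(big), len(small[0]), len(big[0]))  # p_n, m_n, q_n
```

The gaps of length $m_n$, which grow with $n$, are what makes the big-block sums $\xi_{n,\ell}$ asymptotically independent under the mixing condition, while step i) shows that the discarded small blocks are negligible.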
Lemma 16 Let $(u_t)$ be any stationary centered process verifying A5″. Then
$$\sum_{i_1,i_2,i_3,i_4=1}^n\left\|E\left(u_t^{\otimes4}\otimes u_{t+i_1}\otimes u_{t+i_2}\otimes u_{t+i_3}\otimes u_{t+i_4}\right)\right\|=O(n^2).$$

Proof. We will only consider the first component of the vector inside the norm. Let $u_t$ also denote the first component of $u_t$. The first component of the sum is bounded by $4!\sum_{k=1}^4\sum^{(k)}\left|Eu^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|$, where $\sum^{(k)}$ denotes the sum over the indices such that $i_1\le i_2\le i_3\le i_4$ and $i_k-i_{k-1}=\max_{1\le r\le4}(i_r-i_{r-1})$ (with $i_0=0$). Using the Davydov (1968) inequality, for $i_1\le i_2\le i_3\le i_4$,
$$\left|\mathrm{Cov}\left(u^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3},\ u_{t+i_4}\right)\right|\le K_0\left\|u^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3}\right\|_{\frac{8+\nu}{7}}\left\|u_{t+i_4}\right\|_{8+\nu}\left\{\alpha_u(|i_4-i_3|)\right\}^{\frac{\nu}{8+\nu}}\le K_0\left\|u_t\right\|^8_{8+\nu}\left\{\alpha_u(|i_4-i_3|)\right\}^{\frac{\nu}{8+\nu}}.$$
Therefore, we have
$$\sum^{(4)}\left|Eu^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|=\sum^{(4)}\left|\mathrm{Cov}\left(u^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3},\ u_{t+i_4}\right)\right|\le n\sum_{h=0}^n(h+1)^2K_0\left\|u_t\right\|^8_{8+\nu}\left\{\alpha_u(h)\right\}^{\frac{\nu}{8+\nu}}=O(n).\qquad(39)$$
The last inequality has been obtained by setting $h=i_4-i_3$, by noting that there exist at most $n$ values for the subscript $i_4$, and that, when $h$ and $i_4$ are fixed, there remain at most $h+1$ possibilities for $i_1$ and for $i_2$.

The term involving $\sum^{(3)}$ is bounded by
$$\sum^{(3)}\left|\mathrm{Cov}\left(u^4_tu_{t+i_1}u_{t+i_2},\ u_{t+i_3}u_{t+i_4}\right)\right|+\sum^{(3)}\left|Eu^4_tu_{t+i_1}u_{t+i_2}\right|\left|Eu_{t+i_3}u_{t+i_4}\right|.$$
By the arguments used to show (39), it can be shown that the first sum is $O(n)$. The second sum is bounded by
$$n\sum_{h=0}^n\left\{\sum_{\ell=0}^h\left\{\alpha_u(\ell)\right\}^{\frac{\nu}{8+\nu}}\right\}^2=O(n^2),$$
setting $h=i_3-i_2$, arguing that there exist at most $n$ possibilities for $i_2$, and that we have $\left|Eu^4_tu_{t+i_1}u_{t+i_2}\right|\le K_0\|u_t\|^6_{8+\nu}\left\{\alpha_u(\ell)\right\}^{\frac{2+\nu}{8+\nu}}$ with $\ell=i_2-i_1\le h$, and $\left|Eu_{t+i_3}u_{t+i_4}\right|\le K_0\|u_t\|^2_{8+\nu}\left\{\alpha_u(\ell)\right\}^{\frac{6+\nu}{8+\nu}}$ with $\ell=i_4-i_3\le h$. Similarly, we have $\sum^{(2)}\left|Eu^4_tu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|=O(n^2)$.

The last term, $\sum^{(1)}$, is bounded by
$$\sum^{(1)}\left|\mathrm{Cov}\left(u^4_t,\ u_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right)\right|+\sum^{(1)}Eu^4_t\left|Eu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|\le n\sum_{h=0}^nK_0\left\|u_t\right\|^8_{8+\nu}(h+1)^3\left\{\alpha_u(h)\right\}^{\frac{\nu}{8+\nu}}+Eu^4_t\sum_{h=0}^n\sum_{0\le i_1\le i_2\le i_3\le i_4\le h}\left|Eu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|\le O(n)+Eu^4_t\sum_{h=0}^n\sum_{k=1}^3\sum^{(k)}\left|Eu_0u_{i_2-i_1}u_{i_3-i_1}u_{i_4-i_1}\right|,\qquad(40)$$
where now $\sum^{(k)}$ denotes the sum over the indices such that $0\le i_1\le i_2\le i_3\le i_4\le h$ and $i_{k+1}-i_k=\max_{1\le r\le3}(i_{r+1}-i_r)$. We have
$$\sum^{(3)}\left|Eu_0u_{i_2-i_1}u_{i_3-i_1}u_{i_4-i_1}\right|=\sum^{(3)}\left|\mathrm{Cov}\left(u_0u_{i_2-i_1}u_{i_3-i_1},\ u_{i_4-i_1}\right)\right|\le\sum_{\ell=0}^h(\ell+1)^2K_0\left\|u_t\right\|^4_{8+\nu}\left\{\alpha_u(\ell)\right\}^{\frac{4+\nu}{8+\nu}}=O(1).$$
The same bound holds when $\sum^{(3)}$ is replaced by $\sum^{(1)}$. The term $\sum^{(2)}$ is bounded by
$$\sum^{(2)}\left|\mathrm{Cov}\left(u_0u_{i_2-i_1},\ u_{i_3-i_1}u_{i_4-i_1}\right)\right|+\sum^{(2)}\left|Eu_0u_{i_2-i_1}\right|\left|Eu_{i_3-i_1}u_{i_4-i_1}\right|\le O(1)+(h+1)\left\{\sum_{\ell=0}^h\left\{\alpha_u(\ell)\right\}^{\frac{6+\nu}{8+\nu}}\right\}^2=O(h).$$
It follows that the right-hand side of the inequality (40) is $O(n^2)$, which completes the proof. □

Proof of Theorem 5. It can be shown, by adapting the proof of i)–iii) in Lemma 9, that
$$\sqrt{nb_n}\,\mathrm{vech}\left(\hat I-I\right)=\sqrt{nb_n}\,\mathrm{vech}\left(I^u_n-I\right)+o_P(1).$$
Write
$$\sqrt{nb_n}\,\mathrm{vech}\left(I^u_n-I\right):=a_{1,n}+a_{2,n},\quad\text{where}\quad a_{1,n}=\sqrt{nb_n}\,\mathrm{vech}\left(I^u_n-EI^u_n\right),\quad a_{2,n}=\sqrt{nb_n}\,\mathrm{vech}\left(EI^u_n-I\right).$$
We will show that
(i) $a_{1,n}$ converges to the normal distribution of the theorem,
(ii) $a_{2,n}=o(1)$.
For any $\lambda\in\mathbb{R}^{(p+q)(p+q+1)/2}$, $\lambda\ne0$, we have $\lambda'a_{1,n}=n^{-1/2}\sum_{t=1}^nx_{n,t}$, where
$$x_{n,t}=y_{n,t}-Ey_{n,t},\qquad y_{n,t}=\sqrt{b_n}\sum_{|i|\le[a/b_n]}\omega(ib_n)\,\lambda'D^+_{p+q}\left(u_{t+i}\otimes u_t\right).\qquad(41)$$
To show (i) we will use Lemma 1. We begin by showing that $\sup_n\|x_{n,t}\|_4<\infty$. We have, by the Davydov (1968) inequality,
$$\left\|Ey_{n,t}\right\|\le K\sqrt{b_n}\sum_{|i|\le[a/b_n]}\omega(ib_n)\left\|u_t\right\|^2_{8+\nu}\left\{\alpha_u(i)\right\}^{\frac{6+\nu}{8+\nu}}=o(1).$$
Moreover, since $\omega$ is bounded,
$$\left\|y_{n,t}\right\|^4_4\le Kb_n^2\sum_{j_1,\ldots,j_8}\sum_{|i_1|,\ldots,|i_4|\le[a/b_n]}\left|Eu_t(j_1)\cdots u_t(j_4)\,u_{t+i_1}(j_5)\cdots u_{t+i_4}(j_8)\right|=O(1),$$
where the last equality follows from Lemma 16 (applied with $n$ replaced by $[a/b_n]$). Hence $\|x_{n,t}\|_4\le\|y_{n,t}\|_4+\|Ey_{n,t}\|=O(1)$, and (12) holds with $\nu^*=2$.

From Lemma 1 in Andrews (1991), Assumption A5″ implies that, for all $\ell_1,\ldots,\ell_4$,
$$\sum_{j_1,j_2,j_3=-\infty}^{+\infty}\left|\kappa^u_{\ell_1,\ldots,\ell_4}(0,j_1,j_2,j_3)\right|<\infty,\qquad(42)$$
where the $\kappa^u_{\ell_1,\ldots,\ell_4}(0,j_1,j_2,j_3)$'s denote the fourth-order cumulants of $\left(u_t(\ell_1),u_{t+j_1}(\ell_2),u_{t+j_2}(\ell_3),u_{t+j_3}(\ell_4)\right)$. Hence, by Proposition 1 (a) of Andrews (1991), $n^{-1}\mathrm{Var}\left(\sum_{t=1}^nx_{n,t}\right)$ converges to $2\varpi^2\lambda'D^+_{p+q}\left(I\otimes I\right)D^{+\prime}_{p+q}\lambda>0$. Thus (13) holds.

Note that $x_{n,t}$ is a measurable function of $u_{t-[a/b_n]},\ldots,u_{t+[a/b_n]}$. Note also that, by an argument already used in the proof of Lemma 1, A5″ implies $\left\{\alpha_u(k)\right\}^{\frac{\nu}{8+\nu}}=O\left(k^{-(r+1)}\right)$, from which we deduce
$$\sum_{k=1}^\infty k\left\{\alpha_u(k)\right\}^{1/2}=O\left(\sum_{k=1}^\infty k^{1-\frac{(r+1)(8+\nu)}{2\nu}}\right)=O(1).$$
Thus (15) holds with $T_n=2[a/b_n]$, $\alpha(\cdot)=\alpha_u(\cdot)$, $r^*=1$ and $\nu^*=2$. Clearly, $\liminf nb_n^{1/\kappa}>0$ implies $T_n=O(n^\kappa)$. Therefore (14) holds. Finally, Lemma 1 entails (i).

To show (ii), write
$$a_{2,n}=\sqrt{nb_n}\sum_{|i|\le[a/b_n]}\left\{\omega(ib_n)-1\right\}\mathrm{vech}\,\Delta_i-\sqrt{nb_n}\sum_{|i|>[a/b_n]}\mathrm{vech}\,\Delta_i.$$
Since $\|\Delta_i\|\le K\|u_t\|^2_{8+\nu}\left\{\alpha_u(i)\right\}^{\frac{6+\nu}{8+\nu}}$ and $\{\alpha_u(i)\}$ is a decreasing sequence, Assumption A5″ implies that $\|\Delta_i\|=O\left(i^{-(r+1)(1+6/\nu)}\right)$. Therefore
$$\sqrt{nb_n}\sum_{|i|>[a/b_n]}\left\|\mathrm{vech}\,\Delta_i\right\|=O\left(\sqrt{nb_n^{2r+1}}\right)=O\left(\sqrt{nb_n^{2r_0+1}}\right)=o(1).$$
Now
$$\sqrt{nb_n}\left\|\sum_{|i|\le[a/b_n]}\left\{\omega(ib_n)-1\right\}\mathrm{vech}\,\Delta_i\right\|\le\sqrt{nb_n^{2r_0+1}}\sum_{|i|\le[a/b_n]}\left|\frac{\omega(ib_n)-1}{(ib_n)^{r_0}}\right|\,i^{r_0}\left\|\mathrm{vech}\,\Delta_i\right\|=o(1),$$
in view of the Lebesgue theorem, $\|I^{(r_0)}\|<\infty$, $\lim nb_n^{2r_0+1}=0$, and since the function $(\omega(x)-1)x^{-r_0}$ is bounded. Hence (ii) is shown and the proof is complete. □

8 Conclusion

In this paper we have proposed a test of strong linearity in the framework of weak ARMA models. We have derived the asymptotic distribution of the test statistic under the null hypothesis and we have shown the consistency of the test. The usefulness of this test is as follows. When the null hypothesis is not rejected, there is no evidence against standard strong ARMA models; in this case, there is no reason to think that the ARMA predictions are not optimal in the least-squares sense. When the null hypothesis is rejected, two different strategies can be considered. A weak ARMA model can be fitted, following the lines of Francq and Zakoïan (1998, 2000), and used for optimal linear prediction. Alternatively, a nonlinear model can be fitted to provide the (nonlinear) optimal prediction, though it can be a difficult task to determine the most appropriate nonlinear model. Finally, we believe that the asymptotic distribution of the HAC estimator established in this paper is of independent interest, apart from the proposed test. Other assumptions on the ACM could be tested, such as noninvertibility, which may indicate some misspecification of the model.

FOOTNOTES

1. For ease of presentation we have not included a constant in the ARMA model. This can be done without altering the asymptotic behaviour of the estimators and test statistics introduced in the paper. The analysis then applies to data that have been adjusted by subtraction of the mean.
2. When the strong mixing coefficients decrease at an exponential rate (which is the case for a large class of processes), $\nu$ can be chosen arbitrarily small in A5 or A5′. Thus $E\epsilon_t^{4+2\nu}<\infty$ is a mild assumption.

3. Indeed, we have
$$\sum_i\hat\Delta_i(\hat\theta_n)=n^{-1}\left\{\sum_te_t(\hat\theta_n)\,\frac{\partial}{\partial\theta}e_t(\hat\theta_n)\right\}\left\{\sum_te_t(\hat\theta_n)\,\frac{\partial}{\partial\theta}e_t(\hat\theta_n)\right\}'=0,$$
the last equality following from the first-order condition defining the LSE.

4. Recent papers by Kiefer and Vogelsang (2002a, 2002b) have suggested the use of kernels with bandwidth equal to the sample size, i.e. $b_n=1/n$ in our notations. It is well known that this choice of bandwidth results in inconsistent long-run variance estimators. However, Kiefer and Vogelsang have shown that, because the long-run variance matrix plays the role of a nuisance parameter, asymptotically valid (nuisance-parameter-free) test statistics can be constructed based on such inconsistent estimators. In our framework it is of course crucial to have a consistent estimator of the matrix $I$.

5. $D_m$ transforms $\mathrm{vech}(A)$ into $\mathrm{vec}(A)$, for any symmetric $m\times m$ matrix $A$.

6. As noted by Andrews (1991), for the rectangular kernel, $\omega^{(r)}=0$ for all $r\ge0$. For the Bartlett kernel, $\omega^{(1)}=1$ and $\omega^{(r)}=\infty$ for all $r>1$. For the Parzen and Tukey–Hanning kernels, $\omega^{(2)}=6$ and $\pi^2/4$, respectively, with $\omega^{(r)}=0$ for $r<2$ and $\omega^{(r)}=\infty$ for $r>2$.
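The characteristic exponents $\omega^{(r)}=\lim_{x\to0}(1-\omega(x))/|x|^r$ quoted in footnote 6 can be checked numerically. The sketch below is our own illustration; the closed-form kernel definitions are the standard ones (see e.g. Andrews, 1991), evaluated near the origin.

```python
import numpy as np

kernels = {
    "rectangular":   lambda x: 1.0 if abs(x) <= 1 else 0.0,
    "bartlett":      lambda x: max(0.0, 1.0 - abs(x)),
    "parzen":        lambda x: (1 - 6*x**2 + 6*abs(x)**3 if abs(x) <= 0.5
                                else 2 * (1 - abs(x))**3 if abs(x) <= 1
                                else 0.0),
    "tukey-hanning": lambda x: ((1 + np.cos(np.pi * x)) / 2
                                if abs(x) <= 1 else 0.0),
}
x = 1e-4                      # small argument, near the origin
for name, w in kernels.items():
    for r in (1, 2):
        ratio = float((1 - w(x)) / x ** r)
        print(f"{name:13s} r={r}:  (1-w(x))/|x|^r = {ratio:10.4f}")
# rectangular -> 0 for all r; bartlett: 1 at r=1, diverges at r=2;
# parzen: 6 at r=2; tukey-hanning: pi^2/4 ~ 2.4674 at r=2.
```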
References

Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.

Berlinet, A. & C. Francq (1999) Estimation des covariances entre autocovariances empiriques de processus multivariés non linéaires. La Revue Canadienne de Statistique 27, 1–22.

Billingsley, P. (1995) Probability and Measure. Wiley, New York.

Brockwell, P.J. & R.A. Davis (1991) Time Series: Theory and Methods. Springer-Verlag, New York.

Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

Davydov, Y.A. (1968) Convergence of distributions generated by stationary stochastic processes. Theory of Probability and its Applications 13, 691–696.

de Jong, R.M. & J. Davidson (2000) Consistency of kernel estimators of heteroskedastic and autocorrelated covariance matrices. Econometrica 68, 407–424.

Francq, C. & J.M. Zakoïan (1998) Estimating linear representations of nonlinear processes. Journal of Statistical Planning and Inference 68, 145–165.

Francq, C. & J.M. Zakoïan (2000) Covariance matrix estimation for estimators of mixing weak ARMA models. Journal of Statistical Planning and Inference 83, 369–394.

Francq, C., Roy, R. & J.M. Zakoïan (2004) Goodness-of-fit tests for ARMA models with uncorrelated errors. Technical report CRM-2925, Centre de recherches mathématiques, Université de Montréal.

Gallant, A.R. & H. White (1988) A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.

Hansen, B.E. (1992) Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.

Hansen, L.P. (1982) Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Harville, D.A. (1997) Matrix Algebra From a Statistician's Perspective. Springer-Verlag, New York.

Herrndorf, N. (1984) A functional central limit theorem for weakly dependent sequences of random variables. The Annals of Probability 12, 141–153.

Hong, Y. (1996) Consistent testing for serial correlation of unknown form. Econometrica 64, 837–864.

Hong, Y. (1999) Testing for serial independence via the empirical characteristic function. Working paper, Department of Economics, Cornell University.

Ibragimov, I.A. (1962) Some limit theorems for stationary processes. Theory of Probability and its Applications 7, 349–382.

Kiefer, N.M. & T.J. Vogelsang (2002a) Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory 18, 1350–1366.

Kiefer, N.M. & T.J. Vogelsang (2002b) Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica 70, 2093–2095.

Lobato, I.N. (2002) Testing for zero autocorrelation in the presence of statistical dependence. Econometric Theory 18, 730–743.

Magnus, J.R. & H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New York.

May, R.M. (1976) Simple mathematical models with very complicated dynamics. Nature 261, 459–467.

Newey, W.K. & K.D. West (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.

Parzen, E. (1957) On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28, 329–348.

Pham, D.T. (1986) The mixing property of bilinear and generalized random coefficient autoregressive models. Stochastic Processes and their Applications 23, 291–300.

Phillips, P.C.B. (1987) Time series regression with a unit root. Econometrica 55, 277–301.

Phillips, P.C.B., Sun, Y. & S. Jin (2003) Consistent HAC estimation and robust regression testing using sharp origin kernels with no truncation. Discussion paper, Yale University.

Priestley, M.B. (1981) Spectral Analysis and Time Series, Vols. 1 and 2. Academic Press, New York.

Rio, E. (1993) Covariance inequalities for strongly mixing processes. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 29, 587–597.

Romano, J.P. & L.A. Thombs (1996) Inference for autocorrelations under weak assumptions. Journal of the American Statistical Association 91, 590–600.

Van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge University Press, Cambridge.

Withers, C.S. (1981) Central limit theorems for dependent variables. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57, 509–534.