
HAC Estimation and Strong Linearity Testing
in Weak ARMA Models
Christian Francq
Université Lille III, GREMARS
E-mail: [email protected]
Jean-Michel Zakoïan
Université Lille III, GREMARS
and CREST
E-mail: [email protected]
Running head: HAC Estimation in Weak ARMA Models.
Corresponding Author: Jean-Michel Zakoïan
CREST, MK1, 3 Avenue Pierre Larousse, 92245 Malakoff Cedex France
E-mail: [email protected]
Phone number: 33(0)141176840
Fax: 33(0)141176840.
Abstract
The paper develops a procedure for testing the null hypothesis that the errors of an
ARMA model are independent and identically distributed against the alternative
that they are uncorrelated but not independent. The test statistic is based on the
difference between a conventional estimator of the asymptotic covariance matrix
of the least-squares estimator of the ARMA coefficients and its robust HAC-type
version. The asymptotic distribution of the HAC estimator is established under
the null hypothesis of independence, and under a large class of alternatives. The
asymptotic distribution of the proposed statistic is shown to be a standard chi-square under the null, and a noncentral chi-square under the alternatives. The choice of the HAC estimator is discussed through a local power analysis. An automatic bandwidth selection method is also considered. The finite sample properties
of the test are analyzed via Monte Carlo simulation.
1 Introduction
Standard statistical inference for time series relies on fitting ARMA models through model selection and estimation, followed by model criticism via significance tests and diagnostic checks of the adequacy of the fitted
model. The large sample distributions of the statistics involved in the so-called
Box-Jenkins methodology have been established under the assumption of independent white noise. Departure from this assumption can severely alter the large
sample distributions of standard statistics, such as the empirical autocovariances
(Romano and Thombs (1996) and Berlinet and Francq (1999)), the estimators of
the ARMA parameters (Francq and Zakoïan (1998)) or the portmanteau statistics
(Lobato (2002), Francq, Roy and Zakoïan (2004)).
Unfortunately, ARMA models endowed with an independent white noise sequence, referred to as strong ARMA models, are often found to be unrealistic in
economic applications. On the other hand, weak ARMA models (i.e. models in which the noise is required neither to be independent nor to be a martingale difference) have attracted increasing interest in the recent statistical and econometric literature. As we will see, relaxing the strong assumptions on the noise allows for great generality. Indeed, a large variety of well-known nonlinear processes admit (weak) ARMA
representations. Other examples of weak ARMA processes are obtained from usual
transformations (such as aggregation) of strong ARMA processes.
It is therefore crucial, at the model criticism stage, to be able to detect departure from the key assumption that the white noise disturbances are independent.
Apart from the technical reasons already mentioned, one important motivation for
testing this assumption is to find out whether or not the estimated model captures
the whole dynamics of the series. Only optimal linear predictions can be deduced
from non-strong ARMA models. Thus, when the strong ARMA assumption is
rejected, improvements can be expected from using nonlinear models.
In this paper we propose a test for white noise independence in ARMA models.
An informal presentation of the test is as follows; precise notations and assumptions are introduced in the next section. Let θ denote the vector of the ARMA coefficients, and let θ̂n denote the standard Least-Squares Estimator (LSE) of θ for a sample of size n. Then, under appropriate conditions, √n(θ̂n − θ0), where θ0 is the true parameter value, converges in distribution to a N(0, Σ(1)) when the ARMA is strong. When the noise is only uncorrelated, the asymptotic distribution turns out to be of the form N(0, Σ(2)). The aim of the paper is to test
H0 : Σ(1) = Σ(2) ,
against
H1 : Σ(1) ≠ Σ(2).
The Asymptotic Covariance Matrices (ACM) Σ(1) and Σ(2) coincide in the case of a strong ARMA process, but they will differ, in general, in the weak situation.
It should be noted that H0 is not equivalent to independence of the noise. Hypothesis H0 is the consequence of independence that matters as far as the asymptotic precision of the LS estimator is concerned.
The test proposed in this paper is built on the difference between consistent estimators of the two ACM. Scaled in an appropriate way, this difference exhibits two different behaviors: it converges to a nondegenerate distribution if the ARMA process is strong, and to infinity otherwise.
The matrix Σ(2) is a function of a matrix of the form lim_{n→∞} (1/n) Σ_{s,t=1}^{n} Cov(Vt, Vs), which is also 2π times the spectral density matrix of the multivariate process (Vt)
evaluated at frequency zero. Many papers in the statistical and econometric literatures deal with estimating such ’long-run variance’ matrices. Examples include
the estimation of the optimal weighting matrix for Generalized Method of Moments estimators (Hansen, 1982), the estimation of the covariance matrix of the error
term in unit root tests (Phillips, 1987), the estimation of the asymptotic variance
of sample autocovariances of nonlinear processes (Berlinet and Francq, 1999); other
important econometric contributions include Newey and West (1987), Gallant and
White (1988), Andrews (1991), Hansen (1992), de Jong and Davidson (2000).
However, the asymptotic distribution of the ACM estimators is seldom considered in the literature dealing with HAC estimation. See Phillips, Sun and Jin (2003, Theorem 2) for the asymptotic distribution of a HAC estimator in the
framework of robust regression. The main difficulty of the present paper is to
derive the asymptotic distributions of the HAC estimator of Σ(2) , under both assumptions of weak and strong ARMA. Those distributions are needed to construct
the asymptotic critical region of our test and to derive its local asymptotic power.
The paper is organized as follows. Section 2 introduces the notion of weak
ARMA representations. Examples treated explicitly are the first component of a
strong bivariate MA(1) model and a chaotic process. Section 3 presents notations
and briefly overviews results concerning the asymptotic behaviour of the LSE in the
weak ARMA framework. Section 4 establishes the asymptotic distribution of the
above-mentioned HAC estimator. Section 5 introduces the test statistic and derives
its asymptotic properties under the null of a strong ARMA model, and under the alternative of a weak ARMA model. In Section 6, the choice of the HAC estimator is discussed
through a local power analysis and an automatic bandwidth selection method. The
finite sample performance of the tests is studied. Proofs and additional notations
are in Section 7. Section 8 concludes.
2 Weak ARMA models
It is not very surprising that a number of nonlinear processes admit an ARMA representation. Indeed, the Wold theorem states that any purely nondeterministic, second-order stationary process (Xt) can be represented by an infinite MA
representation. When the infinite MA polynomial is obtained as the ratio of two
finite-order polynomials, the nonlinear process also admits a finite-order ARMA
representation. This representation is weak because the noise is only the linear
innovation of (Xt ) (otherwise (Xt ) would be a linear process). This distinction
has important consequences in terms of prediction. The predictors obtained from
a weak ARMA model are only optimal in the linear sense. They are particularly useful if the nonlinear dynamics of (Xt) are difficult to identify, which is often the case given the variety of nonlinear models. It is also crucial to take into account
the differences between weak and strong ARMA models in the different steps of
the methodology of Box and Jenkins. When the noise is only uncorrelated, the
tools provided by the standard Box-Jenkins methodology can be quite misleading.
Recent papers cited in the introduction have been devoted to correcting such tools and have attempted to develop tests and methods that allow working with a broad class of ARMA models. It is, however, important to know, in practical situations, when these new methods are required and when the classical ones are reliable. Needless to say, the latter procedures are simpler to use, in particular because they are widely implemented in standard statistical software. It is
precisely the purpose of the present paper to develop a test of the reliability of
such procedures.
Examples of ARMA representations of bilinear processes, Markov switching
processes, threshold processes, processes obtained by temporal aggregation of a
strong ARMA, or by linear combination of independent ARMA processes can be
found in Romano and Thombs (1996), Francq and Zakoïan (1998, 2000). We will
give two new illustrations which have their own interest. To construct a simple
example of weak ARMA model obtained by transformation of a strong ARMA
process, let us consider the following bivariate MA(1) model

    X1,t = η1,t + b11 η1,t−1 + b12 η2,t−1,
    X2,t = η2,t + b21 η1,t−1 + b22 η2,t−1,
where (η1,t, η2,t)′ is a centered iid sequence with covariance matrix (ξij). It is easy to see that the first component of the strong bivariate MA(1) model satisfies a univariate MA(1) model of the form X1,t = ǫt + θǫt−1, where θ ∈ (−1, 1) is such that

    θ/(1 + θ²) = E(X1,t X1,t−1)/E(X1,t²) = (ξ11 b11 + ξ12 b12) / {ξ11(1 + b11²) + ξ22 b12² + 2 ξ12 b11 b12}.
From ǫt + θǫt−1 = η1,t + b11 η1,t−1 + b12 η2,t−1, we find

    ǫt = (b11 − θ)(η1,t−1 − θη1,t−2) + b12 (η2,t−1 − θη2,t−2) + Rt,

where Rt is centered and independent of X1,t−1. Therefore

    E(X1,t−1² ǫt) = (b11 − θ){(1 − θb11²) Eη1,t³ − θb12² E(η1,t η2,t²) − 2θb11 b12 E(η1,t² η2,t)}
                  + b12 {(1 − θb11²) E(η1,t² η2,t) − θb12² Eη2,t³ − 2θb11 b12 E(η1,t η2,t²)}.

The random variable X1,t−1² belongs to σ{ǫs, s < t}. It can be seen that, in general, E(X1,t−1² ǫt) = E[X1,t−1² E(ǫt | ǫt−1, . . .)] ≠ 0. Thus, (ǫt) is not a martingale difference, and the MA(1) representation of X1,t is only weak.
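This weak MA(1) representation can be checked by simulation. The sketch below uses hypothetical coefficient values and a skewed iid noise (centered chi-square, so that third moments do not vanish); it verifies that the autocovariances of the first component vanish beyond lag one and recovers the invertible θ from the moment equation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b11, b12 = 0.6, 0.4                      # hypothetical MA coefficients

# centered iid noise with nonzero third moment (chi-square(1) minus its mean)
eta = rng.chisquare(1, size=(n + 1, 2)) - 1.0
X1 = eta[1:, 0] + b11 * eta[:-1, 0] + b12 * eta[:-1, 1]

# sample autocovariances of the first component
g0 = np.mean(X1 * X1)
g1 = np.mean(X1[1:] * X1[:-1])
g2 = np.mean(X1[2:] * X1[:-2])

# gamma(h) = 0 for h >= 2: X1 behaves as an MA(1) in the weak sense
assert abs(g2) / g0 < 0.02

# solve theta/(1 + theta^2) = gamma(1)/gamma(0) for the root |theta| < 1
rho = g1 / g0
theta = (1 - np.sqrt(1 - 4 * rho**2)) / (2 * rho)
assert 0.4 < theta < 0.6                 # true value is about 0.49 here
```

The recovered θ is the one appearing in the weak MA(1) representation; the dependence of (ǫt) itself shows up only through higher-order moments.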
Now we will give an example based on a chaotic process (see May (1976)). Let

    ǫt = ut + ηt,    ut = 4ut−1(1 − ut−1),    t ≥ 1,    (1)
where u0 has the arcsine density f(x) = π−1{x(1 − x)}−1/2 on the interval [0, 1], and (ηt)t≥1 is an iid sequence, independent of u0, with mean −1/2 and finite variance.
Since f is the invariant density of (ut ), this process is stationary. We have Eǫt = 0
and, since ut and 1 − ut have the same law,

    Cov(ǫt, ǫt−1) = Cov(ut, ut−1) = Cov{4ut−1(1 − ut−1), ut−1} = Cov(ut, 1 − ut−1) = 0.
The same symmetry argument shows that Cov(ǫt, ǫt−h) = 0 for all h ≠ 0. Therefore (ǫt) is a white noise. Consequently, given {ǫu, u ≤ t}, the best linear predictor of ǫt+h is equal to zero, for any horizon h. However, in general, the best (nonlinear) predictor is quite different. For illustration, Figure 1 displays the scatter plot of the pairs (ǫt, ǫt−2), for t = 1, . . . , 1,000, obtained by simulation, and the nonlinear regression obtained by tedious computation, in the case where ηt is uniformly distributed over [−0.6, −0.4].
Figure 1: Scatter plot of 1,000 pairs (ǫt , ǫt−2 ), simulated from (1) with ηt uniformly
distributed over [−0.6, −0.4]. The full line is the theoretical (nonlinear) regression of ǫt
on ǫt−2 .
This example illustrates the possibly dramatic differences between linear and nonlinear predictions of a given weak ARMA process.
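A small simulation of (1) makes the point concrete (a sketch under the assumptions of the example; the tolerances are arbitrary): the sample autocorrelations of (ǫt) are negligible, yet ǫt remains correlated with ǫ²t−1 (for this map, Cov(ǫt, ǫ²t−1) = Cov(ut, u²t−1) = −1/32), so a nonlinear predictor strictly improves on the zero linear predictor.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
u = np.empty(n + 1)
u[0] = rng.beta(0.5, 0.5)                 # u_0 from the arcsine = Beta(1/2, 1/2) law
for t in range(1, n + 1):
    u[t] = 4.0 * u[t - 1] * (1.0 - u[t - 1])   # logistic map at parameter 4
eta = rng.uniform(-0.6, -0.4, size=n + 1)      # mean -1/2, as in the example
eps = u + eta

# (eps_t) is a white noise: sample autocorrelations are negligible ...
for h in (1, 2, 3):
    assert abs(np.mean(eps[h:] * eps[:-h])) / np.var(eps) < 0.02

# ... but eps_t is nonlinearly predictable: Cov(eps_t, eps_{t-1}^2) = -1/32
cov = np.mean(eps[1:] * eps[:-1] ** 2) - eps.mean() * np.mean(eps**2)
assert -0.045 < cov < -0.02
```

(The floating-point iteration of the logistic map only shadows an exact orbit, but its empirical moments approximate the invariant arcsine law well at this horizon.)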
3 Notations and preliminary asymptotic results
In this section we introduce the main notations and recall some results established
in Francq and Zakoïan (1998, 2000). Let X = (Xt)t∈Z be a real second-order stationary ARMA(p, q) process such that, for all t ∈ Z,

    Xt + φ1 Xt−1 + · · · + φp Xt−p = ǫt + ψ1 ǫt−1 + · · · + ψq ǫt−q.    (2)
We consider the estimation of parameter θ = (θ1 , . . . , θp+q )′ ∈ Rp+q with true
value θ0 = (φ1, . . . , φp, ψ1, . . . , ψq)′. Let Φθ(z) = 1 + θ1 z + · · · + θp z^p and Ψθ(z) = 1 + θp+1 z + · · · + θp+q z^q be the AR and MA polynomials.¹ For any δ > 0, define the compact set

    Θδ = {θ ∈ Rp+q : the zeros of the polynomials Φθ(z) and Ψθ(z) have moduli ≥ 1 + δ}.
We make the following assumptions.
A1. ǫ = (ǫt) is a strictly stationary sequence of uncorrelated random variables with zero mean and variance σ² > 0, defined on some probability space (Ω, A, P).
A2. θ0 belongs to the interior of Θδ , and the polynomials Φθ0 (z) and Ψθ0 (z) have
no zero in common.
A3. φp and ψq are not both equal to zero (by convention φ0 = ψ0 = 1).
For all θ ∈ Θδ, let ǫt(θ) = Ψθ⁻¹(B)Φθ(B)Xt, where B denotes the backshift operator. Given a realization X1, X2, . . . , Xn of X, the ǫt(θ) can be approximated, for 0 < t ≤ n, by et(θ) = Ψθ⁻¹(B)Φθ(B)(Xt 1{1≤t≤n}).
The random variable θ̂n is called the Least Squares Estimator (LSE) if it satisfies, almost surely,

    Qn(θ̂n) = min_{θ∈Θδ} Qn(θ),  where  Qn(θ) = (1/2n) Σ_{t=1}^{n} et²(θ).    (3)
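As an illustration, the truncated residuals et(θ) and the criterion Qn can be computed recursively for an MA(1). The sketch below (pure NumPy; the sample size, true parameter and grid are hypothetical choices) minimizes Qn over a grid of ψ values.

```python
import numpy as np

def residuals_ma1(x, psi):
    # truncated residuals: e_t = x_t - psi * e_{t-1}, with e_0 = 0
    e = np.empty_like(x)
    prev = 0.0
    for t in range(len(x)):
        prev = x[t] - psi * prev
        e[t] = prev
    return e

def Qn(x, psi):
    # criterion (3): Q_n(theta) = (1/2n) sum_t e_t(theta)^2
    e = residuals_ma1(x, psi)
    return np.sum(e**2) / (2 * len(x))

rng = np.random.default_rng(1)
n, psi0 = 5000, 0.5
eps = rng.standard_normal(n + 1)
x = eps[1:] + psi0 * eps[:-1]            # strong MA(1) sample

grid = np.linspace(-0.9, 0.9, 361)       # crude grid search over Theta_delta
psi_hat = grid[np.argmin([Qn(x, p) for p in grid])]
assert abs(psi_hat - psi0) < 0.05
```

A grid search is used only to keep the sketch dependency-free; any numerical optimizer over Θδ would do.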
Define the strong ARMA assumption:
A4. The process X is a solution of Model (2) where the random variables ǫt are independent and identically distributed (iid).
In the statistical literature on ARMA models, most results are obtained under A4. Less restrictive hypotheses rely on the α-mixing (strong mixing) coefficients {αǫ(h)}h≥0 of (ǫt), or {αX(h)}h≥0 of (Xt). Let us consider the assumptions:
A5. The process X is a solution of Model (2) where Σ_{h=0}^{∞} {αǫ(h)}^{ν/(2+ν)} < ∞, for some ν > 0.

A5′. The process X is a solution of Model (2) where Σ_{h=0}^{∞} {αX(h)}^{ν/(2+ν)} < ∞, for some ν > 0.
It is well known that A5 and A5′ are not equivalent. Pham (1986) and Carrasco and Chen (2002) have shown that, for a wide class of processes, the mixing conditions
A5 and/or A5′ are satisfied. Let

    J(θ) = lim_{n→∞} (∂²/∂θ∂θ′) Qn(θ)  a.s.,    I(θ) = lim_{n→∞} Var{√n (∂/∂θ) Qn(θ)}.
The following theorem gives already-established results concerning the asymptotic behaviour of the LSE. The symbol →d denotes convergence in distribution as the sample size n goes to infinity.
Theorem 1 Assume that A1–A3 hold.
If Eǫt² < ∞ and A4 holds, then θ̂n is strongly consistent and

    √n (θ̂n − θ0) →d N(0, Σ(1)),  where  Σ(1) = σ² J⁻¹(θ0).    (4)

If E ǫt^{4+2ν} < ∞ and either A4, A5 or A5′ holds, then θ̂n is strongly consistent and

    √n (θ̂n − θ0) →d N(0, Σ(2)),  where  Σ(2) = J⁻¹(θ0) I(θ0) J⁻¹(θ0).    (5)
Obviously the moment assumptions on ǫ could be replaced by the same assumptions on X. The convergence under A4 is standard (see e.g. Brockwell and Davis (1991)). Francq and Zakoïan (1998) established (5) under A5′. It can be shown that the result remains valid under A5. Note that the finiteness of Eǫt⁴ is required for the existence of Σ(2).²
Remarks
(a) Straightforward computations show that

    I(θ) = Σ_{i=−∞}^{+∞} ∆i(θ),  where  ∆i(θ) = E{ut(θ) u′t+i(θ)}  and  ut(θ) = ǫt(θ) (∂ǫt/∂θ)(θ).
(b) Under A4 the asymptotic variances Σ(1) and Σ(2) are clearly equal. However, it may also be the case that Σ(1) = Σ(2) under A5 or A5′. More precisely, we have

    Σ(1) = Σ(2)  ⇔  I(θ0) = σ² J(θ0)  ⇔  Σ_{i=−∞}^{+∞} E{ut(θ0) u′t+i(θ0)} = σ² E{(∂ǫt/∂θ)(θ0) (∂ǫt/∂θ′)(θ0)},
where all quantities are taken at θ0. In particular, the asymptotic variances are equal when the process ǫt ∂ǫt/∂θ is a white noise and ǫt² is uncorrelated with (∂ǫt/∂θ)(∂ǫt/∂θ′).
In the case of a martingale difference the former condition holds but, in general, the latter condition does not. More precisely, when (ǫt) is a martingale difference, we have

    Σ(1) − Σ(2) = E{ǫt² (∂ǫt/∂θ)(∂ǫt/∂θ′)} − E(ǫt²) E{(∂ǫt/∂θ)(∂ǫt/∂θ′)}.

For example, if the model is an AR(1) with true value θ0 = 0, then ∂ǫt/∂θ = Xt−1 = ǫt−1. Thus we have Σ(1) − Σ(2) = Cov(ǫt², ǫt−1²), which is not equal to zero in the presence of ARCH-type conditional heteroskedasticity. When (ǫt) is not a martingale difference, the sequence ǫt ∂ǫt/∂θ is not uncorrelated in general and the difference
between the matrices Σ(1) and Σ(2) can be substantial, as illustrated in Figure 2.
Interestingly, it is also seen in this example that Σ(1) − Σ(2) is not always positive definite. Therefore, for some linear combinations of the ARMA parameters, a better asymptotic accuracy may be obtained when the noise is weak than when it is strong. The same remark was made by Romano and Thombs (1996) on another example.
Figure 2: Σ(2)(1, 1)/Σ(1)(1, 1) as a function of b ∈ [−0.9, 0.9] and p ∈ [0.1, 1] for the model

    Xt + aXt−1 = ǫt + bǫt−1,    ǫt = ηt + (c − 2c∆t)ηt−1,    ∀t ∈ Z,

where a = −0.5 and c = 1, (ηt) is an iid N(0, 1) sequence, (∆t) is a stationary Markov chain, independent of (ηt), with state space {0, 1} and transition probabilities p = P(∆t = 1 | ∆t−1 = 0) = P(∆t = 0 | ∆t−1 = 1) ∈ (0, 1). It can be shown that (ǫt) is a white noise.
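A quick simulation of the noise used in Figure 2 (a sketch with an arbitrary switching probability p and sample size) confirms the white-noise claim, while revealing the dependence of the sequence: (ǫt) is uncorrelated at all lags tried, but (ǫt²) is clearly autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, p = 100_000, 1.0, 0.3

eta = rng.standard_normal(n + 1)
# symmetric two-state Markov chain Delta_t with switching probability p
flips = (rng.random(n) < p).astype(int)
delta = np.empty(n + 1, dtype=int)
delta[0] = rng.integers(2)
for t in range(1, n + 1):
    delta[t] = delta[t - 1] ^ flips[t - 1]
eps = eta[1:] + (c - 2 * c * delta[1:]) * eta[:-1]

# eps is uncorrelated at every lag tried ...
for h in (1, 2, 3):
    assert abs(np.mean(eps[h:] * eps[:-h])) / np.var(eps) < 0.02
# ... but not independent: eps_t^2 is correlated with eps_{t-1}^2
c2 = np.corrcoef(eps[1:] ** 2, eps[:-1] ** 2)[0, 1]
assert abs(c2) > 0.05
```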
Consistent estimation of the matrix J = J(θ0) = E{(∂ǫt/∂θ)(θ0) (∂ǫt/∂θ′)(θ0)} involved in (4) and (5) is straightforward, for example by taking

    Ĵ = Ĵn(θ̂n),  where  Ĵn(θ) = (1/n) Σ_{t=1}^{n} (∂et/∂θ)(θ) (∂et/∂θ′)(θ).    (6)
Estimation of the matrix I = I(θ0 ) is a much more intricate problem and is
the object of the next section.
4 Asymptotic distribution of the HAC estimator of the covariance matrix I
The formula displayed in Remark (a) of Theorem 1 motivates the introduction of
a HAC estimator of I of the general form

    Î = În(θ̂n) = Σ_{i=−∞}^{+∞} ω(i bn) ∆̂i(θ̂n),    (7)
involving the ARMA residuals et(θ̂n) through the functions

    ∆̂i(θ) = ∆̂′−i(θ) = (1/n) Σ_{t=1}^{n−i} et(θ) (∂et/∂θ)(θ) et+i(θ) (∂et+i/∂θ′)(θ)  for 0 ≤ i < n,

and ∆̂i(θ) = 0 for i ≥ n.
In (7), ω(·) is a kernel function belonging to the set K defined below, and bn is a size-dependent bandwidth parameter. When i is large relative to n, ∆i(θ0) is in general poorly estimated by ∆̂i(θ̂n), which is based on too few observations. Consistency of Î therefore requires that the weights ω(i bn) be close to one for small i and close to zero for large i. In particular, the naive estimator Σ_{i=−∞}^{+∞} ∆̂i(θ̂n) is inconsistent.³ The set K is defined by

    K = {ω(·) : R → R | ω(0) = 1, ω(·) is bounded, even, has a compact support [−a, a] and is continuous on [−a, a]}.    (8)
Various kernels ω(·) belonging to K are available for use in (7). Standard examples are the rectangular kernel ω(x) = 1[−1,1](x), the Bartlett kernel ω(x) = (1 − |x|) 1[−1,1](x), the Parzen kernel ω(x) = (1 − 6x² + 6|x|³) 1[0,1/2](|x|) + 2(1 − |x|)³ 1(1/2,1](|x|), and the Tukey-Hanning kernel ω(x) = {1/2 + cos(πx)/2} 1[−1,1](x).
The properties of kernel functions have been extensively studied in the time series literature (see e.g. Priestley (1981)). For a given kernel ω(·) in K, let ̟² = ∫ ω²(x) dx. Our asymptotic normality result on Î requires⁴

    lim_{n→∞} bn = 0,    lim_{n→∞} n bn⁴ = +∞.    (9)
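For concreteness, the sketch below implements the HAC estimator (7) with the Bartlett kernel for an MA(1) evaluated at the true parameter (the bandwidth b = n^{-1/3}, the sample size and the parameter value are arbitrary choices; b satisfies (9)). Under the strong ARMA assumption, I = σ²J, so the ratio Î/(σ̂²Ĵ) should be close to one.

```python
import numpy as np

def bartlett(x):
    return np.where(np.abs(x) <= 1, 1 - np.abs(x), 0.0)

def hac(u, b, kernel=bartlett):
    # u: (n, d) array of u_t = e_t * grad e_t; returns sum_i w(i b) Delta_hat_i
    n, d = u.shape
    I = u.T @ u / n                          # Delta_hat_0
    for i in range(1, n):
        w = kernel(i * b)
        if w == 0.0:
            break                            # compact-support kernel: stop here
        D = u[:-i].T @ u[i:] / n             # Delta_hat_i (divided by n, not n - i)
        I += w * (D + D.T)
    return I

rng = np.random.default_rng(2)
n, psi = 20_000, 0.5
eps = rng.standard_normal(n + 1)
x = eps[1:] + psi * eps[:-1]                 # strong MA(1)

# residuals e_t and gradients d_t = de_t/dpsi at the true parameter
e = np.empty(n); d = np.empty(n)
e_prev = d_prev = 0.0
for t in range(n):
    e_t = x[t] - psi * e_prev
    d_t = -e_prev - psi * d_prev
    e[t], d[t] = e_t, d_t
    e_prev, d_prev = e_t, d_t
u = (e * d)[:, None]

b = n ** (-1 / 3)                            # a simple admissible bandwidth
I_hat = hac(u, b)[0, 0]
J_hat = np.mean(d * d)
s2 = np.mean(e * e)
assert abs(I_hat / (s2 * J_hat) - 1) < 0.2   # strong case: I = sigma^2 J
```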
We denote by A ⊗ B the Kronecker product of two matrices A and B; vec A denotes the vector obtained by stacking the columns of A, and vech A denotes the vector obtained by stacking the diagonal and subdiagonal elements of A (see e.g. Harville (1997) for more details about these matrix operators). Let Dm denote the m² × m(m + 1)/2 duplication matrix,⁵ and let D+m = (D′m Dm)⁻¹ D′m. Thus D+m vec(A) = vech(A) when A = A′.
4.1 Strong ARMA
Our first result is stated in the following theorem.
Theorem 2 Let A1–A3 hold and let (9) be satisfied. If E ǫt^{4+ν} < ∞ and A4 holds, then Î defined by (7), with ω(·) ∈ K, is consistent and

    √(nbn) vech(Î − I) →d N(0, 2̟² D+p+q (I ⊗ I) D+′p+q).
Next, we give a corresponding result for the estimator of the difference between the ACM under the two sets of assumptions, Σ(1) − Σ(2), upon which the test of the next section will be based.
Let

    Σ̂(1) = σ̂² Ĵ⁻¹,    Σ̂(2) = Ĵ⁻¹ Î Ĵ⁻¹,    σ̂² = Qn(θ̂n).
Theorem 3 Let the assumptions of Theorem 2 hold. Then

    √(nbn) vech(Σ̂(2) − Σ̂(1)) →d N(0, 2̟² D+p+q (Σ(1) ⊗ Σ(1)) D+′p+q) := N(0, Λ).
The asymptotic distribution of √(nbn) vech(Σ̂(2) − Σ̂(1)) is nondegenerate. The non-singularity of Σ(1) results from Theorem 1, and the determinant of Λ is, up to a constant, equal to the determinant of Σ(1) raised to the power (p + q)² + 1 (see Magnus and Neudecker (1988), Theorem 3.14). Moreover, we have

    Λ⁻¹ = (2σ⁴̟²)⁻¹ D′p+q (J ⊗ J) Dp+q = (2̟²)⁻¹ D′p+q (Σ(1) ⊗ Σ(1))⁻¹ Dp+q.    (10)
Explicit expressions for the asymptotic covariance matrices appearing in Theorems 1 and 3 can be obtained for the MA(1) and AR(1) models.
Corollary 1 Let Xt = ǫt + ψ1 ǫt−1, where (ǫt) is iid(0, σ²), σ² > 0 and |ψ1| < 1. Then Theorem 1 holds with Σ(1) = Σ(2) = 1 − ψ1² and

    √(nbn) (Σ̂(2) − Σ̂(1)) →d N(0, 2̟² (1 − ψ1²)²).

The same results hold for the stationary solution of the AR(1) model Xt + φ1 Xt−1 = ǫt with φ1 = ψ1 and under the same noise assumptions.
4.2 Weak ARMA
Next we state additional assumptions, on the kernel at zero and on ut(θ0), needed to obtain asymptotic results for the HAC estimator when the ARMA is not strong. Following Parzen (1957), define

    ω(r) = lim_{x→0} {1 − ω(x)}/|x|^r,    for r ∈ [0, +∞).
The largest exponent r such that ω(r) exists and is finite characterizes the smoothness of ω(·) at zero.⁶ Define the matrix

    I(r) = Σ_{i=−∞}^{∞} |i|^r ∆i(θ0),    for r ∈ [0, +∞).
Let

    ut = ut(θ0) = ǫt (∂ǫt/∂θ)(θ0) := (ut(1), . . . , ut(p + q))′.
Denote by κℓ1,...,ℓ8(0, j1, . . . , j7) the eighth-order cumulant of (ut(ℓ1), ut+j1(ℓ2), . . . , ut+j7(ℓ8)) (see Brillinger (1981), p. 19), where ℓ1, . . . , ℓ8 are positive integers less than p + q and j1, . . . , j7 are integers. We consider the following assumptions.
A6. E(ǫt¹⁶) < ∞ and, for all ℓ1, . . . , ℓ8,

    Σ_{j1=−∞}^{+∞} · · · Σ_{j7=−∞}^{+∞} |κℓ1,...,ℓ8(0, j1, . . . , j7)| < ∞.
A7. For some r0 ∈ (0, +∞),

    lim_{n→∞} n bn^{2r0+1} = γ ∈ [0, +∞),    ω(r0) < ∞    and    ‖I(r0)‖ < ∞.
The following result is a consequence of Andrews (Theorem 1, 1991) (and partly
of Francq and Zakoïan (1998, 2000)):
Theorem 4 Let A1–A3 and A6–A7 hold with γ > 0. Then, for ω(·) ∈ K,

    lim_{n→∞} nbn E[{vec(Î − I)}′{vec(Î − I)}] = ω(r0)² γ {vec I(r0)}′{vec I(r0)} + 2̟² tr{Dp+q D+p+q (I ⊗ I)}.    (11)
Note that this result is not in contradiction with Theorem 2. Indeed, under A4, I(r) = 0 for r > 0, so the first term on the right-hand side of (11) vanishes.
As noted by Andrews (1991), it seems likely that Assumption A6 could be replaced by mixing plus moment conditions, such as Assumption A5 or A5′ together with the moment condition E ǫt^{16+ν} < ∞. We will do so in the next theorem.
We introduce the following mixing and moment assumptions on the process
u = (ut ).
A5′′. A7 holds with γ = 0, ‖ut‖8+ν < ∞ and Σ_{h=0}^{∞} h^r {αu(h)}^{ν/(8+ν)} < ∞, for some ν > 0 and some r such that r ≥ 2, r > (3ν − 8)/(8 + ν) and r ≥ r0.
The extensions of Theorems 2 and 3 to weak ARMA models can be formulated as follows.

Theorem 5 Let A1–A3 and A5′′ hold. Assume there exists κ < 1/6 such that lim inf_{n→∞} n bn^{1/κ} > 0. Then the convergence in distribution of Theorem 2 holds, and that of Theorem 3 becomes

    √(nbn) vech(Σ̂(2) − Σ̂(1) − Σ(2) + Σ(1)) →d N(0, 2̟² D+p+q (Σ(2) ⊗ Σ(2)) D+′p+q).
To prove this theorem we searched for a CLT for a triangular array (xn,t) (see Equation (41) below) such that xn,t is a measurable function of ut−Tn, ut−Tn+1, . . . , ut+Tn, for some mixing process (ut), with Tn = [a/bn] → ∞. Denote by αn(·) the mixing coefficients of (xn,t)t. It seems that the existing CLTs (see e.g. Withers (1981)) require conditions on supn αn(·), or other conditions which are difficult to check in our framework. We therefore establish the following lemma, which can be viewed as a direct extension of the CLT given by Herrndorf (1984) to the case of a nonstationary sequence (xt), and which may be of independent interest.
Lemma 1 Let (xn,t)n≥1,1≤t≤n be a triangular array of centered real-valued random variables. For each n ≥ 2, let αn(h), h = 1, . . . , n − 1, be the strong mixing coefficients of xn,1, . . . , xn,n, defined by

    αn(h) = sup_{1≤t≤n−h} sup_{A∈An,t, B∈Bn,t+h} |P(A ∩ B) − P(A)P(B)|,

where An,t = σ(xn,u : 1 ≤ u ≤ t) and Bn,t = σ(xn,u : t ≤ u ≤ n). By convention, we set αn(h) = 1/4 for h ≤ 0 and αn(h) = 0 for h ≥ n. Put Sn = Σ_{t=1}^{n} xn,t and assume that

    sup_{n≥1} sup_{1≤t≤n} ‖xn,t‖2+ν* < ∞  for some ν* ∈ (0, ∞],    (12)

    lim_{n→∞} n⁻¹ Var Sn = σ² > 0,    (13)

that there exists a sequence of integers (Tn) such that

    Tn = O(n^κ)  for some κ ∈ [0, ν*/{4(1 + ν*)}),    (14)

and a sequence {α(h)}h≥1 such that

    αn(h) ≤ α(h − Tn)  for all h > Tn,    (15)

    Σ_{h=1}^{∞} h^{r*} {α(h)}^{ν*/(2+ν*)} < ∞  for some r* > 2κ(1 + ν*)/{ν* − 2κ(1 + ν*)}.    (16)

Then

    n^{−1/2} Sn →d N(0, σ²).
Theorem 5 applies for kernels which are very smooth at zero. Indeed, the conditions lim_{n→∞} n bn^{2r0+1} = 0 and lim inf_{n→∞} n bn⁶ ≠ 0 imply r0 > 5/2. The following theorem shows that this smoothness condition can be weakened when moment assumptions are added. The proof is similar to that of Theorem 5 and is therefore skipped.
Theorem 6 Let A1–A3 and A5′′ hold with ν = ∞. Assume there exists κ < 1/4 such that lim inf_{n→∞} n bn^{1/κ} > 0. Then the convergences in distribution of Theorem 5 hold.
The results of this section will now be used to derive the asymptotic level of
our test statistic.
5 Testing adequacy of the standard asymptotic distribution
The results of Section 3 show that the asymptotic variances of the LSE under strong
and weak assumptions can be dramatically different. Standard statistical routines
estimate the asymptotic variance corresponding to strong ARMA models and it is
of importance to know if the resulting tests (or confidence intervals) are reliable.
The aim of the present section is therefore to test the assumptions presented in the
introduction, which we recall for convenience:
H0 : Σ(1) = Σ(2),    against    H1 : Σ(1) ≠ Σ(2).
It should be clear that under both assumptions the ARMA model is well-specified.
In particular, the case of serial correlation of (ǫt ) is not considered. A statistic
derived from Theorem 3 is

    Υn = nbn {vech(Σ̂(2) − Σ̂(1))}′ Λ̂⁻¹ {vech(Σ̂(2) − Σ̂(1))},    (17)
where Λ̂⁻¹ is any consistent estimator of Λ⁻¹. In view of (10) we can take

    Λ̂⁻¹ = (2σ̂⁴̟²)⁻¹ D′p+q (Ĵ ⊗ Ĵ) Dp+q,

which does not require any matrix inversion. Then we have

    Υn = nbn (2σ̂⁴̟²)⁻¹ {vec(Σ̂(2) − Σ̂(1))}′ (Ĵ ⊗ Ĵ) {vec(Σ̂(2) − Σ̂(1))}.
But since

    {vec Σ̂(1)}′ (Ĵ ⊗ Ĵ) {vec Σ̂(1)} = σ̂⁴ {vec Ĵ⁻¹}′ vec Ĵ = σ̂⁴ tr(Ĵ⁻¹Ĵ) = σ̂⁴ (p + q),

    {vec Σ̂(1)}′ (Ĵ ⊗ Ĵ) {vec Σ̂(2)} = σ̂² {vec Ĵ⁻¹}′ vec Î = σ̂² tr(Ĵ⁻¹Î),

    {vec Σ̂(2)}′ (Ĵ ⊗ Ĵ) {vec Σ̂(2)} = {vec Ĵ⁻¹ Î Ĵ⁻¹}′ vec Î = tr{(Ĵ⁻¹Î)²},
we get

    Υn = nbn (2σ̂⁴̟²)⁻¹ [σ̂⁴ (p + q) − 2σ̂² tr(Ĵ⁻¹Î) + tr{(Ĵ⁻¹Î)²}],

and therefore, denoting by Ip+q the (p + q) × (p + q) identity matrix, we obtain

    Υn = {nbn/(2̟²)} tr{(Ip+q − σ̂⁻² Ĵ⁻¹Î)²}.    (18)

Note that when the ARMA is strong, σ̂⁻² Ĵ⁻¹Î converges to the identity matrix.
The next result, which is a straightforward consequence of Theorem 3 and (17), provides a critical region of asymptotic level α ∈ (0, 1).

Theorem 7 Let the assumptions of Theorem 2, in particular H0, hold. Then

    lim_{n→∞} P{Υn > χ²_{(p+q)(p+q+1)/2, 1−α}} = α,

where χ²_{r,α} denotes the α-quantile of the χ²_r distribution.
The following theorem gives conditions for the consistency of our test.

Theorem 8 Assume that A1–A3 and A5 (or A5′) hold and that E|Xt|^{4+2ν} < ∞. Let ω(·) ∈ K, and let (bn) satisfy lim_{n→∞} bn = 0 and lim_{n→∞} n bn^{4+10/ν} = +∞. Then, under H1, we have

    lim_{n→∞} P{Υn > χ²_{(p+q)(p+q+1)/2, 1−α}} = 1,

for any α ∈ (0, 1).
The test procedure is quite simple. For a given time series it consists in: (i) fitting an ARMA(p, q) model (after an identification step for the orders p and q, which is not the subject of this paper); (ii) estimating the matrices I and J by (7) and (6); (iii) computing the statistic Υn by (18) and rejecting H0 when Υn > χ²_{(p+q)(p+q+1)/2, 1−α}. The choice of the bandwidth and kernel used to define the estimator of the matrix I will be discussed in the next section.
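The three steps can be sketched end-to-end for an MA(1), where p + q = 1 and the statistic is compared with the χ²₁ critical value 3.84. This is an illustrative sketch only: the Bartlett kernel is used, the bandwidth is an arbitrary admissible choice, and for simplicity ψ is estimated from the lag-one autocorrelation rather than by full least squares.

```python
import numpy as np

def upsilon_ma1(x, b):
    """Statistic (18) for a fitted MA(1), Bartlett kernel (varpi^2 = 2/3)."""
    n = len(x)
    # moment estimator of psi from rho(1) = psi / (1 + psi^2) (shortcut, not the LSE)
    rho = np.mean(x[1:] * x[:-1]) / np.mean(x * x)
    psi = (1 - np.sqrt(1 - 4 * rho**2)) / (2 * rho)
    e = np.empty(n); d = np.empty(n)
    e_prev = d_prev = 0.0
    for t in range(n):                      # e_t = x_t - psi e_{t-1}, d_t = de_t/dpsi
        e_t = x[t] - psi * e_prev
        d_t = -e_prev - psi * d_prev
        e[t], d[t] = e_t, d_t
        e_prev, d_prev = e_t, d_t
    u = e * d
    J = np.mean(d * d); s2 = np.mean(e * e)
    I = np.mean(u * u)                      # Delta_hat_0
    i = 1
    while i * b < 1:                        # Bartlett-weighted lags, as in (7)
        I += 2 * (1 - i * b) * np.sum(u[:-i] * u[i:]) / n
        i += 1
    return n * b / (2 * (2 / 3)) * (1 - I / (s2 * J)) ** 2

rng = np.random.default_rng(5)
n, psi0 = 20_000, 0.5
b = n ** (-1 / 3)

eps_strong = rng.standard_normal(n + 1)
x_strong = eps_strong[1:] + psi0 * eps_strong[:-1]

eta = rng.standard_normal(n + 2)
eps_weak = eta[1:] * eta[:-1]               # weak white noise: uncorrelated, dependent
x_weak = eps_weak[1:] + psi0 * eps_weak[:-1]

U_strong, U_weak = upsilon_ma1(x_strong, b), upsilon_ma1(x_weak, b)
# chi-square_1 5% critical value is 3.84: H0 is rejected for the weak model
assert U_weak > 3.84
assert U_weak > U_strong
```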
Remarks
(a) To our knowledge, this is the first test designed for the purpose of distinguishing between weak and strong ARMA models. In other words, this statistic allows one to test whether the error term of an ARMA model is independent or simply uncorrelated.
(b) Our test is related to other tests recently introduced in the time series literature. Some of them are goodness-of-fit tests (which is not the case of ours, since under both H0 and H1 the ARMA model (2) is well-specified). Let us mention Hong (1996), who proposes a test, based on a kernel-based spectral density estimator, for uncorrelatedness of the residuals of AR models with exogenous variables. Its asymptotic distribution is established under the null assumption of independent errors, and the test is consistent against serial correlation of unknown form. In the framework of ARMA models, Francq, Roy and Zakoïan (2004) propose a modification of the standard portmanteau test for serial correlation, when the errors are only assumed to be uncorrelated (not independent) under the null hypothesis. The test of the present paper can be viewed as complementary to those goodness-of-fit tests.
Another approach very closely related to ours is taken by Hong (1999), who proposes a nonparametric test for serial independence based on a generalization of the spectral density. Hong's test has power against various types of pairwise dependencies, including cases of absence of correlation, and is aimed at detecting any departure from independence. Our test has a more limited scope since it is devoted to ARMA models. Moreover, it is aimed at testing one consequence of independence, the one that matters for the reliability of the standard routines. Another difference between the two approaches is that Hong's test is designed to detect departure from independence of the observed process. It cannot be straightforwardly applied to our framework because, even under the null hypothesis of independence, the residuals are dependent.
6 Choice of the bandwidth and kernel, and finite sample performances of the test
To make the test procedure fully operational, it is necessary to specify how to choose the kernel and the bandwidth parameter. To this aim we first consider two standard asymptotic local efficiency criteria, respectively derived from the Bahadur and Pitman approaches. The reader is referred to Van der Vaart (1998) for details concerning the asymptotic local efficiency of tests.
6.1 Bahadur's approach
In view of (18), under the assumptions of Theorem 8,

    (nbn)⁻¹ Υn → (2̟²)⁻¹ tr{(Ip+q − σ⁻² J⁻¹I)²}    (19)

in probability as n → ∞. Let Υn(1) and Υn(2) be two test statistics of the form Υn, with respective kernels ω1 and ω2. The p-value of the tests can be approximated by 1 − Fχ²_{(p+q)(p+q+1)/2}(Υn(i)), where Fχ²_k denotes the cumulative distribution function of the χ²_k distribution. Assume we are under the alternative. Let {n1(n)}n and {n2(n)}n be two sequences of integers tending to infinity such that

    lim_{n→∞} log{1 − Fχ²_{(p+q)(p+q+1)/2}(Υ(1)n1(n))} / log{1 − Fχ²_{(p+q)(p+q+1)/2}(Υ(2)n2(n))} = 1  a.s.
One can say that, for large n, the two tests require respectively n1 (n) and n2 (n)
observations to reach the same (log) p-value. The Bahadur asymptotic relative
(1)
efficiency (ARE) of Υn
(2)
with respect to Υn
23
(1)
(2)
is defined by ARE(Υn , Υn ) =
limn→∞ n2 (n)/n1 (n). To make the comparison meaningful we use the same bandwidth for the two tests. Assume that bn = cn−ν , c > 0, ν ∈ (0, 1). Using
n
o
1/(1−ν)
(1)
(2)
log 1 − Fχ2 (x) ∼ −x/2 as x → ∞, we obtain ARE(Υn , Υn ) = ̟22 /̟12
,
k
R
where ̟i2 = ωi2 (x)dx. Thus, one can consider that the asymptotic superiority of
the first test over the second test hold if ̟12 < ̟22 . In this sense, it is easy to see that
the tests based on the truncated, Bartlett, Tukey-Hanning and Parzen kernels are
ranked in an increasing order of asymptotic efficiency. A similar argument shows
that, when the kernel is fixed and ν varies, the Bahadur efficiency decreases when ν
increases. Thus, in the Bahadur sense, there is no optimal choice of bn : the slower
bn tends to zero, the asymptotically more efficient the tests are. Unfortunately,
this result gives no indication of how to choose the bandwidth parameter in
finite samples. If bₙ tends to zero too slowly and/or if ϖᵢ² is too small, the finite-sample
bias of Î is likely to be large, and the rate of convergence in (19) is
likely to be very slow.
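The quantities ϖᵢ² that drive this comparison are elementary integrals and can be checked numerically. The following sketch (illustrative code, not part of the paper; it assumes the standard kernel definitions of Andrews, 1991) evaluates ϖ² = ∫ω²(x)dx for the four kernels by a midpoint rule:

```python
import numpy as np

# Standard kernel definitions (Andrews, 1991); all vanish outside [-1, 1].
def truncated(x):
    return np.where(np.abs(x) <= 1.0, 1.0, 0.0)

def bartlett(x):
    return np.where(np.abs(x) <= 1.0, 1.0 - np.abs(x), 0.0)

def parzen(x):
    a = np.abs(x)
    return np.where(a <= 0.5, 1.0 - 6.0 * a**2 + 6.0 * a**3,
                    np.where(a <= 1.0, 2.0 * (1.0 - a)**3, 0.0))

def tukey_hanning(x):
    return np.where(np.abs(x) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * x)), 0.0)

# varpi^2 = integral of omega^2(x) dx over [-1, 1], midpoint rule.
m = 200000
h = 2.0 / m
x = -1.0 + (np.arange(m) + 0.5) * h
for name, ker in [("truncated", truncated), ("Bartlett", bartlett),
                  ("Parzen", parzen), ("Tukey-Hanning", tukey_hanning)]:
    varpi2 = float(np.sum(ker(x) ** 2) * h)
    print(f"{name:14s} varpi^2 = {varpi2:.4f}")
```

The exact values are 2, 2/3, ≈ 0.539 and 3/4, respectively; in the Bahadur sense, a smaller ϖ² yields a more efficient test.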
6.2 Pitman’s approach
Another popular approach used to compare the asymptotic local powers of tests is
that of Pitman. Consider local alternatives of the form $H_{1n}: I=\sigma^2J+\Delta/\sqrt{nb_n}$,
where ∆ ≠ 0 is a symmetric positive definite matrix. Alternatively, one could formulate
these alternatives as $H_{1n}: \Sigma^{(2)}=\Sigma^{(1)}+J^{-1}\Delta J^{-1}/\sqrt{nb_n}$. Under standard
assumptions, Σ̂⁽²⁾ − Σ̂⁽¹⁾ is a regular estimator of Σ⁽²⁾ − Σ⁽¹⁾ (see Van der Vaart
(1998), Section 8.5). Therefore, in view of Theorem 5, under H₁ₙ,
$$\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)\ \rightsquigarrow\ \mathcal N\left(\mathrm{vech}\,J^{-1}\Delta J^{-1},\,\Lambda\right).$$
It follows that, under H₁ₙ,
$$\Upsilon_n\ \rightsquigarrow\ \chi^2_{(p+q)(p+q+1)/2}\left(\left\{\mathrm{vech}\,J^{-1}\Delta J^{-1}\right\}'\Lambda^{-1}\,\mathrm{vech}\,J^{-1}\Delta J^{-1}\right),$$
where $\chi^2_k(\delta)$ denotes the noncentral $\chi^2_k$ distribution with noncentrality parameter δ.
It follows that the Pitman asymptotic local power increases with the noncentrality
parameter. In view of (10), we draw the same conclusion as for the Bahadur approach: tests with small ̟i2 are preferred. Amazingly, the asymptotic distribution
of Υn does not depend on the asymptotic behaviour of bn . However the slower bn
tends to zero, the faster H1n tends to H0 . It is therefore preferable to chose bn as
large as possible. This was also our conclusion with the previous approach.
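For illustration (this computation is not in the paper), the asymptotic local power can be evaluated directly from the noncentral χ² distribution. With p = q = 1 the statistic has (p+q)(p+q+1)/2 = 3 degrees of freedom, and for a hypothetical noncentrality δ the power of the 5% test is P{χ²₃(δ) > χ²₃(0.95)}:

```python
from scipy.stats import chi2, ncx2

# Degrees of freedom of the test for an ARMA(1,1): (p+q)(p+q+1)/2.
p, q = 1, 1
k = (p + q) * (p + q + 1) // 2           # = 3

crit = chi2.ppf(0.95, df=k)              # 5% critical value under H0
for delta in [0.5, 1.0, 5.0, 10.0]:      # hypothetical noncentrality values
    power = ncx2.sf(crit, df=k, nc=delta)   # P{ chi2_k(delta) > crit }
    print(f"delta = {delta:4.1f}  asymptotic local power = {power:.3f}")
```

Since Λ is proportional to ϖ², kernels with small ϖᵢ² translate into a larger noncentrality, hence larger local power, in line with the conclusion above.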
6.3 Automatic bandwidth estimators
Since the approaches of the previous sections do not allow us to choose bₙ in practice,
we now turn to an automatic bandwidth method. Andrews (1991) obtained
asymptotically optimal data-dependent automatic bandwidth parameters for HAC
estimators. We will apply his results to our framework. It should be noted, however,
that optimal HAC estimators do not necessarily provide asymptotically optimal
test statistics. This issue is left for further investigation.
Andrews showed that, under the assumption kI (r) k > 0 and other regularity assumptions, the asymptotically optimal bandwidth parameter (leading to an
estimator with the best bias-variance trade-off) is given by
b∗n
−1
=c
{α(r)n}
−1/(2r+1)
,
α(r) =
′ (r) e 2
i
i=1 ei I
Pp+q ′
2 ,
(e
Ie
)
i=1
i i
Pp+q
where (c, r) is equal to (1.1447, 1) for the Bartlett kernel, to (2.6614, 2) for the
Parzen kernel, and to (1.7462, 2) for the Tukey-Hanning kernel, and where ei denotes the i-th vector of the canonical basis of Rp+q . Approximating, for j =
1, . . . , p + q, the dynamics of ǫt ∂ǫt (θ0 )/∂θj by a simple AR(1) model with autoregressive parameter âj and variance parameter σ̂j2 , Andrews obtained the data-
25
dependent estimate b̂∗n = c−1 {α̂(r)n}−1/(2r+1) of b∗n , by setting
α̂(1) =
6.4
Pp+q
i=1
4â2j σ̂j4 (1 − âj )−6 (1 + âj )−2
,
Pp+q 4
−4
i=1 σ̂j (1 − âj )
Finite sample performance
α̂(2) =
Pp+q
2 4
−8
i=1 4âj σ̂j (1 − âj )
.
Pp+q 4
−4
i=1 σ̂j (1 − âj )
To assess the finite sample performance of the tests proposed in this paper, we first
simulate 1,000 replications of several strong ARMA models of size n = 200, n =
500 and n = 800. We consider tests with nominal level α = 5%. We use the
estimated optimal bandwidth given by Andrews, as described previously, and three
different kernels. The relative rejection frequencies are given in Table 1. All the
empirical sizes are less than the nominal 5% level. It seems that the tests are
slightly conservative and that, in terms of control of the type I error, the performances
of the three kernels are very similar.
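The estimated optimal bandwidth used in these experiments can be sketched as follows (a minimal illustration, not the authors' code; `scores` is a hypothetical n × (p+q) array whose j-th column plays the role of ǫ̂ₜ∂ǫ̂ₜ(θ̂ₙ)/∂θⱼ, and the AR(1) fits follow the plug-in formulas of Section 6.3):

```python
import numpy as np

KERNEL_CONST = {"bartlett": (1.1447, 1), "parzen": (2.6614, 2),
                "tukey-hanning": (1.7462, 2)}

def andrews_bandwidth(scores, kernel="bartlett"):
    """AR(1) plug-in bandwidth b*_n = c^{-1} {alpha(r) n}^{-1/(2r+1)}."""
    c, r = KERNEL_CONST[kernel]
    n, _ = scores.shape
    num, den = 0.0, 0.0
    for y in scores.T:
        # Least-squares AR(1) fit: y_t = a y_{t-1} + e_t.
        a = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])
        s2 = np.mean((y[1:] - a * y[:-1]) ** 2)   # innovation variance
        den += s2 ** 2 / (1.0 - a) ** 4
        if r == 1:   # Bartlett case, alpha-hat(1)
            num += 4.0 * a**2 * s2**2 / ((1.0 - a) ** 6 * (1.0 + a) ** 2)
        else:        # Parzen / Tukey-Hanning case, alpha-hat(2)
            num += 4.0 * a**2 * s2**2 / (1.0 - a) ** 8
    alpha = num / den
    return (alpha * n) ** (-1.0 / (2 * r + 1)) / c

# Toy usage on simulated scores:
rng = np.random.default_rng(0)
scores = rng.standard_normal((500, 2))
print(andrews_bandwidth(scores, "bartlett"))
```

The returned value is the data-dependent b̂*ₙ entering the kernel weights ω(ib̂*ₙ).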
Now we turn to experiments under the alternative of non-independent errors.
Five models were considered: i) an AR(1) with GARCH(1,1) errors; ii) the square
of a GARCH(1,1); iii) an ARMA(1,1) with a Markov-Switching white noise; iv)
the first component of a strong bivariate MA(1); v) an AR(1) with a chaotic noise.
Precise specifications are displayed in Table 2. All these examples have been shown
to provide ARMA models with non-independent errors: an ARMA(1,1) for models
ii) and iii), an AR(1) for models i) and v), and an MA(1) for model iv). It should
be emphasized that we only need to estimate the ARMA representation, not the
DGP.
Andrews (1991) showed that, for HAC estimation, the Bartlett kernel is less
efficient than the two other kernels. In these examples, the power of the test
does not appear to be very sensitive to the choice of kernel, and no single
kernel is the most satisfactory in all cases. For each sample
size, the best performance is obtained for the Markov-switching model. For this
model, the tests almost always take the right decision, at least when n ≥ 500.
Slower convergences to 1 are obtained for the powers in models i), iv) and v). The
very slow power convergence in model ii) can be explained as follows. The weak
ARMA(1,1) representation of ǫₜ² is ǫₜ² − 0.97ǫ²ₜ₋₁ = 1 + uₜ − 0.85uₜ₋₁ for some
white noise uₜ. It is seen that the AR and MA parts have close roots, making
statistical inference difficult. As continuous functions of the ARMA estimators,
the estimators of I and J inherit their poor accuracy. Another explanation, which
also holds for the relatively poor performance in model i), is that, in models based
on GARCH errors, the noise is a martingale difference. For this reason, departures
from the strong (iid) assumption can be more difficult to detect than in cases where
the noise is only uncorrelated (as in models iii)-v)).
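To make the nature of these alternatives concrete, the following sketch (illustrative code, not from the paper) simulates the GARCH(1,1) noise of model i) and compares the first-order sample autocorrelations of ǫₜ and ǫₜ²: the former is close to zero (the noise is uncorrelated) while the latter is not (the noise is not independent):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_garch(n, omega=1.0, alpha=0.12, beta=0.85, burn=500):
    """GARCH(1,1): eps_t = sqrt(h_t) eta_t, h_t = omega + alpha eps_{t-1}^2 + beta h_{t-1}."""
    h = omega / (1.0 - alpha - beta)       # start at the stationary variance
    eps = np.empty(n + burn)
    for t in range(n + burn):
        e = np.sqrt(h) * rng.standard_normal()
        eps[t] = e
        h = omega + alpha * e ** 2 + beta * h
    return eps[burn:]

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

eps = simulate_garch(20000)
print("lag-1 ACF of eps  :", round(acf(eps, 1), 3))       # close to 0
print("lag-1 ACF of eps^2:", round(acf(eps ** 2, 1), 3))  # clearly positive
```

The persistence in ǫₜ² is exactly what the HAC-based statistic Υₙ is designed to pick up, whereas a test assuming iid errors would not see it in the ACF of ǫₜ.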
Table 1: Size (in % of relative rejection frequencies) of Υn-tests with estimated
optimal bandwidth. The nominal significance level is α = 5%. The number of
replications is N = 1,000.

                                        Kernel
    Model                  n    Bartlett   Parzen   Tukey-Hanning
    Strong AR(1)¹        200       2.3      1.5         2.2
                         500       2.0      3.8         2.3
                         800       2.9      2.1         2.6
    Strong MA(1)²        200       1.3      3.1         1.3
                         500       1.9      2.7         2.1
                         800       2.2      2.5         3.0
    Strong ARMA(1,1)³    200       1.9      3.0         3.3
                         500       3.2      4.8         3.8
                         800       3.6      4.4         4.0

1: Xt = 0.5Xt−1 + ǫt, ǫt iid Student with ν = 5 degrees of freedom
2: Xt = ǫt + 0.7ǫt−1, ǫt iid with centered exponential density
3: Xt − 0.5Xt−1 = ǫt + 0.7ǫt−1, ǫt iid N(0, 1)
Table 2: Power (in % of relative rejection frequencies) of Υn-tests with estimated
optimal bandwidth. The nominal significance level is α = 5%. The number of
replications is N = 1,000.

                                            Kernel
    Model                      n    Bartlett   Parzen   Tukey-Hanning
    AR(1)-GARCH(1,1)⁴        200      14.1      18.9        19.7
                             500      53.5      50.0        51.0
                             800      74.6      72.2        70.4
    Square of a GARCH(1,1)⁵  200       9.4       9.4         9.7
                             500      23.6      24.9        26.8
                             800      38.8      39.0        36.1
    MS-ARMA(1,1)⁶            200      75.8      81.1        79.8
                             500      98.4      98.4        98.3
                             800      99.9      99.9        99.9
    MA(1) marginal⁷          200      32.6      30.5        36.2
                             500      70.4      73.1        78.8
                             800      86.0      89.4        94.1
    AR(1)⁸                   200      22.5      25.8        21.0
                             500      42.1      46.2        48.9
                             800      53.9      63.3        62.2

4: Xt = 0.5Xt−1 + ǫt, ǫt = √ht ηt, ht = 1 + 0.12ǫ²t−1 + 0.85ht−1, ηt iid N(0, 1)
5: Xt = ǫ²t, where ǫt is as in 4
6: Xt − 0.5Xt−1 = ǫt + 0.7ǫt−1, ǫt = ηt + (1 − 2∆t)ηt−1, (∆t) is a Markov chain with
state space {0, 1} and transition probabilities P(∆t = 1|∆t−1 = 0) = P(∆t = 0|∆t−1 = 1) = 0.01,
ηt iid N(0, 1)
7: X1t = ǫ1t + 0.8ǫ1,t−1 − 0.9ǫ2,t−1, where ǫ1t = η²1t − 1, ǫ2t = η²2t − 1, and
(η1t, η2t)′ ∼ N(0, (1  0.9; 0.9  1))
8: Xt = 0.5Xt−1 + ǫt, where (ǫt) is the noise defined by (1) with ηt ∼ N(−0.5, 0.05²)
7 Proofs

7.1 Additional notations and scheme of proof for Theorem 2
Throughout this section, the letter K (resp. ρ) will be used to denote positive
constants (resp. constants in (0, 1)) whose values are unimportant and may
vary. We will use the norm ‖A‖ = Σ|aᵢⱼ| for any matrix A = (aᵢⱼ). Let
$$u_t(\theta)=\epsilon_t(\theta)\frac{\partial}{\partial\theta}\epsilon_t(\theta),\quad u_t=u_t(\theta_0),\qquad v_t(\theta)=e_t(\theta)\frac{\partial}{\partial\theta}e_t(\theta),\quad v_t=v_t(\theta_0).$$
Hence, for 0 ≤ i < n and for i ≥ 0, respectively,
$$\hat\Delta_i(\theta)=\hat\Delta_{-i}'(\theta)=\frac1n\sum_{t=1}^{n-i}v_t(\theta)v_{t+i}'(\theta),\qquad \Delta_i(\theta)=\Delta_{-i}'(\theta)=E\{u_t(\theta)u_{t+i}'(\theta)\}.$$
We set, for 0 ≤ i < n,
$$\hat\Delta_i^u(\theta)=\left\{\hat\Delta_{-i}^u(\theta)\right\}'=\frac1n\sum_{t=1}^{n-i}u_t(\theta)u_{t+i}'(\theta),\qquad \hat I_n^u(\theta)=\sum_{i=-[a/b_n]}^{[a/b_n]}\omega(ib_n)\hat\Delta_i^u(\theta),$$
assuming without loss of generality that a/bₙ ≤ n. Similarly, we can write
$\hat\Delta_i(\theta)=\hat\Delta_i^v(\theta)$ and $\hat I_n(\theta)=\hat I_n^v(\theta)$. In the notation of all those quantities the
parameter θ will be removed when equal to θ₀.
It will also be convenient to modify the number of terms taken into acˆ u (θ0 ) = ∆
ˆ u . Define, for i ∈ N = {0, 1, . . . },
count in the definition of ∆
i
i
[a/bn ]
n
∆ui
=
(∆u−i )′
1X
ut u′t+i ,
=
n t=1
29
Inu
=
X
i=−[a/bn ]
ω(ibn )∆ui ,
and the centered matrix
u
X
I n = Inu − ∆u0 =
ω(ibn )∆ui =
0<|i|≤[a/bn ]
It will be shown that
√
X
0<i≤[a/bn ]
ω(ibn ) {∆ui + (∆ui )′ } .
n
o
√
√
u
nbn Iˆ − I = nbn Iˆnv (θ̂n ) − I and nbn I n
have the same asymptotic distribution (see Lemma 9 below).
P
Define the sequences (ck,ℓ )k by ∂θ∂ ℓ ǫt = ∞
k=1 ck,ℓ ǫt−k . Since a central limit
theorem for r-dependent sequences will be used, it will be useful to consider
the following truncated variables. For any positive integer r, let
r
∂ X
ǫt =
ck,ℓ ǫt−k ,
r ∂θℓ
r ut
= ǫt
k=1
∂ ǫt ,
r ∂θ
r
ut = ut − r ut ,
(20)
and for m < k
$${}_r\Delta^u_{i,(m,k)}=\left\{{}_r\Delta^u_{-i,(m,k)}\right\}'=\frac{1}{k-m+1}\sum_{t=m}^{k}{}_ru_t\,{}_ru_{t+i}'\quad(i\in\mathbb N),\qquad {}_r\bar I^u_{n,(m,k)}=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\,{}_r\Delta^u_{i,(m,k)}.$$
When m = 1 and k = n we will write ${}_r\Delta^u_{i,(m,k)}={}_r\Delta^u_i$ and ${}_r\bar I^u_{n,(m,k)}={}_r\bar I^u_n$.
By a standard argument, we will obtain the limit distribution of $\sqrt{nb_n}\,\bar I^u_n$
by taking the limit as r → ∞ of the asymptotic distribution, as n → ∞, of
$\sqrt{nb_n}\,{}_r\bar I^u_n$. We also denote
$${}_rJ=E\left\{{}_r\frac{\partial}{\partial\theta}\epsilon_t\,{}_r\frac{\partial}{\partial\theta'}\epsilon_t\right\}.$$
Now note that ${}_r\bar I^u_n=\sum_{t=1}^{n}Z_{n,t}$, where
$$Z_{n,t}=\frac1n\sum_{0<i\le[a/b_n]}\omega(ib_n)\left\{{}_ru_t\,{}_ru_{t+i}'+{}_ru_{t+i}\,{}_ru_t'\right\}.$$
The process $(Z_{n,t})_t$ is mₙ-dependent, with $m_n=[ab_n^{-1}]+r$. Next we split the sum
$\sum_{t=1}^{n}Z_{n,t}$ into alternate blocks of length kₙ − mₙ and mₙ (with a remaining
block of size n − pₙkₙ + mₙ):
$${}_r\bar I^u_n={}_rS_n+\sum_{\ell=0}^{p_n-2}\left(Z_{n,(\ell+1)k_n-m_n+1}+\cdots+Z_{n,(\ell+1)k_n}\right)+Z_{n,p_nk_n-m_n+1}+\cdots+Z_{n,n},\qquad(21)$$
$${}_rS_n=\sum_{\ell=0}^{p_n-1}\left(Z_{n,\ell k_n+1}+\cdots+Z_{n,(\ell+1)k_n-m_n}\right)=\frac{k_n-m_n}{n}\sum_{\ell=0}^{p_n-1}{}_r\bar I^u_{n,(\ell k_n+1,(\ell+1)k_n-m_n)},$$
where kₙ is an integer in (mₙ, n) to be specified later and pₙ = [n/kₙ], the
integer part of n/kₙ (assuming that n is sufficiently large, so that mₙ ≤ n).
It will be shown that, when mₙ = o(kₙ − mₙ) and pₙ → ∞, the asymptotic
distributions of $\sqrt{nb_n}\,{}_r\bar I^u_n$ and $\sqrt{nb_n}\,{}_rS_n$ are identical (see part (b) of Lemma
13).
To avoid moment assumptions of excessive order, we then introduce variables that are truncated in level. For any positive constant κ and for m < k,
let
r
∂ κ X
ǫt =
ck,ℓ ǫκt−k ,
∂θ
r
ℓ
k=1
ǫκt = ǫt 1{|ǫt |≤κ} − Eǫt 1{|ǫt |≤κ} ,
κ
κ
r ut = ǫt
let r ∆u,κ
i,(m,k) (resp.
′
r ut r ut+i
∂ κ
ǫt ,
r ∂θ
u,κ
r I n,(m,k) )
κ
r ut
be the matrix obtained by replacing the variables
′
by r uκt r uκt+i in r ∆ui,(m,k) (resp.
r Sn
κ
= r ut − r uκt ,
u
r I n,(m,k) )
and let
pn −1
kn − mn X u,κ
=
r I n,(ℓkn +1,ℓkn +kn −mn ) .
n
ℓ=0
31
√
We will show that, when κn → ∞, the asymptotic distributions of nbn r Sn
√
and nbn r Snκn are identical (see part (c) of Lemma 13). The Lindeberg
central limit theorem will be used to show the asymptotic normality of
√
nbn r Snκn .
7.2 Lemmas and proofs for Theorem 2
In all subsequent lemmas, the assumptions of Theorem 2 are supposed to
be satisfied. The first four lemmas are concerned with some fourth-order
properties of the process (uₜ).

Lemma 2 Let k ∈ N* = {1, 2, ...}, t₁, t₂, ..., t_k ∈ Z = {..., −1, 0, 1, ...}
and i₁, i₂, ..., i_k ∈ N*. If the indexes t₁, t₂, ..., t_k are all distinct and the
indices i₁, i₂, ..., i_k are all less than or equal to 4, then
$$E\left|\prod_{j=1}^{k}\epsilon_{t_j}^{i_j}\right|\le M_k<\infty.$$
Proof. Arguing by induction, it suffices to note that $E|X\epsilon_t^i|=E|X|\,E|\epsilon_t^i|\le E|X|\max_{j\in\{1,\dots,4\}}E|\epsilon_t|^j<\infty$
when X and ǫₜ are independent, E|X| < ∞, i ∈ {1, ..., 4} and Eǫₜ⁴ < ∞. □
Lemma 3 For all i, j, h ∈ Z, the matrix Cov{u₁ ⊗ u₁₊ᵢ, u₁₊ₕ ⊗ u₁₊ₕ₊ⱼ} is
well defined in $\mathbb R^{(p+q)^2}\times\mathbb R^{(p+q)^2}$ and is bounded in norm by a constant independent
of i, j and h.

Proof. For ℓ ∈ {1, ..., p + q}, recall that uₜ(ℓ) = ǫₜ∂ǫₜ(θ₀)/∂θℓ is the ℓ-th
element of uₜ. Assumption A2 entails that $\partial\epsilon_t(\theta_0)/\partial\theta_\ell=\sum_{k\ge1}c_{k,\ell}\epsilon_{t-k}$, where
|c_{k,ℓ}| < Kρᵏ. Therefore we have, for all ℓ₁, ℓ₂, ℓ₃, ℓ₄ ∈ {1, ..., p + q},
$$E\left|u_1(\ell_1)u_{1+i}(\ell_2)u_{1+h}(\ell_3)u_{1+h+j}(\ell_4)\right|\le K\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}E\prod_{j=1}^{8}|\epsilon_{t_j}|\qquad(22)$$
with t₁ := 1 ≠ t₂ := 1 − i₁, t₃ := 1 + i ≠ t₄ := 1 + i − i₂, t₅ := 1 + h ≠ t₆ :=
1 + h − i₃ and t₇ := 1 + h + j ≠ t₈ := 1 + h + j − i₄. Since at most four of
the indices t₁, ..., t₈ are equal, Lemma 2 shows that the right-hand side of
(22) is bounded. The proof follows. □
Lemma 4 For i, h ∈ Z = {..., −1, 0, 1, ...},
$$\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+i}\}=\begin{cases}0&\text{when }ih\ne0,\\O(\rho^{|h|})&\text{when }i=0,\\\sigma^4J\otimes J+O(\rho^{|i|})&\text{when }h=0.\end{cases}$$
Proof. We keep the notations of the proof of Lemma 3. First note that, for
ℓ₁, ℓ₂, ℓ₃, ℓ₄ ∈ {1, ..., p + q}, the {(p+q)(ℓ₁−1)+ℓ₂}-th row and {(p+q)(ℓ₃−1)+ℓ₄}-th
column element of the previous covariance matrix (resp. of J ⊗ J) is
$$\mathrm{Cov}\left\{u_1(\ell_1)u_{1+i}(\ell_2),\,u_{1+h}(\ell_3)u_{1+h+i}(\ell_4)\right\}\qquad(23)$$
(resp. J(ℓ₁, ℓ₃)J(ℓ₂, ℓ₄)).
The case ih ≠ 0 is obvious by noting that, in (23), one of the u's has an
index that is strictly greater than the other indexes.
To deal with the case i = 0, without loss of generality suppose that h > 0.
The covariance in (23) is given by
$$\sigma^2\sum_{i_1,i_2,i_3,i_4\ge1}c_{i_1,\ell_1}c_{i_2,\ell_2}c_{i_3,\ell_3}c_{i_4,\ell_4}\,\mathrm{Cov}\left\{\epsilon^2_1\epsilon_{1-i_1}\epsilon_{1-i_2},\,\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}\right\}.$$
When i₄ < h, $\mathrm{Cov}\{\epsilon^2_1\epsilon_{1-i_1}\epsilon_{1-i_2},\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}\}=0$ because when i₃ < h,
$\epsilon^2_1\epsilon_{1-i_1}\epsilon_{1-i_2}$ and $\epsilon_{1+h-i_3}\epsilon_{1+h-i_4}$ are independent, and when i₃ ≥ h this covariance
is given by $E\epsilon^2_1\epsilon_{1-i_1}\epsilon_{1-i_2}\epsilon_{1+h-i_3}\,E\epsilon_{1+h-i_4}=0$. Therefore
$$\left|\mathrm{Cov}\left\{u_1(\ell_1)u_1(\ell_2),\,u_{1+h}(\ell_3)u_{1+h}(\ell_4)\right\}\right|\le K\sum_{i_4\ge h}c_{i_4,\ell_4}\le K\rho^h.$$
Now we consider the case h = 0. For i ≠ 0, (23) is given by
$$Eu_1(\ell_1)u_1(\ell_3)u_{1+i}(\ell_2)u_{1+i}(\ell_4)=Eu_1(\ell_1)u_1(\ell_3)\,Eu_{1+i}(\ell_2)u_{1+i}(\ell_4)+\nabla=\sigma^4J(\ell_1,\ell_3)J(\ell_2,\ell_4)+\nabla,$$
where $\nabla=\mathrm{Cov}\{u_1(\ell_1)u_1(\ell_3),u_{1+i}(\ell_2)u_{1+i}(\ell_4)\}=O(\rho^{|i|})$, as already proven. □
Lemma 5 For |i| ≠ |j|,
$$\sum_{h=-\infty}^{+\infty}\left\|\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+j}\}\right\|\le K\rho^{|i|}\rho^{|j|}.\qquad(24)$$
Proof. Without loss of generality, assume that i ≥ 0 and j ≥ 0 (i ≠ j). For
i > 0 and j > 0, all norms in (24) vanish, except perhaps one which is the
sum (over the ℓᵢ's) of
$$\left|\mathrm{Cov}\left\{u_1(\ell_1)u_{1+i}(\ell_2),\,u_{1-j+i}(\ell_3)u_{1+i}(\ell_4)\right\}\right|\le\sum_{i_1,i_2,i_3,i_4\ge1}\left|c_{i_1,\ell_1}c_{i_2,\ell_2}c_{i_3,\ell_3}c_{i_4,\ell_4}\right|\left|E\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon^2_{1+i}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|\le K\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}\left|E\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|.$$
Let $S_i=\{(i_1,i_2,i_3,i_4)\,|\,\max(i_1,i_2,i_3,i_4)\ge i\}$. Clearly,
$$\sum_{S_i}\rho^{i_1+i_2+i_3+i_4}\left|E\epsilon_{1-i_1}\epsilon_1\epsilon_{1+i-i_2}\epsilon_{1-j+i-i_3}\epsilon_{1-j+i}\epsilon_{1+i-i_4}\right|\le K\rho^i.$$
Now for (i₁, i₂, i₃, i₄) not belonging to Sᵢ, the indices of the ǫ's in the expectation
can be ranked as follows:
$$1-i_1<1<\min(1+i-i_2,\,1+i-i_4)\quad\text{and}\quad 1-j+i-i_3<1-j+i.\qquad(25)$$
It is therefore clear that at least one of the ǫ's in the previous expectations
has an index different from the others, making these expectations equal to
0. We conclude that the left-hand side of (24) is bounded by Kρⁱ uniformly
in j > 0, and, by symmetry, by Kρʲ uniformly in i > 0.
For i = 0 and j > 0, the left-hand side of (24) reduces to a sum (over the
ℓᵢ's) of
$$\sum_{h=-\infty}^{-j}\left|\mathrm{Cov}\left\{u_1(\ell_1)u_1(\ell_2),\,u_{1+h}(\ell_3)u_{1+h+j}(\ell_4)\right\}\right|\le K\sum_{h=-\infty}^{-j}\sum_{i_1,i_2,i_3,i_4\ge1}\rho^{i_1+i_2+i_3+i_4}\left|E\epsilon_{1-i_1}\epsilon^2_1\epsilon_{1-i_2}\epsilon_{1+h-i_3}\epsilon_{1+h}\epsilon_{1+h+j-i_4}\epsilon_{1+h+j}\right|.$$
In this sum all terms vanish, except when i₁ = −h or i₂ = −h or i₄ = j (in
which case at least two indices are equal to 1 + h) and when i₁ = −h − j or h =
−j or i₂ = −h − j (in which case at least two indices are equal to 1 + h + j).
Therefore, it can be seen that the left-hand side of (24) is also bounded by
Kρʲ when i = 0. The case j = 0 is handled in the same way, by symmetry.
The conclusion follows. □
Lemma 6
$$\lim_{n\to\infty}nb_n\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=2\sigma^4\varpi^2D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}.$$
Proof. By stationarity of the process (uₜ), and by the elementary relations
$\mathrm{vech}(A)=D^+_{p+q}\mathrm{vec}(A)$, for any (p + q) × (p + q) symmetric matrix A, and
$\mathrm{vec}(u_iu_j')=u_j\otimes u_i$, we have
$$nb_n\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=\sum_{k=1}^{5}\sum_{(i,j,h)\in I_k}A_n(i,j,h),$$
where
$$A_n(i,j,h)=\frac{b_n}{n}\,\omega(ib_n)\omega(jb_n)(n-|h|)\,D^+_{p+q}\mathrm{Cov}\{u_1\otimes u_{1+i},\,u_{1+h}\otimes u_{1+h+j}\}D^{+\prime}_{p+q}$$
and the I_k's are subsets of Z² × {−n + 1, ..., n − 1} defined by
$$I_1=\{i=j,\,h=0\},\quad I_2=\{i=-j=h\ne0\},\quad I_3=\{|i|\ne|j|\},\quad I_4=\{i=j,\,h\ne0\},\quad I_5=\{i=-j,\,h\ne i\ne0\}.$$
In view of Lemma 4,
$$\sum_{(i,j,h)\in I_1}A_n(i,j,h)=\sigma^4b_n\sum_{i=-\infty}^{+\infty}\omega^2(ib_n)\left\{D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}+O(\rho^{|i|})\right\}\ \longrightarrow\ \sigma^4\varpi^2D^+_{p+q}(J\otimes J)D^{+\prime}_{p+q}$$
as n → ∞. We have $\mathrm{Cov}\{u_1\otimes u_{1+i},u_{1+i}\otimes u_1\}=\mathrm{Cov}\{u_1\otimes u_{1+i},u_1\otimes u_{1+i}\}K_{p+q}$, where
$K_{p+q}$ is the (symmetric) commutation matrix such that $K_{p+q}\mathrm{vec}\,A=\mathrm{vec}\,A'$
for any (p + q) × (p + q) matrix A. Thus,
$$\sum_{(i,j,h)\in I_2}A_n(i,j,h)=\sigma^4b_n\sum_{i=-n+1}^{n-1}\frac{n-|i|}{n}\left\{\omega^2(ib_n)D^+_{p+q}(J\otimes J)K_{p+q}D^{+\prime}_{p+q}+O(\rho^{|i|})\right\}\ \longrightarrow\ \sigma^4\varpi^2D^+_{p+q}(J\otimes J)K_{p+q}D^{+\prime}_{p+q}.$$
From Lemma 5,
$$\sum_{(i,j,h)\in I_3}A_n(i,j,h)=O(b_n)\ \longrightarrow\ 0.$$
From Lemma 4,
$$\sum_{(i,j,h)\in I_4}A_n(i,j,h)=O\Big(b_n\sum_{0<|h|<n}\rho^{|h|}\Big)\ \longrightarrow\ 0,\qquad\sum_{(i,j,h)\in I_5}A_n(i,j,h)=0.$$
Therefore
$$\lim_{n\to\infty}nb_n\mathrm{Var}\{\mathrm{vech}\,I^u_n\}=\sigma^4\varpi^2D^+_{p+q}(J\otimes J)\left(I_{(p+q)^2}+K_{p+q}\right)D^{+\prime}_{p+q}.$$
The conclusion follows from the relation
$$\left(I_{(p+q)^2}+K_{p+q}\right)D^{+\prime}_{p+q}=2D_{p+q}D^+_{p+q}D^{+\prime}_{p+q}=2D^{+\prime}_{p+q}$$
(see Magnus and Neudecker, 1988, Theorem 3.12). □
Lemma 7
$$E\left\|\frac{\partial}{\partial\theta'}\left(\mathrm{vec}\,\hat\Delta_i^u\right)\right\|<K\rho^{|i|}+\frac{K}{\sqrt n}.$$
Proof. Because $\hat\Delta_i^u=\hat\Delta_{-i}^{u\prime}$, we will only consider the case 0 ≤ i < n. We
have
$$\frac{\partial}{\partial\theta'}\left(\mathrm{vec}\,\hat\Delta_i^u\right)=\frac1n\sum_{t=1}^{n-i}\left\{\frac{\partial u_{t+i}}{\partial\theta'}\otimes u_t+u_{t+i}\otimes\frac{\partial u_t}{\partial\theta'}\right\}=\frac1n\sum_{t=1}^{n-i}\epsilon_t\frac{\partial\epsilon_{t+i}}{\partial\theta}\frac{\partial\epsilon_{t+i}}{\partial\theta'}\otimes\frac{\partial\epsilon_t}{\partial\theta}+\frac1n\sum_{t=1}^{n-i}\epsilon_t\epsilon_{t+i}\frac{\partial^2\epsilon_{t+i}}{\partial\theta\partial\theta'}\otimes\frac{\partial\epsilon_t}{\partial\theta}+\frac1n\sum_{t=1}^{n-i}\epsilon_{t+i}\frac{\partial\epsilon_{t+i}}{\partial\theta}\otimes\frac{\partial\epsilon_t}{\partial\theta}\frac{\partial\epsilon_t}{\partial\theta'}+\frac1n\sum_{t=1}^{n-i}\epsilon_t\epsilon_{t+i}\frac{\partial\epsilon_{t+i}}{\partial\theta}\otimes\frac{\partial^2\epsilon_t}{\partial\theta\partial\theta'}.$$
Considering the first sum in the right-hand side, we will prove that, for any
ℓ₁, ℓ₂, ℓ₃ ∈ {1, ..., p + q},
$$E\left|\frac1n\sum_{t=1}^{n-i}\epsilon_t\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right|<K\rho^i+\frac{K}{\sqrt n}.\qquad(26)$$
The other sums can be handled in a similar fashion. For i = 0 or i = 1,
(26) holds straightforwardly because the L¹ norm of the term inside the sum
exists. Now, for i > 1 write
$$\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell}={}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell}+{}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_\ell},$$
with the notation in (20). Note that the truncated derivative, i.e. the first
term in the right-hand side of the previous equality, is independent of ǫₜ₋ⱼ,
j ≥ 0.
We will first prove that (26) holds when ∂ǫₜ₊ᵢ/∂θℓ₁ and ∂ǫₜ₊ᵢ/∂θℓ₂ are
replaced by the truncated derivatives. It will be sufficient to show that
$$E\left\{\frac1n\sum_{t=1}^{n-i}\epsilon_t\left({}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right\}^2<\frac Kn.\qquad(27)$$
By stationarity the left-hand side in (27) is bounded by
$$\frac1n\sum_{h=-\infty}^{+\infty}\left|E\,\epsilon_1\left({}_{i-1}\frac{\partial\epsilon_{1+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{1+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_1}{\partial\theta_{\ell_3}}\times\epsilon_{1+|h|}\left({}_{i-1}\frac{\partial\epsilon_{1+|h|+i}}{\partial\theta_{\ell_1}}\right)\left({}_{i-1}\frac{\partial\epsilon_{1+|h|+i}}{\partial\theta_{\ell_2}}\right)\frac{\partial\epsilon_{1+|h|}}{\partial\theta_{\ell_3}}\right|$$
$$\le\frac1n\sum_{h=-\infty}^{+\infty}\sum_{k_1,k_2,k_4,k_5=1}^{i-1}\sum_{k_3,k_6=1}^{\infty}\left|c_{k_1,\ell_1}c_{k_2,\ell_2}c_{k_3,\ell_3}c_{k_4,\ell_1}c_{k_5,\ell_2}c_{k_6,\ell_3}\right|\times\left|E\,\epsilon_1\epsilon_{1+i-k_1}\epsilon_{1+i-k_2}\epsilon_{1-k_3}\epsilon_{1+|h|}\epsilon_{1+|h|+i-k_4}\epsilon_{1+|h|+i-k_5}\epsilon_{1+|h|-k_6}\right|.$$
It is easily seen that in the last expectation at most four indexes can be
equal, which, by Lemma 2, ensures its existence. Moreover, when h ≠ 0 the
expectation vanishes. Therefore (27) holds.
It remains to show that (26) holds when ∂ǫₜ₊ᵢ/∂θℓ₁ and/or ∂ǫₜ₊ᵢ/∂θℓ₂ are
replaced by the complements of the truncated derivatives. For instance we
have
$$E\left|\frac1n\sum_{t=1}^{n-i}\epsilon_t\left({}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right)\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right|\le\left\|\epsilon_t\right\|_4\left\|{}^{i-1}\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_1}}\right\|_4\left\|\frac{\partial\epsilon_{t+i}}{\partial\theta_{\ell_2}}\right\|_4\left\|\frac{\partial\epsilon_t}{\partial\theta_{\ell_3}}\right\|_4<K\times K\rho^i\times K\times K.$$
The proof of (26) is completed. Hence the Lemma is proved. □
Lemma 8
$$\sqrt{nb_n}\,E\left\{\sup_{\theta\in\Theta_\delta}\left\|\hat I_n(\theta)-\hat I^u_n(\theta)\right\|\right\}\ \xrightarrow{n\to\infty}\ 0.$$
Proof. The matrix norm being multiplicative, the supremum inside the
brackets is bounded by
$$\frac2n\sum_{i=0}^{[a/b_n]}\omega(ib_n)\sum_{t=1}^{n-i}\lambda_{i,t},$$
where
$$\lambda_{i,t}=\sup_{\theta\in\Theta_\delta}\left\{\|u_t(\theta)-v_t(\theta)\|\|u_{t+i}(\theta)\|+\|u_{t+i}(\theta)-v_{t+i}(\theta)\|\|v_t(\theta)\|\right\}.$$
Note that
$$\max\left\{\sup_{\theta\in\Theta_\delta}|\epsilon_t(\theta)-e_t(\theta)|,\ \sup_{\theta\in\Theta_\delta}\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)-\frac{\partial}{\partial\theta}e_t(\theta)\right\|\right\}\le K\sum_{m\ge1}\rho^{t+m}|\epsilon_{1-m}|.$$
Hence
$$\sup_{\theta\in\Theta_\delta}\|u_t(\theta)-v_t(\theta)\|\le\sup_{\theta\in\Theta_\delta}\left\{|\epsilon_t(\theta)-e_t(\theta)|\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)\right\|+|e_t(\theta)|\left\|\frac{\partial}{\partial\theta}\epsilon_t(\theta)-\frac{\partial}{\partial\theta}e_t(\theta)\right\|\right\}\le K\sum_{m_1,m_2\ge1}\rho^{t+m_1+m_2}|\epsilon_{1-m_1}||\epsilon_{t-m_2}|.$$
It follows that, for i ≥ 0,
$$\lambda_{i,t}\le K\sum_{m_1,m_2,m_3,m_4\ge1}\rho^{t+m_1+m_2+m_3+m_4}|\epsilon_{1-m_1}||\epsilon_{t-m_2}||\epsilon_{t+i-m_3}|\left(|\epsilon_{t+i-m_4}|+|\epsilon_{t-m_4}|\right).$$
Therefore, in view of Eǫₜ⁴ < ∞, we get E(λᵢ,ₜ) ≤ Kρᵗ, from which we deduce
$$\sqrt{nb_n}\,E\sup_{\theta\in\Theta_\delta}\left\|\hat I_n(\theta)-\hat I^u_n(\theta)\right\|\le K\sqrt{\frac{b_n}{n}}\sum_{i=0}^{[a/b_n]}\omega(ib_n)\sum_{t=1}^{\infty}\rho^t\le\frac{K}{\sqrt{nb_n}}=o(1).\ \square$$
Lemma 9
$$\sqrt{nb_n}\left\{\hat I-I-\bar I^u_n\right\}=o_P(1).$$
Proof. We prove this lemma by showing that:
i) $\sqrt{nb_n}\left\{\hat I-\hat I^u_n(\hat\theta_n)\right\}=o_P(1)$;
ii) $\sqrt{nb_n}\left\{\hat I^u_n(\hat\theta_n)-\hat I^u_n\right\}=o_P(1)$;
iii) $\sqrt{nb_n}\left\{\hat I^u_n-I^u_n\right\}=o_P(1)$;
iv) $\sqrt{nb_n}\left\{I^u_n-I-\bar I^u_n\right\}=o_P(1)$.
Result i) is a straightforward consequence of Lemma 8. To prove ii) we
proceed by applying the mean-value theorem to the (ℓ₁, ℓ₂)-th component of
Î^u_n. For some θ̄ between θ̂ₙ and θ₀ we have
$$\hat I^u_n(\hat\theta_n)(\ell_1,\ell_2)-\hat I^u_n(\ell_1,\ell_2)=(\hat\theta_n-\theta_0)'\frac{\partial}{\partial\theta}\hat I^u_n(\bar\theta)(\ell_1,\ell_2).$$
Since ‖θ̂ₙ − θ₀‖ = O_P(n^(−1/2)), it is sufficient to prove that
$$\sup_{\theta\in\Theta_\delta}\left\|\frac{\partial}{\partial\theta}\hat I^u_n(\theta)(\ell_1,\ell_2)\right\|=O_P(1).\qquad(28)$$
Straightforward algebra shows that
$$\frac{\partial}{\partial\theta'}\mathrm{vec}\left\{\frac{\partial}{\partial\theta'}\mathrm{vec}\,\hat\Delta^u_i(\theta)\right\}=\frac1n\sum_{t=1}^{n-i}\left\{\frac{\partial}{\partial\theta'}\mathrm{vec}\frac{\partial u_{t+i}}{\partial\theta'}\right\}\otimes u_t(\theta)+\frac1n\sum_{t=1}^{n-i}\mathrm{vec}\frac{\partial u_{t+i}}{\partial\theta'}\otimes\frac{\partial u_t}{\partial\theta'}(\theta)+\frac1n\sum_{t=1}^{n-i}\frac{\partial u_{t+i}}{\partial\theta'}\otimes\mathrm{vec}\frac{\partial u_t}{\partial\theta'}(\theta)+\frac1n\sum_{t=1}^{n-i}u_{t+i}\otimes\left\{\frac{\partial}{\partial\theta'}\mathrm{vec}\frac{\partial u_t}{\partial\theta'}\right\}(\theta).\qquad(29)$$
Using the Cauchy-Schwarz inequality and the ergodic theorem, we have, almost surely,
n−i 1 X
∂
∂ut+i
lim sup sup vec
⊗ ut (θ)
′
′
n→∞ i θ∈Θ n
∂θ
∂θ
δ
t=1
n−i
∂
∂ut+i
1X
sup kut (θ)k
≤ lim sup
sup vec
(θ)
θ∈Θ
′
′
n→∞ i n
∂θ
∂θ
θ∈Θ
δ
δ
t=1
( n
)1/2 ( n
)1/2
X
∂
2
∂u
1
1X
t
2
sup vec
(θ) sup kut (θ)k
< ∞.
≤ lim
n→∞
n t=1 θ∈Θδ ∂θ′
∂θ′
n t=1 θ∈Θδ
Treating in the same way the three other sums of (29), we deduce
∂
∂
u
ˆ
< ∞ a.s.
lim sup sup ′ vec
vec
∆
(θ)
i
′
n→∞ i θ∈Θ
∂θ
∂θ
δ
(30)
Now a Taylor expansion gives, for any ℓ₁, ℓ₂, ℓ₃,
$$\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\theta)(\ell_1,\ell_2)=\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\ell_1,\ell_2)+(\theta-\theta_0)'\frac{\partial}{\partial\theta}\left\{\frac{\partial}{\partial\theta_{\ell_3}}\hat\Delta^u_i(\theta^*)(\ell_1,\ell_2)\right\},\qquad(31)$$
where θ* is between θ and θ₀. From (31), (30), Lemma 7 and the Cesàro
lemma we obtain
$$\left\|\frac{\partial}{\partial\theta}\hat I^u_n(\theta)(\ell_1,\ell_2)\right\|\le K\sum_{i=-[ab_n^{-1}]}^{[ab_n^{-1}]}\left\|\frac{\partial}{\partial\theta}\hat\Delta^u_i(\ell_1,\ell_2)\right\|+K\|\theta-\theta_0\|\sum_{i=-[ab_n^{-1}]}^{[ab_n^{-1}]}\sup_i\sup_{\theta^*\in\Theta_\delta}\left\|\frac{\partial^2}{\partial\theta\partial\theta'}\hat\Delta^u_i(\theta^*)(\ell_1,\ell_2)\right\|=O_P(1)+O_P(n^{-1/2}b_n^{-1})+O_P(b_n^{-1}\|\theta-\theta_0\|).$$
Hence (28), and thus ii), is proved.
Next we prove iii). We have
$$\mathrm{vec}\left(\hat I^u_n-I^u_n\right)=\sum_{i=1}^{[a/b_n]}\omega(ib_n)\frac1n\sum_{t=n-i+1}^{n}\left\{u_{t+i}\otimes u_t+u_t\otimes u_{t+i}\right\}.\qquad(32)$$
Hence, by Lemma 3,
$$nb_n\mathrm{Var}\left\{\mathrm{vec}\left(\hat I^u_n-I^u_n\right)\right\}=\frac{b_n}{n}\sum_{i,j=1}^{[a/b_n]}\omega(ib_n)\omega(jb_n)\sum_{t=n-i+1}^{n}\sum_{s=n-j+1}^{n}\mathrm{Cov}\left\{u_{t+i}\otimes u_t+u_t\otimes u_{t+i},\,u_{s+j}\otimes u_s+u_s\otimes u_{s+j}\right\}\le\frac{Kb_n}{n}\sum_{i,j=1}^{[a/b_n]}ij=O\left(\frac{1}{nb_n^3}\right)=o(1),$$
which establishes iii).
Note that, under A4, we have $I=E\left(I^u_n-\bar I^u_n\right)=E\Delta^u_0$. To prove iv) it
suffices therefore to show that
$$\sqrt n\,\mathrm{vec}\left\{I^u_n-I-\bar I^u_n\right\}=\frac{1}{\sqrt n}\sum_{t=1}^{n}\left\{u_t\otimes u_t-E(u_t\otimes u_t)\right\}$$
is bounded in probability. This is straightforward from Lemma 4. □
Lemma 10
$$\lim_{n\to\infty}\mathrm{Var}\left[\sqrt{nb_n}\,\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)\right]=O(\rho^r)\quad\text{as }r\to\infty.$$
Proof. We start by writing
$$\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\frac1n\sum_{t=1}^{n}\left\{{}^ru_{t+i}\otimes u_t+u_{t+i}\otimes{}^ru_t-{}^ru_{t+i}\otimes{}^ru_t\right\}.$$
This double sum can obviously be split into six parts (distinguishing i > 0
and i < 0), and it will be sufficient to show for instance that
$$\lim_{n\to\infty}\mathrm{Var}\left\{\sqrt{nb_n}\sum_{i=1}^{[a/b_n]}\omega(ib_n)\frac1n\sum_{t=1}^{n}{}^ru_{t+i}\otimes u_t\right\}=O(\rho^r)\qquad(33)$$
since the other terms can be treated in precisely the same way. The variance
in (33) can be written as
$$\frac{b_n}{n}\sum_{i,j=1}^{[a/b_n]}\omega(ib_n)\omega(jb_n)\sum_{|h|<n}(n-|h|)\,\mathrm{Cov}\left\{{}^ru_{1+i}\otimes u_1,\,{}^ru_{1+h+j}\otimes u_{1+h}\right\}.\qquad(34)$$
The existence of these covariances is obtained by a straightforward extension
of Lemma 3. Proceeding as in the proofs of Lemmas 4 and 5 we find, for
i, j > 0,
$$\mathrm{Cov}\left\{{}^ru_{1+i}\otimes u_1,\,{}^ru_{1+h+i}\otimes u_{1+h}\right\}=\begin{cases}0&\text{when }h\ne0,\\O(\rho^r)&\text{when }h=0,\end{cases}$$
uniformly in i, and
$$\sum_{h=-\infty}^{+\infty}\left\|\mathrm{Cov}\left\{{}^ru_{1+i}\otimes u_1,\,{}^ru_{1+h+j}\otimes u_{1+h}\right\}\right\|\le K\rho^i\rho^j\rho^r\quad\text{when }i\ne j.$$
It follows from (34) that (33) holds, which concludes the proof of this lemma. □
Lemma 11 For m < k,
$$\mathrm{Var}\left[\sqrt{(k-m+1)b_n}\,\mathrm{vec}\left({}_r\bar I^u_{n,(m,k)}-{}_r\bar I^{u,\kappa}_{n,(m,k)}\right)\right]\le K\kappa^{-\nu/2},$$
where K is independent of m, k, n, κ.
Proof. We have
$$\mathrm{vec}\left({}_r\bar I^u_{n,(m,k)}-{}_r\bar I^{u,\kappa}_{n,(m,k)}\right)=\sum_{0<|i|\le[a/b_n]}\omega(ib_n)\frac{1}{k-m+1}\sum_{t=m}^{k}\left\{{}^\kappa_ru_{t+i}\otimes{}_ru_t+{}_ru_{t+i}\otimes{}^\kappa_ru_t-{}^\kappa_ru_{t+i}\otimes{}^\kappa_ru_t\right\}.$$
Again, this double sum can be split into six parts and, as in the above lemma,
it will be sufficient to show for instance that
$$\mathrm{Var}\left\{\sqrt{\frac{b_n}{k-m+1}}\sum_{i=1}^{[a/b_n]}\omega(ib_n)\sum_{t=m}^{k}{}^\kappa_ru_{t+i}\otimes{}_ru_t\right\}\le K\kappa^{-\nu/2}.\qquad(35)$$
Let ${}^\kappa\epsilon_t=\epsilon_t-\epsilon^\kappa_t$. Recall that $E\,{}^\kappa\epsilon_t=E\epsilon_t=E\epsilon^\kappa_t=0$. Note that the ℓ-th
components of ${}^\kappa_ru_{t+i}$ and ${}_ru_t$ are
$${}^\kappa_ru_{t+i}(\ell)=\sum_{k_1=1}^{r}c_{k_1,\ell}\left\{{}^\kappa\epsilon_{t+i}\epsilon_{t+i-k_1}+\epsilon^\kappa_{t+i}\,{}^\kappa\epsilon_{t+i-k_1}\right\}\quad\text{and}\quad{}_ru_t(\ell)=\sum_{k_2=1}^{r}c_{k_2,\ell}\,\epsilon_t\epsilon_{t-k_2}.$$
It is clear that in Lemma 3, some or all of the ǫₜ's can be replaced by the
truncated variables ᵏǫₜ or ǫᵏₜ. Thus the variance in (35) is well defined. In
addition, by
$$\left\|{}^\kappa\epsilon_t\right\|_4\le\left\|\epsilon_t1_{\{|\epsilon_t|>\kappa\}}\right\|_4+\left|E\epsilon_t1_{\{|\epsilon_t|>\kappa\}}\right|\le\left\{\frac{1}{\kappa^\nu}E|\epsilon_t|^{4+\nu}\right\}^{1/4}+\left|E\epsilon_t1_{\{|\epsilon_t|>\kappa\}}\right|\le\frac{K}{\kappa^{\nu/4}}$$
and by the Hölder inequality we get
$$\left|\mathrm{Cov}\left({}^\kappa\epsilon_{1+i}\epsilon_{1+i-k_1}\epsilon_1\epsilon_{1-k_2},\,{}^\kappa\epsilon_{1+i}\epsilon_{1+i-k_3}\epsilon_1\epsilon_{1-k_4}\right)\right|\le\left\|{}^\kappa\epsilon_t\right\|_4^2\left\|\epsilon_t\right\|_4^6\le\frac{K}{\kappa^{\nu/2}}.$$
The same inequality holds when the indexes are permuted and/or when the
ǫₜ's are replaced by the ǫᵏₜ's. Therefore we have, for i, j > 0,
$$\mathrm{Cov}\left\{{}^\kappa_ru_{1+i}\otimes{}_ru_1,\,{}^\kappa_ru_{1+i+h}\otimes{}_ru_{1+h}\right\}=\begin{cases}0&\text{when }h\ne0,\\O(\kappa^{-\nu/2})&\text{when }h=0,\end{cases}$$
$$\sum_{h=-\infty}^{+\infty}\left\|\mathrm{Cov}\left\{{}^\kappa_ru_{1+i}\otimes{}_ru_1,\,{}^\kappa_ru_{1+j+h}\otimes{}_ru_{1+h}\right\}\right\|\le K\rho^i\rho^j\kappa^{-\nu/2}\quad\text{when }i\ne j,$$
and the conclusion follows as in Lemma 10. □
Lemma 12 For any $\lambda\in\mathbb R^{(p+q)^2}$, λ ≠ 0,
$$\left\|\sqrt{b_n}\,\lambda'\mathrm{vec}\,{}_r\bar I^{u,\kappa}_{n,(m,k)}\right\|_4=O\left(\kappa^2b_n^{-1/4}\right)\quad\text{as }\kappa\to\infty,$$
uniformly in m and k.
Proof. The variable in the left-hand side can be written as $\frac{1}{k-m+1}\sum_{t=m}^{k}U_t$,
where
$$U_t=\sqrt{b_n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\,\lambda'\left\{{}_ru^\kappa_{t+i}\otimes{}_ru^\kappa_t+{}_ru^\kappa_t\otimes{}_ru^\kappa_{t+i}\right\}.$$
It is clear that $\|{}_ru^\kappa_t\|\le K\kappa^2$. Hence $|U_t|\le K\kappa^4b_n^{-1/2}$ and $EU_t^4\le K\kappa^8\,EU_t^2\,b_n^{-1}$.
By arguments used in the proof of Lemma 4,
$$\mathrm{Cov}\left\{{}_ru^\kappa_t\otimes{}_ru^\kappa_{t+i},\,{}_ru^\kappa_t\otimes{}_ru^\kappa_{t+j}\right\}=0$$
for i, j > 0 and i ≠ j. Therefore
$$\mathrm{Var}\,U_t=b_n\sum_{0<i\le[a/b_n]}\omega^2(ib_n)\,\lambda'\mathrm{Var}\left\{{}_ru^\kappa_{t+i}\otimes{}_ru^\kappa_t+{}_ru^\kappa_t\otimes{}_ru^\kappa_{t+i}\right\}\lambda=O(1)$$
uniformly in κ and r. The conclusion follows. □
Lemma 13 The following hold:
(a) $\lim_{r\to\infty}\lim_{n\to\infty}\mathrm{Var}\left\{\sqrt{nb_n}\,\mathrm{vec}\left(\bar I^u_n-{}_r\bar I^u_n\right)\right\}=0$;
(b) if $k_nb_n\to\infty$ and $k_nn^{-1}\to0$, then $\sqrt{nb_n}\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=o_P(1)$;
(c) if, moreover, $\kappa_n\to\infty$, then $\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)=o_P(1)$.
Proof. Part (a) is a direct consequence of Lemma 10.
Next we turn to (b). Observe that, in view of (21),
$$\sqrt{nb_n}\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=\sum_{\ell=0}^{p_n-2}\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=(\ell+1)k_n-m_n+1}^{(\ell+1)k_n}\left\{{}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right\}+\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=p_nk_n-m_n+1}^{n}\left\{{}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right\}$$
is a sum of pₙ − 1 independent random matrices (for n large enough so that
kₙ − mₙ > mₙ). Now, by the arguments of the proofs of Lemmas 4 and 5,
we have for i, j > 0
$$\mathrm{Cov}\left({}_ru_t\otimes{}_ru_{t+i},\,{}_ru_s\otimes{}_ru_{s+j}\right)=\begin{cases}0&\text{when }t+i\ne s+j,\\O(\rho^i\rho^j)&\text{when }t+i=s+j.\end{cases}$$
Therefore
$$\mathrm{Var}\left\{\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=k}^{m}\left\{{}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right\}\right\}=O\left(\frac{m-k+1}{n}\right).$$
Then
$$nb_n\mathrm{Var}\left\{\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)\right\}=O\left(\frac{(p_n-1)m_n}{n}\right)+O\left(\frac{n-p_nk_n+m_n}{n}\right)=o(1).$$
Hence, since $E\,\mathrm{vec}\left({}_r\bar I^u_n-{}_rS_n\right)=0$, (b) is proved.
For part (c), we note that $\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)$ is a sum of pₙ i.i.d.
variables whose common variance is
$$\frac{k_n-m_n}{n}\,\mathrm{Var}\left\{\sqrt{(k_n-m_n)b_n}\,\mathrm{vec}\left({}_r\bar I^u_{n,(1,k_n-m_n)}-{}_r\bar I^{u,\kappa_n}_{n,(1,k_n-m_n)}\right)\right\}.$$
Thus, in view of Lemma 11,
$$\mathrm{Var}\left\{\sqrt{nb_n}\,\mathrm{vec}\left({}_rS_n-{}_rS^{\kappa_n}_n\right)\right\}=O\left(\frac{p_n(k_n-m_n)}{n\kappa_n^{\nu/2}}\right)=O\left(\frac{1}{\kappa_n^{\nu/2}}\right)=o(1)\qquad(36)$$
when n → ∞. This establishes (c) and completes the proof of Lemma 13. □
Lemma 14
$$\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n\ \rightsquigarrow\ \mathcal N\left(0,\,2\sigma^4\varpi^2D^+_{p+q}\left({}_rJ\otimes{}_rJ\right)D^{+\prime}_{p+q}\right).$$
Proof. The random matrices $\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n$ are centered. By a trivial extension
of part iv) of the proof of Lemma 9, $\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n$ and $\sqrt{nb_n}\,\mathrm{vech}\left({}_rI^u_n-E\,{}_rI^u_n\right)$
have the same asymptotic distribution. It is easy to see that Lemmas 4 and
5 still hold when (uₜ) and J are replaced by (ᵣuₜ) and ᵣJ. Therefore, by the
proof of Lemma 6,
$$\lim_{n\to\infty}\mathrm{Var}\left\{\sqrt{nb_n}\,\mathrm{vech}\,{}_r\bar I^u_n\right\}=2\sigma^4\varpi^2D^+_{p+q}\left({}_rJ\otimes{}_rJ\right)D^{+\prime}_{p+q}.$$
By virtue of Lemma 13 (b) and (c), $\mathrm{vech}\,{}_r\bar I^u_n$, $\mathrm{vech}\,{}_rS_n$ and $\mathrm{vech}\,{}_rS^{\kappa_n}_n$
have the same asymptotic distribution for appropriately chosen sequences
(kₙ) and (κₙ).
To establish the asymptotic normality of $\sqrt{nb_n}\,\mathrm{vech}\,{}_rS^{\kappa_n}_n$, we will use the
Cramér-Wold device. Therefore we will show the asymptotic normality of
$$X_n:=\sqrt{nb_n}\,\lambda'\mathrm{vec}\,{}_rS^{\kappa_n}_n=\sum_{\ell=0}^{p_n-1}\frac{k_n-m_n}{\sqrt n}\sqrt{b_n}\,\lambda'\mathrm{vec}\,{}_r\bar I^{u,\kappa_n}_{n,(\ell k_n+1,\ell k_n+k_n-m_n)}:=\sum_{\ell=0}^{p_n-1}X_{n\ell},$$
for any non-trivial $\lambda\in\mathbb R^{(p+q)^2}$.
Observe that Xₙ is a sum of pₙ i.i.d. centered variables with common
variance
$$v^{\kappa_n}_n=\mathrm{Var}\left\{\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\,\lambda'\sum_{t=1}^{k_n-m_n}\left\{{}_ru^{\kappa_n}_{t+i}\otimes{}_ru^{\kappa_n}_t+{}_ru^{\kappa_n}_t\otimes{}_ru^{\kappa_n}_{t+i}\right\}\right\}.$$
By (36), $v^{\kappa_n}_n$ is asymptotically equivalent, when κₙ → ∞, to
$$v_n=\mathrm{Var}\left\{\frac{\sqrt{b_n}}{\sqrt n}\sum_{0<i\le[a/b_n]}\omega(ib_n)\sum_{t=1}^{k_n-m_n}\lambda'\left({}_ru_{t+i}\otimes{}_ru_t+{}_ru_t\otimes{}_ru_{t+i}\right)\right\}.$$
The arguments used to prove Lemmas 4 and 5 show that $v_n=O\left(\frac{k_n-m_n}{n}\right)$.
Next, we will verify the Lindeberg condition. For any ε > 0,
$$\sum_{\ell=0}^{p_n-1}\frac{1}{p_nv^{\kappa_n}_n}\int_{\left\{|X_{n\ell}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n\ell}\,dP=\frac{1}{v^{\kappa_n}_n}\int_{\left\{|X_{n1}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n1}\,dP\le\frac{EX^4_{n1}}{\varepsilon^2p_n\left(v^{\kappa_n}_n\right)^2}.$$
Now Lemma 12 implies that $EX^4_{n1}=O\left(\frac{k_n^4\kappa_n^8}{n^2b_n}\right)$. Therefore
$$\sum_{\ell=0}^{p_n-1}\frac{1}{p_nv^{\kappa_n}_n}\int_{\left\{|X_{n\ell}|\ge\varepsilon\sqrt{p_nv^{\kappa_n}_n}\right\}}X^2_{n\ell}\,dP=O\left(\frac{k_n^3\kappa_n^8}{nb_n}\right).$$
To fulfil the Lindeberg condition, it therefore suffices that $k_n^3\kappa_n^8/(nb_n)\to0$. From
Lemma 13 (c), it is also required that κₙ → ∞. Therefore, in view of Lemma 13
(b), it remains to show that we can find kₙ such that $k_n^3/(nb_n)\to0$ and $k_nb_n\to\infty$.
This is obvious because $\frac{k_n^3}{nb_n}=\frac{(k_nb_n)^3}{nb_n^4}$ and it is supposed that $nb_n^4\to\infty$. □
Proof of Theorem 2. It follows from Lemma 9, Lemma 14, Lemma 13 (a),
and a standard argument (see e.g. Billingsley (1995), Theorem 25.5). □
7.3 Lemmas and proofs for Theorem 3

In all subsequent lemmas, the assumptions of Theorem 2 are supposed to be
satisfied.
Lemma 15 Under the assumptions of Theorem 3,
$$\sqrt{nb_n}\,\mathrm{vec}\left(\hat J-J\right)=o_P(1).$$
Proof. By arguments already employed, we have
$$\mathrm{Var}\left\{\sqrt n\,\mathrm{vec}\left(\hat J-J\right)\right\}=\frac1n\sum_{|h|<n}(n-|h|)\,\mathrm{Cov}\left\{\frac{\partial\epsilon_1}{\partial\theta}\otimes\frac{\partial\epsilon_1}{\partial\theta},\,\frac{\partial\epsilon_{1+h}}{\partial\theta}\otimes\frac{\partial\epsilon_{1+h}}{\partial\theta}\right\}+o(1)=O(1).$$
Since bₙ → 0, the conclusion follows. □
Proof of Theorem 3. From Lemma 15 and straightforward algebra, we
have
$$\sqrt{nb_n}\,\mathrm{vech}\left(\hat\Sigma^{(2)}-\hat\Sigma^{(1)}\right)=\sqrt{nb_n}\,\mathrm{vech}\left\{J^{-1}\left(\hat I-I\right)J^{-1}\right\}+o_P(1)=D^+_{p+q}\left(J^{-1}\otimes J^{-1}\right)D_{p+q}\,\mathrm{vech}\left\{\sqrt{nb_n}\left(\hat I-I\right)\right\}+o_P(1).$$
The first convergence follows from Theorem 2, by application of the relation
$$D_{p+q}D^+_{p+q}(J\otimes J)D_{p+q}=(J\otimes J)D_{p+q}$$
(see Magnus and Neudecker, 1988, Theorem 3.13). The second convergence
is deduced as in Theorem 3. □
Proof of Theorem 8. Under the assumptions of Theorem 8, Francq and
Zakoïan (2000, Theorems 2 and 3) have shown that Î and Ĵ are weakly consistent
estimators of I and J. We deduce that Σ̂⁽¹⁾, Σ̂⁽²⁾ and Λ̂⁻¹ are weakly
consistent estimators of Σ⁽¹⁾, Σ⁽²⁾ and Λ⁻¹. Therefore Υₙ = nbₙ(c + o_P(1)),
with $c=\left\{\mathrm{vech}\left(\Sigma^{(2)}-\Sigma^{(1)}\right)\right\}'\Lambda^{-1}\mathrm{vech}\left(\Sigma^{(2)}-\Sigma^{(1)}\right)$. Since Σ⁽¹⁾ ≠ Σ⁽²⁾ and Λ⁻¹
is positive definite, we have c > 0. Because nbₙ tends to infinity, we have
$$\lim_{n\to\infty}P\left\{\Upsilon_n>\chi^2_{(p+q)(p+q+1)/2}(1-\alpha)\right\}=\lim_{n\to\infty}P\left\{nb_n\left(c+o_P(1)\right)>\chi^2_{(p+q)(p+q+1)/2}(1-\alpha)\right\}=1,$$
for any α ∈ (0, 1). □
Proof of Lemma 1. Let κ₁ and κ₂ be constants such that
$$0<\kappa_1<\kappa_2<1,\quad\kappa<\kappa_1,\quad\kappa_1+\kappa<\kappa_2,\quad\kappa_2>1-\frac{(1+r^*)(2+\nu^*)}{\nu^*}\kappa_1,\quad\kappa_2<\frac{\nu^*}{2(1+\nu^*)}.\qquad(37)$$
Figure 3 shows that these inequalities are compatible. Define sequences of
integers (kₙ), (mₙ), (qₙ) and (pₙ) by
$$k_n=[n^{\kappa_2}],\qquad m_n=[n^{\kappa_1}],\qquad q_n=k_n-m_n,\qquad p_n=\left[\frac{n}{k_n}\right].$$
Note that
$$q_n\sim k_n\sim n^{\kappa_2},\qquad m_n\sim n^{\kappa_1},\qquad p_n\sim n^{1-\kappa_2}\qquad(38)$$
as n → ∞. Employing a standard technique, we split the sum $S_n=\sum_{t=1}^{n}x_{n,t}$
into pₙ alternate "big" blocks of length qₙ and "small" blocks of length mₙ.
More precisely, write Sₙ = S'ₙ + S''ₙ, where
$$S_n'=\sum_{\ell=0}^{p_n-1}\xi_{n,\ell},\qquad\xi_{n,\ell}=x_{n,\ell k_n+1}+\cdots+x_{n,\ell k_n+q_n},$$
$$S_n''=\sum_{\ell=0}^{p_n}\zeta_{n,\ell},\qquad\zeta_{n,\ell}=x_{n,\ell k_n+q_n+1}+\cdots+x_{n,(\ell+1)k_n}\ \ (\ell\le p_n-1),\qquad\zeta_{n,p_n}=x_{n,p_nk_n+1}+\cdots+x_{n,n}.$$
[Figure 3 about here: the region of (κ₁, κ₂) values compatible with the inequalities (37),
bounded by the lines κ₂ = κ₁ + κ and κ₂ = 1 − (1+r*)(2+ν*)κ₁/ν*, with vertices
A, B and C, where A = (ν*/{2(1+r*)(1+ν*)}, ν*/{2(1+ν*)}). We have
x_B := ν*/{2(1+ν*)} − κ > κ iff κ < ν*/{4(1+ν*)}, and
y_C := κ + ν*(1−κ)/(2+2r*+2ν*+r*ν*) < ν*/{2(1+ν*)} iff r* > 2κ(1+ν*)/{ν* − 2κ(1+ν*)}.]
To prove the CLT it suffices to show that, as $n\to\infty$,

i) $n^{-1}E(S_n'')^2\to 0$,

ii) $E\exp(itn^{-1/2}S_n')\sim\prod_{\ell=0}^{p_n-1}E\exp(itn^{-1/2}\xi_{n,\ell})$, with $i^2=-1$,

iii) $p_n\to\infty$ and $n^{-1}\sum_{\ell=0}^{p_n-1}E\xi_{n,\ell}^2\,\mathbb{1}_{\{|\xi_{n,\ell}|\ge n^{1/2}\epsilon\}}\to 0$ for every $\epsilon>0$.

Indeed, i) implies that $n^{-1/2}S_n$ and $n^{-1/2}S_n'$ have the same asymptotic distribution, ii) implies that the characteristic function of $n^{-1/2}S_n'$ is asymptotically equivalent to that of $n^{-1/2}\sum_{\ell=0}^{p_n-1}\xi_{n,\ell}'$, where the $\xi_{n,\ell}'$ are independent and distributed like the $\xi_{n,\ell}$, and iii) is simply the Lindeberg condition ensuring the central limit theorem for independent, but not necessarily identically distributed, random variables.
Using the Davydov (1968) inequality, there exists a universal constant $K_0$ (from Rio (1993), one can take $K_0=4$) such that, for $\nu^*<\infty$,
$$\sum_{t=1}^{n}\sum_{s=t+q+1}^{n}\left|Ex_{n,t}x_{n,s}\right| \le K_0\sum_{t=1}^{n}\sum_{s=t+q+1}^{n}\|x_{n,t}\|_{2+\nu^*}\|x_{n,s}\|_{2+\nu^*}\{\alpha_n(s-t)\}^{\frac{\nu^*}{2+\nu^*}} \le K_0\,n\,\sup_t\|x_{n,t}\|^2_{2+\nu^*}\sum_{k>q}\{\alpha_n(k)\}^{\frac{\nu^*}{2+\nu^*}}$$
and
$$E\left(\sum_{t=t_0}^{t_0+m-1}x_{n,t}\right)^2 \le K_0\,\sup_t\|x_{n,t}\|^2_{2+\nu^*}\sum_{k=0}^{m-1}(m-|k|)\{\alpha_n(k)\}^{\frac{\nu^*}{2+\nu^*}}.$$
Using inequality (1.4) in Ibragimov (1962), the same inequalities hold when $\nu^*=\infty$ (with $2+\nu^*=\infty$ and $\nu^*/(2+\nu^*)=1$). Thus
$$E(S_n'')^2 \le \sum_{\ell=0}^{p_n-1}E\zeta_{n,\ell}^2 + E\zeta_{n,p_n}^2 + 2\sum_{t=1}^{n}\sum_{s=t+q_n+1}^{n}\left|Ex_{n,t}x_{n,s}\right|$$
$$\le K p_n m_n\sum_{k=0}^{m_n-1}\{\alpha_n(k)\}^{\frac{\nu^*}{2+\nu^*}} + K k_n\sum_{k=0}^{k_n-1}\{\alpha_n(k)\}^{\frac{\nu^*}{2+\nu^*}} + K n\sum_{k>q_n}\{\alpha_n(k)\}^{\frac{\nu^*}{2+\nu^*}}$$
$$\le K p_n m_n(T_n+1) + K k_n^2 + K n\sum_{k>q_n-T_n}\{\alpha(k)\}^{\frac{\nu^*}{2+\nu^*}}$$
for some positive constant $K$. Hence i) holds since, in view of (37), $p_n m_n(T_n+1)/n\sim n^{-\kappa_2+\kappa_1+\kappa}\to 0$, $k_n^2/n\sim n^{2\kappa_2-1}\to 0$ and $q_n-T_n\sim n^{\kappa_2}\to\infty$.
Using the Ibragimov (1962) inequality and the fact that $u_n=O(n^{-r^*-1})$ when $u_n\downarrow 0$ and $\sum_n n^{r^*}u_n<\infty$, we obtain
$$\left|E\prod_{\ell=0}^{p_n-1}\exp(itn^{-1/2}\xi_{n,\ell}) - \prod_{\ell=0}^{p_n-1}E\exp(itn^{-1/2}\xi_{n,\ell})\right| \le 4p_n\alpha_n(m_n) \le 4p_n\left(\frac{1}{m_n-T_n}\right)^{\frac{(1+r^*)(2+\nu^*)}{\nu^*}} \sim n^{1-\kappa_2-\frac{(1+r^*)(2+\nu^*)}{\nu^*}\kappa_1}\to 0,$$
which establishes ii).
When $\nu^*<\infty$, by the Hölder and Markov inequalities, we have
$$n^{-1}\sum_{\ell=0}^{p_n-1}E\xi_{n,\ell}^2\,\mathbb{1}_{\{|\xi_{n,\ell}|\ge n^{1/2}\epsilon\}} \le n^{-1}p_n\left(n^{1/2}\epsilon\right)^{-\nu^*}E|\xi_{n,\ell}|^{2+\nu^*} \le n^{-1}p_n\left(n^{1/2}\epsilon\right)^{-\nu^*}(q_n)^{2+\nu^*}\sup_t\|x_{n,t}\|_{2+\nu^*}^{2+\nu^*} \sim n^{(1+\nu^*)\kappa_2-\nu^*/2}\to 0,$$
which establishes iii) in the case $\nu^*<\infty$. Finally, when $M:=\sup_{n\ge1}\sup_{1\le t\le n}\|x_{n,t}\|_\infty<\infty$, we have
$$\|\xi_{n,\ell}\|_\infty\le Mq_n<\sqrt{n}\,\epsilon,$$
and the sum in iii) equals $0$ for sufficiently large $n$. The proof is complete. □
Lemma 16 Let $(u_t)$ be any stationary centered process verifying A5$''$. Then
$$\sum_{i_1,i_2,i_3,i_4=1}^{n}\left\|E\left(u_t^{\otimes 4}\otimes u_{t+i_1}\otimes u_{t+i_2}\otimes u_{t+i_3}\otimes u_{t+i_4}\right)\right\| = O(n^2).$$
Proof. We will only consider the first component of the vector inside the norm. Let $u_t$ denote the first component of the vector $u_t$. The first component of the sum is bounded by $4!\sum_{k=1}^{4}\sum^{(k)}\left|Eu_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|$, where $\sum^{(k)}$ denotes the sum over the indices such that $i_1\le i_2\le i_3\le i_4$ and $i_k-i_{k-1}=\max_{1\le r\le4}(i_r-i_{r-1})$ (with $i_0=0$).
Using the Davydov (1968) inequality, for $i_1\le i_2\le i_3\le i_4$,
$$\left|\mathrm{Cov}\left(u_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3},\,u_{t+i_4}\right)\right| \le K_0\left\|u_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3}\right\|_{\frac{8+\nu}{7}}\|u_{t+i_4}\|_{8+\nu}\,\alpha_u(|i_4-i_3|)^{\frac{\nu}{8+\nu}} \le K_0\|u_t\|^8_{8+\nu}\,\alpha_u(|i_4-i_3|)^{\frac{\nu}{8+\nu}}.$$
Therefore, we have
$$\sum^{(4)}\left|Eu_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right| = \sum^{(4)}\left|\mathrm{Cov}\left(u_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3},\,u_{t+i_4}\right)\right| \le n\sum_{h=0}^{n}(h+1)^2K_0\|u_t\|^8_{8+\nu}\,\alpha_u(h)^{\frac{\nu}{8+\nu}} = O(n). \tag{39}$$
The last inequality has been obtained by setting $h=i_4-i_3$, by noting that there exist at most $n$ values for the subscript $i_4$, and that, when $h$ and $i_4$ are fixed, there remain at most $h+1$ possibilities for $i_1$ and for $i_2$.
The term involving $\sum^{(3)}$ is bounded by
$$\sum^{(3)}\left|\mathrm{Cov}\left(u_t^4u_{t+i_1}u_{t+i_2},\,u_{t+i_3}u_{t+i_4}\right)\right| + \sum^{(3)}\left|Eu_t^4u_{t+i_1}u_{t+i_2}\right|\left|Eu_{t+i_3}u_{t+i_4}\right|.$$
By the arguments used to show (39), it can be shown that the first sum is $O(n)$. The second sum is bounded by
$$n\sum_{h=0}^{n}\left\{\sum_{\ell=0}^{h}\alpha_u(\ell)^{\frac{\nu}{8+\nu}}\right\}^2 = O(n^2),$$
setting $h=i_3-i_2$, arguing that there exist at most $n$ possibilities for $i_2$, and that we have $\left|Eu_t^4u_{t+i_1}u_{t+i_2}\right|\le K_0\|u_t\|^6_{8+\nu}\,\alpha_u(\ell)^{\frac{2+\nu}{8+\nu}}$ with $\ell=i_2-i_1\le h$, and $\left|Eu_{t+i_3}u_{t+i_4}\right|\le K_0\|u_t\|^2_{8+\nu}\,\alpha_u(\ell)^{\frac{6+\nu}{8+\nu}}$ with $\ell=i_4-i_3\le h$. Similarly, we have $\sum^{(2)}\left|Eu_t^4u_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right| = O(n^2)$.
The last term, $\sum^{(1)}$, is bounded by
$$\sum^{(1)}\left|\mathrm{Cov}\left(u_t^4,\,u_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right)\right| + \sum^{(1)}Eu_t^4\left|Eu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|$$
$$\le \sum_{h=0}^{n}K_0\|u_t\|^8_{8+\nu}(h+1)^3\alpha_u(h)^{\frac{\nu}{8+\nu}} + Eu_t^4\sum_{h=0}^{n}\sum_{0\le i_1\le i_2\le i_3\le i_4\le h}\left|Eu_{t+i_1}u_{t+i_2}u_{t+i_3}u_{t+i_4}\right|$$
$$\le O(n) + Eu_t^4\sum_{h=0}^{n}\sum_{k=1}^{3}\sum^{(k)}\left|Eu_0u_{i_2-i_1}u_{i_3-i_1}u_{i_4-i_1}\right|, \tag{40}$$
where $\sum^{(k)}$ denotes the sum over the indices such that $0\le i_1\le i_2\le i_3\le i_4\le h$ and $i_{k+1}-i_k=\max_{1\le r\le3}(i_{r+1}-i_r)$. We have
$$\sum^{(3)}\left|Eu_0u_{i_2-i_1}u_{i_3-i_1}u_{i_4-i_1}\right| = \sum^{(3)}\left|\mathrm{Cov}\left(u_0u_{i_2-i_1}u_{i_3-i_1},\,u_{i_4-i_1}\right)\right| \le \sum_{\ell=0}^{h}(\ell+1)^2K_0\|u_t\|^4_{8+\nu}\,\alpha_u(\ell)^{\frac{4+\nu}{8+\nu}} = O(1).$$
The same bound holds when $\sum^{(3)}$ is replaced by $\sum^{(1)}$. The term $\sum^{(2)}$ is bounded by
$$\sum^{(2)}\left|\mathrm{Cov}\left(u_0u_{i_2-i_1},\,u_{i_3-i_1}u_{i_4-i_1}\right)\right| + \sum^{(2)}\left|Eu_0u_{i_2-i_1}\right|\left|Eu_{i_3-i_1}u_{i_4-i_1}\right| \le O(1) + \sum_{\ell=0}^{h}\left\{\sum_{\ell'=0}^{\ell}\alpha_u(\ell')^{\frac{6+\nu}{8+\nu}}\right\}^2 = O(h).$$
It follows that the right-hand side of the inequality (40) is $O(n^2)$, which completes the proof. □
Proof of Theorem 5. It can be shown, by adapting the proof of i)-iii) in Lemma 9, that
$$\sqrt{nb_n}\,\mathrm{vech}\left(\hat{I}-I\right) = \sqrt{nb_n}\,\mathrm{vech}\left(I_n^u-I\right) + o_P(1).$$
Write
$$\sqrt{nb_n}\,\mathrm{vech}\left(I_n^u-I\right) := a_{1,n}+a_{2,n},$$
where
$$a_{1,n}=\sqrt{nb_n}\,\mathrm{vech}\left(I_n^u-EI_n^u\right),\qquad a_{2,n}=\sqrt{nb_n}\,\mathrm{vech}\left(EI_n^u-I\right).$$
We will show that

(i) $a_{1,n}$ converges to the normal distribution of the theorem,

(ii) $a_{2,n}=o_P(1)$.
For any $\lambda\in\mathbb{R}^{(p+q)(p+q+1)/2}$, $\lambda\neq0$, we have $\lambda'a_{1,n}=n^{-1/2}\sum_{t=1}^{n}x_{n,t}$, where
$$x_{n,t}=y_{n,t}-Ey_{n,t},\qquad y_{n,t}=\sqrt{b_n}\sum_{|i|\le[a/b_n]}\omega(ib_n)\,\lambda'D_{p+q}^{+}\left(u_{t+i}\otimes u_t\right). \tag{41}$$
To show (i) we will use Lemma 1.
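For intuition, the kernel-smoothed estimator $I_n^u$ behind this decomposition can be sketched as follows; the Bartlett weights and the bandwidth rate are illustrative choices for the sketch, not the paper's prescription:

```python
import numpy as np

def hac_long_run_cov(u, b_n, a=1.0):
    """Kernel estimate of the long-run covariance sum_i Cov(u_t, u_{t+i}),
    truncated at lag [a/b_n], with Bartlett weights omega(x) = max(1-|x|, 0)."""
    n, d = u.shape
    u = u - u.mean(axis=0)
    I_hat = u.T @ u / n                       # Gamma_hat(0)
    for i in range(1, int(a / b_n) + 1):
        w = max(1.0 - i * b_n, 0.0)           # omega(i * b_n)
        G = u[i:].T @ u[: n - i] / n          # Gamma_hat(i)
        I_hat += w * (G + G.T)                # lags i and -i together
    return I_hat

rng = np.random.default_rng(0)
u = rng.normal(size=(4000, 2))                # iid noise: long-run cov is I_2
I_hat = hac_long_run_cov(u, b_n=4000 ** (-1 / 3))
```

On iid input the estimate is close to the identity matrix; under dependence the kernel weights pick up the nonzero autocovariances.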
We begin by showing that $\sup_n\|x_{n,t}\|_4<\infty$. We have, by the Davydov (1968) inequality,
$$\|Ey_{n,t}\| \le K\sqrt{b_n}\sum_{|i|\le[a/b_n]}\omega(ib_n)\|u_t\|^2_{8+\nu}\{\alpha_u(i)\}^{\frac{6+\nu}{8+\nu}} = o(1).$$
Moreover, since $\omega$ is bounded,
$$\|y_{n,t}\|_4^4 \le Kb_n^2\sum_{j_1,\dots,j_8}\ \sum_{|i_1|,\dots,|i_4|\le[a/b_n]}\left|Eu_t(j_1)\dots u_t(j_4)u_{t+i_1}(j_5)\dots u_{t+i_4}(j_8)\right| = O(1),$$
where the last equality follows from Lemma 16. Hence
$$\|x_{n,t}\|_4 \le \|y_{n,t}\|_4 + \|Ey_{n,t}\| = O(1),$$
and (12) holds with $\nu^*=2$.
From Lemma 1 in Andrews (1991), Assumption A5$''$ implies that, for all $\ell_1,\dots,\ell_4$,
$$\sum_{j_1,j_2,j_3=-\infty}^{+\infty}\left|\kappa^u_{\ell_1,\dots,\ell_4}(0,j_1,j_2,j_3)\right| < \infty, \tag{42}$$
where the $\kappa^u_{\ell_1,\dots,\ell_4}(0,j_1,j_2,j_3)$'s denote the fourth-order cumulants of $(u_t(\ell_1),u_{t+j_1}(\ell_2),u_{t+j_2}(\ell_3),u_{t+j_3}(\ell_4))$. Hence, by Proposition 1(a) of Andrews (1991), $n^{-1}\mathrm{Var}\left(\sum_{t=1}^{n}x_{n,t}\right)$ converges to $2\varpi^2\lambda'D_{p+q}^{+}(I\otimes I)D_{p+q}^{+\prime}\lambda>0$. Thus (13) holds.
Note that $x_{n,t}$ is a measurable function of $u_{t-[a/b_n]},\dots,u_{t+[a/b_n]}$. Note that, by an argument already used in the proof of Lemma 1, A5$''$ implies $\{\alpha_u(k)\}^{\frac{\nu}{8+\nu}}=O(k^{-(r+1)})$, from which we deduce
$$\sum_{k=1}^{\infty}k\{\alpha_u(k)\}^{1/2} = O\left(\sum_{k=1}^{\infty}k^{1-\frac{(r+1)(8+\nu)}{2\nu}}\right) = O(1).$$
Thus (15) holds with $T_n=2[a/b_n]$, $\alpha(\cdot)=\alpha_u(\cdot)$, $r^*=1$ and $\nu^*=2$. Clearly, $\liminf nb_n^{1/\kappa}>0$ implies $T_n=O(n^\kappa)$. Therefore (14) holds. Finally, Lemma 1 entails (i).
To show (ii), write
$$a_{2,n} = \sqrt{nb_n}\sum_{|i|\le[a/b_n]}\{\omega(ib_n)-1\}\,\mathrm{vech}\,\Delta_i - \sqrt{nb_n}\sum_{|i|>[a/b_n]}\mathrm{vech}\,\Delta_i.$$
Since
$$\|\Delta_i\| \le K\|u_t\|^2_{8+\nu}\{\alpha_u(i)\}^{\frac{6+\nu}{8+\nu}}$$
and $\{\alpha_u(i)\}$ is a decreasing sequence, Assumption A5$''$ implies that $\|\Delta_i\|=O(i^{-(r+1)(1+6/\nu)})$. Therefore
$$\sqrt{nb_n}\sum_{|i|>[a/b_n]}\|\mathrm{vech}\,\Delta_i\| = O\left(\sqrt{nb_n^{2r+1}}\right) = O\left(\sqrt{nb_n^{2r_0+1}}\right) = o(1).$$
Now
$$\left\|\sqrt{nb_n}\sum_{|i|\le[a/b_n]}\{\omega(ib_n)-1\}\,\mathrm{vech}\,\Delta_i\right\| \le \sqrt{nb_n^{2r_0+1}}\sum_{|i|\le[a/b_n]}\frac{|\omega(ib_n)-1|}{(ib_n)^{r_0}}\,i^{r_0}\|\mathrm{vech}\,\Delta_i\| = o(1)$$
in view of the Lebesgue theorem, $\|I^{(r_0)}\|<\infty$, $\lim nb_n^{2r_0+1}=0$ and since the function $(\omega(x)-1)x^{-r_0}$ is bounded. Hence (ii) is shown and the proof is complete. □
8 Conclusion
In this paper we have proposed a test of strong linearity in the framework of weak ARMA models. We have derived the asymptotic distribution of the test statistic under the null hypothesis and we have shown the consistency of the test. The usefulness of this test is as follows. When the null hypothesis is not rejected, there is no evidence against standard strong ARMA models. In this case, there is no reason to think that the ARMA predictions are not optimal in the least-squares sense. When the null hypothesis is rejected, two different strategies can be considered. A weak ARMA model can be fitted, following the lines of Francq and Zakoïan (1998, 2000), and used for optimal linear prediction. Another approach is to fit a nonlinear model to provide the (nonlinear) optimal prediction, though it can be a difficult task to determine the most appropriate nonlinear model.
Finally, we believe that the asymptotic distribution of the HAC estimator established in this paper is of independent interest, apart from the proposed test. Other assumptions on the ACM could be tested, such as noninvertibility, which may indicate some misspecification of the model.
FOOTNOTES
1. For ease of presentation we have not included a constant in the ARMA model. This can be done without altering the asymptotic behaviours of the estimators and test statistics introduced in the paper. The subsequent analysis applies to data that have been adjusted by subtraction of the mean.
2. When the strong mixing coefficients decrease at an exponential rate (which is the case for a large class of processes) $\nu$ can be chosen arbitrarily small in A5 or A5$'$. Thus $E\epsilon_t^{4+2\nu}<\infty$ is a mild assumption.
3. Indeed, we have
$$\sum_i\hat{\Delta}_i(\hat{\theta}_n) = n^{-1}\left\{\sum_t\frac{\partial}{\partial\theta}e_t(\hat{\theta}_n)\,e_t(\hat{\theta}_n)\right\}^2 = 0.$$
4. Recent papers by Kiefer and Vogelsang (2002a, 2002b) have suggested the use of kernels with bandwidth equal to the sample size, i.e. $b_n=1/n$ in our notations. It is well known that this choice of bandwidth results in inconsistent long-run variance estimators. However, Kiefer and Vogelsang have shown that, because the long-run variance matrix plays the role of a nuisance parameter, asymptotically valid (nuisance-parameter-free) test statistics can be constructed based on such inconsistent estimators. In our framework it is of course crucial to have a consistent estimator of the matrix $I$.
5. Dm transforms vech(A) into vec(A), for any symmetric m × m matrix
A.
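The action of $D_m$ can be verified numerically; the construction below is a straightforward sketch written for this note, not code from the paper:

```python
import numpy as np

def duplication_matrix(m):
    """Build D_m with D_m @ vech(A) = vec(A) for symmetric m x m matrices A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):                 # vech stacks the lower triangle by column
        for i in range(j, m):
            D[j * m + i, col] = 1.0    # entry (i, j) in column-major vec(A)
            D[i * m + j, col] = 1.0    # symmetric entry (j, i)
            col += 1
    return D

def vech(A):
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

A = np.array([[2.0, 1.0], [1.0, 3.0]])
assert np.allclose(duplication_matrix(2) @ vech(A), A.flatten(order="F"))
```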
6. As noted by Andrews (1991), for the rectangular kernel, $\omega(r)=0$ for all $r\ge0$. For the Bartlett kernel, $\omega(1)=1$ and $\omega(r)=\infty$ for all $r>1$. For the Parzen and Tukey-Hanning kernels, $\omega(2)=6$ and $\pi^2/4$, respectively, with $\omega(r)=0$ for $r<2$ and $\omega(r)=\infty$ for $r>2$.
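These quantities are the generalized derivatives $\omega(r)=\lim_{x\to0}(1-\omega(x))/|x|^r$ of Andrews (1991); they can be checked numerically from the kernel formulas (a rough sketch approximating the limit at a small fixed $x$):

```python
import numpy as np

def bartlett(x):
    return np.where(np.abs(x) <= 1, 1 - np.abs(x), 0.0)

def parzen(x):
    ax = np.abs(x)
    return np.where(ax <= 0.5, 1 - 6 * ax**2 + 6 * ax**3,
                    np.where(ax <= 1, 2 * (1 - ax) ** 3, 0.0))

def omega(kernel, r, x=1e-4):
    """Approximate omega(r) = lim_{x -> 0} (1 - kernel(x)) / |x|^r."""
    return float((1 - kernel(np.array(x))) / abs(x) ** r)

assert abs(omega(bartlett, 1) - 1.0) < 1e-6   # Bartlett: omega(1) = 1
assert abs(omega(parzen, 2) - 6.0) < 1e-2     # Parzen: omega(2) = 6
```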
References
Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent
covariance matrix estimation. Econometrica 59, 817–858.
Berlinet, A. & C. Francq (1999) Estimation des covariances entre autocovariances empiriques de processus multivariés non linéaires. La revue
Canadienne de Statistique 27, 1–22.
Billingsley, P. (1995) Probability and Measure. Wiley, New-York.
Brockwell, P.J. & R.A. Davis (1991) Time Series: Theory and Methods.
Springer-Verlag, New-York.
Carrasco, M. & X. Chen (2002) Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models. Econometric Theory 18, 17–39.
de Jong, R.M. & J. Davidson (2000) Consistency of Kernel Estimators of Heteroskedastic and Autocorrelated Covariance Matrices. Econometrica 68, 407–424.
Davydov, Y.A. (1968) Convergence of Distributions Generated by Stationary Stochastic Processes. Theory of Probability and Its Applications
13, 691–696.
Francq, C. & J.M. Zakoïan (1998) Estimating linear representations of nonlinear processes. Journal of Statistical Planning and Inference 68, 145–
165.
Francq, C. & J.M. Zakoïan (2000) Covariance matrix estimation for estimators of mixing weak ARMA models. Journal of Statistical Planning
and Inference 83, 369–394.
Francq, C., Roy, R. & J.M. Zakoïan (2004) Goodness-of-fit Tests for ARMA
Models with Uncorrelated Errors. Technical report CRM-2925, Centre
de recherches mathématiques, université de Montréal.
Gallant, A.R. & H. White (1988) A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. New York: Basil Blackwell.
Hansen, B.E. (1992) Consistent covariance matrix estimation for dependent
heterogeneous processes. Econometrica 60, 967–972.
Hansen, L.P. (1982) Large Sample Properties of Generalized Method of
Moments Estimators. Econometrica 50, 1029–1054.
Harville, D.A. (1997) Matrix Algebra From a Statistician's Perspective. Springer-Verlag, New-York.
Herrndorf, N. (1984) A Functional Central Limit Theorem for Weakly Dependent Sequences of Random Variables. The Annals of Probability 12, 141–153.
Hong, Y. (1996) Consistent Testing for Serial Correlation of Unknown Form.
Econometrica 64, 837–864.
Hong, Y. (1999) Testing for Serial Independence via the Empirical Characteristic Function, Working Paper, Department of Economics, Cornell
University.
Ibragimov, I.A. (1962) Some Limit Theorems for Stationary Processes. Theory of Probability and its Applications 7, 349–382.
Kiefer, N.M. & T.J. Vogelsang (2002a) Heteroskedasticity-autocorrelation
Robust Testing Using Bandwidth equal to Sample Size. Econometric
Theory 18, 1350–1366.
Kiefer, N.M. & T.J. Vogelsang (2002b) Heteroskedasticity-autocorrelation
Robust Standard Errors Using the Bartlett Kernel without Truncation.
Econometrica 70, 2093–2095.
Lobato, I.N. (2002) Testing for Zero Autocorrelation in the Presence of
Statistical Dependence. Econometric Theory 18, 730–743.
Magnus, J.R. & H. Neudecker (1988) Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, New-York.
May, R.M. (1976) Simple mathematical models with very complicated dynamics. Nature 261, 459–467.
Newey, W.K. & K.D. West (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Parzen, E. (1957) On Consistent Estimates of the Spectrum of a Stationary
Time Series. Annals of Mathematical Statistics 28, 329–348.
Pham, D.T. (1986) The Mixing Property of Bilinear and Generalized Random Coefficient Autoregressive Models. Stochastic Processes and their
Applications 23, 291–300.
Phillips, P.C.B. (1987) Time series regression with a unit root. Econometrica 55, 277–301.
Phillips, P.C.B., Sun, Y. & S. Jin (2003) Consistent HAC Estimation and Robust Regression Testing Using Sharp Origin Kernels with No Truncation. Discussion Paper, Yale University.
Priestley, M.B. (1981) Spectral Analysis and Time Series. Vols. 1 and 2,
Academic press, New York.
Rio, E. (1993) Covariance inequalities for strongly mixing processes. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 29, 587–597.
Romano, J.P. & L.A. Thombs (1996) Inference for autocorrelations under
weak assumptions. Journal of the American Statistical Association 91,
590–600.
Van Der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge University
Press, Cambridge.
Withers, C.S. (1981) Central Limit Theorems for Dependent Variables. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57, 509–534.