Cumulated sum of squares statistics
for non-linear and non-stationary regressions
Vanessa Berenguer-Rico* and Bent Nielsen†
03 August 2015
Abstract
We show that the cumulated sum of squares test has a standard Brownian bridge-type asymptotic distribution in non-linear regression models with non-stationary regressors. This contrasts with cumulated sum tests, which have been studied previously and whose asymptotic distribution involves nuisance quantities. Through simulation we show that the power is comparable in a wide range of situations.
Keywords: Cumulated sum of squares, Non-linear least squares, Non-stationarity, Specification tests.
JEL classification: C01; C22.
1 Introduction
An increasing range of non-linear models with non-stationary regressors is available in the literature. We show that the specification of such models can be investigated with ease using a cumulated sum of squares test with a standard Brownian bridge asymptotic distribution.
The Brownian bridge asymptotic result for the cumulated sum of squares test has been derived in a linear model framework with stationary and non-stationary regressors; see for instance Brown, Durbin and Evans (1975), McCabe and Harrison (1980), Ploberger and Krämer (1986), Lee, Na and Na (2003), or Nielsen and Sohkanen (2011). In this paper, we first provide a set of general sufficient assumptions for the Brownian bridge result to hold. Then, we show that these assumptions are satisfied in several different scenarios dealing with non-linear regression functions involving stationary or non-stationary regressors. In contrast, cumulated sum tests based directly on the residuals rather than on their squares have a more complicated asymptotic theory with nuisance terms when the regressors are non-stationary; see Hao and Inder (1996), Xiao and Phillips (2002), Kasparis (2008), Choi and Saikkonen (2010) or Berenguer-Rico and Gonzalo (2014).
The paper is organized as follows. In Section 2, the model and test statistics are put forward. Sections 3 and 4 provide, respectively, high-level and medium-level sets of sufficient assumptions for the Brownian bridge result. Section 5 shows that the assumptions in Sections 3 and 4 are satisfied in various non-linear models. In Section 6 the performance of the test in terms of size and power is investigated through Monte Carlo experiments. The proofs follow in an Appendix.
2 Model and statistics
Consider data $(y_1, x_1), \dots, (y_n, x_n)$ where $y_t$ is a scalar and $x_t$ is a $p$-vector. Consider the non-linear regression model

$$ y_t = g(x_t, \theta) + \varepsilon_t, \qquad t = 1, \dots, n, \qquad (2.1) $$
* Department of Economics, University of Oxford, Mansfield College, and Programme for Economic Modelling.
† Department of Economics, University of Oxford, Nuffield College, and Programme for Economic Modelling.
where the functional form of $g$ is known. The innovation $\varepsilon_t$ is a martingale difference sequence with respect to a filtration $F_t$ with zero mean, variance $\sigma^2$ and fourth moment $\varphi^2 = E\varepsilon_t^4 - (E\varepsilon_t^2)^2$, the regressor $x_t$ is a $p$-vector that is $F_{t-1}$-adapted, and the parameter $\theta$ is a $q$-vector varying in a parameter space $\Theta \subseteq \mathbb{R}^q$. The model is a conditional mean model where any unmodelled autocorrelation or correlation between $\varepsilon_t$ and $x_t$ will be regarded as misspecification.
The non-linear least squares estimator $\hat\theta_n$ of $\theta$ is the minimizer of the least squares criterion

$$ Q_n(\theta) = \sum_{t=1}^{n} \{y_t - g(x_t, \theta)\}^2. \qquad (2.2) $$
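To make the estimation step concrete, the following is a minimal numerical sketch of minimizing (2.2); the helper names and the scipy-based optimizer are our illustrative choices, not part of the paper, with the power function $g(x,\theta) = |x|^\theta$ of Section 5.2 as the example.

```python
import numpy as np
from scipy.optimize import least_squares

def nls_fit(y, x, g, theta0):
    """Minimize Q_n(theta) = sum_t {y_t - g(x_t, theta)}^2 and return residuals."""
    fit = least_squares(lambda theta: y - g(x, theta), theta0)
    theta_hat = fit.x
    return theta_hat, y - g(x, theta_hat)  # full-sample residuals eps_hat_{t,n}

# Example: the power function model of Section 5.2, g(x, theta) = |x|^theta
g_power = lambda x, theta: np.abs(x) ** theta[0]
```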
The least squares residuals based on the full sample estimation are then $\hat\varepsilon_{t,n} = y_t - g(x_t, \hat\theta_n)$. The cumulated sum of squares statistic is defined as

$$ CUSQ_n = \max_{1 \le t \le n} \frac{1}{\hat\varphi_n \sqrt{n}} \Big| \sum_{s=1}^{t} \hat\varepsilon_{s,n}^2 - \frac{t}{n} \sum_{s=1}^{n} \hat\varepsilon_{s,n}^2 \Big|, $$
where the standard deviation estimator can be chosen as, for instance,

$$ \hat\varphi_n^2 = \frac{1}{n} \sum_{t=1}^{n} \hat\varepsilon_{t,n}^4 - \Big( \frac{1}{n} \sum_{t=1}^{n} \hat\varepsilon_{t,n}^2 \Big)^2. $$
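In code, the statistic and the fourth-moment estimator are direct transcriptions of the two displays above; the following sketch (all names are ours) returns $CUSQ_n$ from the full-sample residuals.

```python
import numpy as np

def cusq(residuals):
    """CUSQ_n: maximal standardized tied-down cumulated sum of squared residuals."""
    e2 = np.asarray(residuals) ** 2
    n = e2.size
    phi_hat = np.sqrt(np.mean(e2 ** 2) - np.mean(e2) ** 2)  # phi_hat_n
    t = np.arange(1, n + 1)
    tied_down = np.cumsum(e2) - t / n * e2.sum()  # sum_{s<=t} e2 - (t/n) sum_{s<=n} e2
    return np.max(np.abs(tied_down)) / (phi_hat * np.sqrt(n))
```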
We will argue that, under quite general assumptions,

$$ CUSQ_n \xrightarrow{D} \sup_{0 \le u \le 1} |B_u^0|, $$

where $B_u^0$ is a standard Brownian bridge. Billingsley (1999, pp. 101-104) gives an analytic expression for the distribution function. In particular, the 95% quantile is 1.36; see Koziol and Byar (1975, Tab. 1).
We also consider a recursive cumulated sum of squares statistic, where the model (2.1) is estimated recursively. With $\hat\varepsilon_{s,t}$ denoting the residual for observation $s$ from estimation on the sample $1, \dots, t$, and $n_0$ an initialization point, the recursive statistic is

$$ RCUSQ_n = \max_{n_0 \le t \le n} \frac{1}{\hat\varphi_n \sqrt{n}} \Big| \sum_{s=1}^{t} \hat\varepsilon_{s,t}^2 - \frac{t}{n} \sum_{s=1}^{n} \hat\varepsilon_{s,n}^2 \Big|. $$

If the sequence of estimators $\hat\theta_n$ converges strongly, we can show that also

$$ RCUSQ_n \xrightarrow{D} \sup_{0 \le u \le 1} |B_u^0|. $$
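Computing $RCUSQ_n$ requires one estimation per subsample; the following is a sketch for the linear special case, where each recursive step is an ordinary least squares fit (the brute-force loop and the choice of $n_0$ are ours).

```python
import numpy as np

def rcusq(y, X, n0):
    """RCUSQ_n for a linear model y = X beta + eps, re-estimated on each subsample."""
    n = y.size
    e2_full = (y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2  # eps_hat_{s,n}^2
    phi_hat = np.sqrt(np.mean(e2_full ** 2) - np.mean(e2_full) ** 2)
    stats = []
    for t in range(n0, n + 1):
        beta_t = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]  # fit on s <= t
        e2_t = (y[:t] - X[:t] @ beta_t) ** 2                   # eps_hat_{s,t}^2
        stats.append(abs(e2_t.sum() - t / n * e2_full.sum()))
    return max(stats) / (phi_hat * np.sqrt(n))
```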
3 Results under High Level Assumptions
We start by proving the Brownian bridge results under a set of high level assumptions on the residuals and martingale difference innovations.
Assumption 3.1 Suppose $(\varepsilon_t, F_t)$ is a martingale difference sequence with respect to a filtration $F_t$; that is, $\varepsilon_t$ is $F_t$-adapted and $E(\varepsilon_t | F_{t-1}) = 0$ a.s., so that
(a) $E(\varepsilon_t^2 | F_{t-1}) = \sigma^2$ a.s.;
(b) $E(\varepsilon_t^4 | F_{t-1}) = \varphi^2 + \sigma^4$ a.s.;
(c) $\sup_t E(|\varepsilon_t|^\lambda | F_{t-1}) < \infty$ a.s. for some $\lambda > 4$.
The first result shows that the tied down cumulated sum of squared innovations converges to a Brownian bridge. This follows from the standard functional central limit theorem for martingale differences; see for instance Brown (1971).
Theorem 3.1 Suppose Assumption 3.1 is satisfied. Let $B_u^0$ be a standard Brownian bridge. Then, as a process on $D[0,1]$, the space of right continuous functions with left limits endowed with the Skorokhod metric,

$$ \frac{1}{\sqrt{n}} \sum_{t=1}^{[nu]} \Big( \varepsilon_t^2 - \frac{1}{n} \sum_{s=1}^{n} \varepsilon_s^2 \Big) \xrightarrow{D} \varphi B_u^0, \qquad u \in [0,1], $$

while

$$ \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^4 - \Big( \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^2 \Big)^2 \xrightarrow{D} \varphi^2. $$
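As an illustrative check of this result, one can simulate iid innovations and compare the rejection frequency of the standardized maximum against the 95% quantile 1.36; the sample size, seed and replication count below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, crit = 1000, 2000, 1.36
rejections = 0
for _ in range(reps):
    e2 = rng.standard_normal(n) ** 2                    # squared iid innovations
    phi = np.sqrt(np.mean(e2 ** 2) - np.mean(e2) ** 2)  # fourth-moment estimator
    tied_down = np.cumsum(e2) - np.arange(1, n + 1) / n * e2.sum()
    rejections += np.max(np.abs(tied_down)) / (phi * np.sqrt(n)) > crit
print(rejections / reps)  # should be close to 0.05 for large n
```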
We would like to formulate similar results for the cumulated sum of squared residuals. This
can be done as long as the squares of residuals and innovations are close. We formulate this as two
assumptions.
Assumption 3.2 $\max_{1 \le t \le n} n^{-1/2} \big| \sum_{s=1}^{t} (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2) \big| = o_P(1)$.

Assumption 3.3 $n^{-1} \big| \sum_{t=1}^{n} (\hat\varepsilon_{t,n}^4 - \varepsilon_t^4) \big| = o_P(1)$.
We will later show that Assumptions 3.2 and 3.3 are satisfied in a wide range of situations. Under these assumptions we then have the following result.

Theorem 3.2 If Assumptions 3.1, 3.2, 3.3 are satisfied then $CUSQ_n \xrightarrow{D} \sup_{0 \le u \le 1} |B_u^0|$.
For the recursive version of the result we need to strengthen Assumption 3.2.

Assumption 3.4 $\max_{1 \le t \le n} n^{-1/2} \big| \sum_{s=1}^{t} (\hat\varepsilon_{s,t}^2 - \varepsilon_s^2) \big| = o_P(1)$.

Theorem 3.3 If Assumptions 3.1, 3.3, 3.4 are satisfied then $RCUSQ_n \xrightarrow{D} \sup_{0 \le u \le 1} |B_u^0|$.
For a linear model it is possible to analyze Assumptions 3.2, 3.3 and 3.4 directly. In this way Nielsen and Sohkanen (2011) consider the case of the linear autoregressive distributed lag model with non-stationary (possibly explosive) regressors. Their Lemma 4.2 and Theorem 4.5 show that Assumptions 3.3 and 3.4 are satisfied under the martingale difference Assumption 3.1. For non-linear models it is useful to formulate a set of intermediate level assumptions that imply Assumptions 3.2, 3.3 and 3.4. We do this in the following.
4 Intermediate Level Results
In the non-linear regression model (2.1) we can replace the high level Assumptions 3.2 and 3.3 by local consistency of $\hat\theta_n$ and smoothness of the criterion function.

Assumption 4.1 Let $\gamma < 1/4$. Suppose $N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0)$ is either (a) $o_P(n^\gamma)$ or (b) $o(n^\gamma)$ a.s.
The normalization $N_{n,\theta_0}^{-1}$ allows both stationary and non-stationary regressors. In linear models $N_{n,\theta_0}^{-1} = n^{1/2}$ for stationary regressors and $N_{n,\theta_0}^{-1} = n$ for random walk regressors. In more general cointegrated models $N_{n,\theta_0}^{-1}$ may be block diagonal with different normalizations in different blocks; see Kristensen and Rahbek (2010). In non-linear models the normalization may depend on the parameter under which we evaluate the distributions. We use the notation $\theta_0$ to emphasize this choice of parameter.
The following smoothness assumption involves normalized sums of the first two derivatives of the known function $g$ with respect to $\theta$. These are the $q$-vector $\dot g(x_t, \theta) = \partial g(x_t, \theta)/\partial\theta$ and the $q \times q$ square matrix $\ddot g(x_t, \theta) = \partial^2 g(x_t, \theta)/\partial\theta\,\partial\theta'$. We will need a matrix norm. In the proof we use the spectral norm, but at this point any equivalent matrix norm can be used.
Assumption 4.2 Suppose $x_t$ is $F_{t-1}$-measurable and $g(x_t, \theta)$ is twice differentiable with respect to $\theta$. Let $\gamma < 1/4$ be the consistency rate in Assumption 4.1 and let $\delta > 0$. Suppose
(a) $\sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \sum_{t=1}^{n} \{g(x_t, \theta) - g(x_t, \theta_0)\}^2 = o_P(n^{1/2})$;
(b) $\sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \sum_{t=1}^{n} \{g(x_t, \theta) - g(x_t, \theta_0)\}^4 = o_P(n)$;
(c) $\sum_{t=1}^{n} \|N_{n,\theta_0}' \dot g(x_t, \theta_0)\|^2 = O_P(n^{1 - 2\gamma - \delta})$ for some $\delta > 0$;
(d) $\sum_{t=1}^{n} \|N_{n,\theta_0}' \ddot g(x_t, \theta_0) N_{n,\theta_0}\|^2 = O_P(n^{1 - 4\gamma - \delta})$ for some $\delta > 0$;
(e) $\sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \sum_{t=1}^{n} \|N_{n,\theta_0}' \{\ddot g(x_t, \theta) - \ddot g(x_t, \theta_0)\} N_{n,\theta_0}\|^2 = O_P(n^{-4\gamma})$.
Finally, we need some technical conditions to ensure invertibility of certain matrices.

Assumption 4.3 Suppose $\inf[n: \sum_{t=1}^{n} w_t w_t'$ is invertible$] < \infty$ a.s. for $w_t = \dot g(x_t, \theta_0)$ and $w_t = \mathrm{vec}\{\ddot g(x_t, \theta_0)\}$, with the convention that the empty set has infinite infimum. Moreover, suppose $N_{n,\theta_0}^{-1} = O(n^\ell)$ for some $\ell > 0$.
We can now show that Assumptions 3.2, 3.3 and 3.4 are satisfied. Subsequently, we return to a discussion of the assumptions.
Theorem 4.1 Assumptions 3.1, 4.1(a), 4.2 and 4.3 imply Assumptions 3.2 and 3.3, so Theorem 3.2 applies.
For the recursive cumulated sum of squares statistic we require strong uniformity properties.
If the estimator is strongly consistent we can get that uniformity from Egorov’s Theorem, see
Davidson (1994, Theorem 18.4).
Theorem 4.2 Assumptions 3.1, 4.1(b), 4.2 and 4.3 imply Assumptions 3.3 and 3.4, so Theorem 3.3 applies.
In the proof we analyze $n^{-1/2} \sum_{s=1}^{n} (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2)$ through a martingale decomposition. Noting that $\hat\varepsilon_{s,n} - \varepsilon_s = -\nabla g(x_s, \hat\theta_n)$, where $\nabla g(x_s, \hat\theta_n) = g(x_s, \hat\theta_n) - g(x_s, \theta_0)$, and expanding $(\varepsilon - r)^2 - \varepsilon^2 = -2\varepsilon r + r^2$ we get

$$ n^{-1/2} \sum_{s=1}^{n} (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2) = -2 n^{-1/2} \sum_{s=1}^{n} \varepsilon_s \nabla g(x_s, \hat\theta_n) + n^{-1/2} \sum_{s=1}^{n} \{\nabla g(x_s, \hat\theta_n)\}^2. \qquad (4.1) $$
Due to Assumption 4.1 the estimator $\hat\theta_n$ varies in a local region around $\theta_0$. Thus, it suffices to replace $\hat\theta_n$ with a deterministic value and show that the sums in (4.1) vanish uniformly over the local region. These sums are a martingale and its compensator. Now, the compensator vanishes under Assumption 4.2(a). Jennrich (1969, Theorem 6) uses a similar condition when proving consistency of non-linear least squares, with the difference that he takes the supremum over a non-vanishing set. In the proof the main bulk of the work is to show that the martingale part vanishes under Assumption 4.2(c)-(e). For this we exploit Lemma 1 of Lai and Wei (1982). The conditions (c)-(e) are somewhat weaker than the usual conditions for deriving the asymptotic distribution of non-linear least squares estimators; see for instance Amemiya (1985, page 111). Finally, Assumption 4.2(b) is used for showing the consistency of the fourth moment estimator $\hat\varphi_n^2$.
In many applications the non-linear function $g$ and its derivatives satisfy a Lipschitz condition. In that case one can easily relate condition (a) of Assumption 4.2 to conditions (c)-(e). To do this, one just needs to Taylor expand $g(x_t, \theta) - g(x_t, \theta_0)$ to second order around $\theta_0$, square it, and take the supremum before cumulating. A similar argument applies to condition (b). This gives a somewhat shorter set of assumptions that imply Assumption 4.2.
Assumption 4.4 Suppose $x_t$ is $F_{t-1}$-measurable and $g(x_t, \theta)$ is twice differentiable with respect to $\theta$. Let $\gamma < 1/4$ be the consistency rate in Assumption 4.1 and let $\delta > 0$. Suppose the following conditions hold for $k = 2, 4$:
(a) $\sum_{t=1}^{n} \|N_{n,\theta_0}' \dot g(x_t, \theta_0)\|^k = o_P(n^{k/4 - k\gamma})$;
(b) $\sum_{t=1}^{n} \|N_{n,\theta_0}' \ddot g(x_t, \theta_0) N_{n,\theta_0}\|^k = o_P(n^{k/4 - 2k\gamma})$;
(c) $\sum_{t=1}^{n} \sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \|N_{n,\theta_0}' \{\ddot g(x_t, \theta) - \ddot g(x_t, \theta_0)\} N_{n,\theta_0}\|^k = o_P(n^{-4\gamma})$.
Theorem 4.3 Assumption 4.4 implies Assumption 4.2.
5 Analysis of some particular models
In this section, we consider some particular non-linear models that have been discussed in the literature. For these models it is relevant to test their validity using a cumulated sum of squares test. We will assume that the consistency Assumption 4.1 has been dealt with elsewhere. Thus, we know the appropriate normalization of the estimator. The difficulty is therefore to establish the smoothness Assumption 4.4. We will show that this assumption is rather mild.
5.1 The linear model

In the linear model $g(x_t, \theta) = \theta' x_t$ so that $\dot g(x_t, \theta) = x_t$ and $\ddot g(x_t, \theta) = 0$. Thus, Assumption 4.4 reduces to showing $\sum_{t=1}^{n} \|N_{n,\theta_0}' x_t\|^k = o_P(n^{k/4 - k\gamma})$ for $k = 2, 4$. Suppose $x_t$ is univariate and stationary; then $N_{n,\theta_0} = n^{-1/2}$, whereas $N_{n,\theta_0} = n^{-1}$ if $x_t$ is a random walk. In both cases $\sum_{t=1}^{n} N_{n,\theta_0}^k |x_t|^k = O_P(1) = o_P(n^{k/4 - k\gamma})$.
For the recursive statistic we would need to establish that $\hat\theta_n$ is strongly consistent. For non-stationary models this is not always so easy. To our knowledge this has not been proved for a first order autoregressive model with an intercept and where the autoregressive coefficient is unity. Nielsen and Sohkanen (2011) therefore work directly with the high level Assumption 3.4.
5.2 The power function model
As a first non-linear case we consider the power function $g(x_t, \theta) = |x_t|^\theta$ to illustrate where the difficulties lie in the arguments. The model equation is then

$$ y_t = |x_t|^\theta + \varepsilon_t, \qquad t = 1, \dots, n, \qquad (5.1) $$

with $\theta > 0$ and where $x_t$ is either stationary or a random walk. We will suppose that Assumption 4.1 is satisfied and show that Assumption 4.2 holds.

The properties of the regressor $x_t$ are reflected in the choice of the normalization $N_{n,\theta_0}$. Hence, if $x_t$ is stationary with finite $|x_t|^{4\theta_0} \log^8 |x_t|$ moments we let $N_{n,\theta_0}^{-1} = \sqrt{n}$ and apply techniques from Wooldridge (1994). If $x_t$ is a random walk we let $N_{n,\theta_0}^{-1} = n^{(1+\theta_0)/2} \log n^{1/2}$ and apply techniques from Park and Phillips (2001). These techniques go back to Cramér and involve smoothness conditions that are similar to but also somewhat stronger than Assumption 4.2. Here, we take $N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0) = O_P(1)$ as given. Hence Assumption 4.1 follows for any $\gamma > 0$.
To prove Assumption 4.4 we differentiate $g$ and get

$$ g(x, \theta) = |x|^\theta, \qquad \dot g(x, \theta) = |x|^\theta \log|x|, \qquad \ddot g(x, \theta) = |x|^\theta \log^2|x|. $$

These functions are continuous in $x$ when $\theta > 0$ and $|x| > 0$, and they can be extended continuously to all $x \in \mathbb{R}$ because the power function dominates the logarithm at the origin.
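These derivatives are easy to verify numerically; the following purely illustrative sketch checks $\dot g$ by a central finite difference at an arbitrary point of our choosing.

```python
import numpy as np

g = lambda x, th: np.abs(x) ** th  # g(x, theta) = |x|^theta
x, th, h = 2.5, 0.8, 1e-6
g_dot = np.abs(x) ** th * np.log(np.abs(x))         # analytic derivative in theta
g_dot_fd = (g(x, th + h) - g(x, th - h)) / (2 * h)  # finite difference
assert abs(g_dot - g_dot_fd) < 1e-6
```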
We now look at Assumption 4.4(a) in some detail. As in the linear case we show that

$$ S = \sum_{t=1}^{n} |N_{n,\theta_0} \dot g(x_t, \theta_0)|^k = O_P(1) = o_P(n^{k/4 - k\gamma}). $$
In the stationary case we use Theorems 3.5.3 and 3.5.7 of Stout (1974) to get

$$ S = \frac{1}{n^{k/2}} \sum_{t=1}^{n} |x_t|^{k\theta_0} \log^k|x_t| = O(n^{1 - k/2}) = O(1) \qquad a.s. $$
In the random walk case we get

$$ S = \frac{1}{n^{(1+\theta_0)k/2} \log^k n^{1/2}} \sum_{t=1}^{n} |x_t|^{k\theta_0} \log^k|x_t| = \frac{1}{n^{k/2}} \sum_{t=1}^{n} \Big| \frac{x_t}{n^{1/2}} \Big|^{k\theta_0} \Big( \frac{\log|x_t/n^{1/2}|}{\log n^{1/2}} + 1 \Big)^k = O_P(1), $$

where the second equality follows by noting that $\log|x| = \log|x/n^{1/2}| + \log n^{1/2}$. For the last bound note that $x_{[nu]}/n^{1/2}$ converges to a Brownian motion as a function on $D[0,1]$. The functions $|y|^{2\theta_0} \log|y|$ and $|y|^{2\theta_0}$ are continuous and therefore the integrals $\int_0^1 |y_u|^{2\theta_0} \log|y_u|\,du$ and $\int_0^1 |y_u|^{2\theta_0}\,du$ are continuous mappings from $D[0,1]$ to $\mathbb{R}$. The Continuous Mapping Theorem, see Billingsley (1999, Theorem 2.7), then shows that the normalized sum converges in distribution.
2
For Assumption 4.4 (b) a similar argument shows t=1 jNn; 0 g•(xt ; 0 )jk = OP (n1 k ):
For Assumption 4.4(c) we apply a Lipschitz argument. The second derivative of $g$ satisfies

$$ |\ddot g(x_t, \theta) - \ddot g(x_t, \theta_0)| \le (|x_t|^{\delta} + |x_t|^{-\delta}) |x_t|^{\theta_0} \log^2|x_t| $$

for all $\theta$ so that $|\theta - \theta_0| \le \delta$, for some $0 < \delta < \theta_0$. The result is proved by analyzing the function $|x|^{\theta - \theta_0} - 1$ for all four sign combinations of $|x| \gtrless 1$ and $\theta \gtrless \theta_0$. Applying this to condition (c) gives

$$ \sum_{t=1}^{n} \sup_{\theta: |\theta - \theta_0| \le \delta} |N_{n,\theta_0}^2 \{\ddot g(x_t, \theta) - \ddot g(x_t, \theta_0)\}|^k \le \sum_{t=1}^{n} \{N_{n,\theta_0}^2 (|x_t|^{\delta} + |x_t|^{-\delta}) |x_t|^{\theta_0} \log^2|x_t|\}^k = O_P(n^{1-k}) = o_P(1), \qquad (5.2) $$

where the second bound follows by the same argument as above.
5.3 Cointegration with non-linear error correction
In the model of Kristensen and Rahbek (2010) $x_t$ is a $p$-dimensional time series satisfying

$$ \Delta x_t = g(\beta' x_{t-1}, \theta) + \Gamma_1 \Delta x_{t-1} + \dots + \Gamma_k \Delta x_{t-k} + \varepsilon_t. $$
In specification analysis we consider the coordinates of the residual vector $\hat\varepsilon_t$ separately. Their Theorem 1 gives conditions ensuring that $\beta' x_{t-1}, \Delta x_{t-1}, \dots, \Delta x_{t-k}$ are geometrically ergodic and that $x_t$ satisfies a Granger-Johansen-type representation. With this and some further conditions their Theorem 5 provides the normalization $N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0) = O_P(1)$ that is required in our Assumption 4.1. Their Assumption A.5 requires that the first, second and third derivatives of $g(z, \theta)$ with respect to $z$ or $\theta$ are of order $O(|z|)$. With these boundedness conditions our Assumption 4.4 can be proved. The proof is slightly involved as one has to keep track of the various components in the Granger-Johansen-type representation and how they interact with the derivatives of $g$.
5.4 Non-linear models with random walk regressors
Park and Phillips (2001) consider a triangular system with a univariate random walk regressor:

$$ y_t = g(x_t, \theta) + \varepsilon_t, \qquad (5.3) $$
$$ x_t = x_{t-1} + v_t, \qquad t = 1, \dots, n, \qquad (5.4) $$

where $\varepsilon_t$ is an $F_t$-martingale difference sequence, $(\varepsilon_t, v_t)'$ satisfies a functional central limit theorem, $x_t$ is $F_{t-1}$-adapted, and $g$ is in one of two main classes of functions: integrable and asymptotically homogeneous. For recent developments see Chan and Wang (2015).
The class of integrable functions includes transformations $g(x_t, \theta)$ such as $1/(1+x^2)$, $e^{-x^2}$, or $1(0 \le x \le 1)$, which are integrable over $x \in \mathbb{R}$ and satisfy a Lipschitz condition over $\theta$. In Theorem 5.1 of Park and Phillips (2001) the asymptotic distribution of the non-linear least squares estimator for the integrable functions case is derived, showing that $n^{1/4}(\hat\theta_n - \theta_0)$ converges in distribution. Thus, we can choose $N_{n,\theta_0}^{-1} = n^{1/4}$ and otherwise proceed as in the power function example.
The class of asymptotically homogeneous functions includes transformations $g(x, \theta)$ which asymptotically behave like homogeneous functions; they include the power function in Section 5.2 as well as logistic, threshold-like or logarithmic transformations. Specifically, an asymptotically homogeneous function $f(x, \theta)$ is a function such that

$$ f(\lambda x, \theta) = \kappa(\lambda, \theta) H(x, \theta) + R(\lambda, x, \theta), $$

where $\kappa$ is a normalization, $H$ satisfies some regularity conditions (such as local integrability; see also Pötscher, 2004) and $R$ is a lower order remainder term. In Theorem 5.3 of Park and Phillips (2001) each of the functions $g$, $\dot g$ and $\ddot g$ is assumed to be asymptotically homogeneous and to satisfy conditions that have the same flavour as those in Assumption 4.4. It then follows that $n^{1/2} \dot\kappa(\sqrt{n}, \theta_0)'(\hat\theta_n - \theta_0)$ converges in distribution. Thus, we can choose the normalization $N_{n,\theta_0}^{-1} = n^{1/2} \dot\kappa(\sqrt{n}, \theta_0)'$. For instance, in the power function model (5.1) with random walk regressor we have $\dot\kappa(\sqrt{n}, \theta_0) = n^{\theta_0/2} \log n^{1/2}$.
6 Finite Sample Performance
In this section, we study the finite sample performance of the $CUSQ$ test through simulation. We use the exact asymptotic 95% critical value of 1.36 and 10,000 replications. Two sets of results are presented for various asymptotically homogeneous models. First, we check size and power for a set of models that are either linear or non-linear in parameters. Next, we consider a set of models suggested by Kasparis (2008). For these we compare the power of the $CUSQ$ test with the power of a cumulated sum (CUSUM) test reported by Kasparis (2008). We find that the two tests have power of similar magnitude, so there is no apparent advantage in using the more complicated CUSUM test.
Table 1 contains the first set of data generating processes (DGPs). Four correctly specified (CS) DGPs and five misspecified (M) DGPs are analyzed. The regressor $x_t$ is (fractionally) integrated so that $\Delta^\delta x_t$ is iid N(0,1) with $x_t = 0$ for $t \le 0$ and with $\delta = 0.7, 1, 2$. While the models in Section 5 focus on stationary and random walk models, the theory does extend to other types of non-stationarity; see Chan and Wang (2015).
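For reference, such a regressor can be generated by inverting the fractional difference filter, using the standard recursion for the coefficients of $(1-L)^{-\delta}$; the implementation below is our own sketch, not the authors' code.

```python
import numpy as np

def frac_integrate(eps, d):
    """x_t with Delta^d x_t = eps_t and x_t = 0 for t <= 0, via (1 - L)^(-d)."""
    n = eps.size
    psi = np.ones(n)
    for j in range(1, n):
        psi[j] = psi[j - 1] * (d + j - 1) / j  # coefficients psi_j of (1 - L)^(-d)
    return np.array([psi[: t + 1][::-1] @ eps[: t + 1] for t in range(n)])

x = frac_integrate(np.random.default_rng(0).standard_normal(500), d=0.7)  # x_t ~ I(0.7)
```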
Table 2, DGPs 1-4, reports the size of the $CUSQ$ test. The size control is fairly uniform across the DGPs. This is in correspondence with the results for linear autoregressions in Nielsen and Sohkanen (2011). The test is, however, slightly undersized in small samples. The size distortion can be removed by applying the finite sample 95% critical value $1.36 - 0.67 n^{-1/2} - 0.89 n^{-1}$ suggested by Edgerton and Wells (1994). Similarly, for the recursive test Sohkanen (2011) suggests the 95% critical value $1.36(1 - 0.68 n^{-1/2} + 3.13 n^{-1} - 33.9 n^{-3/2} + 93.9 n^{-2})$.
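The two finite-sample corrections transcribe directly; a small sketch of the formulas as reconstructed above (the function names are ours):

```python
def cv_cusq(n):
    """Finite-sample 95% critical value for CUSQ (Edgerton and Wells, 1994)."""
    return 1.36 - 0.67 * n ** -0.5 - 0.89 / n

def cv_rcusq(n):
    """Finite-sample 95% critical value for RCUSQ (Sohkanen, 2011)."""
    return 1.36 * (1 - 0.68 * n ** -0.5 + 3.13 / n - 33.9 * n ** -1.5 + 93.9 * n ** -2)
```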
Table 2, DGPs 5-9, reports the power of the $CUSQ$ test for a range of asymptotically homogeneous functions. The power increases with sample size in all cases. The power also tends to increase with the order of integration of the regressors. This is in line with the power analysis for linear models conducted by McCabe and Harrison (1980), Ploberger and Krämer (1990), Deng and Perron (2008), or Turner (2010).
The $CUSQ$ also has power to detect misspecification involving integrable functions of persistent processes. As an example consider the data generating process $y_t = \theta_1/(1 + \theta_2 x_t^2) + \varepsilon_t$, while the regression model is polynomial. Simulations not reported here show that power arises as long as the signal from the integrable function component $\theta_1/(1 + \theta_2 x_t^2)$ dominates the noise $\varepsilon_t$.
Next, we compare the power of the $CUSQ$ test with the CUSUM test of Kasparis (2008). Table 3 reports his ten DGPs. In all cases a linear model for $y_t$ and $x_t$ is fitted, which is therefore misspecified. The results are reported in Table 4. Kasparis's test uses a long run variance estimator to standardize the statistic; hence, the power of the test depends on a bandwidth choice. Kasparis reports power for different bandwidths and we report the highest of these. Table 4 shows that neither test dominates in all cases, but both tests perform in a similar way. We note that the CUSUM test involves nuisance terms depending on the functional form of the model whereas the $CUSQ$ has a Brownian bridge theory quite generally.
A Appendix: Proofs
In most places we use the spectral norm for matrices, so that for a matrix $m$ then

$$ \|m\| = \sqrt{\max \mathrm{eigen}(m'm)}. $$

The spectral norm reduces to the Euclidean norm for vectors. It is compatible with the Euclidean norm in the sense that $\|mv\| \le \|m\| \, \|v\|$ for a matrix $m$ and a vector $v$. It satisfies the norm inequality $\|mn\| \le \|m\| \, \|n\|$ for matrices $m, n$. Occasionally, we will use the Frobenius norm $\|m\|_F = (\sum_{i,j} |m_{ij}|^2)^{1/2}$. Note that $\|m\| \le \|m\|_F$, with equality when $m$ is a vector, while $\|m\|_F \le q \|m\|$ where $q$ is the column dimension of $m$. Further, $\|m\|_F = \|\mathrm{vec}(m)\|_F$.
We start with a modification of the martingale result by Lai and Wei (1982).

Lemma A.1 Let $N_{n,\theta_0}$ be a $q \times q$ normalizing matrix where $N_{n,\theta_0}^{-1} = O(n^\ell)$ for some $\ell > 0$. Further, let $g(x_t, \theta_0)$ be a function $g: \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R}$, with derivatives $\dot g$, $\ddot g$ with respect to $\theta$. Also let Assumption 3.1(a) hold. Let $w_t$ be $F_{t-1}$-measurable and given as either of
(i) $w_t = N_{n,\theta_0}' \dot g(x_t, \theta_0)$;
(ii) $w_t = N_{n,\theta_0}' \ddot g(x_t, \theta_0) N_{n,\theta_0}$.
Suppose $n_0 = \inf[n: \sum_{t=1}^{n} \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}'$ is invertible$] < \infty$ a.s. Then, for all $\varsigma > 0$,

$$ \max_{n_0 \le s \le n} \Big\| \sum_{t=1}^{s} w_t \varepsilon_t \Big\| = o\Big( n^\varsigma \Big\| \sum_{t=1}^{n} \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}' \Big\|^{1/2+\varsigma} \Big) + O(1) = o\Big( n^\varsigma \Big\{ \sum_{t=1}^{n} \|w_t\|^2 \Big\}^{1/2+\varsigma} \Big) + O(1) \qquad a.s. $$
Proof of Lemma A.1: Part (i): Introduce the notation

$$ S_{\dot g \varepsilon, u} = \sum_{t=1}^{[nu]} \dot g(x_t, \theta_0) \varepsilon_t \qquad \text{and} \qquad S_{\dot g \dot g, u} = \sum_{t=1}^{[nu]} \dot g(x_t, \theta_0) \dot g(x_t, \theta_0)', $$

so that

$$ N_{n,\theta_0}' S_{\dot g \varepsilon, u} = \sum_{t=1}^{[nu]} w_t \varepsilon_t \qquad \text{and} \qquad N_{n,\theta_0}' S_{\dot g \dot g, u} N_{n,\theta_0} = \sum_{t=1}^{[nu]} w_t w_t'. $$

Notice that, for $n_0 < [nu]$,

$$ \| N_{n,\theta_0}' S_{\dot g \varepsilon, u} \| \le \| N_{n,\theta_0}' S_{\dot g \dot g, u} N_{n,\theta_0} \|^{1/2} \, \| S_{\dot g \dot g, u}^{-1/2} S_{\dot g \varepsilon, u} \|. \qquad (A.1) $$

Use Lai and Wei (1982, Lemma 1,i,ii) with Assumption 3.1(a), recalling the definition of the spectral norm, to see that

$$ \| S_{\dot g \dot g, u}^{-1/2} S_{\dot g \varepsilon, u} \|^2 = S_{\dot g \varepsilon, u}' S_{\dot g \dot g, u}^{-1} S_{\dot g \varepsilon, u} = o( \| S_{\dot g \dot g, u} \|^{\tilde\varsigma} ) + O(1) \qquad a.s., $$

for all $\tilde\varsigma > 0$. Since $\| S_{\dot g \dot g, u} \|$ is non-decreasing in $u$ this is bounded by $o( \| S_{\dot g \dot g, 1} \|^{\tilde\varsigma} ) + O(1)$. Using that $N_{n,\theta_0}^{-1} = O(n^\ell)$ for some $\ell > 0$, we can write

$$ \| S_{\dot g \dot g, u}^{-1/2} S_{\dot g \varepsilon, u} \|^2 = o( \| N_{n,\theta_0}^{-1} \|^{2\tilde\varsigma} \, \| N_{n,\theta_0}' S_{\dot g \dot g, 1} N_{n,\theta_0} \|^{\tilde\varsigma} ) + O(1) = o( n^\varsigma \| N_{n,\theta_0}' S_{\dot g \dot g, 1} N_{n,\theta_0} \|^\varsigma ) + O(1) \qquad a.s., $$

for all $\varsigma > 0$, uniformly in $u$. Hence, using (A.1),

$$ \sup_u \| N_{n,\theta_0}' S_{\dot g \varepsilon, u} \| = o( n^\varsigma \| N_{n,\theta_0}' S_{\dot g \dot g, 1} N_{n,\theta_0} \|^{1/2+\varsigma} ) + O(1) \qquad a.s., $$

which is the first desired expression, and by the triangle inequality we get the second expression.
Part (ii): By the properties of the Frobenius norm we get that

$$ \Big\| \sum_{t=1}^{[nu]} w_t \varepsilon_t \Big\| \le \Big\| \sum_{t=1}^{[nu]} w_t \varepsilon_t \Big\|_F = \Big\| \sum_{t=1}^{[nu]} \mathrm{vec}(w_t) \varepsilon_t \Big\|. $$

Now argue as in (i) with $w_t$ replaced by $\mathrm{vec}(w_t)$ to get

$$ \sup_u \Big\| \sum_{t=1}^{[nu]} w_t \varepsilon_t \Big\| = o\Big( n^\varsigma \Big\| \sum_{t=1}^{n} \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}' \Big\|^{1/2+\varsigma} \Big) + O(1) \qquad a.s., $$

which is the first desired expression. To get the second expression notice that

$$ \Big\| \sum_{t=1}^{n} \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}' \Big\| \le \sum_{t=1}^{n} \big\| \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}' \big\| $$

and $\| \{\mathrm{vec}(w_t)\}\{\mathrm{vec}(w_t)\}' \| = \| \mathrm{vec}(w_t) \|_F^2 = \| w_t \|_F^2 \le q \| w_t \|^2$ as desired. $\square$
Proof of Theorem 3.2: The statistic of interest is $CUSQ_n = A_n/\hat\varphi_n$, where

$$ A_n = \max_{1 \le t \le n} n^{-1/2} \Big| \sum_{s=1}^{t} \Big( \hat\varepsilon_{s,n}^2 - n^{-1} \sum_{r=1}^{n} \hat\varepsilon_{r,n}^2 \Big) \Big|. $$

Expand $A_n = B_n + (A_n - B_n)$, where

$$ B_n = \max_{1 \le t \le n} n^{-1/2} \Big| \sum_{s=1}^{t} \Big( \varepsilon_s^2 - n^{-1} \sum_{r=1}^{n} \varepsilon_r^2 \Big) \Big|. $$

Noting that $\hat\varepsilon_{s,n}^2 = (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2) + \varepsilon_s^2$, we get by the triangle inequality that $|A_n - B_n| \le C_n$, where

$$ C_n = \max_{1 \le t \le n} n^{-1/2} \Big| \sum_{s=1}^{t} \Big\{ (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2) - n^{-1} \sum_{r=1}^{n} (\hat\varepsilon_{r,n}^2 - \varepsilon_r^2) \Big\} \Big|. $$

By Assumption 3.2,

$$ C_n \le 2 \max_{1 \le t \le n} n^{-1/2} \Big| \sum_{s=1}^{t} (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2) \Big| = o_P(1). \qquad (A.2) $$

Thus, by Theorem 3.1 and the Continuous Mapping Theorem applied to the maximum, we have

$$ A_n = B_n + o_P(1) \xrightarrow{D} \varphi \sup_{0 \le u \le 1} |B_u^0|. $$

Consider now $\hat\varphi_n^2 = n^{-1} \sum_{t=1}^{n} \hat\varepsilon_{t,n}^4 - (n^{-1} \sum_{t=1}^{n} \hat\varepsilon_{t,n}^2)^2$. Further, $n^{-1} \sum_{t=1}^{n} (\hat\varepsilon_{t,n}^k - \varepsilon_t^k) = o_P(1)$ for $k = 2, 4$ by Assumptions 3.2, 3.3. Therefore,

$$ \hat\varphi_n^2 = n^{-1} \sum_{t=1}^{n} \varepsilon_t^4 - \Big( n^{-1} \sum_{t=1}^{n} \varepsilon_t^2 \Big)^2 + o_P(1). $$

By Theorem 3.1, under Assumption 3.1, we have $\hat\varphi_n^2 = \varphi^2 + o_P(1)$. All together, $CUSQ_n$ converges in distribution to $\sup_{0 \le u \le 1} |B_u^0|$ as desired. $\square$
Proof of Theorem 3.3: Follow the proof of Theorem 3.2 replacing $\hat\varepsilon_{s,n}^2$ by $\hat\varepsilon_{s,t}^2$ and using Assumption 3.4 instead of Assumption 3.2 when evaluating (A.2). $\square$
Proof of Theorem 4.1: Part I: Assumption 3.2.

1. The problem. Let $S_{t,\theta} = n^{-1/2} \{Q_t(\theta) - Q_t(\theta_0)\}$ so that $S_{t,\hat\theta_n} = n^{-1/2} \sum_{s=1}^{t} (\hat\varepsilon_{s,n}^2 - \varepsilon_s^2)$. We show that $S_{t,\hat\theta_n} = o_P(1)$ uniformly in $1 \le t \le n$. From (4.1) we have $S_{t,\theta} = -2\tilde S_{t,\theta} + \bar S_{t,\theta}$, where

$$ \tilde S_{t,\theta} = n^{-1/2} \sum_{s=1}^{t} \varepsilon_s \nabla g_s(\theta), \qquad \bar S_{t,\theta} = n^{-1/2} \sum_{s=1}^{t} \{\nabla g_s(\theta)\}^2. $$

2. Expand the martingale $\tilde S_{t,\theta}$. We use a second order mean value result. To simplify the expression we introduce the notation

$$ \dot h_s(\theta) = N_{n,\theta_0}' \dot g(x_s, \theta), \qquad \ddot h_s(\theta) = N_{n,\theta_0}' \ddot g(x_s, \theta) N_{n,\theta_0}, $$
$$ \nabla \ddot h_s(\theta) = N_{n,\theta_0}' \{\ddot g(x_s, \theta) - \ddot g(x_s, \theta_0)\} N_{n,\theta_0}, \qquad \# = N_{n,\theta_0}^{-1} (\theta - \theta_0). $$

With this notation we get, for instance, that

$$ (\theta - \theta_0)' \dot g(x_s, \theta_0) = \{N_{n,\theta_0}^{-1}(\theta - \theta_0)\}' N_{n,\theta_0}' \dot g(x_s, \theta_0) = \#' \dot h_s(\theta_0). $$

Overall, we can expand $\tilde S_{t,\hat\theta_n} = n^{-1/2} \sum_{s=1}^{t} \varepsilon_s \nabla g_s(\hat\theta_n)$ as

$$ \tilde S_{t,\hat\theta_n} = n^{-1/2} \sum_{s=1}^{t} \varepsilon_s \hat\#_n' \dot h_s(\theta_0) + \frac{1}{2} n^{-1/2} \sum_{s=1}^{t} \varepsilon_s \hat\#_n' \ddot h_s(\theta_0) \hat\#_n + \frac{1}{2} n^{-1/2} \sum_{s=1}^{t} \varepsilon_s \hat\#_n' \{\ddot h_s(\theta^*) - \ddot h_s(\theta_0)\} \hat\#_n, \qquad (A.3) $$

for an intermediate point $\theta^*$ depending on the summation limit $t$ and $\hat\theta_n$ so $\|\theta^* - \theta_0\| \le \|\hat\theta_n - \theta_0\|$. Note that the first two terms only depend on $\hat\theta_n$ through the factor $\hat\#_n$. For simplicity we write (A.3) as $\tilde S_{t,\hat\theta_n} = \tilde S_{t,1} + (\tilde S_{t,2} + \tilde S_{t,\hat\theta_n,3})/2$.

3. The martingale term $\tilde S_{t,1}$. The norm inequality and the bound on $\hat\#_n$ in Assumption 4.1(a) give

$$ |\tilde S_{t,1}| \le n^{-1/2} \|\hat\#_n\| \Big\| \sum_{s=1}^{t} \varepsilon_s \dot h_s(\theta_0) \Big\| = o_P(n^{\gamma - 1/2}) \Big\| \sum_{s=1}^{t} \varepsilon_s \dot h_s(\theta_0) \Big\|. $$

Apply Lemma A.1(i) using Assumptions 3.1, 4.3 to get, for any $\varsigma > 0$,

$$ \max_{n_0 \le t \le n} |\tilde S_{t,1}| = o_P(n^{\gamma - 1/2}) \, o_{a.s.}\Big[ n^\varsigma \Big\{ \sum_{t=1}^{n} \|\dot h_t(\theta_0)\|^2 \Big\}^{1/2+\varsigma} \Big] + o_P(n^{\gamma - 1/2}) O_{a.s.}(1). $$

By Assumption 4.2(c), we have that $\sum_{t=1}^{n} \|\dot h_t(\theta_0)\|^2 = O_P(n^{1 - 2\gamma - \delta})$ for some $\delta > 0$ while $\gamma < 1/4$. We then get, when choosing $2\varsigma \le \delta/(2 - 2\gamma)$,

$$ \max_{n_0 \le t \le n} |\tilde S_{t,1}| = o_P(n^{\gamma - 1/2}) \, o_P\{ n^{(1 - 2\gamma - \delta)(1/2+\varsigma) + \varsigma} \} + o_P(n^{\gamma - 1/2}) = o_P(1). $$

4. The martingale term $\tilde S_{t,2}$. Argue as in item 3. First, the norm inequality gives

$$ |\tilde S_{t,2}| \le n^{-1/2} \|\hat\#_n\|^2 \Big\| \sum_{s=1}^{t} \varepsilon_s \ddot h_s(\theta_0) \Big\| = o_P(n^{2\gamma - 1/2}) \Big\| \sum_{s=1}^{t} \varepsilon_s \ddot h_s(\theta_0) \Big\|. $$

Then apply Lemma A.1(ii) using Assumptions 3.1, 4.3 along with Assumption 4.2(d) to get

$$ \max_{n_0 \le t \le n} |\tilde S_{t,2}| = o_P(n^{2\gamma - 1/2}) \, o_P( n^{(1 - 4\gamma - \delta)(1/2+\varsigma) + \varsigma} ) + o_P(n^{2\gamma - 1/2}) = o_P(1), $$

when $\gamma < 1/4$ and $\varsigma > 0$ is chosen sufficiently small.

5. The term $\tilde S_{t,\hat\theta_n,3}$. Apply the norm and triangle inequalities to get

$$ |\tilde S_{t,\hat\theta_n,3}| \le \|\hat\#_n\|^2 \, n^{-1/2} \sum_{s=1}^{t} |\varepsilon_s| \, \|\ddot h_s(\theta^*) - \ddot h_s(\theta_0)\|. $$

The summands are positive so that a further bound arises by extending the summation limit:

$$ |\tilde S_{t,\hat\theta_n,3}| \le \|\hat\#_n\|^2 \, n^{-1/2} \sum_{s=1}^{n} |\varepsilon_s| \, \|\ddot h_s(\theta^*) - \ddot h_s(\theta_0)\|, $$

where $\theta^*$ remains dependent on $t$ and $\hat\theta_n$. Apply the Hölder inequality to get

$$ |\tilde S_{t,\hat\theta_n,3}| \le \|\hat\#_n\|^2 \Big( n^{-1} \sum_{s=1}^{n} \varepsilon_s^2 \Big)^{1/2} \Big( \sum_{s=1}^{n} \|\ddot h_s(\theta^*) - \ddot h_s(\theta_0)\|^2 \Big)^{1/2}. $$

The martingale Law of Large Numbers (Chow, 1965, Theorem 5) shows $n^{-1} \sum_{s=1}^{n} \varepsilon_s^2 = O(1)$ a.s. By Assumption 4.1(a) then $\hat\#_n = N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0) = o_P(n^\gamma)$, so for large $n$ then $\|N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0)\| \le n^\gamma$ with large probability. For such $n$, the intermediate point $\theta^*$ is also local to $\theta_0$ and we can then bound

$$ \sum_{s=1}^{n} \|\ddot h_s(\theta^*) - \ddot h_s(\theta_0)\|^2 \le \sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \sum_{s=1}^{n} \|\ddot h_s(\theta) - \ddot h_s(\theta_0)\|^2, $$

which depends neither on $t$ nor $\hat\theta_n$. Then Assumption 4.2(e) implies $|\tilde S_{t,\hat\theta_n,3}| = o_P(n^{2\gamma - 2\gamma}) = o_P(1)$ uniformly in $t$.

6. The compensator. As before, $\|N_{n,\theta_0}^{-1}(\hat\theta_n - \theta_0)\| \le n^\gamma$ on a set with large probability. On that set $\bar S_{t,\hat\theta_n} \le \sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \bar S_{t,\theta}$, which is $o_P(1)$ by Assumption 4.2(a).

Part II: Assumption 3.3.

1. The problem. Let $V_{n,\theta} = n^{-1} \sum_{t=1}^{n} [\{\varepsilon_t - \nabla g_t(\theta)\}^4 - \varepsilon_t^4]$ where $\nabla g_t(\theta) = g(x_t, \theta) - g(x_t, \theta_0)$ as before, so that $V_{n,\hat\theta_n} = n^{-1} \sum_{t=1}^{n} (\hat\varepsilon_{t,n}^4 - \varepsilon_t^4)$.

2. Some inequalities. By binomial expansion $(\varepsilon - r)^4 - \varepsilon^4 = r^4 - 4r^3\varepsilon + 6r^2\varepsilon^2 - 4r\varepsilon^3$. Thus, by Hölder's inequality,

$$ |V_{n,\theta}| \le n^{-1} \sum_{t=1}^{n} \{\nabla g_t(\theta)\}^4 + 4\Big[ n^{-1} \sum_{t=1}^{n} \{\nabla g_t(\theta)\}^4 \Big]^{3/4} \Big( n^{-1} \sum_{t=1}^{n} \varepsilon_t^4 \Big)^{1/4} + 6\Big[ n^{-1} \sum_{t=1}^{n} \{\nabla g_t(\theta)\}^4 \Big]^{1/2} \Big( n^{-1} \sum_{t=1}^{n} \varepsilon_t^4 \Big)^{1/2} + 4\Big[ n^{-1} \sum_{t=1}^{n} \{\nabla g_t(\theta)\}^4 \Big]^{1/4} \Big( n^{-1} \sum_{t=1}^{n} \varepsilon_t^4 \Big)^{3/4}. $$

Now, $n^{-1} \sum_{t=1}^{n} \varepsilon_t^4 = O_P(1)$ by the martingale Law of Large Numbers and Assumption 3.1, while $n^{-1} \sum_{t=1}^{n} \{\nabla g_t(\hat\theta_n)\}^4 = o_P(1)$ by an argument as in Part I, item 6, using Assumption 4.2(b). $\square$
Proof of Theorem 4.2: Since $\hat\theta_n = \theta_0 + o(n^\gamma)$ a.s. by Assumption 4.1(b), Egorov's theorem (Davidson 1994, Theorem 18.4) implies that for all $\epsilon > 0$ there exists a $t_0$ so that $\Omega_\epsilon = \{ \sup_{t > t_0} \|N_{n,\theta_0}^{-1}(\hat\theta_t - \theta_0)\| < n^\gamma \}$ satisfies $P(\Omega_\epsilon) > 1 - \epsilon$. On $\Omega_\epsilon$ we bound

$$ \max_{1 \le t \le n} \Big| n^{-1/2} \sum_{s=1}^{t} (\hat\varepsilon_{s,t}^2 - \varepsilon_s^2) \Big| \le n^{-1/2} \max_{1 \le t \le t_0} \Big| \sum_{s=1}^{t} (\hat\varepsilon_{s,t}^2 - \varepsilon_s^2) \Big| + \max_{t_0+1 \le t \le n} \Big| n^{-1/2} \sum_{s=1}^{t} (\hat\varepsilon_{s,t}^2 - \varepsilon_s^2) \Big|. $$

Since $t_0$ is finite, the first term vanishes. For the second term we can follow the proof of Theorem 4.1 replacing $\hat\varepsilon_{s,n}^2$ by $\hat\varepsilon_{s,t}^2$. When expanding in item 3 the intermediate point $\theta^*$ will now depend on $t$ through the summation limit and $\hat\theta_t$. However, with $t > t_0$ then $\hat\theta_t$ is local to $\theta_0$ uniformly in $t$ and the remaining arguments apply. $\square$
Proof of Theorem 4.3: Assumption 4.4(a,b,c) with $k = 2$ imply Assumption 4.2(c,d,e). Now, recall the notation in item 2 in the proof of Theorem 4.1 and expand

$$ g(x_t, \theta) - g(x_t, \theta_0) = \#' \dot h_t(\theta_0) + \frac{1}{2} \#' \ddot h_t(\theta_0) \# + \frac{1}{2} \#' \{\ddot h_t(\theta_t^*) - \ddot h_t(\theta_0)\} \#, $$

where $\theta_t^*$ is an intermediate point depending on $x_t$ so $|\theta_t^* - \theta_0| \le |\theta - \theta_0|$. Raise this to the power $k = 2$ or $k = 4$ and apply the inequality $(x + y + z)^m \le C(x^m + y^m + z^m)$ to see that

$$ |g(x_t, \theta) - g(x_t, \theta_0)|^k \le C\{ \|\#\|^k \|\dot h_t(\theta_0)\|^k + \|\#\|^{2k} \|\ddot h_t(\theta_0)\|^k + \|\#\|^{2k} \|\ddot h_t(\theta_t^*) - \ddot h_t(\theta_0)\|^k \}. $$

In Assumption 4.2(a,b) we only consider $\|\#\| \le n^\gamma$. Thus $\theta_t^*$ is local to $\theta_0$ so that $\|\ddot h_t(\theta_t^*) - \ddot h_t(\theta_0)\|^k \le \sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \|\ddot h_t(\theta) - \ddot h_t(\theta_0)\|^k$. Then cumulate to get

$$ \sum_{t=1}^{n} \{g(x_t, \theta) - g(x_t, \theta_0)\}^k \le C\Big[ n^{k\gamma} \sum_{t=1}^{n} \|\dot h_t(\theta_0)\|^k + n^{2k\gamma} \sum_{t=1}^{n} \|\ddot h_t(\theta_0)\|^k + n^{2k\gamma} \sum_{t=1}^{n} \sup_{\theta: \|N_{n,\theta_0}^{-1}(\theta - \theta_0)\| \le n^\gamma} \|\ddot h_t(\theta) - \ddot h_t(\theta_0)\|^k \Big], $$

uniformly in $\|\#\| \le n^\gamma$, which is $o_P(n^{1/2})$ for $k = 2$ and $o_P(n)$ for $k = 4$ due to Assumption 4.4. $\square$
B Tables
Table 1: DGPs: Data Generating Processes

 *    DGP   $y_t$                                                              $g(x_t, \theta)$
 CS   1     $1 + 0.5 x_t + \varepsilon_t$                                      $\theta_1 + \theta_2 x_t$
 CS   2     $1 + 0.5 x_t^2 + \varepsilon_t$                                    $\theta_1 + \theta_2 x_t^2$
 CS   3     $1 + 0.9 x_t 1(v_t \le 0) + 0.5 x_t 1(v_t > 0) + \varepsilon_t$    $\theta_1 + \theta_2 x_t 1(v_t \le 0) + \theta_3 x_t 1(v_t > 0)$
 CS   4     $1 + 0.3 |x_t|^{1.5} + \varepsilon_t$                              $\theta_1 + \theta_2 |x_t|^{\theta_3}$
 M    5     $y_{t-1} + \varepsilon_t$                                          $\theta_1 + \theta_2 x_t$
 M    6     $1 + 0.9 x_t 1(v_t \le 0) + 0.5 x_t 1(v_t > 0) + \varepsilon_t$    $\theta_1 + \theta_2 x_t$
 M    7     $1 + 0.5 x_t^2 + \varepsilon_t$                                    $\theta_1 + \theta_2 |x_t|^3$
 M    8     $1 + 0.3 |x_t|^{1.5} + u_t$, $u_t = \Delta x_t + \varepsilon_t$    $\theta_1 + \theta_2 |x_t|^{\theta_3}$
 M    9     $1 + 0.5 x_t^2 + \varepsilon_t$                                    $\theta_1 + \theta_2 \ln|x_t|$

Note: CS denotes correct specification and M denotes misspecification. $y_t$ and $g(x_t, \theta)$ are the dependent variable and the estimated regression function, respectively. $x_t \sim I(\delta)$ with $\delta = 0.7, 1, 2$. $\varepsilon_t, v_t \sim$ i.i.d. $N(0,1)$. $x_t$, $\varepsilon_t$, and $v_t$ are independent of each other.
Table 2: Size and Power: Finite Sample Performance

                   $x_t \sim I(0.7)$        $x_t \sim I(1)$          $x_t \sim I(2)$
$CUSQ_n$,  n =    100    500   1000        100    500   1000        100    500   1000
CS  DGP 1       0.031  0.041  0.044      0.032  0.040  0.044      0.031  0.040  0.044
CS  DGP 2       0.031  0.040  0.045      0.031  0.040  0.044      0.031  0.039  0.044
CS  DGP 3       0.030  0.041  0.043      0.033  0.042  0.043      0.033  0.041  0.042
CS  DGP 4       0.031  0.040  0.045      0.031  0.041  0.043      0.033  0.040  0.044
M   DGP 5       0.527  0.975  0.997      0.814  0.999  1.000      0.957  1.000  1.000
M   DGP 6       0.085  0.485  0.708      0.553  0.984  0.999      0.998  1.000  1.000
M   DGP 7       0.096  0.790  0.962      0.479  0.993  1.000      0.974  1.000  1.000
M   DGP 8       0.302  0.854  0.946      0.460  0.846  0.913      0.935  1.000  1.000
M   DGP 9       0.313  0.709  0.775      0.320  0.599  0.759      0.945  0.999  0.999

Note: CS denotes correct specification; hence, size is analyzed in those cases. M denotes misspecification; hence, power is considered in those cases. 10,000 replications are conducted.
Table 3: Power performance comparison with Kasparis (2008): DGPs

DGP    $y_t$
R1     $z_t$
R2     $\mathrm{sign}(z_t) |z_t|^{0.5}$
R3     $\mathrm{sign}(x_t) |x_t|^{0.75} + u_t$
R4     $\mathrm{sign}(x_t) |x_t|^{1.25} + u_t$
R5     $\ln(1 + |x_t|) + u_t$
R6     $x_t + |x_t|^{0.5} + u_t$
R7     $0.4 x_t 1(x_t \le 0) + 1.8 x_t 1(x_t > 0) + u_t$
R8     $x_t + 1.8 [x_t/\{1 + \exp(-x_t/\sqrt{n/2})\}] + u_t$
R9     $x_t + z_t + u_t$
R10    $\mathrm{sign}(x_t) (|x_t| |z_t|)^{0.5} + u_t$

Note: $z_t = z_{t-1} + w_t$ where $w_t = 0.3 w_{t-1} + \omega_t$, $x_t = x_{t-1} + \eta_t$, $u_t = \epsilon_t$, $(\epsilon_t, \eta_{t+1}, \omega_{t+1})' = D r_t$ where $r_t \sim$ i.i.d. $N(0,1)$ and D = [1 .2 .1, .3 2 0, 0 .1 1.2].
Table 4: Power performance comparison with Kasparis (2008)

             $CUSQ_n$                  Kasparis' best power
  n =    100    200    500           100    200    500
R1     0.909  0.999  1.000         0.762  0.920  0.984
R2     0.925  1.000  1.000         0.790  0.930  0.984
R3     0.093  0.612  0.860         0.180  0.377  0.698
R4     0.349  0.962  0.996         0.430  0.706  0.902
R5     0.408  0.922  0.986         0.706  0.901  0.993
R6     0.514  0.953  0.993         0.626  0.862  0.989
R7     0.548  0.825  0.872         0.485  0.597  0.704
R8     0.340  0.849  0.959         0.327  0.557  0.825
R9     0.882  0.999  1.000         0.753  0.915  0.983
R10    0.670  0.997  1.000         0.411  0.702  0.904
References
Amemiya, T. (1985) Advanced econometrics. Cambridge, MA: Harvard University Press.
Berenguer-Rico, V. and J. Gonzalo (2014) Co-summability: From linear to non-linear co-integration. Mimeo.
Billingsley, P. (1999) Convergence of probability measures. New York, NY: Wiley.
Brown, B.M. (1971) Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.
Brown, R.L., J. Durbin and J.M. Evans (1975) Techniques for testing the constancy of regression relationships
over time. Journal of the Royal Statistical Society B37, 149-192.
Chan, N. and Q. Wang (2015) Nonlinear regressions with nonstationary time series. Journal of Econometrics
185, 182–195.
Choi, I. and P. Saikkonen (2010) Tests for non-linear cointegration. Econometric Theory 26, 682-709.
Chow, Y.S. (1965) Local convergence of martingales and the law of large numbers. Annals of Mathematical
Statistics 36, 552–558.
Davidson, J. (1994) Stochastic limit theory. Oxford: Oxford University Press.
Deng, A. and P. Perron (2008) A non-local perspective on the power properties of the CUSUM and CUSUM
of squares tests for structural change. Journal of Econometrics 142, 212–240.
Edgerton, D. and C. Wells (1994) Critical values for the CUSUMSQ statistic in medium and large sized
samples. Oxford Bulletin of Economics and Statistics 56, 355–365.
Hao, K. and B. Inder (1996) Diagnostic test for structural change in cointegrated regression models. Economics Letters 50, 179-187.
Jennrich, R.I. (1969) Asymptotic properties of non-linear least squares estimators. The Annals of Mathematical Statistics 40, 633–643.
Kasparis, I. (2008) Detection of functional form misspecification in cointegrating relations. Econometric
Theory 24, 1373-1403.
Koziol, J.A. and D.P. Byar (1975) Percentage points of the asymptotic distributions of one and two sample
K-S statistics for truncated or censored data. Technometrics 17, 507-510.
Kristensen, D. and A. Rahbek (2010) Likelihood-based inference for cointegration with nonlinear error correction. Journal of Econometrics 158, 78-94.
Lai, T.L. and C.Z. Wei (1982) Least squares estimates in stochastic regression models with applications to
identification and control of dynamic systems. Annals of Statistics 10, 154-166.
Lee, S., O. Na and S. Na (2003) On the CUSUM of squares test for variance change in non-stationary and
non-parametric time series models. Annals of the Institute of Statistical Mathematics 55, 467-485.
McCabe, B.P. M. and M.J. Harrison (1980) Testing the constancy of regression relationships over time using
least squares residuals. Applied Statistics 29, 142-148.
Nielsen, B. and J. Sohkanen (2011) Asymptotic behaviour of the CUSUM of squares test under stochastic
and deterministic time trends. Econometric Theory 27, 913-927.
Park, J.Y. and P.C.B. Phillips (2001) Non-linear regressions with integrated time series. Econometrica 69,
117-161.
Ploberger, W. and W. Krämer (1986) On studentizing a test for structural change. Economics Letters 20,
341-344.
Ploberger, W. and W. Krämer (1990) The local power of the CUSUM and CUSUM of squares test. Econometric Theory 6, 335-347.
Pötscher, B.M. (2004) Nonlinear functions and convergence to Brownian motion: Beyond the continuous
mapping theorem. Econometric Theory 20, 1-22.
Sohkanen, J.S. (2011) Properties of tests for mis-specification in non-stationary autoregressions. DPhil
Thesis, University of Oxford.
Stout, W.F. (1974) Almost sure convergence. New York, NY: Academic Press.
Turner, P. (2010) Power properties of the CUSUM and CUSUMSQ tests for parameter instability. Applied Economics Letters 17, 1049-1053.
Wooldridge, J.M. (1994) Estimation and inference for dependent processes. In R.F. Engle and D.L. McFadden
(eds.) Handbook of Econometrics 4, 2639-2738.
Xiao, Z. and P.C.B. Phillips (2002) A CUSUM test for cointegration using regression residuals. Journal of
Econometrics 108, 43–61.