
Least-Squares estimation of an unknown number of shifts in a time series

Marc Lavielle
UFR de Mathématiques et Informatique, Université Paris V
and CNRS URA 743, Université Paris-Sud
[email protected]

Eric Moulines
École Nationale Supérieure des Télécommunications
[email protected]

Corresponding author:
Marc Lavielle
UFR de Mathématiques et Informatique, Université René Descartes
45 rue des Saints-Pères
75006 Paris, FRANCE
e-mail: [email protected]
Abstract

In this contribution, general results on the off-line least-squares estimation of changes in the mean of a random process are presented. First, a generalisation of the Hájek–Rényi inequality, dealing with the fluctuations of the normalized partial sums, is given. This preliminary result is then used to derive the consistency and the rate of convergence of the change-point estimates in the situation where the number of changes is known. Strong consistency is obtained under some mixing conditions. The limiting distribution is also computed under an invariance principle. The case where the number of changes is unknown is then addressed. All these results apply to a large class of dependent processes, including strongly mixing and also long-range dependent processes.

Keywords: Detection of change-points - Hájek–Rényi inequality - Strongly mixing processes - Strongly dependent processes - Fractional Brownian motion - Penalized least-squares estimate
1 Introduction

The problem of detecting and locating change-points in the mean of a random process, and of estimating the magnitude of the jumps, has been around for more than forty years. Most of the early efforts were devoted to the detection / location of a single change point in the mean of independent identically distributed random variables (see, among many others, Hinkley (1970), Sen and Srivastava (1975), Hawkins (1977), Bhattacharya (1987)). Many recent contributions address possible extensions of these methods and results to the detection / location of single / several change points in the mean of a random (perhaps non-stationary) process. One of the pioneering contributions in that field is Picard (1985), who considered the detection of a single change point in the mean of a Gaussian AR process (whose order is known). The proposed method is based on maximum likelihood and is thus rather computationally demanding (moreover, these results are not necessarily robust to deviations from the assumed model). A complete bibliography on change detection can be found in the books of Basseville and Nikiforov (1993) and Brodsky and Darkhovsky (1993).

Recently, Bai (1994) proposed and studied a least-squares estimate of the location of a single change point in the mean of a linear process of general type, under rather weak regularity assumptions. This work was later extended to multiple break points and weakly dependent disturbance processes (mixingales) by Bai and Perron (1996). From a practical point of view, least-squares estimates possess a major advantage over maximum likelihood methods: they do not require specifying the distribution of the error process $\varepsilon$. Furthermore, they are straightforward to implement and computationally efficient, even when the number of change points is large. In this contribution, the results obtained by Bai (1994) are extended in two directions:
- While Bai (1994) considered the detection of a unique change-point, we consider the general case where multiple change-points can be present. As seen in the sequel, this extension is not trivial and is worth developing. When the number of change-points is unknown, the change-point problem becomes a problem of model selection, and a penalized least-squares method is proposed (similar to that proposed in Yao (1988)).

- The results obtained by Bai (1994) and Bai and Perron (1996) hold only for weakly dependent processes. The main reason for this restriction is the use of a Hájek–Rényi type maximal inequality, extended by Bai (1994) to linear processes and by Bai and Perron (1996) to mixingales. We show in Section 2 that it is possible to obtain this kind of inequality under very mild assumptions covering, for example, both weakly and strongly dependent (perhaps non-stationary) processes.
Exploiting this inequality, we show in Section 3 the consistency of the least-squares estimate when the number of change-points is known. Using a precise inequality obtained by Rio (1995) for stationary strongly mixing processes, we show, under suitable conditions, the strong consistency of this estimator.

The rate of convergence of the change-point location estimator is then studied. It is shown, under very mild conditions (which include, as a special case, long-range dependent processes), that the rate is n, where n is the number of observations (as in the case of independent and identically distributed random variables). The limiting distribution of the change-point location is then studied (when the magnitude of the jumps goes to zero at some specified rate).
Section 4 is devoted to the problem of the number of change-points. This problem was first addressed by Yao (1988), who proved the consistency of the Schwarz criterion when the disturbance is i.i.d. Gaussian with zero mean and unknown variance. In this contribution, the number of change-points is estimated by using a penalized least-squares approach. It is shown that an appropriately modified version of the Schwarz criterion yields a consistent estimator of the number of change-points under very weak conditions on the structure of the disturbance.

A small-scale Monte-Carlo experiment is presented in Section 5 to support our claims. Some of the proofs are given in the appendix.
2 Some results for the fluctuations of partial sums

Let $\{\varepsilon_t\}_{t \ge 0}$ be a sequence of random variables. We define the partial sums by
$$S_{i:j} = \sum_{t=i}^{j} \varepsilon_t, \qquad 1 \le i < j \le n. \tag{1}$$
2.1 A generalisation of the Hájek–Rényi inequality

Let $\{b_k\}$ be a positive and decreasing sequence of real numbers. Hájek and Rényi (1955) have shown that, provided $\{\varepsilon_t\}_{t \ge 0}$ is a sequence of independent and identically distributed variables with zero mean and finite variance $E\varepsilon_t^2 = \sigma^2 < \infty$,
$$P\Big( \max_{m \le k \le n} b_k |S_{1:k}| \ge \lambda \Big) \le C_0\, \frac{\sigma^2}{\lambda^2} \Big( m\, b_m^2 + \sum_{i=m+1}^{n} b_i^2 \Big), \tag{2}$$
with $C_0 = 1$. This result was extended to martingale increments by Birnbaum and Marshall (1961), and
later to linear processes by Bai (1994); recall that $\{\varepsilon_t\}_{t \ge 0}$ is a linear process if
$$\varepsilon_t = \sum_{j=0}^{\infty} f_j\, \eta_{t-j}, \tag{3}$$
where $\{\eta_t\}_{t \in \mathbb{Z}}$ is a sequence of independent variables with zero mean and finite variance, such that $\sum_{j=0}^{\infty} j\,|f_j| < \infty$. In this context, the constant $C_0$ depends on the impulse response coefficients $\{f_j\}$ of the linear filter. It should be stressed that the result obtained by Bai deeply relies on the linear structure (3) of the process, and therefore does not hold for non-linear processes. Moreover, the condition $\sum_j j\,|f_j| < \infty$ is a 'weak-mixing' condition (typically, the normalized sum $S_{1:k}/\sqrt{k}$ asymptotically converges to a Gaussian random variable under appropriate moment conditions); in particular, this condition does not hold for long-range dependent linear processes (in such a case, $\sum_j f_j^2 < \infty$, but $\sum_j j\,|f_j| = \infty$). We shall establish here an inequality of this kind for a (not necessarily stationary) sequence $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ that satisfies, for some $1 \le \varphi < 2$, the following condition:

H1($\varphi$) There exists $C(\varepsilon) < \infty$ such that, for all $i \le j$,
$$E(S_{i:j})^2 \le C(\varepsilon)\, |j - i + 1|^{\varphi}.$$
Condition H1($\varphi$) is fulfilled for a wide family of zero-mean processes $\varepsilon = \{\varepsilon_t\}_{t \in \mathbb{Z}}$. If $\varepsilon$ is a second-order stationary process, H1($\varphi$) is fulfilled with $\varphi = 1$ whenever the autocovariance function $\rho(s) = E\,\varepsilon_{t+s}\varepsilon_t$ satisfies $\sum_{s \ge 0} |\rho(s)| < \infty$. This property is satisfied for linear processes of the form (3) such that $\sum_{j=0}^{\infty} |f_j| < \infty$, a class which includes, as a particular example, any ARMA process. Condition H1($\varphi$) is also satisfied for strongly mixing processes, under some conditions on the sequence of mixing coefficients $(\alpha(n))$ and on the moments of $\varepsilon_t$, see Doukhan (1994), or on the quantile function of $\varepsilon_t$, see Rio (1995). For example, if $\varepsilon_t$ is bounded with probability 1, H1($\varphi$) is satisfied with $\varphi = 1$ if $\alpha(s) \le M / (s \log s)$ (remark that, if $|\varepsilon_t| \le C < \infty$ with probability 1, the autocovariance function is bounded by $|E\,\varepsilon_{t+s}\varepsilon_t| \le 4 C^2 \alpha(s)$, and we then have $\sum_{s \ge 0} |\rho(s)| < \infty$). Finally, assumption H1($\varphi$) is also verified when $\varepsilon$ is a zero-mean long-range dependent process, i.e.,
$$\sup_{t \in \mathbb{Z}} E(\varepsilon_{t+s}\varepsilon_t) \le C'(\varepsilon)\, |s|^{2d-1}, \qquad 0 < d < 1/2, \tag{4}$$
where $d$ is the long-range dependence parameter (see, for example, Beran (1992)). Under these conditions, it is easy to see that there exists a constant $C(\varepsilon) < \infty$ such that, for $j > i$, $E(S_{i:j})^2 \le C(\varepsilon) (j - i)^{1+2d}$, and H1($\varphi$) is satisfied with $\varphi = 1 + 2d$.
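To make condition H1($\varphi$) concrete, here is a minimal numerical illustration (our own sketch, assuming Python with numpy; nothing in it is part of the paper). For an AR(1) disturbance the autocovariances are absolutely summable, so H1($\varphi$) holds with $\varphi = 1$ and the ratio $E(S_{1:n})^2 / n$ should stabilize as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_paths(n, a=0.6, reps=2000):
    """Simulate `reps` AR(1) paths eps_t = a * eps_{t-1} + eta_t of length n."""
    eps = np.zeros((reps, n))
    eta = rng.standard_normal((reps, n))
    for t in range(1, n):
        eps[:, t] = a * eps[:, t - 1] + eta[:, t]
    return eps

# Under H1(phi) with phi = 1, E(S_{1:n})^2 <= C * n, so this ratio should be
# roughly constant in n (close to (1 - a)^{-2} for large n in this example).
for n in (100, 400, 1600):
    S = ar1_paths(n).sum(axis=1)
    print(n, (S ** 2).mean() / n)
```

For a long-range dependent disturbance satisfying (4), the analogous diagnostic is the ratio $E(S_{1:n})^2 / n^{1+2d}$.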
Theorem 1. Let $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ be a sequence of random variables that satisfies condition H1($\varphi$) for some $1 \le \varphi < 2$. Then, there exists a constant $A(\varphi) \ge 1$ (that does not depend on $\varepsilon$) such that, for any $n \ge 1$, any $1 \le m \le n$, any $\lambda > 0$, and any positive and decreasing sequence $b_1 \ge b_2 \ge \dots \ge b_n > 0$, we have the following inequalities:
$$P\Big( \max_{1 \le k \le n} b_k |S_{1:k}| > \lambda \Big) \le \frac{A(\varphi)\, C(\varepsilon)}{\lambda^2}\, n^{\varphi-1} \sum_{t=1}^{n} b_t^2, \tag{5}$$
$$P\Big( \max_{m \le k \le n} b_k |S_{1:k}| > \lambda \Big) \le 4\, \frac{C(\varepsilon)\, m^{\varphi} b_m^2}{\lambda^2} + 4\, \frac{C(\varepsilon) A(\varphi)}{\lambda^2}\, (n-m)^{\varphi-1} \sum_{t=m+1}^{n} b_t^2. \tag{6}$$
As a direct corollary of Theorem 1, we have:

Corollary 2.1. Assume that H1($\varphi$) holds for some $1 \le \varphi < 2$. Then, there exists a constant $C(\gamma, \varepsilon) < \infty$, which depends on $\varepsilon$ only through the constant $C(\varepsilon)$, such that, for any $m > 0$, any $\delta > 0$ and any $\gamma > \varphi/2$, we have:
$$P\Big( \max_{k \ge m} k^{-\gamma} |S_{1:k}| > \delta \Big) \le C(\gamma, \varepsilon)\, \delta^{-2}\, m^{\varphi - 2\gamma}. \tag{7}$$
Remark. Let $\varepsilon$ be a process satisfying H1($\varphi$) for some $1 \le \varphi < 2$ with some constant $C(\varepsilon)$, i.e., $E(S_{i:j})^2 \le C(\varepsilon) |j - i + 1|^{\varphi}$. Any time-shifted / time-reversed version of this process also verifies H1($\varphi$) with the same constants $\varphi$ and $C(\varepsilon)$. Since the constants in the preceding results depend on $\varepsilon$ only through $C(\varepsilon)$, these results are uniform with respect to the time origin. In particular, we have:
$$\sup_{i \in \mathbb{Z}} P\Big( \max_{k \ge m} k^{-\gamma} |S_{i:k+i}| \ge \delta \Big) \le C(\gamma, \varepsilon)\, \delta^{-2}\, m^{\varphi - 2\gamma}. \tag{8}$$
We shall use these results in the sequel with $b_k = 1$ for all $k$, and with $b_k = 1/k$. We repeatedly use the following lemma, which is a direct application of Theorem 1 and its corollary:

Lemma 2.2. Let $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ be a sequence of random variables that satisfies condition H1($\varphi$) for some $1 \le \varphi < 2$. Then, there exist two constants $A(\varphi, \varepsilon)$ and $B(\varphi, \varepsilon)$ (that depend upon $\varepsilon$ only through $C(\varepsilon)$) such that, for any $n > 0$ and any $\delta > 0$, we have:
$$\sup_{i \in \mathbb{Z}} P\Big( \max_{i+1 \le k \le n+i} |S_{i:k}| \ge \delta \Big) \le A(\varphi, \varepsilon)\, n^{\varphi}\, \delta^{-2}, \tag{9}$$
$$\sup_{i \in \mathbb{Z}} P\Big( \max_{k \ge m+i} \frac{|S_{i:k}|}{k - i} \ge \delta \Big) \le B(\varphi, \varepsilon)\, m^{\varphi - 2}\, \delta^{-2}. \tag{10}$$
2.2 A maximal inequality for strongly mixing stationary processes

We now consider the case where $\varepsilon$ is a strongly mixing (or $\alpha$-mixing) stationary process; see Doukhan (1994) or Rio (1995) for the definition of the sequence of strongly mixing coefficients $\{\alpha(n)\}_{n \ge 0}$. Recall that $\varepsilon$ is $\alpha$-mixing if $\alpha(n) \to 0$ when $n \to \infty$. We also define the quantile function $Q$ by $P(\varepsilon_1 > Q(u)) = u$, for $0 < u < 1$. We make the following hypothesis:

H2 The process $\varepsilon$ is $\alpha$-mixing, and there exist $\nu > 0$ and $\beta > 1$ such that:
a) there exist a constant $C_Q$ and $u_0 > 0$ such that $Q(u) < C_Q\, u^{-1/\nu}$ for any $0 < u < u_0$;
b) there exist a constant $C_\alpha$ and $n_0 > 0$ such that $\alpha(n) < C_\alpha\, n^{-\beta}$ for any $n > n_0$;
c) $\nu > 4\beta/(\beta - 1)$.

Condition H2 means that we control the tails of the distribution of $\varepsilon_t$ together with the mixing coefficients of $\varepsilon$. Furthermore, H2-c means that the less mixing $\varepsilon$ is (i.e., the more dependent the sequence $\varepsilon$ is), the more concentrated the marginal distribution of $\varepsilon$ must be (i.e., the lighter the tail of the distribution of $\varepsilon$ must be).
A sharp inequality obtained by Rio (1995) leads to the following result:

Theorem 2. Assume that H2 is satisfied. Then, for any $\delta > 0$ such that
$$\delta > \frac{3(1+\beta) + 4\nu}{2\nu(1+\beta)}, \tag{11}$$
for any sequence $(u_n)$ such that, for $n$ large enough, $u_n \ge n^{\delta}$, and for any $\lambda > 0$, we have:
$$\sum_{n=1}^{+\infty} P\Big( \max_{1 \le k \le n} |S_{1:k}| \ge \lambda u_n \Big) < +\infty. \tag{12}$$

Note that, under H2-c, the right-hand side of (11) is strictly bounded by 1. Hence, the sum in (12) is convergent for $u_n = n$.
3 Estimation of the change-point locations

3.1 Model assumptions and notations

The following model is assumed:
$$Y_t = \mu^\star_k + \varepsilon_t, \qquad t^\star_{k-1} + 1 \le t \le t^\star_k,\quad 1 \le k \le r+1, \tag{13}$$
where we use the convention $t^\star_0 = 0$ and $t^\star_{r+1} = n$. The indices of the break points and the mean values $\mu^\star_1, \dots, \mu^\star_{r+1}$ are explicitly treated as unknown. It is assumed that $\min_k |\mu^\star_{k+1} - \mu^\star_k| > 0$. The purpose is to estimate the unknown means together with the break points when $n$ observations $(Y_1, \dots, Y_n)$ are available. In general, the number of breaks $r$ can be treated as an unknown variable with true value $r^\star$. However, for now, we treat it as known and will discuss methods to estimate it in later sections. It is assumed that there exist $\tau^\star_1, \dots, \tau^\star_r$ such that, for $1 \le k \le r$, $t^\star_k = [n\tau^\star_k]$ ($[x]$ is the integer part of $x$). Following Bai and Perron (1996), the $(\tau^\star_k)$ are referred to as the break fractions, and we let $\tau^\star_0 = 0$ and $\tau^\star_{r+1} = 1$.
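As an illustration of model (13), the following sketch (Python with numpy; the function name and the i.i.d. Gaussian default for the disturbance are illustrative assumptions of ours, not part of the paper) generates $n$ observations with break fractions $\tau^\star$ and segment means $\mu^\star$:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_model(n, tau_star, mu_star, noise=None):
    """Draw Y_1, ..., Y_n from model (13): piecewise-constant mean plus noise.

    tau_star : break fractions (tau*_1, ..., tau*_r), increasing in (0, 1)
    mu_star  : segment means (mu*_1, ..., mu*_{r+1})
    noise    : optional length-n disturbance eps; defaults to i.i.d. N(0, 1)
    """
    t_star = [0] + [int(n * tau) for tau in tau_star] + [n]  # t*_k = [n tau*_k]
    mean = np.concatenate([np.full(t_star[k + 1] - t_star[k], mu_star[k])
                           for k in range(len(mu_star))])
    eps = rng.standard_normal(n) if noise is None else np.asarray(noise)
    return mean + eps

y = simulate_model(1000, tau_star=(0.25, 0.5), mu_star=(2.0, 0.0, 1.0))
```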
The method of estimation considered is based on the least-squares criterion. Let
$$\mathcal{A}_{n,r} = \{ (t_0, t_1, \dots, t_{r+1})\ ;\ t_0 = 0 < t_1 < t_2 < \dots < t_r < t_{r+1} = n \} \tag{14}$$
be the set of allowable $r$-partitions. In the sequel, the following set of allowable $r$-partitions is also considered:
$$\mathcal{A}^{\delta_n}_{n,r} = \{ (t_0, t_1, \dots, t_{r+1})\ ;\ t_k - t_{k-1} \ge n\delta_n \}, \tag{15}$$
where $\{\delta_n\}$ is a sequence of non-increasing non-negative numbers such that $\delta_n \to 0$ as $n \to \infty$ at some prescribed rate (in some cases, one may simply set $\delta_n = 0$ for all $n$, so that $\mathcal{A}^{\delta_n}_{n,r} = \mathcal{A}_{n,r}$).

For each $t \in \mathcal{A}_{n,r}$, the least-squares estimates of the means are first obtained by minimizing the sum of squared residuals; substituting them into the objective function and denoting the resulting sum by $Q_n(t)$, we get
$$Q_n(t) = \min_{(\mu_1, \dots, \mu_{r+1}) \in \mathbb{R}^{r+1}} \sum_{k=1}^{r+1} \sum_{t=t_{k-1}+1}^{t_k} (Y_t - \mu_k)^2 = \sum_{k=1}^{r+1} \sum_{t=t_{k-1}+1}^{t_k} \big( Y_t - \bar{Y}(t_{k-1}, t_k) \big)^2, \tag{16}$$
where, for any sequence $\{u_t\}_{t \in \mathbb{Z}}$, we denote by $\bar{u}(i, j)$ ($j > i$) the average $\bar{u}(i, j) = (j - i)^{-1} \sum_{t=i+1}^{j} u_t$. For any $r$-partitions $t, t' \in \mathcal{A}_{n,r}$, we define $\|t - t'\|_\infty = \max_{1 \le k \le r} |t_k - t'_k|$.
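The paper does not prescribe an algorithm for computing the minimizer of $Q_n(t)$; a standard choice, sketched below under that caveat (Python with numpy, $O(r n^2)$ time; all names are ours), is dynamic programming over the segment boundaries. The minimum-spacing constraint defining $\mathcal{A}^{\delta_n}_{n,r}$ can be enforced by restricting the range of the inner index `i`.

```python
import numpy as np

def ls_changepoints(y, r):
    """Minimize the least-squares contrast Q_n(t) of (16) over r-partitions.

    Returns the estimated break points (t_1, ..., t_r) and min Q_n(t).
    """
    n = len(y)
    c1 = np.concatenate(([0.0], np.cumsum(y)))            # cumulative sums
    c2 = np.concatenate(([0.0], np.cumsum(np.square(y))))

    def sse(i, j):
        """Residual sum of squares of segment y[i:j] around its mean (j > i)."""
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    # best[k, j]: minimal cost of splitting y[:j] into k + 1 segments
    best = np.full((r + 1, n + 1), np.inf)
    arg = np.zeros((r + 1, n + 1), dtype=int)
    best[0, 1:] = [sse(0, j) for j in range(1, n + 1)]
    for k in range(1, r + 1):
        for j in range(k + 1, n + 1):
            cand = [best[k - 1, i] + sse(i, j) for i in range(k, j)]
            i0 = k + int(np.argmin(cand))
            best[k, j], arg[k, j] = cand[i0 - k], i0
    t, j = [], n                                          # back-track the breaks
    for k in range(r, 0, -1):
        j = arg[k, j]
        t.append(j)
    return sorted(t), best[r, n]
```

Because the segment costs are computed from cumulative sums in $O(1)$, the overall cost is quadratic in $n$, which remains tractable for the sample sizes used in Section 5.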
Theorem 3. Assume H1($\varphi$) holds for some $\varphi < 2$. Let $\{\delta_n\}_{n \ge 0}$ be a positive non-increasing sequence such that $\lim_{n\to\infty} \delta_n = 0$ and $\lim_{n\to\infty} n^{2-\varphi} \delta_n = \infty$. Let $\hat{t}^{\delta_n}_n$ be the value of $t$ that minimizes $Q_n(t)$ over $\mathcal{A}^{\delta_n}_{n,r}$. Then, $\hat{\tau}^{\delta_n}_n = \hat{t}^{\delta_n}_n / n$ converges in probability to $\tau^\star$.

More precisely, denote $\Delta\tau^\star \triangleq \min_{1 \le k \le r} |\tau^\star_{k+1} - \tau^\star_k|$. There exists a constant $K_1 < \infty$ such that, for all $(\mu^\star_1, \dots, \mu^\star_{r+1})$, all $0 < \delta \le \Delta\tau^\star$ and all $n$ sufficiently large, it holds that
$$P\big( \|\hat{\tau}^{\delta_n}_n - \tau^\star\|_\infty \ge \delta \big) \le K_1\, n^{\varphi-2}\, \underline{\Delta}^{-2} \Big( \delta_n^{-1} + \delta^{-2} \big( 1 + (\overline{\Delta}/\underline{\Delta})^2 \big) \Big), \tag{17}$$
where $\underline{\Delta} \triangleq \min_{1 \le k \le r} |\mu^\star_{k+1} - \mu^\star_k|$ and $\overline{\Delta} \triangleq \max_{1 \le k \le r} |\mu^\star_{k+1} - \mu^\star_k|$.
Remark 1: In the above theorem, a minimum length $n\delta_n$ between two successive change-points is imposed: instead of minimizing over all possible $r$-partitions, only the partitions such that $t_k - t_{k-1} \ge n\delta_n$ are considered. Note that $\delta_n \to 0$ is chosen in such a way that consistent estimates of the change fractions are obtained even when a lower bound for the spacing between change fractions is not known a priori. As seen later in this section, it is possible to remove this assumption by imposing either stronger conditions on the disturbance process or by constraining the estimates of the means to lie within a compact set.

Remark 2: Note that, since the constant $K_1$ does not depend on $(\mu^\star_1, \dots, \mu^\star_{r+1})$, the result (17) can be used in situations where these quantities depend on the sample size $n$ (and go to zero at a certain rate with $n$). This property will be exploited later on, justifying the extra amount of effort needed to derive a uniform bound.
Proof of Theorem 3: The proof is adapted from Bai (1994) and Bai and Perron (1996). We must verify that the contrast function associated with this contrast process has a unique minimum at the true value of the parameters, and that the contrast process converges uniformly to the contrast function.
Define, for any $r$-partition $t \in \mathcal{A}_{n,r}$, the following quantities:
$$J_n(t) = n^{-1} \big( Q_n(t) - Q_n(t^\star) \big), \tag{18}$$
$$K_n(t) = n^{-1} \sum_{k=1}^{r+1} \sum_{t=t_{k-1}+1}^{t_k} \big( E Y_t - E \bar{Y}(t_{k-1}, t_k) \big)^2, \tag{19}$$
$$V_n(t) = n^{-1} \sum_{k=1}^{r+1} \left\{ \frac{\Big( \sum_{t=t^\star_{k-1}+1}^{t^\star_k} \varepsilon_t \Big)^2}{t^\star_k - t^\star_{k-1}} - \frac{\Big( \sum_{t=t_{k-1}+1}^{t_k} \varepsilon_t \Big)^2}{t_k - t_{k-1}} \right\}, \tag{20}$$
$$W_n(t) = 2 n^{-1} \sum_{k=1}^{r+1} \left\{ \Big( \sum_{t=t^\star_{k-1}+1}^{t^\star_k} \varepsilon_t \Big)\, \mu^\star_k - \Big( \sum_{t=t_{k-1}+1}^{t_k} \varepsilon_t \Big)\, E \bar{Y}(t_{k-1}, t_k) \right\}. \tag{21}$$
Using these notations, $J_n(t)$ may be decomposed as
$$J_n(t) = K_n(t) + V_n(t) + W_n(t). \tag{22}$$
We can show that $K_n(t)$ is lower bounded by
$$K_n(t) \ge \min\big( n^{-1} \|t - t^\star\|_\infty,\ \Delta\tau^\star \big)\, \underline{\Delta}^2. \tag{23}$$
Similarly, we need to obtain bounds for $V_n(t)$ and $W_n(t)$. For $V_n(t)$, we have, over $\mathcal{A}^{\delta_n}_{n,r}$,
$$|V_n(t)| \le 2\, n^{-2} \delta_n^{-1} (r+1) \left( \max_{1 \le s \le n} \Big( \sum_{t=1}^{s} \varepsilon_t \Big)^{2} + \max_{1 \le s \le n} \Big( \sum_{t=n-s+1}^{n} \varepsilon_t \Big)^{2} \right). \tag{24}$$
Finally, note that
$$W_n(t) = 2 n^{-1} \sum_{k=1}^{r+1} \Big( \sum_{t=t^\star_{k-1}+1}^{t^\star_k} \varepsilon_t \Big) (\mu^\star_k - \bar{\mu}^\star) + 2 n^{-1} \sum_{k=1}^{r+1} \Big( \sum_{t=t_{k-1}+1}^{t_k} \varepsilon_t \Big) \big( \bar{\mu}^\star - E \bar{Y}(t_{k-1}, t_k) \big),$$
where $\bar{\mu}^\star \triangleq (r+1)^{-1} \sum_{k=1}^{r+1} \mu^\star_k$. Since, for $1 \le j, k \le r+1$, it holds that $|\mu^\star_j - \mu^\star_k| \le r \overline{\Delta}$, we have $|\mu^\star_k - E \bar{Y}(t_{k-1}, t_k)| \le r \overline{\Delta}$ and $|\mu^\star_k - \bar{\mu}^\star| \le r \overline{\Delta}$, which implies
$$|W_n(t)| \le 3\, n^{-1} (r+1)^2\, \overline{\Delta} \left( \max_{1 \le s \le n} \Big| \sum_{t=1}^{s} \varepsilon_t \Big| + \max_{1 \le s \le n} \Big| \sum_{t=n-s+1}^{n} \varepsilon_t \Big| \right). \tag{25}$$
For any $\delta > 0$, define
$$\mathcal{C}^{\delta}_{n} \triangleq \big\{ t \in \mathcal{A}^{\delta_n}_{n,r}\ ;\ \|t - t^\star\|_\infty \ge n\delta \big\}.$$
Since $\delta_n \to 0$, $t^\star \in \mathcal{A}^{\delta_n}_{n,r}$ for sufficiently large $n$. Thus,
$$P\big( \|\hat{\tau}_n - \tau^\star\|_\infty \ge \delta \big) \le P\Big( \min_{t \in \mathcal{C}^{\delta}_{n}} J_n(t) \le 0 \Big) \le P\left( \max_{1 \le s \le n} \Big( \sum_{t=1}^{s} \varepsilon_t \Big)^{2} + \max_{1 \le s \le n} \Big( \sum_{t=n-s+1}^{n} \varepsilon_t \Big)^{2} \ge c\, \underline{\Delta}^2\, \delta_n\, n^2 \right) + P\left( \max_{1 \le s \le n} \Big| \sum_{t=1}^{s} \varepsilon_t \Big| + \max_{1 \le s \le n} \Big| \sum_{t=n-s+1}^{n} \varepsilon_t \Big| \ge c\, \frac{\underline{\Delta}^2}{\overline{\Delta}}\, \delta\, n \right)$$
for some constant $c > 0$. The proof is concluded by applying Lemma 2.2.

3.2 Alternative conditions
As mentioned above, it is possible to remove the constraint on the minimum segment size $\delta_n$ by imposing some additional conditions. As seen in the proof of Theorem 3, $\delta_n$ is used to obtain a uniform bound on $V_n(t)$ (see (24)). One can obtain such a bound under additional assumptions on the disturbance process. An example of such a condition is given below. Let $\{\lambda_n\}$ be a non-increasing sequence of numbers such that $\lambda_n \to 0$ and $n\lambda_n \to \infty$, and consider the following assumption:

H3($\lambda$)
$$\lim_{n \to \infty} P\left( \max_{0 \le t_1 < t_2 \le n} (t_2 - t_1)^{-1} \Big( \sum_{t=t_1+1}^{t_2} \varepsilon_t \Big)^2 \ge n\lambda_n \right) = 0.$$
Extending the result obtained in Lemma 1 of Yao (1988), H3 is, for example, satisfied when $\varepsilon$ is a zero-mean Gaussian process, provided that the covariance bound (4) holds. In such a case, $\lambda_n$ may be set to $\lambda_n = 4 C'(\varepsilon) \log(n) / n^{1-2d}$. We have:

Theorem 4. Assume that H1($\varphi$) and H3($\lambda$) hold for some $\varphi < 2$ and some sequence $\{\lambda_n\}$ such that $\lambda_n \to 0$ and $n\lambda_n \to \infty$. Let $\hat{t}_n$ be the value of $t$ that minimizes $Q_n(t)$ over $\mathcal{A}_{n,r}$. Then, $\hat{\tau}_n = \hat{t}_n / n$ converges in probability to $\tau^\star$.

The proof of Theorem 4 is a direct adaptation of that of Theorem 3. Uniform bounds (w.r.t. $(\mu^\star_1, \dots, \mu^\star_{r+1})$) similar to (17) can also be derived. To apply this theorem in a general setting, the following lemma provides more explicit conditions (in terms of the moments and of the dependence structure of the process) under which H3 is verified:
Lemma 3.1. Assume that there exist constants $s > 0$, $1 \le h_s < 2$ and $C_s > 0$ such that $s > 2/(2 - h_s)$ and, for all $0 \le t_1 < t_2 < +\infty$,
$$E\Big( \sum_{t=t_1+1}^{t_2} \varepsilon_t \Big)^{2s} \le C_s\, (t_2 - t_1)^{h_s s}. \tag{26}$$
Then, H3($\lambda$) holds with $\lambda_n = n^{(2 + (h_s - 2)s)/s} \log n$.
Proof of Lemma 3.1: The result follows directly from the relation
$$P\left( \max_{0 \le t_1 < t_2 \le n} (t_2 - t_1)^{-1} \Big( \sum_{t=t_1+1}^{t_2} \varepsilon_t \Big)^2 \ge n\lambda_n \right) \le \sum_{t_1=0}^{n-1} \sum_{t_2=t_1+1}^{n} P\left( \Big( \sum_{t=t_1+1}^{t_2} \varepsilon_t \Big)^2 \ge n\lambda_n (t_2 - t_1) \right)$$
and the Markov inequality.

Remark: When $h_s = 1$, the relation (26) is a Rosenthal-type inequality. It holds for martingale increments with uniformly bounded moments of order $2s$. These inequalities also hold for weakly dependent processes under appropriate conditions on the moments and on the rate of convergence of the mixing coefficients (see, for example, Doukhan and Louhichi (1997) for recent references). This inequality is also satisfied for (perhaps non-stationary) long-range dependent Gaussian processes, provided that (4) holds, and for long-range dependent linear processes (3), under appropriate conditions on the $f_j$ and on the moments of $\eta_t$.
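Following up on the remark above, here is a quick numerical sanity check of H3($\lambda$) (our own illustration, assuming Python with numpy and an i.i.d. Gaussian disturbance, for which $h_s = 1$ and all moments exist): for i.i.d. noise the maximal normalized squared window sum behaves like a constant times $\log n$, so any penalty level $n\lambda_n$ growing faster is eventually dominant, which is the content of H3.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 1000
eps = rng.standard_normal(n)          # i.i.d. disturbance: h_s = 1
c = np.concatenate(([0.0], np.cumsum(eps)))

# max over 0 <= t1 < t2 <= n of (t2 - t1)^{-1} (sum_{t=t1+1}^{t2} eps_t)^2
m = max((c[t2] - c[t1]) ** 2 / (t2 - t1)
        for t1 in range(n) for t2 in range(t1 + 1, n + 1))
print(m, "to be compared with 4 * log(n) =", 4 * np.log(n))
```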
Another solution consists in constraining the values of the estimated means $\hat{\mu}_k$, $1 \le k \le r+1$, to lie within some compact subset $\Theta_r$ of $\mathbb{R}^{r+1}$. More specifically, for any $(t, \mu) \in \mathcal{A}_{n,r} \times \Theta_r$, denote
$$U_n(t, \mu) \triangleq n^{-1} \sum_{k=1}^{r+1} \sum_{t=t_{k-1}+1}^{t_k} (Y_t - \mu_k)^2, \tag{27}$$
$$\widetilde{Q}_n(t) \triangleq \min_{\mu \in \Theta_r} U_n(t, \mu), \tag{28}$$
$$\hat{t}_n \triangleq \operatorname*{argmin}_{t \in \mathcal{A}_{n,r}} \widetilde{Q}_n(t). \tag{29}$$
This criterion may be seen as a robustified least-squares fitting procedure which, in a certain sense, trims the extreme values of the series by constraining the estimated means to lie inside a feasible set. In such a case, it is not necessary to constrain the minimum length between successive break points. Note, however, that there is a price to pay, because the change fraction estimator is consistent only if the subset $\Theta_r$ is chosen in such a way that $(\mu^\star_1, \dots, \mu^\star_{r+1}) \in \Theta_r$.
Theorem 5. Assume H1($\varphi$) holds for some $\varphi < 2$. Then, for any compact subset $\Theta_r$ of $\mathbb{R}^{r+1}$ such that $(\mu^\star_1, \dots, \mu^\star_{r+1}) \in \Theta_r$, $\hat{\tau}_n = \hat{t}_n / n$ converges in probability to $\tau^\star$.
Proof of Theorem 5: Let $\mu^\star \triangleq (\mu^\star_1, \dots, \mu^\star_{r+1})$. By definition, $(\hat{t}_n, \hat{\mu}_n)$ minimizes $U_n(t, \mu) - U_n(t^\star, \mu^\star)$, where
$$U_n(t, \mu) - U_n(t^\star, \mu^\star) = \sum_{k=1}^{r+1} \sum_{j=1}^{r+1} \frac{n_{kj}}{n} (\mu^\star_j - \mu_k)^2 - 2 \sum_{k=1}^{r+1} \sum_{j=1}^{r+1} \frac{S_{kj}}{n} (\mu_k - \mu^\star_j), \tag{30}$$
where
$$n_{kj} = \#\big( \{t_{k-1}+1, \dots, t_k\} \cap \{t^\star_{j-1}+1, \dots, t^\star_j\} \big), \tag{31}$$
$$S_{kj} = \sum_{t \in \{t_{k-1}+1, \dots, t_k\} \cap \{t^\star_{j-1}+1, \dots, t^\star_j\}} \varepsilon_t, \tag{32}$$
where, by convention, the sum over an empty set of indices is zero. Note that the dependence of $n_{kj}$ and $S_{kj}$ on the $r$-partitions $t$ and $t^\star$ is implicit. We can show that there exists a constant $C > 0$ such that, for all $(t, \mu) \in \mathcal{A}_{n,r} \times \Theta_r$, we have
$$\sum_{k=1}^{r+1} \sum_{j=1}^{r+1} \frac{n_{kj}}{n} (\mu^\star_j - \mu_k)^2 \ge C \max\big( n^{-1} \|t - t^\star\|_\infty,\ \|\mu - \mu^\star\|^2 \big). \tag{33}$$
On the other hand, $n^{-1} \sum_{k=1}^{r+1} \sum_{j=1}^{r+1} |S_{kj}|$ converges to 0, uniformly in $t \in \mathcal{A}_{n,r}$. Thus, since $\Theta_r$ is compact, $n^{-1} \sum_{k=1}^{r+1} \sum_{j=1}^{r+1} |(\mu_k - \mu^\star_j) S_{kj}|$ converges to 0, uniformly in $(t, \mu)$, and $(\hat{\tau}_n, \hat{\mu}_n)$ converges to $(\tau^\star, \mu^\star)$ if $\mu^\star \in \Theta_r$. Once again, it is also possible to obtain a uniform bound (w.r.t. $\mu^\star \in \Theta_r$) similar to (17).

Finally, strong consistency of the estimate is obtained under mixing conditions:
Theorem 6. Assume that H2 is satisfied. For any $\delta > 0$, let $\hat{t}_n$ be the value of $t$ that minimizes $Q_n(t)$ over $\mathcal{A}^{\delta}_{n,r}$. Then, if $\delta \le \Delta\tau^\star$, $\hat{\tau}_n = \hat{t}_n / n$ converges almost surely to $\tau^\star$.

Proof of Theorem 6: Since $\delta \le \Delta\tau^\star$, $t^\star \in \mathcal{A}^{\delta}_{n,r}$. Thus, following the proof of Theorem 3, we have that, for any $\eta > 0$,
$$P\big( \|\hat{\tau}_n - \tau^\star\|_\infty \ge \eta \big) \le P\left( \max_{1 \le s \le n} \Big( \sum_{t=1}^{s} \varepsilon_t \Big)^{2} + \max_{1 \le s \le n} \Big( \sum_{t=n-s+1}^{n} \varepsilon_t \Big)^{2} \ge c\, \delta\, n^2 \right) + P\left( \max_{1 \le s \le n} \Big| \sum_{t=1}^{s} \varepsilon_t \Big| + \max_{1 \le s \le n} \Big| \sum_{t=n-s+1}^{n} \varepsilon_t \Big| \ge c\, \eta\, n \right)$$
for some constant $c > 0$. We conclude with Theorem 2 for the strong consistency of $\hat{\tau}_n$ under H2, by setting $u_n = n$, that is, $\delta = 1$ in (11)–(12).

3.3 Rate of convergence
It is possible to derive the rate of convergence of the change-point estimates. It has been shown by Bai (1994), for a single change-point and a weakly dependent disturbance, that the rate of convergence is n (i.e., it is linear in the sample size). This result was later extended to multiple change points (and more general linear regression models) by Bai and Perron (1996), under weak-dependence conditions on the additive disturbance. It is shown in the sequel that the rate of convergence of the change fraction estimator is n under general assumptions on the disturbance process, which include, as particular examples, long-range dependent processes.
In this section, $\hat{t}_n \triangleq \hat{t}_n(\mathcal{A}^{\delta_n}_{n,r})$ is the estimate of $t^\star$ obtained by minimizing the contrast function $Q_n(t)$ over $\mathcal{A}^{\delta_n}_{n,r}$. The following theorem also holds under the alternative conditions mentioned in the previous section.

Theorem 7. Assume that H1($\varphi$) holds for some $\varphi < 2$. Then, for all $1 \le j \le r$, $\hat{t}_{n,j} - t^\star_j = O_P(1)$. More precisely, denote $\Delta\tau^\star = \min_{1 \le j \le r+1} |\tau^\star_j - \tau^\star_{j-1}|$ and let $0 < \eta < 1/2$. Then there exists a constant $K < \infty$ such that, for all $(\mu^\star_1, \dots, \mu^\star_{r+1})$ and all $\beta > 0$, it holds that, for large enough $n$,
$$P\big( \beta\, \underline{\Delta}^{-2/(2-\varphi)} \le \|\hat{t}_n - t^\star\|_\infty \le n\, \eta\, \Delta\tau^\star \big) \le K \Big( \beta^{\varphi-2} \big( 1 + (\overline{\Delta}/\underline{\Delta})^2 \big) + n^{\varphi-2}\, \delta_n^{-1}\, \underline{\Delta}^{-2} \Big), \tag{34}$$
where $\underline{\Delta} = \min_{1 \le k \le r} |\mu^\star_{k+1} - \mu^\star_k|$ and $\overline{\Delta} = \max_{1 \le k \le r} |\mu^\star_{k+1} - \mu^\star_k|$.
Remark: Perhaps surprisingly, the rate of convergence of the estimator of the break fraction is not related to the rate of decay of the autocovariance function of the disturbance process $\varepsilon$. Once again, the need for a uniform upper bound (34) is justified by the need for a limit theory with jumps $\Delta_j$ going to zero with the sample size $n$.
Proof of Theorem 7: Define
$$\mathcal{C}_{\beta,\eta,n} = \big\{ t \in \mathcal{A}_{n,r}\ ;\ \beta\, \underline{\Delta}^{-2/(2-\varphi)} \le \|t - t^\star\|_\infty \le n\, \eta\, \Delta\tau^\star \big\}. \tag{35}$$
The proof consists in determining an upper bound for $P(\hat{t}_n \in \mathcal{C}_{\beta,\eta,n})$. To that purpose, first decompose $\mathcal{C}_{\beta,\eta,n}$ as
$$\mathcal{C}_{\beta,\eta,n} = \bigcup_{I} \mathcal{C}_{\beta,\eta,n} \cap \{ t \in \mathcal{A}_{n,r}\ ;\ t_k \le t^\star_k,\ \forall k \in I \},$$
where the union is over all subsets $I$ of the index set $\{1, \dots, r\}$. We may compute an upper bound for each individual set $\mathcal{C}_{\beta,\eta,n} \cap \{ t \in \mathcal{A}_{n,r}\ ;\ t_k \le t^\star_k,\ \forall k \in I \}$. Of course, this upper bound does not depend on $I$, and we consider, for notational simplicity, only the case where $I = \{1, \dots, r\}$. Denote $\mathcal{C}'_{\beta,\eta,n} = \mathcal{C}_{\beta,\eta,n} \cap \{ t \in \mathcal{A}_{n,r}\ ;\ t_k \le t^\star_k,\ \forall k \in \{1, \dots, r\} \}$. We show in the Appendix that, for any $\beta > 0$,
$$P(\hat{t}_n \in \mathcal{C}'_{\beta,\eta,n}) \le K \Big( \beta^{\varphi-2} \big( 1 + (\overline{\Delta}/\underline{\Delta})^2 \big) + n^{\varphi-2}\, \delta_n^{-1}\, \underline{\Delta}^{-2} \Big). \tag{36}$$
3.4 The limiting distribution

In this section, we derive the asymptotic distribution of $\hat{t}_n$ when the sample size goes to infinity. As shown by Picard (1985) and later by Bai (1994) and Bai and Perron (1996), this limiting distribution can be used to construct confidence intervals for the change fractions. The limiting distribution also carries information on the way the estimator of the change fraction is linked with the other parameters of the model. Following Bai (1994), it is assumed in the sequel that the jumps $\Delta_j$, $1 \le j \le r$, depend on the sample size $n$ and diminish as $n$ increases. The limiting distribution for fixed jump sizes $\Delta_j$ can be obtained in certain cases, but it depends in a very intricate way on the distribution of the disturbance and is thus of little practical use.

To stress the dependence of the jumps on $n$, we use in this section the notation $\Delta_{n,j}$ instead of $\Delta_j$. We denote $\overline{\Delta}_n = \max_{1 \le j \le r} \Delta_{n,j}$ and $\underline{\Delta}_n = \min_{1 \le j \le r} \Delta_{n,j}$. For some $\varphi < 2$, we consider the following condition:
H4($\varphi$) For any $1 \le j \le r$ and for some $0 < \kappa < 1$, it holds that
$$\underline{\Delta}_n \to 0 \quad \text{and} \quad n^{1-\kappa}\, \underline{\Delta}_n^{\,2/(2-\varphi)} \xrightarrow[n \to \infty]{} +\infty, \tag{37}$$
$$\underline{\Delta}_n^{-1} \Delta_{n,j} \xrightarrow[n \to \infty]{} a_j, \qquad 1 \le a_j < \infty, \tag{38}$$
where we set $\theta \triangleq 2/(2 - \varphi)$.

The last requirement implies that the sizes of the jumps $\Delta_{n,j}$, $1 \le j \le r$, go to zero with the sample size $n$ at the same rate. For 'short-memory' processes ($\varphi = 1$), one may set $\kappa = 0$ (which is not allowed in the assumption above). In fact, it is possible in that case to weaken this assumption slightly; typical conditions are then of the form $n\, L(n)^{-1}\, \underline{\Delta}_n^2 \to \infty$, where $L(n)$ is a slowly varying function as $n \to \infty$ (see Bai (1994)). Since the main emphasis here is on processes with long-range dependence, we do not pursue that road. In addition, the minimal size $\delta_n$ of the interval between successive breakpoints (see Theorem 3) is set to $\delta_n = n^{-\kappa(2-\varphi)}$. Under these assumptions, one may show, by applying Theorems 3 and 7, that the sequence $\{\underline{\Delta}_n^{\theta} (\hat{t}_n - t^\star)\}$ is tight, in the sense that
$$\lim_{\beta \to \infty}\ \limsup_{n \to \infty}\ P\big( \underline{\Delta}_n^{\theta}\, \|\hat{t}_n - t^\star\|_\infty \ge \beta \big) = 0. \tag{39}$$
Denote by $\{B_{\varphi}(s)\}_{s \ge 0}$ the fractional Brownian motion (fBm) with self-similarity exponent $\varphi/2$ (for $\varphi = 1$, $B_{\varphi}(s)$ is the standard Brownian motion). Recall that the fBm is the (unique, up to a scale factor) continuous Gaussian process with stationary increments satisfying $B_{\varphi}(0) = 0$, $E(B_{\varphi}(t)) = 0$ and $E(B_{\varphi}(t)^2) = t^{\varphi}$. Its covariance kernel is
$$\Gamma(t, s) = \tfrac{1}{2} \big( s^{\varphi} + t^{\varphi} - |s - t|^{\varphi} \big).$$
For the derivations that follow, we also need to introduce the two-sided fBm. This is, similarly, the unique (up to a scale factor) continuous Gaussian process with stationary increments satisfying $\widetilde{B}_{\varphi}(0) = 0$, $E(\widetilde{B}_{\varphi}(t)) = 0$ and $E(\widetilde{B}_{\varphi}(t)^2) = |t|^{\varphi}$. Its covariance kernel is given by
$$\operatorname{cov}\big( \widetilde{B}_{\varphi}(t), \widetilde{B}_{\varphi}(s) \big) = \tfrac{1}{2} \big( |t|^{\varphi} + |s|^{\varphi} - |t - s|^{\varphi} \big).$$
The results below rely deeply upon an invariance principle (i.e., a functional form of the Central Limit Theorem) and a multi-dimensional Central Limit Theorem. Define, for $n \in \mathbb{N}$ and $m \in \mathbb{N}$, the following sequence of polygonal interpolation functions $\{S_n(m, s)\}_{s \in \mathbb{R}}$, where $S_n(m, 0) = 0$ and
$$S_n(m, s) = \begin{cases} \displaystyle \sum_{t=m+1}^{m+[ns]} \varepsilon_t + \varepsilon_{m+[ns]+1}\, (ns - [ns]), & s > 0, \\[2mm] \displaystyle \sum_{t=m+1+[ns]}^{m} \varepsilon_t + \varepsilon_{m+[ns]}\, (ns - [ns]), & s < 0. \end{cases} \tag{40}$$
We assume that:

H5-a($\varphi$) (invariance principle) $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ is a strict-sense stationary process. In addition, there exists a constant $\sigma > 0$ such that, for all $m \in \mathbb{N}$,
$$\sigma^{-1}\, n^{-\varphi/2}\, S_n(m, s) \underset{n \to \infty}{\Longrightarrow} \widetilde{B}_{\varphi}(s), \qquad s \in [-1, 1],$$
where $\widetilde{B}_{\varphi}$ is a two-sided fractional Brownian motion. Furthermore, for any sequence of positive integers $\{m_n\}_{n \in \mathbb{N}}$ such that $m_n / n \to \infty$, $\{S_n(1, s)\}_{s \in [-1, +1]}$ and $\{S_n(m_n, s)\}_{s \in [-1, +1]}$ are asymptotically independent.
H5-b($\varphi$) (multi-dimensional CLT) For any positive integer $r$ and any sequences of non-negative integers $\{m_{n,i}\}_{n \in \mathbb{N}}$, $1 \le i \le r$, such that, for all $n \in \mathbb{N}$, $1 < m_{n,1} < \dots < m_{n,r} < n$ and $\lim_{n \to \infty} m_{n,i} / n = \overline{m}_i$ with $0 < \overline{m}_1 < \dots < \overline{m}_r < 1$, it holds that
$$n^{-\varphi/2} \left( \sum_{t=1}^{m_{n,1}} \varepsilon_t,\ \sum_{t=m_{n,1}+1}^{m_{n,2}} \varepsilon_t,\ \dots,\ \sum_{t=m_{n,r}+1}^{n} \varepsilon_t \right) \xrightarrow{d} \mathcal{N}(0, \Gamma).$$
In the above expressions, $\Longrightarrow$ denotes weak convergence in the space of continuous functions on $[-1, +1]$ equipped with the uniform metric, and $\mathcal{N}(0, \Gamma)$ is the $(r+1)$-dimensional multivariate Gaussian distribution with covariance matrix $\Gamma$.
Assumption H5($\varphi$) with $\varphi = 1$ is verified for a wide class of 'short-memory' processes, e.g., for linear processes (Eq. (3)) under the assumption that $\sum_{j=0}^{\infty} j\,|f_j| < \infty$ and that $\{\eta_t\}_{t \in \mathbb{Z}}$ is either a sequence of i.i.d. random variables with zero mean $E(\eta_t) = 0$ and finite variance $E(\eta_t^2) = \sigma^2$, or a sequence of martingale increments such that $E(\eta_t^2) = \sigma^2$, $\sup_{t \ge 0} E(|\eta_t|^{2+\epsilon}) < \infty$ and $n^{-1} \sum_{t=1}^{n} E(\eta_t^2 \mid \mathcal{F}_{t-1}) \to \sigma^2$, where $\mathcal{F}_t = \sigma(\eta_s,\ 1 \le s \le t)$. H5($\varphi$) also holds with $\varphi = 1$ under a wide range of mixing conditions, including mixingales (McLeish (1975), Bai and Perron (1996)) and strongly mixing processes (Doukhan (1994)), under appropriate conditions on the rate of decrease of the mixing coefficients and on the moments (or on the tail of the distribution) of $\varepsilon_t$. In all of these situations, the matrix $\Gamma$ is diagonal.

For our discussion, it is more interesting to ask whether these assumptions hold for strongly dependent processes. Invariance principles H5($\varphi$) have been derived for interpolated sums of non-linear functions of Gaussian variables that exhibit long-range dependence in Taqqu (1975; 1977). In that case, however, the matrix $\Gamma$ is no longer diagonal. Invariance principles have also been obtained by Taqqu (1977) and Ho and Hsing (1997) for non-Gaussian linear processes (Eq. (3)).
We have the following result:

Theorem 8. Assume that H1($\varphi$), H4($\varphi$) and H5($\varphi$) hold for some $\varphi < 2$. Then, for any $1 \le j \le r$,
$$\Big( \frac{\Delta_{n,j}^2}{\sigma^2} \Big)^{1/(2-\varphi)} \big( \hat{t}_{n,j} - t^\star_j \big) \xrightarrow{d} \operatorname*{argmin}_{v \in \mathbb{R}} \big( |v| + 2\, \widetilde{B}^{(j)}_{\varphi}(v) \big), \tag{41}$$
where $\widetilde{B}^{(j)}_{\varphi}$ is a two-sided fractional Brownian motion. Furthermore, $\hat{t}_{n,j}$ and $\hat{t}_{n,k}$ are asymptotically independent if $j \ne k$.

Denote by $\hat{\mu}_n$ the vector of sample means, $\hat{\mu}_n \triangleq (\bar{Y}(1, \hat{t}_1), \bar{Y}(\hat{t}_1 + 1, \hat{t}_2), \dots, \bar{Y}(\hat{t}_r + 1, n))$, and $\hat{\mu}^\star_n \triangleq (\bar{Y}(1, t^\star_1), \bar{Y}(t^\star_1 + 1, t^\star_2), \dots, \bar{Y}(t^\star_r + 1, n))$. Then, $n^{1-\varphi/2} (\hat{\mu}_n - \hat{\mu}^\star_n) = o_P(1)$. Assume in addition that H5-b($\varphi$) holds. Then, $n^{1-\varphi/2} (\hat{\mu}_n - \mu^\star)$ is asymptotically normal.

Not surprisingly, the limiting distribution of $\hat{t}_n$ depends upon the memory parameter $\varphi$ through the normalizing exponent $\theta = 2/(2 - \varphi)$. This result is fairly intuitive: for a given value of the memory coefficient $\varphi$, the spread of the estimated change-point instant $\hat{t}_{n,j}$ increases as the magnitude of the jump $\Delta_j$ decreases; on the other hand, for a given jump magnitude $\Delta_j$, the spread of $\hat{t}_{n,j}$ increases as $\varphi$ increases.
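The limiting law in (41) can be tabulated by direct simulation. The sketch below (Python with numpy; the grid, the truncation $M$ and the number of replications are arbitrary choices of ours, not prescriptions of the paper) draws the two-sided fBm on a grid through a Cholesky factorization of its covariance kernel and records the argmin of $|v| + 2\widetilde{B}_{\varphi}(v)$; the candidate $v = 0$, where the criterion is exactly 0, is handled separately.

```python
import numpy as np

rng = np.random.default_rng(2)

phi, M, step = 1.6, 20.0, 0.1          # phi = 1 + 2d with d = 0.3
grid = np.arange(-M, M + step, step)
grid = grid[np.abs(grid) > 1e-12]      # drop v = 0; B(0) = 0 is known exactly

# covariance kernel of the two-sided fBm with self-similarity exponent phi/2
t, s = grid[:, None], grid[None, :]
cov = 0.5 * (np.abs(t) ** phi + np.abs(s) ** phi - np.abs(t - s) ** phi)
L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(grid)))   # jitter for stability

draws = []
for _ in range(500):
    B = L @ rng.standard_normal(len(grid))               # one fBm path
    crit = np.abs(grid) + 2.0 * B                        # |v| + 2 B(v)
    # the criterion is 0 at v = 0, so the argmin is 0 unless crit goes below 0
    draws.append(0.0 if crit.min() >= 0.0 else grid[np.argmin(crit)])

print(np.quantile(draws, [0.05, 0.5, 0.95]))             # limiting quantiles
```

With $\varphi = 1$, $\widetilde{B}_{\varphi}$ is a two-sided standard Brownian motion, and this reproduces the limiting law obtained by Bai (1994) for a single break.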
Proof of Theorem 8: The proof is adapted from Bai (1994), Theorem 1. Denote $s = (s_1, \dots, s_r) \in [-M, +M]^r$, let $\nu_{n,j} \triangleq (\sigma^2 / \Delta_{n,j}^2)^{1/(2-\varphi)}$, and define
$$\widetilde{K}_n(s) \triangleq K_n\big( [t^\star_1 + s_1 \nu_{n,1}], \dots, [t^\star_r + s_r \nu_{n,r}] \big),$$
and define similarly $\widetilde{V}_n(s)$, $\widetilde{W}_n(s)$ and $\widetilde{J}_n(s)$. The following result plays a key role in the derivations that follow.

Lemma 3.2. Assume that H1($\varphi$) holds for some $1 \le \varphi < 2$. Then,
$$\sup_{s \in [-M, M]^r} \Big| n\, \underline{\Delta}_n^{\theta-2}\, \widetilde{K}_n(s) - \sum_{k=1}^{r} |s_k|\, a_k^2 \Big| = o(1), \tag{42}$$
$$\sup_{s \in [-M, M]^r} n\, \underline{\Delta}_n^{\theta-2}\, \big| \widetilde{V}_n(s) \big| = o_P(1), \tag{43}$$
$$\sup_{s \in [-M, M]^r} \Big| n\, \underline{\Delta}_n^{\theta-2}\, \widetilde{W}_n(s) - 2 \sum_{k=1}^{r} \frac{a_k^2}{\Delta_{n,k}}\, S_{n,k}(t^\star_k, s_k) \Big| = o_P(1), \tag{44}$$
where $S_{n,k}(t^\star_k, s_k)$ is the polygonal partial-sum process defined in (40).

Under assumption H5-a($\varphi$), the process $\sum_{k=1}^{r} (a_k^2 / \Delta_{n,k})\, S_{n,k}(t^\star_k, s_k)$ converges (in the space of continuous functions on $[-M, M]^r$ equipped with the supremum norm) to $\sum_{k=1}^{r} a_k^2\, \widetilde{B}^{(k)}_{\varphi}(s_k)$, where $(\widetilde{B}^{(k)}_{\varphi},\ 1 \le k \le r)$ are $r$ independent copies of the fractional Brownian motion with self-similarity parameter $\varphi/2$. Lemma 3.2 then implies that the (polygonally interpolated) process $n\, \underline{\Delta}_n^{\theta-2}\, \widetilde{J}_n(s)$ converges to $\sum_{k=1}^{r} a_k^2 \big( |s_k| + 2\, \widetilde{B}^{(k)}_{\varphi}(s_k) \big)$. Moreover, setting $v \triangleq \lambda^{2/(2-\varphi)} s$, we have $\operatorname{argmin}_s \big( \lambda |s| + 2 \widetilde{B}_{\varphi}(s) \big) = \lambda^{-2/(2-\varphi)}\, \operatorname{argmin}_v \big( |v| + 2 \widetilde{B}_{\varphi}(v) \big)$. The result follows from the continuous mapping theorem for argmin functionals, see Kim and Pollard (1990) (using the same arguments as in Bai (1994), Theorem 1).

Given the rate of convergence of the break dates, it is an easy matter to derive the asymptotic distribution of the sample means, along the same lines as in Bai (1994).
4 Estimation of the number of change points

In the previous sections, the number of change-points was assumed to be known. In many applications, however, the number of break fractions is not specified in advance, and inference about this parameter is also important. Estimation of the number of break points was addressed by Yao (1988), who suggested the use of the Bayesian Information Criterion (also known as the Schwarz criterion). Yao (1988) has shown that the estimate of the number of break points is consistent when the disturbance is a Gaussian white noise. This work was later expanded by Liu, Wu and Zidek (1997). In both cases, the basic idea consists in adding a penalty term to the least-squares criterion, in order to avoid over-segmentation.

The following result is a direct extension of Theorem 3:

Lemma 4.1. For any $r \ge 0$ and any $r$-partition $t \in \mathcal{A}_{n,r}$, let $\|t - t^\star\|_\infty = \max_{1 \le j \le r^\star} \min_{0 \le k \le r+1} |t_k - t^\star_j|$. If $r \ge r^\star$ and under the assumptions of Theorem 3, $\|\hat{\tau}_n - \tau^\star\|_\infty \to 0$ when $n \to \infty$. Moreover, the uniform bound (17) still holds.

The proof is a straightforward adaptation of that of Theorem 3 and is omitted. Lemma 4.1 means that, even if the number of changes has been over-estimated, a sub-family $(\hat{\tau}_{k_1}, \hat{\tau}_{k_2}, \dots, \hat{\tau}_{k_{r^\star}})$ of $\hat{\tau}_n$ still converges to the true set of change fractions $\tau^\star$. Note also that the uniform bounds (24) and (25) hold whatever the number $r$ of estimated break fractions is. They even hold uniformly (w.r.t. the number of break fractions) if this number is upper bounded.
We propose to estimate the configuration of change-points $t^\star$ and the number of changes $r^\star$ by minimizing a penalized least-squares criterion. For any $R \in \mathbb{N}$, any $\delta_n \ge 0$ and any compact $\Theta_r \subset \mathbb{R}^{r+1}$, we consider the following estimates of $(t^\star, r^\star)$:
$$(\hat{t}_n, \hat{r}_n) = \operatorname*{argmin}_{0 \le r \le R}\ \operatorname*{argmin}_{t \in \mathcal{A}_{n,r}} \big\{ n^{-1} Q_n(t) + \lambda_n r \big\}, \tag{45}$$
$$(\hat{t}^{\delta_n}_n, \hat{r}^{\delta_n}_n) = \operatorname*{argmin}_{0 \le r \le R}\ \operatorname*{argmin}_{t \in \mathcal{A}^{\delta_n}_{n,r}} \big\{ n^{-1} Q_n(t) + \lambda_n r \big\}, \tag{46}$$
$$(\tilde{t}_n, \tilde{r}_n) = \operatorname*{argmin}_{0 \le r \le R}\ \operatorname*{argmin}_{t \in \mathcal{A}_{n,r}} \big\{ \widetilde{Q}_n(t) + \lambda_n r \big\}, \tag{47}$$
where the contrast functions $Q_n$ and $\widetilde{Q}_n$ were defined in (16) and (28), and where $\{\lambda_n\}$ is a decreasing sequence of positive real numbers. We denote by $\hat{\tau}_n$, $\hat{\tau}^{\delta_n}_n$ and $\tilde{\tau}_n$ the associated estimators of the break fractions.
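A direct way to compute (45) is to evaluate the best $r$-partition for each $r \le R$ and then compare the penalized contrasts. A minimal sketch follows (Python; it reuses the hypothetical `ls_changepoints` helper sketched in Section 3.1 and is not the authors' code):

```python
def penalized_changepoints(y, R, lam_n):
    """Minimize n^{-1} Q_n(t) + lam_n * r over 0 <= r <= R, as in (45).

    lam_n is the penalty weight; Theorem 9 below requires (in its simplest
    form) lam_n -> 0 with n * lam_n -> infinity for consistency.
    """
    n = len(y)
    best = None
    for r in range(R + 1):
        t, q = ls_changepoints(y, r)     # minimal SSE with exactly r breaks
        crit = q / n + lam_n * r         # penalized least-squares contrast
        if best is None or crit < best[0]:
            best = (crit, r, t)
    _, r_hat, t_hat = best
    return r_hat, t_hat
```

The choice of the penalty $\lambda_n$ is discussed in the next theorem: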
Theorem 9.
i) Assume that H1($\varphi$) is satisfied for some $\varphi < 2$. Then, for any sequence $\{\lambda_n\}$ such that $\lambda_n \to 0$ and $n\lambda_n \to \infty$, and such that H3($\lambda$) holds, $(\hat{\tau}_n, \hat{r}_n)$ converges in probability to $(\tau^\star, r^\star)$ if $r^\star \le R$.
ii) Assume that H1($\varphi$) is satisfied for some $\varphi < 2$. Then, for any non-increasing sequences $\{\lambda_n\}$ and $\{\delta_n\}$ such that $\lambda_n \to 0$, $\delta_n \to 0$ and $n^{2-\varphi}\, \delta_n\, \lambda_n \to +\infty$, $(\hat{\tau}^{\delta_n}_n, \hat{r}^{\delta_n}_n)$ converges in probability to $(\tau^\star, r^\star)$ if $r^\star \le R$.
iii) Assume that H1($\varphi$) is satisfied for some $\varphi < 2$. Assume also that $n^{2-\varphi}\, \lambda_n \to +\infty$. Then, for any compact subset $\Theta$ of $\mathbb{R}^{r+1}$, $(\tilde{\tau}_n, \tilde{r}_n)$ converges in probability to $(\tau^\star, r^\star)$ if $r^\star \le R$, provided $\mu^\star = (\mu^\star_1, \dots, \mu^\star_{r^\star+1}) \in \Theta$.
iv) Assume that H2 is satisfied. Assume also that $\lambda_n \ge n^{-\delta}$, where
$$\delta < \frac{\beta - 1}{4(1 + \beta)}. \tag{48}$$
Then, for any $\Delta\tau > 0$, $(\hat{\tau}^{\Delta\tau}_n, \hat{r}^{\Delta\tau}_n)$ converges almost surely to $(\tau^\star, r^\star)$ if $r^\star \le R$, provided $\Delta\tau \le \Delta\tau^\star$.
Proof: We show i) first. To establish the convergence in probability of $(\hat{\tau}_n, \hat{r}_n)$, it suffices to show that $P(\hat{r}_n \ne r^\star)$ goes to 0. By definition of $(\hat{t}_n, \hat{r}_n)$, we have
$$n^{-1} Q_n(\hat{t}_n) + \lambda_n \hat{r}_n \le n^{-1} Q_n(t^\star) + \lambda_n r^\star. \tag{49}$$
Using (18) and (22), this latter relation implies
$$K_n(\hat{t}_n) + V_n(\hat{t}_n) + W_n(\hat{t}_n) + \lambda_n (\hat{r}_n - r^\star) \le 0. \tag{50}$$
Then,
$$P(\hat{r}_n = r) \le P\big( K_n(\hat{t}_n) + V_n(\hat{t}_n) + W_n(\hat{t}_n) + \lambda_n (r - r^\star) \le 0 \big) \le P\Big( \min_{t \in \mathcal{A}_{n,r}} \big( K_n(t) + V_n(t) + W_n(t) \big) + \lambda_n (r - r^\star) \le 0 \Big). \tag{51}$$
Note that, under H3($\lambda$) and (20), it holds that, for any $\zeta > 0$,
$$\lim_{n \to \infty} P\Big( \sup_{t \in \mathcal{A}_{n,r}} |V_n(t)| \ge \zeta \lambda_n \Big) = 0. \tag{52}$$
Assume first that $r < r^\star$. Note that, for $r < r^\star$, we have $\|\hat{t}_n - t^\star\|_\infty \ge n \Delta\tau^\star / r^\star$, which implies
$$K_n(\hat{t}_n) \ge \frac{\Delta\tau^\star}{r^\star}\, \underline{\Delta}^2. \tag{53}$$
Then, for any $r < r^\star$ and $n$ large enough (so that $\lambda_n r^\star \le \Delta\tau^\star \underline{\Delta}^2 / (2 r^\star)$),
$$P(\hat{r}_n = r) \le P\Big( \min_{t \in \mathcal{A}_{n,r}} \big( V_n(t) + W_n(t) \big) + \frac{\Delta\tau^\star \underline{\Delta}^2}{2 r^\star} \le 0 \Big) \le P\Big( \sup_{t \in \mathcal{A}_{n,r}} |V_n(t)| \ge \frac{\Delta\tau^\star \underline{\Delta}^2}{4 r^\star} \Big) + P\Big( \sup_{t \in \mathcal{A}_{n,r}} |W_n(t)| \ge \frac{\Delta\tau^\star \underline{\Delta}^2}{4 r^\star} \Big), \tag{54}$$
and $P(\hat{r}_n = r) \to 0$ as $n \to \infty$, by application of (24), (25) and Lemma 2.2.

On the other hand, for any $r^\star \le r \le R$,
$$P(\hat{r}_n = r) \le P\Big( \min_{t \in \mathcal{A}_{n,r}} V_n(t) + \frac{\lambda_n}{2} \le 0 \Big) + P\Big( \min_{t \in \mathcal{A}_{n,r}} \big( W_n(t) + K_n(t) \big) \le -\frac{\lambda_n}{2} \Big). \tag{55}$$
From (52), the first term on the right-hand side of (55) goes to 0 when $n \to \infty$. The second term requires additional attention. For any $t \in \mathcal{A}_{n,r}$, we have
$$K_n(t) = \frac{1}{2n} \sum_{k=1}^{r+1} \sum_{i=1}^{r^\star+1} \sum_{j=1}^{r^\star+1} \frac{n_{ki}\, n_{kj}}{n_k}\, \delta_{ij}^2, \tag{56}$$
$$W_n(t) = \frac{2}{n} \sum_{k=1}^{r+1} \sum_{i=1}^{r^\star+1} \sum_{j=1}^{r^\star+1} \frac{n_{ki}}{n_k}\, (\mu^\star_j - \mu^\star_i)\, S_{kj}, \tag{57}$$
where $n_k = t_k - t_{k-1}$, $\delta_{ij} \triangleq |\mu^\star_i - \mu^\star_j|$, and $n_{kj}$ and $S_{kj}$ are defined in (31) and (32), respectively (the dependence of these quantities upon the $r$-partition $t$ is implicit). Then, for any $1 \le k \le r+1$ and any $1 \le i, j \le r^\star+1$, let
$$\mathcal{C}_{kj} = \{ t \in \mathcal{A}_{n,r}\ ;\ n_{kj} \le n\lambda_n \}, \qquad \overline{\mathcal{C}}_{kj} = \{ t \in \mathcal{A}_{n,r}\ ;\ n_{kj} \ge n\lambda_n \}.$$
From Lemma 2.2, there exist two constants $A_1 > 0$ and $A_2 > 0$ such that, for any $c > 0$, any $1 \le k \le r+1$ and any $1 \le i, j \le r^\star+1$ such that $\delta_{ij} > 0$,
$$P\Big( \min_{t \in \mathcal{C}_{kj}} \Big\{ \frac{n_{ki}}{n\, n_k} (\mu^\star_i - \mu^\star_j) S_{kj} + c \big( K_n(t) + \lambda_n \big) \Big\} \le 0 \Big) \le P\Big( \max_{t \in \mathcal{C}_{kj}} |S_{kj}| \ge \frac{c\, n\lambda_n}{\delta_{ij}} \Big) \le \frac{A_1}{c^2}\, (n\lambda_n)^{\varphi-2}, \tag{58}$$
$$P\Big( \min_{t \in \overline{\mathcal{C}}_{kj}} \Big\{ \frac{n_{ki}}{n\, n_k} (\mu^\star_i - \mu^\star_j) S_{kj} + c \big( K_n(t) + \lambda_n \big) \Big\} \le 0 \Big) \le P\Big( \max_{t \in \overline{\mathcal{C}}_{kj}} \frac{|S_{kj}|}{n_{kj}} \ge c\, \delta_{ij} \Big) \le \frac{A_2}{c^2}\, (n\lambda_n)^{\varphi-2}. \tag{59}$$
From (58) and (59), and using the fact that $n\lambda_n \to +\infty$, we conclude that
$$\lim_{n \to \infty} P\Big( \min_{t \in \mathcal{A}_{n,r}} \big( W_n(t) + K_n(t) \big) \le -\frac{\lambda_n}{2} \Big) = 0. \tag{60}$$
Finally, for any $0 \le r \le R$, $P(\hat{r}_n = r) \to 0$ when $n \to \infty$ if $r \ne r^\star$. Assertions ii) and iii) can be shown along the same lines.

We finally show the strong consistency of $(\hat{\tau}_n, \hat{r}_n)$ using the fact that, for any $r \ne r^\star$, $\sum_{n > 0} P(\hat{r}_n = r) < \infty$ under H2 if $(\lambda_n)$ satisfies (48). Indeed, $n\lambda_n \ge n^{1-\delta}$ with
$$1 - \delta > \frac{3(1+\beta) + 4\nu}{2\nu(1+\beta)}.$$
We can apply Theorem 2 to $P(\hat{r}_n = r)$, with $r > r^\star$, by setting $u_n = n\lambda_n$. On the other hand, we can apply Theorem 2 to $P(\hat{r}_n = r)$, with $r < r^\star$, by setting $u_n = n$, since $\lambda_n \to 0$.

5 Numerical examples
In this section, we present a limited Monte-Carlo experiment. The disturbance $\varepsilon$ is a fractional Gaussian noise, i.e., a covariance-stationary process with zero mean and spectral density function given by
$$f(\lambda) = \frac{\sigma^2}{2\pi} \Big( 2 \sin\frac{\lambda}{2} \Big)^{-2d},$$
where $d$ is the long-range dependence parameter. We use Hosking's method to simulate this time series. In the simulation, we set $\sigma^2 = 1$ and $d = 0.3$. There are two change-points, at $\tau^\star_1 = 0.25$ and $\tau^\star_2 = 0.5$. The mean function $\mu$ is defined as follows:
$$\mu(t) = \begin{cases} 2, & 0 \le t < 0.25, \\ 0, & 0.25 \le t < 0.5, \\ 1, & 0.5 \le t \le 1. \end{cases}$$
We simulate 50 realizations of the sequence $Y_1, \dots, Y_n$, with different values of $n$. A typical realization ($n = 1000$ samples) is displayed in Figure 1.
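The sketch below (Python with numpy) reproduces this simulation setting. It draws the long-memory disturbance through a Cholesky factorization of the fractional-Gaussian-noise autocovariance, an exact but $O(n^3)$ stand-in for the Hosking recursion used in the paper; the Hurst parametrization $H = d + 1/2$ is our assumption tying the noise to the memory parameter $d$.

```python
import numpy as np

rng = np.random.default_rng(3)

def fgn(n, d, sigma2=1.0):
    """Fractional Gaussian noise with memory parameter d (Hurst H = d + 1/2).

    Exact Gaussian draw via Cholesky factorization of the Toeplitz
    autocovariance; a simple O(n^3) substitute for Hosking's recursion.
    """
    H = d + 0.5
    k = np.arange(n, dtype=float)
    gamma = 0.5 * sigma2 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                            + np.abs(k - 1) ** (2 * H))
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    L = np.linalg.cholesky(gamma[idx])
    return L @ rng.standard_normal(n)

n = 1000
t = np.arange(n) / n
mu = np.where(t < 0.25, 2.0, np.where(t < 0.5, 0.0, 1.0))   # the mean function
y = mu + fgn(n, d=0.3)                                       # one realization
```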
Since the number of change-points is assumed to be unknown, the penalized least-squares estimator has been computed for each of these realizations. If the statistical structure of the process were exactly known, an upper bound on the regularization factor could be computed:
$$\lambda_n = 4 \log(n) / n^{1-2d}.$$
Not surprisingly, the penalization is typically heavier than in the i.i.d. case (see Yao (1988)). The histograms of the estimated change-points are displayed in Figure 2, while Table 1 gives the estimated numbers of change-points.

The coefficient of penalization $\lambda_n$ has been chosen, by minimizing (45), so as to obtain approximately the same number of over- and under-estimations of the number of change-points. It is clearly seen in this example that the estimated number of changes converges to the true value $r^\star = 2$ when $n$ increases. We remark also that the distribution of the estimated change-points concentrates around the true change-points $\tau^\star_1 = 0.25$ and $\tau^\star_2 = 0.5$ when $n \to \infty$.
6 Acknowledgements

The authors are deeply indebted to the two anonymous reviewers for their very constructive suggestions and for drawing our attention to relevant works, in particular the reference Bai and Perron (1996), which was a fruitful source of inspiration.
7 Proofs

Proof of Theorem 1: The proof of inequality (5) directly follows the proof proposed by Móricz, Serfling and Stout (1982). For (6), note that, for any $m \le n$, we have:
$$P\Big( \max_{m \le k \le n} b_k |S_{1:k}| > \lambda \Big) \le P\big( b_m |S_{1:m}| > \lambda/2 \big) + P\Big( \max_{m+1 \le k \le n} b_k |S_{m+1:k}| > \lambda/2 \Big) \le 4\, \frac{C(\varepsilon)\, m^{\varphi} b_m^2}{\lambda^2} + 4 A(\varphi)\, \frac{C(\varepsilon)}{\lambda^2}\, (n-m)^{\varphi-1} \sum_{t=m+1}^{n} b_t^2. \tag{61}$$

Proof of Corollary 2.1: Using Theorem 1, we have, for any $p \ge 0$, $m > 0$ and $\delta > 0$:
$$P\Big( \max_{2^p m \le k < 2^{p+1} m} b_k |S_{1:k}| > \delta \Big) \le 4\, \frac{C(\varepsilon)\, (2^p m)^{\varphi} b_{2^p m}^2}{\delta^2} + 4\, \frac{A(\varphi) C(\varepsilon)}{\delta^2}\, (2^p m)^{\varphi-1} \sum_{t=2^p m + 1}^{2^{p+1} m - 1} b_t^2.$$
For $b_k = k^{-\gamma}$ and $\gamma > \varphi/2$, we have:
$$\sum_{p=0}^{\infty} (2^p m)^{\varphi}\, b_{2^p m}^2 = m^{\varphi-2\gamma} \sum_{p=0}^{\infty} 2^{p(\varphi-2\gamma)} \le \frac{m^{\varphi-2\gamma}}{1 - 2^{\varphi-2\gamma}}.$$
On the other hand, there exists a constant $D < \infty$ such that:
$$\sum_{p=0}^{\infty} (2^p m)^{\varphi-1} \sum_{t=2^p m + 1}^{2^{p+1} m - 1} b_t^2 \le D\, m^{\varphi-2\gamma}.$$
We conclude using the fact that:
$$P\Big( \max_{k \ge m} b_k |S_{1:k}| > \delta \Big) \le \sum_{p=0}^{\infty} P\Big( \max_{2^p m \le k < 2^{p+1} m} b_k |S_{1:k}| > \delta \Big).$$
Proof of equation (36): The set $\mathcal{C}'_{\beta,\eta,n}$ can be decomposed as $\mathcal{C}'_{\beta,\eta,n} = \bigcup_{I} \mathcal{C}'_{\beta,\eta,n}(I)$, where the union is over all subsets $I$ of $\{1, \dots, r\}$, and
$$\mathcal{C}'_{\beta,\eta,n}(I) = \big\{ t \in \mathcal{A}_{n,r}\ ;\ \beta \underline{\Delta}^{-2/(2-\varphi)} \le t^\star_k - t_k \le n\eta\Delta\tau^\star,\ \forall k \in I\ ;\ 0 \le t^\star_k - t_k < \beta \underline{\Delta}^{-2/(2-\varphi)},\ \forall k \notin I \big\}.$$
For any $t \in \mathcal{C}'_{\beta,\eta,n}$, denote $n_{k,k} = t_k - t^\star_{k-1}$, $n_{k,k+1} = t^\star_k - t_k$, $n_k = t_k - t_{k-1}$ and $n^\star_k = t^\star_k - t^\star_{k-1}$; the dependence of these quantities on $t$ and $t^\star$ is implicit. Note that $n_k = n_{k,k} + n_{k-1,k}$, $n^\star_k = n_{k,k} + n_{k,k+1}$ and $n_{k,k} \ge (1-\eta)\, n\, \Delta\tau^\star$. For all $t \in \mathcal{C}'_{\beta,\eta,n}$, one may show that
$$K_n(t) = \frac{1}{n} \sum_{k=1}^{r} \frac{n_{k,k+1}\, n_{k+1,k+1}}{n_{k+1}}\, \delta_k^2, \tag{62}$$
$$V_n(t) = \frac{1}{n} \sum_{k=1}^{r} \left\{ \frac{n_{k,k}\, n_{k,k+1}}{n^\star_k} \Big( \frac{S_{k,k}}{n_{k,k}} - \frac{S_{k,k+1}}{n_{k,k+1}} \Big)^2 - \frac{n_{k,k+1}\, n_{k+1,k+1}}{n_{k+1}} \Big( \frac{S_{k,k+1}}{n_{k,k+1}} - \frac{S_{k+1,k+1}}{n_{k+1,k+1}} \Big)^2 \right\}, \tag{63}$$
$$W_n(t) = \frac{2}{n} \sum_{k=1}^{r} \delta_k \left( \frac{n_{k,k+1}}{n^\star_k}\, S_{k,k} + \frac{n_{k,k}}{n^\star_k}\, S_{k,k+1} \right), \tag{64}$$
where $S_{k,k} = \sum_{t=t^\star_{k-1}+1}^{t_k} \varepsilon_t$, $S_{k,k+1} = \sum_{t=t_k+1}^{t^\star_k} \varepsilon_t$ and $\delta_k = \mu^\star_{k+1} - \mu^\star_k$. In addition, we have, for all $t \in \mathcal{C}'_{\beta,\eta,n}$,
$$\min_{t \in \mathcal{C}'_{\beta,\eta,n}} K_n(t) \ge (1-\eta)\, \Delta\tau^\star\, \beta\, \underline{\Delta}^{2 - 2/(2-\varphi)}\, n^{-1}, \tag{65}$$
$$|V_n(t)| \le \frac{1}{n} \sum_{k=1}^{r} \left\{ \frac{S_{k,k+1}^2}{n_{k,k+1}} + 2\, |S_{k,k+1}| \Big( \frac{|S_{k+1,k+1}|}{n_{k+1,k+1}} + \frac{|S_{k,k}|}{n_{k,k}} \Big) \right\}. \tag{66}$$
Using these expressions for $K_n(t)$, $V_n(t)$ and $W_n(t)$, and the above bounds, we obtain sharper bounds for $J_n(t)$ on $\mathcal{C}'_{\beta,\eta,n}$. First, by using Lemma 2.2 and its corollary, there exist finite constants $C_1, C_2, C_3$ such that, for all $1 \le k \le r$ and for all $c > 0$,
$$P\Big( \max_{t \in \mathcal{C}'_{\beta,\eta,n}} \frac{|S_{k,k}|}{n_{k,k}} \ge c \Big) \le \sup_{t \in \mathbb{Z}} P\Big( \sup_{s \ge n(1-\eta)\Delta\tau^\star} \Big| \frac{1}{s} \sum_{i=t-s+1}^{t} \varepsilon_i \Big| \ge c \Big) \le C_1\, \frac{n^{\varphi-2}}{c^2}, \tag{67}$$
$$P\Big( \max_{t \in \mathcal{C}'_{\beta,\eta,n}} \frac{S_{k,k}^2}{n_{k,k}} \ge c \Big) \le \sup_{t \in \mathbb{Z}} P\Big( \max_{1 \le s \le n} \Big| \sum_{i=t+1}^{t+s} \varepsilon_i \Big|^2 \ge c\, n (1-\eta) \Delta\tau^\star \Big) \le C_2\, \frac{n^{\varphi-1}}{c}, \tag{68}$$
$$P\Big( \max_{t \in \mathcal{C}'_{\beta,\eta,n}} \frac{|S_{k,k+1}|}{n_{k,k+1}} \ge c \Big) \le \sup_{t \in \mathbb{Z}} P\Big( \max_{s \ge \beta\underline{\Delta}^{-2/(2-\varphi)}} \Big| \frac{1}{s} \sum_{i=t+1}^{t+s} \varepsilon_i \Big| \ge c \Big) \le C_3\, \frac{\big( \beta\underline{\Delta}^{-2/(2-\varphi)} \big)^{\varphi-2}}{c^2}. \tag{69}$$
Let $I$ be an arbitrary subset of $\{1, \dots, r\}$. There exist finite constants $C_4, C_5$ (that depend neither on $\beta$ and $\eta$ nor on the subset $I$) such that, for all $n \ge 1$,
$$P\Big( \max_{k \in I}\ \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{|S_{k,k+1}|}{n_{k,k+1}} \ge c \Big) \le C_4\, \frac{\big( \beta\underline{\Delta}^{-2/(2-\varphi)} \big)^{\varphi-2}}{c^2}, \tag{70}$$
$$P\Big( \max_{k \notin I}\ \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} |S_{k,k+1}| \ge c \Big) \le C_5\, \frac{\big( \beta\underline{\Delta}^{-2/(2-\varphi)} \big)^{\varphi}}{c^2}. \tag{71}$$
Using (64), (65) and (66), there exists $c > 0$ small enough so that it holds
$$P(\hat{t}_n \in \mathcal{C}'_{\beta,\eta,n}) \le \sum_{I} P\Big( \min_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \big( K_n(t) + V_n(t) + W_n(t) \big) \le 0 \Big) \tag{72}$$
$$\le \sum_{k=1}^{r} P\Big( c (1-\eta) \Delta\tau^\star \underline{\Delta}^2 \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{S_{k,k+1}^2}{n_{k,k+1}} \Big) + \sum_{k=1}^{r} P\Big( c (1-\eta) \Delta\tau^\star \underline{\Delta}^2 \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{S_{k+1,k+1}^2}{n_{k+1,k+1}} \Big)$$
$$+ \sum_{k \in I} P\Big( c (1-\eta) \Delta\tau^\star \underline{\Delta}^2 \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{|S_{k,k+1}|}{n_{k,k+1}} \Big( \frac{|S_{k+1,k+1}|}{n_{k+1,k+1}} + \frac{|S_{k,k}|}{n_{k,k}} \Big) \Big) + \sum_{k \notin I} P\Big( c (1-\eta) \Delta\tau^\star \underline{\Delta}^2\, \beta \underline{\Delta}^{-2/(2-\varphi)} \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} |S_{k,k+1}| \Big( \frac{|S_{k+1,k+1}|}{n_{k+1,k+1}} + \frac{|S_{k,k}|}{n_{k,k}} \Big) \Big)$$
$$+ \sum_{k \in I} P\Big( c\, \underline{\Delta} \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{|S_{k,k+1}|}{n_{k,k+1}} \Big) + \sum_{k \notin I} P\Big( c\, \beta\, \underline{\Delta}^{-2/(2-\varphi)}\, \underline{\Delta} \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} |S_{k,k+1}| \Big) + \sum_{k=1}^{r} P\Big( c\, \underline{\Delta} \le \max_{t \in \mathcal{C}'_{\beta,\eta,n}(I)} \frac{|S_{k,k}|}{n_{k,k}} \Big). \tag{73}$$
The proof is concluded by bounding each term in the previous sum using relations (67)–(71).

Proof of Lemma 3.2: We prove these properties on the positive orthant $0 \le s_i \le M$, $1 \le i \le r$, only; the general case follows by symmetry. Let
$$\mathcal{K}_n(M) \triangleq \big\{ t \in \mathcal{A}^{\delta_n}_{n,r}\ ;\ t_k = [t^\star_k + s_k \nu_{n,k}],\ 0 \le s_k \le M,\ k \in \{1, \dots, r\} \big\}.$$
Eq. (42) follows directly from (62). To prove Eq. (43), first note that (63) implies that
$$n\, |\widetilde{V}_n(s)| \le \sum_{k=1}^{r} \left\{ n_{k,k+1} \Big( \frac{S_{k,k}^2}{n_{k,k}^2} + \frac{S_{k+1,k+1}^2}{n_{k+1,k+1}^2} \Big) + 2\, |S_{k,k+1}| \Big( \frac{|S_{k,k}|}{n_{k,k}} + \frac{|S_{k+1,k+1}|}{n_{k+1,k+1}} \Big) + S_{k,k+1}^2 \Big( \frac{1}{n_k} + \frac{1}{n_{k+1}} \Big) \right\}, \tag{74}$$
where, as before, $n_{ij} = \#(T_{ij})$ and $S_{ij} = \sum_{t \in T_{ij}} \varepsilon_t$, with
$$T_{ij} = \big\{ [t^\star_{i-1} + s_{i-1} \nu_{n,i-1}] + 1, \dots, [t^\star_i + s_i \nu_{n,i}] \big\} \cap \big\{ t^\star_{j-1} + 1, \dots, t^\star_j \big\}.$$
Now, for any $\epsilon > 0$, we have
$$P\big( n\, \underline{\Delta}_n^{\theta-2}\, |\widetilde{V}_n(s)| \ge \epsilon \big) \le \underbrace{\sum_{k=1}^{r} P\Big( \max_{t \in \mathcal{K}_n(M)} n_{k,k+1}\, \frac{S_{k,k}^2}{n_{k,k}^2} \ge c\, \epsilon\, \underline{\Delta}_n^{2-\theta} \Big)}_{(A)} + \underbrace{\sum_{k=1}^{r} P\Big( \max_{t \in \mathcal{K}_n(M)} |S_{k,k+1}| \Big( \frac{|S_{k,k}|}{n_{k,k}} + \frac{|S_{k+1,k+1}|}{n_{k+1,k+1}} \Big) \ge c\, \epsilon\, \underline{\Delta}_n^{2-\theta} \Big)}_{(B)} + \underbrace{\sum_{k=1}^{r} P\Big( \max_{t \in \mathcal{K}_n(M)} S_{k,k+1}^2 \ge c\, \epsilon\, n\, \delta_n\, \underline{\Delta}_n^{2-\theta} \Big)}_{(C)}, \tag{75}$$
where $c > 0$ is a sufficiently small constant. Using (67), we have $(A) = o(1)$. Similarly, applying (68) and (69) to the two factors of each term in $(B)$, we obtain $(B) = o(1)$. Finally, using again (67)–(71) together with $n\delta_n \to \infty$, we have $(C) = o(1)$, concluding the proof of (43). The proof of (44) follows along the same lines and is omitted.

References
Bai, J. (1994), Least squares estimation of a shift in linear processes. J. Time Series Analysis, 15(5), 453–472.

Bai, J. and Perron, P. (1996), Estimating and testing linear models with multiple structural changes. To appear in Econometrica.

Basseville, M. and Nikiforov, N. (1993), The Detection of Abrupt Changes - Theory and Applications. Prentice-Hall: Information and System Sciences Series.

Bhattacharya, P. (1987), Maximum likelihood estimation of a change-point in the distribution of independent random variables: general multiparameter case. J. of Multivariate Anal., 32, 183–208.

Beran, J. (1992), Statistical methods for data with long-range dependence. Statistical Science, 7, 404–427.

Birnbaum, Z. and Marshall, A. (1961), Some multivariate Chebyshev inequalities with extensions to continuous parameter processes. Ann. Math. Stat., 32, 682–703.

Brodsky, B. and Darkhovsky, B. (1993), Nonparametric Methods in Change-Point Problems. Kluwer Academic Publishers, the Netherlands.

Doukhan, P. (1994), Mixing: Properties and Examples. Lecture Notes in Statistics, 85, Springer-Verlag.

Doukhan, P. and Louhichi, S. (1997), Weak dependence and moment inequalities. Preprint, Université Paris-Sud.

Hájek, J. and Rényi, A. (1955), Generalization of an inequality of Kolmogorov. Acta Math. Acad. Sci., 6, 281–283.

Ho, H. C. and Hsing, T. (1997), Limit theorems for functionals of moving averages. The Annals of Probability, 25(4), 1636–1669.

Hawkins, D. (1977), Testing a sequence of observations for a shift in location. J. Am. Statist. Assoc., 72, 180–186.

Hinkley, D. (1970), Inference about the change point in a sequence of random variables. Biometrika, 57, 1–17.

Kim, J. and Pollard, D. (1990), Cube root asymptotics. The Annals of Stat., 18, 191–219.

Liu, J., Wu, S., and Zidek, J. (1997), On segmented multivariate regression. Statistica Sinica, 7, 497–525.

McLeish, D. (1975), A maximal inequality and dependent strong laws. The Annals of Probability, 5, 829–839.

Móricz, F., Serfling, R., and Stout, W. (1982), Moment and probability bounds with quasi-superadditive structure for the maximum partial sum. The Annals of Prob., 10(4), 1032–1040.

Picard, D. (1985), Testing and estimating change points in time series. J. Applied Prob., 17, 841–867.

Rio, E. (1995), The functional law of the iterated logarithm for strongly mixing processes. The Annals of Prob., 23(3), 1188–1203.

Sen, A. and Srivastava, M. (1975), On tests for detecting change in the mean. The Annals of Stat., 3, 96–103.

Taqqu, M. (1975), Weak convergence to the fractional Brownian motion and to the Rosenblatt process. Z. Wahrsch. verw. Geb., 31, 287–302.

Taqqu, M. (1977), Law of the iterated logarithm for sums of non-linear functions of Gaussian variables that exhibit long range dependence. Z. Wahrsch. verw. Geb., 40, 203–238.

Yao, Y. (1988), Estimating the number of change-points via Schwarz criterion. Stat. & Probab. Lett., 6, 181–189.
Figure 1: A realization of the process $Y$ with $n = 1000$ and with change-points at $\tau^\star_1 = 0.25$ and $\tau^\star_2 = 0.5$.
Figure 2: The empirical distributions of the estimated change-points obtained with different values of $n$: (a) $n = 500$, (b) $n = 1000$, (c) $n = 2000$, (d) $n = 5000$.
n      nλ_n   r̂ = 1   r̂ = 2   r̂ = 3   r̂ = 4
500      20    0.12    0.74    0.14    0.00
1000     40    0.06    0.86    0.06    0.02
2000    100    0.04    0.96    0.00    0.00
5000    200    0.02    0.98    0.00    0.00

Table 1: Estimation of the number $r$ of changes for different values of $n$ (empirical frequencies over the 50 simulations). For example, 0.86 means that, with $n = 1000$, we obtained the estimate $\hat{r} = 2$ in 43 of the 50 simulations.