Supplementary Materials to the Manuscript
“Nonparametric Maximum Likelihood Approach to
Multiple Change-Point Problems”
Changliang Zou, Guosheng Yin, Long Feng, and Zhaojun Wang
Nankai University and The University of Hong Kong
1 Proof of Corollary 1
First, we prove a useful lemma; its proof is essentially similar to that of Lemma 2 in the paper.
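Lemma A.1. Suppose that Assumptions (A1)–(A3) hold. For any $\epsilon > 0$, as $n \to \infty$, there exists a constant $M_\epsilon$ such that, if $\delta > M_\epsilon$,
$$\Pr\left( \sup_{\tau_{m-1} \le k < l < \tau_{m-1}+\delta} \xi_m(k,l) \ge w \right) < \epsilon,$$
where $w = C_\epsilon (\log\delta)^2$ and $C_\epsilon$ is given in the proof.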
Proof. Without loss of generality, suppose that $F_m$ is uniform on $[0,1]$ and $0 < X_1 < \cdots < X_n < 1$. Then, we have
$$\xi_m(k,l) = n_{kl}\int_{X_{(1)}}^{X_{(n)}} H(\hat F_k^l(u), u)\{\hat F_n(u)(1-\hat F_n(u))\}^{-1}\, d\hat F_n(u),$$
where $n_{kl} = l - k$, and
$$H(x,y) = \left\{ x\log\left(\frac{x}{y}\right) + (1-x)\log\left(\frac{1-x}{1-y}\right) \right\}. \tag{A.1}$$
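By setting $a = 3h^{-1}(1+\alpha)\delta^{-1}\log\delta \equiv D_\alpha\delta^{-1}\log\delta$, $0 < \alpha < 1/2$, and noting that $h(1+\alpha) > 0$, we write
$$\xi_m(k,l) = n_{kl}\left( \int_{X_{(1)}}^{a} + \int_{a}^{1-a} + \int_{1-a}^{X_{(n)}} \right) H(\hat F_k^l(u), u)\{\hat F_n(u)(1-\hat F_n(u))\}^{-1}\, d\hat F_n(u) \equiv \Delta_1 + \Delta_2 + \Delta_3.$$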
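First, we provide an upper bound for $\Pr(\sup_{k,l}\Delta_1 \ge w/3)$, where $\Delta_1 \equiv \Delta_{11} + \Delta_{12}$ with
$$\Delta_{11} = n_{kl}\int_{X_{(1)}}^{a} \left( \frac{\hat F_k^l(u)}{u}\log\frac{\hat F_k^l(u)}{u} \right) \frac{u}{\hat F_n(u)(1-\hat F_n(u))}\, d\hat F_n(u),$$
$$\Delta_{12} = n_{kl}\int_{X_{(1)}}^{a} \left( \frac{1-\hat F_k^l(u)}{1-u}\log\frac{1-\hat F_k^l(u)}{1-u} \right) \frac{1-u}{\hat F_n(u)(1-\hat F_n(u))}\, d\hat F_n(u).$$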
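To show this, we choose $\lambda_{\epsilon 1}$ such that as $n \to \infty$,
$$\Pr\left( \|u/\hat F_n(u)\|_{X_{(1)}}^{1} > \lambda_{\epsilon 1} \right) \le \Pr\left( \left\| \frac{nu}{(\tau_m - \tau_{m-1})\hat F_{\tau_{m-1}}^{\tau_m}(u)} \right\|_{X_{(1)}}^{1} > \lambda_{\epsilon 1} \right) \le q_m e\lambda_{\epsilon 1}\exp(-q_m\lambda_{\epsilon 1}) < \epsilon/18,$$
based on Assumption (A2) and the fact that
$$\Pr\left(\|u/G_n(u)\|_{X_{(1)}}^{1} > \lambda\right) \le \Pr\left(\|G_n^{-1}(u)/u\|_{1/n}^{1} \ge \lambda\right) \le e\lambda\exp\{-\lambda\}$$
by Lemma 1-(iv). Similarly,
$$\Pr\left(\|\hat F_n^{-1}(u)/u\|_{1/n}^{1} > \lambda_{\epsilon 1}\right) < \epsilon/18.$$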
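Also, we consider the event $A_m \equiv \bigcup_{k,l}\left\{ \|\hat F_k^l(u)/u\|_0^1 > \lambda_{\epsilon 2}\delta/n_{kl} \right\}$ and thus
$$\Pr(A_m) = \Pr\left( \bigcup_{k,l}\left\{ \frac{n_{kl}}{\delta}\|\hat F_k^l(u)/u\|_0^1 > \lambda_{\epsilon 2} \right\} \right) \le \Pr\left( \bigcup_{k,l}\left\{ \|\hat F_{\tau_{m-1}}^{\tau_{m-1}+\delta}(u)/u\|_0^1 > \lambda_{\epsilon 2} \right\} \right) = \Pr\left( \|\hat F_{\tau_{m-1}}^{\tau_{m-1}+\delta}(u)/u\|_0^1 > \lambda_{\epsilon 2} \right) \le e\lambda_{\epsilon 2}^{-1} < \epsilon/18,$$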
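by choosing a proper $\lambda_{\epsilon 2}$. In parallel, we can have
$$\Pr\left( \|(1-u)/(1-\hat F_n(u))\|_0^1 > \lambda_{\epsilon 1} \right) < \epsilon/18,$$
and $\Pr(B_m) < e\lambda_{\epsilon 2}^{-1} < \epsilon/18$, where $B_m \equiv \bigcup_{k,l}\left\{ \|(1-\hat F_k^l(u))/(1-u)\|_0^1 > \lambda_{\epsilon 2}\delta/n_{kl} \right\}$.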
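On the intersection of the events $\bar A_m$, $\|u/\hat F_n(u)\|_0^1 \le \lambda_{\epsilon 1}$, and $\|\hat F_n^{-1}(u)/u\|_{1/n}^1 \le \lambda_{\epsilon 1}$, we have
$$\Delta_{11} = n_{kl}\int_{X_{(1)}}^{a} \left( \frac{\hat F_k^l(u)}{u}\log\frac{\hat F_k^l(u)}{u} \right) \frac{u}{\hat F_n(u)}\,\frac{1}{(1-\hat F_n(u))}\, d\hat F_n(u)$$
$$\le -n_{kl}\,\frac{\delta}{n_{kl}}\lambda_{\epsilon 2}\log\left( \frac{\delta}{n_{kl}}\lambda_{\epsilon 2} \right)\lambda_{\epsilon 1}\log(1-\hat F_n^{-1}(a))$$
$$\le -\delta\lambda_{\epsilon 2}\log(\delta\lambda_{\epsilon 2})\,\lambda_{\epsilon 1}\log(1-\lambda_{\epsilon 1}a)$$
$$\le \delta a\lambda_{\epsilon 2}\lambda_{\epsilon 1}^2\log\delta.$$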
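Consequently, since $\delta a = D_\alpha\log\delta$,
$$\Pr\left( \sup_{k,l}\Delta_{11} \ge w/6 \right) \le \Pr(A_m) + \Pr\left(\|\hat F_n^{-1}(u)/u\|_{1/n}^1 > \lambda_{\epsilon 1}\right) + \Pr\left(\|u/\hat F_n(u)\|_{X_{(1)}}^1 > \lambda_{\epsilon 1}\right) + \lceil\delta^2\rceil\Pr(\Delta_{11} \ge w/6)$$
$$\le \frac{1}{6}\epsilon + \lceil\delta^2\rceil\Pr\left( D_\alpha\lambda_{\epsilon 2}\lambda_{\epsilon 1}^2(\log\delta)^2 \ge w/6 \right) = \frac{1}{6}\epsilon,$$
where the probability $\Pr(D_\alpha\lambda_{\epsilon 2}\lambda_{\epsilon 1}^2(\log\delta)^2 \ge w/6)$ is zero when $C_\epsilon > 6D_\alpha\lambda_{\epsilon 2}\lambda_{\epsilon 1}^2$.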
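Similarly, we can show that $\Pr(\sup_{k,l}\Delta_{12} \ge w/6) \le \epsilon/6$, and then
$$\Pr\left(\sup_{k,l}\Delta_1 \ge w/3\right) \le \Pr\left(\sup_{k,l}\Delta_{11} \ge w/6\right) + \Pr\left(\sup_{k,l}\Delta_{12} \ge w/6\right) \le \epsilon/3.$$
By symmetry, we immediately have
$$\Pr\left( \sup_{k,l}\Delta_3 \ge w/3 \right) \le \frac{1}{3}\epsilon$$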
for $\delta > M_\epsilon$ as $n \to \infty$. Thus, it remains to bound $\Pr(\sup_{k,l}\Delta_2 \ge w/3)$. Following an argument similar to that in the proof of Theorem 3.1 of Jager and Wellner (2007), we can express $H(\hat F_k^l(u), u)$ as
$$H(\hat F_k^l(u), u) = \frac{1}{2}\,\frac{(\hat F_k^l(u) - u)^2}{\hat F_{kl}^*(u)(1-\hat F_{kl}^*(u))},$$
for $0 < u < 1$, where $|\hat F_{kl}^*(u) - u| \le |\hat F_k^l(u) - u|$. Then, we rewrite $\Delta_2$ as
$$\Delta_2 = \frac{1}{2}\int_a^{1-a} \frac{n_{kl}(\hat F_k^l(u) - u)^2}{u(1-u)}\,\frac{u(1-u)}{\hat F_{kl}^*(u)(1-\hat F_{kl}^*(u))}\,\frac{d\hat F_n(u)}{\hat F_n(u)(1-\hat F_n(u))}$$
$$\le \frac{1}{2}\left\| \frac{n_{kl}(\hat F_k^l(u) - u)^2}{u(1-u)} \right\|_a^{1-a} \left\| \frac{u}{\hat F_{kl}^*(u)} \right\|_a^{1-a} \left\| \frac{1-u}{1-\hat F_{kl}^*(u)} \right\|_a^{1-a} \int_a^{1-a}\frac{d\hat F_n(u)}{\hat F_n(u)(1-\hat F_n(u))}.$$
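The mean-value representation of $H$ used above can be checked numerically. The following is a minimal sketch, not part of the original proof: the function names `H` and `intermediate_point` are hypothetical, and the code simply solves $H(x,y) = (x-y)^2/\{2s(1-s)\}$ for $s$ and confirms that a root lies between $x$ and $y$, which is the role played by $\hat F_{kl}^*(u)$.

```python
import math


def H(x: float, y: float) -> float:
    """H(x, y) = x log(x/y) + (1 - x) log((1 - x)/(1 - y)), for 0 < x, y < 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def intermediate_point(x: float, y: float) -> float:
    """Return s between x and y with H(x, y) = (x - y)^2 / (2 s (1 - s)).

    Hypothetical helper for illustration; requires x != y.
    """
    c = (x - y) ** 2 / (2.0 * H(x, y))          # c must equal s (1 - s)
    root = math.sqrt(max(0.0, 1.0 - 4.0 * c))   # guard against rounding below 0
    lo, hi = min(x, y), max(x, y)
    for s in ((1.0 - root) / 2.0, (1.0 + root) / 2.0):
        if lo <= s <= hi:
            return s
    raise ValueError("no intermediate point found")


if __name__ == "__main__":
    for x, y in [(0.3, 0.5), (0.2, 0.1), (0.9, 0.6)]:
        s = intermediate_point(x, y)
        print(f"x={x}, y={y}, intermediate point s={s:.4f}")
```

In each case the returned point lies between the two arguments, consistent with $|\hat F_{kl}^*(u) - u| \le |\hat F_k^l(u) - u|$.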
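Consider the event $C_m \equiv \bigcup_{k,l}\left\{ \left| \|\hat F_k^l(u)/u - 1\|_a^{1-a} \right| > \alpha \right\}$ for some $0 < \alpha < 1$ and, by applying Lemma 1-(v), we have
$$\Pr(C_m) \le \lceil\delta^2\rceil\Pr\left( \left| \|\hat F_k^l(u)/u - 1\|_a^{1-a} \right| > \alpha \right) \le 2\exp\{2\log(\lceil\delta\rceil) - \delta a h(1+\alpha)\} \le 2\delta^{-1} < 2M_\epsilon^{-1} < \frac{\epsilon}{9}.$$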
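The choice of $a$ is what makes the exponent above collapse to $-\log\delta$. As a rough numerical illustration, the sketch below assumes $h(x) = x(\log x - 1) + 1$, the function of Shorack and Wellner (1986), which this supplement does not restate, and it reads $h^{-1}(1+\alpha)$ as the reciprocal $1/h(1+\alpha)$. The names `cutoff_a` and `cm_bound` are hypothetical.

```python
import math


def h(x: float) -> float:
    """Assumed definition h(x) = x (log x - 1) + 1 (Shorack and Wellner, 1986)."""
    return x * (math.log(x) - 1.0) + 1.0


def cutoff_a(delta: float, alpha: float) -> float:
    """a = 3 log(delta) / (delta h(1 + alpha)), reading h^{-1} as a reciprocal."""
    return 3.0 * math.log(delta) / (delta * h(1.0 + alpha))


def cm_bound(delta: float, alpha: float) -> float:
    """2 exp{2 log(delta) - delta a h(1 + alpha)} for an integer-valued delta."""
    a = cutoff_a(delta, alpha)
    return 2.0 * math.exp(2.0 * math.log(delta) - delta * a * h(1.0 + alpha))


if __name__ == "__main__":
    alpha = 0.25
    for delta in (1e3, 1e6):
        print(delta, cutoff_a(delta, alpha), cm_bound(delta, alpha), 2.0 / delta)
```

Because $\delta a h(1+\alpha) = 3\log\delta$, the bound equals $2\delta^{-1}$. The printed values of $a$ also show why $\delta$ must be large: for $\alpha = 0.25$ the cutoff exceeds $1/2$ at $\delta = 10^3$ and only becomes small for much larger $\delta$.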
On the event $\bar C_m$ and $|\hat F_{kl}^*(u)/u - 1| < |\hat F_k^l(u)/u - 1| < \alpha$, we have
$$\left\| \frac{u}{\hat F_{kl}^*(u)} \right\|_a^{1-a} < \frac{1}{1-\alpha}.$$
Symmetrically, we also have
$$\left\| \frac{1-u}{1-\hat F_{kl}^*(u)} \right\|_a^{1-a} < \frac{1}{1-\alpha}$$
on the event $\bar D_m$, where $D_m \equiv \bigcup_{k,l}\left\{ \left| \|(1-\hat F_k^l(u))/(1-u) - 1\|_a^{1-a} \right| > \alpha \right\}$ occurs with probability smaller than $\epsilon/9$. On the other hand, by using Lemma 1-(v) again, it is easy to see that, for sufficiently large $M_\epsilon$, if $\delta > M_\epsilon$, as $n \to \infty$,
$$\int_a^{1-a}\{\hat F_n(u)(1-\hat F_n(u))\}^{-1}\, d\hat F_n(u) \le -2\log a + C_\alpha \le 3\log\delta,$$
where the constant $C_\alpha$ depends on $\alpha$.
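Now, let us turn to the term $\|n_{kl}(\hat F_k^l(u) - u)^2/\{u(1-u)\}\|_a^{1-a}$. Let $\varrho = (w/\log\delta)^{1/2}$. By taking $q(t) = \sqrt{t(1-t)}$ in Inequality 11.2.1 of Shorack and Wellner (1986, page 446), for sufficiently large $M_\epsilon$, if $\delta > M_\epsilon$,
$$\Pr\left( \left\| \frac{n_{kl}(\hat F_k^l(u) - u)^{\pm}}{\sqrt{u(1-u)}} \right\|_a^{1/2} \ge \varrho \right) \le 6\int_a^{1/2}\frac{1}{t}\exp\left\{ -\frac{1}{8}\gamma^{\pm}\varrho^2(1-t) \right\} dt \le 12\exp\left\{ -\frac{1}{16}\gamma^{\pm}\varrho^2 \right\}\log\delta,$$
where $\gamma^{-} = 1$, $\gamma^{+} = \psi(\varrho/\sqrt{\delta a})$, and $\psi(x) = 2h(1+x)/x^2$. By using the fact that $\psi(x) \sim 2(\log x)/x$ as $x \to \infty$ (Proposition 11.1.1 in Shorack and Wellner, 1986), $\gamma^{+} \sim \log C_\epsilon/C_\epsilon^{1/2}$ for sufficiently large $C_\epsilon$. Consequently, we have
$$\Pr\left( \sup_{k,l}\left\| \frac{n_{kl}(\hat F_k^l(u) - u)^2}{u(1-u)} \right\|_a^{1/2} \ge \frac{2(1-\alpha)^2 w}{9\log\delta} \right) \le \lceil\delta^2\rceil\Pr\left( \left\| \frac{n_{kl}(\hat F_k^l(u) - u)^{\pm}}{\sqrt{u(1-u)}} \right\|_a^{1/2} \ge \varrho \right)$$
$$\le 12\exp\left( 2\log\lceil\delta\rceil - \frac{1}{16}\gamma^{+}\varrho^2 \right)\log\delta < 12\log\delta\,(\delta+1)^{2-\frac{1}{16}C_\epsilon^{1/2}\log C_\epsilon} < 12\log M_\epsilon\, M_\epsilon^{2-\frac{1}{16}C_\epsilon^{1/2}\log C_\epsilon} < \frac{\epsilon}{18}$$
as long as $M_\epsilon$ and $C_\epsilon$ are sufficiently large. By symmetry, we can also show that
$$\Pr\left( \sup_{k,l}\left\| \frac{n_{kl}(\hat F_k^l(u) - u)^2}{u(1-u)} \right\|_{1/2}^{1-a} \ge \frac{2(1-\alpha)^2 w}{9\log\delta} \right) < \frac{\epsilon}{18}.$$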
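The asymptotic relation $\psi(x) \sim 2(\log x)/x$ invoked above can be illustrated numerically. This is a sketch under the assumption that $h(x) = x(\log x - 1) + 1$ as in Shorack and Wellner (1986); the definition is not restated in this supplement, so treat it as an assumption of the example.

```python
import math


def h(x: float) -> float:
    """Assumed definition h(x) = x (log x - 1) + 1 (Shorack and Wellner, 1986)."""
    return x * (math.log(x) - 1.0) + 1.0


def psi(x: float) -> float:
    """psi(x) = 2 h(1 + x) / x^2, as defined in the proof."""
    return 2.0 * h(1.0 + x) / x ** 2


def ratio(x: float) -> float:
    """Ratio of psi(x) to its claimed asymptotic form 2 log(x) / x."""
    return psi(x) / (2.0 * math.log(x) / x)


if __name__ == "__main__":
    for x in (1e2, 1e4, 1e8):
        print(f"x={x:.0e}  psi(x)={psi(x):.3e}  ratio={ratio(x):.4f}")
```

The ratio increases toward 1, but only at a logarithmic rate, which matches the requirement in the proof that $C_\epsilon$ be sufficiently large before $\gamma^{+}$ is replaced by $\log C_\epsilon/C_\epsilon^{1/2}$.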
Finally, we can obtain that for sufficiently large $M_\epsilon$, if $\delta > M_\epsilon$,
$$\Pr\left( \sup_{k,l}\Delta_2 \ge w/3 \right) \le \Pr(C_m) + \Pr(D_m) + \Pr\left( \sup_{k,l}\left\| \frac{n_{kl}(\hat F_k^l(u) - u)^2}{u(1-u)} \right\|_a^{1-a}\frac{1}{(1-\alpha)^2}\log\delta \ge \frac{2}{9}w \right) < \frac{\epsilon}{3},$$
which completes the proof of this lemma. □
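The bookkeeping of the probability bounds in the proof of Lemma A.1 can be tallied with exact fractions. The sketch below works in units of $\epsilon$ and only adds up the bounds stated in the proof; the variable names are hypothetical and the bounds are treated as upper limits.

```python
from fractions import Fraction

eps = Fraction(1)  # work in units of epsilon

# Delta_11: the event A_m plus two ratio events, each bounded by eps/18.
delta11 = 3 * eps / 18
# Delta_12: the event B_m plus the two symmetric ratio events.
delta12 = 3 * eps / 18
delta1 = delta11 + delta12            # bound for sup Delta_1 >= w/3
delta3 = delta1                       # by symmetry
# Delta_2: events C_m and D_m (eps/9 each) plus two half-interval sup bounds.
delta2 = eps / 9 + eps / 9 + eps / 18 + eps / 18
total = delta1 + delta2 + delta3

print(delta1, delta2, delta3, total)  # 1/3 1/3 1/3 1
```

Each of the three pieces contributes at most $\epsilon/3$, so the overall bound is $\epsilon$, as the lemma requires.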
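Proof of Corollary 1
It suffices to show that for any $\epsilon > 0$, there exists $\delta > 0$ such that
$$\Pr\left( |\hat\tau_s - \tau_s| < \delta,\ s = 1, \cdots, K \right) > 1 - \epsilon, \quad \text{as } n \to \infty,$$
or $\Pr\{G_n(K) \in C_K(\delta)\} > 1 - \epsilon$.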
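For every $(\tau_1', \ldots, \tau_{K_n}') \in D_r(K, \delta)$,
$$\max_{(\tau_1', \ldots, \tau_K') \in D_r(K,\delta)} R_n(\tau_1', \ldots, \tau_K') \le R_n(\tau_1', \ldots, \tau_K', \tau_1, \ldots, \tau_{r-1}, \tau_r - \delta, \tau_r + \delta, \tau_{r+1}, \ldots, \tau_K)$$
$$\le R_n(\tau_1, \ldots, \tau_K) - \delta S_n(F_r, F_{r+1}) + \tilde O_p((\log\delta)^2; K),$$
by Lemma A.1. Thus, we know that, for any $\epsilon > 0$, as $n \to \infty$,
$$\Pr\left( G_n(K) \in D_r(K, \delta) \right) \le \Pr\left( \max_{(\tau_1', \ldots, \tau_K') \in D_r(K_n,\delta)} R_n(\tau_1', \ldots, \tau_K') < R_n(\tau_1, \ldots, \tau_K) \right)$$
$$\le \Pr\left( \tilde O_p((\log\delta)^2; K) > \delta\eta_{\min} \right) \le \Pr\left( \tilde O_p((\log\delta)^2; K) > C_\epsilon(\log\delta)^2 \right) < \epsilon/K$$
for sufficiently large $\delta > M_\epsilon$. Consequently,
$$\Pr\{G_n(K) \in C_K(\delta)\} = 1 - \Pr\left\{ \bigcup_r \{G_n(K) \in D_r(K, \delta)\} \right\} \ge 1 - \sum_{r=1}^{K}\Pr\{G_n(K) \in D_r(K, \delta)\} > 1 - \epsilon,$$
which completes the proof. □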
2 Other Simulation Results
Figure A.1 shows 1000 observations simulated from Models (I) and (II) with noise $\sigma = 0.5$ and error distributions of $N(0,1)$, $t(3)$ and $\chi^2_{(1)}$, respectively. Table A.1 presents the average values of $\xi(\hat G_n\|C_t)$ and $\xi(C_t\|\hat G_n)$ of PL, NMCD and NMCD* for $n = 500$ and 1000 and $\sigma = 0.1, 0.25, 0.5$, when $K_n$ is known to be 11.
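To make the two error measures concrete, here is a small sketch. It assumes the one-sided distance $\xi(A\|B) = \sup_{b \in B}\inf_{a \in A}|a - b|$, which is how we read the notation; the definition is given in the main paper rather than in this supplement, so it is an assumption of the example. The function name `xi` and the change-point values are hypothetical.

```python
from typing import Sequence


def xi(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed one-sided distance: sup over b of the distance to the nearest point of a."""
    return max(min(abs(x - y) for x in a) for y in b)


# Hypothetical true and estimated change-point sets (illustrative values only).
true_cps = [100, 200, 300]
est_cps = [98, 205, 300, 350]

under = xi(est_cps, true_cps)  # xi(G_hat || C_t): worst miss of a true change-point
over = xi(true_cps, est_cps)   # xi(C_t || G_hat): worst spurious estimate

print(under, over)  # 5 50
```

Under this reading, a spurious estimated change-point far from any true one inflates $\xi(C_t\|\hat G_n)$ but leaves $\xi(\hat G_n\|C_t)$ unchanged, which is the pattern one would expect when the number of change-points is over-estimated.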
Figure A.2 shows the histograms of $\hat K_n$, the estimated numbers of change-points using NMCD and PL with $K_n$ selected by the BIC when $n = 500$ and $\sigma = 0.5$ for the three considered error distributions. Under the NMCD, $\hat K_n$ is always centered around the true value $K_n = 11$. In contrast, the estimated numbers of change-points using PL in the two non-normal cases are much larger than the true value, which results in much larger estimation errors in terms of $\xi(C_t\|\hat G_n)$ as shown in Table A.2. In addition, Table A.2 shows that the performance of our NMCD procedure in terms of $n^{-1}\xi(C_t\|\hat G_n)$ improves substantially when $n$ increases, while that of PL does not change much.
Figure A.3 exhibits the curves of $\xi_n^a = n^{-1}\xi(\hat G_n\|C_t)$ and $\xi_n^b = n^{-1}\xi(C_t\|\hat G_n)$ using the NMCD with respect to the sample size $n$ when $\sigma = 0.5$ for the three error distributions. Clearly, the NMCD method tends to have smaller errors when the sample size becomes larger, demonstrating its estimation consistency.
Model (III) has four data segments from different distributions. The left panel of Figure A.4 exhibits 1000 simulated data points and the right panel shows the curves of $\xi_n^a = n^{-1}\xi(\hat G_n\|C_t)$ and $\xi_n^b = n^{-1}\xi(C_t\|\hat G_n)$ of NMCD with respect to the sample size $n$. Both errors decrease quickly as the sample size increases, which demonstrates that the NMCD is able to produce consistent estimates for $K_n$ and the $\tau_k$'s under Model (III).
[Figure A.1 panels: Model (I) and Model (II), each with N(0,1), t(3) and χ²(1) errors; horizontal axis from 0 to 1000.]
Figure A.1: Simulated 1000 observations from Model (I) (the top three plots) and Model (II) (the bottom three plots) with noise $\sigma = 0.5$ and error distributions of $N(0,1)$, $t(3)$ and $\chi^2_{(1)}$ (from left to right), respectively.
[Figure A.2 panels: (a)-(c) PL and (d)-(f) NMCD, with panel titles N(0,1), t(3) and χ²(1); horizontal axis from 0 to 30; vertical axis: Percentage.]
Figure A.2: Histograms of the estimated numbers of change-points using NMCD and PL with $K_n$ selected by the BIC for $n = 500$ and $\sigma = 0.5$ under Model (I).
[Figure A.3 panels: N(0,1), t(3) and χ²(1); curves ξ_n^a and ξ_n^b; horizontal axis: n, from 200 to 1400; vertical axis: ξ_n.]
Figure A.3: The curves of $\xi_n^a = n^{-1}\xi(\hat G_n\|C_t)$ and $\xi_n^b = n^{-1}\xi(C_t\|\hat G_n)$ using the NMCD with respect to the sample size $n$ under Model (I) with $\sigma = 0.5$.
[Figure A.4 panels: left panel titled Model (III), horizontal axis from 0 to 1000; right panel with curves ξ_n^a and ξ_n^b, horizontal axis: n, vertical axis: ξ_n.]
Figure A.4: The left panel: 1000 observations simulated from Model (III); the right panel: $\xi_n^a = n^{-1}\xi(\hat G_n\|C_t)$ and $\xi_n^b = n^{-1}\xi(C_t\|\hat G_n)$ curves using the NMCD with respect to the sample size $n$ under Model (III).
Table A.1: Comparison of the PL, NMCD, and NMCD* methods when the number of change-points K_n is specified (known) under Models (I) and (II), respectively. The standard deviations are given in parentheses.

                              ξ(Ĝ_n‖C_t)                            ξ(C_t‖Ĝ_n)
Model  Error    n     σ     PL          NMCD        NMCD*       PL          NMCD        NMCD*
(I)    N(0,1)   500   0.1   0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)
                      0.25  0.04(0.22)  0.04(0.21)  0.13(0.44)  0.04(0.22)  0.04(0.21)  0.13(0.44)
                      0.5   0.96(1.19)  0.96(1.14)  1.16(1.15)  0.96(1.19)  0.96(1.14)  1.16(1.15)
                1000  0.1   0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)  0.00(0.00)
                      0.25  0.03(0.18)  0.03(0.17)  0.07(0.30)  0.03(0.18)  0.03(0.17)  0.07(0.30)
                      0.5   0.91(1.15)  0.97(1.16)  1.06(1.21)  0.91(1.15)  0.97(1.16)  1.06(1.21)
       t(3)     500   0.1   0.10(0.91)  0.05(0.25)  0.08(0.30)  0.13(1.15)  0.05(0.26)  0.08(0.30)
                      0.25  2.13(5.19)  0.60(0.98)  0.64(1.01)  2.21(6.88)  0.60(0.98)  0.64(1.01)
                      0.5   13.6(12.0)  3.77(4.48)  3.86(4.33)  14.3(18.4)  3.95(7.51)  3.97(7.63)
                1000  0.1   0.42(3.77)  0.05(0.27)  0.07(0.30)  0.39(4.64)  0.05(0.27)  0.07(0.30)
                      0.25  3.58(10.5)  0.65(0.91)  0.69(0.97)  3.92(15.9)  0.65(0.91)  0.69(0.97)
                      0.5   20.2(21.3)  2.58(2.50)  2.90(2.72)  21.9(34.5)  2.56(2.40)  2.90(2.72)
       χ²(1)    500   0.1   0.01(0.08)  0.01(0.12)  0.03(0.17)  0.01(0.08)  0.01(0.12)  0.03(0.17)
                      0.25  0.15(0.40)  0.15(0.37)  0.19(0.45)  0.15(0.40)  0.15(0.37)  0.19(0.45)
                      0.5   1.39(2.91)  0.70(0.80)  0.80(1.22)  1.13(1.57)  0.70(0.80)  0.81(1.41)
                1000  0.1   0.00(0.06)  0.01(0.10)  0.02(0.14)  0.00(0.06)  0.01(0.10)  0.02(0.14)
                      0.25  0.00(0.06)  0.14(0.36)  0.20(0.46)  0.00(0.06)  0.14(0.36)  0.20(0.46)
                      0.5   1.05(2.15)  0.59(0.77)  0.58(0.71)  0.99(1.38)  0.59(0.77)  0.58(0.71)
(II)   N(0,1)   500   0.5   1.59(1.72)  2.35(2.42)  3.34(4.96)  1.59(1.72)  2.35(2.42)  3.34(4.96)
                1000  0.5   1.58(1.52)  2.68(2.59)  2.74(2.89)  1.58(1.52)  2.68(2.59)  2.74(2.89)
       t(3)     500   0.5   13.6(25.8)  4.75(6.87)  6.42(8.84)  7.52(10.2)  4.54(5.19)  6.05(6.42)
                1000  0.5   16.4(40.2)  4.10(3.88)  5.27(7.20)  10.3(18.0)  4.10(3.88)  5.24(6.85)
       χ²(1)    500   0.5   6.36(11.3)  1.57(2.12)  1.65(2.90)  5.88(8.93)  1.57(2.12)  1.65(2.90)
                1000  0.5   4.80(67.8)  1.17(1.45)  1.49(2.10)  4.80(7.82)  1.17(1.45)  1.49(2.10)
Table A.2: Comparison of the PL and NMCD methods when the number of change-points K_n is unknown (K_n is selected using the BIC) under Models (I) and (II), respectively. The standard deviations are given in parentheses.

                              PL                                    NMCD
Model  Error    n     σ     ξ(Ĝ‖C_t)    ξ(C_t‖Ĝ)    |K̂_n − K_n|  ξ(Ĝ‖C_t)    ξ(C_t‖Ĝ)    |K̂_n − K_n|
(I)    N(0,1)   500   0.1   0.00(0.00)  1.89(9.19)  0.11(0.37)   0.00(0.00)  0.00(0.00)  0.00(0.00)
                      0.25  0.05(0.25)  1.63(8.12)  0.10(0.33)   0.05(0.24)  0.06(0.26)  0.00(0.00)
                      0.5   0.93(1.08)  2.16(6.57)  0.09(0.31)   0.96(1.34)  0.99(1.05)  0.00(0.04)
                1000  0.1   0.00(0.00)  2.78(17.6)  0.07(0.30)   0.00(0.00)  0.00(0.00)  0.00(0.00)
                      0.25  0.04(0.21)  1.62(12.9)  0.05(0.22)   0.04(0.23)  0.05(0.23)  0.00(0.00)
                      0.5   0.94(1.14)  2.30(10.3)  0.05(0.25)   0.96(1.25)  1.01(1.25)  0.00(0.04)
       t(3)     500   0.1   0.02(0.13)  36.9(24.8)  6.11(3.51)   0.04(0.19)  0.41(4.35)  0.03(0.23)
                      0.25  0.48(0.78)  41.7(26.7)  6.40(3.58)   0.66(1.19)  5.02(14.8)  0.27(0.67)
                      0.5   2.91(2.92)  39.0(24.9)  6.05(3.47)   3.34(4.22)  8.64(15.2)  0.36(0.88)
                1000  0.1   0.02(0.14)  96.2(49.5)  10.1(4.36)   0.04(0.20)  2.57(15.2)  0.08(0.38)
                      0.25  0.57(0.86)  99.5(50.4)  10.0(4.27)   0.58(0.96)  7.65(25.8)  0.24(0.63)
                      0.5   2.94(3.02)  95.2(48.8)  9.70(4.14)   2.54(2.78)  10.0(26.8)  0.36(0.75)
       χ²(1)    500   0.1   0.00(0.04)  49.8(24.4)  11.1(4.78)   0.01(0.09)  0.01(0.09)  0.00(0.00)
                      0.25  0.12(0.35)  50.1(24.6)  11.3(4.76)   0.14(0.37)  0.15(0.46)  0.00(0.06)
                      0.5   0.85(0.99)  49.5(23.6)  10.9(4.69)   0.73(0.95)  1.36(5.59)  0.05(0.28)
                1000  0.1   0.00(0.00)  111(45.2)   14.5(3.97)   0.01(0.09)  0.01(0.09)  0.00(0.00)
                      0.25  0.13(0.36)  106(46.1)   13.8(4.11)   0.16(0.39)  0.23(1.02)  0.01(0.14)
                      0.5   0.85(1.05)  111(46.2)   14.2(4.06)   0.53(0.69)  0.89(4.28)  0.02(0.20)
(II)   N(0,1)   500   0.5   1.66(1.61)  2.22(5.56)  0.04(0.22)   2.28(2.31)  4.45(8.54)  0.13(0.37)
                1000  0.5   1.69(1.50)  1.71(1.52)  0.01(0.11)   2.19(2.11)  3.93(10.6)  0.06(0.27)
       t(3)     500   0.5   5.77(6.57)  24.1(20.0)  1.58(1.56)   5.18(6.18)  14.1(16.5)  0.75(1.01)
                1000  0.5   5.59(6.26)  62.4(41.3)  2.72(2.21)   4.50(4.44)  17.0(28.4)  0.47(0.87)
       χ²(1)    500   0.5   5.03(6.19)  43.1(16.0)  4.71(2.66)   1.67(2.39)  7.27(12.6)  0.43(0.80)
                1000  0.5   5.00(6.29)  91.1(31.1)  6.22(3.23)   1.26(1.50)  9.45(22.7)  0.28(0.70)