Carlstein, E. (1986). Nonparametric Change-Point Estimation.

NONPARAMETRIC CHANGE-POINT ESTIMATION
(Revised)

by E. Carlstein
University of North Carolina at Chapel Hill
Chapel Hill, NC 27514

Mimeo Series #1595
February 1986
DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina
SUMMARY
Consider a sequence of independent random variables {X_i: 1 ≤ i ≤ n} having cdf F for i ≤ θn and cdf G otherwise. A strongly consistent estimator of the change-point θ ∈ (0,1) is proposed. The estimator requires no knowledge of the functional forms or parametric families of F and G. Furthermore, F and G need not differ in their means (or other measure of location). The only requirement is that F and G differ on a set of positive probability. The proof of consistency provides rates of convergence and bounds on the error probability for the estimator. The estimator is applied to two well-known data sets; in both cases it yields results in close agreement with previous parametric analyses. A simulation study is conducted, showing that the estimator performs well even when F and G share the same mean, variance, and skewness.
Key words: Change-point; Estimation; Nonparametric.
AMS 1980 Subject Classification: Primary 62G05; Secondary 60F15.
Running Head: Nonparametric Changepoint Estimator
1. INTRODUCTION

Let X_1^n, ..., X_n^n be independent random variables with:

   X_1^n, ..., X_[θn]^n identically distributed with cdf F;
   X_[θn]+1^n, ..., X_n^n identically distributed with cdf G;

where [y] denotes the greatest integer not exceeding y. The parameter θ ∈ (0,1) is the change-point to be estimated.
The body of literature addressing this
problem is extensive, but most of the work is based upon at least one of the
following assumptions:
(i) F and G are known to belong to parametric families (e.g. normal, binomial), or are otherwise known in functional form;
(ii) F and G differ, in particular, in their levels (e.g. mean or median).
Hinkley (1970) and Hinkley & Hinkley (1970) use maximum likelihood to estimate θ in the situation where F and G are from the same parametric family. Hinkley (1972) generalizes this method to the case where F and G may be arbitrary known distributions, or alternatively where a sensible discriminant function (for discriminating between F and G) is known. Smith's (1975) Bayesian approach and Cobb's (1978) conditional solution also require assumptions of type (i). These authors generally suggest that any unknown parameters in F and G can be estimated from the sample, but nevertheless F and G must be specified as functions of those parameters.
At the other extreme, Darkhovshk (1976) presents a nonparametric estimator based on the Mann-Whitney statistic. Although his estimator makes no explicit use of the functional forms of F and G, his asymptotic results require ∫ G(x) dF(x) ≠ ½. This excludes cases where F and G are both symmetric and have a common median.
Bhattacharyya & Johnson (1968) give a nonparametric test for the presence of a change-point, but again under the type (ii) assumption that the variables after the change are stochastically larger than those before. (See Shaban (1980) for an annotated bibliography of change-point literature.)
In contrast to assumptions of types (i) and (ii), the estimator studied here does not require any knowledge of F and G; and virtually any salient difference between F and G will ensure detection of the change-point (asymptotically). Specifically, we assume:

(1) The set A = {x ∈ R: |F(x) − G(x)| > 0} satisfies either ∫_A dF(x) > 0 or ∫_A dG(x) > 0.

Note that F and G may be discrete, continuous, or mixed. The theoretical results for Darkhovshk's (1976) nonparametric estimator assumed F and G to be continuous. Unlike Cobb (1978), we do not require the supports of F and G to be identical; in fact the supports may be entirely unknown.
The intuition behind the proposed estimator is as follows. For a hypothetical (but not necessarily correct) change-point t ∈ (0,1), consider the pre-t empirical cdf th^n(x), which is constructed as if X_1^n, ..., X_[tn]^n were identically distributed, and the post-t empirical cdf h_t^n(x), which is constructed as if X_[tn]+1^n, ..., X_n^n were identically distributed. The former estimates the unknown mixture distribution:

   th(x) = I{t ≤ θ} F(x) + I{t > θ} (θ F(x) + (t−θ) G(x))/t,

and the latter similarly estimates:

   h_t(x) = I{t ≤ θ} ((θ−t) F(x) + (1−θ) G(x))/(1−t) + I{t > θ} G(x).

The total difference between these two distributions is measured by:

   J_n(t) = Σ_{j=1}^n |th(X_j^n) − h_t(X_j^n)|/n = ( I{t ≤ θ} (1−θ)/(1−t) + I{t > θ} θ/t ) J_n(θ).

Note that J_n(t) attains its maximum (over t ∈ (0,1)) at t = θ. Thus a reasonable estimator for θ is the value of t that maximizes:

   H_n(t) = Σ_{j=1}^n |th^n(X_j^n) − h_t^n(X_j^n)|/n.
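For concreteness, H_n can be evaluated directly from the two empirical cdfs. The following minimal sketch (the code, variable names, and illustrative normal data are ours, not the paper's) computes H_n(i/n) at every split of a simulated series:

```python
import numpy as np

def H_n(x, i):
    """H_n(i/n): mean over all j of |th^n(X_j) - h_t^n(X_j)|, where th^n and
    h_t^n are the empirical cdfs of x[:i] and x[i:] respectively."""
    n = len(x)
    # searchsorted(..., side="right") counts block elements <= x_j,
    # so dividing by the block size gives the empirical cdf at x_j.
    F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
    F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
    return np.abs(F_pre - F_post).mean()

rng = np.random.default_rng(0)
# Illustrative series: change at theta = .6, N(0,1) before, N(3,1) after.
x = np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 40)])
i_hat = max(range(1, 100), key=lambda i: H_n(x, i))  # maximizer of H_n(i/n)
```

With a pronounced difference between F and G, the maximizer i_hat/n lands at or near the true change fraction .6.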
In Section 2 the estimator is formally defined and its asymptotic properties are presented. The results include strong consistency, with rates of convergence and bounds on the error probability. Proof of these results is deferred to Section 4. Section 3 investigates the finite-sample behavior of the estimator: First the estimator is calculated on Cobb's (1978) Nile data and on the Lindisfarne Scribes data (see Smith, 1980). In both examples this nonparametric analysis produces results which are nearly identical to the results of the earlier parametric analyses. Then the estimator is tested (via simulation) in a situation where no other change-point estimator can be used: F and G are both unknown and are of different parametric families, but they are both symmetric and share the same mean and variance. Here again the performance of the estimator is quite reasonable.
2. THE ESTIMATOR

In order to formalize the ideas of Section 1, the following additional definitions and assumptions are introduced. For t ∈ (0,1) the empirical cdf's are:

   th^n(x) = Σ_{i=1}^{[tn]} I{X_i^n ≤ x}/[tn],    h_t^n(x) = Σ_{i=[tn]+1}^n I{X_i^n ≤ x}/(n − [tn]).

Since the accuracy of the approximation of J_n(t) by H_n(t) depends in turn upon the accuracy of th^n(·) and h_t^n(·) in approximating th(·) and h_t(·), we must restrict our attention to values of t that will yield reasonably large sample sizes ([tn] and n − [tn]) for our empirical cdf's.
Moreover, for t such that [tn] (say) is small, the corresponding th^n(·) consists of just a few big steps regardless of how smooth F and G are. So, even if n is large and h_t^n(·) is devoid of large jumps, the difference H_n(t) will tend to be exaggerated. That is, the change-point estimator will be drawn (erroneously) towards such extremal values of t. This again suggests that we restrict our attention to values of t bounded away from 0 and 1. Specifically, we only consider t ∈ [a_n, 1−a_n], where {a_n: n ≥ 1} is such that n a_n → ∞ and a_n → 0 as n → ∞.
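The endpoint effect just described is easy to see numerically. In the sketch below (our own illustration, not from the paper) the data are iid, with no change-point at all, yet H_n(i/n) is far larger at the extreme split i = 1 than in the middle, simply because a one-point empirical cdf is a single big step:

```python
import numpy as np

def H_n(x, i):
    """Criterion H_n(i/n) built from the pre- and post-split empirical cdfs."""
    n = len(x)
    F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
    F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
    return np.abs(F_pre - F_post).mean()

rng = np.random.default_rng(1)
x = rng.normal(size=200)     # iid: no change-point anywhere
h_edge = H_n(x, 1)           # split right after the first observation
h_mid = H_n(x, 100)          # split in the middle
```

h_edge is at least about .25 for any data (the one-point cdf jumps from 0 to 1 at a single value), while h_mid is of order 1/√n; an untrimmed maximization would therefore drift toward the endpoints.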
The change-point estimator is then defined as:

   θ̂_n ∈ [a_n, 1−a_n] for which H_n(θ̂_n) = sup_{a_n ≤ t ≤ 1−a_n} H_n(t).

In practice, a_n prohibits estimation of a change that is too close to either end of the data string. The constraint n a_n → ∞ requires the minimum acceptable sample size to grow with n, thus ensuring increasing precision (in estimating J_n(t) by H_n(t)) over the whole range of acceptable t-values. On the other hand, since a_n → 0, any θ ∈ (0,1) will eventually be fair game. (Darkhovshk (1976) assumes θ ∈ [a, 1−a], where a ∈ (0,½) is known.)
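A concrete sketch of the trimmed estimator follows (the code is ours, not the paper's; the trimming choice a_n = n^{−ζ} with ζ = .3 is the one used in the paper's simulation study):

```python
import numpy as np

def change_point(x, zeta=0.3):
    """theta_hat: maximize H_n(i/n) over a_n*n <= i <= (1 - a_n)*n,
    with trimming a_n = n**(-zeta), so that a_n -> 0 and n*a_n -> inf."""
    n = len(x)
    a_n = n ** (-zeta)
    lo = max(1, int(np.ceil(a_n * n)))
    hi = min(n - 1, int((1 - a_n) * n))
    def h(i):
        F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
        F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
        return np.abs(F_pre - F_post).mean()
    return max(range(lo, hi + 1), key=h) / n

rng = np.random.default_rng(2)
# Change at theta = .3: Uniform(0,1) before, Uniform(2,3) after.
x = np.concatenate([rng.uniform(0, 1, 30), rng.uniform(2, 3, 70)])
theta_hat = change_point(x)
```

With well-separated F and G, theta_hat recovers the true change fraction .3 almost exactly.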
Notice that H_n(t) depends upon t only through [tn], so for fixed n there are at most n−1 distinct values of H_n(·) to be compared (as t ranges through (0,1)). Therefore an estimator θ̂_n attaining the supremum does exist and is easily calculated. Also note that θ̂_n is invariant under strict monotone transformations of the data, since H_n(t) depends on the data only via terms of the form I{X_i^n ≤ X_j^n}.
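This invariance is easy to verify numerically. The short check below (ours, not the paper's) applies a strictly increasing transformation (exp) and recovers identical criterion values, since the indicators I{X_i ≤ X_j} are unchanged:

```python
import numpy as np

def H_n(x, i):
    """H_n(i/n) for a split after observation i."""
    n = len(x)
    F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
    F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
    return np.abs(F_pre - F_post).mean()

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 40), rng.normal(1.5, 1, 60)])
# exp is strictly increasing, so all pairwise orderings are preserved.
h_raw = [H_n(x, i) for i in range(10, 91)]
h_exp = [H_n(np.exp(x), i) for i in range(10, 91)]
```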
The fundamental theoretical result for θ̂_n is

THEOREM: Let {α_n: n ≥ 1} be s.t. 1 ≥ α_n > 0, a_n/α_n → 0, and n a_n α_n² → ∞ as n → ∞. Assume (1) holds. Then, for any ε > 0,

   P{|θ̂_n − θ|/α_n > ε} ≤ C_1 n² exp{−C_2 ε² n a_n α_n²}   ∀ n ≥ N_ε,

where C_1 > 0 and C_2 > 0 are constants. Choosing α_n ≡ 1 we obtain the bound

   P{|θ̂_n − θ| > ε} ≤ C_1 n² exp{−C_2 ε² n a_n}   ∀ n ≥ N_ε.

Strong consistency of θ̂_n follows from the Theorem by applying the Borel-Cantelli Lemma. Formally we have

COROLLARY: Let {α_n: n ≥ 1} be s.t. 1 ≥ α_n > 0, a_n/α_n → 0, and Σ_{n=1}^∞ n² exp{−τ n a_n α_n²} < ∞ ∀ τ > 0. Assume (1) holds. Then

   |θ̂_n − θ|/α_n → 0 almost surely as n → ∞.

If we choose a_n = n^{−ξ} with 0 < ξ < 1, then the conditions of the Corollary are satisfied by α_n = n^{−φ} with 0 < φ < min{ξ, ½(1−ξ)}.
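These conditions may be verified directly. With a_n = n^{−ξ} and α_n = n^{−φ}:

```latex
% Checking the Corollary's conditions for a_n = n^{-\xi}, \alpha_n = n^{-\varphi}:
\frac{a_n}{\alpha_n} = n^{\varphi - \xi} \to 0
   \iff \varphi < \xi ; \qquad
n\, a_n\, \alpha_n^2 = n^{\,1 - \xi - 2\varphi},
\quad\text{so}\quad
\sum_{n=1}^{\infty} n^2 \exp\{-\tau\, n^{\,1-\xi-2\varphi}\} < \infty
\ \ \forall\, \tau > 0
\iff 1 - \xi - 2\varphi > 0
\iff \varphi < \tfrac{1}{2}(1-\xi).
```

Combining the two requirements gives 0 < φ < min{ξ, ½(1−ξ)}, as stated.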
3. APPLICATIONS

The Nile Data. Cobb (1978) reports the annual volume of discharge from the Nile River for each year from 1871 to 1970. His analyses assume that the observations are independent normal variables with common variance for the whole series. The results he obtains clearly indicate 1898 as the most likely change-point; he cites independent meteorological evidence that this change is real. Figure 1 shows the data X_i along with the criterion function H_n(t) at t = i/n. For any reasonable choice of a_n, our nonparametric estimate is θ̂_n = .28 (i.e. 1898). The function H_n(t) does exhibit the sort of erratic behavior near t = 0 and 1 as was discussed in Section 2; note that Cobb (1978) also makes the qualification that the change-point be restricted away from the ends of the data.
The Lindisfarne Scribes Data. The Lindisfarne text (as studied by Smith (1980)) divides into 13 sections. It is assumed that only one scribe was involved in the writing of any one section, and that sections written by any one scribe are consecutive. It is also assumed that a scribe may be characterized by his propensity to use one of two possible grammatical variants: either the "s" or "ð" ending in
FIGURE 1: The Nile Data. i = Year (1871 to 1970). X_i = Annual volume of discharge (10^10 m³). [The original figure plots the series X_i (upper panel) and the criterion H_n(i/n) (lower panel, vertical scale .05 to .15); the plots are not reproducible from this copy.]

TABLE 1: The Lindisfarne Scribes Data. i = Section of the text. X_i = Proportion of "s" endings.

   i          1    2    3    4    5    6    7    8    9   10   11   12   13
   X_i       .571 .722 .705 .800 .538 .756 .813 .807 .854 .864 .850 .810 .800
   H_n(i/n)  .42  .37  .41  .37  .47  .49  .42  .41  .31  .22  .19  .27   --

TABLE 2: Simulation Study

   θ     n     E{θ̂_n}    E{|θ̂_n − θ|}
   .4   100    .423*      .101*
   .4   200    .402†      .085†

* Estimated empirically by 250 realizations of θ̂_n; the standard deviation of E{θ̂_n} is approximately .009.
† Estimated empirically by 200 realizations of θ̂_n; the standard deviation of E{θ̂_n} is approximately .009.
Note: For calculating θ̂_n we used a_n = n^{−ζ} with ζ = .3.
the present indicative 3rd person singular. Let m_i denote the total number of relevant words in the i-th section, and let Y_i denote the number of times that the "s" ending was used in those words. Smith (1980) assumes that the Y_i are independent binomial variables with common parameter p between change-points, and he uses independent beta prior distributions on the p's. His analysis (which entertains the possibility of multiple change-points) arrives at a model with changes of scribe after section 4 and again after section 5.
We take the view that the r.v.'s X_i = Y_i/m_i (for all i on one side of [θn]) share approximately the same distribution. (The distributions cannot be precisely identical, since the m_i (and thus the supports) vary with i. The approximation should not be of much consequence, since in all cases m_i is fairly large (i.e. at least 20).) Table 1 shows the data and the H_n(t) function, which is maximized at θ̂_n = 6/13. Our nonparametric analysis has the following interpretation: It is clear by inspection that the earlier X_i's are smaller in magnitude but larger in their variability than the later X_i's. In estimating the precise change-point for this transition, our θ̂_n chooses to group sections 5 and 6 with the pre-change portion of the series. Given the constraint of our analysis that there be exactly one change-point, this choice is quite reasonable: X_5 and X_6 are of relatively small magnitude and large variability. Smith's (1980) approach refines the analysis by explaining (with more change-points and more assumptions) the disorder in the series at sections 4 through 6.
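The Table 1 computation can be reproduced directly from the thirteen proportions. The sketch below is ours (the proportions are transcribed from Table 1, so transcription is the only assumption); it evaluates H_n(i/13) at every split and recovers the maximizer at section 6, i.e. θ̂_n = 6/13:

```python
import numpy as np

# Proportions of "s" endings by section, transcribed from Table 1.
x = np.array([.571, .722, .705, .800, .538, .756, .813,
              .807, .854, .864, .850, .810, .800])

def H_n(x, i):
    """H_n(i/n) from the pre- and post-split empirical cdfs.
    side="right" implements the <= convention, so ties are handled."""
    n = len(x)
    F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
    F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
    return np.abs(F_pre - F_post).mean()

h = [H_n(x, i) for i in range(1, 13)]   # splits after sections 1..12
best = 1 + int(np.argmax(h))            # section maximizing H_n
```

best comes out as 6, and h reproduces the H_n(i/n) row of Table 1 (e.g. .42 at i = 1 and the maximum .49 at i = 6).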
Simulation Study. When the functional forms of F and G are unknown to the statistician, but both distributions are symmetric with the same mean, then no other change-point estimator is appropriate. Such is the case in this example: F is the distribution with density f(x) = .697128 x² I{|x| ≤ 1.291}; G is the N(0,1) distribution. Actually, F and G also share the same variance in this situation, making it even more difficult for the user to choose an estimator that discriminates between them. Nonetheless, θ̂_n performs well for moderately large n, as illustrated by the simulation results in Table 2.
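This setting can be reproduced by inverse-cdf sampling from F: since F(x) = (x³/a³ + 1)/2 on [−a, a] with a = (5/3)^½ ≈ 1.291, a uniform U gives X = a(2U − 1)^{1/3}. The sketch below is ours; it imitates the Table 2 configuration (θ = .4, n = 200, a_n = n^{−.3}) for a single realization:

```python
import numpy as np

def change_point(x, zeta=0.3):
    """Trimmed argmax of H_n(i/n), with a_n = n**(-zeta) as in Table 2's note."""
    n = len(x)
    a_n = n ** (-zeta)
    lo, hi = int(np.ceil(a_n * n)), int((1 - a_n) * n)
    def h(i):
        F_pre = np.searchsorted(np.sort(x[:i]), x, side="right") / i
        F_post = np.searchsorted(np.sort(x[i:]), x, side="right") / (n - i)
        return np.abs(F_pre - F_post).mean()
    return max(range(lo, hi + 1), key=h) / n

rng = np.random.default_rng(4)
a = np.sqrt(5 / 3)                     # support limit 1.291 of f
u = rng.uniform(size=80)
# Inverse-cdf draw from f(x) = .697128 x^2 on |x| <= 1.291:
# F(x) = (x^3/a^3 + 1)/2, so x = a * cbrt(2u - 1).
pre = a * np.cbrt(2 * u - 1)           # F-part: mean 0, variance 1
post = rng.normal(0, 1, 120)           # G = N(0,1)
theta_hat = change_point(np.concatenate([pre, post]))
```

Since F and G share mean, variance, and symmetry, a single realization can miss by a fair margin; Table 2 reports E{|θ̂_n − θ|} ≈ .085 at n = 200 over repeated realizations.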
4. PROOF OF THEOREM

Let Y_1^n, ..., Y_n^n be iid with cdf Q, and define:

   Q_n(x; r,s,ℓ,m) = Σ_{i=[sn]+1}^{[rn]} I{Y_i^n ≤ x}/([ℓn] − [mn]),

   d_n(x; r,s,ℓ,m) = |Q_n(x; r,s,ℓ,m) − (r−s) Q(x)/(ℓ−m)|,

   B_n = {(r,s,ℓ,m): a_n ≤ r−s ≤ 1, 0 ≤ m ≤ s < r ≤ ℓ ≤ 1},

   Δ_n = sup_{(r,s,ℓ,m) ∈ B_n} sup_{x ∈ R} d_n(x; r,s,ℓ,m).

LEMMA 1: P{Δ_n/α_n > ε} ≤ K_1 n² exp{−K_2 ε² n a_n α_n²}   ∀ n ≥ N_ε, where K_1 > 0 and K_2 > 0 are constants.

PROOF: d_n(x; r,s,ℓ,m)
   ≤ d_n(x; r,s,r,s) ([rn] − [sn])/([ℓn] − [mn]) + Q(x) |([rn] − [sn])/([ℓn] − [mn]) − (r−s)/(ℓ−m)|
   ≤ d_n(x; r,s,r,s) + 4/(n a_n).

So,

   P{Δ_n/α_n > ε} ≤ P{ sup_{(r,s) ∈ C_n} D_n(r,s) > ½ ε α_n } + P{ 4/(n a_n) > ½ ε α_n },

where C_n = {(r,s): a_n ≤ r−s and 0 ≤ s < r ≤ 1} and D_n(r,s) = sup_{x ∈ R} d_n(x; r,s,r,s). Now D_n(r,s) depends on r and s only through [rn] and [sn], each of which takes on at most n distinct values as (r,s) ranges through C_n. Hence, for n ≥ N_ε,

   P{Δ_n/α_n > ε} ≤ n² sup_{(r,s) ∈ C_n} P{ D_n(r,s) > ½ ε α_n } ≤ n² sup_{(r,s) ∈ C_n} K exp{−½ ([rn] − [sn]) ε² α_n²},

where K > 0 is constant. The last inequality is an application of Dvoretzky, Kiefer & Wolfowitz (1956); see their Lemma 2 and the discussion after their Theorem 3. Since [rn] − [sn] ≥ ½ n a_n ∀ (r,s) ∈ C_n, Lemma 1 is established. □
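The Dvoretzky-Kiefer-Wolfowitz result invoked here controls P{sup_x |Q_n(x) − Q(x)| > ε} by a bound of the form C exp{−2nε²}. A small Monte Carlo sketch (our own illustration with uniform Q, not part of the proof) shows the implied n^{−1/2} concentration of the empirical cdf:

```python
import numpy as np

rng = np.random.default_rng(5)

def sup_dist(n):
    """sup_x |Q_n(x) - Q(x)| for one U(0,1) sample (Q(x) = x there).
    The sup is attained at an order statistic, checked from both sides."""
    u = np.sort(rng.uniform(size=n))
    grid = np.arange(1, n + 1) / n
    return max(np.max(grid - u), np.max(u - (grid - 1.0 / n)))

# Average sup-distance shrinks roughly like 1/sqrt(n), consistent
# with the exponential tail bound exp{-2 n eps^2}.
mean_d = {n: np.mean([sup_dist(n) for _ in range(400)]) for n in (50, 200, 800)}
```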
In order to apply Lemma 1, we must restrict the range of t away from θ, so that there is a sufficient amount of data between [tn] and [θn]. This constraint will be eliminated later in the argument. Let A_n = [a_n, θ−α_n] ∪ [θ+α_n, 1−a_n].

LEMMA 2: P{ sup_{t ∈ A_n} |J_n(t) − H_n(t)|/α_n > ε } ≤ K_3 n² exp{−K_4 ε² n a_n α_n²}   ∀ n ≥ N'_ε, where K_3 > 0 and K_4 > 0 are constants.

PROOF: For t ∈ A_n we have:
   H_n(t) − J_n(t) ≤ Σ_{j=1}^n ( |th^n(X_j^n) − th(X_j^n)| + |h_t^n(X_j^n) − h_t(X_j^n)| )/n;

the same bound applies to J_n(t) − H_n(t). Now

   |th^n(X_j^n) − th(X_j^n)| ≤ I{t ≤ θ} |Q_n(X_j^n; t,0,t,0) − F(X_j^n)|
      + I{t > θ} ( |Q_n(X_j^n; θ,0,t,0) − θ F(X_j^n)/t| + |Q_n(X_j^n; t,θ,t,0) − (t−θ) G(X_j^n)/t| ),

where the Y_i^n's in the definition of Q_n(·) (see Lemma 1 above) have been replaced with X_i^n's that are identically distributed with either distribution F or distribution G. Each of the three terms on the r.h.s. of the last inequality can be bounded above by Δ_n, which does not depend on j or t. A similar argument applies to |h_t^n(X_j^n) − h_t(X_j^n)|, so that

   P{ sup_{t ∈ A_n} |H_n(t) − J_n(t)|/α_n > ε } ≤ 2 P{ Δ_n/α_n > ¼ ε }

by Lemma 1. □
Still restricting attention to the set A_n, define θ̃_n as:

   θ̃_n ∈ A_n for which H_n(θ̃_n) = sup_{t ∈ A_n} H_n(t), and θ̃_n = θ̂_n if θ̂_n ∈ A_n.

Similarly, define t_n = I{θ ≤ ½}(θ − α_n) + I{θ > ½}(θ + α_n), so that:

   t_n ∈ A_n and J_n(t_n) = sup_{t ∈ A_n} J_n(t).
LEMMA 3: P{|J_n(θ̃_n) − J_n(θ)|/α_n > ε} ≤ K_5 n² exp{−K_6 ε² n a_n α_n²}   ∀ n ≥ N''_ε, where K_5 > 0 and K_6 > 0 are constants.

PROOF: |J_n(θ̃_n) − J_n(θ)| ≤ |J_n(θ̃_n) − H_n(θ̃_n)| + |H_n(θ̃_n) − J_n(t_n)| + |J_n(t_n) − J_n(θ)|.

Note that either H_n(θ̃_n) ≥ J_n(t_n) ≥ J_n(θ̃_n) or J_n(t_n) ≥ H_n(θ̃_n) ≥ H_n(t_n); in either case the second term on the r.h.s. of the above inequality is bounded by sup_{t ∈ A_n} |J_n(t) − H_n(t)|. Of course the same bound applies to the first term. The third term is

   J_n(θ) ( I{θ ≤ ½}/(1−θ+α_n) + I{θ > ½}/(θ+α_n) ) α_n.

Since J_n(θ) ≤ sup_x F(x) + sup_x G(x) = 2, and since the second factor in the above product is also bounded by 2, we obtain

   P{|J_n(θ̃_n) − J_n(θ)|/α_n > ε} ≤ P{ 2 sup_{t ∈ A_n} |J_n(t) − H_n(t)|/α_n > ½ ε } + P{ 4 α_n/α_n > ½ ε }.

Now apply Lemma 2. □
LEMMA 4: P{|θ̃_n − θ|/α_n > ε} ≤ K_7 n² exp{−K_8 ε² n a_n α_n²}   ∀ n ≥ N'''_ε, where K_7 > 0 and K_8 > 0 are constants.

PROOF: Denote Z_j^n = |F(X_j^n) − G(X_j^n)|,

   μ_F = ∫_R |F(x) − G(x)| dF(x),   μ_G = ∫_R |F(x) − G(x)| dG(x),

μ = θ μ_F + (1−θ) μ_G, and c = ½ μ. By assumption (1) we have μ > 0. Note that

   J_n(θ) = Z̄_1^n [θn]/n + Z̄_2^n (n − [θn])/n,

where Z̄_1^n = Σ_{j=1}^{[θn]} Z_j^n/[θn] and Z̄_2^n = Σ_{j=[θn]+1}^n Z_j^n/(n − [θn]).

For t ∈ (0,1), |θ − t| > δ ⟹

   |J_n(θ) − J_n(t)| = ( I{θ−t ≥ 0}(θ−t)/(1−t) + I{t−θ > 0}(t−θ)/t ) J_n(θ) ≥ δ J_n(θ).

Thus (taking δ = ε α_n and t = θ̃_n)

   P{|θ̃_n − θ|/α_n > ε} ≤ P{|J_n(θ̃_n) − J_n(θ)|/α_n ≥ ε J_n(θ)}
      ≤ P{|J_n(θ̃_n) − J_n(θ)|/α_n ≥ ε c} + P{J_n(θ) < c}.

Since Lemma 3 applies to the first probability on the r.h.s. of this last inequality, it suffices to consider

   P{J_n(θ) < c} ≤ P{ Z̄_1^n [θn]/n < θ μ_F − ½ c } + P{ Z̄_2^n (n − [θn])/n < (1−θ) μ_G − ½ c }.
The two probabilities in this upper bound are both handled in the same way; we only deal explicitly with the first one. It is dominated by

   P{ Z̄_1^n |[θn]/n − θ| > c/4 } + P{ θ |Z̄_1^n − μ_F| > c/4 }.

Of these two probabilities, the first is zero for n sufficiently large (since Z̄_1^n ≤ 1 always holds). Since {Z_j^n: 1 ≤ j ≤ [θn]} are iid and bounded, we can use equation (2.3) of Hoeffding (1963) to dominate the second probability by 2 exp{−2[θn](c/4θ)²}. Since a_n α_n² → 0, this bound can be absorbed into the earlier bound from Lemma 3. □
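The Hoeffding (1963) inequality used here says that for iid Z_j ∈ [0,1] with mean μ, P{|Z̄ − μ| ≥ t} ≤ 2 exp{−2nt²}. A quick Monte Carlo spot-check (our own; uniform Z_j stand in for the bounded Z_j^n above):

```python
import numpy as np

rng = np.random.default_rng(6)
n, t, reps = 100, 0.1, 2000
# iid Z_j in [0, 1] with mean mu = 1/2.
z = rng.uniform(size=(reps, n))
p_hat = np.mean(np.abs(z.mean(axis=1) - 0.5) >= t)   # empirical tail frequency
bound = 2.0 * np.exp(-2.0 * n * t ** 2)              # Hoeffding bound = 2e^{-2}
```

The empirical frequency sits far below the bound 2e^{−2} ≈ .271, as it must.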
Finally, for the unrestricted estimator we have

   P{|θ̂_n − θ|/α_n > ε} ≤ P{|θ̃_n − θ|/α_n > ε} + P{|θ̂_n − θ| > ε α_n and θ̂_n ∉ A_n}.

Since θ̂_n ∉ A_n ⟹ |θ̂_n − θ| < α_n, and a_n/α_n → 0, the last probability is zero for n sufficiently large. Applying Lemma 4 concludes the proof. □
5. ACKNOWLEDGMENT

The referees' comments led to numerous improvements in this paper.
REFERENCES

BHATTACHARYYA, G.K. & JOHNSON, R.A. (1968). Non-parametric Tests for Shift at an Unknown Time Point. Ann. Math. Stat., 39, 1731-43.

COBB, G.W. (1978). The Problem of the Nile: Conditional Solution to a Change-Point Problem. Biometrika, 65, 243-51.

DARKHOVSHK, B.S. (1976). A Non-Parametric Method for the a posteriori Detection of the "Disorder" Time of a Sequence of Independent Random Variables. Theory of Prob. and Applic., 21, 178-83.

DVORETZKY, A., KIEFER, J., & WOLFOWITZ, J. (1956). Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator. Ann. Math. Stat., 27, 642-69.

HINKLEY, D.V. (1970). Inference about the Change-Point in a Sequence of Random Variables. Biometrika, 57, 1-16.

HINKLEY, D.V. (1972). Time-Ordered Classification. Biometrika, 59, 509-23.

HINKLEY, D.V. & HINKLEY, E.A. (1970). Inference About the Change-Point in a Sequence of Binomial Variables. Biometrika, 57, 477-88.

HOEFFDING, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. J.A.S.A., 58, 13-30.

SHABAN, S.A. (1980). Change-Point Problem and Two-Phase Regression: An Annotated Bibliography. Intern. Stat. Rev., 48, 83-93.

SMITH, A.F.M. (1975). A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables. Biometrika, 62, 407-16.

SMITH, A.F.M. (1980). Change-Point Problems: Approaches and Applications. Trabajos Estadística, 31, 83-98.