NONPARAMETRIC CHANGE-POINT ESTIMATION (Revised)

by E. Carlstein
University of North Carolina at Chapel Hill, Chapel Hill, NC 27514

Mimeo Series #1595, February 1986
Department of Statistics, Chapel Hill, North Carolina

SUMMARY

Consider a sequence of independent random variables $\{X^n_i : 1 \le i \le n\}$ having cdf $F$ for $i \le \theta n$ and cdf $G$ otherwise. A strongly consistent estimator of the change-point $\theta \in (0,1)$ is proposed. The estimator requires no knowledge of the functional forms or parametric families of $F$ and $G$. Furthermore, $F$ and $G$ need not differ in their means (or in any other measure of location). The only requirement is that $F$ and $G$ differ on a set of positive probability. The proof of consistency provides rates of convergence and bounds on the error probability for the estimator. The estimator is applied to two well-known data sets; in both cases it yields results in close agreement with previous parametric analyses. A simulation study is conducted, showing that the estimator performs well even when $F$ and $G$ share the same mean, variance, and skewness.

Key words: Change-point; Estimation; Nonparametric.
AMS 1980 Subject Classification: Primary 62G05; Secondary 60F15.
Running Head: Nonparametric Changepoint Estimator

1. INTRODUCTION

Let $X^n_1, \ldots, X^n_n$ be independent random variables with:

$X^n_1, \ldots, X^n_{[\theta n]}$ identically distributed with cdf $F$;
$X^n_{[\theta n]+1}, \ldots, X^n_n$ identically distributed with cdf $G$;

where $[y]$ denotes the greatest integer not exceeding $y$. The parameter $\theta \in (0,1)$ is the change-point to be estimated. The body of literature addressing this problem is extensive, but most of the work is based upon at least one of the following assumptions: (i) $F$ and $G$ are known to belong to parametric families (e.g. normal, binomial), or are otherwise known in functional form; (ii) $F$ and $G$ differ, in particular, in their levels (e.g. mean or median).
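The two-sample structure above is straightforward to simulate. The following minimal sketch is illustrative only: the particular samplers used for $F$ and $G$ (normals with a mean shift) are stand-ins of my choosing, since the model leaves $F$ and $G$ completely unspecified.

```python
import random

# Hypothetical sketch of the change-point model: X_1..X_[theta*n] ~ F and
# X_[theta*n]+1..X_n ~ G.  The normal F and G below are illustrative
# stand-ins, not part of the paper.

def simulate(n, theta, sample_F, sample_G, rng):
    k = int(theta * n)  # [theta*n]: index of the last pre-change observation
    return [sample_F(rng) for _ in range(k)] + \
           [sample_G(rng) for _ in range(n - k)]

rng = random.Random(0)
x = simulate(100, 0.4, lambda r: r.gauss(0.0, 1.0),
             lambda r: r.gauss(2.0, 1.0), rng)
```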
Hinkley (1970) and Hinkley & Hinkley (1970) use maximum likelihood to estimate $\theta$ in the situation where $F$ and $G$ are from the same parametric family. Hinkley (1972) generalizes this method to the case where $F$ and $G$ may be arbitrary known distributions, or alternatively where a sensible discriminant function (for discriminating between $F$ and $G$) is known. Smith's (1975) Bayesian approach and Cobb's (1978) conditional solution also require assumptions of type (i). These authors generally suggest that any unknown parameters in $F$ and $G$ can be estimated from the sample, but nevertheless $F$ and $G$ must be specified as functions of those parameters.

At the other extreme, Darkhovshk (1976) presents a nonparametric estimator based on the Mann-Whitney statistic. Although his estimator makes no explicit use of the functional forms of $F$ and $G$, his asymptotic results require $\int_{-\infty}^{\infty} G(x)\,dF(x) \ne \tfrac12$. This excludes cases where $F$ and $G$ are both symmetric and have a common median. Bhattacharyya & Johnson (1968) give a nonparametric test for the presence of a change-point, but again under the type (ii) assumption that the variables after the change are stochastically larger than those before. (See Shaban (1980) for an annotated bibliography of change-point literature.)

In contrast to assumptions of types (i) and (ii), the estimator studied here does not require any knowledge of $F$ and $G$; virtually any salient difference between $F$ and $G$ will ensure detection of the change-point (asymptotically). Specifically, we assume:

(1) For some $\epsilon_0 > 0$, the set $A = \{x \in \mathbb{R} : |F(x) - G(x)| > \epsilon_0\}$ satisfies either $\int_A dF(x) > 0$ or $\int_A dG(x) > 0$.

Note that $F$ and $G$ may be discrete, continuous, or mixed. (The theoretical results for Darkhovshk's (1976) nonparametric estimator assumed $F$ and $G$ to be continuous.) Unlike Cobb (1978), we do not require the supports of $F$ and $G$ to be identical; in fact the supports may be entirely unknown. The intuition behind the proposed estimator is as follows.
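To see concretely what assumption (1) buys relative to the Mann-Whitney condition, the following numerical check (my own sketch, not from the paper) takes $F$ to be the quadratic distribution with density $f(x) = .697128\,x^2$ on $|x| \le 1.291$ (the distribution used in the simulation study of Section 3) and $G = N(0,1)$. Both are symmetric about 0, so $\int G\,dF = \tfrac12$ and Darkhovshk's condition fails, yet $\sup_x |F(x) - G(x)|$ is far from 0, so assumption (1) holds.

```python
import math

# Numerical check (mine, not the paper's): F quadratic on |x| <= 1.291,
# G = N(0,1).  Both symmetric about 0, hence int G dF = 1/2 (the case
# excluded by Mann-Whitney-based methods), while sup|F - G| is large.

C = math.sqrt(5.0 / 3.0)        # support endpoint, approx 1.291
K = 3.0 / (2.0 * C ** 3)        # normalizing constant, approx 0.697128

def F(x):                        # cdf of the quadratic density
    x = max(-C, min(C, x))
    return K * (x ** 3 + C ** 3) / 3.0

def G(x):                        # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def int_G_dF(m=20000):           # trapezoidal rule for int G(x) f(x) dx
    h = 2.0 * C / m
    f = lambda x: K * x * x * G(x)
    s = 0.5 * (f(-C) + f(C))
    for i in range(1, m):
        s += f(-C + i * h)
    return s * h

sup_diff = max(abs(F(i / 1000.0) - G(i / 1000.0)) for i in range(-2000, 2001))
```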
For a hypothetical (but not necessarily correct) change-point $t \in (0,1)$, consider the pre-$t$ empirical cdf ${}_t h^n(x)$, which is constructed as if $X^n_1, \ldots, X^n_{[tn]}$ were identically distributed, and the post-$t$ empirical cdf $h^n_t(x)$, which is constructed as if $X^n_{[tn]+1}, \ldots, X^n_n$ were identically distributed. The former estimates the unknown mixture distribution:

$${}_t h(x) = I\{t \le \theta\}\, F(x) + I\{t > \theta\}\,(\theta F(x) + (t-\theta) G(x))/t,$$

and the latter similarly estimates:

$$h_t(x) = I\{t \le \theta\}\,((\theta-t) F(x) + (1-\theta) G(x))/(1-t) + I\{t > \theta\}\, G(x).$$

The total difference between these two distributions is measured by:

$$J_n(t) = \sum_{j=1}^n |{}_t h(X^n_j) - h_t(X^n_j)|/n = \big(I\{t \le \theta\}(1-\theta)/(1-t) + I\{t > \theta\}\,\theta/t\big)\, I_n(\theta),$$

where $I_n(\theta) = \sum_{j=1}^n |F(X^n_j) - G(X^n_j)|/n$; the second equality holds because ${}_t h - h_t$ reduces to $(1-\theta)(F-G)/(1-t)$ when $t \le \theta$ and to $\theta(F-G)/t$ when $t > \theta$. Note that $J_n(t)$ attains its maximum (over $t \in (0,1)$) at $t = \theta$. Thus a reasonable estimator for $\theta$ is the value of $t$ that maximizes:

$$H_n(t) = \sum_{j=1}^n |{}_t h^n(X^n_j) - h^n_t(X^n_j)|/n.$$

In Section 2 the estimator is formally defined and its asymptotic properties are presented. The results include strong consistency, with rates of convergence and bounds on the error probability. Proof of these results is deferred to Section 4. Section 3 investigates the finite-sample behavior of the estimator: First the estimator is calculated on Cobb's (1978) Nile data and on the Lindisfarne Scribes data (see Smith, 1980). In both examples this nonparametric analysis produces results which are nearly identical to the results of the earlier parametric analyses. Then the estimator is tested (via simulation) in a situation where no other change-point estimator can be used: $F$ and $G$ are both unknown and are of different parametric families, but they are both symmetric and share the same mean and variance. Here again the performance of the estimator is quite reasonable.

2. THE ESTIMATOR

In order to formalize the ideas of Section 1, the following additional definitions and assumptions are introduced. For $t \in (0,1)$ the empirical cdf's are:

$${}_t h^n(x) = \sum_{i=1}^{[tn]} I\{X^n_i \le x\}/[tn], \qquad h^n_t(x) = \sum_{i=[tn]+1}^{n} I\{X^n_i \le x\}/(n - [tn]).$$
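The criterion built from these empirical cdf's is only a few lines of code. The sketch below is my implementation, not the author's (function and variable names are mine): it evaluates $H_n(i/n)$ by comparing the pre-$i$ and post-$i$ empirical cdf's at every observation, and maximizes over $i$. As a check, running it on the 13 Lindisfarne "s"-ending proportions listed in Table 1 of Section 3 reproduces the maximizer $t = 6/13$ reported there.

```python
import math

# Sketch of the change-point estimator (my implementation, not the
# author's code): H_n(i/n) is the average absolute difference between the
# empirical cdf of x[:i] and that of x[i:], evaluated at every data point.

def H_n(x, i):
    n = len(x)
    pre, post = x[:i], x[i:]
    ecdf = lambda sample, v: sum(1 for s in sample if s <= v) / len(sample)
    return sum(abs(ecdf(pre, v) - ecdf(post, v)) for v in x) / n

def change_point(x, a=0.0):
    """Maximize H_n(i/n) over a*n <= i <= (1-a)*n (a plays the role of a_n)."""
    n = len(x)
    lo = max(1, math.ceil(a * n))
    hi = min(n - 1, math.floor((1 - a) * n))
    return max(range(lo, hi + 1), key=lambda i: H_n(x, i))

# The Lindisfarne "s"-ending proportions of Table 1 (Section 3):
lindisfarne = [.571, .722, .705, .800, .538, .756, .813,
               .807, .854, .864, .850, .810, .800]
i_hat = change_point(lindisfarne)   # the paper reports the maximum at i = 6
```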
Since the accuracy of the approximation of $J_n(t)$ by $H_n(t)$ depends in turn upon the accuracy of ${}_t h^n(\cdot)$ and $h^n_t(\cdot)$ in approximating ${}_t h(\cdot)$ and $h_t(\cdot)$, we must restrict our attention to values of $t$ that will yield reasonably large sample sizes ($[tn]$ and $n - [tn]$) for our empirical cdf's. Moreover, for $t$ such that $[tn]$ (say) is small, the corresponding ${}_t h^n(\cdot)$ consists of just a few big steps regardless of how smooth $F$ and $G$ are. So, even if $n$ is large and $h^n_t(\cdot)$ is devoid of large jumps, the difference $H_n(t)$ will tend to be exaggerated. That is, the change-point estimator will be drawn (erroneously) towards such extremal values of $t$. This again suggests that we restrict our attention to values of $t$ bounded away from 0 and 1. Specifically, we only consider $t \in [a_n, 1-a_n]$, where $\{a_n : n \ge 1\}$ is such that $n a_n \to \infty$ and $a_n \to 0$ as $n \to \infty$.

The change-point estimator is then defined as: $\hat\theta_n \in [a_n, 1-a_n]$ for which

$$H_n(\hat\theta_n) = \sup_{a_n \le t \le 1-a_n} H_n(t).$$

In practice, $a_n$ prohibits estimation of a change that is too close to either end of the data string. The constraint $n a_n \to \infty$ requires the minimum acceptable sample size to grow with $n$, thus ensuring increasing precision (in estimating $J_n(t)$ by $H_n(t)$) over the whole range of acceptable $t$-values. On the other hand, since $a_n \to 0$, any $\theta \in (0,1)$ will eventually be fair game. (Darkhovshk (1976) assumes $\theta \in [a, 1-a]$, where $a \in (0,\tfrac12)$ is known.)

Notice that $H_n(t)$ depends upon $t$ only through $[tn]$, so for fixed $n$ there are at most $n-1$ distinct values of $H_n(\cdot)$ to be compared (as $t$ ranges through $(0,1)$). Therefore an estimator $\hat\theta_n$ attaining the supremum does exist and is easily calculated. Also note that $\hat\theta_n$ is invariant under strict monotone transformations of the data, since $H_n(t)$ depends on the data only via terms of the form $I\{X^n_i \le X^n_j\}$.

The fundamental theoretical result for $\hat\theta_n$ is:

THEOREM: Let $\{\alpha_n : n \ge 1\}$ be s.t. $1 > \alpha_n > 0$, $\alpha_n/a_n \to 0$, and $n a_n \alpha_n^2 \to \infty$ as $n \to \infty$. Assume (1) holds.
Then, for any $\epsilon > 0$,

$$P\{|\hat\theta_n - \theta|/\alpha_n > \epsilon\} \le C_1 n^2 \exp\{-C_2\, \epsilon^2\, n\, a_n \alpha_n^2\} \quad \forall n \ge N_\epsilon,$$

where $C_1 > 0$ and $C_2 > 0$ are constants. Choosing $\epsilon = 1$ we obtain the bound $P\{|\hat\theta_n - \theta| > \alpha_n\} \le C_1 n^2 \exp\{-C_2\, n\, a_n \alpha_n^2\}$. Strong consistency of $\hat\theta_n$ follows from the Theorem by applying the Borel-Cantelli Lemma. Formally we have:

COROLLARY: Let $\{\alpha_n : n \ge 1\}$ be s.t. $1 > \alpha_n > 0$, $\alpha_n/a_n \to 0$, and $\sum_{n=1}^{\infty} n^2 \exp\{-\tau\, n\, a_n \alpha_n^2\} < \infty\ \forall \tau > 0$. Assume (1) holds. Then $|\hat\theta_n - \theta|/\alpha_n \to 0$ almost surely as $n \to \infty$.

If we choose $a_n = n^{-\xi}$ with $0 < \xi < 1$, then the conditions of the Corollary are satisfied by $\alpha_n = n^{-\phi}$ with $0 < \phi < \min\{\xi, \tfrac12(1-\xi)\}$.

3. APPLICATIONS

The Nile Data. Cobb (1978) reports the annual volume of discharge from the Nile River for each year from 1871 to 1970. His analyses assume that the observations are independent normal variables with common variance for the whole series. The results he obtains clearly indicate 1898 as the most likely change-point; he cites independent meteorological evidence that this change is real. Figure 1 shows the data $X_i$ along with the criterion function $H_n(t)$ at $t = i/n$. For any reasonable choice of $a_n$, our nonparametric estimate is $\hat\theta_n = .28$ (i.e. 1898). The function $H_n(t)$ does exhibit the sort of erratic behavior near $t = 0$ and 1 that was discussed in Section 2; note that Cobb (1978) also makes the qualification that the change-point be restricted away from the ends of the data.

The Lindisfarne Scribes Data. The Lindisfarne text (as studied by Smith (1980)) divides into 13 sections. It is assumed that only one scribe was involved in the writing of any one section, and that sections written by any one scribe are consecutive. It is also assumed that a scribe may be characterized by his propensity to use one of two possible grammatical variants: either the "s" or "ð" ending in

FIGURE 1: The Nile Data. $i$ = Year (1871 to 1970). Annual volume $X_i$ = discharge ($10^{10}$ m$^3$). [Scatterplots of $X_i$ and of $H_n(i/n)$ against $i$.]
TABLE 1: The Lindisfarne Scribes Data. $i$ = Section of the text. $X_i$ = Proportion of "s" endings.

    i        :   1    2    3    4    5    6    7    8    9   10   11   12   13
    X_i      : .571 .722 .705 .800 .538 .756 .813 .807 .854 .864 .850 .810 .800
    H_n(i/n) : .42  .37  .41  .37  .47  .49  .42  .41  .31  .22  .27  .19   --

TABLE 2: Simulation Study

    theta    n    E{theta_hat_n}    E{|theta_hat_n - theta|}
    .4      100       .423*                 .101*
    .4      200       .402†                 .085†

*Estimated empirically by 250 realizations of $\hat\theta_n$; the standard deviation of $E\{\hat\theta_n\}$ is approximately .009.
†Estimated empirically by 200 realizations of $\hat\theta_n$; the standard deviation of $E\{\hat\theta_n\}$ is approximately .009.
Note: For calculating $\hat\theta_n$ we used $a_n = n^{-\zeta}$ with $\zeta = .3$.

the present indicative 3rd person singular. Let $m_i$ denote the total number of relevant words in the $i$th section, and let $Y_i$ denote the number of times that the "s" ending was used in those words. Smith (1980) assumes that the $Y_i$ are independent binomial variables with common parameter $p$ between change-points, and he uses independent beta prior distributions on the $p$'s. His analysis (which entertains the possibility of multiple change-points) arrives at a model with changes of scribe after section 4 and again after section 5. We take the view that the r.v.'s $X_i = Y_i/m_i$ (for all $i$ on one side of $[\theta n]$) share approximately the same distribution. (The distributions cannot be precisely identical, since the $m_i$ (and thus the supports) vary with $i$. The approximation should not be of much consequence, since in all cases $m_i$ is fairly large (i.e. at least 20).) Table 1 shows the data and the $H_n(t)$ function, which is maximized at $\hat\theta_n = 6/13$. Our nonparametric analysis has the following interpretation: It is clear by inspection that the earlier $X_i$'s
are smaller in magnitude but larger in their variability than the later $X_i$'s. In estimating the precise change-point for this transition, our $\hat\theta_n$ chooses to group sections 5 and 6 with the pre-change portion of the series. Given the constraint of our analysis that there be exactly one change-point, this choice is quite reasonable: $X_5$ and $X_6$ are of relatively small magnitude and large variability. Smith's (1980) approach refines the analysis by explaining (with more change-points and more assumptions) the disorder in the series at sections 4 through 6.

Simulation Study. When the functional forms of $F$ and $G$ are unknown to the statistician, but both distributions are symmetric with the same mean, then no other change-point estimator is appropriate. Such is the case in this example: $F$ is the distribution with density $f(x) = .697128\, x^2\, I\{|x| \le 1.291\}$; $G$ is the $N(0,1)$ distribution. Actually, $F$ and $G$ also share the same variance in this situation, making it even more difficult for the user to choose an estimator that discriminates between them. Nonetheless, $\hat\theta_n$ performs well for moderately large $n$, as illustrated by the simulation results in Table 2.

4. PROOF OF THEOREM

Let $Y^n_1, \ldots, Y^n_n$ be iid with cdf $Q$, and define:

$$Q_n(x; r,s,\ell,m) = \sum_{i=[sn]+1}^{[rn]} I\{Y^n_i \le x\}/([\ell n] - [mn]),$$
$$d_n(x; r,s,\ell,m) = |Q_n(x; r,s,\ell,m) - (r-s)\,Q(x)/(\ell-m)|,$$
$$B_n = \{(r,s,\ell,m) : 0 \le m \le s < r \le \ell \le 1 \text{ and } a_n \le r-s\},$$
$$\Delta_n = \sup_{(r,s,\ell,m) \in B_n}\ \sup_{x \in \mathbb{R}}\ d_n(x; r,s,\ell,m).$$

LEMMA 1: $P\{\Delta_n/\alpha_n > \epsilon\} \le K_1 n^2 \exp\{-K_2\, \epsilon^2\, n\, a_n \alpha_n^2\}\ \forall n \ge N_\epsilon$, where $K_1 > 0$ and $K_2 > 0$ are constants.

PROOF:

$$d_n(x; r,s,\ell,m) \le d_n(x; r,s,r,s)\,\frac{[rn]-[sn]}{[\ell n]-[mn]} + Q(x)\,\Big|\frac{[rn]-[sn]}{[\ell n]-[mn]} - \frac{r-s}{\ell-m}\Big| \le d_n(x; r,s,r,s) + \frac{4}{n a_n}.$$

So,

$$P\{\Delta_n/\alpha_n > \epsilon\} \le P\Big\{\sup_{(r,s) \in C_n} D_n(r,s) > \tfrac12\,\epsilon\,\alpha_n\Big\} + P\{4/(n a_n \alpha_n) > \tfrac12\,\epsilon\},$$

where $C_n = \{(r,s) : a_n \le r-s \text{ and } 0 \le s < r \le 1\}$ and $D_n(r,s) = \sup_{x \in \mathbb{R}} d_n(x; r,s,r,s)$.
Now $D_n(r,s)$ depends on $r$ and $s$ only through $[rn]$ and $[sn]$, each of which takes on at most $n$ distinct values as $(r,s)$ ranges through $C_n$. Hence, for $n \ge N_\epsilon$,

$$P\{\Delta_n/\alpha_n > \epsilon\} \le n^2 \sup_{(r,s) \in C_n} P\{D_n(r,s) > \tfrac12\,\alpha_n\,\epsilon\} \le n^2 \sup_{(r,s) \in C_n} K \exp\{-\tfrac12([rn] - [sn])\,\epsilon^2 \alpha_n^2\},$$

where $K > 0$ is constant. The last inequality is an application of Dvoretzky, Kiefer & Wolfowitz (1956); see their Lemma 2 and the discussion after their Theorem 3. Since $[rn] - [sn] \ge \tfrac12 n a_n\ \forall (r,s) \in C_n$, Lemma 1 is established. □

In order to apply Lemma 1, we must restrict the range of $t$ away from $\theta$, so that there is a sufficient amount of data between $[tn]$ and $[\theta n]$. This constraint will be eliminated later in the argument. Let $A_n = [a_n, \theta-\alpha_n] \cup [\theta+\alpha_n, 1-a_n]$.

LEMMA 2: $P\{\sup_{t \in A_n} |J_n(t) - H_n(t)|/\alpha_n > \epsilon\} \le K_3 n^2 \exp\{-K_4\, \epsilon^2\, n\, a_n \alpha_n^2\}\ \forall n \ge N'_\epsilon$, where $K_3 > 0$ and $K_4 > 0$ are constants.

PROOF: For $t \in A_n$ we have:

$$H_n(t) - J_n(t) \le \sum_{j=1}^n \big(|{}_t h^n(X^n_j) - {}_t h(X^n_j)| + |h^n_t(X^n_j) - h_t(X^n_j)|\big)/n;$$

the same bound applies to $J_n(t) - H_n(t)$. Now

$$|{}_t h^n(X^n_j) - {}_t h(X^n_j)| \le I\{t \le \theta\}\,|Q_n(X^n_j; t,0,t,0) - F(X^n_j)| + I\{t > \theta\}\,\big(|Q_n(X^n_j; \theta,0,t,0) - \theta F(X^n_j)/t| + |Q_n(X^n_j; t,\theta,t,0) - (t-\theta)\,G(X^n_j)/t|\big),$$

where the $Y^n_i$'s in the definition of $Q_n(\cdot)$ (see Lemma 1 above) have been replaced with $X^n_i$'s that are identically distributed with either distribution $F$ or distribution $G$. Each of the three terms on the r.h.s. of the last inequality can be bounded above by a term of the form $\Delta_n$, which does not depend on $j$ or $t$. A similar argument applies to $|h^n_t(X^n_j) - h_t(X^n_j)|$, so that $P\{\sup_{t \in A_n} |H_n(t) - J_n(t)|/\alpha_n > \epsilon\}$ is bounded by twice the bound of Lemma 1. □

Still restricting attention to the set $A_n$, define $\tilde\theta_n$ as: $\tilde\theta_n \in A_n$ for which $H_n(\tilde\theta_n) = \sup_{t \in A_n} H_n(t)$, and $\tilde\theta_n = \hat\theta_n$ if $\hat\theta_n \in A_n$. Similarly, define $t_n = I\{\tilde\theta_n \le \theta\}(\theta - \alpha_n) + I\{\tilde\theta_n > \theta\}(\theta + \alpha_n)$, so that $t_n \in A_n$ and $J_n(t_n) = \sup_{t \in A_n} J_n(t)$.

LEMMA 3: $P\{|J_n(\tilde\theta_n) - J_n(\theta)|/\alpha_n > \epsilon\} \le K_5 n^2 \exp\{-K_6\, \epsilon^2\, n\, a_n \alpha_n^2\}\ \forall n \ge N''_\epsilon$, where $K_5 > 0$ and $K_6 > 0$ are constants.
PROOF: $|J_n(\tilde\theta_n) - J_n(\theta)| \le |J_n(\tilde\theta_n) - H_n(\tilde\theta_n)| + |H_n(\tilde\theta_n) - J_n(t_n)| + |J_n(t_n) - J_n(\theta)|$. Note that either $H_n(\tilde\theta_n) \ge J_n(t_n) \ge J_n(\tilde\theta_n)$ or $J_n(t_n) \ge H_n(\tilde\theta_n) \ge H_n(t_n)$; in either case the second term on the r.h.s. of the above inequality is bounded by $\sup_{t \in A_n} |J_n(t) - H_n(t)|$. Of course the same bound applies to the first term. The third term is

$$J_n(\theta) - J_n(t_n) = I_n(\theta)\,\big(I\{\tilde\theta_n \le \theta\}/(1-\theta+\alpha_n) + I\{\tilde\theta_n > \theta\}/(\theta+\alpha_n)\big)\,\alpha_n.$$

Since $I_n(\theta) \le \sup_{x \in \mathbb{R}} F(x) + \sup_{x \in \mathbb{R}} G(x) = 2$, and since the second factor in the above product is also bounded by 2, we obtain

$$P\{|J_n(\tilde\theta_n) - J_n(\theta)|/\alpha_n > \epsilon\} \le P\Big\{2 \sup_{t \in A_n} |J_n(t) - H_n(t)|/\alpha_n > \tfrac12\,\epsilon\Big\} + P\{4\alpha_n/\alpha_n > \tfrac12\,\epsilon\}.$$

Now apply Lemma 2. □

Denote $Z^n_j = |F(X^n_j) - G(X^n_j)|$, $\mu_F = \int_{\mathbb{R}} |F(x) - G(x)|\,dF(x)$, $\mu_G = \int_{\mathbb{R}} |F(x) - G(x)|\,dG(x)$, $\mu = \theta\mu_F + (1-\theta)\mu_G$, and $c = \mu/2$. By assumption (1) we have $\mu > 0$. Note that $I_n(\theta) = \bar Z^n_1\,[\theta n]/n + \bar Z^n_2\,(n-[\theta n])/n$, where

$$\bar Z^n_1 = \sum_{j=1}^{[\theta n]} Z^n_j/[\theta n] \qquad \text{and} \qquad \bar Z^n_2 = \sum_{j=[\theta n]+1}^{n} Z^n_j/(n-[\theta n]).$$

For $t \in (0,1)$, $|\theta - t| > \delta$ implies

$$|J_n(\theta) - J_n(t)| = \big(I\{\theta - t \ge 0\}(\theta-t)/(1-t) + I\{t - \theta > 0\}(t-\theta)/t\big)\,I_n(\theta) \ge \delta\,J_n(\theta).$$

LEMMA 4: $P\{|\tilde\theta_n - \theta|/\alpha_n > \epsilon\} \le K_7 n^2 \exp\{-K_8\, \epsilon^2\, n\, a_n \alpha_n^2\}\ \forall n \ge N'''_\epsilon$, where $K_7 > 0$ and $K_8 > 0$ are constants.

PROOF: By the preceding implication (with $\delta = \epsilon\alpha_n$),

$$P\{|\tilde\theta_n - \theta|/\alpha_n > \epsilon\} \le P\{|J_n(\tilde\theta_n) - J_n(\theta)|/\alpha_n > \epsilon\,J_n(\theta)\} \le P\{|J_n(\tilde\theta_n) - J_n(\theta)|/\alpha_n > \epsilon c\} + P\{J_n(\theta) < c\}.$$

Since Lemma 3 applies to the first probability on the r.h.s. of this last inequality, it suffices to consider $P\{J_n(\theta) < c\}$. Since $c = \mu/2$,

$$P\{J_n(\theta) < c\} \le P\{|\bar Z^n_1\,[\theta n]/n - \theta\mu_F| > c/2\} + P\{|\bar Z^n_2\,(n-[\theta n])/n - (1-\theta)\mu_G| > c/2\}.$$

The two probabilities in this upper bound are both handled in the same way; we only deal explicitly with the first one. It is dominated by

$$P\{|[\theta n]/n - \theta|\,\bar Z^n_1 > c/4\} + P\{\theta\,|\bar Z^n_1 - \mu_F| > c/4\}.$$

Of these two probabilities, the first is zero for $n$ sufficiently large (since $\bar Z^n_1 \le 1$ always holds). Since $\{Z^n_j : 1 \le j \le [\theta n]\}$ are iid and bounded, we can use equation (2.3) of Hoeffding (1963) to dominate the second probability by $2\exp\{-2[\theta n](c/4\theta)^2\}$.
Since $a_n \alpha_n^2 \to 0$, this bound can be absorbed into the earlier bound from Lemma 3. □

Finally, for the unrestricted estimator we have

$$P\{|\hat\theta_n - \theta|/\alpha_n > \epsilon\} \le P\{|\tilde\theta_n - \theta|/\alpha_n > \epsilon\} + P\{|\hat\theta_n - \theta| > \epsilon\alpha_n,\ \hat\theta_n \notin A_n\}.$$

Since $\hat\theta_n \in [a_n, 1-a_n]$ and $\hat\theta_n \notin A_n$ together imply $|\hat\theta_n - \theta| < \alpha_n$, and since $\alpha_n/a_n \to 0$, the last probability is zero for $n$ sufficiently large. Applying Lemma 4 concludes the proof. □

5. ACKNOWLEDGMENT

The referees' comments led to numerous improvements in this paper.

REFERENCES

BHATTACHARYYA, G.K. & JOHNSON, R.A. (1968). Non-parametric Tests for Shift at an Unknown Time Point. Ann. Math. Stat., 39, 1731-43.

COBB, G.W. (1978). The Problem of the Nile: Conditional Solution to a Change-Point Problem. Biometrika, 65, 243-51.

DARKHOVSHK, B.S. (1976). A Non-Parametric Method for the a posteriori Detection of the "Disorder" Time of a Sequence of Independent Random Variables. Theory of Prob. and Applic., 21, 178-83.

DVORETZKY, A., KIEFER, J., & WOLFOWITZ, J. (1956). Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator. Ann. Math. Stat., 27, 642-69.

HINKLEY, D.V. (1970). Inference about the Change-Point in a Sequence of Random Variables. Biometrika, 57, 1-16.

HINKLEY, D.V. (1972). Time-Ordered Classification. Biometrika, 59, 509-23.

HINKLEY, D.V. & HINKLEY, E.A. (1970). Inference About the Change-Point in a Sequence of Binomial Variables. Biometrika, 57, 477-88.

HOEFFDING, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. J.A.S.A., 58, 13-30.

SHABAN, S.A. (1980). Change-Point Problem and Two-Phase Regression: An Annotated Bibliography. Intern. Stat. Rev., 48, 83-93.

SMITH, A.F.M. (1975). A Bayesian Approach to Inference about a Change-Point in a Sequence of Random Variables. Biometrika, 62, 407-16.

SMITH, A.F.M. (1980). Change-Point Problems: Approaches and Applications. Trabajos de Estadística, 31, 83-98.