Nonparametric Monitoring of Data Streams for Changes in Location and Scale

Gordon J. ROSS, Dimitris K. TASOULIS, and Niall M. ADAMS
Department of Mathematics, Imperial College London, London, SW7 2AZ, U.K.
([email protected]; [email protected]; [email protected])

The analysis of data streams requires methods which can cope with a very high volume of data points. Under the requirement that algorithms must have constant computational complexity and a fixed amount of memory, we develop a framework for detecting changes in data streams when the distributional form of the stream variables is unknown. We consider the general problem of detecting a change in the location and/or scale parameter of a stream of random variables, and adapt several nonparametric hypothesis tests to create a streaming change detection algorithm. This algorithm uses a test statistic with a null distribution independent of the data. This allows a desired rate of false alarms to be maintained for any stream even when its distribution is unknown. Our method is based on hypothesis tests which involve ranking data points, and we propose a method for calculating these ranks online in a manner which respects the constraints of data stream analysis.

KEY WORDS: Change detection; Nonparametric tests; Streaming data.

1. INTRODUCTION

In recent years, problems relating to the analysis of data streams have become widespread. A data stream is a collection of time-ordered observations x1, x2, ... generated from the random variables X1, X2, .... We assume that the observations are univariate, real-valued, and independent. Unlike classical time series, data streams are not assumed to have a fixed size, and new observations are regularly received over time. In many applications the rate at which these observations are received can be very high; for example, the Large Hadron Collider at CERN generates millions of observations every second (Stapnes 2007). Therefore it is common to impose restrictions on algorithms designed for use with data streams (Domingos and Hulten 2003). First, the amount of computation time required to process each observation should be constant rather than increasing over time. Second, only a fixed number of observations should be stored in memory at any one time. Ideally, methods should be single-pass, with observations processed once and then discarded. Many traditional approaches to statistics assume the existence of a fixed-size collection of observations, where the number of data points in the collection is usually assumed to be small enough that computational issues are not a primary concern. This assumption is violated in the data stream setting, since new observations must be sequentially processed in a computationally efficient manner.

This article is concerned with the task of detecting whether a data stream contains a change point, and extends traditional methods for sequential change detection to the streaming context. If no change point exists, the observations are assumed to be identically distributed. If a change point exists at time τ, then the observations are distributed as

    Xi ∼ F0 if i < τ,    Xi ∼ F1 if i ≥ τ.    (1)

In other words, the variables are iid with some distribution F0 before the change point at t = τ, and iid with a different distribution F1 after. The location of the change point is unknown, and the problem is to detect it as soon as possible. When τ = ∞, no change point occurs in the sequence.
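As a concrete illustration of model (1), the short Python sketch below simulates a stream containing a single change point; the function name and the specific N(0, 1) → N(1, 1) shift are illustrative choices of ours, not part of the article.

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_stream(n, tau, f0, f1):
        # Draw x_1..x_n following model (1): X_i ~ F0 for i < tau, X_i ~ F1 for i >= tau
        return np.array([f0(rng) if i < tau else f1(rng) for i in range(1, n + 1)])

    # Example: a location shift from N(0, 1) to N(1, 1) at tau = 500
    x = simulate_stream(1000, 500, lambda g: g.normal(0.0, 1.0), lambda g: g.normal(1.0, 1.0))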
Note that this change point methodology can also be applied to streams which are not iid between change points, by first modeling the stream distribution in a way which yields iid one-step-ahead forecast residuals, and then performing change detection on these; see the book by Gustafsson (2000) for more details.

Much early work on this problem emerged from the quality control community, where the goal was to monitor for a change in the number of defective items arising from a production line. Another common application is segmenting data streams into regions of similar behavior, a problem which has applications in many domains such as finance (Fu et al. 2001) and robotics (Ramoni, Sebastiani, and Cohen 2002). A further example is model adequacy determination, where the performance of a statistical model fitted to a data stream is monitored for evidence of degradation.

The performance of change detection algorithms is usually measured using two criteria (Basseville and Nikiforov 1993): the expected time between false positive detections (denoted ARL0, for Average Run Length), and the mean delay until a change is detected (denoted ARL1). If τ̂ is an estimator of the true change point which occurs at time τ, then we can define these formally as

    ARL0 = E(τ̂ | F = F0),    ARL1 = E(τ̂ − τ | F = F1).

Note that when τ̂ < τ, a false positive is said to have occurred. Generally, change detection algorithms are designed by deciding on an acceptable ARL0 value, and then attempting to minimize the mean detection delay. This is analogous to classical hypothesis testing, where tests are designed to minimize the Type II error subject to a bound on the Type I error.

Most traditional approaches to sequential change detection have assumed that the distributional form of the stream is known before and after the change, with only the parameters being unknown. However, these assumptions rarely hold in streaming applications. Typically, there is no prior knowledge of the true stream distribution, or assumptions made about the stream distribution may be incorrect. Several authors have investigated the performance of parametric change detection algorithms (Chan, Hapuarachchi, and Macpherson 1988; Jensen et al. 2006) when the distribution of the stream is incorrectly specified, and found that even small misspecifications can have very large effects on the false alarm rate. There is hence a need for nonparametric (distribution-free) change detection methods which are able to maintain a specified level of performance, such as the false alarm rate, regardless of the true distribution of the stream. In recent years, several distribution-free charts have been proposed which can monitor a location parameter, such as the mean or median (Chakraborti and van de Wiel 2008; Hawkins and Deng 2010); however, the problem of detecting more general changes, such as those involving a scale or shape parameter, has been less widely studied. In this work, we propose a novel technique for monitoring for a change in the location and/or scale parameter of an arbitrary continuously distributed univariate data stream.
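To make the ARL0 and ARL1 criteria above concrete, the following sketch estimates both by Monte Carlo for a simple one-sided CUSUM detector; the detector itself, its threshold h = 5 and reference value k = 0.5, and the run counts are placeholder assumptions for illustration, not the method developed in this article.

    import numpy as np

    rng = np.random.default_rng(0)

    def first_alarm(stream, threshold=5.0, k=0.5):
        # One-sided CUSUM for a mean increase; returns the stopping time tau_hat (1-based)
        s = 0.0
        for t, x in enumerate(stream, start=1):
            s = max(0.0, s + x - k)
            if s > threshold:
                return t
        return None

    # ARL0: mean stopping time on change-free N(0, 1) streams (every alarm is false)
    arl0_runs = [first_alarm(rng.normal(0, 1, 20_000)) for _ in range(200)]
    arl0 = np.mean([t for t in arl0_runs if t is not None])

    # ARL1: mean delay tau_hat - tau when the mean shifts from 0 to 1 at tau = 100
    tau, delays = 100, []
    for _ in range(200):
        stream = np.concatenate([rng.normal(0, 1, tau), rng.normal(1, 1, 2_000)])
        t = first_alarm(stream)
        if t is not None and t >= tau:   # discard runs that false-alarmed before tau
            delays.append(t - tau)
    print(f"ARL0 ~ {arl0:.0f}, ARL1 ~ {np.mean(delays):.1f}")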
Unlike many existing approaches, our algorithm satisfies the requirements for data stream processing, with the computation time and memory required to process each data point being constant rather than growing over time. Our method does not assume that anything is known about the distribution of the stream before monitoring begins; it is thus an example of a "self-starting" technique. This is in contrast to several methods in the quality control literature which aim to detect whether the stream deviates from a known baseline distribution. The main advantage of the self-starting approach is that it can be deployed out-of-the-box, without the need to estimate parameters of the stream distribution from a reference sample prior to monitoring.

Our approach is based on a generalization of the change point model (CPM), introduced for the Gaussian distribution by both Hawkins, Qiu, and Kang (2003) and Hawkins and Zamba (2005) in order to adapt traditional generalized likelihood tests to the streaming problem. This framework was recently extended for the purpose of detecting nonparametric changes to the location parameter by Hawkins and Deng (2010). However, their work does not satisfy the O(1) computational and memory complexity requirements for the processing of data streams. We develop their work in three ways. First, we extend the CPM framework so that changes to the scale parameter can be detected, a problem which has not received sufficient attention in the literature. Second, we propose a method for monitoring changes to both the location and scale simultaneously. Finally, we introduce the idea of stream discretization, which allows the test statistics used in both their method and ours to be computed in a fast manner, thereby facilitating deployment of these techniques on high-frequency streams.

The remainder of the article proceeds as follows: Section 2 briefly reviews the existing literature on nonparametric change detection. Section 3.1 introduces the change point model formulation. Our nonparametric approach is discussed in Section 3.2, where we describe the hypothesis tests which we use. The key issue of setting the threshold parameters for our algorithm is dealt with in Section 3.3. The generalization to the streaming problem is given in Section 3.4, where we provide a method for calculating ranks in a way which requires only constant computation and memory costs. An extensive experimental comparison is then carried out in Section 4, along with an example of the method deployed on a real financial data stream.

2. RELATED WORK

Most traditional work on change detection assumes that the distribution of the stream before the change point is known. Classic methods for this problem include the CUSUM algorithm, exponentially weighted moving average charts, generalized likelihood tests, and the Bayesian Shiryaev–Roberts approach. An overview of these techniques can be found in the book by Basseville and Nikiforov (1993). Few authors have treated the problem of detecting arbitrary changes in unknown distributions in a single-pass manner. Most of the existing literature on nonparametric change detection deals with a fixed-size sample rather than a data stream where new observations are received over time. Examples of such approaches include the works of Bhattacharya and Frierson (1981), Carlstein (1988), and Jones and Woodall (1998).
Due to the difficulty of detecting arbitrary distributional changes, most existing work on nonparametric change detection deals only with monitoring for changes in a location parameter, such as the mean or median. A popular approach to this problem is to use only the ranks of observations, and adapt classical rank-based hypothesis tests such as the Mann–Whitney test (Hackl and Ledolter 1991; Gordon and Pollak 1994; Chakraborti and van de Wiel 2008). However, the computation of ranks requires the storage of all previous data to allow new points to be ranked. This makes such tests difficult to apply to the streaming problem.

Directly relevant to our work is that of Pettitt (1979), which used a statistic based on a maximization over several Mann–Whitney statistics to detect a change in location parameter. However, this approach was developed only for a fixed-size dataset, and is hence not single-pass. The analysis of the test statistic also relies on an asymptotic argument, which makes it unsuitable for deployment in streams where changes occur relatively frequently and small-sample behavior is important. A sequential extension of the work of Pettitt (1979) was recently proposed by both Zhou et al. (2009) and Hawkins and Deng (2010), who used Monte Carlo simulation instead of an asymptotic argument in order to calculate exact quantiles of the test statistic. We propose to extend this work so that it can detect location and/or scale changes rather than location changes alone, and also adapt it to the streaming context.

3. FRAMEWORK

3.1 The Change Point Model

The sequential change detection problem has the following form: a (potentially infinite) sequence of (assumed) independent random variables X1, X2, ... is observed. We write xi for the particular realization of Xi observed at time t = i. The distribution of Xi is given by Equation (1), conditional on the change point τ which we seek to detect.

Suppose t points from the sequence have been observed and we wish to test whether a change point has occurred at some point in the past. For any fixed k < t, the hypothesis that a change point occurs at the kth observation can be written as

    H0: Xi ∼ F0 for all i,
    H1: Xi ∼ F0 if i < k,  Xi ∼ F1 if i ≥ k.

A two-sample hypothesis test can then be used to test for a change point at k. Let Nk,t be an appropriately chosen test statistic. For example, if the change is assumed to take the form of a shift in location parameter, and the stream is assumed to be Gaussian, then Nk,t will be the statistic associated with the usual t-test (Hawkins, Qiu, and Kang 2003). Now, if Nk,t > hk,t for some appropriately chosen threshold hk,t, then a change is detected at location k. Of course, we do not know which value of k to use for testing, since no information is available concerning the location of the change point. Instead, we must decide between the more general hypotheses:

    H0: Xi ∼ F0 for all i,
    H1: there exists k < t such that Xi ∼ F0 if i < k, and Xi ∼ F1 if i ≥ k.

One approach presented by both Pettitt (1979) and Hawkins and Zamba (2005) is to evaluate Nk,t at all values of 1 < k < t, and then use the maximum value. This is somewhat analogous to a generalized likelihood ratio test: if we define

    Nmax,t = max_k Nk,t,

then the hypothesis that no change has occurred before the tth observation is rejected if Nmax,t > ht for some threshold ht. The estimate τ̂ of the change point is then the value of k which maximizes Nk,t. The above formulation provides a general method for nonstreaming change point detection on a fixed-size dataset.
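A minimal fixed-sample version of this maximization can be sketched as follows, here using a standardized Mann–Whitney statistic as Nk,t; the function names are ours, and the no-ties assumption is made for simplicity.

    import numpy as np

    def mw_split_stat(x, k):
        # Standardized Mann-Whitney statistic comparing x[:k] against x[k:] (no ties assumed)
        n = len(x)
        n1, n2 = k, n - k
        ranks = np.argsort(np.argsort(x)) + 1          # ranks of the pooled sample
        u = ranks[:k].sum() - n1 * (n1 + 1) / 2        # Mann-Whitney U of the first segment
        mu = n1 * n2 / 2
        sigma = np.sqrt(n1 * n2 * (n + 1) / 12)
        return abs(u - mu) / sigma

    def cpm_fixed_sample(x):
        # Evaluate the split statistic at every admissible k and take the maximum
        ks = range(2, len(x) - 1)
        stats = [mw_split_stat(x, k) for k in ks]
        best = int(np.argmax(stats))
        return ks.start + best, stats[best]            # (tau_hat, N_max)

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
    tau_hat, n_max = cpm_fixed_sample(x)
    print(tau_hat, round(n_max, 2))                    # tau_hat should land near 100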
It can also be used in the case where points arrive sequentially over time, by repeatedly recomputing the test statistic as each new observation is received. In this case, when the (t + 1)th point, xt+1, is observed, we compute Nmax,t+1, and flag that a change has occurred if Nmax,t+1 > ht+1. This requires a sequence of time-varying thresholds, as will be discussed in Section 3.3. However, this formulation is unsuitable for the streaming problem, since the computational cost of evaluating Nmax,t increases at least linearly with t. As more points are received from the stream, the number of possible change point locations k increases, which results in an increasing number of hypothesis tests being performed at each time instance. Similarly, the memory required to store all of the previously-seen observations grows linearly over time. When the observations are assumed to be Gaussian, the test statistic Nk,t can be summarized by a finite set of recursively updatable sufficient statistics, which allows older stream data to be discarded with no loss in performance, as described in the article by Hawkins and Zamba (2005). We will discuss a similar approach for our nonparametric statistics in Section 3.4.

3.2 The Nonparametric Models

When the task is to detect a change in a stream where no information is available regarding the pre- or post-change distribution, the obvious approach is to replace Nk,t with a nonparametric two-sample test statistic such as the Kolmogorov–Smirnov (KS) statistic, which can detect arbitrary changes in distribution. The algorithm would proceed as above, with this statistic evaluated at every time point, and the maximum value being compared to a threshold ht. However, "omnibus" tests such as the KS often have low power (Gibbons 1985). In many situations we can find a more powerful test by restricting attention to the case where the pre-change distribution F0 undergoes a change in either location or scale. These shifts have the following form:

• Location Shift: F1(x) = F0(x + δ).
• Scale Shift: F1(x) = F0(δx).

Although this is slightly more restricted than the general problem, in practice any change in F0 is likely to cause a shift in the first or second moment, and so can be detected. Most popular nonparametric tests for changes in location and scale use only the ranks of the observations, where the rank of the ith observation at time t is defined as

    r(xi) = Σ_{j=1}^{t} I(xi ≥ xj),

where I is the indicator function. Detecting location changes using a change point model with a Mann–Whitney (MW) test statistic was discussed by Hawkins and Deng (2010), so we restrict our discussion to changes in the scale parameter, and to joint monitoring of the location and scale parameters. In Section 3.4 we also present a method for computing these ranks which requires constant computational time and finite memory. This method makes rank-based tests suitable for the streaming problem.

3.2.1 Scale Shifts. There are several nonparametric rank-based tests for comparing the scale parameters of two samples (Gibbons 1985). We favor the Mood test (Mood 1954), since it has a simple formulation and has been shown to perform favorably in power analyses (Duran 1976). This test is based on the following observation: if there are n points spread over two samples S and T, then, assuming no tied ranks, the expected rank of each point under the null hypothesis that both samples are identically distributed is (n + 1)/2.
The Mood test uses a test statistic which measures the extent to which the rank of each point deviates from its expected value:

    M = Σ_{xi∈S} (r(xi) − (n + 1)/2)²,

where r(xi) denotes the rank of xi in the pooled sample. Like the MW statistic, the distribution of the Mood statistic is independent of the underlying random variables. The mean and variance of the Mood statistic are

    μM = nS(n² − 1)/12,    σ²M = nS nT (n + 1)(n² − 4)/180.

This is then standardized to give

    M = |(M − μM)/σM|,

where we take the absolute value so that both increases and decreases in the scale parameter can be detected. A nonparametric CPM for scale shifts can be defined by replacing the Nmax,t statistic described in Section 3.1 with this M statistic. When a new observation is received from the stream, M is evaluated at every possible point xi, with the stream being split into two samples around each point. Nmax,t is then the maximum value of these M statistics, and is compared to a threshold ht, as before, with a change being flagged if the threshold is exceeded.

3.2.2 Location and Scale Shifts. In practice, it may not be known whether a change point will involve a shift in location or in scale. In this case it will be required to monitor for either one of these shifts. One approach is to simply use two change point models: one designed to detect shifts in location using the Mann–Whitney (MW) statistic, and one designed to detect shifts in scale using the Mood statistic. However, using multiple tests makes it difficult to control the false alarm rate; it is not clear how to set the false alarm rates of the individual tests in order to achieve a desired overall false alarm rate, which is a key issue in designing change detection algorithms. An alternative is to use a single detector which is designed to monitor for changes in both the location and scale simultaneously. To facilitate this, we use a Lepage-type (LP) hypothesis test (Lepage 1971), which gives a nonparametric test for both location and scale parameters. Our LP-type test statistic L is defined as the sum of the squared MW statistic U and the squared Mood statistic M:

    L = U² + M².

Although there are many other ways in which the MW and Mood statistics can be combined, the Lepage method of summing their squared values seems to be the most common in the nonparametric hypothesis testing literature, hence it is the formulation we use. This L statistic can then be integrated into the change point model as before.
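The standardized Mood, Mann–Whitney, and Lepage statistics above can all be computed directly from the pooled ranks. The sketch below follows the formulas in this section (assuming continuous data, hence no ties); Nmax,t is then the maximum of lepage_split_stat over all admissible split points k. The function names are ours.

    import numpy as np

    def pooled_ranks(x):
        # Ranks 1..n of the pooled sample; continuous data assumed, so no ties
        return np.argsort(np.argsort(x)) + 1.0

    def mood_split_stat(x, k):
        # Standardized Mood statistic for the split x[:k] vs x[k:]
        n, n_s, n_t = len(x), k, len(x) - k
        r = pooled_ranks(x)
        m = np.sum((r[:k] - (n + 1) / 2) ** 2)
        mu = n_s * (n ** 2 - 1) / 12
        sigma = np.sqrt(n_s * n_t * (n + 1) * (n ** 2 - 4) / 180)
        return abs(m - mu) / sigma

    def mw_split_stat(x, k):
        # Standardized Mann-Whitney statistic for the same split
        n, n_s, n_t = len(x), k, len(x) - k
        r = pooled_ranks(x)
        u = r[:k].sum() - n_s * (n_s + 1) / 2
        return abs(u - n_s * n_t / 2) / np.sqrt(n_s * n_t * (n + 1) / 12)

    def lepage_split_stat(x, k):
        # Lepage-type statistic: L = U^2 + M^2
        return mw_split_stat(x, k) ** 2 + mood_split_stat(x, k) ** 2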
3.3 Determination of Thresholds

All the change point models discussed above require a sequence of thresholds {ht} to which the test statistic Nmax,t is compared. Hawkins and Zamba (2005) recommended choosing these thresholds so that the false alarm probability (FAP) is equal at every time point, that is, P(Nmax,t > ht) = α for all t. The average time between false positives (commonly referred to as the Average Run Length, or ARL0) is then ARL0 = 1/α. However, finding such a sequence of ht values is a difficult problem to solve analytically, since the distribution of Nmax,t is very complex and has no obvious analytical form. The problem is that, while the null distribution of Nk,t is usually known for each individual value of k, the distribution of Nmax,t is much more complicated due to the high correlation between the Nk,t statistics. The other major difficulty is that we really require the conditional probability

    P(Nmax,t > ht | Nmax,t−1 < ht−1, ..., Nmax,1 < h1),

which is much more difficult to compute than the marginal (unconditional) probability. Further, using an asymptotic distribution would skew the ARL0 of the change detector, since it would not be accurate during the early monitoring period.

Since analytic calculation of the thresholds seems unfeasible, we use Monte Carlo simulation to determine the thresholds instead. Once the thresholds have been computed, they can be stored in a look-up table or summarized in a closed-form expression, so that there is no need to spend any computational resources on threshold calculation during stream monitoring. For each desired value of α, one million streams containing 5000 points were generated, without any change points. Since our change point models are distribution-free, the threshold values do not depend on the distribution of the stream. We used N(0, 1) random variables for convenience when generating the thresholds, but any continuous distribution could be used instead. Then, the change point models were evaluated over the simulated streams, and Nmax,t was computed at each time instance. This allows the distribution of Nmax,t to be approximated, and the required values of ht can then be determined. Table 1 gives a formula for the values of ht for various choices of ARL0 in powers of 1/t, which was constructed using polynomial regression. For example, to obtain an ARL0 of 500 for the Mood model, the value of ht at time t is found by substituting into the ARL0 = 500 row of the table, which gives

    ht = 3.37×10^0 − 1.59×10^0 t^−1 − 3.64×10^3 t^−3 + 5.16×10^6 t^−5 − 3.04×10^9 t^−7 + 6.55×10^11 t^−9.

Because the Nmax,t test statistics are discrete, it may not be possible to find a value for ht which gives the exact FAP required, especially for low values of t. However, this will only be an issue when the number of observations is very low. We therefore recommend that monitoring of the stream only begins after the first 20 observations have been received, which allows sufficient possibilities for rank assignments to make most FAPs achievable.

We can now summarize our algorithm as follows. First, decide whether it is desired to monitor the location or scale of the distribution (or both), and choose the test statistic Nk,t to be either the Mann–Whitney, Mood, or Lepage statistic as appropriate. Also choose the desired FAP α. When a new point xt arrives from the stream, calculate Nmax,t, then use Table 1 to find the value of ht corresponding to α. If Nmax,t > ht, then flag that a change has occurred; otherwise do nothing and process the next observation.
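The following sketch shows how this monitoring loop fits together, using the ARL0 = 500 Mood coefficients from Table 1 below; the function names, the burn-in of 20 observations, and the cap on the number of steps are our illustrative choices.

    # Coefficients for the Mood model at ARL0 = 500, read from Table 1
    # (terms in t^0, t^-1, t^-3, t^-5, t^-7, t^-9)
    COEF = [3.37e0, -1.59e0, -3.64e3, 5.16e6, -3.04e9, 6.55e11]
    POWERS = [0, -1, -3, -5, -7, -9]

    def h(t):
        # Threshold h_t from the fitted polynomial in 1/t
        return sum(c * t ** p for c, p in zip(COEF, POWERS))

    def monitor(next_obs, stat_max, burn_in=20, max_steps=100_000):
        # Generic CPM loop: flag a change the first time N_max,t exceeds h_t.
        # next_obs() yields the next stream value; stat_max(xs) computes N_max,t,
        # e.g. the max over k of the Mood statistic from the previous sketch.
        xs = []
        for t in range(1, max_steps + 1):
            xs.append(next_obs())
            if t > burn_in and stat_max(xs) > h(t):
                return t                  # change flagged at time t
        return None                       # no change detected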
Table 1. Polynomial approximation of ht as a function of γ = 1/t (coefficients of each power of t)

ARL0     CPM type   Constant      t^−1          t^−3          t^−5          t^−7          t^−9
370      Mood       3.27×10^0     −1.69×10^0    2.03×10^2     −2.85×10^6    2.70×10^9     −6.06×10^11
         LP         1.53×10^1     −4.42×10^1    −3.26×10^3    −1.32×10^7    1.47×10^10    −3.39×10^12
500      Mood       3.37×10^0     −1.59×10^0    −3.64×10^3    5.16×10^6     −3.04×10^9    6.55×10^11
         LP         1.62×10^1     −5.03×10^1    −1.32×10^4    2.21×10^6     7.62×10^9     −2.55×10^12
1000     Mood       3.60×10^0     −3.55×10^0    5.79×10^2     −2.33×10^6    8.69×10^8     3.09×10^10
         LP         1.82×10^1     −6.43×10^1    −7.72×10^4    1.50×10^8     −1.01×10^11   2.11×10^13
10,000   Mood       4.25×10^0     −8.28×10^0    5.90×10^2     −6.77×10^6    4.98×10^9     −1.01×10^12
         LP         2.45×10^1     −1.50×10^2    −1.20×10^3    −1.85×10^7    6.66×10^9     −4.74×10^11
20,000   Mood       4.44×10^0     −1.21×10^1    8.05×10^3     −1.53×10^7    8.46×10^9     −1.46×10^12
         LP         2.64×10^1     −1.87×10^2    7.35×10^4     −1.70×10^8    1.04×10^11    −2.02×10^13

3.4 Streaming Adaptation

The above formulations of the nonparametric CPMs rely on the calculation of ranks. This is a problem when it comes to data streams, as computing these ranks requires all previous data to be stored in memory. The result is that the amount of memory used, and the computation time required, grow without bound over time. When working with data streams, where this is usually infeasible, we recommend discretizing older stream observations so that only a constant amount of memory and computation is used for rank computation.

First, form a rough estimate of an interval within which most points from the stream will lie. This will not usually be known a priori, but can be estimated from the initial stream observations as [a, b], where a is the minimum value of the first 20 observations and b is the maximum value. This interval can then be split into m + 1 segments using the boundary points {s1, s2, ..., sm}, where

    sj = a + (j − 1)(b − a)/(m − 1),    j = 1, ..., m,

so that s1 = a and sm = b. For each segment a count cj is maintained of the number of stream observations which fall into that segment; observation xi falls into the jth segment if sj−1 ≤ xi < sj.

The CPM now works as follows: define the window Ww,t of fixed length w to be the set of the most recent w points observed from the stream, that is, Ww,t = {xt−w+1, ..., xt}. The points in this window are stored in memory, with older points discarded. Whenever a point is old enough to fall outside the moving window, it is no longer stored in memory, but is instead assigned to one of the segments. Therefore, the memory only needs to be large enough to contain the w points currently in the window, and the count variables for the discretization. Since neither of these grows over time, a constant amount of memory is used no matter how many observations arrive from the stream. The Nmax,t statistic is now calculated by maximizing Nk,t over only the observations in the window, rather than over the whole stream. Note that because older observations are summarized by the histogram rather than discarded entirely, the choice of the window size w is not critical, since it only determines the points at which a change may be detected; points too old to fall into the window are not discarded, but are instead summarized in the discretization. Rather than ranking each point in the window against all previous data as before, each point's rank is now defined as the sum of its rank against the other points in the window, and its rank against all previous points in the stream, as approximated by the discretization.
Each of the cj points summarized in the jth segment is assigned the value of the segment mid-point vj = (sj + sj+1)/2. The rank of each point xt in the window is then

    r(xt) = rw(xt) + Σ_{j=1}^{m+1} cj I(xt > vj) − 1,

where rw(xt) is the rank of xt among all the points currently contained in the window. Note that as well as computing Nk,t for each point contained in the window, we can also compute it for the point immediately to the left of the window, by comparing the ranks of each point contained in the window to the ranks summarized by the discretization.

Using windowing has an effect on the values of the threshold ht used by the CPM. The values given in Table 1 were computed without windowing, where the number of hypothesis tests performed is constantly increasing in time. When windowing is used, the number performed becomes constant, equal to the size of the window. Therefore, in order to maintain the desired ARL0 when a window of size w is used, the threshold should be set to ht for t < w, and to hw thereafter.

There are two potential drawbacks which can arise from stream discretization. First, there is now a small loss of accuracy in rank computation, and the standard deviation of the MW/Mood statistics is slightly increased, since tied ranks now arise. Although this effect will not generally be large, it may be desirable to adjust the standard deviation of the test statistic to take it into account; details of this procedure can be found in the article by Mielke (1967). Because lower values of m result in more tied ranks and hence a slight loss of accuracy, it is desirable to have m as large as possible; m should therefore be set as high as computational and memory resources allow. In Section 4.5 we empirically investigate how the choice of m affects performance, and find that there is little degradation unless m is extremely low.

The other potential drawback relates to post-signal diagnosis. As discussed in Section 3.1, as well as flagging that a change has been detected, our method also gives an estimate of how far back in the stream the change occurred. This estimate is found by considering previous values of the Nmax,t statistic. However, if discretization is used, it will not be possible to compute these values for points which lie outside the window, which means that the post-signal estimate of the change location can be at most w observations in the past. Therefore, in situations where post-signal diagnosis is considered important, the window size should be chosen to be larger than the expected time taken to detect changes.
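A possible implementation of this windowed rank computation is sketched below; the class name is ours, and the handling of the two unbounded outer segments is one simple convention among several.

    import bisect
    from collections import deque

    class StreamingRanker:
        # Sliding window of the w newest points plus an m-segment histogram of
        # older points, so rank computation needs only O(w + m) memory (Section 3.4).
        def __init__(self, a, b, m, w):
            # Segment boundaries s_1..s_m spanning [a, b], estimated from early data
            self.edges = [a + j * (b - a) / (m - 1) for j in range(m)]
            # Mid-points v_j stand in for the points summarized in each internal segment
            self.mids = [(self.edges[j] + self.edges[j + 1]) / 2 for j in range(m - 1)]
            self.counts = [0] * (m + 1)   # counts: below a, m - 1 internal, above b
            self.window = deque(maxlen=w)

        def add(self, x):
            if len(self.window) == self.window.maxlen:
                old = self.window[0]      # point about to leave the window:
                self.counts[bisect.bisect_right(self.edges, old)] += 1   # summarize it
            self.window.append(x)

        def rank(self, x):
            # In-window rank plus the histogram's count of (approximately) smaller points
            in_window = sum(1 for y in self.window if x >= y)
            below = self.counts[0]        # everything below a is treated as smaller
            below += sum(c for c, v in zip(self.counts[1:-1], self.mids) if x > v)
            return in_window + below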
4. EXPERIMENTS

We now analyze the performance of our nonparametric change point models. Because the Gaussian is the most common distribution used in the change detection literature, we first investigate how the nonparametric models compare to the parametric Gaussian change point models using the Student-t and F tests described by Hawkins, Qiu, and Kang (2003) and Hawkins and Zamba (2005), as well as to the self-starting CUSUM method proposed by Hawkins (1987), for detecting changes in the mean and standard deviation of a Gaussian stream. This is important, since one of the main uses of nonparametric control charts is when real-world data are suspected to be Gaussian, but an insufficient number of observations are available to validate this assumption. In this case, we would like the speed at which the nonparametric charts detect changes to be similar to that of the Gaussian parametric models when the data are indeed Gaussian, while being more stable when the Gaussian assumption is violated. After assessing performance in a Gaussian setting, we next investigate how the nonparametric charts perform with non-Gaussian data. We use the Student-t and lognormal distributions as examples of distributions which are more heavy-tailed and skewed than the Gaussian, respectively.

A relevant performance measure is required in order to compare different change detection methods. As discussed in Section 1, the most common measure in the change detection literature is the mean detection delay. To make the comparison fair, we must first tune the parameters of the algorithms so that they have an equivalent rate of false alarms (ARL0). For our comparison we chose ARL0 = 500, although we also performed experiments using other values and found similar results, which we have omitted for brevity. For the CPMs, this simply means ensuring that they all use the same value of α = 1/500. For the self-starting CUSUM, we found that using a threshold of h = 9.8 when k = 0.25, the value used by Hawkins (1987), gives this ARL0. Note that in this section we will use the terms CPM-MW, CPM-Mood, and CPM-LP to refer to the nonparametric change point models based on the Mann–Whitney, Mood, and Lepage tests, respectively, with the CPM-MW method being the one from the work of Hawkins and Deng (2010). CPM-t and CPM-F refer to the Student-t and F models from the work of Hawkins, Qiu, and Kang (2003) and Hawkins and Zamba (2005), and SS-CUSUM is the self-starting CUSUM from the work of Hawkins (1987).

4.1 Gaussian Setting

We first evaluate the performance of the nonparametric methods when detecting shifts in the parameters of a Gaussian distribution. In this setting, we would expect the best performance to be achieved by SS-CUSUM, CPM-t, and CPM-F, since these models use the optimal tests for detecting shifts in the mean and standard deviation, respectively. Therefore these models are the baseline against which we compare our method. We consider shifts in both the mean and standard deviation: the observations are initially distributed as N(0, 1) before the change point, and either as N(δ, 1) or N(0, δ) after.

We expect changes that occur early in the stream to be more difficult to detect than changes which occur after many observations have been received, since more observations allow a more accurate estimation of the distribution parameters and ranks. Such early changes will also arise in practice when the stream experiences multiple change points, with only short intervals between each one. Therefore, we consider streams with change points occurring at times τ ∈ {50, 300} to investigate the impact on performance when the change occurs early. We found that changes occurring later than τ = 300 are not significantly easier to detect, so the performance of the detectors when the change occurs at τ = 300 can be taken as representative of later changes as well. For every change point location and shift magnitude δ, 10,000 streams are generated, each containing a single change point, and the mean delay to detect the change is computed. Table 2(a) and (b) shows the mean detection delay using the various change detectors.
Table 2. Average time taken to detect shifts of magnitude δ in the mean and standard deviation of a Gaussian N(0, 1) stream, for various change times τ. Standard deviations are given in brackets

(a) Gaussian mean

τ        δ     SS-CUSUM        CPM-t           CPM-MW          CPM-LP
τ = 50   0.5   78.6 (100.1)    151.8 (267.7)   134.2 (250.8)   224.4 (338.9)
         1.0   16.2 (12.5)     16.4 (13.3)     14.9 (10.4)     21.1 (19.2)
         1.5   8.1 (3.3)       6.7 (4.1)       7.5 (3.5)       8.0 (4.0)
         2.0   5.9 (2.1)       4.1 (2.2)       5.4 (1.8)       4.8 (2.2)
τ = 300  0.5   38.8 (31.3)     39.1 (31.4)     35.5 (24.6)     50.5 (39.0)
         1.0   12.6 (5.6)      10.8 (6.3)      11.3 (5.5)      12.6 (7.5)
         1.5   7.0 (2.7)       5.6 (2.9)       6.5 (2.5)       6.1 (3.5)
         2.0   5.3 (1.5)       3.6 (1.7)       5.0 (1.4)       4.1 (1.6)

(b) Gaussian standard deviation

τ        δ     SS-CUSUM        CPM-F           CPM-Mood        CPM-LP
τ = 50   1.5   72.7 (94.0)     152.4 (282.0)   95.4 (190.1)    143.0 (252.0)
         2.0   19.3 (26.5)     16.2 (17.5)     18.0 (21.9)     26.8 (45.5)
         3.0   8.2 (5.4)       5.5 (3.8)       7.8 (5.3)       9.4 (6.9)
         0.5   19.3 (26.5)     20.3 (15.2)     38.9 (76.6)     63.4 (122.6)
         0.3   8.2 (5.4)       9.6 (3.7)       15.1 (5.9)      21.0 (7.7)
τ = 300  1.5   29.2 (22.9)     30.7 (23.7)     28.0 (24.3)     35.5 (31.8)
         2.0   11.9 (7.1)      10.6 (7.2)      11.5 (8.3)      14.0 (10.2)
         3.0   6.3 (3.2)       4.6 (3.0)       6.1 (3.6)       7.5 (4.5)
         0.5   19.3 (26.5)     16.4 (7.5)      22.5 (27.9)     31.9 (10.7)
         0.3   8.2 (5.4)       9.2 (3.1)       13.1 (3.2)      18.7 (3.6)

There are four interesting findings. First, for smaller sized scale changes, the Mood CPM actually outperforms the parametric F model, a similar result to that reported by Hawkins and Deng (2010) when comparing the MW CPM to CPM-t. The explanation, as they noted, is that the parametric change point models tend to give relatively high values of Nk,t at the extremes, when k is close to 0 or t. They hence have higher values of the control limits ht than the nonparametric CPMs, which allows the nonparametric models to react slightly faster to smaller changes.

Second, for larger sized changes, the F model outperforms both the Mood and Lepage models, as expected. However, the difference in performance is not excessive, suggesting that the nonparametric models are suitable for use when the stream is Gaussian.

Third, the performance of the LP model is slightly inferior to the MW model when the change magnitude is low. This is to be expected, since the LP model is also monitoring for changes in scale, and hence should be less powerful than a model which is only monitoring for location shifts. Interestingly, when the change magnitude is large, the LP model actually outperforms the MW. This is because when a large change occurs to the mean of the stream, there is also an effect on the estimated standard deviation. Therefore, the change can also be detected by a CPM intended to detect a change in the scale parameter, as well as by the one intended to detect a change in location. Since the LP model consists of both a Mood and a MW component, the change can be detected by either of these, which can result in faster detection than using the MW model alone.

Fourth and finally, unlike for location shifts, it can be seen that the Lepage CPM does not give better performance than the Mood CPM regardless of the scale shift magnitude. This will be explored further in Section 4.4.

4.2 Gradual Drift

A further point to consider is how the CPM methodology performs when either the location or scale of the stream undergoes gradual drift, where the change is incremental over time rather than abrupt.
Although this is not the sort of situation which the CPMs are designed to cope with, it may be encountered in practice, so it is desirable for them to perform reasonably in these situations. To represent drift, we simulated 10,000 streams with change points occurring at times τ ∈ {10, 50, 300}. However, the change points now mark the start of a period of gradual drift, rather than an abrupt change. We specified that the period of drift lasts for 100 observations, over which the stream parameter increases linearly. For example, in the streams containing a change point at τ = 300 where the observations are N(0, 1) prior to the change, and the location shift magnitude is δ, the observations are distributed as N(δ(t − 300)/100, 1) for t ∈ [300, 400], and as N(δ, 1) for t > 400. Unlike previous experiments, we have included an extremely short change time of τ = 10 to represent the extreme case where the stream is undergoing drift when monitoring begins. This situation presents a problem for any self-starting change detector that does not make assumptions about the in-control distribution, since the lack of a reference sample with which to compare the stream means that changes occurring this early are not easy to detect.

The results of our simulations are shown in Table 3(a) and (b), and show that drift is significantly harder to detect than abrupt change, which is not surprising. In particular, when the drift is small and occurs after very few observations, the time to detect it was even longer than the drift period. This raises an important cautionary point for any self-starting change detector: when monitoring a stream, care must be taken to ensure that it is actually stationary at the beginning of monitoring, because small drift will take some time to detect. Comparing the methods, it seems that the self-starting CUSUM approach performs slightly better than the Lepage CPM when it comes to detecting gradual drift, with similar results for the other CPMs omitted for brevity. However, we stress again that the SS-CUSUM is a parametric method designed to work for Gaussian streams, while the Lepage CPM is more general.

Table 3. Average time taken to detect gradual drift in the mean and standard deviation of a Gaussian N(0, 1) stream, for various change times τ and shift magnitudes δ. Standard deviations are given in brackets

(a) Drifting Gaussian mean

               SS-CUSUM                                      CPM-LP
δ       τ = 10         τ = 50         τ = 300        τ = 10         τ = 50         τ = 300
0.50    122.1 (61.1)   106.2 (55.7)   88.7 (39.1)    134.2 (69.9)   121.4 (68.2)   102.9 (46.8)
1.00    81.3 (29.5)    65.1 (23.5)    56.5 (18.5)    92.7 (40.1)    76.3 (28.9)    63.3 (22.3)
1.50    61.0 (16.8)    49.5 (16.1)    43.8 (13.1)    69.7 (24.2)    57.6 (20.0)    49.3 (15.8)
2.00    52.1 (13.4)    40.2 (12.4)    36.8 (10.8)    58.5 (18.7)    47.5 (15.9)    40.6 (12.9)

(b) Drifting Gaussian standard deviation

               SS-CUSUM                                      CPM-LP
δ       τ = 10         τ = 50         τ = 300        τ = 10         τ = 50         τ = 300
1.50    116.0 (58.4)   93.1 (51.6)    73.4 (32.6)    113.6 (68.9)   103.4 (62.6)   85.5 (43.1)
2.00    83.1 (39.2)    59.9 (24.7)    49.7 (19.0)    93.6 (57.4)    73.6 (41.6)    55.8 (25.1)
2.50    67.5 (27.0)    48.2 (19.0)    39.4 (14.7)    80.0 (45.9)    55.3 (27.6)    43.5 (18.9)
3.00    56.1 (21.1)    39.9 (15.1)    34.1 (11.1)    69.1 (36.3)    47.8 (23.2)    36.9 (14.9)
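For reference, the drifting streams described above can be generated as follows; this is a sketch of the Section 4.2 setup, with the function name and seed being arbitrary choices of ours.

    import numpy as np

    rng = np.random.default_rng(7)

    def drifting_mean_stream(n, tau, delta, drift_len=100):
        # Gaussian stream whose mean drifts linearly from 0 to delta over
        # [tau, tau + drift_len], then stays at delta (the Section 4.2 setup)
        t = np.arange(1, n + 1)
        mean = np.clip((t - tau) / drift_len, 0.0, 1.0) * delta
        return rng.normal(mean, 1.0)

    x = drifting_mean_stream(1000, tau=300, delta=1.0)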
4.3 Non-Gaussian Setting

The primary advantage of the nonparametric approaches is of course that they can be used to monitor parameter shifts in arbitrary unknown distributions. We now investigate performance when detecting changes in the location and scale of non-Gaussian distributions. We study the Student-t distribution with 2.5 degrees of freedom, and the lognormal distribution with parameters (1, 1/2), since these are examples of heavy-tailed and skewed distributions, respectively.

Although it would not be sensible to deliberately use parametric methods which assume the stream is Gaussian in cases where this assumption is known to be violated, in practice this may occur by accident, when assumptions formed about the data are mistaken. We therefore investigate how the violation of Gaussianity affects the realized ARL0 of the T and F CPMs compared to the specified value. Since the ht thresholds associated with the nonparametric change point models are distribution-free, the values given in Table 1 will give the required ARL0 regardless of the underlying distribution. This is not the case with either the T or F models, since their thresholds were computed under the assumption of Gaussianity (Hawkins and Zamba 2005). Table 4 shows the actual ARL0 values obtained by these models on the Student-t(2.5) and lognormal(1, 1/2) distributions, and it can be seen that they deviate significantly from the target values. Since controlling the rate of false alarms is highly important in change detection problems, these large deviations from the desired ARL0 values imply that the parametric Gaussian tests are unsuitable in situations where deviations from Gaussianity may be present.

Table 4. The empirical ARL0 observed for the T and F change point models on the Student-t(2.5) and lognormal(1, 1/2) distributions

Model       Target ARL0    Student-t(2.5)    lognormal(1, 1/2)
Student-t   50             32                29
            200            83                78
            500            113               94
F           50             39                33
            200            97                91
            500            134               127

Table 5(a) and (b) shows the average detection delays for detecting location and scale shifts in the Student-t and lognormal distributions, respectively. In all cases, the observations have been standardized so that they have mean 0 and standard deviation 1 before the change point. The change then consists of either the mean or the standard deviation changing to δ. These results broadly follow the same patterns as those for the Gaussian distribution, with the LP CPM being slightly slower to detect most changes than the dedicated MW and Mood CPMs; the advantage of the LP model is of course that it is able to detect both location and scale changes.

4.4 Diagnostics

One issue which arises when using the combined location-scale Lepage CPM is that of post-signal diagnostics: given that the detector has signaled that a change has occurred, can we determine whether the change represents a shift in location, or a shift in scale? Suppose that a change is detected at time t. The CPM gives an estimate τ̂ ≤ t of the change point, corresponding to the value of k which maximized Nk,t. The stream can then be partitioned into the subsets {X1, ..., Xτ̂−1} and {Xτ̂, ..., Xt}. Two-sample Mann–Whitney and Mood tests can then be performed on these two subsets, and the p-values compared. If the MW test gives the lower p-value, the change is most likely to constitute a location shift, while if the Mood test gives the lower p-value, then it is likely to be a scale shift.
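This diagnostic procedure maps directly onto standard library routines; the sketch below uses scipy.stats.mannwhitneyu and scipy.stats.mood (Mood's two-sample test for scale parameters), with an artificial scale change inserted purely for illustration.

    import numpy as np
    from scipy import stats

    def diagnose(x, tau_hat):
        # Post-signal diagnosis: compare Mann-Whitney and Mood p-values on the
        # two segments either side of the estimated change point
        before, after = x[:tau_hat], x[tau_hat:]
        p_mw = stats.mannwhitneyu(before, after, alternative="two-sided").pvalue
        p_mood = stats.mood(before, after)[1]     # Mood's two-sample scale test
        return ("location shift" if p_mw < p_mood else "scale shift"), p_mw, p_mood

    rng = np.random.default_rng(3)
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 2, 200)])  # scale change at 200
    print(diagnose(x, 200))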
To test this idea, we simulated 10,000 streams from the three distributions considered above, containing various sized changes in either the location or scale. For each stream, the Lepage CPM was used to detect the change, and the above procedure was deployed to predict the type of change that occurred. For each type of change, we then found the probability of it being correctly identified. Table 6 summarizes the results.

Table 6. Probability of correctly diagnosing a change of size δ in either the location or scale parameter by comparing the p-values of the Mann–Whitney and Mood tests

       Location shifts                       Scale shifts
δ      N(0, 1)   t(4)   LN(1, 1/2)    δ      N(0, 1)   t(4)   LN(1, 1/2)
0.25   0.86      0.88   0.83          1.25   0.75      0.68   0.74
0.50   0.92      0.91   0.96          1.50   0.81      0.78   0.83
1.00   0.94      0.92   0.96          1.75   0.87      0.82   0.88
1.50   0.94      0.92   0.97          2.00   0.87      0.86   0.89
2.00   0.96      0.93   0.97          3.00   0.91      0.92   0.94
3.00   0.97      0.96   0.97          4.00   0.94      0.92   0.94

It can be seen that this diagnostic method gives a good prediction of which type of change caused the CPM to flag. When the stream undergoes a location shift, the MW test will almost always have a lower p-value than the Mood test, causing the change type to be correctly identified; there is more risk of incorrect identification when the change magnitude is small. When the stream undergoes a scale shift, however, the proportion of correctly identified changes is slightly lower. This is because, although the MW test is intended to detect whether two samples have equal location parameters, it is also slightly sensitive to differences in scale. Our findings are consistent with the results reported for joint monitoring of the mean and standard deviation in a parametric context. For example, Hawkins and Zamba (2005) also found that, when using combined parametric mean and standard deviation change detectors to monitor a Gaussian distribution, the detector which was meant to be monitoring the mean was sometimes faster to detect a change in the standard deviation than the corresponding scale detector; they also noted that this is a feature of all methods designed for detecting location shifts.

Table 5. Average time taken to detect shifts in the location and scale of Student-t(2.5) and lognormal(1, 1/2) streams. In both cases the pre-change distributions are scaled to have mean 0 and standard deviation 1. The post-change distributions then have either mean δ or standard deviation δ. The standard deviation of the detection time is given in brackets

(a) Nonparametric location

               Student-t(2.5)                   lognormal(1, 1/2)
τ        δ     CPM-MW          CPM-LP          CPM-MW         CPM-LP
τ = 50   0.5   22.7 (41.6)     39.2 (93.3)     86.1 (46.3)    84.2 (52.4)
         1.0   7.0 (3.1)       7.3 (12.3)      11.4 (7.1)     16.5 (9.3)
         1.5   5.1 (1.6)       6.4 (3.6)       6.5 (2.3)      8.2 (4.1)
         2.0   4.5 (1.1)       3.4 (1.8)       5.4 (1.3)      5.2 (2.2)
τ = 300  0.5   14.3 (8.3)      18.0 (28.2)     26.1 (15.2)    36.1 (19.5)
         1.0   6.2 (2.4)       5.5 (5.5)       9.2 (3.4)      12.5 (5.6)
         1.5   4.6 (1.3)       5.1 (2.4)       5.9 (1.5)      6.3 (2.4)
         2.0   4.1 (0.9)       3.1 (1.1)       4.8 (0.9)      4.2 (1.1)

(b) Nonparametric scale

               Student-t(2.5)                   lognormal(1, 1/2)
τ        δ     CPM-Mood        CPM-LP          CPM-Mood       CPM-LP
τ = 50   1.5   205.7 (314.4)   251.6 (343.0)   63.2 (136.3)   87.6 (190.1)
         2.0   50.3 (117.9)    84.4 (184.6)    14.1 (16.0)    17.0 (20.2)
         3.0   12.1 (10.8)     16.0 (21.3)     7.0 (4.5)      8.0 (5.3)
         0.5   79.5 (157.0)    140.6 (245.5)   29.2 (39.1)    40.4 (60.4)
         0.3   20.7 (13.4)     29.5 (28.4)     15.6 (6.3)     19.4 (5.9)
τ = 300  1.5   49.5 (50.9)     63.4 (71.7)     20.1 (17.3)    24.4 (21.0)
         2.0   18.3 (14.1)     22.4 (17.5)     9.5 (7.9)      11.3 (7.9)
         3.0   8.6 (5.3)       10.3 (6.6)      5.6 (3.2)      6.7 (3.8)
         0.5   31.8 (16.4)     45.9 (20.5)     21.1 (8.3)     27.5 (7.7)
         0.3   16.8 (6.0)      24.0 (7.2)      14.5 (4.4)     18.7 (3.5)

In summary, we give the following recommendations: if the Lepage CPM signals that a change has occurred and the Mood test is found to have the lower p-value, then we can conclude with a high degree of certainty that the signal was caused by a shift in the scale parameter, since Table 6 shows that the Mood test rarely signals when a location shift has occurred.
However, if the Mann–Whitney test has the lower p-value, then the signal was more likely to have been caused by a location shift, although there is slightly more uncertainty here, since the Mann–Whitney test is also sensitive to scale shifts. Although this is a simple method, it gives a reasonably accurate diagnosis of what was responsible for the flagged change. More sophisticated assessment is an avenue for future research.

4.5 The Impact of Stream Discretization

In the previous experiments, all data from the stream were stored in memory to allow ranks to be calculated. The stream discretization technique was introduced in Section 3.4 as an alternative method of computing ranks when this is infeasible. We now investigate the impact that discretization has on performance. We use the Lepage CPM in the following, although we found identical results for the individual MW and Mood models.

We first investigate a stream of N(0, 1) variables which change to N(1, 1) after 500 observations. The stream is discretized into m ∈ {100, 200, 300} segments, and the window size is w ∈ {50, 100, 200, 500}. For each value of w, we define the relative performance to be the ratio of the mean detection delay using a window of size w to the mean detection delay when no windowing is used. Table 7 shows the relative performance for each choice of m and w compared to the non-discretized version which stores all observations in memory.

Table 7. Relative performance when discretizing the stream into m segments with window size w, compared to the non-discretized model

                       w
m       50      100     200     500
100     0.80    0.92    0.98    0.99
200     0.81    0.93    0.98    0.99
300     0.81    0.93    0.98    0.99

It can be seen that unless w is low, discretization has very little impact on performance. Because the histogramming provides a summary of points which are too old to fall in the window, the window size generally only affects how far back in time changes can be detected. Therefore, as long as the change is sufficiently large to be detected within 500 points of occurring, there should be no performance decrease from windowing. However, if it is suspected that the change will be small enough that it will take a very long time to detect, then a larger window will need to be chosen. The results for the Student-t(2.5) and lognormal(1, 1/2) distributions are similar, but omitted for space reasons. Again, unless the window size is set very low (≤50), there is little impact on performance.

4.6 Real Data Application

We now give an example of our algorithm being used to detect changes in a high-volume financial data stream. Although the analysis of financial data is often quite sophisticated, we provide this example simply to demonstrate the capabilities of our algorithm. We obtained a historical sequence of the exchange rate between the Swiss Franc (CHF) and the British Pound (GBP).
The maximum value of the exchange rate was recorded during consecutive 5-minute intervals running from October 21st 2002 to May 15th 2007. In total, 333,758 observations xt were made, and we treat them as a data stream where observations are received and processed sequentially. Note that although this is an example of sequential monitoring, it differs from those typically investigated in the quality control literature, which usually assume that we have some control over the process being monitored, and can (e.g.) stop the process to further investigate and diagnose any detected shifts. This is not true in the case of financial data; rather, signaling for a change may eventually prompt some trading action which might indirectly affect the stream. We have chosen this example to illustrate how our change detection techniques can be deployed in broader situations than those typical in quality control, where large data streams are more likely to be encountered.

A plot of the financial stream is shown in Figure 1(a); it appears to follow a random walk. To remove this nonstationarity, we instead consider the first differences of the logged series, defined as ∇xt = log(xt) − log(xt−1). These are plotted in Figure 1(b), and appear stationary with mean 0. However, these first differences have very heavy tails, with a kurtosis of 4.52, suggesting that they are non-Gaussian. A nonparametric change detector hence seems an appropriate tool to use for analysis.

We used the Lepage CPM to monitor for changes in both location and scale in the differenced observations. Due to the length of the data, the ARL0 was set to 200,000 in order to avoid a large number of false alarms being generated. Because this stream is likely to contain multiple change points, we reinitialized the CPM from scratch whenever a change was detected, by discarding all the points stored in the window and restarting with the observation immediately following the flagged change. Our algorithm processed the stream sequentially and detected a total of four change points. Using the diagnostic method outlined in Section 4.4, we concluded that all four correspond to scale shifts. We have superimposed these change points on Figure 1(a) and (b). It is not obvious from these plots that the discovered change points correspond to real scale shifts, so to investigate further, we computed the exponentially weighted moving average of the squared differences,

    EWMAt = λ·EWMAt−1 + (1 − λ)(∇xt)²,

which provides a local estimate of the stream variance. This EWMA is plotted in Figure 1(c), with λ = 0.999. It can be seen that the stream variance is undergoing gradual drift, and that our discovered change points correspond to an abrupt increase in the variance around the 30,000th and 265,000th observations, and a shift in the direction of the drift at the 115,000th observation. The final change point, at the 150,000th observation, does not correspond to any obvious feature of the variance drift process, and may perhaps be considered a false positive.
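The EWMA variance estimate can be computed in a single pass; the sketch below assumes the raw exchange-rate series as input, with an initialization choice (the first squared difference) that the text does not specify.

    import numpy as np

    def ewma_variance(rate, lam=0.999):
        # Local variance estimate of the differenced log series:
        # EWMA_t = lam * EWMA_{t-1} + (1 - lam) * (dx_t)^2
        dx = np.diff(np.log(rate))        # first differences of the logged rates
        est = np.empty_like(dx)
        prev = dx[0] ** 2                 # initialization not specified in the text
        for t, d in enumerate(dx):
            prev = lam * prev + (1 - lam) * d ** 2
            est[t] = prev
        return est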
5. CONCLUSIONS

We introduced a nonparametric change point model for detecting joint changes in the location and scale parameters of a data stream. These models provide a way to use classical rank-based hypothesis tests in a streaming context. Through discretization, we gave a method for computing the ranks in a computationally efficient manner. Experiments with synthetic data showed that our detectors give good performance compared with the optimal Student-t and F tests for detecting changes in a Gaussian distribution, while also being suitable when the Gaussianity assumption is violated. Finally, we showed that discretizing the stream does not significantly affect performance, so this approach can be used to implement our models efficiently. We conclude that the change point models are suitable for deployment in a streaming context where computational efficiency is important and the amount of available memory is low.

ACKNOWLEDGMENTS

This research was undertaken as part of the ALADDIN (Autonomous Learning Agents for Decentralised Data and Information Systems) project and is jointly funded by a BAE Systems and EPSRC (Engineering and Physical Sciences Research Council) strategic partnership, under EPSRC grant EP/C548051/1.

[Received May 2010. Revised June 2011.]

[Figure 1. Panels: (a) Original sequence; (b) First differences; (c) EWMA estimate of variance. The foreign exchange data, their first differences, and the EWMA of the squared first differences, all with the detected scale change points superimposed.]

REFERENCES

Basseville, M., and Nikiforov, I. V. (1993), Detection of Abrupt Changes: Theory and Application, Englewood Cliffs, NJ: Prentice Hall.
Bhattacharya, P., and Frierson, D. (1981), "A Nonparametric Control Chart for Detecting Small Disorders," The Annals of Statistics, 9, 544–554.
Carlstein, E. (1988), "Nonparametric Change-Point Estimation," The Annals of Statistics, 16, 188–197.
Chakraborti, S., and van de Wiel, M. A. (2008), "A Nonparametric Control Chart Based on the Mann–Whitney Statistic," in Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, Beachwood, OH: Institute of Mathematical Statistics, pp. 156–172.
Chan, L., Hapuarachchi, K., and Macpherson, B. (1988), "Robustness of X-Bar and R Charts," IEEE Transactions on Reliability, 37, 117–123.
Domingos, P., and Hulten, G. (2003), "A General Framework for Mining Massive Data Streams," Journal of Computational and Graphical Statistics, 12, 945–949.
Duran, B. (1976), "Survey of Nonparametric Tests for Scale," Communications in Statistics—Theory and Methods, 5, 1287–1312.
Fu, T., Chung, F., Ng, V., and Luk, R. (2001), "Evolutionary Segmentation of Financial Time Series Into Subsequences," in Proceedings of the 2001 Congress on Evolutionary Computation, Seoul, Korea: IEEE Press, pp. 426–430.
Gibbons, J. D. (1985), Nonparametric Statistical Inference, Boca Raton, FL: McGraw-Hill.
Gordon, L., and Pollak, M. (1994), "An Efficient Sequential Nonparametric Scheme for Detecting a Change of Distribution," The Annals of Statistics, 22, 763–804.
Gustafsson, F. (2000), Adaptive Filtering and Change Detection, Chichester, West Sussex, England: Wiley.
Hackl, P., and Ledolter, J. (1991), "A Control Chart Based on Ranks," Journal of Quality Technology, 23, 117–124.
Hawkins, D., and Zamba, K. (2005), "A Change-Point Model for a Shift in Variance," Journal of Quality Technology, 37, 21–31.
Hawkins, D., Qiu, P., and Kang, C. (2003), "The Changepoint Model for Statistical Process Control," Journal of Quality Technology, 35, 355–366.
Hawkins, D. M. (1987), "Self-Starting CUSUM Charts for Location and Scale," The Statistician, 36, 299–315.
Hawkins, D. M., and Deng, Q. (2010), "A Nonparametric Change-Point Control Chart," Journal of Quality Technology, 42, 165–173.
Jensen, W. A., Jones-Farmer, L. A., Champ, C. W., and Woodall, W. H. (2006), "Effects of Parameter Estimation on Control Chart Properties: A Literature Review," Journal of Quality Technology, 38, 349–364.
Jones, L., and Woodall, W. (1998), "The Performance of Bootstrap Control Charts," Journal of Quality Technology, 30, 362–375.
Lepage, Y. (1971), "A Combination of Wilcoxon's and Ansari–Bradley's Statistics," Biometrika, 58, 213–217.
Mielke, P. W., Jr. (1967), "Note on Some Squared Rank Tests With Existing Ties," Technometrics, 9, 312–314.
Mood, A. (1954), "On the Asymptotic Efficiency of Certain Nonparametric Two-Sample Tests," Annals of Mathematical Statistics, 25, 514–533.
Pettitt, A. N. (1979), "A Non-Parametric Approach to the Change-Point Problem," Journal of the Royal Statistical Society, Ser. C, 28, 126–135.
Ramoni, M., Sebastiani, P., and Cohen, P. (2002), "Bayesian Clustering by Dynamics," Machine Learning, 47, 91–121.
Stapnes, S. (2007), "Detector Challenges at the LHC," Nature, 440, 290–296.
Zhou, C., Zou, C., Zhang, Y., and Wang, Z. (2009), "Nonparametric Control Chart Based on Change-Point Model," Statistical Papers, 50, 13–28.