Nonparametric Monitoring of Data Streams for
Changes in Location and Scale
Gordon J. ROSS, Dimitris K. TASOULIS, and Niall M. ADAMS
Department of Mathematics
Imperial College London
London, SW7 2AZ, U.K.
([email protected]; [email protected];
[email protected])
The analysis of data streams requires methods which can cope with a very high volume of data points.
Under the requirement that algorithms must have constant computational complexity and a fixed amount
of memory, we develop a framework for detecting changes in data streams when the distributional form
of the stream variables is unknown. We consider the general problem of detecting a change in the location
and/or scale parameter of a stream of random variables, and adapt several nonparametric hypothesis tests
to create a streaming change detection algorithm. This algorithm uses a test statistic with a null distribution independent of the data. This allows a desired rate of false alarms to be maintained for any stream
even when its distribution is unknown. Our method is based on hypothesis tests which involve ranking
data points, and we propose a method for calculating these ranks online in a manner which respects the
constraints of data stream analysis.
KEY WORDS: Change detection; Nonparametric tests; Streaming data.
1. INTRODUCTION
In recent years, problems relating to the analysis of data
streams have become widespread. A data stream is a collection of time-ordered observations x1 , x2 , . . . generated from the
random variables X1 , X2 , . . . . We assume that the observations
are univariate, real-valued, and independent.
Unlike classical time series, data streams are not assumed to
have a fixed size and new observations are regularly received
over time. In many applications the rate at which these observations are received can be very high; for example, the Large
Hadron Collider at CERN generates millions of observations
every second (Stapnes 2007). Therefore it is common to impose
restrictions on algorithms designed for use with data streams
(Domingos and Hulten 2003). First, the amount of computation
time required to process each observation should be constant
rather than increasing over time. Second, only a fixed number
of observations should be stored in memory at any one time.
Ideally methods should be single-pass, with observations processed once, and then discarded.
Many traditional approaches to statistics assume the existence of a fixed size collection of observations, where the number of data points in the collection is usually assumed to be
small enough that computational issues are not a primary concern. This assumption is violated in the data stream setting,
since new observations must be sequentially processed in a
computationally efficient manner.
This article is concerned with the task of detecting whether
a data stream contains a change point, and extends traditional
methods for sequential change detection to the streaming context. If no change point exists, the observations are assumed to
be identically distributed. If a change point exists at time τ , then
the observations are distributed as

    Xi ∼ F0 if i < τ,    Xi ∼ F1 if i ≥ τ.    (1)
In other words, the variables are iid with some distribution
F0 before the change point at t = τ , and iid with a different
distribution F1 after. The location of the change point is unknown, and the problem is to detect it as soon as possible. When
τ = ∞, no change point occurs in the sequence. Note that this
change point methodology can also be applied to streams which
are not iid between change points, by first modeling the stream
distribution in a way which yields iid one-step-ahead forecast
residuals, and then performing change detection on these; see
the book by Gustafsson (2000) for more details.
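For concreteness, a stream following Equation (1) can be simulated as below (a minimal Python sketch; the choice of Gaussian F0 and F1 and of τ is purely illustrative):

    import numpy as np

    def simulate_stream(n=1000, tau=500, rng=None):
        """Simulate Equation (1): iid F0 before tau, iid F1 from tau onward."""
        rng = np.random.default_rng() if rng is None else rng
        pre = rng.normal(0.0, 1.0, size=tau)        # Xi ~ F0 = N(0, 1) for i < tau
        post = rng.normal(1.0, 1.0, size=n - tau)   # Xi ~ F1 = N(1, 1) for i >= tau
        return np.concatenate([pre, post])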
Much early work on this problem emerged from the quality
control community where the goal was to monitor for a change
in the number of defective items arising from a production line.
Another common application is segmenting data streams into
regions of similar behavior, a problem which has applications
in many domains such as finance (Fu et al. 2001) and robotics
(Ramoni, Sebastiani, and Cohen 2002). A further example is
model adequacy determination, where the performance of a statistical model fitted to a data stream is monitored for evidence
of degradation.
The performance of change detection algorithms is usually
measured using two criteria (Basseville and Nikiforov 1993):
the expected time between false positive detections (denoted
ARL0 , for Average Run Length), and the mean delay until a
change is detected (denoted ARL1 ). If τ̂ is an estimator of the
true change point which occurs at time τ , then we can define
these formally as:
ARL0 = E(τ̂ | F = F0),
ARL1 = E(τ̂ − τ | F = F1).
Note that when τ̂ < τ , a false positive is said to have occurred. Generally, change detection algorithms are designed by
deciding on an acceptable ARL0 value, and then attempting to
minimize the mean detection delay. This is analogous to classical hypothesis testing where tests are designed to minimize the
Type II error, subject to a bound on the Type I error.
Most traditional approaches to sequential change detection
have assumed that the distributional form of the stream is
known before and after the change with only the parameters
being unknown. However, these assumptions rarely hold in
streaming applications. Typically, there is no prior knowledge
of the true stream distribution, or assumptions made about the
stream distribution may be incorrect. Several authors have investigated the performance of parametric change detection algorithms (Chan, Hapuarachchi, and Macpherson 1988; Jensen
et al. 2006) when the distribution of the stream is incorrectly
specified, and found that even small misspecifications can have
very large effects on the false alarm rate.
There is hence a need for nonparametric (distribution-free)
change detection methods which are able to maintain a specified level of performance, such as the false alarm rate, regardless of the true distribution of the stream. In recent years, several
distribution-free charts have been proposed which can monitor
a location parameter, such as the mean or median (Chakraborti
and van de Wiel 2008; Hawkins and Deng 2010); however, the
problem of detecting more general changes, such as those involving a scale or shape parameter, has been less widely studied.
In this work, we propose a novel technique for monitoring
for a change in the location and/or scale parameter of an arbitrary continuously distributed univariate data stream. Unlike
many existing approaches, our algorithm satisfies the requirements for data stream processing, with the computation time
and memory required to process each data point being constant
rather than growing over time.
Our method does not assume that anything is known about
the distribution of the stream before monitoring begins; it is thus
an example of a “self-starting” technique. This is in contrast
to several methods in the quality control literature which aim
to detect whether the stream deviates from a known baseline
distribution. The main advantage of the self-starting approach
is that it can be deployed out-of-the-box without the need to
estimate parameters of the stream distribution from a reference
sample prior to monitoring.
Our approach is based on a generalization of the change
point model (CPM), introduced for the Gaussian distribution
by both Hawkins, Qiu, and Kang (2003) and Hawkins and
Zamba (2005) in order to adapt traditional generalized likelihood tests to the streaming problem. This framework was
recently extended for the purpose of detecting nonparametric changes to the location parameter by Hawkins and Deng
(2010). However, their work does not satisfy the O(1) computational and memory complexity requirements for the processing of data streams. We develop their work in three ways.
First, we extend the CPM framework so that changes to the
scale parameter can be detected, a problem which has not received sufficient attention in the literature. Second, we propose
a method for monitoring changes to both the location and scale
simultaneously. Finally, we introduce the idea of stream discretization, which allows the test statistics used in both their
method and ours to be computed in a fast manner, thereby
facilitating deployment of these techniques on high-frequency
streams.
The remainder of the article proceeds as follows: Section 2
briefly reviews the existing literature on nonparametric change
detection. Section 3.1 introduces the change point model formulation. Our nonparametric approach is discussed in Section 3.2, where we describe the hypothesis tests which we use.
The key issue of setting the threshold parameters for our algorithm is dealt with in Section 3.3. The generalization to the
streaming problem is given in Section 3.4 where we provide
a method for calculating ranks in a way which requires only
constant computation and memory costs. An extensive experimental comparison is then carried out in Section 4, along with
an example of the method deployed on a real financial data
stream.
2. RELATED WORK
Most traditional work on change detection assumes that the
distribution of the stream before the change point is known.
Classic methods for this problem include the CUSUM algorithm, exponentially weighted moving average charts, generalized likelihood tests, and the Bayesian Shiryaev–Roberts approach. An overview of these techniques can be found in the
book by Basseville and Nikiforov (1993).
Few authors have treated the problem of detecting arbitrary
changes in unknown distributions in a single-pass manner.
Most of the existing literature on nonparametric change detection deals with a fixed size sample rather than a data stream
where new observations are received over time. Examples of
such approaches include the works of Bhattacharya and Frierson (1981), Carlstein (1988), and Jones and Woodall (1998).
Due to the difficulty of detecting arbitrary distributional
changes, most existing work on nonparametric change detection deals only with monitoring for changes in a location parameter, such as the mean or median. A popular approach
to this problem is to use only the ranks of observations, and
adapt classical rank-based hypothesis tests such as Mann–
Whitney (Hackl and Ledolter 1991; Gordon and Pollak 1994;
Chakraborti and van de Wiel 2008). However, the computation
of ranks requires the storage of all previous data to allow new
points to be ranked. This makes them difficult to apply to the
streaming problem.
Directly relevant to our work is that of Pettitt (1979), which
used a statistic based on a maximization over several Mann–
Whitney statistics to detect a change in location parameter.
However, this approach was developed only for a fixed size
dataset, and is hence not single-pass. The analysis of the test
statistic also relies on an asymptotic argument which makes it
unsuitable for deployment in streams where changes occur relatively frequently and small sample behavior is important.
A sequential extension of the work of Pettitt (1979) was recently proposed by both Zhou et al. (2009) and Hawkins and
Deng (2010), who used Monte Carlo simulation instead of an
asymptotic argument in order to calculate exact quantiles of the
test statistic. We propose to extend this work so that it can detect location and/or scale changes rather than location alone, and
also adapt it to the streaming context.
3. FRAMEWORK
3.1 The Change Point Model
The sequential change detection problem has the following
form: a (potentially infinite) sequence of (assumed) independent random variables X1 , X2 , . . . is observed. We write xi for
the particular realization of Xi observed at time t = i. The distribution of Xi is given by Equation (1), conditional on the change
point τ which we seek to detect.
Suppose t points from the sequence have been observed and
we wish to test whether a change point has occurred at some
point in the past. For any fixed k < t the hypothesis that a
change point occurs at the kth observation can be written as
H0 : Xi ∼ F0 for all i,
H1 : Xi ∼ F0 if i < k, and Xi ∼ F1 if i ≥ k.
A two-sample hypothesis test can then be used to test for
a change point at k. Let Nk,t be an appropriately chosen test
statistic. For example, if the change is assumed to take the form
of a shift in location parameter, and the stream is assumed to
be Gaussian, then Nk,t will be the statistic associated with the
usual t-test (Hawkins, Qiu, and Kang 2003). Now, if Nk,t > hk,t
for some appropriately chosen threshold hk,t , then a change is
detected at location k.
Of course, we do not know which value of k to use for testing since no information is available concerning the location of
the change point. Instead, we must decide between the more
general hypotheses:
H0 : Xi ∼ F0 for all i,
H1 : there exists k < t such that Xi ∼ F0 if i < k, and Xi ∼ F1 if i ≥ k.
One approach presented by both Pettitt (1979) and Hawkins
and Zamba (2005) is to evaluate Nk,t at all values of 1 < k < t,
and then use the maximum value. This is somewhat analogous
to a generalized likelihood ratio test: if we define
Nmax,t = maxk Nk,t,
then the hypothesis that no change has occurred before the tth
observation is rejected if Nmax,t > ht for some threshold ht . The
estimate τ̂ of the change point is then the value of k which maximizes Nk,t.
The above formulation provides a general method for non-streaming change point detection on a fixed-size dataset. It can
also be used in the case where points are sequentially arriving
over time, by repeatedly recomputing the test statistic as each
new observation is received. In this case, when the (t + 1)th
point, xt+1 , is observed, we compute Nmax,t+1 , and flag that
a change has occurred if Nmax,t+1 > ht+1 . This requires a sequence of time-varying thresholds as will be discussed in Section 3.3.
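The repeated-testing scheme just described can be sketched as follows (illustrative Python; the two-sample statistic and the threshold sequence h are supplied by the caller). The inner maximization over k is what makes the cost of processing the tth observation grow with t:

    def detect_change(xs, statistic, h):
        """Naive sequential CPM: after each new point, test every split k.

        statistic(left, right) returns N_{k,t}; h[t] is the threshold h_t.
        """
        seen = []
        for t, x in enumerate(xs, start=1):
            seen.append(x)
            if t < 2:
                continue
            # N_{max,t}: maximize the two-sample statistic over all splits k
            n_max, tau_hat = max((statistic(seen[:k], seen[k:]), k)
                                 for k in range(1, t))
            if n_max > h[t]:
                return t, tau_hat  # flagged at time t, change estimated at tau_hat
        return None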
However, this formulation is unsuitable for the streaming
problem since the computational cost of evaluating Nmax,t increases at least linearly with t. As more points are received
from the stream, the number of possible change point locations
k increases which results in an increasing number of hypothesis tests being performed at each time instance. Similarly, the
memory required to store all of the previously-seen observations grows linearly over time.
381
When the observations are assumed to be Gaussian, the test
statistic Nk,t can be summarized by a finite set of recursively
updatable sufficient statistics which allows older stream data
to be discarded with no loss in performance, as described in the
article by Hawkins and Zamba (2005). We will discuss a similar
approach for our nonparametric statistics in Section 3.4.
3.2 The Nonparametric Models
When the task is to detect a change in a stream where no
information is available regarding the pre- or post-change distribution, the obvious approach is to replace Nk,t with a nonparametric two-sample test statistic such as the Kolmogorov–Smirnov (KS) statistic, which can detect arbitrary changes in distribution. The algorithm would proceed as above, with this statistic
evaluated at every time point, and the maximum value being
compared to a threshold ht .
However, “omnibus” tests such as the KS often have low
power (Gibbons 1985). In many situations we can find a more
powerful test by restricting attention to the case when the pre-change distribution F0 undergoes a change in either location or
scale. These shifts have the following form:
• Location Shift: F1(x) = F0(x + δ).
• Scale Shift: F1(x) = F0(δx).
Although this is slightly more restricted than the general
problem, in practice any change in F0 is likely to cause a shift
in the first or second moment, and so can be detected.
Most popular nonparametric tests for changes in location and
scale use only the ranks of the observations, where the rank of
the ith observation at time t is defined as
r(xi) = Σj=1,...,t I(xi ≥ xj),
where I is the indicator function. Detecting location changes
using a change point model with a Mann–Whitney (MW) test
statistic was discussed by Hawkins and Deng (2010),
so we restrict our discussion to changes in the scale parameter, and joint monitoring of the location and scale parameters.
In Section 3.4 we also present a method for computing these
ranks which requires constant computational time, and a finite
memory. This method makes rank-based tests suitable for the
streaming problem.
3.2.1 Scale Shifts. There are several nonparametric rank-based tests for comparing the scale parameters of two samples
(Gibbons 1985). We favor the Mood test (Mood 1954) since it
has a simple formulation and has been shown to perform favorably in power analysis (Duran 1976). This test is based on
the following observation: if there are n points spread over two
samples S and T, then assuming no tied ranks, the expected
rank of each point under the null hypothesis that both samples
are identically distributed is (n + 1)/2. The Mood test uses a
test statistic which measures the extent to which the rank of
each point deviates from its expected value:
M = Σxi∈S (r(xi) − (n + 1)/2)²,
where r(xi ) denotes the rank of xi in the pooled sample. Like
the MW statistic, the distribution of the Mood statistic is independent of the underlying random variables. The mean and
variance of the Mood statistic are

    μM = nS(n² − 1)/12,    σ²M = nSnT(n + 1)(n² − 4)/180.
This is then standardized to give

    M = |(M − μM)/σM|,
where we take the absolute value so that both increases and decreases in the scale parameter can be detected. A nonparametric CPM for scale shifts can be defined by replacing the Nmax,t
statistic described in Section 3.1 with this M statistic. When
a new observation is received from the stream, M is evaluated
at every possible point xi , with the stream being split into two
samples around each point. Nmax,t is then the maximum value
of these M statistics, and is compared to a threshold ht , as before, with a change being flagged if the threshold is exceeded.
3.2.2 Location and Scale Shifts. In practice, it may not be
known whether a change point will involve a shift in location
or in scale. In this case it will be required to monitor for either
one of these shifts. One approach is to simply use two change
point models, one designed to detect shifts in location using
the Mann–Whitney (MW) statistic, and one designed to detect
shifts in the scale using the Mood statistic. However, using multiple tests makes it difficult to control the false alarm rate; it is
not clear how to set the false alarm rates of the individual tests
in order to achieve a desired overall false alarm rate, which is a
key issue in designing change detection algorithms.
An alternative is to use a single detector which is designed to
monitor for changes in both the location and scale simultaneously. To facilitate this, we use a Lepage-type (LP) hypothesis
test (Lepage 1971) which gives a nonparametric test for both
location and scale parameters. Our LP-type test statistic L is
defined as a sum of squared MW U and Mood M statistics:
L = U² + M².
Although there are many other ways in which the MW and
Mood statistics can be combined, the Lepage method of summing their squared values seems to be the most common in the
nonparametric hypothesis testing literature, hence it is the formulation we use.
This L statistic can then be integrated into the change point
model as before.
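A sketch of the combined statistic follows (Python; the Mann–Whitney component is standardized here via the equivalent Wilcoxon rank-sum moments, which is one standard choice rather than necessarily the one used in our implementation):

    import numpy as np
    from scipy.stats import rankdata

    def lepage_stat(sample_s, sample_t):
        """Lepage-type statistic L = U^2 + M^2 from standardized MW and Mood."""
        pooled = np.concatenate([sample_s, sample_t])
        n, n_s, n_t = len(pooled), len(sample_s), len(sample_t)
        ranks = rankdata(pooled)
        # Standardized Mann-Whitney (location component), via the rank sum of S
        w = np.sum(ranks[:n_s])
        u = (w - n_s * (n + 1) / 2) / np.sqrt(n_s * n_t * (n + 1) / 12)
        # Standardized Mood statistic (scale component), as in Section 3.2.1
        m = np.sum((ranks[:n_s] - (n + 1) / 2) ** 2)
        m = (m - n_s * (n ** 2 - 1) / 12) / np.sqrt(
            n_s * n_t * (n + 1) * (n ** 2 - 4) / 180)
        return u ** 2 + m ** 2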
3.3 Determination of Thresholds
All the change point models discussed above require a sequence of thresholds {ht } to which the test statistic Nmax,t is
compared. Hawkins and Zamba (2005) recommended choosing these thresholds so that the false alarm probability (FAP) is
equal at every time point, that is, P(Nmax,t > ht ) = α for all t.
The average time between false positives (commonly referred
to as the Average Run Length or ARL0 ) is then
ARL0 = 1/α.
However, finding such a sequence of ht values is a difficult
problem to solve analytically since the distribution of Nmax,t is
very complex and has no obvious analytical form. The problem is that, while the null distribution of Nk,t is usually known for each individual value of k, the distribution of Nmax,t is much more complicated due to the high correlation between the Nk,t statistics. The other major difficulty is that we really require the conditional probability

    P(Nmax,t > ht | Nmax,t−1 < ht−1, . . . , Nmax,1 < h1),
which is much more difficult to compute than the marginal
(unconditional) probability. Further, using an asymptotic distribution would skew the ARL0 of the change detector, since it
would not be accurate during the early monitoring period.
Since analytic calculation of the thresholds seems unfeasible, we use Monte Carlo simulation to determine the thresholds
instead. Once the thresholds have been computed, they can be
stored in a look-up table or summarized in a closed form expression, so that there is no need to spend any computational
resources on threshold calculation during stream monitoring.
For each desired value of α, one million streams containing 5000 points were generated, without any change points.
Since our change point models are distribution-free, the threshold values do not depend on the distribution of the stream. We
used N(0, 1) random variables for convenience when generating the thresholds, but any continuous distribution could be
used instead. Then, the change point models were evaluated
over the simulated streams, and Nmax,t was computed at each
time instance. This allows the distribution of Nmax,t to be approximated, and the required values of ht can then be determined.
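A sketch of this calibration follows (Python; statistic_max, the stream length, and the replication count are placeholders, and far fewer replications are shown than the one million used in our computations). Choosing ht as the (1 − α) quantile of Nmax,t among the simulated streams that have not yet raised an alarm gives the required conditional false alarm probability:

    import numpy as np

    def calibrate_thresholds(statistic_max, alpha, n_streams=10000,
                             length=1000, rng=None):
        """Monte Carlo estimate of the threshold sequence {h_t}.

        statistic_max(xs) returns the sequence of N_{max,t} values for one stream.
        """
        rng = np.random.default_rng() if rng is None else rng
        # Any continuous distribution works: the statistics are distribution-free
        stats = np.array([statistic_max(rng.normal(size=length))
                          for _ in range(n_streams)])
        alive = np.ones(n_streams, dtype=bool)
        h = np.empty(stats.shape[1])
        for t in range(stats.shape[1]):
            h[t] = np.quantile(stats[alive, t], 1.0 - alpha)  # conditional quantile
            alive &= stats[:, t] <= h[t]   # streams that alarm at t are discarded
        return h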
Table 1 gives a formula for the values of ht for various
choices of ARL0 in powers of 1/t, which was constructed using polynomial regression. For example, to obtain an ARL0 of
500 for the Mood model, the value of ht at time t is found by
substituting into the ARL0 = 500 row of the table, which gives
the value:
ht = 3.37×10^0 − 1.59×10^0 t^−1 − 3.64×10^3 t^−3 + 5.16×10^6 t^−5 − 3.04×10^9 t^−7 + 6.55×10^11 t^−9.
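In code, this row of the table is a direct transcription:

    def mood_threshold_arl500(t):
        """h_t for the Mood CPM at ARL0 = 500, from the Table 1 polynomial in 1/t."""
        return (3.37e0 - 1.59e0 / t - 3.64e3 / t**3
                + 5.16e6 / t**5 - 3.04e9 / t**7 + 6.55e11 / t**9)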
Because the Nmax,t test statistics are discrete, it may not be
possible to find a value for ht which gives the exact FAP required, especially for low values of t. However, this will only
be an issue when the number of observations is very low. We
therefore recommend that monitoring of the stream only begins
after the first 20 observations have been received, which allows
sufficient possibilities for rank assignments to make most FAPs
achievable.
We can now summarize our algorithm as follows. First, decide whether it is desired to monitor the location or scale of
the distribution (or both), and choose the test statistic Nk,t to
be either the Mann–Whitney, Mood, or Lepage statistic as appropriate. Also choose the desired FAP α. When a new point
xt arrives from the stream, calculate Nmax,t, then use Table 1 to
find the values of ht corresponding to α. If Nmax,t > ht , then
flag that a change has occurred; otherwise do nothing and process the next observation.
Table 1. Polynomial approximation of ht as a function of γ = 1/t

ARL0     CPM type    Constant      t^−1          t^−3          t^−5          t^−7          t^−9
370      Mood        3.27×10^0    −1.69×10^0     2.03×10^2    −2.85×10^6     2.70×10^9    −6.06×10^11
         LP          1.53×10^1    −4.42×10^1    −3.26×10^3    −1.32×10^7     1.47×10^10   −3.39×10^12
500      Mood        3.37×10^0    −1.59×10^0    −3.64×10^3     5.16×10^6    −3.04×10^9     6.55×10^11
         LP          1.62×10^1    −5.03×10^1    −1.32×10^4     2.21×10^6     7.62×10^9    −2.55×10^12
1000     Mood        3.60×10^0    −3.55×10^0     5.79×10^2    −2.33×10^6     8.69×10^8     3.09×10^10
         LP          1.82×10^1    −6.43×10^1    −7.72×10^4     1.50×10^8    −1.01×10^11    2.11×10^13
10,000   Mood        4.25×10^0    −8.28×10^0     5.90×10^2    −6.77×10^6     4.98×10^9    −1.01×10^12
         LP          2.45×10^1    −1.50×10^2    −1.20×10^3    −1.85×10^7     6.66×10^9    −4.74×10^11
20,000   Mood        4.44×10^0    −1.21×10^1     8.05×10^3    −1.53×10^7     8.46×10^9    −1.46×10^12
         LP          2.64×10^1    −1.87×10^2     7.35×10^4    −1.70×10^8     1.04×10^11   −2.02×10^13
3.4 Streaming Adaptation
The above formulations of the nonparametric CPMs rely on
the calculation of ranks. This is a problem when it comes to data
streams, as computing these ranks requires all previous data to
be stored in memory. The result is that the amount of memory
used, and computation time required, grow without bound over
time. When working with data streams, where this is usually infeasible, we recommend discretizing older stream observations
so that only a constant amount of memory and computation is
used for rank computation.
First, form a rough estimate of an interval within which most
points from the stream will lie. This will not usually be known
a priori, but can be estimated from the initial stream observations, as [a, b] where a is the minimum value of the first 20
observations, and b is the maximum value.
This interval can then be split up into m + 1 segments using the boundary points {s1, s2, . . . , sm}, where

    s1 = a,  s2 = a + (b − a)/(m − 1),  . . . ,  sj = a + (j − 1)(b − a)/(m − 1),  . . . ,  sm = b.
For each of these m + 1 segments a count, cj, is maintained of the number of stream observations which fall into that segment. Observation xi falls into the jth segment if sj−1 ≤ xi < sj.
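The segment bookkeeping can be sketched as follows (Python; the class name and the treatment of the two open-ended end segments are our illustrative choices):

    import numpy as np

    class Discretizer:
        """Histogram summary of points that have left the window."""

        def __init__(self, a, b, m):
            # Boundary points s_1 = a, ..., s_m = b, as defined above
            self.edges = a + (b - a) * np.arange(m) / (m - 1)
            # m + 1 counts: below a, the m - 1 interior segments, above b
            self.counts = np.zeros(m + 1, dtype=int)

        def add(self, x):
            # Segment index j such that s_{j-1} <= x < s_j (0 or m at the ends)
            j = int(np.searchsorted(self.edges, x, side='right'))
            self.counts[j] += 1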
The CPM now works as follows: define the window Ww,t of
fixed length w to be the set of the most recent w points observed
from the stream, that is, Ww,t = {xt−w+1 , . . . , xt }. The points in
this window are stored in memory, with older points discarded.
Whenever a point is old enough to fall outside the moving window, it is no longer stored in memory, but instead assigned to
one of the m segments. Therefore, the memory only needs to be
large enough to contain the w points currently in the window,
and the m count variables for the discretization. Since neither
of these grows over time, a constant amount of memory is used
no matter how many observations arrive from the stream.
The Nmax,t statistic is now calculated by maximizing Nk,t
over only the observations in the window, rather than over the
whole stream. Note that because older observations are summarized by the histogram rather than discarded entirely, the choice
of the window size w is not critical since it only determines the
points at which a change may be detected; points too old to fall
into the window are not discarded, but are instead summarized
in the discretization.
Rather than ranking each point in the window against all previous data as before, each point’s rank is now defined as the sum
of its rank against the other points in the window, and its rank
against all previous points in the stream, as approximated by
the discretization. Each of the cj points in the jth segment sj is
assigned the value of the segment mid-point vj = (sj + sj+1 )/2.
The rank of each point xt in the window is then

    r(xt) = rw(xt) + Σj=1,...,m+1 cj I(xt > vj) − 1,

where rw(xt) is the rank of xt among all the points currently contained in the window. Note that as well as computing Nk,t for
each point contained in the window, we can also compute it for
the point immediately to the left of the window by comparing
the ranks of each point contained in the window to the ranks
summarized by the discretization.
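A sketch of this rank computation (Python, using the Discretizer above; the two open-ended end segments have no finite mid-point, so this sketch simply treats their contents as lying below or above every windowed point):

    import numpy as np

    def windowed_rank(x, window, disc):
        """Approximate rank of x against the window plus the discretized history."""
        rank_in_window = 1 + np.sum(np.asarray(window) < x)     # r_w(x)
        mids = (disc.edges[:-1] + disc.edges[1:]) / 2           # mid-points v_j
        hist = np.sum(disc.counts[1:-1] * (x > mids))           # summarized points
        if x >= disc.edges[0]:
            hist += disc.counts[0]       # bottom segment lies below x (assumption)
        return rank_in_window + hist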
Using windowing has an effect on the values of the threshold
ht used by the CPM. The values given in Table 1 were computed without windowing, where the number of hypothesis tests
performed is constantly increasing in time. When windowing
is used, the number performed becomes constant, equal to the
size of the window. Therefore in order to maintain the desired
ARL0 when a window of size w is used, the threshold parameter
should be set to the value ht given in Table 1 while t < w, and held fixed at hw for all t ≥ w.
There are two potential drawbacks which can arise from
stream discretization. First, there is now a small loss of accuracy in rank computation, and the standard deviation of the
MW/Mood statistics is slightly increased since tied ranks now
arise. Although this will not generally be large, it may be desirable to adjust the standard deviation of the test statistic to
take this into account; details of this procedure can be found
in the article by Mielke (1967). Because lower values of m result in more tied ranks and hence a slight loss of accuracy, it
is desirable to have m as large as possible. Therefore, m should
be set as high as computational and memory resources allow.
In Section 4.5 we empirically investigate how the choice of m
affects performance, and find that there is little degradation in
performance, unless m is extremely low.
The other potential drawback relates to post-signal diagnosis.
As discussed in Section 3.1, as well as flagging that a change
has been detected, our method also gives an estimate of how far
back in the stream the change occurred. This estimate is found
by considering previous values of the Nmax,t statistic. However,
if discretization is used, it will not be possible to compute these
values for points which lie outside the window, which means
that the post-signal estimate of the change location can be at
most only w observations in the past. Therefore in situations
where post-signal diagnosis is considered important, the window size should be chosen to be larger than the expected time
taken to detect changes.
4. EXPERIMENTS
We now analyze the performance of our nonparametric
change point models. Because the Gaussian is the most common distribution used in the change detection literature, we first
investigate how the nonparametric models compare to the parametric Gaussian change point models using the Student-t and F
tests described by Hawkins, Qiu, and Kang (2003) and Hawkins and
Zamba (2005), as well as to the self-starting CUSUM method
proposed by Hawkins (1987), for detecting changes in the mean
and standard deviation of a Gaussian stream. This is important,
since one of the main uses of nonparametric control charts is
when real-world data are suspected to be Gaussian, but an insufficient number of observations are available to validate this
assumption. In this case, we would like the speed at which
the nonparametric charts detect changes to be similar to the
Gaussian parametric models when the data are indeed Gaussian, while being more stable when the Gaussian assumption is
violated.
After assessing performance in a Gaussian setting, we next
investigate how the nonparametric charts perform with non-Gaussian data. We use the Student-t and lognormal distributions as examples of distributions which are more heavy-tailed
and skewed than the Gaussian, respectively.
A relevant performance measure is required in order to compare different change detection methods. As discussed in Section 1, the most common measure in the change detection literature is the mean detection delay. To make the comparison fair,
we must first tune the parameters of the algorithms so that they
have an equivalent rate of false alarms (ARL0 ). For our comparison we chose ARL0 = 500, although we also performed experiments using other values and found similar results, which we
have omitted for brevity. For CPMs, this simply means ensuring that they all use the same value of α = 1/500. For the self-starting CUSUM, we found that using a threshold of h = 9.8
when k = 0.25, the value used by Hawkins (1987), gives this
ARL0 .
Note that in this section we will use the terms CPM-MW,
CPM-Mood, CPM-LP to refer to the nonparametric change
point models for the Mann–Whitney, Mood, and Lepage tests,
respectively, with the CPM-MW method being the one from the
work of Hawkins and Deng (2010). CPM-t and CPM-F refer to
the Student-t and F models from the work of Hawkins, Qiu, and
Kang (2003), Hawkins and Zamba (2005), and SS-CUSUM is
the self-starting CUSUM from the work of Hawkins (1987).
4.1 Gaussian Setting
We first evaluate the performance of the nonparametric methods when detecting shifts in the parameters of a Gaussian distribution. In this setting, we would expect the best performance
to be achieved by SS-CUSUM, CPM-t, and CPM-F since these
models use the optimal tests for detecting shifts in the mean and
standard deviation, respectively. Therefore these models are the
baseline against which we compare our method.
We consider shifts in both the mean and standard deviation.
The observations are initially distributed as N(0, 1) before the
change point, and either as N(δ, 1) or N(0, δ) after. We expect
changes that occur early in the stream to be more difficult to
detect than changes which occur after many observations have
been received, since more observations allows a more accurate
estimation of the distribution parameters and ranks. These early
changes will also arise in practice when the stream experiences
multiple change points, with only short intervals between each
one.
Therefore, we consider streams with change points occurring
at times τ = {50, 300} to investigate the impact on performance
when the change occurs early. We found that changes occurring later than τ = 300 are not significantly easier to detect,
so the performance of the detectors when changes occur when
τ = 300 can be assumed to also be the performance when the
changes occur later.
For every change point location and shift magnitude δ,
10,000 streams are generated each containing a single change
point, and the mean delay to detect the change is computed.
Table 2(a) and (b) shows the mean detection delay using the
various change detectors. There are four interesting findings:
First, for smaller sized scale changes, the Mood CPM actually outperforms the parametric F model, a similar result to
that reported by Hawkins and Deng (2010) when comparing
the MW CPM to CPM-t. The explanation, as they noted, is that
the parametric change point models tend to give relatively high
values of Nk,t at the extremes, when k is close to 0 or t. They
hence have higher values of the control limits ht than the nonparametric CPMs, which allows the nonparametric models to
react slightly faster to smaller changes.
Second, for larger sized changes, the F model outperforms
both the Mood and Lepage models as expected. However, the
difference in performance is not excessive, suggesting that the
nonparametric models are suitable for use when the stream is
Gaussian.
Third, the performance of the LP model is slightly inferior to
the MW model when the change magnitude is low. This is to
be expected since the LP model is also monitoring for changes
in scale, and hence should be less powerful than a model which
is only monitoring for location shifts. Interestingly, when the
change magnitude is large, the LP model actually outperforms
the MW. This is because when a large change occurs to the
mean of the stream, there is also an effect on the estimated standard deviation. Therefore, the change can also be detected by a
CPM intended to detect a change in the scale parameter, as well
as the one intended to detect a change in location. Since the LP
model consists of both a Mood and a MW model, the change
can be detected by either of these, which can result in faster
detection than using the MW model alone.
Table 2. Average time taken to detect shifts of magnitude δ in the mean and standard deviation of
a Gaussian N(0, 1) stream, for various change times τ. Standard deviations are given in brackets

(a) Gaussian mean

τ         δ     SS-CUSUM        CPM-t           CPM-MW          CPM-LP
τ = 50    0.5   78.6 (100.1)    151.8 (267.7)   134.2 (250.8)   224.4 (338.9)
          1.0   16.2 (12.5)     16.4 (13.3)     14.9 (10.4)     21.1 (19.2)
          1.5   8.1 (3.3)       6.7 (4.1)       7.5 (3.5)       8.0 (4.0)
          2.0   5.9 (2.1)       4.1 (2.2)       5.4 (1.8)       4.8 (2.2)
τ = 300   0.5   38.8 (31.3)     39.1 (31.4)     35.5 (24.6)     50.5 (39.0)
          1.0   12.6 (5.6)      10.8 (6.3)      11.3 (5.5)      12.6 (7.5)
          1.5   7.0 (2.7)       5.6 (2.9)       6.5 (2.5)       6.1 (3.5)
          2.0   5.3 (1.5)       3.6 (1.7)       5.0 (1.4)       4.1 (1.6)

(b) Gaussian standard deviation

τ         δ     SS-CUSUM        CPM-F           CPM-Mood        CPM-LP
τ = 50    1.5   72.7 (94.0)     152.4 (282.0)   95.4 (190.1)    143.0 (252.0)
          2.0   19.3 (26.5)     16.2 (17.5)     18.0 (21.9)     26.8 (45.5)
          3.0   8.2 (5.4)       5.5 (3.8)       7.8 (5.3)       9.4 (6.9)
          0.5   19.3 (26.5)     20.3 (15.2)     38.9 (76.6)     63.4 (122.6)
          0.3   8.2 (5.4)       9.6 (3.7)       15.1 (5.9)      21.0 (7.7)
τ = 300   1.5   29.2 (22.9)     30.7 (23.7)     28.0 (24.3)     35.5 (31.8)
          2.0   11.9 (7.1)      10.6 (7.2)      11.5 (8.3)      14.0 (10.2)
          3.0   6.3 (3.2)       4.6 (3.0)       6.1 (3.6)       7.5 (4.5)
          0.5   19.3 (26.5)     16.4 (7.5)      22.5 (27.9)     31.9 (10.7)
          0.3   8.2 (5.4)       9.2 (3.1)       13.1 (3.2)      18.7 (3.6)
Fourth and finally, unlike location shifts, it can be seen that
the Lepage CPM does not give better performance than the
Mood CPM regardless of the scale shift magnitude. This will
be explored further in Section 4.4.
4.2 Gradual Drift
A further point to consider is how the CPM methodology performs when either the location or scale of the stream undergoes
gradual drift, where the change is incremental over time rather
than abrupt. Although this is not the sort of situation which the
CPMs are designed to cope with, it may be encountered in practice so it is desirable to be able to perform reasonably in these
situations.
To represent drift, we simulated 10,000 streams with change
points occurring at times τ ∈ {10, 50, 300}. However, the
change points now mark the start of a period of gradual drift,
rather than abrupt change. We specified that the period of drift
last for 100 observations, over which the stream parameter increases linearly. For example, in the streams containing change
point at τ = 300 where the observations are N(0, 1) prior to the
change, and the location shift magnitude is δ, the observations
are distributed as N(δ(t − 300)/100, 1) for t ∈ [300, 400], and
as N(δ, 1) for t > 400. Unlike previous experiments, we have
included an extremely short change time of τ = 10 to represent
the extreme case where the stream is undergoing drift when
monitoring begins. This situation presents a problem for any
self-starting change detector that does not make assumptions
about the in-control distribution, since the lack of a reference
sample with which to compare the stream means that changes
occurring this early are not easy to detect.
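For example, a stream following this drift model can be generated as below (Python sketch; parameter values are illustrative):

    import numpy as np

    def drifting_stream(n=1000, tau=300, delta=1.0, drift_len=100, rng=None):
        """N(0,1) stream whose mean drifts linearly to delta over drift_len points."""
        rng = np.random.default_rng() if rng is None else rng
        t = np.arange(1, n + 1)
        mean = delta * np.clip((t - tau) / drift_len, 0.0, 1.0)  # 0 before tau
        return rng.normal(mean, 1.0)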
The results of our simulations are shown in Table 3(a)
and (b), and show that drift is significantly harder to detect than
abrupt change, which is not surprising. In particular, when the
drift is small and occurs after very few observations, the time
to detect it was even longer than the drift period. This raises
an important cautionary point for any self-starting change detector: when monitoring a stream, care must be taken to ensure
that it is actually stationary at the beginning of monitoring, because small drift will take some time to detect.
Comparing the methods, it seems that the self-starting
CUSUM approach performs slightly better than the Lepage
CPM when it comes to detecting gradual drift, with similar
results for the other CPMs omitted for brevity. However, we
stress again that the SS-CUSUM is a parametric method designed to work for Gaussian streams, while the Lepage CPM is
more general.
4.3 Non-Gaussian Setting
The primary advantage of the nonparametric approaches is of course that they can be used to monitor parameter shifts in arbitrary unknown distributions. We now investigate performance when detecting changes in the location and scale of non-Gaussian distributions. We study the Student-t distribution with
2.5 degrees of freedom, and the lognormal distribution with parameters (1, 1/2) since these are examples of heavy-tailed and
skewed distributions, respectively.
Although it would not be sensible to choose to use parametric
methods which assume the stream is Gaussian in cases where
this assumption is known to be violated, in practice this may
occur by accident, when assumptions formed about the data are mistaken.
Table 3. Average time taken to detect gradual drift in the mean and standard deviation of a Gaussian N(0, 1)
stream, for various change times τ and shift magnitudes δ. Standard deviations are given in brackets

(a) Drifting Gaussian mean

                       SS-CUSUM                                        CPM-LP
δ       τ = 10          τ = 50          τ = 300        τ = 10          τ = 50          τ = 300
0.50    122.1 (61.1)    106.2 (55.7)    88.7 (39.1)    134.2 (69.9)    121.4 (68.2)    102.9 (46.8)
1.00    81.3 (29.5)     65.1 (23.5)     56.5 (18.5)    92.7 (40.1)     76.3 (28.9)     63.3 (22.3)
1.50    61.0 (16.8)     49.5 (16.1)     43.8 (13.1)    69.7 (24.2)     57.6 (20.0)     49.3 (15.8)
2.00    52.1 (13.4)     40.2 (12.4)     36.8 (10.8)    58.5 (18.7)     47.5 (15.9)     40.6 (12.9)

(b) Drifting Gaussian standard deviation

                       SS-CUSUM                                        CPM-LP
δ       τ = 10          τ = 50          τ = 300        τ = 10          τ = 50          τ = 300
1.50    116.0 (58.4)    93.1 (51.6)     73.4 (32.6)    113.6 (68.9)    103.4 (62.6)    85.5 (43.1)
2.00    83.1 (39.2)     59.9 (24.7)     49.7 (19.0)    93.6 (57.4)     73.6 (41.6)     55.8 (25.1)
2.50    67.5 (27.0)     48.2 (19.0)     39.4 (14.7)    80.0 (45.9)     55.3 (27.6)     43.5 (18.9)
3.00    56.1 (21.1)     39.9 (15.1)     34.1 (11.1)    69.1 (36.3)     47.8 (23.2)     36.9 (14.9)
We therefore investigate how the violation of Gaussianity affects the realized ARL0 of the t and F CPMs compared to the specified value. Since the ht thresholds associated with the nonparametric change point models are distribution-free, the values given in Table 1 will give the required ARL0
regardless of the underlying distribution. This is not the case
with either the t or F models, since their thresholds were
computed under the assumption of Gaussianity (Hawkins and
Zamba 2005). Table 4 shows the actual ARL0 values obtained
by these models on the Student-t(2.5) and lognormal(1, 1/2)
distributions, and it can be seen that they deviate significantly
from the target values. Since controlling the rate of false alarms
is highly important in change detection problems, these large
deviations from the desired ARL0 values imply that the parametric Gaussian tests are unsuitable in situations where deviations from Gaussianity may be present.
Table 5(a) and (b) shows the average detection delays for detecting location and scale shifts in the Student-t and lognormal
distributions, respectively. In all cases, the observations have
been standardized so that they have mean 0 and standard deviation 1 before the change point. The change then consists of
either the mean or standard deviation changing to δ. These results broadly follow the same patterns as those for the Gaussian
distribution, with the LP CPM being slightly slower to detect
most changes than the dedicated MW and Mood CPMs, with
the advantage of the LP model of course being that it is able to
detect both location and scale changes.
Table 4. The empirical ARL0 observed for the t and F change point
models on the Student-t(2.5) and lognormal(1, 1/2) distributions

Model        Target ARL0    Student-t(2.5)    lognormal(1, 1/2)
Student-t    50             32                29
             200            83                78
             500            113               94
F            50             39                33
             200            97                91
             500            134               127
4.4 Diagnostics
One issue which arises when using the combined location-scale Lepage CPM is that of post-signal diagnostics: given that
the detector has signaled that a change has occurred, can we
determine whether the change represents a shift in location, or
a shift in scale?
Suppose that a change is detected at time t. The CPM gives
an estimate τ̂ ≤ t of the change point corresponding to the value of k which maximized Nk,t. The stream can then be partitioned into the subsets {X1, . . . , Xτ̂−1} and {Xτ̂, . . . , Xt}. A two-sample Mann–Whitney test and a Mood test can then be performed
on these two subsets, and the p-values compared. If the MW
test gives the lower p-value, the change is most likely to constitute a location shift, while if the Mood test gives the lower
p-value, then it is likely to be a scale shift.
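This diagnostic is straightforward to implement with standard two-sample tests, for example with scipy.stats.mannwhitneyu and scipy.stats.mood (a sketch; the split at τ̂ follows the description above):

    from scipy.stats import mannwhitneyu, mood

    def diagnose_change(xs, tau_hat):
        """Label a detected change as a location or a scale shift via p-values."""
        before, after = xs[:tau_hat], xs[tau_hat:]
        p_location = mannwhitneyu(before, after, alternative='two-sided').pvalue
        _, p_scale = mood(before, after)     # (statistic, p-value)
        return 'location' if p_location < p_scale else 'scale'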
To test this idea we simulated 10,000 streams from the
three distributions considered above, containing various sized
changes in either the location or scale. For each stream, the Lepage CPM was used to detect change, and the above procedure
was deployed to predict the type of change that occurred. For
each type of change, we then found the probability of it being correctly identified. Table 6 summarizes the results. It can
be seen that this diagnostic method gives a good prediction of
which type of change caused the CPM to flag.
When the stream undergoes a location shift, the MW test will
almost always have a lower p-value than the Mood test, causing
the change type to be correctly identified. There is more risk
of incorrect identification when the change magnitude is small.
However, when the stream undergoes a scale shift, the proportion of correctly identified changes is slightly lower. This is because, although the MW test is intended to be used to detect
whether two samples have equal location parameters, it is also
slightly sensitive to differences in scales. Our findings are consistent with the results reported for joint monitoring of mean
and standard deviation in a parametric context. For example,
Hawkins and Zamba (2005) also found that when using combined parametric mean and standard deviation change detectors to monitor a Gaussian distribution, the detector which was meant to be monitoring the mean was sometimes faster to detect a change in the standard deviation than the detector intended for that purpose. They also claimed this is a feature of all methods designed for detecting location shifts.
Table 5. Average time taken to detect shifts in the location and scale of Student-t(2.5) and
lognormal(1, 1/2) streams. In both cases the pre-change distributions are scaled to have
mean 0 and standard deviation 1. The post-change distributions then have either mean δ
or standard deviation δ. The standard deviation of the detection time is given in brackets

(a) Nonparametric location

                      Student-t(2.5)                   lognormal(1, 1/2)
τ         δ     CPM-MW          CPM-LP           CPM-MW          CPM-LP
τ = 50    0.5   22.7 (41.6)     39.2 (93.3)      86.1 (46.3)     84.2 (52.4)
          1.0   7.0 (3.1)       7.3 (12.3)       11.4 (7.1)      16.5 (9.3)
          1.5   5.1 (1.6)       6.4 (3.6)        6.5 (2.3)       8.2 (4.1)
          2.0   4.5 (1.1)       3.4 (1.8)        5.4 (1.3)       5.2 (2.2)
τ = 300   0.5   14.3 (8.3)      18.0 (28.2)      26.1 (15.2)     36.1 (19.5)
          1.0   6.2 (2.4)       5.5 (5.5)        9.2 (3.4)       12.5 (5.6)
          1.5   4.6 (1.3)       5.1 (2.4)        5.9 (1.5)       6.3 (2.4)
          2.0   4.1 (0.9)       3.1 (1.1)        4.8 (0.9)       4.2 (1.1)

(b) Nonparametric scale

                      Student-t(2.5)                   lognormal(1, 1/2)
τ         δ     CPM-Mood        CPM-LP           CPM-Mood        CPM-LP
τ = 50    1.5   205.7 (314.4)   251.6 (343.0)    63.2 (136.3)    87.6 (190.1)
          2.0   50.3 (117.9)    84.4 (184.6)     14.1 (16.0)     17.0 (20.2)
          3.0   12.1 (10.8)     16.0 (21.3)      7.0 (4.5)       8.0 (5.3)
          0.5   79.5 (157.0)    140.6 (245.5)    29.2 (39.1)     40.4 (60.4)
          0.3   20.7 (13.4)     29.5 (28.4)      15.6 (6.3)      19.4 (5.9)
τ = 300   1.5   49.5 (50.9)     63.4 (71.7)      20.1 (17.3)     24.4 (21.0)
          2.0   18.3 (14.1)     22.4 (17.5)      9.5 (7.9)       11.3 (7.9)
          3.0   8.6 (5.3)       10.3 (6.6)       5.6 (3.2)       6.7 (3.8)
          0.5   31.8 (16.4)     45.9 (20.5)      21.1 (8.3)      27.5 (7.7)
          0.3   16.8 (6.0)      24.0 (7.2)       14.5 (4.4)      18.7 (3.5)
In summary, we give the following recommendations: if the
Lepage CPM signals that a change has occurred and the Mood
test is found to have a lower p-value, then we can conclude
with a high degree of certainty that the signal was caused by
a shift in scale parameter, since Table 6 shows that the Mood
test rarely signals when a location shift has occurred. However,
if the Mann–Whitney test has a lower p-value, then the signal
was more likely to have been caused by a location shift, although there is slightly more uncertainty here since the Mann–
Whitney test is also sensitive to scale shifts. Although this is a
simple method, it gives a reasonably accurate diagnosis of what
was responsible for the flagged change. More sophisticated assessment is an avenue for future research.
4.5 The Impact of Stream Discretization
In the previous experiments, all data from the stream were
stored in memory to allow ranks to be calculated. The stream
discretization technique was introduced in Section 3.4 as an alternative method of computing ranks when this is infeasible.
We now investigate the impact that discretization has on performance. We use the Lepage CPM in the following, although
we found identical results for the individual MW and Mood
models. We first investigate a stream of N(0, 1) variables which
change to N(1, 1) after 500 observations. The stream is discretized into m ∈ {100, 200, 300} segments, and the window
size is w ∈ {50, 100, 200, 500}. For each value of w, we define the relative performance to be the ratio of the mean detection delay using a window of size w to the mean detection delay when no windowing is used.
Table 6. Probability of correctly diagnosing a change of size δ in either the location or
scale parameter by comparing the p-values of the Mann–Whitney and Mood tests

          Location shifts                         Scale shifts
δ      N(0, 1)   t(4)   LN(1, 1/2)    δ      N(0, 1)   t(4)   LN(1, 1/2)
0.25   0.86      0.88   0.83          1.25   0.75      0.68   0.74
0.50   0.92      0.91   0.96          1.50   0.81      0.78   0.83
1.00   0.94      0.92   0.96          1.75   0.87      0.82   0.88
1.50   0.94      0.92   0.97          2.00   0.87      0.86   0.89
2.00   0.96      0.93   0.97          3.00   0.91      0.92   0.94
3.00   0.97      0.96   0.97          4.00   0.94      0.92   0.94
Table 7. Relative performance when discretizing the stream into m
segments with window size w, compared to the non-discretized model

          w = 50   w = 100   w = 200   w = 500
m = 100   0.80     0.92      0.98      0.99
m = 200   0.81     0.93      0.98      0.99
m = 300   0.81     0.93      0.98      0.99
Table 7 shows the relative performance
for each choice of m and w compared to the non-discretized
version which stores all observations in memory.
It can be seen that unless w is low, discretization has very
little impact on performance. Because the histogramming provides a summary of points which are too old to fall in the window, the window size generally only affects how far back in
time changes can be detected. Therefore as long as the change
is sufficiently large to be detected within 500 points of occurring, there should be no performance decrease from windowing.
However, if it is suspected that the change will be small enough
that it will take a very long time to detect, then a larger window
will need to be chosen.
The results for the Student-t(2.5) and lognormal(1, 1/2) distributions are similar, but omitted for space reasons. Again, unless the window size is set very low (≤50), there is little impact
on performance.
4.6 Real Data Application
We now give an example of our algorithm being used to detect changes in a high-volume financial data stream. Although
the analysis of financial data is often quite sophisticated, we
provide this example to demonstrate the capabilities of our algorithm.
We obtained a historical sequence of the exchange rate between the Swiss Franc (CHF) and the British Pound (GBP).
The maximum value of the exchange rate was recorded during
consecutive 5-minute intervals running from October 21st 2002
to May 15th 2007. In total, 333,758 observations xt were made,
and we treat them as being a data stream where observations are
received and processed sequentially. Note that although this is
an example of sequential monitoring, it differs from those typically investigated in the quality control literature, which usually
assume that we have some control over the process being monitored, and can (e.g.) stop the process to further investigate and
diagnose any detected shifts. This is not true in the case of financial data; rather, signaling for a change may eventually prompt
some trading action which might indirectly affect the stream.
We have chosen this example to illustrate how our change detection techniques can be deployed in broader situations than
those typical in quality control, where large data streams are
more likely to be encountered.
A plot of the financial stream is shown in Figure 1(a), and
appears to follow a random walk. To remove this nonstationarity we instead consider the log-returns, defined as ∇xt = log(xt) − log(xt−1). These are plotted in Figure 1(b), and appear stationary with mean 0. However, these
first differences have very heavy tails with a kurtosis of 4.52,
suggesting that they are non-Gaussian. A nonparametric change
detector hence seems an appropriate tool to use for analysis. We
used the Lepage CPM to monitor for changes in both location
and scale in the differenced observations. Due to the length of
the data, the ARL0 was set to 200,000 in order to avoid a large
number of false alarms being generated. Because this stream is
likely to contain multiple change points, we restarted the CPM from scratch whenever a change was detected, discarding all the points stored in the window and reinitializing with the observation immediately following the flagged change.
Our algorithm processed the stream sequentially and detected a total of four change points. Using the diagnostic
method outlined in Section 4.4, we concluded that all four correspond to scale shifts. We have superimposed these change
points on Figure 1(a) and (b). It is not obvious from these
plots that the discovered change points correspond to real scale
shifts, so to investigate further, we computed the exponentially
weighted average of the stream variance, defined as EWMAt = λEWMAt−1 + (1 − λ)(∇xt)². This allows a local estimate of
the variance to be formed. We have plotted this EWMA in Figure 1(c), with λ = 0.999. It can be seen that the stream variance is undergoing gradual drift, and that our discovered change
points correspond to an abrupt increase in the variance around
the 30,000th and 265,000th observations, and a shift in the
direction of the drift at the 115,000th observation. The final
change point at the 150,000th observation does not correspond
to any obvious feature of the variance drift process, and may
perhaps be considered a false positive.
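The EWMA variance estimate used above is a one-line recursion (Python sketch; the initialization with the first squared log-return is our choice):

    import numpy as np

    def ewma_variance(diffs, lam=0.999):
        """EWMA_t = lam * EWMA_{t-1} + (1 - lam) * diffs_t ** 2."""
        out = np.empty(len(diffs))
        ewma = diffs[0] ** 2
        for t, d in enumerate(diffs):
            ewma = lam * ewma + (1 - lam) * d ** 2
            out[t] = ewma
        return out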
5. CONCLUSIONS
We introduced a nonparametric change point model for detecting joint changes in the location and scale parameters of a
data stream. These models provide a way to use classical rank-based hypothesis tests in a streaming context. Through discretization, we gave a method for computing the ranks in a computationally efficient manner. Experiments with synthetic data
showed that our detectors gave good performance compared
with the optimal Student-t and F tests for detecting changes
in a Gaussian distribution, while also being suitable when the
Gaussianity assumption is violated. Finally, we showed that discretizing the stream does not significantly affect performance,
so this approach can be used to implement our models efficiently. We conclude that the change point models are suitable
to be deployed in a streaming context where computational efficiency is important, and the amount of available memory is
low.
ACKNOWLEDGMENTS
This research was undertaken as part of the ALADDIN (Autonomous Learning Agents for Decentralised Data and Information Systems) project and is jointly funded by a BAE Systems and EPSRC (Engineering and Physical Sciences Research Council)
strategic partnership, under EPSRC grant EP/C548051/1.
[Received May 2010. Revised June 2011.]
[Figure 1. The foreign exchange data, their first differences, and the EWMA of the squared first differences, all with the detected scale change points superimposed: (a) original sequence; (b) first differences; (c) EWMA estimate of variance.]
REFERENCES

Basseville, M., and Nikiforov, I. V. (1993), Detection of Abrupt Changes: Theory and Application, Englewood Cliffs, NJ: Prentice Hall.
Bhattacharya, P., and Frierson, D. (1981), "A Nonparametric Control Chart for Detecting Small Disorders," The Annals of Statistics, 9, 544–554.
Carlstein, E. (1988), "Nonparametric Change-Point Estimation," The Annals of Statistics, 16, 188–197.
Chakraborti, S., and van de Wiel, M. A. (2008), "A Nonparametric Control Chart Based on the Mann–Whitney Statistic," in Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, Beachwood, OH: Institute of Mathematical Statistics, pp. 156–172.
Chan, L., Hapuarachchi, K., and Macpherson, B. (1988), "Robustness of X̄ and R Charts," IEEE Transactions on Reliability, 37, 117–123.
Domingos, P., and Hulten, G. (2003), "A General Framework for Mining Massive Data Streams," Journal of Computational and Graphical Statistics, 12, 945–949.
Duran, B. (1976), "Survey of Nonparametric Tests for Scale," Communications in Statistics—Theory and Methods, 5, 1287–1312.
Fu, T., Chung, F., Ng, V., and Luk, R. (2001), "Evolutionary Segmentation of Financial Time Series Into Subsequences," in Proceedings of the 2001 Congress on Evolutionary Computation, Seoul, Korea: IEEE Press, pp. 426–430.
Gibbons, J. D. (1985), Nonparametric Statistical Inference, Boca Raton, FL: McGraw-Hill.
Gordon, L., and Pollak, M. (1994), "An Efficient Sequential Nonparametric Scheme for Detecting a Change of Distribution," The Annals of Statistics, 22, 763–804.
Gustafsson, F. (2000), Adaptive Filtering and Change Detection, Chichester, West Sussex, England: Wiley.
Hackl, P., and Ledolter, J. (1991), "A Control Chart Based on Ranks," Journal of Quality Technology, 23, 117–124.
Hawkins, D., and Zamba, K. (2005), "A Change-Point Model for a Shift in Variance," Journal of Quality Technology, 37, 21–31.
Hawkins, D., Qiu, P., and Kang, C. (2003), "The Changepoint Model for Statistical Process Control," Journal of Quality Technology, 35, 355–366.
Hawkins, D. M. (1987), "Self-Starting CUSUM Charts for Location and Scale," The Statistician, 36, 299–315.
Hawkins, D. M., and Deng, Q. (2010), "A Nonparametric Change-Point Control Chart," Journal of Quality Technology, 42, 165–173.
Jensen, W. A., Jones-Farmer, L. A., Champ, C. W., and Woodall, W. H. (2006), "Effects of Parameter Estimation on Control Chart Properties: A Literature Review," Journal of Quality Technology, 38, 349–364.
Jones, L., and Woodall, W. (1998), "The Performance of Bootstrap Control Charts," Journal of Quality Technology, 30, 362–375.
Lepage, Y. (1971), "A Combination of Wilcoxon's and Ansari–Bradley's Statistics," Biometrika, 58, 213–217.
Mielke, P. W., Jr. (1967), "Note on Some Squared Rank Tests With Existing Ties," Technometrics, 9, 312–314.
Mood, A. (1954), "On the Asymptotic Efficiency of Certain Nonparametric Two-Sample Tests," The Annals of Mathematical Statistics, 25, 514–533.
Pettitt, A. N. (1979), "A Non-Parametric Approach to the Change-Point Problem," Journal of the Royal Statistical Society, Ser. C, 28, 126–135.
Ramoni, M., Sebastiani, P., and Cohen, P. (2002), "Bayesian Clustering by Dynamics," Machine Learning, 47, 91–121.
Stapnes, S. (2007), "Detector Challenges at the LHC," Nature, 448, 290–296.
Zhou, C., Zou, C., Zhang, Y., and Wang, Z. (2009), "Nonparametric Control Chart Based on Change-Point Model," Statistical Papers, 50, 13–28.