Testing Exchangeability Online (Vovk et al., ICML 2003)

A Martingale Framework for Concept Change
Detection in Time-Varying Data Streams
Ho Shen-Shyang
Department of Computer Science
George Mason University
Preview:
Problem: In a data streaming setting, data points are
observed one by one. The concepts to be learned from
the data stream may change infinitely often.
● How do we detect the changes efficiently?
● Other Topics: Concept Drift, Anomaly Detection, ...
● Testing Exchangeability Online (Vovk et al., ICML 2003)
Outline:
● Background: Strangeness, Martingale, Exchangeability
● Martingale Framework - Two Tests
● Theoretical Justifications
● Additional Theoretical Results
● Experimental Results
Strangeness Measure
(Saunders et al., IJCAI 1999)
α: a score of how different a data point is from the rest.
● Support Vector Machine: value of the Lagrange multiplier
or distance from the hyperplane
(we use the SVM Lagrange multiplier,
computed with an incremental SVM (Cauwenberghs and Poggio, NIPS 2000))
● K-nearest-neighbor rule: A/B where
A – sum of the distances from a point to
the k nearest points with the same label
B – sum of the distances from a point to
the k nearest points with a different label
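The k-nearest-neighbor strangeness A/B can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and NumPy arrays; the function name `knn_strangeness` and the default `k=3` are hypothetical choices, not taken from the slides.

```python
import numpy as np

def knn_strangeness(X, y, idx, k=3):
    """Strangeness of point `idx` under the k-NN rule: A / B.

    A = sum of distances to the k nearest points with the same label,
    B = sum of distances to the k nearest points with a different label.
    Euclidean distance is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dists = np.linalg.norm(X - X[idx], axis=1)
    others = np.arange(len(X)) != idx          # exclude the point itself
    same = np.sort(dists[others & (y == y[idx])])[:k]
    diff = np.sort(dists[others & (y != y[idx])])[:k]
    # Note: B is zero if the point coincides with its nearest
    # differently-labelled neighbours; a real implementation
    # would need to guard against that.
    return same.sum() / diff.sum()
```

A point that sits among points of the other class gets a large A and a small B, hence a large strangeness value.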
Testing Exchangeability:
Definitions
Let { Z_i : 1 ≤ i < ∞ } be a sequence of random variables (r.v.).
A finite sequence of r.v. Z_1, ..., Z_n is exchangeable
if the joint distribution p(Z_1, ..., Z_n) is invariant
under any permutation of the indices of the r.v.
A martingale is a sequence of r.v. { M_i : 0 ≤ i < ∞ }
such that M_n is a measurable function of Z_1, ..., Z_n for
all n = 0, 1, ... (M_0 is a constant, say 1) and the
conditional expectation of M_{n+1} given M_1, ..., M_n is equal
to M_n, i.e. E(M_{n+1} | M_1, ..., M_n) = M_n.
Testing Exchangeability
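(Vovk et al., ICML 2003)
The p-value of the new point z_n, given the points Z seen so far, is
p_n = V(Z \cup \{z_n\}, \theta_n) = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n}
where α_i is the strangeness of z_i and θ_n is drawn uniformly at random from [0,1].
The martingale is
M_n(\varepsilon) = \prod_{i=1}^{n} \varepsilon \, p_i^{\varepsilon - 1}
where ε in [0,1] (say 0.92) and M_0 = 1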
Performing Kolmogorov-Smirnov Test
on the p-value distribution as
data is observed one by one.
Skewed p-value distribution: small p-values inflate the martingale values
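The p-value and martingale updates can be sketched as follows. This is a minimal illustration, assuming the randomised p-value counts strangeness values greater than, and (weighted by θ) equal to, that of the newest point, and that the martingale multiplies by ε·p^(ε−1) at each step. The function names are hypothetical.

```python
import random

def p_value(alphas, theta):
    """Randomised p-value of the newest strangeness value alphas[-1].

    `alphas` holds the strangeness of every point seen so far,
    including the new one; `theta` is a draw from [0, 1].
    """
    a_n = alphas[-1]
    greater = sum(a > a_n for a in alphas)
    equal = sum(a == a_n for a in alphas)   # includes the new point itself
    return (greater + theta * equal) / len(alphas)

def power_martingale(p_values, eps=0.92):
    """Return M_1, ..., M_n with M_0 = 1 and M_n = M_{n-1} * eps * p_n**(eps - 1)."""
    m, values = 1.0, []
    for p in p_values:
        m *= eps * p ** (eps - 1.0)
        values.append(m)
    return values

def draw_theta(rng=random):
    # 1 - random() lies in (0, 1], which keeps the p-value strictly positive.
    return 1.0 - rng.random()
```

With ε = 0.92, a p-value of 1 shrinks the martingale by a factor 0.92, while a small p-value such as 0.01 multiplies it by more than 1, which is how a run of small p-values inflates the martingale.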
Martingale Framework:
Test for Change Detection
Consider the simple null hypothesis
H0: “no concept change in the data stream”
against the alternative hypothesis
H1: “concept change occurs in the data stream”
Martingale Test 1 (MT1)
Keep observing the stream while
0 < M_n(ε) < λ
where λ is a positive number. One rejects
the null hypothesis when M_n(ε) ≥ λ.
Martingale Test 2 (MT2)
Keep observing the stream while
0 < | M_n(ε) - M_{n-1}(ε) | < t
where t is a positive number. One rejects
the null hypothesis when | M_n(ε) - M_{n-1}(ε) | ≥ t.
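Both tests can be run in a single pass over a stream of p-values. The sketch below is an illustration under stated assumptions: the martingale update M_n = M_{n-1}·ε·p_n^(ε−1) with M_0 = 1, a hypothetical function name `first_detection`, and illustrative default thresholds. It stops at the first rejection and does not model what happens afterwards.

```python
def first_detection(p_values, eps=0.92, lam=20.0, t=None):
    """Return (n, test_name) for the first rejection of H0, or None.

    MT1 rejects when M_n >= lam.
    MT2 (only if t is given) rejects when |M_n - M_{n-1}| >= t.
    """
    m_prev = 1.0                                 # M_0 = 1
    for n, p in enumerate(p_values, start=1):
        m = m_prev * eps * p ** (eps - 1.0)      # martingale update
        if m >= lam:
            return n, "MT1"
        if t is not None and abs(m - m_prev) >= t:
            return n, "MT2"
        m_prev = m
    return None
```

For example, a stream whose p-values stay near 0.5 never triggers either test, while a stream that switches to very small p-values triggers a rejection a few points after the switch.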
Justification for Martingale Test 1:
Doob's Maximal Inequality
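Assuming that { M_i : 0 ≤ i < ∞ } is a nonnegative martingale,
Doob's Maximal Inequality states that for any λ > 0 and
0 ≤ n < ∞,
\lambda \, P\left( \max_{k \le n} M_k \ge \lambda \right) \le E(M_n)
Hence, if E(M_n) = E(M_0) = 1, then
P\left( \max_{k \le n} M_k \ge \lambda \right) \le \frac{1}{\lambda}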
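Doob's inequality bounds the probability that MT1 raises a false alarm by 1/λ. The simulation below is a rough check of that bound, assuming that under the null hypothesis the p-values are independent and uniform on (0, 1] and that the martingale is updated as M_n = M_{n-1}·ε·p_n^(ε−1). The function name, run counts and seed are illustrative choices.

```python
import random

def false_alarm_rate(lam=10.0, eps=0.92, n_steps=500, n_runs=2000, seed=0):
    """Fraction of null-hypothesis runs in which max_k M_k >= lam."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(n_runs):
        m = 1.0
        for _ in range(n_steps):
            p = 1.0 - rng.random()          # uniform on (0, 1], avoids p == 0
            m *= eps * p ** (eps - 1.0)
            if m >= lam:                    # MT1 would reject here
                alarms += 1
                break
    return alarms / n_runs
```

Comparing `false_alarm_rate(lam=10.0)` with `1 / 10.0` shows how the choice of λ controls the false alarm rate: for instance, λ = 20 keeps it at or below 0.05.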
Justification for Martingale Test 2
Hoeffding-Azuma Inequality
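Let c_1, ..., c_m be positive constants and let Y_1, ..., Y_m be
a martingale difference sequence with |Y_k| ≤ c_k for each k.
Then for any t ≥ 0,
P\left( \left| \sum_{k=1}^{m} Y_k \right| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^{m} c_k^2} \right)
At each n, the martingale difference is bounded, and it is largest
when p_n = 1/n for the deterministic martingale (θ_n = 1 for all n).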
Justification for Martingale Test 2:
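When m = 1, the Hoeffding-Azuma Inequality becomes
P\left( |Y_1| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 c_1^2} \right)
Assuming that M_{n-1}(ε) = M_0(ε) = 1, substituting p_n = 1/n
gives c_1 = ε n^{1-ε} - 1, so
P\left( |M_n(\varepsilon) - M_{n-1}(\varepsilon)| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 (\varepsilon n^{1-\varepsilon} - 1)^2} \right)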
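The m = 1 bound can be turned into a threshold t for MT2. The sketch below assumes the largest martingale difference is c = ε·n^(1−ε) − 1, obtained by setting M_{n-1} = 1 and p_n = 1/n in M_n − M_{n-1} = M_{n-1}(ε·p_n^(ε−1) − 1). The function names and the target level `delta` are hypothetical.

```python
import math

def max_difference(n, eps=0.92):
    """Largest martingale difference at step n, assuming M_{n-1} = 1 and p_n = 1/n."""
    return eps * n ** (1.0 - eps) - 1.0

def azuma_bound(t, n, eps=0.92):
    """Upper bound 2 * exp(-t^2 / (2 c^2)) on P(|M_n - M_{n-1}| >= t), capped at 1."""
    c = max_difference(n, eps)
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * c * c)))

def threshold_for(delta, n, eps=0.92):
    """Smallest t for which the bound equals delta (solve the bound for t)."""
    c = abs(max_difference(n, eps))
    return c * math.sqrt(2.0 * math.log(2.0 / delta))
```

For a given n and ε, `threshold_for(0.05, n)` returns the t at which the bound equals 0.05, and larger t gives a smaller bound.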
Comparison:
Some Theoretical Results for Martingale
Test 1 (Ho & Wechsler, UAI 2005)
● The martingale test based on Doob's Inequality is
an approximation of the sequential probability ratio test.
● α is the desired size (type I error) and
β is the probability of the type II error.
● The mean delay time from the true change point is:
where
Experiments
Precision = Number of Correct Detections / Number of Detections
Recall = Number of Correct Detections / Number of True Changes
Precision: Probability that a detection is actually correct
Recall: Probability that the system recognizes a true change
Delay time (for a detected change): the number of time units
from a true change point to the detected change point, if any
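The three measures can be computed from the list of true change points and the list of detection times. The matching rule in this sketch is an assumption, not taken from the slides: a detection counts as correct only if it is the first detection at or after a true change point and before the next one. The function name `evaluate` is hypothetical.

```python
from bisect import bisect_right

def evaluate(true_changes, detections):
    """Return (precision, recall, delays) under the matching rule described above."""
    true_changes = sorted(true_changes)
    delays = {}                                   # true change -> delay of first detection
    for d in sorted(detections):
        i = bisect_right(true_changes, d) - 1     # latest true change at or before d
        if i >= 0 and true_changes[i] not in delays:
            delays[true_changes[i]] = d - true_changes[i]
    correct = len(delays)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    return precision, recall, sorted(delays.values())
```

With true changes at 100, 200 and 300 and detections at 50, 105, 120 and 310, two detections are correct (105 and 310), so precision is 2/4, recall is 2/3, and the delays are 5 and 10 time units.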
Experimental Results: Synthetic Data
Stream with noise (10-D Rotating
Hyperplane) – Precision and Recall
Experimental Results: Synthetic Data
Stream – Mean and Median Delay Time
Experimental Results:
Numerical (WaveNorm & TwoNorm) and
Categorical data streams (Nursery)
Experimental Results: Multi-class data
streams (Modified USPS data-set)
Dataset: 10 classes, 256 dimensions, 7291 data points
Data stream: 3 classes.
Conclusions:
Our martingale approach is an efficient, one-pass
incremental algorithm that
● Does not require a sliding window on the data stream
● Does not require monitoring the performance of a
base classifier as data is streaming
● Works well for high-dimensional, multi-class data streams
● Is theoretically justified
Conclusions/Future (Current) Work:
● Previous work: Kifer et al. (VLDB 2004),
Fan et al. (SDM 2004), Wald (1947), Page (1957), ...
● Extension to unlabeled and one-class data streams
● Applications: Keyframe Extraction, Anomaly Detection,
Adaptive Classifier (Ho and Wechsler, IJCAI 2005)
● Comparison using different classifiers (i.e. different
strangeness measures, including weak classifiers)
● Comparison with other change detection algorithms
● http://cs.gmu.edu/~sho/research/change_detection.html
Acknowledgement: Vladimir Vovk, Harry Wechsler.