A Martingale Framework for Concept Change Detection in Time-Varying Data Stream

Ho Shen-Shyang
Department of Computer Science
George Mason University

Preview

● Problem: In a data streaming setting, data points are observed one by one. The concepts to be learned from the data stream may change infinitely often.
● How do we detect the changes efficiently?
● Other topics: Concept Drift, Anomaly Detection, ...
● Testing Exchangeability Online (Vovk et al., ICML 2003)

Outline

● Background: Strangeness, Martingale, Exchangeability
● Martingale Framework - Two Tests
● Theoretical Justifications
● Additional Theoretical Results
● Experimental Results

Strangeness Measure (Saunders et al., IJCAI 1999)

α: a score of how different a data point is from the rest.

● Support Vector Machine: the value of the Lagrange multiplier, or the distance from the hyperplane (we use the SVM Lagrange multiplier, computed with an incremental SVM (Cauwenberghs and Poggio, NIPS 2000))
● K-nearest-neighbor rule: A/B, where
  A = sum of the distances from a point to its k nearest points with the same label
  B = sum of the distances from a point to its k nearest points with a different label

Testing Exchangeability: Definitions

Let { Zi : 1 ≤ i < ∞ } be a sequence of random variables. A finite sequence of random variables Z1, ..., Zn is exchangeable if the joint distribution p(Z1, ..., Zn) is invariant under any permutation of the indices of the random variables.

A martingale is a sequence of random variables { Mi : 0 ≤ i < ∞ } such that Mn is a measurable function of Z1, ..., Zn for all n = 0, 1, ... (M0 is a constant, say 1) and the conditional expectation of Mn+1 given M1, ..., Mn is equal to Mn, i.e.

$$E(M_{n+1} \mid M_1, \ldots, M_n) = M_n$$

Testing Exchangeability (Vovk et al., ICML 2003)

The p-value of the n-th data point is computed from the strangeness values α1, ..., αn:

$$p_n = V(Z \cup \{z_n\}, \theta_n) = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n}$$

where θn is drawn uniformly at random from [0,1]. The randomized power martingale is

$$M_n(\varepsilon) = \prod_{i=1}^{n} \left( \varepsilon \, p_i^{\varepsilon - 1} \right)$$

where ε in [0,1] (say 0.92) and M0 = 1.

Performing the Kolmogorov-Smirnov test on the p-value distribution as data points are observed one by one.
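The p-value and martingale computations on the Testing Exchangeability slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the function names `p_value` and `power_martingale`, the optional `theta` argument, and the `floor` guard against a zero p-value are all hypothetical, and the strangeness values are taken as given rather than computed from an SVM or nearest-neighbor rule.

```python
import random


def p_value(strangeness, theta=None):
    """Randomised p-value of the newest point.

    strangeness: strangeness values alpha_1..alpha_n, newest last.
    theta: tie-breaking value in [0, 1]; drawn uniformly when omitted.
    """
    if theta is None:
        theta = random.random()
    a_n = strangeness[-1]
    greater = sum(1 for a in strangeness if a > a_n)
    equal = sum(1 for a in strangeness if a == a_n)  # counts the new point itself
    return (greater + theta * equal) / len(strangeness)


def power_martingale(p_values, epsilon=0.92, floor=1e-12):
    """Return [M_1, ..., M_n], with M_0 = 1 and M_n = M_{n-1} * eps * p_n**(eps - 1)."""
    m = 1.0
    values = []
    for p in p_values:
        # Assumption: clamp p away from 0, where p**(eps - 1) is undefined.
        p = max(p, floor)
        m *= epsilon * p ** (epsilon - 1.0)
        values.append(m)
    return values
```

With ε = 0.92, a p-value of 1 multiplies the martingale by 0.92, while a p-value of 0.01 multiplies it by a factor greater than 1, so a run of small p-values makes the martingale grow.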
Skewed p-value distribution: small p-values inflate the martingale values.

Martingale Framework: Test for Change Detection

Consider the simple null hypothesis
H0: "no concept change in the data stream"
against the alternative hypothesis
H1: "concept change occurs in the data stream"

Martingale Test 1 (MT1)
The null hypothesis is retained while 0 < Mn(ε) < λ, where λ is a positive number. One rejects the null hypothesis when Mn(ε) ≥ λ.

Martingale Test 2 (MT2)
The null hypothesis is retained while 0 < | Mn(ε) - Mn-1(ε) | < t, where t is a positive number. One rejects the null hypothesis when | Mn(ε) - Mn-1(ε) | ≥ t.

Justification for Martingale Test 1: Doob's Maximal Inequality

Assuming that { Mi : 0 ≤ i < ∞ } is a nonnegative martingale, Doob's Maximal Inequality states that for any λ > 0 and 0 ≤ n < ∞,

$$\lambda \, P\left( \max_{k \le n} M_k \ge \lambda \right) \le E(M_n)$$

Hence, if E(Mn) = E(M0) = 1, then

$$P\left( \max_{k \le n} M_k \ge \lambda \right) \le \frac{1}{\lambda}$$

Justification for Martingale Test 2: Hoeffding-Azuma Inequality

Let c1, ..., cm be positive constants and let Y1, ..., Ym be a martingale difference sequence with |Yk| ≤ ck for each k. Then for any t ≥ 0,

$$P\left( \left| \sum_{k=1}^{m} Y_k \right| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^{m} c_k^2} \right)$$

At each n, the martingale difference is largest, and therefore bounded, when pn takes its smallest value 1/n, which occurs for the deterministic martingale (θn = 1 for all n).

When m = 1, the Hoeffding-Azuma Inequality becomes

$$P\left( |Y_1| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 c_1^2} \right)$$

Assuming that Mn-1(ε) = M0(ε) = 1, the bound c1 is the martingale difference at pn = 1/n, so

$$P\left( |M_n(\varepsilon) - M_{n-1}(\varepsilon)| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 \left( \varepsilon (1/n)^{\varepsilon - 1} - 1 \right)^2} \right)$$

Comparison:

Some Theoretical Results for Martingale Test 1 (Ho & Wechsler, UAI 2005)

● The martingale test based on Doob's Inequality is an approximation of the sequential probability ratio test.
● [equation not recovered], where α is the desired size (type I error) and β is the probability of a type II error
● The mean delay time from the true change point is: [equation not recovered]

Experiments

Precision = (Number of Correct Detections) / (Number of Detections)
Recall = (Number of Correct Detections) / (Number of True Changes)

Precision: probability that a detection is actually correct
Recall: probability that the system recognizes a true change
Delay time (for a detected change): the number of time units from a true change point to the detected change point, if any

Experimental Results: Synthetic Data Stream with noise (10-D Rotating Hyperplane) - Precision and Recall

Experimental Results: Synthetic Data Stream - Mean and Median Delay Time

Experimental Results: Numerical (WaveNorm & TwoNorm) and Categorical data streams (Nursery)

Experimental Results: Multi-class data streams (Modified USPS data-set)

Dataset: 10 classes, 256 dimensions, 7291 data points
Data stream: 3 classes.

Conclusions

Our martingale approach is an efficient, one-pass incremental algorithm that
● Does not require a sliding window on the data stream
● Does not require monitoring the performance of a base classifier as data is streaming
● Works well for high-dimensional, multi-class data streams
● Is theoretically justified

Conclusions/Future (Current) Work

● Previous works: Kifer et al. (VLDB 2004), Fan et al. (SDM 2004), Wald (1947), Page (1957), ...
● Extension to unlabeled and one-class data streams
● Applications: Keyframe Extraction, Anomaly Detection, Adaptive Classifier (Ho and Wechsler, IJCAI 2005)
● Comparison using different classifiers (i.e., different strangeness measures, including weak classifiers)
● Comparison with other change detection algorithms

http://cs.gmu.edu/~sho/research/change_detection.html

Acknowledgement: Vladimir Vovk, Harry Wechsler.
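The precision, recall, and delay-time definitions from the Experiments slide can be sketched as below. This is only an illustration: the slides do not say how a detection is matched to a true change point, so the matching rule here (the first detection at or after a true change and before the next true change counts as correct, and every other detection counts as a false alarm) is an assumption, as are the function name `evaluate` and the example time indices.

```python
from statistics import mean, median


def evaluate(true_changes, detections):
    """Return (precision, recall, delays) for one run.

    Assumption: a detection is correct only if it is the first detection
    at or after a true change point and before the next true change point.
    """
    true_changes = sorted(true_changes)
    detections = sorted(detections)
    delays = []
    for i, start in enumerate(true_changes):
        # The matching window ends at the next true change, or never for the last one.
        end = true_changes[i + 1] if i + 1 < len(true_changes) else float("inf")
        hits = [d for d in detections if start <= d < end]
        if hits:
            delays.append(hits[0] - start)
    correct = len(delays)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    return precision, recall, delays


# Hypothetical run: true changes at t = 100 and t = 200, detections at 110, 150, 230.
precision, recall, delays = evaluate([100, 200], [110, 150, 230])
mean_delay, median_delay = mean(delays), median(delays)
```

In this hypothetical run the detection at 150 is a false alarm under the assumed matching rule, so precision is 2/3, recall is 1, and the delays are 10 and 30 time units.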