GLS-SOD: A Generalized Local Statistical Approach for Spatial Outlier Detection
Feng Chen, Chang-Tien Lu, Arnold P. Boedihardjo
Virginia Tech, Computer Science Department
July 28, 2010
KDD 2010, Washington, DC, USA

Outline
  Motivations
  Generalized Local Statistical (GLS) Model
  GLS Robust Estimation & Inferences
  Simulations
  Summary

Applications

What's Special for Spatial Data?
  By the first law of geography, "Everything is related to everything else, but nearby things are more related than distant things" [Tobler 79]
  Spatial autocorrelation: correlation of a variable with itself through space

Global based Spatial Outlier Detection

Local based Spatial Outlier Detection
  Local (Laplacian) smoothing
  Swamping
  Masking

Why GLS-SOD?
  Global based
    Pros: high accuracy with statistical justifications
    Cons: very slow; complicated estimation process; nonconvex optimization
  Local based
    Pros: very fast; simplicity
    Cons: heuristic-driven; lack of statistical justifications
  Questions
    Local (Laplacian) smoothing vs. spatial dependence?
    Statistical connections between local and global methods?
    When will existing local based methods perform poorly, and how to handle these situations?
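
The local based approach above (smooth each value against its neighborhood, then flag large differences) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k-nearest-neighbor neighborhood, the equal weights, the 5 x 5 grid, and the helper names `knn_indices` and `local_z_scores` are all assumptions made for the example.

```python
import math
import statistics

def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]

def local_z_scores(points, values, k=4):
    """Standardized local differences: value minus the neighborhood average."""
    diffs = []
    for i in range(len(points)):
        nbrs = knn_indices(points, i, k)
        diffs.append(values[i] - sum(values[j] for j in nbrs) / k)
    mu = statistics.mean(diffs)
    sigma = statistics.stdev(diffs)
    return [(d - mu) / sigma for d in diffs]

# Hypothetical data: a 5 x 5 grid with a smooth linear trend
points = [(x, y) for x in range(5) for y in range(5)]
values = [0.5 * x + 0.2 * y for x, y in points]
values[12] += 10.0  # contaminate the center location (2, 2)

z = local_z_scores(points, values)
top = max(range(len(z)), key=lambda i: abs(z[i]))
print("most extreme local difference at", points[top])
```

The four neighbors of the contaminated location receive negative local differences, because the outlier inflates their neighborhood averages. That side effect is what makes swamping possible for purely local methods.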

Gaussian Random Field
  Large scale trend
  White noise variation
  Small scale variation

Generalized Local Statistical Model
  Generalized local statistical model (GLS) ≈ (see Theorem 3)
  Convolution effects

Generalized Least Squares

Forward / Backward Search
  GLS Backward Search
    Model estimation by generalized least squares
    Remove the most probable outlier and update all local differences
    Repeat until the p-values of all existing objects are greater than a threshold (e.g., 0.025)
  GLS Forward Search
    Estimation by a robust subset S of local differences
    Add test objects one by one to the training set S
    Check the change of the smallest p-value in S
    A large drop in the smallest p-value indicates an outlier
  Swamping
  Masking

GLS Z-Test Statistics

Connections with Existing Methods
  If FΣFᵀ = σ²I, then GLS-SOD is equivalent to Universal Kriging SOD.
    Local vs. global estimator for a spatial Gaussian random field.
    The key is which estimator is more robust.
  When FΣFᵀ = σ²I, FXβ = μI, and σ₀²FFᵀ = σ₀²I, then local based SOD is equivalent to GLS-SOD and Universal Kriging SOD.
    Local based SOD is a special case of GLS-SOD.

Experiments
  Standard Simulation Model for SOD
  864 different simulation settings. Six repetitions for each setting; consider average error.
  Existing statistical SOD methods only consider 10 to 15 simulation settings in their experiments.

Simulation Results

Summary
  Design of a generalized local statistical framework
  Robust estimation and outlier detection methods based on the proposed GLS framework
  In-depth study on the connection between different SOD methods
  Comprehensive simulations to validate the effectiveness and efficiency of GLS

Thank you!
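
As a supplementary sketch of the GLS backward search steps listed earlier (estimate, remove the most probable outlier, update all local differences, repeat until every p-value exceeds the threshold), the loop might look like the code below. Several pieces are assumptions for illustration and are not taken from the slides: the local difference operator is F = I - W with equal weights on k nearest neighbors, the covariance of the local differences is taken as σ²I (so generalized least squares reduces to ordinary least squares), σ is estimated by the median absolute deviation, and p-values are two-sided normal tail probabilities. The names `local_difference_operator` and `backward_search` are hypothetical.

```python
import math
import numpy as np

def local_difference_operator(coords, k):
    """F = I - W; row i of W puts weight 1/k on the k nearest neighbors of i."""
    n = len(coords)
    F = np.eye(n)
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        dist[i] = np.inf  # an object is not its own neighbor
        F[i, np.argsort(dist)[:k]] -= 1.0 / k
    return F

def backward_search(coords, z, X, k=4, alpha=0.025):
    """Remove the most probable outlier until all p-values exceed alpha."""
    active = np.arange(len(z))
    removed = []
    while len(active) > k + X.shape[1] + 1:
        # Update all local differences using only the remaining objects
        F = local_difference_operator(coords[active], k)
        d = F @ z[active]
        A = F @ X[active]
        # Assumed covariance sigma^2 I: GLS reduces to least squares here
        beta, *_ = np.linalg.lstsq(A, d, rcond=None)
        r = d - A @ beta
        # Robust scale via the median absolute deviation (an assumption)
        sigma = 1.4826 * np.median(np.abs(r - np.median(r)))
        # Two-sided normal p-value for each standardized residual
        p = np.array([math.erfc(abs(ri) / sigma / math.sqrt(2.0)) for ri in r])
        worst = int(np.argmin(p))
        if p[worst] > alpha:
            break
        removed.append(int(active[worst]))
        active = np.delete(active, worst)
    return removed

# Hypothetical data: linear trend plus white noise on a 10 x 10 grid
rng = np.random.default_rng(0)
coords = np.array([(x, y) for x in range(10) for y in range(10)], dtype=float)
X = np.column_stack([np.ones(len(coords)), coords])
z = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(0.0, 0.3, len(coords))
z[34] += 8.0  # injected contamination
z[77] -= 8.0  # injected contamination

found = backward_search(coords, z, X)
print("first two removals:", found[:2])
```

Because the local differences are recomputed after every removal, the neighbors of a removed outlier stop inheriting its influence. With a threshold such as 0.025 applied to about a hundred objects, a few clean objects may also be removed after the injected ones.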

Property of "FΣFᵀ" (Theorem 2)

Property of "σ₀²FFᵀ"
  Conclusion: When selecting a relatively large neighborhood size to do local smoothing, we can approximate "FFᵀ" as an identity matrix.

Connections with Existing Methods

Simulations
  Simulation Model and Settings
  864 different simulation settings. Six repetitions for each setting; consider average error.
  Exponential Model
  Contaminations
  Spherical Model

Simulation Results
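
The conclusion about "FFᵀ" above can be probed numerically. The sketch below assumes a particular form of the local difference operator, F = I - W with equal weights 1/k on the k nearest neighbors, and a hypothetical 12 x 12 grid of locations. Neither is specified on the slides. It reports only the largest entrywise deviation of FFᵀ from the identity, which is one of several possible notions of closeness.

```python
import numpy as np

def local_difference_operator(coords, k):
    """F = I - W; row i of W puts weight 1/k on the k nearest neighbors of i."""
    n = len(coords)
    F = np.eye(n)
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        dist[i] = np.inf  # an object is not its own neighbor
        F[i, np.argsort(dist)[:k]] -= 1.0 / k
    return F

# Hypothetical layout: 12 x 12 regular grid of locations
coords = np.array([(x, y) for x in range(12) for y in range(12)], dtype=float)

dev = {}
for k in (4, 8, 16, 32):
    F = local_difference_operator(coords, k)
    # Largest absolute entry of FF^T - I for this neighborhood size
    dev[k] = np.abs(F @ F.T - np.eye(len(coords))).max()
    print(f"k={k:2d}  max |FF^T - I| = {dev[k]:.4f}")
```

Under this equal-weight assumption every entry of FFᵀ - I is bounded in magnitude by 2/k, so the entrywise deviation shrinks as the neighborhood grows.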