IBM Research

Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection
Hanghang Tong and Ching-Yung Lin
SIAM-DM 2011, Mesa AZ, USA, April 28-30, 2011
© 2011 IBM Corporation

Large Graphs are Everywhere!
Examples: Terrorist Network [Krebs 2002], Food Web [2007], Internet Map [Koren 2009], Social Network [Newman 2005], Protein Network [Salthe 2004], Web Graph
– Q: How to find patterns? e.g., community, anomaly, etc.

Matrix Tool for Finding Graph Patterns
A Typical Procedure: Graph -> Adjacency Matrix A -> Low-rank matrices and Residual matrix
A = F x G + R
An Illustrative Example: low-rank matrices F, G -> community; residual matrix R -> anomalies

Improve Interpretation by Non-negativity
A Typical Procedure: Graph -> Adjacency Matrix A -> A = F x G + R
Interpretation by Non-negativity (An Example):
– Non-negative Matrix Factorization: F >= 0; G >= 0 (for community detection)
– Non-negative Residual Matrix Factorization: R(i,j) >= 0 for A(i,j) > 0 (for anomaly detection) [This Paper]

Anomaly Detection on Graphs
Social Networks
– 'Popularity contest'
Computer Networks
– Spammer, Port Scanner, Vulnerable Machines, etc.
Financial Transaction Networks
– Fraudulent transaction (e.g., money-laundering ring), scammer
Criminal Networks
– New criminal trend
Tele-communication Networks
– Tele-marketer
Key Observation: Abnormal Behavior -> Actual Activities

Optimization Formulation
General Case
– Objective: Weighted Frobenius Form, with a Weight matrix (Common in Any Matrix Factorization)
– Constraint: Non-negative residual (Unique in This Paper)

Optimization Formulation
0/1 Weight Matrix (Major Focus of the Paper)
– Objective with 0/1 weight (Common in Any Matrix Factorization)
– Constraint: Non-negative residual (Unique in This Paper)

Optimization Formulation with 0/1 Weight Matrix
NrMF with 0/1 Weight Matrix
Q: How to find 'optimal' F and G?
– D1: Quality; C1: non-convexity of the optimization objective
– D2: Scalability; C2: large size of the graph

Optimization Method: Batch Mode
Basic Idea 1: Alternating
– Not convex wrt F and G, jointly
– But convex if fixing either F or G
Basic Idea 2: Separation
– argmin_G ... s.t. ... separates into: for each j, argmin_{G(:,j)} ... s.t. ...
– Each subproblem is a Standard Quadratic Programming Problem
Overall Complexity: Polynomial
Can we do better?

Optimization Method: Incremental Mode
Basic Idea 1: Recursive
Basic Idea 2: Alternating
Basic Idea 3: Separation
Procedure:
– Input: Adjacency Matrix A
– Initialize: R = A
– Do r times: Rank-1 Approximation, then Update Residual Matrix R
– Each Rank-1 step: QP for a single variable w/ boundary constraints (can be solved in constant time)
– Output: Final Residual Matrix
Overall Complexity: Linear wrt # of edges

Experimental Evaluation
– Effectiveness: [Figure: Accuracy vs. Anomaly Type]
– Efficiency: [Figure: Wall-clock Time vs. # of edges]

Batch Method vs. Incremental Method
[Figure: Log Wall-clock time (sec.) per Data Set, Batch Method vs. Incremental Method]

Conclusion
Problem Formulation: Non-negative Residual Matrix Factorization
– a new matrix factorization for interpretable graph anomaly detection
Optimization Methods
– Batch: straightforward, polynomial time complexity
– Incremental: linear time complexity
Future Work
– Other interpretable properties (sparseness) for anomaly detection
– Matrix Factorization w/ Total Non-negativity

Thank you!
[email protected]
(We are hiring at IBM Research!)
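
Backup note: the formulas on the Optimization Formulation slides did not survive as text. The block below is a hedged reconstruction from the slide labels alone (weighted Frobenius form, A = F x G + R, and R(i,j) >= 0 for A(i,j) > 0). The weight symbol W, and the reading of the 0/1 weight as W(i,j) = 1 exactly on existing edges, are assumptions and not taken from the slides.

```latex
% General case: weighted Frobenius objective with a non-negative residual on edges
\min_{F,\,G} \; \sum_{i,j} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad F(i,:)\,G(:,j) \le A(i,j) \;\; \text{for all } (i,j) \text{ with } A(i,j) > 0

% Since R = A - F G, the constraint is the same statement as R(i,j) >= 0 on edges.
% Assumed 0/1 weight: W(i,j) = 1 if A(i,j) > 0, and 0 otherwise, giving
\min_{F,\,G} \; \sum_{(i,j)\,:\,A(i,j) > 0} \bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad F(i,:)\,G(:,j) \le A(i,j) \;\; \text{for all } (i,j) \text{ with } A(i,j) > 0
```

Under this reading both the objective and the constraints touch only existing edges, which is consistent with the linear-in-edges complexity claimed for the incremental method.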
Visual Comparison

[Untitled figure slide; surviving labels: low, q, up]
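
Backup note: the Incremental Mode procedure (initialize R = A, repeat a rank-1 approximation r times, update R) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the all-ones initialization, the fixed iteration count, and the dense NumPy arrays are all hypothetical choices, and the objective is assumed to use a 0/1 weight that is 1 on existing edges. The names low, up and q echo labels that survive on the backup slides; reading them as the lower bound, the upper bound and the unconstrained minimizer of the single-variable QP is an assumption.

```python
import numpy as np


def clipped_update(R, mask, g):
    """For each row i, solve the single-variable problem
    min_f sum_j mask[i, j] * (R[i, j] - f * g[j])**2
    s.t.  f * g[j] <= R[i, j] on masked entries,
    by clipping the unconstrained minimizer q to [low, up]."""
    f = np.zeros(R.shape[0])
    for i in range(R.shape[0]):
        cols = np.flatnonzero(mask[i])
        if cols.size == 0:
            continue
        r, gj = R[i, cols], g[cols]
        denom = gj @ gj
        if denom == 0.0:
            continue  # f_i has no effect on the objective; keep 0
        q = (r @ gj) / denom  # unconstrained least-squares solution
        pos, neg = gj > 0, gj < 0
        up = np.min(r[pos] / gj[pos]) if pos.any() else np.inf
        low = np.max(r[neg] / gj[neg]) if neg.any() else -np.inf
        f[i] = min(max(q, low), up)
    return f


def nrmf_incremental(A, rank=2, n_iter=20):
    """Greedy rank-1 sketch: returns F, G and the residual R = A - F @ G,
    with R kept non-negative on the entries where A > 0."""
    A = np.asarray(A, dtype=float)
    mask = A > 0  # existing edges (assumed 0/1 weight)
    R = A.copy()  # Initialize: R = A
    n, m = A.shape
    F, G = np.zeros((n, rank)), np.zeros((rank, m))
    for k in range(rank):  # "Do r times"
        g = np.ones(m)  # hypothetical initialization
        f = np.zeros(n)
        for _ in range(n_iter):  # alternate between f and g
            f = clipped_update(R, mask, g)
            g = clipped_update(R.T, mask.T, f)
        R = R - np.outer(f, g)  # update residual matrix
        F[:, k], G[k] = f, g
    return F, G, R


# Toy graph: a small clique plus one unusually heavy edge between nodes 2 and 4.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 5],
              [0, 0, 0, 0, 1],
              [0, 0, 5, 1, 0]], dtype=float)
F, G, R = nrmf_incremental(A, rank=2, n_iter=10)
edge_residual = np.where(A > 0, R, 0.0)  # large entries are anomaly candidates
```

The dense loops are chosen for readability and do not achieve the linear-in-edges cost stated on the slide; a sparse edge-list representation would be needed for that. Only the residual on existing edges is constrained, so entries of R where A is zero may be negative and are masked out before inspection.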