Graph and stream mining

CMU SCS
Big (graph) data analytics
Christos Faloutsos
CMU
Outline
• Problem definition / Motivation
• Anomaly detection
• Time series analysis
• Conclusions
CMU visit '14
C. Faloutsos
Motivation
• Data mining: ~ find patterns (rules, outliers)
• What do real graphs look like? Anomalies?
• Time series / Monitoring
Measles @ PA, NY, …
Graphs - why should we care?
Food Web [Martinez ’91]
~1B users, $10-$100B revenue
Internet Map [lumeta.com]
Outline
• Problem definition / Motivation
• Anomaly/fraud detection
– Financial fraud
– eBay fraud
• Time series analysis
• Conclusions
Network Effect Tools: SNARE
• Some accounts are sort-of-suspicious – how to combine weak
signals?
• A: Belief Propagation.
[Figure: Before / After]
Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos: SNARE: a link analytic system for graph labeling and risk detection. KDD 2009: 1265-1274
• Produces improvement over simply using flags
– Up to 6.5 lift
– Improvement especially for low false positive rate
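The slide does not define lift. A minimal sketch, assuming the common definition (fraud rate among the top-ranked accounts divided by the overall fraud rate); the function name `lift_at` and the toy scores and labels are hypothetical and are not SNARE data.

```python
def lift_at(scores, labels, fraction):
    """Lift in the top `fraction` of items ranked by descending score.

    Assumed definition: (positive rate in the top slice) / (overall positive rate).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    k = max(1, round(n * fraction))
    # Indices of the items, highest score first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(labels[i] for i in ranked[:k]) / k
    return top_rate / base_rate


# Toy data (hypothetical): 10 accounts, 2 of them fraudulent (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_lift = lift_at(scores, labels, 0.2)
```

In this toy case both fraudulent accounts rank in the top 20%, so the top slice is five times denser in fraud than the population as a whole.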
[Figure: Results for accounts data (ROC curve). Axes: true positive rate vs. false positive rate. Curves: Ideal, SNARE, Baseline (flags only).]
• Accurate: produces a large improvement over simply using flags
• Flexible: can be applied to other domains
• Scalable: one iteration of BP runs in time linear in the number of edges
• Robust: works over a large range of parameters
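To make "combining weak signals" concrete, here is a minimal sketch of loopy belief propagation with two states (safe, risky) on an undirected graph. It is not the SNARE implementation: the function name, the homophily parameter `eps`, and the toy graph and priors are all assumptions made for illustration.

```python
def belief_propagation(edges, priors, eps=0.1, iterations=10):
    """Loopy BP sketch with states 0 = safe, 1 = risky.

    edges:  iterable of undirected (u, v) pairs
    priors: dict node -> P(risky) from flags alone (0.5 if missing)
    eps:    hypothetical homophily strength; neighbors share a state
            with probability 0.5 + eps
    Returns dict node -> P(risky) after message passing.
    """
    edges = list(edges)
    psi = [[0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps]]
    nodes = set(priors) | {n for e in edges for n in e}
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nodes}
    # msgs[(u, v)][s]: what u tells v about v being in state s.
    msgs = {(u, v): (0.5, 0.5) for u in nodes for v in neighbors[u]}
    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            # Prior of u times all messages into u, except the one from v.
            h = list(phi[u])
            for w in neighbors[u]:
                if w != v:
                    h[0] *= msgs[(w, u)][0]
                    h[1] *= msgs[(w, u)][1]
            out = [sum(h[su] * psi[su][sv] for su in (0, 1)) for sv in (0, 1)]
            z = out[0] + out[1]
            new_msgs[(u, v)] = (out[0] / z, out[1] / z)
        msgs = new_msgs
    beliefs = {}
    for n in nodes:
        b = list(phi[n])
        for w in neighbors[n]:
            b[0] *= msgs[(w, n)][0]
            b[1] *= msgs[(w, n)][1]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


# Toy graph (hypothetical): "a" is unflagged but touches three flagged accounts;
# "x" is unflagged and touches one account that looks safe.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("x", "y")]
toy_priors = {"a": 0.5, "b": 0.9, "c": 0.9, "d": 0.9, "x": 0.5, "y": 0.1}
toy_beliefs = belief_propagation(toy_edges, toy_priors)
```

This straightforward version recomputes the product of incoming messages for every outgoing message, which keeps the code short. Caching one product per node and dividing out the recipient's message would bring an iteration to time linear in the number of edges.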
eBay fraud detection
Detects ‘non-delivery’ fraud: seller takes $$ and disappears
Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 07.
‘Tycho’ – epidemics analysis
Yasuko Matsubara
50 states x 46 diseases
Flu?
Measles?
August?
No periodicity?
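One generic way to ask "is there a yearly cycle, or no periodicity?" of a weekly count series is to look for a peak in its autocorrelation. The sketch below is only that generic check, not the FUNNEL model; the function names, the `threshold` value, and the synthetic series are assumptions, and no Tycho data is used.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, normalized by the lag-0 value."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no periodic signal
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def dominant_period(series, max_lag, threshold=0.3):
    """Lag (>= 2) of the strongest autocorrelation peak, or None.

    `threshold` is a hypothetical cut-off below which a peak is ignored.
    """
    if max_lag + 1 >= len(series):
        raise ValueError("max_lag must be smaller than len(series) - 1")
    acf = [autocorrelation(series, k) for k in range(max_lag + 2)]
    best = None
    for k in range(2, max_lag + 1):
        is_peak = acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]
        if is_peak and acf[k] >= threshold:
            if best is None or acf[k] > acf[best]:
                best = k
    return best


# Synthetic weekly counts (hypothetical): four years with a 52-week cycle.
weekly = [100 + 50 * math.sin(2 * math.pi * t / 52) for t in range(208)]
period = dominant_period(weekly, max_lag=100)
flat = dominant_period([5.0] * 50, max_lag=20)
```

On the synthetic series the strongest peak falls at a lag of 52 weeks, while the constant series yields `None`, which corresponds to the "no periodicity" case.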
https://www.tycho.pitt.edu/resources.php
from U. Pitt (epidemiology dept.)
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
Open research questions
• Patterns/anomalies for time-evolving
graphs (Call graph, 3M people x 6mo)
• Spot fraudsters in social networks (e.g., Twitter
‘$10 -> 1000 followers’)
Contact info
• www.cs.cmu.edu/~christos
• GHC 8019
• Ph#: x8.1457