Discovering Leaders from Community Actions

Amit Goyal
Francesco Bonchi
Laks V.S. Lakshmanan
ICDM 2008
Outline


Introduction
Framework Definition





Influence propagation on the social network
Various notions of leaders
Algorithms
Experiments
Conclusion
Introduction
Word of Mouth and Viral Marketing


We are more influenced by our friends than by strangers
68% of consumers consult friends and family before purchasing home electronics
Viral Marketing



Also known as Targeted Advertising
Initiate a chain reaction through the word-of-mouth effect
Low investment, maximum gain
Our Contributions


Formally define the notion of a leader and its various flavors
Efficient algorithms for extracting these leaders
Framework Definition
Input Data (1)


A social network, i.e., an undirected graph G = (V, E), where nodes are users and edges represent social ties
Users declare their friends, e.g., on Facebook or Yahoo! Messenger
Input Data (2)

An actions log sorted in chronological order, i.e., a relation Actions(User, Action, Time)
Example: “Jack” “joined Yoga community” at “time 5”
Action Propagation
Action is “Joining the Yoga community”
Jack joined the Yoga community at time 5
Jill joined the Yoga community at time 8 (3 time units after Jack)
Mary joined the Yoga community at time 1000
Jack and Jill are friends
Jack and Mary are friends
Action propagated from Jack to Jill
Action propagated from Jack to Mary
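A minimal Python sketch of the propagation rule in this example: an action propagates from u to v when they are friends and u performed it before v. The function name `propagation_graph` and the tuple layouts are illustrative assumptions; only the Jack, Jill and Mary data comes from the slide.

```python
def propagation_graph(friends, log, action):
    """Return {(u, v): delay} for friends u, v where u did `action` before v.

    friends: list of undirected pairs (assumed layout)
    log:     list of (user, action, time) tuples, as in Actions(User, Action, Time)
    """
    when = {user: t for user, act, t in log if act == action}
    edges = {}
    for u, v in friends:
        # Both friends must have performed the action, at different times.
        if u in when and v in when and when[u] != when[v]:
            first, second = (u, v) if when[u] < when[v] else (v, u)
            edges[(first, second)] = when[second] - when[first]
    return edges


friends = [("Jack", "Jill"), ("Jack", "Mary")]
log = [("Jack", "yoga", 5), ("Jill", "yoga", 8), ("Mary", "yoga", 1000)]
edges = propagation_graph(friends, log, "yoga")
```

Here `edges` holds one directed edge per propagation, labeled with the elapsed time (3 for Jill, 995 for Mary).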
Propagation Graph
[Figure: propagation graph for the action “Joining the Yoga community”]
Jack joined at time 5
Jill joined at time 8
Joey joined at time 12
Ben joined at time 15
Mary joined at time 1000
Can we say Mary was influenced by Jack? No
User Influence Graph
When an action propagates from user u to user v, we may think of v as being influenced by u
Influence should decay over time
Size of influence graph << size of propagation graph (PG)
[Figure: the propagation graph (Jack at time 5, Jill at 8, Joey at 12, Ben at 15, Mary at 1000) next to the user influence graph for Jack, which keeps Jack, Jill, Joey and Ben and drops Mary]
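A hedged Python sketch of how a user influence graph could be cut out of the propagation graph. It assumes that "decay in time" means a hard cutoff: v counts as influenced by u only if v is reachable from u and acted at most `pi` time units after u. Only the Jack-Jill and Jack-Mary edges are stated on the slides; the other edges are an assumed reading of the figure.

```python
from collections import deque


def influence_set(edges, when, u, pi):
    """Users reachable from u in the propagation graph within pi time units of u."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, todo = set(), deque([u])
    while todo:
        x = todo.popleft()
        for y in succ.get(x, ()):
            # Times only grow along a path, so pruning a late node is safe.
            if y not in seen and when[y] - when[u] <= pi:
                seen.add(y)
                todo.append(y)
    return seen


when = {"Jack": 5, "Jill": 8, "Joey": 12, "Ben": 15, "Mary": 1000}
# Directed edges (earlier user -> later user); all but the first two are assumed.
edges = [("Jack", "Jill"), ("Jack", "Mary"), ("Jack", "Joey"),
         ("Jill", "Joey"), ("Jill", "Ben"), ("Joey", "Ben")]
jack_influence = influence_set(edges, when, "Jack", pi=15)
```

With `pi=15`, Mary (995 time units after Jack) falls outside Jack's influence graph while Jill, Joey and Ben stay in it.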
Leaders – first definition

Who should be a leader?
For an action, a leader should influence a sufficiently large number of users (>= ψ)
For an action, a leader should influence these users in a reasonable amount of time (<= π)
A user should meet the two conditions above for a sufficiently large number of actions (>= σ)
[Figure: propagation graph for the Yoga action with time gaps on the edges (3, 7, 4, 995, 7, 3): Jack at time 5, Jill at 8, Joey at 12, Ben at 15, Mary at 1000]
If ψ = 2, π = 15 and σ = 1, then both Jack and Jill are leaders
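The three thresholds can be sketched in Python as a filter over per-action influence counts. This assumes the counts were already computed for the chosen π (so the time constraint is baked into the input); the function name `find_leaders` and the dictionary layout are illustrative. The counts below are an assumed reading of the figure for π = 15.

```python
def find_leaders(counts, psi, sigma):
    """counts: {user: {action: number of users influenced within pi}}."""
    leaders = set()
    for user, per_action in counts.items():
        # Actions on which this user influenced at least psi users.
        led = sum(1 for n in per_action.values() if n >= psi)
        if led >= sigma:
            leaders.add(user)
    return leaders


# Assumed counts for pi = 15 on the single Yoga action.
counts_15 = {
    "Jack": {"yoga": 3},   # Jill, Joey, Ben
    "Jill": {"yoga": 2},   # Joey, Ben
    "Joey": {"yoga": 1},   # Ben
    "Ben": {"yoga": 0},
    "Mary": {"yoga": 0},
}
leaders = find_leaders(counts_15, psi=2, sigma=1)
```

Raising ψ to 3 leaves only Jack, and requiring σ = 2 actions leaves nobody, since the example has a single action.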
Tribe Leader



A leader may influence different users for different actions
Can a leader lead a fixed set of users across different actions? Yes
We call such leaders Tribe Leaders
[Figure: tribe-leader example centered on Jack; A1, A2 and A3 are 3 different actions]
Additional Constraint: Genuineness




A user may act as a leader while in practice always being a follower of other leaders
We want to avoid this kind of fake leader
If gen(v) >= r, then v is defined to be a genuine leader
[Figure: users Jack, Tom and Jill with edge labels A1 and A2; A1, A2 and A3 are 3 different actions]
Example: gen(Jill) = 1/3
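The slide does not spell out how gen(v) is computed. One plausible reading, sketched below in Python as an assumption, is the fraction of actions led by v on which v was not itself influenced by another leader. The data is a guess at the figure: Jill leads A1, A2 and A3 but follows Jack on A1 and Tom on A2.

```python
from fractions import Fraction


def genuineness(user, leaders_of, inf):
    """Assumed definition: share of `user`'s led actions where no other leader influenced them.

    leaders_of: {action: set of leaders for that action}
    inf:        {(leader, action): set of users that leader influenced}
    """
    led = [a for a, group in leaders_of.items() if user in group]
    if not led:
        return Fraction(0)
    genuine = sum(
        1
        for a in led
        if not any(user in inf.get((v, a), set()) for v in leaders_of[a] if v != user)
    )
    return Fraction(genuine, len(led))


leaders_of = {"A1": {"Jack", "Jill"}, "A2": {"Tom", "Jill"}, "A3": {"Jill"}}
inf = {("Jack", "A1"): {"Jill"}, ("Tom", "A2"): {"Jill"}}
gen_jill = genuineness("Jill", leaders_of, inf)
```

Under this reading Jill is genuine only on A3, which gives the 1/3 shown on the slide; she passes as a genuine leader only if the threshold r is at most 1/3.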
Algorithms
Algorithms: Overview

Assumptions:
  Social graph is huge: millions of nodes
  Actions log is huge: millions of tuples
  For an action, size of user Influence Graph << size of Propagation Graph, for all users
Our algorithms extract the patterns (leaders and tribe leaders) in no more than one scan of the actions log table.
Algorithms: Overview


Scan the actions log table backward in time with a window of size π, i.e., starting from the most recent timestamp (the bottom of the table if we assume tuples are ordered by time).
Efficiently compute the influence matrix, i.e., a Users x Actions matrix
  IMπ(u, a) is the number of users influenced by u w.r.t. action a within time π
Compute leaders from IM
[Figure: Jack at time 5, Jill at 8, Joey at 12, Ben at 15]
Example: IM10(Jack, “joining yoga community”) = 3
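A simplified Python sketch of the backward scan. It uses plain sets and keeps every scanned tuple in memory, so it shows the idea of one backward pass but not the windowed bit-vector machinery. The function name `influence_matrix`, the data layout, and every friendship except Jack-Jill and Jack-Mary are assumptions.

```python
from collections import defaultdict


def influence_matrix(friends, log, pi):
    """Return {(user, action): number of users influenced within pi}."""
    nbrs = defaultdict(set)
    for u, v in friends:
        nbrs[u].add(v)
        nbrs[v].add(u)
    when = {}   # (user, action) -> time, for tuples already scanned
    inf = {}    # (user, action) -> set of users influenced
    im = {}
    # Most recent tuple first, so later adopters are ready when needed.
    for user, action, t in sorted(log, key=lambda row: row[2], reverse=True):
        reached = set()
        for v in nbrs[user]:
            tv = when.get((v, action))
            if tv is not None and t < tv <= t + pi:
                reached.add(v)
                # Inherit v's followers that still fall inside user's window.
                reached |= {w for w in inf[(v, action)] if when[(w, action)] <= t + pi}
        when[(user, action)] = t
        inf[(user, action)] = reached
        im[(user, action)] = len(reached)
    return im


friends = [("Jack", "Jill"), ("Jack", "Mary"), ("Jack", "Joey"),
           ("Jill", "Joey"), ("Jill", "Ben"), ("Joey", "Ben")]
log = [("Jack", "yoga", 5), ("Jill", "yoga", 8), ("Joey", "yoga", 12),
       ("Ben", "yoga", 15), ("Mary", "yoga", 1000)]
im10 = influence_matrix(friends, log, pi=10)
```

Scanning backward works because a user's influence set only depends on users who acted later, and those have already been processed when the user's tuple is reached.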
Computing Influence Matrix (1)

We use a bit vector to track which users are influenced by a given user; it is updated incrementally
Bits must be dynamically allocated
Locking mechanism using another bit vector: 0 => free bit; 1 => occupied bit
Node-to-bit-index mapping stored in a queue

[Figure: time window on the propagation graph over nodes R, S, T, W, V]

Node  InfVec
R     01010111
S     01000110
T     00010110
W     00000110
V     00000100

Queue, with a Head pointer: (V,2) (W,1) (T,4) (S,6) (R,0)
Lock bit vector: 01010111
Computing Influence Matrix (2)




Slide up the current window: delete node V
Delete the entry from the queue
Update the lock bit vector
Update the influence vectors

[Figure: time window on the propagation graph over nodes R, S, T, W, V]

Node  InfVec (before)  InfVec (after)
R     01010111         01010011
S     01000110         01000010
T     00010110         00010010
W     00000110         00000010
V     00000100         (deleted)

Queue, with a Head pointer: (W,1) (T,4) (S,6) (R,0), after removing (V,2)
Lock bit vector: 01010111 becomes 01010011
Computing Influence Matrix (3)





New node P added
Issue a lock, add an entry to the queue
Compute its influence vector by propagation
Number of followers of P = 4
IM(P,a) = 4

[Figure: time window on the propagation graph over nodes P, R, S, T, W]

Node  InfVec
P     01010111
R     01010011
S     01000010
T     00010010
W     00000010

Queue, with a Head pointer: (W,1) (T,4) (S,6) (R,0) (P,2)
Lock bit vector: 01010011 becomes 01010111
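The three steps above (evict, lock, propagate) can be sketched in Python with integers as bit vectors. This is an illustration under assumptions, not the authors' implementation: the friendships and timestamps are invented so that the follower counts match the R, S, T, W, V, P example, and bits are allocated lowest-free-first, so the bit indexes differ from those on the slides.

```python
from collections import deque


def influence_counts(events, friends, pi):
    """events: (user, time) pairs for ONE action, sorted by ascending time.
    friends: {user: set of friends}. Returns {user: number of followers}."""
    lock = 0          # lock bit vector: bit i is 1 while index i is occupied
    bit_of = {}       # node -> bit index, for nodes inside the window
    time_of = {}      # node -> time, for nodes inside the window
    inf_vec = {}      # node -> influence bit vector (own bit included)
    queue = deque()   # nodes in insertion order; the head has the latest time
    counts = {}
    for user, t in reversed(events):            # scan backward in time
        # Slide the window: drop nodes that acted more than pi after `user`.
        while queue and time_of[queue[0]] - t > pi:
            old = queue.popleft()
            mask = ~(1 << bit_of.pop(old))
            lock &= mask                        # release the lock
            del inf_vec[old], time_of[old]
            for node in inf_vec:                # clear the freed bit everywhere
                inf_vec[node] &= mask
        # Issue a lock on the lowest free bit (allocation policy is assumed).
        bit = 0
        while (lock >> bit) & 1:
            bit += 1
        lock |= 1 << bit
        # Propagate: OR in the vectors of friends who acted later.
        vec = 1 << bit
        for v in friends.get(user, ()):
            if v in inf_vec and time_of[v] > t:
                vec |= inf_vec[v]
        bit_of[user], time_of[user], inf_vec[user] = bit, t, vec
        queue.append(user)
        counts[user] = bin(vec).count("1") - 1  # exclude the node's own bit
    return counts


# Hypothetical friendships and times chosen to mirror the slide example.
pairs = [("P", "R"), ("R", "S"), ("R", "T"), ("S", "W"), ("T", "W"), ("W", "V")]
friends = {}
for a, b in pairs:
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)
events = [("P", 9), ("R", 12), ("S", 14), ("T", 15), ("W", 18), ("V", 20)]
counts = influence_counts(events, friends, pi=10)
```

When P (time 9) is scanned, V (time 20) falls out of the window of size 10, its bit is cleared from every vector and released, and P reuses the freed bit, which is the reason for the lock vector.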
Mining Tribe Leaders

Influence matrix is not enough
We use an influence cube: Users x Actions x Users
  ICπ(u,a,v) = 1 when user v is influenced by user u for action a within time π
We do not explicitly compute the whole cube, due to sparsity
The problem is the same as discovering the existence of frequent itemsets of size larger than a given threshold
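A brute-force Python sketch of the tribe-leader test for one candidate user, written only to make the itemset analogy concrete: each action is a transaction whose items are the users the candidate influenced, and the candidate is a tribe leader if some set of at least ψ users appears in at least σ transactions. It is not the mining algorithm from the talk; the function name and the sample data are hypothetical.

```python
from itertools import combinations


def is_tribe_leader(inf_by_action, psi, sigma):
    """inf_by_action: {action: set of users the candidate influenced within pi}."""
    if sigma < 1:
        raise ValueError("sigma must be at least 1")
    # Only actions with at least psi influenced users can contribute.
    big = [users for users in inf_by_action.values() if len(users) >= psi]
    # Any qualifying group of more than sigma actions contains one of size sigma,
    # so checking groups of exactly sigma actions is enough.
    for group in combinations(big, sigma):
        if len(set.intersection(*group)) >= psi:
            return True
    return False


# Hypothetical slice of the influence cube for one candidate user.
inf_by_action = {
    "A1": {"x", "y", "z"},
    "A2": {"x", "y"},
    "A3": {"x", "y", "w"},
}
tribe = is_tribe_leader(inf_by_action, psi=2, sigma=3)
```

Here the users x and y follow the candidate on all three actions, so the candidate leads a tribe of size 2; asking for a tribe of size 3 fails because no three users recur across two actions.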
Experiments
Leaders Vs. Tribe leaders
π: threshold on time
σ: threshold on number of actions
ψ: threshold on number of influenced users
Number of leaders found
Run-time
Genuineness: an almost binary concept!
Conclusions


Proposed a framework based on frequent pattern mining for discovering leaders in social networks
Formally defined the problem of extracting leaders from a social graph and an actions log
  Various notions of leader: leader, tribe leader
  Their genuine variants
Efficient algorithms for extracting leaders of various flavors
  Just one pass over the actions log table