Maximizing the Spread of Influence through a Social Network

David Kempe, Jon Kleinberg, Éva Tardos
KDD 2003
Outline

Motivations

Models of influence

Influence maximization problem

Experiments

Conclusion
Social Network and Spread of Influence

A social network plays a fundamental
role as a medium for the spread of
INFLUENCE among its members:

Opinions, ideas, information,
innovation…

Direct marketing can exploit “word-of-mouth”
effects to significantly increase profits
What we need (cont.)



Form models of influence in social networks.
Obtain data about a particular network (to estimate
inter-personal influence).
Devise an algorithm to maximize the spread of influence.
Models of Influence




First mathematical models
Large body of subsequent work:
Two basic classes of diffusion models: threshold and
cascade
General operational view:




A social network is represented as a directed graph, with each
person (customer) as a node
Nodes start either active or inactive
An active node may trigger activation of neighboring nodes
Monotonicity assumption: active nodes never deactivate
Linear Threshold Model

A node v has a random threshold $\theta_v \sim U[0,1]$

A node v is influenced by each neighbor w according to a
weight $b_{v,w}$ such that

$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$

A node v becomes active when at least a
(weighted) $\theta_v$ fraction of its neighbors are active:

$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$
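[Figure: Linear Threshold example on nodes U, v, w, X, Y with edge weights and node thresholds; legend: inactive node, active node, threshold, active neighbors. The process stops when no further node can be activated (Stop!).]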
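The activation rule above can be sketched as a single Linear Threshold run. This is a minimal sketch under assumptions not in the slides: the graph is a dict `in_weights` mapping each node v to `{w: b_vw}`, and `simulate_lt` is a hypothetical helper name. Because thresholds are fixed once drawn and active nodes never deactivate, the final active set does not depend on the order in which nodes are examined, so the sketch uses repeated sweeps instead of synchronous time steps.

```python
import random


def simulate_lt(in_weights, seeds, rng=None):
    """Run the Linear Threshold model once and return the final active set.

    Hypothetical representation: in_weights[v] maps each in-neighbor w of v
    to the weight b_{v,w}; weights into each node are assumed to sum to <= 1.
    """
    rng = rng or random.Random()
    nodes = set(in_weights) | set(seeds)
    for weights in in_weights.values():
        nodes.update(weights)
    theta = {v: rng.random() for v in nodes}  # theta_v drawn uniformly from [0, 1)
    active = set(seeds)
    changed = True
    while changed:  # stop once a full sweep activates nobody ("Stop!")
        changed = False
        for v in nodes - active:
            incoming = in_weights.get(v, {})
            weight = sum(b for w, b in incoming.items() if w in active)
            if weight >= theta[v]:
                active.add(v)  # monotonicity: active nodes never deactivate
                changed = True
    return active


if __name__ == "__main__":
    # Hypothetical weights, not the ones in the figure.
    example = {"v": {"w": 0.5, "U": 0.3}, "w": {"U": 0.6}}
    print(simulate_lt(example, {"U"}, random.Random(42)))
```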
Independent Cascade Model


When node v becomes active, it has a single
chance of activating each currently inactive
neighbor w.
The activation attempt succeeds with probability
$p_{v,w}$.
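[Figure: Independent Cascade example on nodes U, v, w, X, Y with edge activation probabilities; legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The process stops when no new node is activated (Stop!).]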
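A minimal sketch of one Independent Cascade run, assuming a hypothetical representation where `out_probs[v]` maps each out-neighbor w to $p_{v,w}$. Each newly activated node is dequeued exactly once, so every edge gets its single activation attempt. The queue order differs from the synchronous rounds in the slides; since each attempt is an independent coin flip made at most once, the final active set has the same distribution.

```python
import random
from collections import deque


def simulate_ic(out_probs, seeds, rng=None):
    """Run the Independent Cascade model once and return the final active set.

    Hypothetical representation: out_probs[v] maps each out-neighbor w of v
    to the activation probability p_{v,w}.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(active)  # newly active nodes that have not tried their edges yet
    while frontier:
        v = frontier.popleft()
        for w, p in out_probs.get(v, {}).items():
            # v gets a single chance to activate each currently inactive neighbor
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active


if __name__ == "__main__":
    # Hypothetical probabilities, not the ones in the figure.
    example = {"v": {"w": 0.5, "X": 0.2}, "w": {"U": 0.3}}
    print(simulate_ic(example, {"v"}, random.Random(7)))
```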
Influence Maximization Problem

Influence of node set S: f(S)


expected number of active nodes at the end, if set S is
the initial active set
Problem:

Given a parameter k (budget), find a k-node set S to
maximize f(S)
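Since f(S) is an expectation, it is usually estimated by simulation. A minimal sketch, assuming a hypothetical `simulate(active, rng)` callback that runs one diffusion (LT or IC) from a copy of S and returns the final active set; `estimate_spread` and the one-edge toy network are illustrative names, not from the slides.

```python
import random
from typing import Callable, Hashable, Iterable, Set

# Hypothetical callback type: one diffusion run from the given initial active set.
Simulator = Callable[[Set[Hashable], random.Random], Set[Hashable]]


def estimate_spread(simulate: Simulator, seeds: Iterable[Hashable],
                    runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of f(S): mean final active-set size over `runs` runs."""
    rng = random.Random(seed)
    start = set(seeds)
    total = 0
    for _ in range(runs):
        total += len(simulate(set(start), rng))  # pass a copy so runs stay independent
    return total / runs


def one_edge_network(active: Set[Hashable], rng: random.Random) -> Set[Hashable]:
    """Hypothetical toy network: a single edge a -> b that fires with probability 0.5."""
    if "a" in active and rng.random() < 0.5:
        active.add("b")
    return active
```

For the toy network the exact value is f({a}) = 1 + 0.5 = 1.5, so `estimate_spread(one_edge_network, {"a"})` should land close to 1.5; the sampling error shrinks roughly as 1/sqrt(runs).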
f(S): properties (cont.)



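Non-negative
Monotone: $f(S \cup \{v\}) \ge f(S)$
Submodular:
Let N be a finite set.
A set function $f: 2^N \to \mathbb{R}$ is submodular iff
$\forall S \subseteq T \subseteq N,\ \forall v \in N \setminus T:\ f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$
(diminishing returns)
[Figure: nested sets S ⊆ T ⊆ V and an element v, with labels g(S), g(T), g(v)]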
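The three properties can be checked by brute force on a tiny ground set. This sketch (hypothetical helper names) enumerates every pair S ⊆ T and every v ∉ T, so it is exponential and only illustrates the inequalities; the coverage function used as the example is a standard submodular function, not one from the slides.

```python
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_submodular(f, ground):
    """Brute-force check of non-negativity, monotonicity and diminishing returns."""
    ground = frozenset(ground)
    for T in subsets(ground):
        if f(T) < 0:
            return False  # not non-negative
        for S in subsets(T):
            for v in ground - T:
                gain_S = f(S | {v}) - f(S)
                gain_T = f(T | {v}) - f(T)
                if gain_S < 0 or gain_S < gain_T:  # monotone or submodular violated
                    return False
    return True


A = {1: {"x", "y"}, 2: {"y", "z"}, 3: {"z"}}  # hypothetical sets to cover


def coverage(S):
    """f(S) = number of distinct elements covered by the sets indexed by S."""
    return len(set().union(*(A[i] for i in S)))
```

By contrast, f(S) = |S|^2 has increasing returns, so `is_monotone_submodular(lambda S: len(S) ** 2, {1, 2})` returns False.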
Submodularity for Independent Cascade
Coins for edges are flipped during activation attempts.
Can pre-flip all coins and reveal results immediately.
Active nodes in the end are reachable via green paths (edges whose coin flip succeeded) from the initially targeted nodes.
Study reachability in green graphs.
[Figure: example network with edge probabilities; edges whose pre-flipped coin succeeded are drawn in green]
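Pre-flipping the coins can be sketched as sampling one green graph and then running a plain reachability search. The helper names and the edge representation `out_probs[v] = {w: p_vw}` are assumptions for illustration. Per the claim on the slide, the set returned by `reachable(green, S)` has the same distribution as the final active set of an Independent Cascade run started from S.

```python
import random
from collections import deque


def sample_green_graph(out_probs, rng):
    """Pre-flip one coin per edge (v, w); the edge is green with probability p_{v,w}."""
    return {v: [w for w, p in nbrs.items() if rng.random() < p]
            for v, nbrs in out_probs.items()}


def reachable(green, sources):
    """Breadth-first search: all nodes reachable from `sources` along green edges."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for w in green.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen
```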
Submodularity, Fixed Graph




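Fix a “green graph” G. g(S) is the number of nodes reachable from S in G.
Submodularity: $g(T + v) - g(T) \le g(S + v) - g(S)$ when $S \subseteq T$.
[Figure: nested sets S ⊆ T ⊆ V and a node v, with the regions reachable from each: g(S), g(T), g(v)]
$g(S + v) - g(S)$: nodes reachable from S + v, but not from S.
Every node reachable from T + v but not from T is reachable from v and, since $S \subseteq T$, is not reachable from S either; so it is also counted in $g(S + v) - g(S)$.
Hence $g(T + v) - g(T) \le g(S + v) - g(S)$ when $S \subseteq T$ (indeed!).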
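The inclusion argument can be checked exhaustively on one small fixed graph. A sketch with hypothetical names; it compares the sets of newly reachable nodes rather than only their counts, because inclusion of the sets is what implies the inequality. Passing on one graph is a sanity check, not a proof.

```python
from itertools import combinations


def reach(graph, sources):
    """Set of nodes reachable from `sources` in a fixed directed graph (DFS)."""
    seen, stack = set(sources), list(sources)
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def check_fixed_graph(graph):
    """For all S ⊆ T and v not in T, check that the nodes newly reachable
    from T + v are a subset of those newly reachable from S + v."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    all_sets = [frozenset(c) for r in range(len(nodes) + 1)
                for c in combinations(nodes, r)]
    for T in all_sets:
        for S in (X for X in all_sets if X <= T):
            for v in nodes - T:
                new_T = reach(graph, T | {v}) - reach(graph, T)
                new_S = reach(graph, S | {v}) - reach(graph, S)
                if not new_T <= new_S:  # inclusion implies the count inequality
                    return False
    return True
```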
Submodularity of the Function
Fact: A non-negative linear
combination of submodular
functions is submodular.
$f(S) = \sum_{G} \Pr(G \text{ is the green graph}) \cdot g_G(S)$



gG(S): nodes reachable from S in G.
Each gG(S): is submodular (previous slide).
Probabilities are non-negative.
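For a tiny Independent Cascade network, the sum over green graphs can be evaluated exactly by enumerating every combination of coin outcomes. A sketch under assumptions: edges are given as `(v, w, p_vw)` triples and `exact_spread` is a hypothetical name; it enumerates 2^|E| graphs, so it is for illustration only.

```python
from itertools import product


def exact_spread(edges, seeds):
    """f(S) = sum over green graphs G of Pr(G is the green graph) * g_G(S).

    `edges` is a list of (v, w, p_vw) triples (hypothetical format).
    """
    total = 0.0
    for flips in product((True, False), repeat=len(edges)):
        prob, green = 1.0, {}
        for (v, w, p), is_green in zip(edges, flips):
            prob *= p if is_green else 1.0 - p  # Pr of this coin outcome
            if is_green:
                green.setdefault(v, []).append(w)
        seen, stack = set(seeds), list(seeds)  # g_G(S): nodes reachable from S in G
        while stack:
            for w in green.get(stack.pop(), []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += prob * len(seen)
    return total
```

For the chain a → b → c with both probabilities 0.5, the four green graphs give 3, 2, 1 and 1 active nodes with probability 0.25 each, so f({a}) = 1.75.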
Submodularity for Linear Threshold



Use a similar “green graph” idea.
Once a graph is fixed, the “reachability” argument is
identical.
Each node v picks at most one incoming edge: the edge from neighbor w with
probability $b_{v,w}$, and no edge with probability $1 - \sum_{w} b_{v,w}$.
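The live-edge rule for the Linear Threshold model can be sketched with one uniform draw per node, assuming the hypothetical representation `in_weights[v] = {w: b_vw}` with the weights into each node summing to at most 1. The draw selects the edge from w with probability $b_{v,w}$, or no edge with the leftover probability; the resulting graph is then searched for reachability exactly as in the Independent Cascade case.

```python
import random


def sample_lt_live_edges(in_weights, rng):
    """Each node v keeps at most one incoming edge (w, v), chosen with probability
    b_{v,w}; with probability 1 - sum_w b_{v,w} it keeps none.

    Returns the live graph as a dict w -> list of out-neighbors v.
    """
    live = {}
    for v, weights in in_weights.items():
        r = rng.random()
        cumulative = 0.0
        for w, b in weights.items():
            cumulative += b
            if r < cumulative:  # r fell in w's interval of length b_{v,w}
                live.setdefault(w, []).append(v)
                break
        # if r >= sum of the weights, v keeps no incoming edge
    return live
```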
(cont.)

Maximizing a non-negative, monotone, submodular function f over all
k-element sets S is, in general, an NP-hard
optimization problem.

It is NP-hard to determine the optimum for influence
maximization under both the independent cascade model
and the linear threshold model.
(cont.)

We can use a greedy algorithm (hill climbing)!


Start with an empty set S
For k iterations:
Add to S the node v that maximizes f(S + v) - f(S).
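The greedy loop can be sketched for any set function f. In influence maximization, f(S) is itself a simulation estimate, so each call is expensive; this sketch (hypothetical name `greedy_max`) simply evaluates f(S + v) for every remaining candidate in each of the k rounds and breaks ties by candidate order.

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def greedy_max(f: Callable[[FrozenSet[Hashable]], float],
               candidates: Iterable[Hashable], k: int) -> FrozenSet[Hashable]:
    """Greedy hill climbing: k times, add the node with the largest marginal gain."""
    S: FrozenSet[Hashable] = frozenset()
    pool = list(dict.fromkeys(candidates))  # de-duplicate, keep order for deterministic ties
    for _ in range(min(k, len(pool))):
        base = f(S)
        best = max(pool, key=lambda v: f(S | {v}) - base)  # argmax of f(S + v) - f(S)
        S = S | {best}
        pool.remove(best)
    return S


SETS = {1: {"x", "y", "z"}, 2: {"x"}, 3: {"w"}}  # hypothetical coverage instance


def cover(S):
    """Coverage function used as a stand-in for f(S)."""
    return len(set().union(*(SETS[i] for i in S)))
```

On this instance, `greedy_max(cover, SETS, 2)` picks node 1 first (gain 3) and then node 3 (gain 1, versus 0 for node 2).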

How good is it?


Theorem: The greedy algorithm is a (1 - 1/e)-approximation.
The resulting set S activates at least a (1 - 1/e) > 63% fraction of
the number of nodes that any size-k set could activate.
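The 63% figure comes from the standard bound for greedy maximization of a non-negative, monotone, submodular function under a cardinality constraint, where $S^*$ denotes an optimal k-node set:

```latex
f(S_{\mathrm{greedy}}) \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr)\, f(S^{*})
\;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)\, f(S^{*}),
\qquad 1 - \tfrac{1}{e} \approx 0.632
```

The second inequality uses $1 - x \le e^{-x}$ with $x = 1/k$, so $(1 - 1/k)^k \le 1/e$.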
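Results: linear threshold model
[Figure: results plot for the linear threshold model]
Results: independent cascade model
[Figure: results plots for the independent cascade model, with panels for p = 1% and p = 10%]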
Conclusions

We study the influence maximization problem under several of the most
widely studied models in social network analysis.

We show that a natural greedy strategy obtains a
solution that is provably within a factor of (1 - 1/e) of optimal, i.e., at least 63% of the optimum,
for several classes of models.
Hill Climbing

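Basic Hill Climbing algorithm

1. Randomly pick a point a in the search space as the starting point.
2. Consider the states available around a (its neighbors).
3. Take the neighbor b of a with the best quality (fewest misplaced tiles) and move to b.
4. Repeat steps 2 and 3 until no better point can be found.
5. The final state is the solution found by Hill Climbing (a local optimum).
6. If two or more neighbors are equally good, choose one of them at random.
Hill Climbing does not guarantee an optimal solution,
but it can find an approximate solution.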
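The steps above can be sketched on the 8-puzzle, using the number of misplaced tiles as the quality measure. Assumptions not in the slides: the goal layout `GOAL` below (the example's goal state appears only in the figure), a caller-supplied start state instead of a random one, and the helper names. The loop moves only to a strictly better neighbor, which is why it can stop at a local optimum rather than at the goal.

```python
import random

GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)  # assumed goal layout, row by row; 0 marks the blank


def misplaced(state):
    """Quality measure: tiles (excluding the blank) not in their goal position."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile != 0 and tile != goal)


def neighbors(state):
    """States reachable by sliding one adjacent tile into the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = 3 * r + c
            s = list(state)
            s[i], s[j] = s[j], s[i]
            result.append(tuple(s))
    return result


def hill_climb(start, rng=None):
    """Steps 2 to 6: move to the best neighbor while it strictly improves."""
    rng = rng or random.Random()
    current = start
    while True:
        candidates = neighbors(current)
        best_score = min(misplaced(s) for s in candidates)
        if best_score >= misplaced(current):
            return current  # no better neighbor: local optimum
        best = [s for s in candidates if misplaced(s) == best_score]
        current = rng.choice(best)  # step 6: break ties at random
```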
Example
[Figure: 8-puzzle example with tiles 1 to 8, showing the goal state]