An Optimal Allocation Approach to Influence Maximization Problem

An Optimal Allocation Approach to Influence
Maximization Problem on Modular Social Network
Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu
ACM SAC 2010
outline
Social network
`
`
`
Definition and properties
Definition
and properties
Social contagion
Models of Influence
`
`
`
Linear threshold model
Independent cascade model
Influence maximization problem
Influence maximization problem
`
`
`
`
Greedy algorithm and the CELF optimized version
Degree discount algorithm
OASNET algorithm
Experiments
`
2
Social Network Definitions
Social Network Definitions
A social network is a social structure made of individuals (or organizations) called "nodes,"
(or organizations) called nodes, which are tied which are tied
(connected) by one or more specific types of interdependency, such as friendship, kinship, financial exchange, dislike,
h
d l k sexual relationships, or relationships of l l
h
l
h
f
beliefs, knowledge or prestige.(Wiki)
Alternatively: people + interactions
Alternatively: people + interactions.
`
`
3
Social Network Examples
Social Network Examples
The University Karate Club Network[8]
4
Social Network Example
Social Network Example
Modularity and structure
5
Social Network Example
Social Network Example
6
Social Network Example
Social Network Example
Malaysian Blogosphere Division[9]
7
Social Network Properties
Social Network Properties
Power law degree distribution. P(k)=ck‐r
Positive assortativity Similar degree nodes connecting to
Positive assortativity. Similar degree nodes connecting to each other.
High clustering coefficient. Your friend tend to know each g
g
other.
Short diameter. (six degree of separation)
Community structure.
`
`
`
`
`
8
Social Contagion
Social Contagion
Yawning is contagious.
`
9
Social contagion
Social contagion
`
“I’ll have what she is having.”
10
Models of contagion/diffusion
Models of contagion/diffusion
`
`
Initially some nodes are active:
y
Active nodes spread their influence on the other nodes, and so on …
11
Models of Influence
Models of Influence
`
`
Two basic classes of diffusion models: threshold[5] and
cascade [1]
General operational view:
`
`
`
`
12
A social network is represented as a directed graph, with each person (customer) as a node
person (customer) as a node
Nodes start either active or inactive
An active node may trigger activation of neighboring nodes Monotonicity assumption: active nodes never deactivate
Linear Threshold Model
`
A node v has random threshold θv ~ U[0,1]
`
A node v
A
d is influenced by each neighbor w
i i fl
db
h i hb
according to a di
weight bvw such that ∑
bv ,w ≤ 1
w neighbor of v
`
A d becomes active when at least
A node v
b
ti
h
tl t
(weighted) θv fraction of its neighbors are active
∑
bv ,w ≥ θ v
w active neighbor of v
13
Example
Inactive Node
0.6
0.3
Active Node
0.2
X
Threshold
0.2
Active neighbors
0.1
0.4
U
0.5
w
14
0.3
0.5
Stop!
0.2
v
Independent Cascade Model
Independent Cascade Model
`
`
When node v becomes active, it has a single chance of activating each currently inactive neighbor w.
activating each currently inactive neighbor w.
The activation attempt succeeds with probability pvw .
15
Example
0.6
Inactive Node
0.3
0.2
X
0.4
0.5
w
0.2
U
0.1
0.3
0.2
0.5
v
Stop!
16
Active Node
Newly active node
Successful Successful
attempt
Unsuccessful
attempt
Problem Setting
Problem Setting
`
Given
`
`
`
Goal
`
`
a limited budget B for initial advertising (e.g. give away free g
g( g g
y
samples of product)
estimates for influence between individuals
trigger a large cascade of influence (e.g. further adoptions of a product)
Q ti
Question
`
17
Which set of individuals should B target at?
Influence Maximization Problem
Influence Maximization Problem
`
Influence of node set S: f(S)
`
`
expected number of active nodes at the end, if set S is the number of active nodes at the end if set S is the
initial active set
Problem:
`
`
18
Given a parameter k (budget), find a k‐node set S to maximize f(S)
Constrained optimization problem with f(S) as the objective
Constrained optimization problem with f(S) as the objective function
First Solution
First Solution
•
Greedy Algorithm[1]
–
–
Start with an empty set S
Start
with an empty set S
For k iterations:
Add node v to S that maximizes f(S +v) ‐ f(S).
•
How good (bad) it is?[1]
–
–
19
Theorem: The greedy algorithm is a (1 – 1/e) approximation.
Th
The resulting set S activates at least (1‐
lti
t S ti t
t l t (1 1/e) > 63% of the 1/ ) 63% f th
number of nodes that any size‐k set S could activate.
Problems with Greedy Algorithm
Problems with Greedy Algorithm
`
`
The greedy algorithm in[1] is slow. Its time complexity is e g eedy a go
[ ] sso
s
e co p e y s
O(knms), where n is the number of nodes, m is the number of edges and s is the times of simulation. It does not assume any topological structure of the social network. There might be some space for optimization given a certain topological property
given a certain topological property.
20
CELF optimized Greedy Algorithm
CELF optimized Greedy Algorithm
`
The function f(S) is sub‐modular for both the independent cascade model and the linear threshold model.[1]
cascade model and the linear threshold model.[1]
f(TU{v})‐f(T)<=f(SU{v})‐f(S) if S is a subset of T.
21
CELF optimized Greedy Algorithm
CELF optimized Greedy Algorithm
`
Recall the greedy algorithm.
`
For k iterations:
For k iterations:
Add node v to S that maximizes f(S +v) ‐ f(S).
`
CELF optimized greedy algorithm [2]
`
`
22
Keep an ordered list of marginal benefits bi from previous iteration
Re evaluate bi only if bi in previous iteration is larger than the Re‐evaluate b
only if bi in previous iteration is larger than the
maximum marginal benefits for current iteration
CELF optimized Greedy Algorithm
CELF optimized Greedy Algorithm
In the second iteration, the nodes b, e, c do not
need to be evaluated because their gain in first
iteration is smaller than the gain of d in the current
iteration.[10]
23
CELF optimized Greedy Algorithm
CELF optimized Greedy Algorithm
`
Pros
`
`
It produces the same result as the greedy algorithm with much It
produces the same result as the greedy algorithm with much
less time. (Experiments show that it is 700 times faster than the original greedy algorithm)
Cons
`
24
It does not improve the running time complexity. For a large network it is still slow.
network it is still slow.
Degree Discount Heuristics
Degree Discount Heuristics
`
`
The degree heuristics is a good baseline compared to other heuristics(random, distance centrality).
other heuristics(random, distance centrality).
If a node u has been selected as a seed, when considering selecting a new seed v based on its degree, we should not count the edge {v, u} towards its degree.[2]
25
Degree Discount Heuristics
Degree Discount Heuristics
Node 6 is selected in the second iteration because it
g
degree
g
than node 5, whose degree
g
is
has a higher
discounted by 1.
26
Degree Discount heuristics
Degree Discount heuristics
`
`
`
There is a more sophisticated discount mechanisms when the diffusion probability is small[2].
the diffusion probability is small[2].
In the IC model with propagation probability p, suppose that dv=O(1/p) and tv=o(1/p) for a vertex v. The expected number of additional vertices in Star(v) influenced by selecting v into the seed set is:
1 (d 2t (d t )t
1+(dv‐2tv‐(dv‐tv)tvp+o(tv))*p
(t ))*
27
Result on the above heuristics[2]
Result on the above heuristics[2]
28
Result on the above heuristics[2]
Result on the above heuristics[2]
29
Result on the above heuristics[2]
Result on the above heuristics[2]
30
Our motivation
Our motivation
`
`
`
Make use of the modularity of social networks. Generally a social network exhibits some degree of modularity.
a social network exhibits some degree of modularity.
Try to make an algorithm that scales better than greedy algorithm in terms of the time complexity.
Idea:
`
31
Treat the seeds as a resource. Try to allocate the resources to th
the communities of a network in an intelligent way.
iti
f
t
ki
i t lli t
Growth function
Growth function
`
`
The growth function F(k, G, M, Γ) maps the number of initial seeds k, the network G, the diffusion model M and
initial seeds k, the network G, the diffusion model M and a base strategy of selecting initial active nodes(like high degree) to the expected number of active nodes when the diffusion process terminates.
h d ff
Relation with f(S) in [1]. Γ(k, G)=S.
32
Growth Function
Growth Function
Condensed Matter physics network with diffusion probability 0.05
33
Condensed Matter physics network with linear threshold model
Problem Reformulation
Problem Reformulation `
`
`
`
We made a simplified assumptions. Communities are disconnected.
Maximize D=∑Fi(ki, Gi, Mi, Γi) s.t. ∑ki=k.
Solutions:
OPT(k,n)=MAX{Fn(i, Gn, Mn, Γn)+OPT(k‐I,n‐1)} 0≤ i≤k.
34
Adaptation to a real world network
Adaptation to a real world network
•
•
•
•
The assumption that all the communities are disconnected is too restrictive.
disconnected is too restrictive.
Communities are usually weakly connected. It violates the requirement of the reformulated problem. q
p
Second assumption: The number of cross community edges is relatively small and they are far from the center of the community. It won’t have a large effect.
35
OASNET algorithm
OASNET algorithm
`
`
`
`
`
`
The steps of the algorithm.
1 Use an external community finding algorithm to find
1. Use an external community finding algorithm to find the communities of the network. Extract these communities from the network.
2. Simulate the diffusion process on each community to find its corresponding growth function.
3. Apply the optimization equation on the n growth functions to find an optimal allocation of n initial target nodes into n communities
nodes into n communities.
4. Return the seed as the union of Γ(ki,Gi).
Time complexity is O(kms)
Time complexity is O(kms). 36
Experiment Settings
Experiment Settings
`
`
The greedy algorithm does not scale well to the cases of our experiments (too slow for networks over 30,000
our experiments (too slow for networks over 30,000 nodes). We use four heuristics to compare. Degree heuristics(M5), equal allocation(M2), random allocation(M4), proportional allocation(M1).
37
Experiment Settings
Experiment Settings
`
`
`
`
`
Datasets.
1 Condense matter physics collaboration network[3];
1.Condense matter physics collaboration network[3];
2.PGP giant component network[4]; 3 Lederberg citation network;
3.Lederberg citation network;
4.Zewail Citation Network.
38
Experiment Results
Experiment Results
Condensed Matter physics network with diffusion probability 0.05
39
Experiment Results
Experiment Results
PGP Network with diffusion probability 0.1
40
Experiment Results
Experiment Results
PGP Network with diffusion probability 0.2
41
Experiment Results
Experiment Results
Lederberg Citation Network with diffusion probability 0.1
42
Experiment Results
Experiment Results
Lederberg Citation Network with diffusion probability 0.2
43
Experiment Results
Experiment Results
Zewail Citation Network with diffusion probability 0.1
44
Experiment Results
Experiment Results
Zewail Citation Network with diffusion probability 0.2
45
Experiment Results
Experiment Results
Condensed matter physics Network with threshold model
46
Experiment Results
Experiment Results
PGP Network with threshold model
47
Experiment Results
Experiment Results
Lederberg Citation Network with threshold model
48
Experiment Results
Experiment Results
Zewail Citation Network with threshold model
49
Conclusions
`
`
The proposed algorithm outperforms comparing heuristics in most cases.
heuristics in most cases.
The difference between the proposed method and comparison heuristics are larger when the diffusion probability is higher for the independent cascade model.
50
References
`
`
`
`
[1] D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD,
the spread of influence through a social network. In KDD, pages 137–146, 2003.
[2]Wei Chen, Yajun Wang, Siyu Yang. Efficient Influence Maximization In Social network. KDD 2009
[3] M. E. J. Newman. The structure of scientific collaboration networks. PNAS, 98(2):404–409, 2001.
ll b ti
t
k PNAS 98(2) 404 409 2001
[4] M. B. n´a, R. Pastor‐Satorras, A. D´ıaz‐Guilera, and A. Arenas Models of social networks based on social
Arenas. Models of social networks based on social distance attachment. Physical Review E, 70(5):056122, 2004.
51
References
`
`
`
`
[5] M Granovetter. Threshold Models of Collective Behavior. American Journal of. Sociology 83 (1978) 1420–1443.
gy (
)
[6] Bearman PS, Moody J, Stovel K. Chains of affection: The structure of adolescent romantic and sexual networks. the American Journal of Sociology, Vol. 100, No. 1.
[7] A. Clauset, C. Moore, and M. E. J. Newman. Structural inference of hierarchies in networks 2006
inference of hierarchies in networks, 2006.
[8] Zachary W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452‐473.
52
References
`
`
[9]http://datamining.typepad.com/data_mining/2010/01
/malaysian blogosphere division.html.
/malaysian‐blogosphere‐division.html.
[10] Jure Leskovec, Christos Faloutsos. Tools for large graph mining WWW 2008 tutorial Part 2: Diffusion and cascading behavior. 53
Thanks
54

Download Report

An Optimal Allocation Approach to Influence Maximization Problem

Paperzz.com

Your Paperzz