
Maximizing Influence over a Target User through Friend Recommendation
Master's Thesis Defense, December 5th, 2014
Sundong Kim
Department of Industrial & Systems Engineering
Korea Advanced Institute of Science and Technology
Committee: Professor Kyoung-Kuk Kim (Chair), Professor James Morrison, Professor Jae-Gil Lee
1
Outline
- Introduction
- Our Model
  - Information Propagation Model
  - Influence & Reluctance
  - K-node Suggestion Problem
- Proposed Algorithm
  - Candidate Reduction
  - Approximation & Incremental Update
- Experiments
- Conclusions
2
Online Social Network with Target
- A user wants to expose him/herself to the target user.
[Figure: the target's web feed]
3
New Concept of Friend Recommendation
- Suggest relevant friends to promote information flow toward the target.
- How can we suggest helpful friends to the user?
[Figure: the target's web feed]
4
Without Target vs. With Target
- Without considering a target: recommend users which have high node-to-node similarity (content-based, topology-based).
- Considering a target: recommend users in order to maximize influence over the target node.
5
Our Contribution
- Design the problem of friend recommendation with a target user
  - Basic rule of information propagation
  - Influence & Reluctance
  - K-node suggestion problem
- Develop the IKA (Incremental Katz Approximation) algorithm to solve this problem
  - Candidate reduction
  - Approximation & incremental update
6
Outline
- Introduction
- Our Model
  - Information Propagation Model
  - Influence & Reluctance
  - K-node Suggestion Problem
- Proposed Algorithm
  - Candidate Reduction
  - Approximation & Incremental Update
- Experiments
- Conclusions
7
Influence
- Definition: the ratio of source node (n_s)'s articles on target node (n_t)'s web feed, on the underlying graph G:

  I_st(G) = I_st = r_st / Σ_s r_st

- Issue: how can we estimate r_st? We need an information propagation model.
- Example: r_st = 3 and Σ_s r_st = 8, so I_st = 3/8.
8
Information Propagation
- In an online social network:
  - Action: each individual shows interest in someone's article (like, share, retweet).
  - Effect: information can be transmitted beyond direct neighbors by a cascading effect.
[Figure: an article travels from source to target through two share actions]
9
Four Principles
(a) Direct neighbors can receive a post without any action.
(b) An article reaches beyond a user's neighbors through a sharing action.
(c) Users can receive the same message multiple times.
(d) Every user can upload and share articles.
10
Back to Influence
- Assumptions
  - Deterministic sharing probability p_s
  - A single article by each user
  - Independent sharing behavior
- Number of n_s's articles on n_t's wall
  - w: a walk (sequence of graph vertices and edges) with the two endpoints n_s and n_t
  - S: the set of walks between n_s and n_t
  - length_w: the length (number of edges) of walk w
- Example (path 1-2-3-4, source = 1, target = 4):

  r_st = Σ_{w∈S} p_s^(length_w − 1) = p_s^2 + 3 p_s^4 + ⋯

  where 3 is the total number of walks of length 5 (e.g. 1-2-3-2-3-4).
11
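The walk counts in the example above can be checked numerically: entries of the adjacency-matrix power (A^k) count walks of length k. A minimal sketch on the slide's path graph 1-2-3-4 (0-indexed in code; p_s = 0.2 is an illustrative value):

```python
import numpy as np

# The slide's example graph: the path 1-2-3-4 (0-indexed as 0-1-2-3),
# with source n_s = node 0 and target n_t = node 3.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1

# (A^k)[s, t] counts the walks of length k between s and t.
walks = {k: int(np.linalg.matrix_power(A, k)[0, 3]) for k in range(1, 6)}
# -> {1: 0, 2: 0, 3: 1, 4: 0, 5: 3}: one walk of length 3 (1-2-3-4),
#    three walks of length 5 (e.g. 1-2-3-2-3-4)

# Each walk of length l delivers the article with probability p_s^(l-1),
# so r_st = p_s^2 + 3*p_s^4 + ..., matching the slide.
ps = 0.2
r_st = sum(c * ps ** (l - 1) for l, c in walks.items())
```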
Influence Representation
- Katz [1] centrality: a centrality measure which considers the total number of walks between two nodes.

  C_Katz(t) = Σ_{k=1}^∞ Σ_{j=1}^n α^k (A^k)_{jt}
  C_PKatz(s, t) = Σ_{k=1}^∞ α^k (A^k)_{st}

  where α is the attenuation factor and A is the adjacency matrix.

- Relation between Katz centrality and Influence:

  I_st = r_st / Σ_s r_st = r_st / Σ_i d_i^t p_s^(i−1) = C_PKatz(s, t) / C_Katz(t)

  where d_i^t is the total number of walks from t whose length is equal to i, and the sharing probability p_s plays the role of the attenuation factor α.

[1] Katz, L. (1953) A New Status Index Derived from Sociometric Analysis. Psychometrika 18(1): 39-43.
12
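The infinite sums above have a closed form, since Σ_{k≥1} α^k A^k = (I − αA)^{-1} − I whenever α is smaller than the reciprocal of A's spectral radius. A minimal sketch on the same path graph (using α = p_s = 0.2, an illustrative value):

```python
import numpy as np

# Path graph 1-2-3-4 again; attenuation factor alpha = sharing probability p_s.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1

alpha = 0.2  # must be < 1/spectral_radius(A) for the series to converge
# K[s, t] = sum_{k>=1} alpha^k (A^k)[s, t], evaluated in closed form:
K = np.linalg.inv(np.eye(4) - alpha * A) - np.eye(4)

c_pkatz = K[0, 3]        # C_PKatz(s, t)
c_katz = K[:, 3].sum()   # C_Katz(t): summed over all source nodes j
I_st = c_pkatz / c_katz  # influence of node 0 over node 3
```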
Reluctance
- Awkwardness between two nodes.
- Definition: negative exponential of the Adamic-Adar [2] similarity:

  ρ_ij = e^(−sim(i, j))

- Adamic-Adar similarity: sim(i, j) = Σ_{n ∈ Γ(i) ∩ Γ(j)} 1 / log|Γ(n)|, where Γ(i) is the set of neighbors of i.
- Purpose: a constraint when two nodes make a connection.
- Example: ρ_ij = 1 when the two vertices have no mutual friend.

[2] Adamic, L., Adar, E. (2003) Friends and Neighbors on the Web. Social Networks 25: 211-230.
13
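A small sketch of reluctance on toy friendship lists (the neighbor sets are made up for illustration; the 1/log|Γ(n)| term assumes every mutual friend has degree at least 2):

```python
import math

# Hypothetical neighbor sets Gamma[i]
Gamma = {
    'a': {'b', 'c', 'd'},
    'b': {'a', 'c'},
    'c': {'a', 'b', 'd'},
    'd': {'a', 'c'},
    'e': {'f'},
    'f': {'e'},
}

def sim(i, j):
    """Adamic-Adar similarity: sum over mutual friends n of 1/log|Gamma(n)|."""
    return sum(1.0 / math.log(len(Gamma[n])) for n in Gamma[i] & Gamma[j])

def reluctance(i, j):
    """rho_ij = exp(-sim(i, j)); equals 1 iff i and j have no mutual friend."""
    return math.exp(-sim(i, j))

print(reluctance('a', 'e'))  # no mutual friend -> 1.0
print(reluctance('b', 'd'))  # mutual friends a and c -> below 1
```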
K-node Suggestion Problem
- Goal: maximize I_st(G′) − I_st(G) by making connections to k nodes.
- Input: G = (V, E), source node n_s, target node n_t.
- Output: an ordered set of k suggested nodes for n_s: S = {n_i1, n_i2, …, n_ik}.
- Constraint: ρ_si < 1 for every suggestion.
[Figure: a good suggestion brings the source closer to the target; a bad suggestion does not]
14
Major Difficulties
- Computation cost of the global optimal solution: there are C(n, k) different combinations for a k-node suggestion, an exponential problem with unknown k. ⇒ Greedy algorithm
- Large matrix inversion: needed for computing influence (Katz centrality). ⇒ Approximation by Monte-Carlo simulation
- Computation overlap: occurs when computing influence on G′. ⇒ Incremental update of influence
15
Outline
- Introduction
- Our Model
  - Information Propagation Model
  - Influence & Reluctance
  - K-node Suggestion Problem
- Proposed Algorithm
  - Candidate Reduction
  - Approximation & Incremental Update
- Experiments
- Conclusions
16
Baseline Greedy Algorithm
Procedure:
1. Calculate I_st on the original graph G.
2. For all n_i in the candidate set, calculate ΔI_st and ρ_si on G′ = G + e(n_s, n_i).
3. Update the candidate set.
4. If ΔI_st < 0 for all candidates, finish the algorithm; otherwise find the best node n_b, set G + e(n_s, n_b) as G, and repeat from step 2.
17
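The loop above can be sketched as follows, with an exact Katz-ratio influence (via matrix inversion) standing in for whatever estimator is plugged in; the graph, α, and candidate handling are illustrative assumptions:

```python
import numpy as np

def influence(A, s, t, alpha=0.2):
    """I_st = C_PKatz(s, t) / C_Katz(t), computed exactly via (I - alpha*A)^-1."""
    n = len(A)
    K = np.linalg.inv(np.eye(n) - alpha * A) - np.eye(n)
    return K[s, t] / K[:, t].sum()

def greedy_suggest(A, s, t, candidates, k):
    """Baseline greedy: repeatedly connect s to the candidate with the largest
    positive influence gain; drop non-beneficial candidates each round."""
    A = A.copy()
    suggestions = []
    cand = set(candidates) - {s, t}
    while cand and len(suggestions) < k:
        base = influence(A, s, t)
        gains = {}
        for i in cand:
            A2 = A.copy()
            A2[s, i] = A2[i, s] = 1          # G' = G + e(n_s, n_i)
            gains[i] = influence(A2, s, t) - base
        cand = {i for i in cand if gains[i] > 0}  # remove non-beneficial nodes
        if not cand:
            break                            # all gains <= 0: finish
        best = max(cand, key=gains.get)      # the best node n_b
        suggestions.append(best)
        A[s, best] = A[best, s] = 1          # set G + e(n_s, n_b) as G
        cand.discard(best)
    return suggestions
```

On the path 1-2-3-4 with candidates {2, 3} (0-indexed nodes 1 and 2), the only beneficial new connection for source 0 is node 2, the neighbor of the target.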
Proposed Algorithm
Procedure (three ingredients: 1. candidate reduction, 2. influence approximation, 3. incremental update):
1. Approximate I_st on the original graph G (influence approximation).
2. For all n_i in the reduced candidate set, update I_st and ρ_si on G′ = G + e(n_s, n_i) (incremental update).
3. Update the candidate set (candidate reduction).
4. If ΔI_st < 0 for all candidates, finish the algorithm; otherwise find the best node n_b, set G + e(n_s, n_b) as G, and repeat from step 2.
18
Candidate Reduction
- Reduction 1: restrict the candidate set to the two-hop neighbors of n_s.
  - Effect: size decreases from O(n) to O(d^2).
  - Reason: the constraint ρ_si < 1 implies recommended nodes must have at least one mutual friend with n_s.
- Reduction 2: gradually remove each non-beneficial node n_i whose connection gives ΔI_st < 0.
  - Reason: since each step suggests only one node, there is little chance that n_i would be chosen later.
19
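Reduction 1 is a one-liner over an adjacency-list graph (a sketch; the source itself and its existing friends are excluded, since only new connections can be suggested):

```python
# Reduction 1 (sketch): candidates = nodes exactly two hops from the source
# that are not already friends; each has >= 1 mutual friend, so rho_si < 1.
def two_hop_candidates(adj, s):
    one_hop = set(adj[s])
    two_hop = {w for v in one_hop for w in adj[v]}
    return two_hop - one_hop - {s}

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the path graph again
print(two_hop_candidates(adj, 0))  # {2}
```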
Working Example (Greedy Algorithm)
- Initial candidate set: the two-hop neighbors of the source.
[Figure: candidate nodes, non-beneficial nodes, source user, target user]
20
Working Example (Greedy Algorithm)
- First recommendation result: the 1st suggestion is the node satisfying argmax ΔI_st with ΔI_st > 0; nodes with ΔI_st < 0 are marked non-beneficial.
21
Working Example (Greedy Algorithm)
- Update candidate set: remove the non-beneficial nodes from the candidate set.
22
Working Example (Greedy Algorithm)
- Update candidate set: effect of having a new connection.
23
Working Example (Greedy Algorithm)
- Second recommendation result.
24
Working Example (Greedy Algorithm)
- Update the candidate set.
25
Working Example (Greedy Algorithm)
- Third recommendation result.
26
Working Example (Greedy Algorithm)
- Update the candidate set.
27
Working Example (Greedy Algorithm)
- No more beneficial nodes in the candidate set.
28
Working Example (Greedy Algorithm)
- Result of the k-node recommendation: the 1st, 2nd, and 3rd suggestions in order.
29
Proposed Algorithm
Procedure (as before): candidate reduction, influence approximation, incremental update. Next: influence approximation.
30
Monte-Carlo Simulation
- Purpose: to get numerical estimates of I_st and ΔI_st.
- Influence approximation (I_st):
  - Simulate Katz centrality (Σ_s r_st) and personalized Katz centrality (r_st) using our information propagation model.
  - Save interim results for ΔI_st.
- Incremental update of influence (ΔI_st):
  - Update I_st(G′) based on I_st(G) and e(n_s, n_i1), by initializing new diffusions starting from n_s and n_i1.
31
Example (Influence Approximation)
- Monte-Carlo simulation for personalized Katz centrality.
- Number of initial articles: R_1 (the number of simulations); the source user uploads R_1 articles.
- Recall I_st = C_PKatz(s, t) / C_Katz(t).
32
Example (Influence Approximation)
- Articles propagate through the network according to our model: direct neighbors receive the article.
[Figure: first propagation step]
33
Example (Influence Approximation)
- Information propagation from a node stops if the sharing condition (e.g. p_s = 0.2) is not met: each received copy draws x ~ U(0, 1) and the comparison against p_s decides whether it is shared.
[Figure: second propagation step; one node with draw x_1 = 0.18 is annotated, one copy is shared and one is not]
34
Example (Influence Approximation)
- Finish the simulation after a fixed number of steps n and count the total number of articles which passed n_t.
- Example:
  - Number of articles n_s uploaded: R_1; number of articles n_t received: 0.17 R_1 → r_st = C_PKatz(s, t) ≈ 0.17.
  - Number of articles each node uploads: R_2; number of articles n_t received: 1.25 R_2 → Σ_s r_st = C_Katz(t) ≈ 1.25.
  - I_st = C_PKatz(s, t) / C_Katz(t) ≈ 0.17 / 1.25.
35
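The counting procedure above can be sketched as a Monte-Carlo cascade under the four principles: direct neighbors always receive the post, every received copy is shared independently, and duplicate deliveries are counted. The graph, R, the step cap, and the draw-below-p_s sharing convention are all illustrative assumptions; dividing this estimate by an analogous all-sources estimate gives I_st:

```python
import random

def estimate_r_st(adj, s, t, ps=0.2, R=20000, max_steps=6, seed=0):
    """Upload R articles at s, propagate for a fixed number of steps, and
    return the average number of copies that reached t per uploaded article."""
    rng = random.Random(seed)
    received = 0
    for _ in range(R):
        frontier = list(adj[s])           # step 1: direct neighbors receive
        received += frontier.count(t)
        for _ in range(max_steps - 1):
            nxt = []
            for node in frontier:         # each copy is shared independently
                if rng.random() < ps:
                    nxt.extend(adj[node])
            received += nxt.count(t)
            frontier = nxt
    return received / R

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph
print(estimate_r_st(adj, 0, 3))  # close to p_s^2 + 3*p_s^4 = 0.0448
```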
Proposed Algorithm
Procedure (as before): candidate reduction, influence approximation, incremental update. Next: incremental update.
36
Example (Incremental Update)
- Update part: simulate the effect of the new diffusion caused by a new edge.
[Figure: the new edge and the new diffusion it triggers]
37
Example (Incremental Update)
- Continue the diffusion if the sharing condition is met (in the figure, a draw x_1 ~ U(0, 1) = 0.35 is compared against p_s).
- Finish at the same time as the original simulation and update the influence value: C_PKatz(s, t) = (original diffusion count) + (new diffusion count).
38
Outline
- Introduction
- Our Model
  - Information Propagation Model
  - Influence & Reluctance
  - K-node Suggestion Problem
- Proposed Algorithm
  - Candidate Reduction
  - Approximation & Incremental Update
- Experiments
- Conclusions
39
Network Datasets
[Table: datasets compared by size, density, and topology]
40
Time Comparison
- According to node size (scale-free graphs, |V| = |E|).
- According to graph density.
41
Error Analysis
- Relative error according to the number of initial seeds (x-axis: R_1, y-axis: R_2).
- Variance of influence (x-axis: R_1, y-axis: R_2).
- R_1: number of initial articles used to approximate C_PKatz(s, t); R_2: number of initial articles used to approximate C_Katz(t).
42
Interpretation
- Case 1: n_s and n_t connected, n_s a leaf, n_t a center.
43
Interpretation
- Case 2: n_s and n_t connected, n_s a leaf, n_t a leaf.
44
Outline
- Introduction
- Our Model
  - Information Propagation Model
  - Influence & Reluctance
  - K-node Suggestion Problem
- Proposed Algorithm
  - Candidate Reduction
  - Approximation & Incremental Update
- Experiments
- Conclusions
45
Summary
- Proposed the problem of friend recommendation with a target user.
- Defined a new measure called Influence and established its relation with Katz centrality.
- Proposed the Incremental Katz Approximation (IKA) algorithm to recommend friends efficiently.
- Conducted various experiments and showed that IKA effectively suggests users close to the target.
46
Thank you
47
Future Work
- Solve the scalability issue (for dense graphs).
- Extend to multiple targets.
- Apply more realistic settings:
  - Generalize the sharing probability.
  - Asymmetric behavior in social networks.
48
Appendix: Algorithm Complexity on Random Graphs
- Notation: n = number of nodes in G, d = average degree of G, k = number of recommendations, p_s = sharing probability.
- Initial candidate set size: O(n) → O(d^2).
- Number of candidates to check in the whole process: O(Σ_{i=1}^k d(d + i)) = O(kd^2 + dk^2) = O(kd(k + d)).
- Overall algorithm complexity on a random graph: O(kn^4) → O(kd(k + d) n^3).
- The O(n^3) computation of Katz centrality is a big burden → approximation & incremental update are necessary.
49
Appendix: Algorithm Complexity on Random Graphs
- Approximation:
  - Initializing a seed: O(n).
  - Approximating Katz centrality (influence): Θ(n + Σ_{k=0}^∞ nd(dp_s)^k) = Θ(n + nd / (1 − dp_s)) = O(nd), since the decreasing factor dp_s is less than 1 (n for initialization, d for the number of initial neighbors).
- Incremental update:
  - Updating the Katz centrality for a single candidate: O(d).
  - Finding the k best nodes: O(kd^2(k + d)), since the total number of candidates over the whole process is O(kd(k + d)).
- Total complexity: O(nd + k^2 d^2 + kd^3).
50
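The geometric-series step in the approximation bound above can be checked numerically (n, d, and p_s are illustrative values satisfying dp_s < 1):

```python
# Check: sum_{k>=0} n*d*(d*ps)^k == n*d / (1 - d*ps) when d*ps < 1.
n, d, ps = 1000, 4, 0.2
series = sum(n * d * (d * ps) ** k for k in range(200))  # truncated sum
closed = n * d / (1 - d * ps)
print(series, closed)  # both ~ 20000.0
```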