Community Analysis via Social Influence Analysis

SOCIAL NETWORK ANALYSIS VIA FACTOR GRAPH MODEL
Zi Yang
OUTLINE
Background
Challenges
Unsupervised case 1
- Representative user finding
Unsupervised case 2
- Community discovery
Experiments
Supervised case
- Modeling information diffusion in social networks
BACKGROUND

Social network
Example: Digg.com
- A popular social news website for people to discover and share content
- Various types of user behaviors: submit, digg, comment, and reply to a comment
- Edges: two users are linked if one diggs or comments on a story of the other
BACKGROUND

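Community discovery
- Modularity property
  Q = \frac{1}{2m} \sum_{i,j} \left[ w_{i,j} - \frac{k_i k_j}{2m} \right] [y_i = y_j]
  where w_{i,j} is the weight of the edge between v_i and v_j, k_i = \sum_j w_{i,j} is the weighted degree of v_i, m is the total edge weight, and [y_i = y_j] is 1 if v_i and v_j are in the same community and 0 otherwise.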
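As a rough illustration of the modularity property, the Python sketch below computes (1/2m) Σ_{i,j} [w_{i,j} - k_i k_j / 2m] over all pairs in the same community. The edge-list input format and the function name `modularity` are assumptions for illustration; note that the sum runs over all ordered pairs (i, j), including pairs with no edge between them.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Weighted modularity Q of a community assignment.

    edges:  list of (i, j, w) undirected weighted edges (hypothetical input format)
    labels: dict mapping each node to its community id y_i
    """
    adj = defaultdict(float)     # symmetric weights w_{i,j}
    degree = defaultdict(float)  # weighted degrees k_i
    for i, j, w in edges:
        adj[(i, j)] += w
        adj[(j, i)] += w
        degree[i] += w
        degree[j] += w
    two_m = sum(degree.values())  # 2m: twice the total edge weight
    q = 0.0
    for i in labels:
        for j in labels:
            if labels[i] == labels[j]:  # indicator [y_i = y_j]
                q += adj.get((i, j), 0.0) - degree[i] * degree[j] / two_m
    return q / two_m
```

For two triangles joined by a single edge, putting each triangle in its own community gives Q = 5/14, while putting everyone in one community gives Q = 0.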


Affinity propagation
- Clustering via a factor graph model
- Update rules:
  r(i,k) = s(i,k) - \max_{k' \,\text{s.t.}\, k' \neq k} \{ a(i,k') + s(i,k') \}
  a(i,k) = \min\left\{ 0,\ r(k,k) + \sum_{i' \,\text{s.t.}\, i' \notin \{i,k\}} \max\{0, r(i',k)\} \right\}
  a(k,k) = \sum_{i' \,\text{s.t.}\, i' \neq k} \max\{0, r(i',k)\}
- Pair-wise constraint
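The responsibility and availability rules for affinity propagation can be exercised with a small, dependency-free Python sketch. The function name, the damping factor (a common stabilizer for affinity propagation that the slide does not show), and the final decision rule argmax_k {a(i,k) + r(i,k)} are assumptions of this sketch, not details taken from the slides.

```python
def affinity_propagation(s, iterations=200, damping=0.5):
    """Affinity propagation via responsibility/availability messages.

    s: N x N list of similarities; the diagonal s[k][k] is the preference of
    point k to serve as an exemplar. Returns each point's exemplar index.
    """
    n = len(s)
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} {a(i,k') + s(i,k')}
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][kp] + s[i][kp] for kp in range(n) if kp != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availabilities, computed from the freshly updated responsibilities
        for i in range(n):
            for k in range(n):
                if i == k:
                    # a(k,k) = sum_{i' != k} max{0, r(i',k)}
                    new = sum(max(0.0, r[ip][k]) for ip in range(n) if ip != k)
                else:
                    # a(i,k) = min{0, r(k,k) + sum_{i' not in {i,k}} max{0, r(i',k)}}
                    support = sum(max(0.0, r[ip][k]) for ip in range(n) if ip not in (i, k))
                    new = min(0.0, r[k][k] + support)
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Each point picks the exemplar k that maximizes a(i,k) + r(i,k)
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]
```

For points at 0, 1, 2, 10, 11, 12 with negative squared distances as similarities and a preference of -10, the middle point of each group (indices 1 and 4) should emerge as the exemplars.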
BACKGROUND

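Affinity propagation
  S(c) = \sum_{i=1}^{N} s(i, c_i) + \sum_{k=1}^{N} \delta_k(c_{1:N}), \quad \delta_k(c_{1:N}) = \begin{cases} -\infty, & \text{if } c_k \neq k \text{ but } \exists i: c_i = k \\ 0, & \text{otherwise} \end{cases}
- Local factor: s(i, c_i)
- Regional constraint: \delta_k(c_{1:N})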
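A short Python sketch can evaluate the affinity propagation objective S(c) for a given exemplar assignment: it sums the local factors s(i, c_i) and returns minus infinity whenever a regional constraint δ_k is violated. The function name and list-based inputs are assumptions for illustration.

```python
import math


def ap_objective(s, c):
    """S(c) = sum_i s(i, c_i) + sum_k delta_k(c) for an exemplar assignment c."""
    n = len(c)
    for k in range(n):
        # delta_k(c) = -inf: some point chose k as exemplar, but k did not choose itself
        if c[k] != k and any(c[i] == k for i in range(n)):
            return -math.inf
    # All delta_k are 0, so only the local factors remain
    return sum(s[i][c[i]] for i in range(n))
```

The constraint term is what turns a plain "pick your most similar neighbor" score into a valid clustering: a point can only serve as an exemplar if it is its own exemplar.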
CHALLENGES


- How can local properties be captured for social network analysis?
- Community discovery as graph clustering: how can edge information be taken into account directly?
- Homophily: what constraint can describe the formation and evolution of a community?
REPRESENTATIVE USER FINDING

Problem definition
Given a social network G = (V, E) and (optionally) a confidence value for each user v_i, the objective is to find a pair-wise representativeness on each edge of the network and to estimate the representative degree of each user v_i, denoted by a set of variables {y_i} with y_i ∈ {1, ..., N}. In other words, y_i is the user that v_i most trusts (or relies on).
REPRESENTATIVE USER FINDING

Modeling
- Input: users v1, v2, v3, v4
- Variables: y1, y2, y3, y4, where y_i represents the representative of v_i
[Figure: users v1 to v4, each linked to its variable y1 to y4]
REPRESENTATIVE USER FINDING

Modeling
- Node feature function
  g_i(y_i) \propto \begin{cases} w_{i,y_i}, & \text{if } y_i \in O(i) \quad \text{(neighbor representative)} \\ \sum_{j \in NB(i)} w_{j,i}, & \text{if } y_i = i \quad \text{(self-representative)} \\ 0, & \text{otherwise} \end{cases}
- Observation: similarity between the node and the variable, scaled by a normalization factor
[Figure: factor graph with users v1 to v4, variables y1 to y4, and node factors g1(y1) to g4(y4)]
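The node feature function can be sketched in Python as a table of scores over the candidate values of y_i. This sketch assumes that O(i) is the set of out-neighbors of v_i, that NB(i) is its set of neighbors, and that w[(j, i)] is the weight of the edge from v_j to v_i. It also normalizes the scores to sum to 1, which is a hypothetical choice because the slide only mentions "a normalization factor". The function name `node_scores` is likewise illustrative.

```python
def node_scores(i, w, out_nbrs, nbrs):
    """Return {candidate y_i: g_i(y_i)}; values of y_i not listed have g_i = 0.

    w:        dict (j, i) -> weight of the edge from v_j to v_i (assumed layout)
    out_nbrs: dict node -> list of out-neighbors O(i)
    nbrs:     dict node -> list of neighbors NB(i)
    """
    # y_i = i (self-representative): total incoming weight from neighbors
    raw = {i: sum(w.get((j, i), 0.0) for j in nbrs[i])}
    for k in out_nbrs[i]:
        # y_i in O(i) (neighbor representative): weight of the edge v_i -> v_k
        raw[k] = w.get((i, k), 0.0)
    z = sum(raw.values()) or 1.0  # hypothetical normalization factor
    return {k: v / z for k, v in raw.items()}
```

For example, with w = {(0, 1): 3.0, (1, 0): 1.0} and users 0 and 1 linked both ways, user 0 gets a score of 0.25 for representing itself and 0.75 for taking user 1 as representative.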
REPRESENTATIVE USER FINDING

Modeling
- Edge feature function
  f_{i,j}(y_i, y_j) = \begin{cases} \alpha, & \text{if } y_i = y_j \quad \text{(the vertices of the edge have the same representative)} \\ 1 - \alpha, & \text{if } y_i \neq y_j \quad \text{(the vertices of the edge have different representatives)} \end{cases}
- Undirected edge: bidirected influence
[Figure: factor graph adding edge factors f2,1(y2,y1), f2,3(y2,y3), f3,2(y3,y2), and f2,4(y2,y4) to the node factors g1(y1) to g4(y4)]
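A minimal Python sketch of the edge feature function follows. It assumes that the factor takes a value α when both endpoints share a representative and 1 - α otherwise; the parameter name `alpha` and its default of 0.6 are assumptions for illustration. Because an undirected edge carries bidirected influence, both (i, j) and (j, i) are listed as separate factors.

```python
def edge_factor(y_i, y_j, alpha=0.6):
    """f_ij: alpha if both endpoints share a representative, else 1 - alpha (alpha assumed)."""
    return alpha if y_i == y_j else 1.0 - alpha


def edge_potential(y, edges, alpha=0.6):
    """Product of f_ij over directed edge factors; an undirected edge appears as (i, j) and (j, i)."""
    prod = 1.0
    for i, j in edges:
        prod *= edge_factor(y[i], y[j], alpha)
    return prod
```

With alpha above 0.5, each factor rewards neighbors that agree on a representative, which is how the edge factors encode homophily.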
REPRESENTATIVE USER FINDING

Modeling
- Regional feature function: a feature function defined on v_k and its set of neighboring nodes I(k)
  h_k(y_{I(k) \cup \{k\}}) = \begin{cases} 0, & \text{if } y_k \neq k \text{ and } \exists i \in I(k): y_i = k \\ 1, & \text{otherwise} \end{cases}
- To avoid "leader without followers"
[Figure: factor graph adding regional factors h1(y1,y2), h2(y2,y3,y4), h3(y3,y1), and h4(y4,y2) to the node and edge factors]
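The regional feature function amounts to a feasibility check. The Python sketch below evaluates h_k for one user and for all users. It assumes that `in_nbrs[k]` lists the set I(k) and that `y` is an indexable assignment of representatives; both names are illustrative.

```python
def regional_factor(k, y, in_nbrs):
    """h_k = 0 if some i in I(k) picks v_k as representative while y_k != k, else 1."""
    if y[k] != k and any(y[i] == k for i in in_nbrs[k]):
        return 0
    return 1


def all_regional_factors_hold(y, in_nbrs):
    """True when every h_k equals 1, i.e. each chosen representative represents itself."""
    return all(regional_factor(k, y, in_nbrs) == 1 for k in range(len(y)))
```

Because h_k is 0 for an infeasible assignment, any such assignment makes the whole product P(y) vanish, which plays the same role as the -∞ constraint term in affinity propagation.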
REPRESENTATIVE USER FINDING

Modeling
- Objective function
  \max_{y_{1:N}} \log P(y_{1:N})
  P(y_{1:N}) = \frac{1}{Z} \prod_{i=1}^{N} g_i(y_i) \prod_{e_{i,j} \in E} f_{i,j}(y_i, y_j) \prod_{k=1}^{N} h_k(y_{I(k) \cup \{k\}})
- Solving: max-sum algorithm
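To make the objective concrete, the Python sketch below scores an assignment by the log of the unnormalized product of node, edge, and regional factors (log Z is constant and can be dropped) and then finds the best assignment by exhaustive enumeration. Enumeration is a stand-in technique used here only to check the objective on a handful of users; it is not the max-sum algorithm the slides use. The node-score dictionaries `g`, the edge-factor parameter `alpha`, and all function names are assumptions.

```python
import itertools
import math


def log_score(y, g, edges, in_nbrs, alpha):
    """log of prod_i g_i(y_i) * prod_(i,j) f_ij * prod_k h_k, with log Z dropped."""
    total = 0.0
    for i, yi in enumerate(y):
        gi = g[i].get(yi, 0.0)  # g[i]: dict candidate -> node score
        if gi <= 0.0:
            return -math.inf
        total += math.log(gi)
    for i, j in edges:  # assumed edge factor: alpha if equal, 1 - alpha otherwise
        total += math.log(alpha if y[i] == y[j] else 1.0 - alpha)
    for k in range(len(y)):
        if y[k] != k and any(y[i] == k for i in in_nbrs[k]):
            return -math.inf  # regional factor h_k = 0
    return total


def brute_force_map(g, edges, in_nbrs, alpha=0.6):
    """Exhaustive argmax of log P(y); feasible only for very small networks."""
    candidates = [sorted(g[i]) for i in range(len(g))]
    return max(itertools.product(*candidates),
               key=lambda y: log_score(y, g, edges, in_nbrs, alpha))
```

Max-sum replaces this exponential enumeration with local message passing on the factor graph, which is what makes the model usable on networks with thousands of users.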
REPRESENTATIVE USER FINDING

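Model learning
  a_{jj} = \max_{k \in I(j)} \min\{ r_{kj}, 0 \}
  a_{ij} = \min\left\{ \max\{ r_{jj}, 0 \},\ -\min\{ r_{jj}, 0 \} - \max_{k \in I(j) \setminus \{i\}} \min\{ r_{kj}, 0 \} \right\}
  r_{ij} = g_{ij} + \sum_{k \in I(i) \cup O(i)} c_{ikj} - \max_{j' \in O(i) \cup \{i\} \setminus \{j\}} \left\{ g_{ij'} + a_{ij'} + \sum_{k \in I(i) \cup O(i)} c_{ikj'} \right\}
  p_{ijk} = g_{ik} + a_{ik} + \sum_{l \in I(i) \cup O(i) \setminus \{j\}} c_{ilk} - \max_{j' \in O(i) \cup \{i\} \setminus \{k\}} \left\{ g_{ij'} + a_{ij'} + \sum_{l \in I(i) \cup O(i) \setminus \{j\}} c_{ilj'} \right\}
  c_{ijk}: computed from p_{jik} through a truncated logarithm of the form \max\{ \log(\cdot), 0 \}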
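The two availability messages, a_jj = max_{k ∈ I(j)} min{r_kj, 0} and a_ij = min{max{r_jj, 0}, -min{r_jj, 0} - max_{k ∈ I(j) \ {i}} min{r_kj, 0}}, can be computed for one user v_j with the Python sketch below. Responsibilities are assumed to be stored in a dict keyed by (k, j), and a maximum over an empty set is taken to be 0; both are assumptions of this sketch rather than details from the slides.

```python
def availabilities(j, r, in_nbrs):
    """Return {j: a_jj, i: a_ij for each i in I(j)} from responsibilities r[(k, j)]."""
    neg = {k: min(r[(k, j)], 0.0) for k in in_nbrs[j]}  # min{r_kj, 0}
    out = {j: max(neg.values(), default=0.0)}           # a_jj
    r_jj = r[(j, j)]
    for i in in_nbrs[j]:
        # max over k in I(j) \ {i}; an empty set is assumed to give 0
        others = max((v for k, v in neg.items() if k != i), default=0.0)
        out[i] = min(max(r_jj, 0.0), -min(r_jj, 0.0) - others)  # a_ij
    return out
```

Only the (k, j) responsibilities of v_j's in-neighbors enter the computation, so each user's availabilities can be updated locally.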

REPRESENTATIVE USER FINDING

A bit of explanation
- p_ijk: how likely user v_i persuades v_j to take v_k as his representative
- c_ijk: how likely user v_i complies with the suggestion from v_j that he take v_k as his representative
- The direction of this process: along the directed edges
[Figure: three panels showing messages passed among v1, v2, and v3 along the directed edges]
REPRESENTATIVE USER FINDING

Algorithm
COMMUNITY DISCOVERY

Problem definition
Given a social network G and an expected number of communities C, a virtual node u_c ∈ U is introduced for each community. The objective is to find a community y_i ∈ {1, ..., C} for each person v_i, representing the community that v_i belongs to, so as to maximize the preservation of network structure (or maximize the modularity Q of the communities).
COMMUNITY DISCOVERY

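Feature definition: what is different?
- Node feature function
  g_i(y_i) = \exp\{\ldots\}: an exponential of the neighbors' agreement \sum_{j \in I(i) \cup O(i)} [y_j = y_i], weighted per pair (i, j) and normalized by |X_{y_j}|
- Edge feature function
  f_{i,j}(y_i, y_j) = \exp\left\{ q_{i,j} [y_i = y_j] \right\}, \quad q_{i,j} = w_{i,j} - \frac{k_i k_j}{2m}
[Figure: factor graph with virtual community nodes u1 and u2, users v1 to v4, variables y1 to y4, node factors g1(y1) to g4(y4), and edge factors f2,1(y2,y1), f2,3(y2,y3), f2,4(y2,y4), f3,2(y3,y2), f1,3(y1,y3)]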
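The Python sketch below computes modularity-based edge factors for a given community assignment. It assumes the edge factor has the form f_ij = exp{q_ij [y_i = y_j]} with q_ij = w_ij - k_i k_j / 2m, where k_i is the weighted degree and m the total edge weight. The edge-list format and the function name are also assumptions.

```python
import math
from collections import defaultdict


def community_edge_factors(edges, labels):
    """f_ij = exp(q_ij * [y_i == y_j]) with q_ij = w_ij - k_i k_j / (2m) (assumed form)."""
    degree = defaultdict(float)  # weighted degree k_i
    for i, j, w in edges:
        degree[i] += w
        degree[j] += w
    two_m = sum(degree.values())  # 2m
    factors = {}
    for i, j, w in edges:
        q = w - degree[i] * degree[j] / two_m
        # An edge contributes exp(q_ij) only when both ends share a community
        factors[(i, j)] = math.exp(q) if labels[i] == labels[j] else 1.0
    return factors
```

Under this form, an edge whose weight exceeds what the degrees alone would predict (q_ij > 0) pulls its endpoints into the same community, mirroring the modularity criterion.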
COMMUNITY DISCOVERY

Algorithm
Result output and variable updates
Experiments

Dataset: Digg.com
- A popular social news website for people to discover and share content
- 9,583 users, 56,440 contacts
- Various types of user behaviors: submit, digg, comment, and reply to a comment (in total: 308,362)
- Edges: two users are linked if one diggs or comments on a story of the other
- Weight of an edge: the total number of diggs and comments
Experiments

Dataset: Digg.com
- 9,583 users, 56,440 contacts
- 308,362 edges
- Weight of an edge: the total number of diggs and comments
Settings:
- Parameter value: 0.6
Experiments

Result: the 3 most self-representative users on 3 different topics in the Digg user network
Experiments

Result: the 3 most representative users of 5 communities on 3 different subsets
Experiments
Result: representative network on a subgraph of the Digg-2 network
[Figure: representative network among Digg users in the subgraph, with scores ranging from 0.0000 to 0.0024]
Modeling information diffusion in social networks
- Supervised model
- Bridging the actual value (label) with the variable
- More variables to come?
- Learning the weights
Thanks