
Understanding (Mis)information Spreading
for Improving Corporate Network
Trustworthiness
Candidate:
Mara Sorella
Advisor:
Prof. Roberto Baldoni
Academic Year 2012/2013
Supervisor: Dr. Silvia Bonomi
Outline
Misinformation in SN
Spreading models
Problems
Moving to Corporate networks
A model for Corporate Social Networks
Problem formulation
Evaluation: Case Studies
Enron
DIAG
Future work
Introduction
Social networks: medium for the spread of information
Opinions, ideas, information, innovation
Direct Marketing exploits word-of-mouth effects to significantly increase profits
Spreading of information in SN
Two basic classes of graph-based diffusion models: Threshold and Cascade
Directed graph G = (V, E)
General operational view:
users = nodes
Edges (u, v) can be weighted to represent the influence of node u on v
Nodes start either active or inactive
An active node may trigger the activation of neighboring nodes
Monotonicity assumption: active nodes never deactivate
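Linear Threshold
A node v has a random threshold $\theta_v \sim U[0,1]$
A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that:
$\sum_{w \in N(v)} b_{v,w} \le 1$
Activation condition: v becomes active when
$\sum_{w \in N(v),\ w \text{ active}} b_{v,w} \ge \theta_v$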
Linear Threshold (example)
[Figure: nodes u, w and v; edge weights 0.4, 0.2 and 0.5; thresholds $\theta_w = 0.3$ and $\theta_v = 0.6$]
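A minimal Python sketch of one Linear Threshold run follows. The dictionary layout (in_weights[v][w] holding b_{v,w}) and the function name are our own choices, and the edge assignment in the usage line (u→w: 0.4, u→v: 0.2, w→v: 0.5) is a hypothetical reading of the figure, not something the slides state.

```python
import random


def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run one Linear Threshold cascade and return the final active set.

    in_weights[v][w] is b_{v,w}, the weight of neighbor w on node v; the
    weights entering each v are assumed to sum to at most 1.
    thresholds[v] is theta_v; missing thresholds are drawn from U[0, 1].
    """
    rng = rng or random.Random()
    nodes = set(seeds) | set(in_weights)
    for ws in in_weights.values():
        nodes |= set(ws)
    thresholds = dict(thresholds or {})
    for v in nodes:
        thresholds.setdefault(v, rng.random())  # theta_v ~ U[0, 1]
    active = set(seeds)
    changed = True
    while changed:  # stop after a full pass that activates nobody
        changed = False
        for v in nodes - active:
            pressure = sum(b for w, b in in_weights.get(v, {}).items() if w in active)
            if pressure >= thresholds[v]:  # activation condition
                active.add(v)
                changed = True
    return active


# Hypothetical reading of the example: u->w 0.4, u->v 0.2, w->v 0.5
example = {"w": {"u": 0.4}, "v": {"u": 0.2, "w": 0.5}}
print(sorted(linear_threshold(example, {"u"}, thresholds={"w": 0.3, "v": 0.6})))
# ['u', 'v', 'w']
```

Under that assignment, w activates because 0.4 ≥ 0.3, and then v activates because 0.2 + 0.5 = 0.7 ≥ 0.6. Since thresholds are fixed once drawn and nodes never deactivate, the final set does not depend on the order in which nodes are scanned.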
Independent Cascade
When a node v becomes active, it has a single chance of activating each currently inactive neighbor w
The activation attempt succeeds with probability $p_{v,w}$
[Figure: node v with neighbors u and w; edge probabilities 0.2 and 0.5]
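Example (ICM)
[Figure: step-by-step cascade on nodes u, v, w and x with edge probabilities between 0.1 and 0.6; legend: inactive node, active node, newly activated node, successful attempt, unsuccessful attempt. The process stops when no node is newly activated.]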
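The cascade illustrated above can be sketched in Python as follows. The adjacency layout (out_probs[u][v] holding the activation probability of edge (u, v)) and the function names are assumptions made for illustration. Each node appears in the frontier exactly once, right after it is activated, which is what gives it a single chance per neighbor.

```python
import random


def independent_cascade(out_probs, seeds, rng=None):
    """Run one Independent Cascade and return the final active set.

    out_probs[u][v] is p_{u,v}, the probability that u activates v.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:  # "Stop!": no node was newly activated
        newly_active = []
        for u in frontier:
            # u gets exactly one attempt on each currently inactive neighbor
            for v, p in out_probs.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def expected_spread(out_probs, seeds, runs=10000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(out_probs, seeds, rng)) for _ in range(runs))
    return total / runs


print(expected_spread({"u": {"v": 0.5}}, {"u"}))  # close to 1 + 0.5 = 1.5
```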
Problems in SN
Influence of a node set S, f(S): the expected number of active nodes at the end of the process if S is the initial active set
Influence maximization
A set S of nodes is selected for initial activation
Problem: given a parameter k (budget), find a k-node set S that maximizes f(S)
Misinformation containment
Information (set L) and misinformation (set A) compete.
Problem: given a parameter k (budget), find a k-node set L that minimizes the number of nodes eventually reached by the misinformation started from A
[24] Kempe et al. Maximizing the spread of influence through a social network. [KDD ’03]
[11] Budak et al. Limiting the spread of misinformation in social networks. [WWW ’11]
[9] Bharathi et al. Competitive influence maximization in social networks. [WINE ’07]
[31] Nguyen et al. Containment of misinformation spread in online social networks. [WebSci ’12]
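For influence maximization, Kempe et al. [24] use a greedy algorithm that repeatedly adds the node with the largest marginal gain in f(S). Because f is monotone and submodular under both the Threshold and Cascade models, greedy gives a (1 − 1/e)-approximation, up to the error of the Monte Carlo estimates. The sketch below assumes Independent Cascade dynamics and uses hypothetical function names. It covers only the single-campaign problem, not misinformation containment.

```python
import random


def cascade_size(out_probs, seeds, rng):
    """Size of one Independent Cascade run started from `seeds`."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_probs.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def estimate_f(out_probs, seeds, runs, rng):
    """Monte Carlo estimate of f(S)."""
    return sum(cascade_size(out_probs, seeds, rng) for _ in range(runs)) / runs


def greedy_max_influence(out_probs, nodes, k, runs=1000, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        base = estimate_f(out_probs, chosen, runs, rng) if chosen else 0.0
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = estimate_f(out_probs, chosen + [v], runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen


# Hypothetical graph: two disjoint stars whose edges always fire
stars = {"h1": {"a": 1.0, "b": 1.0}, "h2": {"c": 1.0, "d": 1.0, "e": 1.0}}
print(greedy_max_influence(stars, ["a", "b", "c", "d", "e", "h1", "h2"], k=2, runs=10))
# ['h2', 'h1']
```

The cost is roughly k · |V| · runs cascade simulations, which is why practical implementations add lazy evaluation of marginal gains.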
From SN to Corporate Networking
Key point: a hierarchical interpretation exists over the set of entities forming the system.
The organizational chart represents the hierarchical organization of a company
Alongside it, social networks are commonly used in corporate networks
Social relationships within the corporation
Technological means: corporate social network tools
Tools for improving the efficiency of a company
Internal SN: true internal social networks for expertise localization, e.g. IBM SmallBlue
Internal messaging systems, e.g. email, internal chat services
Detecting influential nodes
Social connections can create potential vulnerabilities: employees at the lower levels of the organizational chart may become influential thanks to their social connections.
Unexpected influence could be dangerous if the employee behaves maliciously, thus reducing the trustworthiness of the overall organization (potential insiders)
Therefore, a joint analysis must be performed of:
hierarchical relationships imposed by the organizational structure
social relationships, observed through the social network among employees
Main purpose:
identifying the global scope of the influence of every node of the network
Based on this, appropriate countermeasures against potential attacks can be taken
Towards a CN Model:
Hierarchical Network + Social Network = Corporate Social Network
Network graph (topology)
Influence mapping function
Information diffusion model
Hierarchical Network
[Figure: employees u and v connected by hierarchical relationships]
“Direct edges” E_d: going down the chart
“Reverse edges” E_r: going up the chart
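A small sketch of how E_d and E_r could be derived from an organizational chart. The input format (a map from each employee to the list of their supervisors, which also covers DAG-shaped charts) and the values p_down and p_up are hypothetical. p_up plays the role that the probability P takes in the experiments.

```python
def hierarchical_edges(supervisors_of, p_down=1.0, p_up=0.0):
    """Build direct (E_d) and reverse (E_r) edges from the organizational chart.

    supervisors_of[u] lists the supervisors of u (more than one entry turns
    the chart into a DAG). p_down and p_up are hypothetical influence values.
    """
    e_d, e_r = {}, {}
    for u, supervisors in supervisors_of.items():
        for v in supervisors:
            e_d[(v, u)] = p_down  # going down: supervisor v -> staff member u
            e_r[(u, v)] = p_up    # going up: staff member u -> supervisor v
    return e_d, e_r


e_d, e_r = hierarchical_edges({"b": ["a"], "c": ["a"], "d": ["b"]}, p_up=0.5)
print(sorted(e_d))  # [('a', 'b'), ('a', 'c'), ('b', 'd')]
print(sorted(e_r))  # [('b', 'a'), ('c', 'a'), ('d', 'b')]
```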
Social Network
[Figure: employees u and v connected by social edges E_s]
Social Influence Mapping Function
No specific constraint on values/relationships: it can be derived from the specific social network considered, or from a superimposition of several social means
Corporate Social Network Model
[Figure: merged graph with edge sets E_d, E_r and E_s]
Merging rules for the influence function on an edge (u, v)
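The merging rules themselves are not recoverable from the figure. As a hypothetical illustration only, superimposing E_d, E_r and E_s can be sketched with a pluggable combination rule. Both noisy-OR (treating each relationship as an independent activation channel) and max are our assumptions, not the rules used in this work.

```python
def merge_influence(edge_sets, combine):
    """Superimpose weighted edge sets (e.g. E_d, E_r, E_s) into one graph.

    Edges present in only one set keep their weight; edges present in
    several sets are merged pairwise with `combine`.
    """
    merged = {}
    for edges in edge_sets:
        for edge, weight in edges.items():
            merged[edge] = combine(merged[edge], weight) if edge in merged else weight
    return merged


def noisy_or(p, q):
    """Hypothetical rule: the two relationships act as independent channels."""
    return 1 - (1 - p) * (1 - q)


e_d = {("a", "b"): 0.5}
e_s = {("a", "b"): 0.5, ("b", "c"): 0.2}
print(merge_influence([e_d, e_s], noisy_or))
# {('a', 'b'): 0.75, ('b', 'c'): 0.2}
```

Passing the built-in max as `combine` gives the other hypothetical rule, where the stronger relationship wins.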
f-Influential nodes identification
Influence function of a node v: the expected number of nodes that will be influenced by v at the end of the spreading process
Problem (f-influential Nodes Identification)
This is done in order to find the f-influential weak nodes, i.e. f-influential nodes placed low in the organizational chart
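The formal problem statement did not survive extraction. Assuming that a node v is f-influential when its expected influence σ(v) reaches at least a fraction f of the |V| nodes (consistent with the f = 0.5 line in the Enron plots, but still our reading), a Monte Carlo sketch follows. It uses Independent Cascade dynamics and a hypothetical rank-based filter for "weak" nodes.

```python
import random


def ic_reach(out_probs, source, rng):
    """Number of nodes active at the end of one Independent Cascade from `source`."""
    active, frontier = {source}, [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_probs.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def f_influential(out_probs, nodes, f, runs=10000, seed=0):
    """Nodes whose estimated sigma(v) is at least f * |V| (assumed criterion)."""
    rng = random.Random(seed)
    threshold = f * len(nodes)
    sigma = {v: sum(ic_reach(out_probs, v, rng) for _ in range(runs)) / runs for v in nodes}
    return {v for v, s in sigma.items() if s >= threshold}, sigma


def weak_f_influential(influential, rank, min_rank):
    """Hypothetical filter: f-influential nodes ranked at or below `min_rank`."""
    return {v for v in influential if rank[v] >= min_rank}


# Hypothetical graph: "ceo" supervises "a" and "b"; "b" has social ties to both
g = {"ceo": {"a": 1.0, "b": 1.0}, "b": {"ceo": 1.0, "a": 1.0}}
infl, sigma = f_influential(g, ["ceo", "a", "b"], f=0.5, runs=100)
print(sorted(weak_f_influential(infl, {"ceo": 0, "a": 1, "b": 2}, min_rank=1)))  # ['b']
```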
Experiments
To discover the f-influential nodes, we study the spread of information with 10000 Monte Carlo simulations starting from every single node, in three different settings corresponding to the graphs H, S and HS.
[Figure: edge (u, v) labeled with probability P]
P is the probability associated with edges (u, v) representing the “u is a member of v’s staff” relationship.
The same experiment is repeated for two different values of P: P = 0 and P = 0.5.
P = 0: supervisors don’t listen to people in their staff.
P = 0.5: a supervisor accepts information coming from a member of his/her staff with probability 0.5.
The value of f in the experiments is set to 0.5
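A toy sketch of the P setting, assuming Independent Cascade dynamics, downward edges that always fire (p_down = 1.0, a hypothetical value the slides do not give), and a hypothetical three-person chain. It only illustrates how P controls upward spread. With P = 0, a bottom-level employee reaches only themself. With P = 0.5, the expected reach is 1 + 0.5 + 0.25 = 1.75.

```python
import random


def reach(out_probs, source, rng):
    """Nodes reached by one Independent Cascade started from `source`."""
    active, frontier = {source}, [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_probs.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def hierarchy_graph(supervisor_of, P, p_down=1.0):
    """Organizational chart as an IC graph; staff -> supervisor edges get probability P."""
    out = {}
    for staff, boss in supervisor_of.items():
        out.setdefault(boss, {})[staff] = p_down  # hypothetical downward probability
        out.setdefault(staff, {})[boss] = P       # "staff is a member of boss's staff"
    return out


def average_reach(out_probs, source, runs=10000, seed=0):
    rng = random.Random(seed)
    return sum(reach(out_probs, source, rng) for _ in range(runs)) / runs


chart = {"b": "a", "c": "b"}  # hypothetical chain: a supervises b, b supervises c
for P in (0, 0.5):
    print(P, average_reach(hierarchy_graph(chart, P), "c"))
# P = 0 gives exactly 1.0; P = 0.5 gives roughly 1.75
```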
Case Study: Enron Corp.
H graph
height 8
recovered from official documents released to the public
organizational chart as a tree-shaped graph (nodes labeled via BFS; 60% are leaves)
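The BFS labeling and the two chart statistics can be sketched as follows, on a hypothetical four-person chart stored as a map from each node to its children. Height is measured here in edges on the longest root-to-leaf path. The slides do not say which convention "height 8" uses.

```python
from collections import deque


def bfs_ranks(children_of, root):
    """Label the chart in BFS order: the root (CEO) gets 0, deeper employees larger ranks."""
    rank, queue = {}, deque([root])
    while queue:
        v = queue.popleft()
        if v in rank:  # already labeled (only possible if the chart is a DAG)
            continue
        rank[v] = len(rank)
        queue.extend(children_of.get(v, []))
    return rank


def height(children_of, root):
    """Number of edges on the longest root-to-leaf path."""
    kids = children_of.get(root, [])
    return 0 if not kids else 1 + max(height(children_of, c) for c in kids)


def leaf_fraction(children_of, nodes):
    """Share of nodes without any staff below them."""
    return sum(1 for v in nodes if not children_of.get(v)) / len(nodes)


chart = {"ceo": ["a", "b"], "a": ["c"]}  # hypothetical chart
ranks = bfs_ranks(chart, "ceo")
print(ranks)                        # {'ceo': 0, 'a': 1, 'b': 2, 'c': 3}
print(height(chart, "ceo"))         # 2
print(leaf_fraction(chart, ranks))  # 0.5
```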
Enron
S graph
company’s social network, represented by email exchanges
Influence over edges: associated with the number of emails sent by u to v (threshold values)
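The slides only state that edge influence is associated with the number of emails sent by u to v. One plausible normalization, assumed here, sets b_{v,u} to u's share of all emails v received. This way the weights entering each node sum to exactly 1, which satisfies the Linear Threshold constraint.

```python
from collections import defaultdict


def lt_weights_from_emails(email_counts):
    """Turn email counts into Linear Threshold weights (assumed normalization).

    email_counts[(u, v)] is the number of emails u sent to v; the returned
    weights[v][u] is u's share of all emails v received.
    """
    email_counts = {e: n for e, n in email_counts.items() if n > 0}
    received = defaultdict(int)
    for (u, v), n in email_counts.items():
        received[v] += n
    weights = defaultdict(dict)
    for (u, v), n in email_counts.items():
        weights[v][u] = n / received[v]
    return dict(weights)


print(lt_weights_from_emails({("a", "c"): 3, ("b", "c"): 1, ("c", "a"): 2}))
# {'c': {'a': 0.75, 'b': 0.25}, 'a': {'c': 1.0}}
```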
Results (P = 0)
Number of nodes reached by each of the 151 Enron employees, considering the influence given by the 3 different graphs.
[Plot: Number of Reached Nodes (0 to 150) vs. Nodes Ordered by Rank (0 to 150) for the Corporate Social Graph (HS), Organizational Chart (H) and Social Network Graph (S); f = 0.5; “weak” nodes highlighted]
Employees are ordered by rank of appearance (BFS) in the organizational chart (0: CEO, 150: bottom-level employee)
Results (P = 0.5)
Number of nodes reached by each of the 151 Enron employees, considering the influence given by the 3 different graphs.
[Plot: Number of Reached Nodes (0 to 150) vs. Nodes Ordered by Rank (0 to 150) for the Corporate Social Graph (HS), Organizational Chart (H) and Social Network Graph (S); f = 0.5; “weak” nodes highlighted]
Employees are ordered by rank of appearance (BFS) in the organizational chart (0: CEO, 150: bottom-level employee)
Case Study: DIAG
Dataset: H Graph
Tree-shaped graph (DAG) derived from publicly available documents
Depth 4
Roles: Director, Full Professor, Associate, Researcher, Expert Engineer, PhD, Technical/Administrative staff, Other
DIAG: S Graph
Email exchanges among department members
Two months of traffic, obtained from the network administrator, who was provided with an obfuscation tool
Nov 10 10:42:31 mail postfix/qmgr[6885]: BB3741F69B: from=<[email protected]>,
size=1131, nrcpt=1 (queue active)
Nov 10 10:42:32 mail postfix/local[7078]: BB3741F69B: to=<[email protected]>,
relay=local, delay=2, status=sent (delivered to command: /usr/bin/procmail)
Regular-expression matching of the local Postfix Mail Transfer Agent log files
Only emails among department members were considered
No information on email subject/contents
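A sketch of the regular-expression step, assuming log lines shaped like the samples above. The qmgr line carries the sender (from=) for a queue ID, and a later delivery line with status=sent carries the recipient. The addresses and the example.org domain below are hypothetical placeholders. Postfix queue IDs may be reused over long periods, which this simplification ignores.

```python
import re
from collections import Counter

FROM_RE = re.compile(r"postfix/qmgr\[\d+\]: (?P<qid>[0-9A-Fa-f]+): from=<(?P<addr>[^>]*)>")
TO_RE = re.compile(
    r"postfix/\w+\[\d+\]: (?P<qid>[0-9A-Fa-f]+): to=<(?P<addr>[^>]*)>.*status=sent"
)


def email_edges(lines, domain):
    """Count delivered emails (sender, recipient) where both are inside `domain`."""
    sender_of = {}
    counts = Counter()
    suffix = "@" + domain
    for line in lines:
        m = FROM_RE.search(line)
        if m:
            sender_of[m["qid"]] = m["addr"].lower()
            continue
        m = TO_RE.search(line)
        if m and m["qid"] in sender_of:
            src, dst = sender_of[m["qid"]], m["addr"].lower()
            # keep only emails among department members
            if src.endswith(suffix) and dst.endswith(suffix) and src != dst:
                counts[(src, dst)] += 1
    return counts


sample = [
    "Nov 10 10:42:31 mail postfix/qmgr[6885]: BB3741F69B: "
    "from=<alice@example.org>, size=1131, nrcpt=1 (queue active)",
    "Nov 10 10:42:32 mail postfix/local[7078]: BB3741F69B: to=<bob@example.org>, "
    "relay=local, delay=2, status=sent (delivered to command: /usr/bin/procmail)",
]
print(email_edges(sample, "example.org"))
# Counter({('alice@example.org', 'bob@example.org'): 1})
```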
Anonymization Flow
[Figure: In: labeled organizational graph → random salt, node hashing + shuffling → anonymized H_DIAG graph (node identifiers such as dcc7e59, d10ca76, d6ce203, 5e62ab4, e16ada9) and list of matches V_a (a : dcc7e59, b : d10ca76, ...). In: Postfix logs → substitution using the list of matches → anonymized logs → anonymized Postfix log parsing → anonymized S_DIAG graph. Out: anonymized HS_DIAG graph]
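The hashing, shuffling and substitution steps can be sketched as follows. Truncating SHA-256 hex digests to 7 characters mirrors the look of the identifiers in the figure (e.g. dcc7e59), but that length is our assumption. Short digests can collide, so a caller should check that all pseudonyms are distinct. Addresses not in the list of matches are left untouched.

```python
import hashlib
import random
import re
import secrets


def anonymize_nodes(nodes, salt=None, digest_len=7):
    """Map each node to a salted, truncated SHA-256 digest; return the map and a shuffled list."""
    salt = salt if salt is not None else secrets.token_hex(16)  # random salt
    matches = {
        v: hashlib.sha256((salt + v).encode("utf-8")).hexdigest()[:digest_len] for v in nodes
    }
    shuffled = list(matches.values())
    random.SystemRandom().shuffle(shuffled)  # hide the original node ordering
    return matches, shuffled


ADDR = re.compile(r"<([^>]*)>")


def substitute(log_line, matches):
    """Replace every <address> found in the list of matches with its pseudonym."""
    return ADDR.sub(lambda m: "<" + matches.get(m.group(1), m.group(1)) + ">", log_line)


matches, shuffled = anonymize_nodes(["alice@example.org", "bob@example.org"])
print(substitute("to=<bob@example.org>, relay=local, status=sent", matches))
```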
Results (Role Clustering)
[Plot: # Reached Nodes (0 to 300) per role (Head, Full Prof., Associate, Researcher, Exp. Eng., PhD, Staff, Other) for the Organizational Chart (H), Social Network Graph (S) and Corporate Social Graph (HS)]
Average Role Spreading
Average Role Spreading in H (% of nodes reached)
Role          P = 0     P = 0.5
Director      100.0%    100.0%
Full Prof.      3.0%     66.0%
Associate       0.9%     51.0%
Researcher      0.8%     49.7%
Exp. Eng.       0.4%     44.9%
PhD             0.4%     44.6%
Staff           0.5%     47.8%
Other           0.4%     46.0%
Average Role Spreading in S (% of nodes reached)
Role          Reached
Director       33.6%
Full Prof.     16.0%
Associate      10.0%
Researcher     17.0%
Exp. Eng.       8.6%
PhD             6.3%
Staff          16.0%
Other           3.6%
Average Role Spreading in HS (% of nodes reached)
Role          P = 0     P = 0.5
Director      100.0%    100.0%
Full Prof.     59.0%     82.7%
Associate      27.0%     70.0%
Researcher     48.9%     77.0%
Exp. Eng.      24.0%     66.2%
PhD            17.7%     59.0%
Staff          46.0%     76.0%
Other           9.0%     56.0%
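Assuming that "Average Role Spreading" is, for each role, the mean over that role's members of the percentage of the network they reach (our reading of the percentages above), the aggregation is a simple group-by. The inputs below are hypothetical.

```python
from collections import Counter, defaultdict


def average_role_spreading(reach, role_of, n_nodes):
    """Average percentage of the n_nodes network reached by the members of each role.

    reach[v] is the (estimated) number of nodes reached starting from v.
    """
    total, members = defaultdict(float), Counter()
    for v, r in reach.items():
        total[role_of[v]] += r
        members[role_of[v]] += 1
    return {role: 100 * total[role] / (members[role] * n_nodes) for role in members}


reach = {"d": 4, "p1": 1, "p2": 3}  # hypothetical estimates
roles = {"d": "Director", "p1": "PhD", "p2": "PhD"}
print(average_role_spreading(reach, roles, n_nodes=4))
# {'Director': 100.0, 'PhD': 50.0}
```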
Another perspective
Community detection performed on the social graph
Given the specific purpose of email exchanges, S is assumed to contain the underlying research/workgroup structure
15 clusters identified (C0 to C14)
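The slides do not name the community detection algorithm. As a stand-in, the sketch below uses asynchronous label propagation, which is a swapped-in technique and not necessarily the one used here. It treats S as undirected. Each node repeatedly adopts the most frequent label among its neighbors until no label changes or an iteration cap is hit.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_iter=100):
    """Asynchronous label propagation on an undirected graph.

    adj[v] is the set of neighbors of v; node ids must be sortable
    (ties are broken by a seeded random choice among sorted candidates).
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    order = list(adj)
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for v in order:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == top)
            if labels[v] in candidates:
                continue  # already holds one of the most frequent labels
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # every node agrees with the majority of its neighbors
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, set()).add(v)
    return list(communities.values())


# Hypothetical graph: two disconnected triangles
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
       "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
print(label_propagation(adj))
```

Labels can only travel along edges, so no community found here can mix the two triangles. On a real email graph, the result depends on the seed.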
Cluster Composition
[Figure: composition of each of the 15 clusters (C0 to C14) by role: Director, Full Prof., Associate, Researcher, Exp. Eng., PhD, Staff, Other]
Results (Clusters) P=0
[Plot: # Reached Nodes (0 to 300) per cluster (C0 to C14) for the Organizational Chart (H), Social Network Graph (S) and Corporate Social Graph (HS)]
Comparative Considerations
Aspect                   Enron                                           DIAG
Aims/purposes            profit-making company                           academic department
Hierarchical structure   hierarchized, deep                              flat
Social activity          low, business-oriented                          high (4x): higher level of collaboration, teaching activities
Overall spreading        low, position-related, few unexpected peaks     high, position-independent, many unexpected peaks
Future Work
Other problems related to enforcing trustworthiness with human-in-the-loop:
explicit constraints on subparts of the organization that have conflicts of interest among them
- e.g. banking/financial institutions and supervisory agencies
development of online and offline workforce reorganization algorithms
- minimize exchanges between blocks that must be kept isolated
- expertise constraints
placing a new employee
- analyze existing social ties
misinformation cascade post-mortem analysis
- after the occurrence of an information leak