Xu11topic7 - socialnetworks-2011

The Structure of Information Pathways in a Social Communication Network
Presented By:
Tingting Xu
Under the guidance of:
Augustin Chaintreau
Paper Objective
 Study the temporal dynamics of communication using on-line data
 Use a temporal notion of ‘distance’ and ‘vector clocks’ to formulate a temporal measure that provides structural insights
 Define the network backbone to be the sub-graph consisting of
edges on which information has the potential to flow the quickest
Why Construct a New Model?
 Discrete communication distributed non-uniformly over time
 Direct and indirect flow of information
 Much recent research has studied communication of an event-driven nature
 The properties of systemic communication arguably determine much about the rate at which people in the network remain up-to-date on information about each other
The Present Work
 Systemic communication and information pathways
 Propose a framework for analyzing systemic communication based
on inferring structural measures from the potential for information
to flow between different nodes
 Out-of-date information
 Indirect paths can deliver information faster than direct ones (triangle-inequality violation)
The Present Work
 Data used here have complete histories of communication
events over long periods of time
 Main datasets - complete set of anonymized e-mail logs among
all faculty and staff at a large university over two years
 Enron e-mail corpus
 The complete set of user-talk communications among admins
and high-volume editors on Wikipedia
 Vector clocks introduced by Lamport and refined by Mattern
 Network backbone
Vector Clocks and Latency
 Communication skeleton G
 The latest view that $v$ has of $u$ at time $t$ is denoted by $\phi_{v,t}(u)$
 Define $\phi_{v,t}(v) = t$ for all $v$ and $t$
 $\phi_{v,t} = (\phi_{v,t}(u) : u \in V)$; we refer to $\phi_{v,t}$ as the vector clock of $v$ at time $t$
 The information latency of $u$ with respect to $v$ at time $t$ is $t - \phi_{v,t}(u)$
 An algorithm to compute the vector clocks of all nodes at all times in $[0, T]$
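The slides do not spell out the algorithm. A minimal sketch, assuming messages are hypothetical `(time, sender, recipient)` triples (a message with several recipients split into one triple per recipient) replayed in timestamp order, with simultaneous messages processed in sorted order: when $v$ writes to $w$, $w$ takes the coordinatewise maximum of its clock and $v$'s clock, where $v$'s own coordinate is the send time. Coordinates never heard of are `-inf`. The names `clocks_at` and `latency` are illustrative, not the paper's code.

```python
import math

def clocks_at(nodes, messages, t):
    """Vector clocks of all nodes at time t.

    messages: iterable of (time, sender, recipient) triples (assumed format).
    clock[w][u] is the time of the freshest information about u that could
    have reached w by time t, or -inf if none could have.
    """
    clock = {w: {u: -math.inf for u in nodes} for w in nodes}
    for time, v, w in sorted(messages):
        if time > t:
            break  # only messages sent up to time t count
        clock[v][v] = time  # the sender's own coordinate is current
        for u in nodes:
            # w adopts v's view of u whenever v's view is fresher
            if clock[v][u] > clock[w][u]:
                clock[w][u] = clock[v][u]
    for v in nodes:
        clock[v][v] = t  # phi_{v,t}(v) = t by definition
    return clock

def latency(clock, v, u, t):
    """Information latency t - phi_{v,t}(u); infinite if u never reached v."""
    return t - clock[v][u]
```

For example, with messages a→b at time 1 and b→c at time 2, `clocks_at(["a", "b", "c"], [(1, "a", "b"), (2, "b", "c")], 3)` gives c a view of a from time 1, so the latency of a with respect to c at time 3 is 2 even though a never wrote to c directly.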
Latencies in Social Network Data
 Consider only messages with at most c recipients (c ranging from 1 to 5)
 Focus on the most active q-fraction of e-mail users (here q = 0.20)
 For a time difference $\tau$, we define the ball of radius $\tau$ around node $v$ at time $t$, denoted $B_\tau(v, t)$, to be the set of all nodes whose latency with respect to $v$ at time $t$ is at most $\tau$ days.
 For fixed $t$, the distribution of ball sizes over nodes can be studied using a function $f_t(\tau)$, defined as the median value of $|B_\tau(v, t)|$ over all $v$
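Given vector clocks at a fixed time $t$, both definitions translate directly into code. A minimal sketch, assuming clocks are stored as a dict of dicts `clock[v][u]` holding $\phi_{v,t}(u)$, with `-inf` for nodes $v$ has never heard from, and timestamps measured in days; the function names and the example data are hypothetical.

```python
import math
from statistics import median

def ball(clock, v, t, tau):
    """B_tau(v, t): nodes whose latency w.r.t. v at time t is at most tau."""
    return {u for u, seen in clock[v].items() if t - seen <= tau}

def median_ball_size(clock, t, tau):
    """f_t(tau): median of |B_tau(v, t)| over all nodes v."""
    return median(len(ball(clock, v, t, tau)) for v in clock)

# Hypothetical clocks at t = 3 (e.g. after a->b at time 1 and b->c at time 2)
example = {
    "a": {"a": 3, "b": -math.inf, "c": -math.inf},
    "b": {"a": 1, "b": 3, "c": -math.inf},
    "c": {"a": 1, "b": 2, "c": 3},
}
```

On `example`, the ball of radius 1 around c is {b, c}; $f_3(1) = 1$ and $f_3(2) = 2$, since a larger radius admits staler information into every ball.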
Open Worlds vs. Closed Worlds
 Boundary specification problem: the results depend on the choice of the fraction q ∈ [0, 1] of users treated as the network
Quantifying the Strength of Weak Ties
 The range of an edge $e = (v, w)$ is the unweighted shortest-path distance between $v$ and $w$ in the social network if $e$ were deleted
 Edges of range greater than two are generally weak ties
 Vector-clock analysis can provide evidence for the phenomenon
that weak ties are the sources of important information to their
endpoints
 Define the advance in $w$’s clock caused by a message from $v$ to be the sum of coordinatewise differences between $\phi_w$ after the update from $v$ and $\phi_w$ before it
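Both quantities on this slide can be computed directly. A minimal sketch, assuming an undirected skeleton given as an adjacency dict `adj` (node to set of neighbours) and clocks given as dicts mapping each node to a timestamp or `-inf`. Range is a breadth-first search that ignores the deleted edge. In the advance, coordinates that $w$ had never heard of before the update are skipped, since their difference would be infinite; that treatment is an assumption, not taken from the paper. Function names are hypothetical.

```python
import math
from collections import deque

def edge_range(adj, v, w):
    """Range of edge (v, w): BFS distance from v to w with that edge removed.

    Returns math.inf if w is unreachable once the edge is deleted.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {v, w}:
                continue  # pretend the edge (v, w) was deleted
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == w:
                    return dist[y]
                queue.append(y)
    return math.inf

def clock_advance(before, after):
    """Advance in w's clock: sum of coordinatewise increases from one update.

    Assumption: coordinates w had never heard of before (-inf) are skipped,
    because their difference would be infinite.
    """
    return sum(after[u] - before[u] for u in after if before[u] != -math.inf)
```

In a triangle a, b, c with a pendant node d attached to c, edge (a, b) has range 2 (embedded in the triangle), while (c, d) has infinite range because nothing else connects d; ties with large range are the weak-tie candidates whose clock advances the slide proposes to measure.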
Backbone Structures
 Instantaneous Backbones
 Define the backbone $H_t$ at time $t$ to be the graph on $V$ whose edge set is the collection of edges from $G$ that are essential at time $t$.
 An edge (𝑣, 𝑤) is essential if 𝑤’s most up-to-date view of 𝑣 is the result
of direct communication from 𝑣
 We refer to the backbones $H_t$ at fixed times $t$ as instantaneous backbones, in contrast with the aggregate backbone, which is based on an aggregate construction that takes all times into account.
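One way to find the essential edges at time $t$ is to replay the messages as in a vector-clock computation while recording, for every coordinate, whether $w$'s freshest view of $v$ arrived directly from $v$. A minimal sketch, assuming messages are hypothetical `(time, sender, recipient)` triples processed in timestamp order; `instantaneous_backbone` is an illustrative name, not the paper's code.

```python
import math

def instantaneous_backbone(nodes, messages, t):
    """Edges (v, w) essential at time t: w's freshest view of v came directly from v.

    messages: iterable of (time, sender, recipient) triples (assumed format).
    """
    clock = {w: {u: -math.inf for u in nodes} for w in nodes}
    direct = {w: {u: False for u in nodes} for w in nodes}
    for time, v, w in sorted(messages):
        if time > t:
            break  # only messages sent up to time t count
        clock[v][v] = time  # the sender's own coordinate is current
        for u in nodes:
            if clock[v][u] > clock[w][u]:
                clock[w][u] = clock[v][u]
                # a fresher view of u is "direct" only if u is the sender
                direct[w][u] = (u == v)
    return {(v, w) for w in nodes for v in nodes if v != w and direct[w][v]}
```

With messages a→c at time 1, a→b at time 2 and b→c at time 3, c's freshest view of a at time 3 comes via b, so $H_3$ contains (a, b) and (b, c) but not (a, c), even though a and c do communicate.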
Backbone Structures
 The Aggregate Backbone
 For each edge $(v, w)$ in the communication skeleton $G$ such that $v$ has sent $\rho_{v,w} > 0$ messages to $w$ over the full time interval $[0, T]$, define the delay of the edge $(v, w)$ to be $\delta_{v,w} = T / \rho_{v,w}$
 Let $G_\delta$ be the weighted graph obtained from the communication skeleton $G$ by assigning a weight of $\delta_{v,w}$ to each edge $(v, w)$
 An edge $e = (v, w)$ in $G_\delta$ is essential if it forms the minimum-delay path between its two endpoints
 Define the aggregate backbone $H^*$ to be the subgraph of $G_\delta$ consisting only of essential edges
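A small hypothetical example shows how an edge can fail to be essential. Take $T = 100$ days, with $a$ sending 10 messages to $b$, $b$ sending 10 to $c$, and $a$ sending only 2 to $c$:

```latex
\delta_{a,b} = \frac{100}{10} = 10, \qquad
\delta_{b,c} = \frac{100}{10} = 10, \qquad
\delta_{a,c} = \frac{100}{2} = 50

\delta_{a,b} + \delta_{b,c} = 20 < 50 = \delta_{a,c}
```

The direct edge $(a, c)$ is bypassed by the two-hop path through $b$, so it is not essential, while $(a, b)$ and $(b, c)$ are the minimum-delay paths between their endpoints and belong to $H^*$.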
Backbone Structures
 How to construct the aggregate Backbone H*
 Compute a weighted shortest-paths tree rooted at each node of $G_\delta$, using the delays as weights
 The union of the edges in all these trees is $H^*$, by the following proposition
 PROPOSITION An edge $e = (v, w)$ belongs to $H^*$ if and only if it lies on the minimum-delay path between some pair of nodes $x$ and $y$
 PROOF
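The construction above can be sketched with Dijkstra's algorithm run from every node of $G_\delta$, keeping the parent edges of each shortest-path tree. A minimal sketch, assuming message counts are given as a dict mapping a directed edge `(v, w)` to $\rho_{v,w}$; `aggregate_backbone` is a hypothetical name. When several paths tie for minimum delay, Dijkstra keeps only one of them, so ties are broken arbitrarily.

```python
import heapq
import math

def aggregate_backbone(counts, T):
    """Union of shortest-path-tree edges of G_delta, with delta_{v,w} = T / rho_{v,w}.

    counts maps a directed edge (v, w) to rho_{v,w}, the number of messages
    v sent to w during [0, T]. Edges with no messages are ignored.
    """
    adj, delay = {}, {}
    for (v, w), rho in counts.items():
        if rho > 0:
            delay[(v, w)] = T / rho
            adj.setdefault(v, []).append(w)
            adj.setdefault(w, [])
    backbone = set()
    for root in adj:  # Dijkstra from every node
        dist, parent = {root: 0.0}, {}
        heap = [(0.0, root)]
        while heap:
            d, x = heapq.heappop(heap)
            if d > dist[x]:
                continue  # stale heap entry
            for y in adj[x]:
                nd = d + delay[(x, y)]
                if nd < dist.get(y, math.inf):
                    dist[y], parent[y] = nd, x
                    heapq.heappush(heap, (nd, y))
        backbone.update((parent[y], y) for y in parent)  # tree edges
    return backbone
```

Running all-sources Dijkstra costs one shortest-path computation per node, which matches the slide's recipe; the proposition is what guarantees that nothing outside these trees needs to be examined.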
Backbone Structures
 Density and node degrees of the backbone
 The instantaneous backbones $H_t$ and the aggregate backbone $H^*$ are surprisingly sparse relative to the fairly dense communication skeleton $G$
 In other words, from the point of view of potential information flow, a significant majority of all edges in the social network are bypassed by faster indirect paths
Backbone Structures
 Density and node degrees of the backbone
 Considering the backbone also sheds further light on the role of high-degree nodes in the social network
 High-degree nodes in the full communication skeleton G indeed have
many incident edges in the aggregate backbone
 However, the fraction of a node’s edges that are declared essential strictly
decreases with degree.
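The per-node statistic on this slide can be computed from the skeleton and a backbone edge set. A minimal sketch, assuming edges are treated as undirected and given as pairs; `essential_fraction_by_node` is a hypothetical name. Grouping its output by degree gives the degree versus essential-fraction comparison described above.

```python
from collections import defaultdict

def essential_fraction_by_node(skeleton_edges, backbone_edges):
    """Map each node to (degree in G, fraction of its edges that are in the backbone).

    Edges are treated as undirected (an assumption for this sketch).
    """
    backbone = {frozenset(e) for e in backbone_edges}
    incident = defaultdict(set)
    for v, w in skeleton_edges:
        edge = frozenset((v, w))
        incident[v].add(edge)
        incident[w].add(edge)
    return {v: (len(es), sum(e in backbone for e in es) / len(es))
            for v, es in incident.items()}
```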
Backbone Structures
 Structure of the backbone
 The backbone is trying to balance two competing objectives
 Representing long-range edges (recall the definition of ‘range’)
 Representing edges that have high embeddedness and transmit information over short ranges on quick time scales
 Define embeddedness of an edge to be the fraction of its endpoints’
neighbors that are common to both
For an edge $e = (v, w)$, let $N_v$ and $N_w$ denote the sets of neighbors of the endpoints $v$ and $w$, respectively. Define the embeddedness of $e$ to be $|N_v \cap N_w| / |N_v \cup N_w|$
 The backbone balances between two qualitatively different kinds of
information flow
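The embeddedness formula is a Jaccard-style ratio of neighbour sets. A minimal sketch, assuming an adjacency dict mapping each node to its set of neighbours; the formula is applied literally, so $v$ and $w$ themselves appear in the union. `embeddedness` is a hypothetical helper name.

```python
def embeddedness(adj, v, w):
    """|N_v ∩ N_w| / |N_v ∪ N_w| for the edge (v, w), applied literally."""
    nv, nw = adj[v], adj[w]
    union = nv | nw
    return len(nv & nw) / len(union) if union else 0.0
```

In a triangle a, b, c with a pendant node d on c, edge (a, b) shares neighbour c and has embeddedness 1/3, while (c, d) shares no neighbours and has embeddedness 0, the kind of low-embeddedness, long-range edge the backbone also has to represent.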
Varying Speed of Communication
 Study what happens to information latencies (i.e., $t - \phi_{v,t}(u)$) when each node varies the relative rates of its communication
 Given a directed graph $G$ with a total rate $\rho_v$ for each node $v$
 Given a target set $S$ of nodes in $G$
 Each node $v$ chooses a rate $\rho_{v,w}$ at which to communicate with each of its neighbors $w$, subject to the constraint that $\sum_{w} \rho_{v,w} = \rho_v$
 Define delays $\delta_{v,w} = T / \rho_{v,w}$, where $T$ is the length of the time interval
 The question: for a given bound $\delta$, can we choose rates for each node so that the median shortest-path delay between pairs in $S$ in the aggregate backbone is at most $\delta$?
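Evaluating one candidate rate assignment is easy even though choosing the best one is hard. Shortest-path delays in the aggregate backbone equal those in $G_\delta$, since $H^*$ keeps a minimum-delay path for every pair, so this sketch computes them on $G_\delta$ directly with Floyd–Warshall (adequate only for small illustrative graphs). The rate format and the name `median_pair_delay` are assumptions; the per-node constraint $\sum_w \rho_{v,w} = \rho_v$ is left to the caller.

```python
import math
from itertools import permutations
from statistics import median

def median_pair_delay(rates, T, S):
    """Median shortest-path delay over ordered pairs of distinct nodes in S.

    rates maps a directed edge (v, w) to rho_{v,w}; edge delay is T / rho_{v,w}.
    """
    nodes = {x for edge in rates for x in edge} | set(S)
    dist = {(x, y): 0.0 if x == y else math.inf for x in nodes for y in nodes}
    for (v, w), rho in rates.items():
        if rho > 0:
            dist[(v, w)] = min(dist[(v, w)], T / rho)
    for k in nodes:  # Floyd-Warshall all-pairs shortest paths
        for i in nodes:
            for j in nodes:
                via_k = dist[(i, k)] + dist[(k, j)]
                if via_k < dist[(i, j)]:
                    dist[(i, j)] = via_k
    return median(dist[(x, y)] for x, y in permutations(S, 2))
```

A candidate assignment answers the decision question with "yes" when `median_pair_delay(rates, T, S) <= delta`; this cheap check is what places the problem in NP.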
Varying Speed of Communication
 THEOREM The delay minimization problem defined above is NP-complete
 A sketch of the proof is given in the paper
 Consider simple local rules by which individuals in a network might
vary rates of communication so as to influence the potential for
information flow
Load-leveling vs. Load-concentrating
 To accelerate potential information flow, should one:
 Talk even more actively to one’s most frequent contacts? (load-concentrating, $\gamma > 1$)
 Or balance things out by increasing communication with one’s less frequent contacts? (load-leveling, $\gamma < 1$)
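 Rescaling exponent $\gamma$: change each communication rate $\rho_{v,w}$ to $\rho_{v,w}^{\gamma}$, then normalize all rates from $v$ to keep its total outgoing message volume the same, i.e. $\rho'_{v,w} = \rho_v \, \rho_{v,w}^{\gamma} / \sum_{x} \rho_{v,x}^{\gamma}$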
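The rescaling step for a single node can be sketched as follows, assuming its outgoing rates are given as a dict of strictly positive values; `rescale_rates` is a hypothetical name.

```python
def rescale_rates(out_rates, gamma):
    """Raise each rate rho_{v,w} to gamma, then renormalize so v's total is unchanged.

    out_rates maps each neighbour w of a single node v to rho_{v,w} > 0.
    gamma > 1 concentrates load on frequent contacts; gamma < 1 levels it.
    """
    total = sum(out_rates.values())
    weights = {w: r ** gamma for w, r in out_rates.items()}
    norm = sum(weights.values())
    return {w: total * x / norm for w, x in weights.items()}
```

For example, with rates 3 and 1, $\gamma = 2$ yields 3.6 and 0.4 (more concentrated), $\gamma = 0$ yields 2 and 2 (fully leveled), and $\gamma = 1$ leaves the rates unchanged; the total of 4 is preserved in every case.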
Load-leveling vs. Load-concentrating
 Extend the notion of delay by adding a fixed delay of $\varepsilon$ at each node
 The total delay on a path becomes the sum of its edge delays and node delays
 As $\varepsilon$ increases, paths with more hops are penalized more heavily
 The value of $\gamma$ at which network latency is optimized decreases with $\varepsilon$, crossing $\gamma^* = 1$ at $\varepsilon \approx 4$ days
 As $\varepsilon$ grows, the backbone becomes denser and the importance of quick indirect paths diminishes
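The effect of node delays on path choice can be seen with a small path-delay function. A minimal sketch, assuming $\varepsilon$ is charged at each intermediate node of a path (charging it at every hop instead adds the same constant to all paths between a given pair, so it does not change which path is fastest); `path_delay` and the example delays are hypothetical.

```python
def path_delay(path, delay, eps):
    """Total delay of a path: edge delays plus eps at each intermediate node.

    path is a list of nodes; delay maps a directed edge (x, y) to its delay.
    Charging eps only at intermediate nodes is an assumption of this sketch.
    """
    edge_total = sum(delay[(x, y)] for x, y in zip(path, path[1:]))
    return edge_total + eps * max(len(path) - 2, 0)
```

With delays 10 on a→b and b→c and 50 on a→c, the two-hop path beats the direct edge (20 vs 50) when $\varepsilon = 0$ but loses once $\varepsilon > 30$, which is how larger $\varepsilon$ makes more direct edges essential and the backbone denser.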
Conclusions (I)
 Make integral use of information about how nodes communicate over
time
 Develop structural measures based on the potential for information to
flow
 The sparse subgraph of edges most essential to keeping people up-to-date (the backbone of the network) provides important structural insights that relate to embeddedness, the role of high-degree nodes (i.e., hubs), and the strength of weak ties
 Study the effects on information flow as nodes vary the rate at which they communicate with others in the network using different strategies
Conclusions (II)
 Results on the other two datasets
 The sparsity of the aggregate and instantaneous backbones and the variation in node degrees are similar
 Difference: the ‘core’ of active communicators is much smaller in both the Enron corpus and Wikipedia, which makes the range of an edge in the unweighted communication skeleton harder to interpret and to correlate with other measures
 Further investigation:
 the principles that govern the dynamics of different types of information
 how these principles interact with the directed, weighted nature of social communication networks
Thank You 