Network principles and advanced practice

Anastasios Noulas
Computer Laboratory, University of Cambridge
University of Namur
Spring 2014
Today’s plan
Statistical significance in networks: designing a null model.
Introduction to principal network metrics
Introduction to fundamental processes of real-world networks
In the second hour we will move to the
lab for hands-on practice with the
theoretical notions.
Null Model of a Network
Randomly pick a pair of edges,
rewire their endpoints, and repeat the process
until a new network is generated.
What is a null model? Wikipedia:
In mathematics, in the
study of statistical properties of graphs, the null model is a graph which matches
one specific graph in some of its structural features, but which is otherwise taken
to be an instance of a random graph. The null model is used as a term of
comparison, to verify whether the graph in question displays some feature, such
as community structure, or not.
Not all null models are the same. The design of each depends on which network
properties one wants to preserve from the original network (degree, clustering
etc.) and on the null hypothesis being tested. The example here
preserves the original degree distribution in the network, but properties such as
clustering, geographic edge length, diameter and shortest paths may change.
Ideally, one would generate many instances of the null model, enough to safely
draw conclusions about the properties observed in the original network.
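As a rough sketch of how repeated null-model draws can support a significance claim, the snippet below compares the average clustering of a graph with an ensemble of degree-preserving randomizations. It assumes an undirected graph and uses networkx's `double_edge_swap` instead of the hand-written rewiring from the practical. The helper name `clustering_z_score`, the number of samples and the example graph are illustrative choices, not part of the lecture.

```python
import random
import statistics
import networkx as nx

def clustering_z_score(graph, num_samples=20, seed=0):
    """Compare the average clustering of `graph` with degree-preserving null models."""
    rng = random.Random(seed)
    observed = nx.average_clustering(graph)
    num_edges = graph.number_of_edges()
    null_values = []
    for _ in range(num_samples):
        null_graph = graph.copy()
        # Each swap replaces edges (u, v) and (x, y) with (u, x) and (v, y),
        # so every node keeps its original degree.
        nx.double_edge_swap(null_graph, nswap=num_edges,
                            max_tries=100 * num_edges,
                            seed=rng.randint(0, 10**9))
        null_values.append(nx.average_clustering(null_graph))
    mean_null = statistics.mean(null_values)
    std_null = statistics.pstdev(null_values)
    z = (observed - mean_null) / std_null if std_null > 0 else float('nan')
    return observed, mean_null, z

# Hypothetical example graph: six densely connected groups of five nodes.
example = nx.connected_caveman_graph(6, 5)
observed, mean_null, z = clustering_z_score(example)
print(observed, mean_null, z)
```

A z-score far from zero suggests that the observed clustering is unlikely to arise from the degree sequence alone. How many samples are enough depends on the network and on the metric being tested.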
Clustering Coefficient
Clustering coefficient C: the fraction
of connections that are realized
between the neighbours of a node
The clustering coefficient of a
network is the ratio between the number of
triangles observed around a node and
the number of potential triangles,
averaged across all nodes.
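The definition can be sketched on a toy graph as follows. The helper `local_clustering` and the four-node graph are made up for illustration; the result is compared against networkx's `average_clustering`.

```python
import networkx as nx

# Toy undirected graph: a triangle (0, 1, 2) plus node 3 attached to node 2.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

def local_clustering(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours, so no possible triangle
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if graph.has_edge(neighbours[i], neighbours[j]))
    return links / (k * (k - 1) / 2)

per_node = {n: local_clustering(toy, n) for n in toy}
average = sum(per_node.values()) / len(per_node)
print(per_node)
print(average, nx.average_clustering(toy))
```

Node 2 has three neighbours but only one connected pair among them, so its coefficient is 1/3, while node 3 has a single neighbour and contributes 0 to the average.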
Average Shortest Paths
Many algorithms to
calculate paths (see
Dijkstra for an example)
One of the most common
operations in graphs/
networks
Given a pair of nodes, we
take the shortest of
all paths
between that pair in the
network. Take the average
across all pairs of nodes
and what you have is the
average shortest path.
Measuring paths for all
pairs of nodes is
computationally
expensive
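A minimal sketch of the computation on a small connected graph, using breadth-first distances from networkx. The four-node path graph is an illustrative assumption; on a disconnected network some pairs have no path and the average is undefined.

```python
import networkx as nx

path_graph = nx.path_graph(4)  # 0 - 1 - 2 - 3

# Shortest path length (in hops) between every ordered pair of nodes.
lengths = dict(nx.all_pairs_shortest_path_length(path_graph))
pair_lengths = [lengths[u][v]
                for u in path_graph for v in path_graph if u != v]
average = sum(pair_lengths) / len(pair_lengths)

print(average, nx.average_shortest_path_length(path_graph))
```

The all-pairs step is what makes the metric expensive: it needs one traversal per node, which is why large networks are often handled by sampling source nodes.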
Watts & Strogatz built a model which was able to
capture both characteristics: high clustering and short average paths.
– Start with a regular lattice.
– Rewire each edge to a randomly chosen node with probability p, and increase p.
– When p is very high the lattice becomes a random graph.
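The effect of the rewiring probability can be explored with networkx's built-in generator, as in the sketch below. The size `n`, the lattice degree `k` and the values of `p` are arbitrary illustrative choices, not values from the lecture.

```python
import networkx as nx

n, k = 200, 6  # illustrative number of nodes and lattice degree
results = {}
for p in (0.0, 0.01, 0.1, 1.0):
    # connected_watts_strogatz_graph retries until the rewired graph is connected,
    # so the average shortest path is always defined.
    g = nx.connected_watts_strogatz_graph(n, k, p, tries=100, seed=42)
    results[p] = (nx.average_clustering(g),
                  nx.average_shortest_path_length(g))

for p, (clustering, path_length) in sorted(results.items()):
    print('p=%.2f  clustering=%.3f  average shortest path=%.2f'
          % (p, clustering, path_length))
```

The small-world regime is the range of intermediate p where paths are already short while clustering remains close to that of the lattice.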
It’s a small World
It all started with
Milgram’s famous 1960s
experiment...
Nodes in social (and
other real) networks have
been shown to be
separated by only a few
hops (six degrees of
separation).
[Figure: lattice network, real-world network, random networks]
What about place
networks?
From the following article:
Collective dynamics of 'small-world' networks
Duncan J. Watts and Steven H. Strogatz
Nature 393, 440-442(4 June 1998)
doi:10.1038/30918
Network Diameter
The diameter of the network is measured
in the following way: compute the shortest
path between every pair of nodes, then take
the maximum length in that set of
shortest paths.
The computational cost of this is usually
high. In large networks one may need to
sample ...
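One possible sampling strategy, sketched under the assumption of a connected undirected graph, is to run breadth-first searches from a random subset of nodes and keep the longest distance found. This only gives a lower bound on the true diameter. The helper name `sampled_diameter` and the grid graph are illustrative.

```python
import random
import networkx as nx

def sampled_diameter(graph, num_sources, seed=0):
    """Lower bound on the diameter from searches started at sampled nodes."""
    rng = random.Random(seed)
    sources = rng.sample(list(graph.nodes()), num_sources)
    longest = 0
    for source in sources:
        # Hop distance from `source` to every reachable node.
        distances = nx.single_source_shortest_path_length(graph, source)
        longest = max(longest, max(distances.values()))
    return longest

grid = nx.grid_2d_graph(10, 10)
exact = nx.diameter(grid)                       # all-pairs computation
estimate = sampled_diameter(grid, num_sources=10)
print(exact, estimate)
```

The estimate can never exceed the exact value, and it gets closer as more source nodes are sampled.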
Densification & Shrinking
Diameters (1)
The number of edges
has been shown to scale
super-linearly with the
number of nodes in a
network.
Graph Evolution:
Densification and
Shrinking Diameters @
http://arxiv.org/pdf/physics/0603229.pdf
Densification & Shrinking
Diameters (2)
As networks densify
their effective diameters
shrink.
Issue: What about edges
that stop being active? In
dynamic networks this
may affect the validity of
such a metric.
Assortativity
Do you believe place networks
are assortative or not? Why?
M.E.J. Newman. Assortative mixing in networks.
Phys. Rev. Lett. 89, 208701 (2002).
Assortativity refers to the tendency of nodes in the network to connect with similar
others. Similarity can refer to many different things, but most commonly it refers to
node degree. For instance, most online social networks appear to be assortative, with
high-degree nodes connecting to other nodes of similarly high degree. The
values of the metric range from +1 (assortative) to -1 (disassortative), with 0 being neutral.
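Degree assortativity can be sketched as the Pearson correlation between the degrees found at the two ends of each edge. The helper `degree_assortativity` below is a hand-written illustration for undirected graphs; networkx also ships `degree_assortativity_coefficient`. The two example graphs are chosen to hit the extremes of the range.

```python
import networkx as nx

def degree_assortativity(graph):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, v in graph.edges():
        # Count every undirected edge in both directions so the measure is symmetric.
        xs += [graph.degree(u), graph.degree(v)]
        ys += [graph.degree(v), graph.degree(u)]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean and variance
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    variance = sum((x - mean) ** 2 for x in xs)
    return covariance / variance  # undefined if all degrees are equal

star = nx.star_graph(5)  # one hub linked to five leaves
cliques = nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(5))

r_star = degree_assortativity(star)
r_cliques = degree_assortativity(cliques)
print(r_star, r_cliques)
```

In the star every edge joins the hub to a low-degree leaf, which is the fully disassortative case. In the two separate cliques every edge joins nodes of identical degree, which is the fully assortative case.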
Edge Reciprocity
Patterns of link reciprocity in directed networks:
http://arxiv.org/abs/cond-mat/0404521
r = # reciprocal edges / #edges
what is the problem with the above?
Read the paper...
In a directed network it is interesting to understand whether the directionality of
the links formed presents a pattern. A traditional definition of reciprocity, r, simply
measures the fraction of reciprocal links over the total number of links in the network. r=0
means that the network is purely uni-directional, whereas r=1 implies that every single
edge is reciprocated (bidirectional). As this definition does not allow results to be
compared against random networks with the same number of links and nodes
(see http://en.wikipedia.org/wiki/Reciprocity_(network_science)), a new definition (which
we use here) has been proposed.
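The two definitions can be sketched side by side. The second quantity follows the form given in the paper linked above, rho = (r - a) / (1 - a), where a is the link density L / (N(N - 1)). The helper name `reciprocity_r_and_rho` and the toy graphs are illustrative, and the sketch assumes a directed graph without self-loops.

```python
import networkx as nx

def reciprocity_r_and_rho(digraph):
    """Return the traditional reciprocity r and the density-corrected rho."""
    n = digraph.number_of_nodes()
    num_links = digraph.number_of_edges()
    # A link (u, v) is reciprocal if (v, u) also exists.
    reciprocated = sum(1 for u, v in digraph.edges() if digraph.has_edge(v, u))
    r = reciprocated / num_links
    density = num_links / (n * (n - 1))  # expected r in a random directed graph
    rho = (r - density) / (1 - density)
    return r, rho

mixed = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 3)])
cycle = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0)])

r_mixed, rho_mixed = reciprocity_r_and_rho(mixed)
r_cycle, rho_cycle = reciprocity_r_and_rho(cycle)
print(r_mixed, rho_mixed)
print(r_cycle, rho_cycle)
```

A positive rho indicates more reciprocal links than expected at random for that density, and a negative rho indicates fewer. The directed cycle has r = 0 but a negative rho, which the traditional definition cannot express.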
Node Centralities
Betweenness
• Intuition: how many pairs of individuals would have
to go through you in order to reach one another in
the minimum number of hops?
• who has higher betweenness, X or Y?
Closeness
Closeness is based on the average length of the
shortest paths between a vertex and all other vertices in the graph
Eigenvector
• Degree Centrality depends on having many connections: but what if
these connections are pretty isolated?
• A central node should be one
connected to powerful nodes
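The three centralities can be compared on a toy graph with networkx, as sketched below. The graph, two triangles joined through a middle node, is a made-up example chosen so that the rankings differ.

```python
import networkx as nx

# Two triangles (0, 1, 2) and (4, 5, 6) joined through node 3.
g = nx.Graph([(0, 1), (1, 2), (0, 2),
              (2, 3), (3, 4),
              (4, 5), (5, 6), (4, 6)])

betweenness = nx.betweenness_centrality(g)  # normalized share of shortest paths
closeness = nx.closeness_centrality(g)      # inverse of average distance
eigenvector = nx.eigenvector_centrality(g, max_iter=1000)

for node in sorted(g):
    print(node, round(betweenness[node], 3),
          round(closeness[node], 3), round(eigenvector[node], 3))
```

Node 3 has only two connections, yet every shortest path between the two triangles passes through it, so it ranks first on betweenness and closeness. Eigenvector centrality instead favours nodes 2 and 4, whose neighbours are themselves better connected.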
The theory of weak ties in networks
Sociologist Mark Granovetter interviewed people about how they discovered their
jobs...
Most people did so through personal contacts, often contacts
described as acquaintances rather than close friends.
The basic intuition: close friends are part of closed
triads, so they know what you know and
know others who know what you know
http://www.soc.ucsb.edu/faculty/friedkin/Syllabi/Soc148/Granovetter%201983.pdf
Bridges in networks
• Edge between A and B is a bridge if,
when deleted, it would make A and B lie
in 2 different components
A local bridge
• An edge is a local bridge if its endpoints
have no friends in common
– Equivalently, deleting the edge would increase
the distance between its endpoints to a value more than 2.
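Both definitions can be checked on a small example. The sketch below uses networkx's `bridges` for the first and a hand-written test, `is_local_bridge`, for the second; the helper name and the toy graph are illustrative assumptions.

```python
import networkx as nx

def is_local_bridge(graph, u, v):
    """True if edge (u, v) exists and its endpoints share no neighbour."""
    return graph.has_edge(u, v) and not (set(graph[u]) & set(graph[v]))

g = nx.Graph([(0, 1), (1, 2), (0, 2),            # triangle
              (2, 3),                            # only link between the two parts
              (3, 4), (4, 5), (5, 6), (6, 3)])   # cycle of four nodes

# Normalize edge orientation so that (u, v) and (v, u) compare equal.
bridges = {tuple(sorted(e)) for e in nx.bridges(g)}
local_bridges = {tuple(sorted(e)) for e in g.edges() if is_local_bridge(g, *e)}
print(bridges)
print(local_bridges)
```

Every bridge is also a local bridge, but not the other way round: the edges of the four-node cycle have no common neighbours, yet removing any one of them leaves the network connected.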
Strong Triadic Closure Property
(STCP)
• Links between nodes have different “value”: strong and weak ties
– E.g.: friendship vs. acquaintance
• Strong Triadic Closure Property (Granovetter): If a node A
has two strong links (to B and C) then a link (strong or weak)
must exist between B and C.
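A minimal sketch of how the property could be checked on a labelled network. It assumes each edge carries a hypothetical `strength` attribute set to 'strong' or 'weak'; neither the attribute name nor the helper `stcp_violations` comes from the lecture data.

```python
from itertools import combinations
import networkx as nx

def stcp_violations(graph, strength_key='strength'):
    """List triples (a, b, c) where a has strong ties to b and c but b-c is missing."""
    violations = []
    for a in graph:
        strong = [b for b in graph[a]
                  if graph[a][b].get(strength_key) == 'strong']
        for b, c in combinations(strong, 2):
            if not graph.has_edge(b, c):
                violations.append((a, b, c))
    return violations

g = nx.Graph()
g.add_edge('A', 'B', strength='strong')
g.add_edge('A', 'C', strength='strong')
before = stcp_violations(g)

g.add_edge('B', 'C', strength='weak')  # closing the triad with any tie is enough
after = stcp_violations(g)
print(before, after)
```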
Overlap based link removal
• We have just seen that weak ties matter and if they are removed,
they lead to a breakdown in the network.
• If strong ties are removed, they lead to a smooth degradation of the network
• Would we expect to see the same in place networks? Why don’t we try it!
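One way to try this, sketched under simplifying assumptions, is to use neighbourhood overlap as a proxy for tie strength: the number of neighbours two endpoints share divided by the number of neighbours either has, excluding the endpoints themselves. The helpers `neighbourhood_overlap` and `giant_component_sizes` are illustrative, the overlap is computed once on the original undirected graph rather than updated after each removal, and the example network is synthetic.

```python
import networkx as nx

def neighbourhood_overlap(graph, u, v):
    """Shared neighbours of u and v over all their neighbours (excluding u and v)."""
    nu, nv = set(graph[u]) - {v}, set(graph[v]) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def giant_component_sizes(graph, weakest_first=True):
    """Remove edges in overlap order and track the largest component size."""
    g = graph.copy()
    ranked = sorted(g.edges(),
                    key=lambda e: neighbourhood_overlap(graph, *e),
                    reverse=not weakest_first)
    sizes = []
    for u, v in ranked:
        g.remove_edge(u, v)
        sizes.append(len(max(nx.connected_components(g), key=len)))
    return sizes

caves = nx.connected_caveman_graph(4, 5)  # four tight groups linked in a ring
weak_first = giant_component_sizes(caves, weakest_first=True)
strong_first = giant_component_sizes(caves, weakest_first=False)
print(weak_first[:6])
print(strong_first[:6])
```

In this synthetic network the zero-overlap links between groups go first under weakest-first removal, so the largest component shrinks to a single group after four removals, whereas removing high-overlap links first leaves the network connected for much longer.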
Practical
Computing a null model
First things first! Import the following
import networkx as nx
import random
import pylab as plt
import sys
Re-load the network we used in the previous lecture.
If you have your own place network, that is also good!
### LOAD NETWORK FROM FILE ###
cam_net = nx.read_edgelist('cambridge_net.txt', create_using=nx.DiGraph(), nodetype=int)
### READ META DATA ###
node_data = {}
for l in open('cambridge_net_titles.txt'):
    lineSplits = l.split(';')
    node_id = int(lineSplits[0])
    place_title = lineSplits[1]
    latit = float(lineSplits[2])
    longit = float(lineSplits[3])
    node_data[node_id] = (place_title,latit,longit)
Computing a null model (1)
Define a function that computes the null model
def get_randomized_network(original_network):
    #rewire net by picking two randomly chosen edges and re-wire them
    #--- the input network is assumed to be directed; rewiring is done on an undirected copy
    print('Getting randomized version of the network')
    nx_rand_graph = original_network.to_undirected()
    #list() gives an indexable copy of the edges that we can delete from
    candidate_edges = list(nx_rand_graph.edges())
    num_rewirings = len(candidate_edges) // 2
Computing a null model (2)
    for i in range(0, num_rewirings):
        #removing two edges and swapping their endpoints keeps node degrees
        #unchanged (except when a rewired edge already exists in the graph)
        #first edge selection and removal
        random_edge_index = random.randint(0, len(candidate_edges)-1)
        edge1 = candidate_edges[random_edge_index]
        nodeAedge1 = edge1[0]
        nodeBedge1 = edge1[1]
        del candidate_edges[random_edge_index]
        nx_rand_graph.remove_edge(nodeAedge1, nodeBedge1)
        #second edge selection and removal
        random_edge_index = random.randint(0, len(candidate_edges)-1)
        edge2 = candidate_edges[random_edge_index]
        nodeAedge2 = edge2[0]
        nodeBedge2 = edge2[1]
        del candidate_edges[random_edge_index]
        nx_rand_graph.remove_edge(nodeAedge2, nodeBedge2)
Computing a null model (3)
1. Rewire edges 2. Exit the loop and 3. Return the randomized version of the network
        #then re-wire (randomly chosen) removed edges by swapping their endpoints
        nx_rand_graph.add_edge(nodeAedge1, nodeBedge2)
        nx_rand_graph.add_edge(nodeAedge2, nodeBedge1)
    print('Done randomizing network. Moving on...')
    return nx_rand_graph
Exercise:
1. Measure the average clustering coefficient of the
graph and compare with null model.
2. Measure the assortativity.
Computing the reciprocity of directed network
def get_reciprocity(original_network):
    print('Calculating link reciprocity in the network.')
    #first extract the largest strongly connected component
    #(strongly_connected_components yields one set of nodes per component)
    SG0 = max(nx.strongly_connected_components(original_network), key=len)
    strongly_connected_component = original_network.subgraph(SG0)
    nodes = list(strongly_connected_component.nodes())
    num_links = len(strongly_connected_component.edges())
    potential_num_links = len(nodes) * (len(nodes) - 1.0)
    average_a = num_links / (1.0*potential_num_links)
    numerator = 0.0
    denominator = 0.0
    #sum over all ordered pairs of distinct nodes
    for i in range(0, len(nodes)):
        for j in range(0, len(nodes)):
            if i == j:
                continue
            a_ij = 0.0
            a_ji = 0.0
            if strongly_connected_component.has_edge(nodes[i], nodes[j]):
                a_ij = 1.0
            if strongly_connected_component.has_edge(nodes[j], nodes[i]):
                a_ji = 1.0
            numerator += (a_ij - average_a)*(a_ji - average_a)
            denominator += (a_ij - average_a)**2
    #reciprocity as the correlation between a_ij and a_ji
    return numerator / denominator
Exercise:
Test the weak tie hypothesis in
place networks