Basics of graphs - University of Pittsburgh

School of Information Sciences
University of Pittsburgh
Network Science: A Short Introduction
i3 Workshop
Konstantinos Pelechrinis
Summer 2014
Figures are taken from:
M.E.J. Newman, “Networks: An Introduction”
The representation of networks

A network consists of entities connected with each other

The structure of these connections is represented through graphs

A graph is represented by two sets
• A vertex set V of the entities participating in the network. In the rest of the slides, n will typically be the number of vertices
  • Also called node or actor set
• An edge set E of the connections between vertices. In the rest of the slides, m will typically be the number of edges
  • Also called link or tie set
Example
If we denote an edge between vertices i and j by (i,j), then the complete network can be specified by giving the value of n and a list of all the edges. For example, the network in Fig. 6.1a has n = 6 vertices and edges (1,2), (1,5), (2,3), (2,4), (3,4), (3,5), and (3,6). Such a specification is called an edge list. Edge lists are sometimes used to store the structure of networks on computers, but for mathematical developments like those in this chapter they are rather cumbersome.
Edges can have direction, but in this introduction
we will only consider undirected edges/networks.
Figure 6.1: Two small networks. (a) A simple graph, i.e., one having no multiedges or self-edges. (b) A network with both multiedges and self-edges.
Edge attributes

Examples
• Weight (e.g., frequency of contacts, bandwidth of the link in a telecommunication network etc.)
• Ranking (e.g., primary connection, secondary connection etc.)
• Type (e.g., friend edge, family edge, co-worker edge etc.)
• …
Edge list and the adjacency matrix

If we label the nodes with IDs 1, 2, …, n, we can denote each edge as a pair (i,j)
• This is an edge list specification
• Good for storing and processing networks in computers, but not for mathematical development

The adjacency matrix A of a simple graph is a matrix with elements A_{ij} such that:
\[ A_{ij} = \begin{cases} 1 & \text{if there is an edge between vertices } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases} \]
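Example
Edge list
(1,2), (1,5), (2,3), (2,4), (3,4), (3,5), (3,6)
Adjacency matrix (6.2)
\[ A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix} \]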
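The conversion from an edge list to an adjacency matrix can be sketched in a few lines of Python. This is a minimal illustration for a simple undirected graph, using the edge list of this example; the function name `adjacency_matrix` is an assumption made for the sketch.

```python
n = 6
edge_list = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)]


def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of a simple undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1  # vertex IDs start at 1, list indices at 0
        A[j - 1][i - 1] = 1  # undirected graph: the matrix is symmetric
    return A


A = adjacency_matrix(n, edge_list)
for row in A:
    print(" ".join(str(x) for x in row))
```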
Adjacency list

Easier to work with if the network is
• Large
• Sparse
1: 2,5
2: 1,3,4
3: 2,4,5,6
4: 2,3
5: 1,3
6: 3
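A minimal sketch of how such an adjacency list could be built from the edge list of the running example. The function name `adjacency_list` is an assumption; note that a vertex with no edges would not appear in the result unless it is added explicitly.

```python
from collections import defaultdict

edge_list = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)]


def adjacency_list(edges):
    """Map every vertex that appears in an edge to its sorted neighbors."""
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)  # undirected graph: store the edge
        neighbors[j].append(i)  # in both directions
    return {v: sorted(ns) for v, ns in sorted(neighbors.items())}


adj = adjacency_list(edge_list)
for v, ns in adj.items():
    print(f"{v}: {','.join(map(str, ns))}")
```

Storage grows with the number of edges rather than with n squared, which is why this form suits large, sparse networks.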
Degree

The degree k_i of a vertex i in a graph is the number of edges connected to it
• For undirected graphs we have:
\[ k_i = \sum_{j=1}^{n} A_{ij} \]
• And the number of edges of a graph is given by:
\[ m = \frac{1}{2} \sum_{i=1}^{n} k_i = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} \]

Mean degree c of a vertex in an undirected graph is:
\[ c = \frac{1}{n} \sum_{i=1}^{n} k_i = \frac{2m}{n} \]
Example
Degree of node 2 = 3
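The degree formulas can be checked numerically on the six-vertex example network. The sketch below writes out the adjacency matrix implied by the edge list (1,2), (1,5), (2,3), (2,4), (3,4), (3,5), (3,6); the variable names are chosen for this illustration only.

```python
# Adjacency matrix of the six-vertex example network (row i-1 is vertex i).
A = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]

n = len(A)
degrees = [sum(row) for row in A]  # k_i = sum over j of A_ij
m = sum(degrees) // 2              # every edge is counted once per endpoint
c = 2 * m / n                      # mean degree

print("degree of node 2:", degrees[1])
print("number of edges m:", m)
print("mean degree c:", c)
```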
Density

The maximum number of possible edges in a simple graph is:
\[ \binom{n}{2} = \frac{1}{2} n(n-1) \]

Density ρ of a graph is the fraction of these edges that are actually present:
\[ \rho = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)} = \frac{c}{n-1} \]
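As a quick numerical check, the density of the six-vertex example network (n = 6, m = 7) can be computed as follows. The function name `density` is an assumption made for this sketch, which also confirms that the result equals c/(n-1).

```python
def density(n, m):
    """Fraction of the n(n-1)/2 possible edges of a simple graph that are present."""
    if n < 2:
        raise ValueError("density is undefined for fewer than two vertices")
    max_edges = n * (n - 1) // 2
    return m / max_edges


n, m = 6, 7          # the six-vertex example network
rho = density(n, m)
c = 2 * m / n        # mean degree

print("density:", rho)
print("c / (n - 1):", c / (n - 1))
```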
Degree sequence and degree distribution

The degree sequence is an (ordered) list of the degrees of all nodes
• In our earlier network we have: [4, 3, 2, 2, 2, 1]

The degree distribution is a frequency count of the occurrence of each degree
• It is essentially a histogram
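Both quantities follow directly from an adjacency list. The sketch below uses the adjacency list of the earlier six-vertex network and Python's standard `collections.Counter` for the histogram; the variable names are illustrative.

```python
from collections import Counter

# Adjacency list of the earlier six-vertex example network.
adjacency = {
    1: [2, 5],
    2: [1, 3, 4],
    3: [2, 4, 5, 6],
    4: [2, 3],
    5: [1, 3],
    6: [3],
}

# Degree sequence: the degree of every node, sorted in decreasing order.
degree_sequence = sorted((len(ns) for ns in adjacency.values()), reverse=True)

# Degree distribution: how many nodes have each degree.
degree_distribution = Counter(degree_sequence)

print(degree_sequence)
for k in sorted(degree_distribution):
    print(f"degree {k}: {degree_distribution[k]} node(s)")
```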
Paths


A path is a sequence of vertices such that every consecutive pair of vertices in the sequence is connected by an edge in the network

The length of a path is the number of edges traversed along the path
• When a path traverses the same edge e two times, e is counted twice

A geodesic path (shortest path) is a path between two vertices such that no shorter path exists
• The length of this path is called the geodesic (or shortest) distance
• If two nodes are not connected by any path, their geodesic distance is infinite
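The definitions above can be sketched in Python. Breadth-first search is one standard way to obtain geodesic distances in an unweighted graph; the function names are assumptions for this illustration, and vertex 7 is a hypothetical isolated vertex added to the earlier six-vertex network to show the infinite-distance case.

```python
import math
from collections import deque

# Earlier six-vertex network plus a hypothetical isolated vertex 7.
adjacency = {
    1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6],
    4: [2, 3], 5: [1, 3], 6: [3], 7: [],
}


def path_length(adjacency, path):
    """Number of edges traversed; a repeated edge is counted each time."""
    for u, v in zip(path, path[1:]):
        if v not in adjacency[u]:
            raise ValueError(f"({u},{v}) is not an edge, so this is not a path")
    return len(path) - 1


def geodesic_distance(adjacency, source, target):
    """Shortest-path length via breadth-first search; math.inf if no path exists."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if w == target:
                    return dist[w]
                queue.append(w)
    return math.inf


print(path_length(adjacency, [2, 3, 2, 4]))  # edge (2,3) is traversed twice
print(geodesic_distance(adjacency, 1, 6))
print(geodesic_distance(adjacency, 1, 7))
```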
Connected components
A network for which there exist pairs of vertices with no path between them is called disconnected
[Excerpt from Newman, truncated in the source: the appearance of blocks in the adjacency matrix depends on the vertex labels, even though the choice of labels has no effect on the network itself, so it may not be obvious from the adjacency matrix that a network has separate components. Computer algorithms such as the “breadth-first search” algorithm can take a network with arbitrary vertex labels and quickly determine its components.]
• If there exists a path between every possible pair of vertices in a network, the network is called connected

A component is a maximal subset of vertices of a network such that there exists at least one path from every vertex of the subset to every other
• Each node within a component can be reached from every other node in the component by following the edges
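Components can be found by repeatedly running a breadth-first search from a vertex that has not been visited yet. The following is a minimal sketch, not a reference implementation; the function names are assumptions, and vertices 7, 8, and 9 are hypothetical additions to the earlier six-vertex network so that the example is disconnected.

```python
from collections import deque

# Earlier six-vertex network plus hypothetical vertices 7, 8 (joined) and 9 (isolated).
adjacency = {
    1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6],
    4: [2, 3], 5: [1, 3], 6: [3],
    7: [8], 8: [7], 9: [],
}


def connected_components(adjacency):
    """Return the components as sorted vertex lists, one breadth-first search each."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = [start]
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    component.append(w)
                    queue.append(w)
        components.append(sorted(component))
    return components


def is_connected(adjacency):
    """A non-empty network is connected if it has exactly one component."""
    return len(connected_components(adjacency)) == 1


components = connected_components(adjacency)
print(components)
print(is_connected(adjacency))
```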
Giant component

We begin our discussion of the structure of real-world networks with a look at component sizes. In
an undirected network, we typically find that there is a large component that fills most of the
network—usually more than half and not infrequently over 90%—while the rest of the network is
divided into a large number of small components disconnected from the rest. This situation is
sketched in Fig. 8.1. (The large component is often referred to as the “giant component,” although
this is a slightly sloppy usage. As discussed in Section 12.5, the words “giant component” have a
specific meaning in network theory and are not precisely synonymous with “largest component.”
In this book we will be careful to distinguish between “largest” and “giant.”)
A typical example of this kind of behavior is the network of film actors discussed in Section 3.5.
In this network the vertices represent actors in movies and there is an edge between two actors if
they have ever appeared in the same movie. In a version of the network from May 2000 [253], it
was found that 440 971 out of 449 913 actors were connected together in the largest component, or
about 98%. Thus just 2% of actors were not part of the largest component.
If the largest component includes a significant fraction of the network, it is called the giant component
Figure 8.1: Components in an undirected network. In most undirected networks there is a
single large component occupying a majority, or at least a significant fraction, of the network,
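The fraction of the network covered by the largest component is a one-line computation once the component sizes are known. The sketch below checks the film-actor figures quoted above; the function name `largest_component_fraction` and the second list of sizes are hypothetical.

```python
def largest_component_fraction(component_sizes):
    """Share of all vertices that belong to the largest component."""
    return max(component_sizes) / sum(component_sizes)


# Film-actor network figures quoted in the text.
actors_total = 449_913
actors_largest = 440_971
fraction = actors_largest / actors_total
print(f"largest component: {fraction:.1%} of the network")

# Hypothetical component sizes for a small disconnected network.
print(largest_component_fraction([6, 2, 1]))
```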
Transitivity

If A is connected to B and B is connected to C, what is the probability that A is connected to C?

My friends’ friends are likely to be my friends too
[Figure: vertices A, B, and C, with one edge marked "?"]
Local clustering coefficient

The clustering coefficient can be defined for a single vertex i as:
\[ CC_i = \frac{\text{number of pairs of neighbors of } i \text{ that are connected}}{\text{number of pairs of neighbors of } i} \]
Clustering Coefficient Example
Each value is (connected pairs of neighbors) / (k(k-1)/2) for one vertex of degree k in the example figure:
1/(2 × 1/2) = 1
2/(3 × 2/2) = 2/3
3/(4 × 3/2) = 1/2
2/(3 × 2/2) = 2/3
1/(2 × 1/2) = 1
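A minimal Python sketch of the local clustering coefficient, applied to the six-vertex network used in the earlier slides (not to the figure of this example, which is not reproduced here). The function name is an assumption, and returning 0 for vertices with fewer than two neighbors is a convention chosen for this sketch; the slides do not define that case.

```python
from fractions import Fraction
from itertools import combinations

# Adjacency list of the earlier six-vertex example network.
adjacency = {
    1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6],
    4: [2, 3], 5: [1, 3], 6: [3],
}


def local_clustering(adjacency, i):
    """Connected pairs of neighbors of i divided by all pairs of neighbors of i."""
    neighbors = adjacency[i]
    k = len(neighbors)
    if k < 2:
        return Fraction(0)  # assumed convention: no pairs of neighbors exist
    possible = k * (k - 1) // 2
    connected = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return Fraction(connected, possible)


for v in adjacency:
    print(v, local_clustering(adjacency, v))
```

Exact fractions are used so that the values can be compared with hand calculations such as 1/3 or 1/6 without rounding.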
Clustering coefficient

Watts and Strogatz have suggested computing the
clustering coefficient of a network as the average over all
the local clustering coefficients of the vertices:
\[ CC_{WS} = \frac{1}{n} \sum_{i=1}^{n} CC_i \]
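The average can be sketched as follows for the six-vertex network of the earlier slides. As an assumption of this sketch, vertices with fewer than two neighbors contribute a local coefficient of 0; other conventions exist and would change the result. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Adjacency list of the earlier six-vertex example network.
adjacency = {
    1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6],
    4: [2, 3], 5: [1, 3], 6: [3],
}


def local_clustering(adjacency, i):
    """Local clustering coefficient CC_i of vertex i."""
    neighbors = adjacency[i]
    k = len(neighbors)
    if k < 2:
        return Fraction(0)  # assumed convention for vertices of degree 0 or 1
    connected = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return Fraction(connected, k * (k - 1) // 2)


def watts_strogatz_clustering(adjacency):
    """Average of the local clustering coefficients over all n vertices."""
    n = len(adjacency)
    return sum(local_clustering(adjacency, i) for i in adjacency) / n


cc_ws = watts_strogatz_clustering(adjacency)
print(cc_ws, float(cc_ws))
```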