Math C5.4, Networks, University of Oxford Solution Sheet 1

Heather A Harrington
1. Reading.
(a) Browse through Chapters 1–5 of Newman’s book. Write down a sentence or two about the most interesting
thing you read in these chapters. Indicate an example of a social network, a biological network, and
a physical network from a source other than lectures or Newman’s book. Cite these sources using the
citation format in the back of Newman’s book.
Solution: Chapters 1–5 of Newman are descriptive chapters that cover various examples. It is worth
discussing for a couple minutes what ideas you found interesting. We’ll also discuss some of the other
networks you found.
(b) Read Chapter 6, Section 7.1, and Sections 8.3–8.4 of Newman’s book.
Solution: There is nothing to write here.
2. Software, Data, and Visualization.
(a) Download and install Matlab. (See http://www.maths.ox.ac.uk/members/it/software-personal-machines/matlab.)
Solution: There is nothing to write here.
(b) Practice with Matlab. For example, you could go through parts of a tutorial available at
http://www.tutorialspoint.com/matlab/. There are also numerous other resources for learning Matlab available
online. In your homework submission, indicate what practice and tutorials you have done. Cite all references
explicitly.
Solution: There is nothing to write here.
(c) Download the Brain Connectivity Toolbox from https://sites.google.com/site/bctnet/Home. As you
can see, the BCT includes scripts for many types of calculations. This is one of many such sources available
online.
Solution: There is nothing to write here.
(d) Download data for at least two small unweighted, undirected networks of somewhat different sizes (e.g.,
one of them should have roughly 5–10 times as many nodes as the other). Cite your sources for these data
sets. As you can see, numerous data sets are available online.
Solution: Three data sets are the Caltech, MIT, and Johns Hopkins networks from the Facebook100
data set, which is described in [3]. (The Caltech data set debuted previously as part of the Facebook5
data set in [2].) Networks from the Facebook100 data set have been used as benchmarks in numerous papers.
The data consist of the full set of Facebook “friendship” connections at each of 100 US universities
in a single-time snapshot from fall 2005 (when one had to have a .edu account to be on Facebook). The
smallest network (Caltech) has 762 nodes in the largest connected component (LCC), and the largest has
more than 40000 nodes in the LCC. This data set was obtained in 2005 from Adam D’Angelo of Facebook.
Let’s use three networks from the FB100: Caltech (with 762 nodes in the LCC), its archrival MIT (which
has 6402 nodes in the LCC), and Johns Hopkins (which has 5157 nodes in the LCC).
Note: You might not consider the example networks that I used to be “small”; that depends on
perspective. From the perspective of data analysis, these networks are very small indeed. However, they
are large enough to observe and discuss features that arise in realistic situations. Also note that the FB100
data are about as clean as one is ever going to get for real data (especially when it comes to
cross-sectional data as a “real-life ensemble” of social networks), which is why these networks have become
popular benchmarks for testing algorithms and other methods.
(e) Using some software, draw visualizations of these networks. One possibility is the following:
http://netwiki.amath.unc.edu/VisComms/VisComms. There are numerous other software packages.
Solution: In Fig. 1, I show visualizations of the LCC in the Caltech network and the LCC in the Johns
Hopkins network from the FB100 data set. The former plot, which is an alternative version of a plot in [2],
was created using the Matlab software indicated above. The latter plot comes from [1] and was created
using software released with that paper.
(f) Plot the degree distribution for each of the two networks that you downloaded. What are you able to
conclude from these degree distributions?
Solution: In Fig. 2, I plot the degree distributions of the Caltech and MIT networks. There isn’t much
to learn from them, though one can expect (because these are social networks) that they have heavy
tails (with finite-size cutoffs). From the plots, it looks like the fluctuations in the (smaller) Caltech
data set are larger than those in the MIT data set.
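As a rough illustration of how such a degree distribution can be computed before plotting, here is a minimal sketch in Python (rather than Matlab) that uses only the standard library. The function name `degree_distribution` and the toy edge list are hypothetical; the toy graph is not one of the Facebook100 networks.

```python
from collections import Counter


def degree_distribution(num_nodes, edges):
    """Return {k: fraction of nodes with degree k} for an undirected simple graph.

    Nodes are assumed to be labelled 0, ..., num_nodes - 1, and each
    undirected edge is assumed to appear exactly once in `edges`.
    """
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)
    return {k: counts[k] / num_nodes for k in sorted(counts)}


# Hypothetical toy graph: the path 0-1-2-3 plus the extra edge 1-3.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
dist = degree_distribution(4, toy_edges)
for k, fraction in dist.items():
    print(f"degree {k}: fraction {fraction:.2f}")
```

For a real data set, one would first read the edge list from the downloaded file and restrict it to the LCC; the resulting dictionary can then be passed to any plotting routine.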
(g) Download and install a version of LaTeX. You may wish to typeset some of your homework solutions using
LaTeX. Later in C5.4, you will be required to use LaTeX for some things.
Solution: There is nothing to write here.
Note: I am requiring some use of Matlab for the above problem (and it is freely available to you through
Oxford’s site license). However, you may prefer other languages, software, etc., and you are welcome to use
those instead for your work in C5.4.
3. Random graphs. In later lectures, we will discuss ensembles of random graphs. The simplest and most famous
type of random graph is an Erdős–Rényi (ER) graph (which was first studied by Solomonoff and Rapoport).
In particular, consider the random-graph ensemble $\mathcal{G}(N, p)$, which is defined as follows: Suppose that there are
N nodes. Between each pair of distinct nodes, a single edge exists with uniform and independent probability
p. There are no self-edges. A single graph $G \in \mathcal{G}(N, p)$ is generated using this process, and it is interesting to
study the properties of collections (“ensembles”) of graphs that are generated in this way. The probability
with which a given simple graph G with m edges is drawn from $\mathcal{G}(N, p)$ is
$$P(G) = p^m (1 - p)^{\binom{N}{2} - m}. \tag{1}$$
(a) Write down the total probability of drawing a graph G with exactly m edges from the ensemble $\mathcal{G}(N, p)$,
and use it to find the mean number of edges $\langle m \rangle$.
Solution: This problem introduces the idea of an ensemble of random graphs. Importantly, when I discuss
calculating a property of a random graph — aside from what is required for each graph in the ensemble
by definition — I am actually referring to calculating a mean property of graphs that are drawn from that
ensemble.
The probability of drawing an N-node graph G with exactly m edges is
$$P(m) = \binom{\binom{N}{2}}{m} p^m (1 - p)^{\binom{N}{2} - m}, \tag{2}$$
which is the binomial distribution. (In this document, I am using ${}_a C_b$ and $\binom{a}{b}$ interchangeably depending
on what I think is easier to read.) Consequently,
$$\langle m \rangle = \sum_{m=0}^{\binom{N}{2}} m P(m) = \binom{N}{2} p. \tag{3}$$
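The step from the single-graph probability in Eq. (1) to the binomial form in Eq. (2) can be checked by brute force for a very small N. The following Python sketch (standard library only) enumerates every simple graph on N labelled nodes, sums the probabilities of Eq. (1) by edge count, and compares the result with Eqs. (2) and (3). The function name `edge_count_distribution` and the values N = 4, p = 0.3 are illustrative assumptions, not part of the problem sheet.

```python
from itertools import combinations, product
from math import comb, isclose


def edge_count_distribution(n, p):
    """Sum P(G) from Eq. (1) over all simple graphs on n nodes, grouped by edge count m."""
    pairs = list(combinations(range(n), 2))  # the C(n, 2) possible edges
    num_pairs = len(pairs)
    totals = [0.0] * (num_pairs + 1)
    # Each 0/1 tuple says which of the possible edges are present in one graph.
    for present in product((0, 1), repeat=num_pairs):
        m = sum(present)
        totals[m] += p**m * (1 - p) ** (num_pairs - m)  # Eq. (1)
    return totals


n, p = 4, 0.3  # illustrative values, small enough that enumeration is cheap
num_pairs = comb(n, 2)
totals = edge_count_distribution(n, p)

# Closed-form binomial probabilities from Eq. (2).
binomial = [
    comb(num_pairs, m) * p**m * (1 - p) ** (num_pairs - m)
    for m in range(num_pairs + 1)
]

# Mean number of edges computed from the enumerated distribution.
mean_m = sum(m * prob for m, prob in enumerate(totals))

print(all(isclose(a, b, abs_tol=1e-12) for a, b in zip(totals, binomial)))
print(mean_m, num_pairs * p)  # compare with Eq. (3)
```

The enumeration grows as 2^(N(N−1)/2), so this check is only practical for a handful of nodes; it is meant to make the counting argument concrete, not to replace it.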
(b) Calculate the expected mean degree of an ER graph.
Solution: The mean degree of a graph G with m edges is $k = 2m/N$, so the expected mean degree of an
ER graph is
$$\langle k \rangle = \sum_{m=0}^{\binom{N}{2}} \frac{2m}{N} P(m) = \frac{2}{N} \binom{N}{2} p = (N - 1)p \equiv c. \tag{4}$$
(c) Show, under an appropriate assumption (which you should state), that the degree distribution for an ER
graph (in expectation over the ensemble) satisfies
$$p_k \sim e^{-c} \frac{c^k}{k!}, \qquad N \to \infty, \tag{5}$$
where $c = (N - 1)p$.
Solution: Pick a node in G. It is adjacent with independent probability p to each of the N − 1 other nodes.
Thus, the probability to be adjacent to a particular set of k other nodes (and to not be adjacent to any of the
others) is $p^k (1 - p)^{N-1-k}$. There are ${}_{N-1}C_k$ ways to choose these k special nodes, so the probability of our
node being adjacent to exactly k other nodes is ${}_{N-1}C_k \, p^k (1 - p)^{N-1-k}$, which is the binomial distribution.
Hence, we have a binomial degree distribution for our ER graph.
Now we want to see what we expect to happen as N becomes large. We’re going to assume that the mean
degree is approximately constant as N → ∞ [4]. In this case, $p = c/(N - 1) \to 0$ as N → ∞. We can then
write
$$\ln\left[(1 - p)^{N-1-k}\right] = (N - 1 - k) \ln\left(1 - \frac{c}{N - 1}\right) \approx -(N - 1 - k) \frac{c}{N - 1} \approx -c, \tag{6}$$
where we have used a Taylor expansion. The approximations become exact as N → ∞. Exponentiating
both sides yields $(1 - p)^{N-1-k} = e^{-c}$, where again equality holds in the N → ∞ limit. We also have for
large N that
$$\binom{N - 1}{k} = \frac{(N - 1)!}{(N - 1 - k)! \, k!} \approx \frac{(N - 1)^k}{k!}, \tag{7}$$
from which we obtain
$$p_k = \frac{(N - 1)^k}{k!} p^k e^{-c} = \frac{(N - 1)^k}{k!} \left(\frac{c}{N - 1}\right)^k e^{-c} = e^{-c} \frac{c^k}{k!}. \tag{8}$$
This is the Poisson distribution. Thus, we expect ER graphs to have a Poisson degree distribution in
the N → ∞ limit (though note the condition on the scaling of degree with N ... does that hold for real
networks?).
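To see the limit in Eq. (8) numerically, one can hold the mean degree c fixed and compare the exact binomial degree distribution with the Poisson form as N grows. The Python sketch below (standard library only) does this; the helper names, the choice c = 3, the network sizes, and the cutoff `k_max = 20` are all illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_degree_pmf(n, c, k):
    """Exact degree probability for G(n, p) with p = c / (n - 1)."""
    p = c / (n - 1)
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pmf(c, k):
    """Limiting Poisson probability from Eq. (8)."""
    return exp(-c) * c**k / factorial(k)


def max_gap(n, c, k_max=20):
    """Largest pointwise difference between the two distributions for k = 0..k_max."""
    return max(
        abs(binomial_degree_pmf(n, c, k) - poisson_pmf(c, k))
        for k in range(k_max + 1)
    )


c = 3.0  # fixed mean degree (illustrative choice)
gaps = {n: max_gap(n, c) for n in (50, 500, 5000)}
for n, gap in gaps.items():
    print(f"N = {n:5d}: max |binomial - Poisson| = {gap:.2e}")
```

The gap should shrink as N increases with c held fixed, which is the assumption under which the Poisson form was derived. If p were held fixed instead, c would grow with N and this comparison would no longer apply.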
(d) By doing calculations in Matlab, or in some other program, compare the expectations that you calculated
above to sample means from a set of ER networks. Indicate explicitly what sizes and parameter values you
consider. Also indicate explicitly the number of elements in your ensemble. What happens as you take a
sample mean over a larger number of ER graphs?
Solution: For a fixed network size, you should find that the sample mean more closely approximates the
expected value when you consider more samples. For quantities that are derived as N → ∞, you will find
that the sample mean more closely approximates the derived expression for larger networks. However, as
you consider larger networks, you will need to consider samples with more networks to have a reasonable
sample from the ensemble of possible networks.
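One way to carry out such a comparison is sketched below in Python (standard library only) rather than Matlab. It draws graphs from the ensemble by flipping an independent biased coin for each pair of nodes, and it compares the sample mean of the mean degree with (N − 1)p from Eq. (4) for several ensemble sizes. The function names, the seed, and the values N = 50, p = 0.1 and ensemble sizes 10, 100, 1000 are assumptions chosen for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def sample_er_edge_count(n, p, rng):
    """Draw one graph from G(n, p) and return its number of edges."""
    # Each of the C(n, 2) node pairs is an edge independently with probability p.
    return sum(1 for _ in combinations(range(n), 2) if rng.random() < p)


def sample_mean_degree(n, p, num_graphs, rng):
    """Average the mean degree 2m/n over num_graphs independent draws."""
    return mean(2 * sample_er_edge_count(n, p, rng) / n for _ in range(num_graphs))


rng = random.Random(0)  # fixed seed so that the run is repeatable
n, p = 50, 0.1          # illustrative network size and edge probability
expected = (n - 1) * p  # expected mean degree from Eq. (4)

errors = {}
for num_graphs in (10, 100, 1000):
    estimate = sample_mean_degree(n, p, num_graphs, rng)
    errors[num_graphs] = abs(estimate - expected)
    print(f"{num_graphs:5d} graphs: sample mean {estimate:.4f}, expected {expected:.4f}")
```

The error for any single run fluctuates, so it need not decrease at every step, but its typical size falls roughly like one over the square root of the number of graphs. Only edge counts are stored here because the mean degree depends on the graph only through m; checking the degree distribution itself would require keeping the sampled edges.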
Note 1: Writing “in expectation over the ensemble” gets repetitive rather quickly. In the applied literature,
when one writes something like “the degree distribution of an ER graph”, it is normally understood to be a
calculation that is in expectation over the ensemble. To be more precise, one can use notation like $G$ for a single
realization and $\mathcal{G}$ for an ensemble. Expectations over graph ensembles are often compared to “sample means”,
in which one calculates a quantity of interest for each of some number of draws from an ensemble of graphs and
then averages over those results. In Newman’s book, he often writes about “averaging over” things, and you
should compare that language to what I have stated in this paragraph.
Note 2: The term “ensemble” comes from statistical mechanics, and the meaning here is exactly the same as
in that subject. Using more mathematical language, a random-graph ensemble is a probability distribution on
graphs.
[1] L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think Locally, Act Locally: Detection of
Small, Medium-Sized, and Large Communities in Large Networks. Phys. Rev. E, 91:012821, 2015.
[2] A. L. Traud, E. D. Kelsic, P. J. Mucha, and M. A. Porter. Comparing community structure to characteristics in online
collegiate social networks. SIAM Rev., 53:526–543, 2011.
[3] A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of Facebook networks. Physica A, 391:4165–4180, 2012.
[4] Question for students: Does this hold for real networks?
FIG. 1: Visualizations of the (a) Caltech and (b) Johns Hopkins networks from the Facebook100 data set. [The visualization
in panel (a) is an alternative version of a figure in [2], and the figure in panel (b) comes from [1].]
FIG. 2: Degree distribution of the Caltech and MIT networks from the Facebook100 data set.