Math C5.4, Networks, University of Oxford
Solution Sheet 1
Heather A Harrington

1. Reading.

(a) Browse through Chapters 1–5 of Newman’s book. Write down a sentence or two about the most interesting thing you read in these chapters. Indicate an example of a social network, a biological network, and a physical network from a source other than lectures or Newman’s book. Cite these sources using the citation format in the back of Newman’s book.

Solution: Chapters 1–5 of Newman are descriptive chapters that cover various examples. It is worth discussing for a couple minutes what ideas you found interesting. We’ll also discuss some of the other networks you found.

(b) Read Chapter 6, Section 7.1, and Sections 8.3–8.4 of Newman’s book.

Solution: There is nothing to write here.

2. Software, Data, and Visualization.

(a) Download and install Matlab. (See http://www.maths.ox.ac.uk/members/it/software-personal-machines/matlab.)

Solution: There is nothing to write here.

(b) Practice with Matlab. For example, you could go through parts of a tutorial available at http://www.tutorialspoint.com/matlab/. There are also numerous other resources for learning Matlab available online. In your homework submission, indicate what practice and tutorials you have done. Cite all references explicitly.

Solution: There is nothing to write here.

(c) Download the Brain Connectivity Toolbox from https://sites.google.com/site/bctnet/Home. As you can see, the BCT includes scripts for many types of calculations. This is one of many such sources available online.

Solution: There is nothing to write here.

(d) Download data for at least two small unweighted, undirected networks of somewhat different sizes (e.g., one of them should have roughly 5–10 times as many nodes as the other). Cite your sources for these data sets. As you can see, numerous data sets are available online.

Solution: Three suitable data sets are the Caltech, MIT, and Johns Hopkins networks from the Facebook100 (FB100) data set, which is described in [3]. (The Caltech data set debuted previously as part of the Facebook5 data set in [2].) The networks in the Facebook100 data set have been used as benchmarks in numerous papers. The data consist of the full set of Facebook “friendship” connections at each of 100 US universities in a single-time snapshot from fall 2005 (when one had to have a .edu account to be on Facebook). The smallest network (Caltech) has 762 nodes in its largest connected component (LCC), and the largest has more than 40000 nodes in its LCC. This data set was obtained in 2005 from Adam d’Angelo of Facebook. Let’s use three networks from the FB100: Caltech (with 762 nodes in the LCC), its archrival MIT (which has 6402 nodes in the LCC), and Johns Hopkins (which has 5157 nodes in the LCC).

Note: You might not consider the example networks that I used to be “small”, and that too depends on perspective. From the perspective of data analysis, these networks are very small indeed. However, they are large enough to observe and discuss features in realistic situations. Also note that the FB100 data are basically as clean as real data are ever going to get (especially when it comes to cross-sectional data as a “real-life ensemble” of social networks), and that is why these networks have become popular benchmarks for testing algorithms and other things.

(e) Using some software, draw visualizations of these networks. One possibility is the following: http://netwiki.amath.unc.edu/VisComms/VisComms. There are numerous other software packages.

Solution: In Fig. 1, I show a visualization of the LCC of the Caltech network and the LCC of the Johns Hopkins network from the FB100 data set. The former plot, which is an alternative version of a plot in [2], was created using the Matlab software indicated above.
The latter plot comes from [1] and was created using software released with that paper.

(f) Plot the degree distribution for each of the two networks that you downloaded. What are you able to conclude from these degree distributions?

Solution: In Fig. 2, I plot the degree distributions of the Caltech and MIT networks. There isn’t much to learn from them, though one can expect (because these are social networks) that they have heavy tails (with finite-size cutoffs). From the plots, it looks like the fluctuations in the (smaller) Caltech data set are larger than those in the MIT data set.

(g) Download and install a version of LaTeX. You may wish to typeset some of your homework solutions using LaTeX. Later in C5.4, you will be required to use LaTeX for some things.

Solution: There is nothing to write here.

Note: I am requiring some use of Matlab for the above problem (and it is freely available to you through Oxford’s site license). However, you may prefer other languages, software, etc., and you are welcome to use those instead for your work in C5.4.

3. Random graphs. In later lectures, we will discuss ensembles of random graphs. The simplest and most famous type of random graph is an Erdős–Rényi (ER) graph (which was first studied by Solomonoff and Rapoport). In particular, consider the random-graph ensemble $\mathcal{G}(N, p)$, which is defined as follows: Suppose that there are $N$ nodes. Between each pair of distinct nodes, a single edge exists with uniform and independent probability $p$. There are no self-edges. A single graph $G \in \mathcal{G}(N, p)$ is generated using this process, and it is interesting to study the properties of collections (“ensembles”) of graphs that are generated in this way. The probability with which a given simple graph $G$ with $m$ edges is drawn from $\mathcal{G}(N, p)$ is

\[ P(G) = p^m (1 - p)^{\binom{N}{2} - m} . \tag{1} \]

(a) Write down the total probability of drawing a graph $G$ with exactly $m$ edges from the ensemble $\mathcal{G}(N, p)$, and use it to find the mean number of edges $\langle m \rangle$.

Solution: This problem introduces the idea of an ensemble of random graphs. Importantly, when I discuss calculating a property of a random graph (aside from what is required for each graph in the ensemble by definition), I am actually referring to calculating a mean property of graphs that are drawn from that ensemble. The probability of drawing an $N$-node graph $G$ with exactly $m$ edges is

\[ P(m) = \binom{\binom{N}{2}}{m} p^m (1 - p)^{\binom{N}{2} - m} , \tag{2} \]

which is the binomial distribution. (In this document, I am using ${}^{a}C_{b}$ and $\binom{a}{b}$ interchangeably depending on what I think is easier to read.) Consequently,

\[ \langle m \rangle = \sum_{m=0}^{\binom{N}{2}} m P(m) = \binom{N}{2} p . \tag{3} \]

(b) Calculate the expected mean degree of an ER graph.

Solution: The mean degree of a graph $G$ with $m$ edges is $k = 2m/N$, so the expected mean degree of an ER graph is

\[ \langle k \rangle = \sum_{m=0}^{\binom{N}{2}} \frac{2m}{N} P(m) = \frac{2}{N} \binom{N}{2} p = (N - 1) p \equiv c . \tag{4} \]

(c) Show, under an appropriate assumption (which you should state), that the degree distribution for an ER graph (in expectation over the ensemble) satisfies

\[ p_k \sim e^{-c} \frac{c^k}{k!} , \qquad N \to \infty , \tag{5} \]

where $c = (N - 1)p$.

Solution: Pick a node in $G$. It is adjacent with independent probability $p$ to each of the $N - 1$ other nodes. Thus, the probability of being adjacent to a particular set of $k$ other nodes (and not adjacent to any of the others) is $p^k (1 - p)^{N-1-k}$. There are ${}^{N-1}C_k$ ways to choose these $k$ special nodes, so the probability of our node being adjacent to exactly $k$ other nodes is ${}^{N-1}C_k \, p^k (1 - p)^{N-1-k}$, which is the binomial distribution. Hence, we have a binomial degree distribution for our ER graph.

Now we want to see what we expect to happen as $N$ becomes large. We’re going to assume that the mean degree $c$ is approximately constant as $N \to \infty$ [4]. In this case, $p = c/(N - 1) \to 0$ as $N \to \infty$. For fixed $k$, we can then write

\[ \ln\left[ (1 - p)^{N-1-k} \right] = (N - 1 - k) \ln\left( 1 - \frac{c}{N - 1} \right) \approx -(N - 1 - k) \frac{c}{N - 1} \approx -c , \tag{6} \]

where we have used a Taylor expansion. The approximations become exact as $N \to \infty$.
Exponentiating both sides yields $(1 - p)^{N-1-k} = e^{-c}$, where again equality holds in the $N \to \infty$ limit. We also have for large $N$ that

\[ \binom{N - 1}{k} = \frac{(N - 1)!}{(N - 1 - k)! \, k!} \approx \frac{(N - 1)^k}{k!} , \tag{7} \]

from which we obtain

\[ p_k = \frac{(N - 1)^k}{k!} p^k e^{-c} = \frac{(N - 1)^k}{k!} \left( \frac{c}{N - 1} \right)^k e^{-c} = e^{-c} \frac{c^k}{k!} . \tag{8} \]

This is the Poisson distribution. Thus, we expect ER graphs to have a Poisson degree distribution in the $N \to \infty$ limit (though note the condition on the scaling of degree with $N$ ... does that hold for real networks?).

(d) By doing calculations in Matlab, or in some other program, compare the expectations that you calculated above to sample means from a set of ER networks. Indicate explicitly what sizes and parameter values you consider. Also indicate explicitly the number of elements in your ensemble. What happens as you take a sample mean over a larger number of ER graphs?

Solution: For a fixed network size, you should find that the sample mean more closely approximates the expected value when you consider more samples. For quantities that are derived as $N \to \infty$, you will find that the sample mean more closely approximates the derived expression for larger networks. However, as you consider larger networks, you will need to consider samples with more networks to have a reasonable sample from the ensemble of possible networks.

Note 1: Writing “in expectation over the ensemble” gets repetitive rather quickly. In the applied literature, when one writes something like “the degree distribution of an ER graph”, it is normally understood to be a calculation that is in expectation over the ensemble. To be more precise, one can use notation like $G$ for a single realization and $\mathcal{G}$ for an ensemble. Expectations over graph ensembles are often compared to “sample means”, in which one calculates a quantity of interest for each of some number of draws from an ensemble of graphs and then averages over those results.
In Newman’s book, he often writes about “averaging over” things, and you should compare that language to what I have stated in this paragraph.

Note 2: The term “ensemble” comes from statistical mechanics, and the meaning here is exactly the same as in that subject. Using more mathematical language, a random-graph ensemble is a probability distribution on graphs.

[1] L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think Locally, Act Locally: Detection of Small, Medium-Sized, and Large Communities in Large Networks. Phys. Rev. E, 91:012821, 2015.
[2] A. L. Traud, E. D. Kelsic, P. J. Mucha, and M. A. Porter. Comparing community structure to characteristics in online collegiate social networks. SIAM Rev., 53:526–543, 2011.
[3] A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of Facebook networks. Physica A, 391:4165–4180, 2012.
[4] Question for students: Does this hold for real networks?

FIG. 1: Visualizations of the (a) Caltech and (b) Johns Hopkins networks from the Facebook100 data set. [The visualization in panel (a) is an alternative version of a figure in [2], and the figure in panel (b) comes from [1].]

FIG. 2: Degree distribution of the Caltech and MIT networks from the Facebook100 data set.
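
Supplementary sketch for Problem 3(d): the comparison between ensemble expectations and sample means can be tried along the following lines. This is a minimal sketch in Python (the sheet allows languages other than Matlab), using only the standard library. The function names and the values N = 200, p = 0.02, and 50 graphs in the ensemble are illustrative assumptions, not choices made in the sheet. It compares the sample mean number of edges with $\binom{N}{2} p$, the sample mean degree with $c = (N - 1)p$, and the empirical degree distribution with the Poisson form $e^{-c} c^k / k!$.

```python
import math
import random
from collections import Counter


def sample_er_degrees(n, p, rng):
    """Draw one graph from G(n, p) and return the list of its node degrees."""
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair of distinct nodes is joined independently with probability p.
            if rng.random() < p:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def ensemble_statistics(n, p, num_graphs, seed=0):
    """Return sample means of the edge count and mean degree, plus the
    degree distribution pooled over num_graphs draws from G(n, p)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    total_edges = 0
    degree_counts = Counter()
    for _ in range(num_graphs):
        degrees = sample_er_degrees(n, p, rng)
        total_edges += sum(degrees) // 2  # every edge is counted at both ends
        degree_counts.update(degrees)
    mean_m = total_edges / num_graphs
    mean_k = 2 * mean_m / n
    p_k = {k: count / (n * num_graphs) for k, count in degree_counts.items()}
    return mean_m, mean_k, p_k


def poisson_pmf(k, c):
    """Poisson probability e^{-c} c^k / k!."""
    return math.exp(-c) * c**k / math.factorial(k)


if __name__ == "__main__":
    n, p, num_graphs = 200, 0.02, 50  # illustrative values only
    mean_m, mean_k, p_k = ensemble_statistics(n, p, num_graphs)
    c = (n - 1) * p
    print(f"sample <m> = {mean_m:.1f}, expected = {math.comb(n, 2) * p:.1f}")
    print(f"sample <k> = {mean_k:.3f}, expected c = {c:.3f}")
    print("k, sample p_k, Poisson p_k")
    for k in range(11):
        print(k, round(p_k.get(k, 0.0), 4), round(poisson_pmf(k, c), 4))
```

Rerunning with a larger num_graphs at fixed n should bring the sample means closer to the expectations, and increasing n at fixed c should bring the pooled degree distribution closer to the Poisson form. The double loop over node pairs costs O(N^2) per graph, so this direct approach is only practical for modest N.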