School of Information Sciences
University of Pittsburgh

Network Science: A Short Introduction
i3 Workshop
Konstantinos Pelechrinis
Summer 2014

Figures are taken from: M.E.J. Newman, “Networks: An Introduction”

The representation of networks
A network consists of entities connected with each other.
The structure of these connections is represented through graphs.
A graph is represented by two sets:
  A vertex set V of the entities participating in the network.
    In the rest of the slides, n will typically be the number of vertices.
    Also called the node or actor set.
  An edge set E of the connections between vertices.
    In the rest of the slides, m will typically be the number of edges.
    Also called the link or tie set.

Example
If we denote an edge between vertices i and j by (i,j) then the complete network can be specified by giving the value of n and a list of all the edges. For example, the network in Fig. 6.1a has n = 6 vertices and edges (1,2), (1,5), (2,3), (2,4), (3,4), (3,5), and (3,6). Such a specification is called an edge list. Edge lists are sometimes used to store the structure of networks on computers, but for mathematical developments like those in this chapter they are rather cumbersome.
Figure 6.1: Two small networks. (a) A simple graph, i.e., one having no multiedges or self-edges. (b) A network with both multiedges and self-edges.
Edges can have direction, but in this introduction we will only consider undirected edges/networks.

Edge attributes
Examples:
  Weight (e.g., frequency of contacts, bandwidth of the link in a telecommunication network, etc.)
  Ranking (e.g., primary connection, secondary connection, etc.)
  Type (e.g., friend edge, family edge, co-worker edge, etc.)
  …

Edge list and the adjacency matrix
If we label the nodes with IDs 1, 2, …, n, we can denote each edge as a pair (i,j).
  This is an edge list specification.
  Good for storing and processing networks in computers, but not for mathematical development.
The adjacency matrix A of a simple graph is a matrix with elements $A_{ij}$ such that:
$$A_{ij} = \begin{cases} 1, & \text{if there is an edge between vertices } i \text{ and } j \\ 0, & \text{otherwise} \end{cases}$$

Example
Edge list: (1,2), (1,5), (2,3), (2,4), (3,4), (3,5), (3,6)
Adjacency matrix (6.2), written out from the edge list above:
$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}$$

Adjacency list
Easier to work with if the network is large and sparse.
  1: 2, 5
  2: 1, 3, 4
  3: 2, 4, 5, 6
  4: 2, 3
  5: 1, 3
  6: 3

Degree
The degree $k_i$ of a vertex i in a graph is the number of edges connected to it.
For undirected graphs we have:
$$k_i = \sum_{j=1}^{n} A_{ij}$$
And the number of edges of a graph is given by:
$$m = \frac{1}{2} \sum_{i=1}^{n} k_i = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}$$
The mean degree c of a vertex in an undirected graph is:
$$c = \frac{1}{n} \sum_{i=1}^{n} k_i = \frac{2m}{n}$$

Example
Degree of node 2 = 3

Density
The maximum number of possible edges in a simple graph is:
$$\binom{n}{2} = \frac{1}{2} n(n-1)$$
The density ρ of a graph is the fraction of these edges that are actually present:
$$\rho = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)} = \frac{c}{n-1}$$

Degree sequence and degree distribution
The degree sequence is an (ordered) list of the degree of every node.
  In our earlier network we have: [4, 3, 2, 2, 2, 1]
The degree distribution is a frequency count of the occurrence of each degree.
  It is essentially a histogram.

Paths
A path is a sequence of vertices such that every consecutive pair of vertices in the sequence is connected by an edge in the network.
The length of a path is the number of edges traversed along the path.
  When a path traverses the same edge e two times, e is counted twice.
A geodesic path (shortest path) is a path between two vertices such that no shorter path exists.
  The length of this path is called the geodesic (or shortest) distance.
  If two nodes are not connected by any path, their geodesic distance is infinite.

Connected components
A network for which there exist pairs of vertices with no path between them is called disconnected.
If there exists a path between any possible pair of vertices in a network, the network is called connected.
A component is a maximal subset of vertices of a network such that there exists at least one path from every vertex of the subset to any other.
  Each node within a component can be reached from every other node in the component by following the edges.
Note, however, that blocks appear in the adjacency matrix only when the vertex labels are chosen so that the vertices of each component are represented by adjacent rows and columns. Other choices of labels produce non-block-diagonal matrices, even though the choice of labels does not change the network itself. Thus, depending on the labeling, it may not be obvious from the adjacency matrix that a network has separate components. Computer algorithms such as the “breadth-first search” algorithm can take a network with arbitrary vertex labels and quickly determine its components.

Giant component
We begin our discussion of the structure of real-world networks with a look at component sizes. In an undirected network, we typically find that there is a large component that fills most of the network (usually more than half and not infrequently over 90%), while the rest of the network is divided into a large number of small components disconnected from the rest. This situation is sketched in Fig. 8.1. (The large component is often referred to as the “giant component,” although this is a slightly sloppy usage. As discussed in Section 12.5, the words “giant component” have a specific meaning in network theory and are not precisely synonymous with “largest component.” In this book we will be careful to distinguish between “largest” and “giant.”) A typical example of this kind of behavior is the network of film actors discussed in Section 3.5. In this network the vertices represent actors in movies and there is an edge between two actors if they have ever appeared in the same movie.
In a version of the network from May 2000 [253], it was found that 440 971 out of 449 913 actors were connected together in the largest component, or about 98%. Thus just 2% of actors were not part of the largest component.
If the largest component includes a significant fraction of the network, it is called the giant component.
Figure 8.1: Components in an undirected network. In most undirected networks there is a single large component occupying a majority, or at least a significant fraction, of the network, …

Transitivity
If A is connected to B and B is connected to C, what is the probability that A is connected to C?
  My friends’ friends are likely to be my friends too.
[Figure: vertices A, B, and C; the possible edge between A and C is marked “?”]

Local clustering coefficient
The clustering coefficient can be defined for a single vertex i as:
$$CC_i = \frac{\text{number of pairs of neighbors of } i \text{ that are connected}}{\text{number of pairs of neighbors of } i}$$
For a vertex of degree $k_i$, the denominator is $k_i(k_i-1)/2$.

Clustering Coefficient Example
[Figure: a small network whose vertices are labeled with their local clustering coefficients]
  1/(2 × 1/2) = 1
  2/(3 × 2/2) = 2/3
  3/(4 × 3/2) = 1/2
  2/(3 × 2/2) = 2/3
  1/(2 × 1/2) = 1

Clustering coefficient
Watts and Strogatz have suggested computing the clustering coefficient of a network as the average over all the local clustering coefficients of the vertices:
$$CC_{WS} = \frac{1}{n} \sum_{i=1}^{n} CC_i$$
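
Worked example in code
The definitions in these slides (adjacency list and matrix, degree, mean degree, density, geodesic distance, components, local and Watts-Strogatz clustering) can be checked on the six-vertex example network. The following is a minimal Python sketch and is not part of the original slides. All function names are illustrative, and assigning a local clustering coefficient of 0 to vertices with fewer than two neighbors is an assumed convention, since the slides do not cover that case.

```python
from collections import deque
from itertools import combinations

# Edge list of the example network from the slides (n = 6 vertices).
n = 6
edges = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)]


def adjacency_list(n, edges):
    """Build an undirected adjacency list {vertex: set of neighbors}."""
    adj = {v: set() for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix (row/column v-1 is vertex v)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1
    return A


def local_clustering(adj, i):
    """Fraction of pairs of neighbors of i that are themselves connected.

    Assumption: vertices with fewer than two neighbors get CC_i = 0.
    """
    k = len(adj[i])
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def geodesic_distances(adj, source):
    """Breadth-first search from source.

    Returns {vertex: shortest-path length}; unreachable vertices are
    absent, which corresponds to an infinite geodesic distance.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj):
    """Connected components, found by repeated breadth-first search."""
    seen, result = set(), []
    for v in adj:
        if v not in seen:
            comp = set(geodesic_distances(adj, v))
            seen |= comp
            result.append(comp)
    return result


adj = adjacency_list(n, edges)
A = adjacency_matrix(n, edges)

degrees = [sum(row) for row in A]         # k_i = sum_j A_ij
m = sum(degrees) // 2                     # m = (1/2) sum_i k_i
c = 2 * m / n                             # mean degree c = 2m / n
density = 2 * m / (n * (n - 1))           # rho = 2m / (n (n - 1))
cc = {i: local_clustering(adj, i) for i in adj}
cc_ws = sum(cc.values()) / n              # average of the local coefficients

print("degrees:", degrees)
print("m =", m, " c =", c, " density =", density)
print("local clustering:", cc)
print("CC_WS =", cc_ws)
print("distances from vertex 1:", geodesic_distances(adj, 1))
print("components:", components(adj))
```

Under these assumptions the sketch should reproduce the slide values: node 2 has degree 3, the sorted degrees give the sequence [4, 3, 2, 2, 2, 1], and the network forms a single component. The adjacency list is used for clustering and breadth-first search because it only stores existing edges, which is the reason the slides recommend it for large, sparse networks.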