Diffusion Through Multilayer Networks

IT 15 017
Degree project, 30 credits (Examensarbete 30 hp)
April 2015
Panagiotis Kallioras
Department of Information Technology
Abstract
The population of the Earth is increasing rapidly, generating more and more situations that allow us to study the mechanisms behind human behavior. Previous studies have found evidence of important network effects. When network effects are present, the value of a product or service depends on the number of other people using it; however, there is a lack of empirical research on such effects. The goal of this thesis is to examine and analyze such networks and, through a simulation model of a diffusion process, to identify the determinants that can predict the outcome. We investigate these networks in terms of the probability of the virus spreading to the neighbors of the receiver that is needed to reach a given percentage of the nodes in the network. Moreover, we answer questions such as how these parameters change when we change the structure of the networks and their relationships. Finally, the results of a number of tests performed on our data demonstrate how these factors affect the networks. The study of previous literature helps us obtain a more in-depth understanding of multilayer networks. Verification of the model on real data is an objective of the thesis, but it is not guaranteed, given the difficulty of retrieving real data.
Supervisor: Matteo Magnani
Reviewer: Christian Rohner
Examiner: Jarmo Rantakokko
Table of Contents

Introduction
    Contribution of this Paper
    Related Work
Background
    Introduction on modeling & complex systems
    Introduction on networks
        Complex Networks
        Single Networks
        Multiplex-Multilayer Networks
        Diffusion in Networks
        Networks and Epidemiology
    Introduction to graph models
        Erdős–Rényi model
        Barabási–Albert model
    Epidemiological Modeling
Approaches – Analysis
    Single Networks
        Erdős–Rényi Experiment
        Barabási–Albert Experiment
    Multiple Networks
        Two-Layers Experiment
        Three-Layers Experiment
        Four-Layers Experiment
        Combination of random networks in a multiple one (two-layer)
Discussion
    About time
    About beta
    About number of nodes
    About infection probability
    About ER & BA Network Topologies
Conclusion
Acknowledgements
Appendix
    About the R programming language
    About the "igraph" package
    Stochastic Process & Markov Chains for SIR
References
Introduction

In recent years the Internet has grown exponentially, both in its number of users and in the level of services it offers. Distributed databases, distributed computing and telecommunication applications are in widespread use and form a fundamental element of communications, defense, banking, exchange services, health, education and other important areas. This has made it imperative to protect computer and network systems from threats that can leave them vulnerable to malicious users and actions. But to protect something, we should first understand and analyze what is threatened. The availability of reliable models for the spread of threats through computer networks can be useful in many ways, for example to predict future threats or to develop new methods of containment. The search for new and better models is an active research area, in the academic community and beyond. Epidemiological models are inspired by biology, where they describe the spread of infectious diseases in human populations. These models are now widely used to model the spread of various threats among computers. We analyze the assumptions made by each model in terms of their advantages and disadvantages. The epidemiological models presented and analyzed here are inspired by their biological counterparts, developed in fields such as medical epidemiology, which deals with infectious diseases. Such models have been used to model the spread of several threats to networks, such as viruses and worms. It is worth mentioning that computer viruses and worms are the only forms of artificial life that have a measurable impact on society.
We analyze a main strategy for spreading a virus through complex systems such as multiple networks. In addition, we present some basic types of random networks and use them to build and characterize these multiple networks. Knowledge of the network topology is essential, as it is directly related to the dissemination of the threats considered in this work. In the following chapters we treat complex systems, complex networks and epidemiological models in detail. In the first chapter we present some basic concepts that need to be understood before the content and conclusions of this work. We start with a brief look at the fields of information technology and biology, discussing what the threats are and how a virus is transmitted. We then describe complex networks and graph models in more depth and explain what they can actually do. Next, the SIR epidemiological model is introduced, and we present the consequences this biological model has for networks. In the second chapter, we propose a model for the simultaneous diffusion of viruses on a network. We then extend this model to multiple networks, which contain more layers of random networks, and present our results. The third chapter contains a discussion of the conclusions drawn from the results. Finally, in the fourth chapter we make our concluding remarks, together with some suggestions for future researchers in this field.

Contribution of this Paper

After focusing a bit more extensively on the challenges that this field faces, this paper attempts to answer some of the questions mentioned above. We want to discover how an epidemiological model such as SIR can be applied to random multiple networks, how the diffusion of a virus can happen in them, and what changes we observe when we alter the parameters. The focal point of this paper is the methods and applications used to examine the dynamics of diffusion processes in multiplex networks. The purpose is to present a basic epidemiological model, the SIR model, and to simulate it on random multiple networks. The main goal of this project is to examine and analyze multilayer networks and, through the proposed simulation model of a diffusion process, to identify the determinants that can predict the outcome. Examples of what we might investigate in these networks are the probability of spreading to a given number of nodes in the network, or how spreading works across different layers of random networks (for example, a combination of two different random networks, or more than two layers). We also answer questions such as how these probabilities and layers change when we change the structure of the networks and their relationships. During this project, we also compare different kinds of networks, both single and multilayer. For this part of the study, we employ these methods and algorithms to develop a simulator based on a diffusion model and study how different network configurations behave. For our experiments, we use models from graph theory and a main epidemiological model, SIR.
Our tool for the project is R (see Appendix A), in which we implement the code that supports our experiments. Finally, the major challenges in this thesis are to analyze this epidemiological model with respect to the properties of the networks (topology, connections, etc.), to apply it to multiple networks, and to focus on the speed at which the threat spreads through those networks. The articles discussed in the Related Work section provided the incentive to start this research on multiple networks and the spreading of a virus in them.

Related Work

This section discusses what we have seen in the research papers. Starting with the first paper [6], the authors focus on a class of dynamic processes, and the main model they use is Watts' threshold model. They study cascade processes in multiplex networks, and their work extends earlier results on single networks with arbitrary degree distribution to multiple networks. In their proposed model, they use a condition for adoption that is capable of capturing the relative effect of content in the spread of influence. In addition, it highlights how different contents may have different spreading characteristics over the same network. Regarding the analytical results, they start by deriving the condition and probability of global spreading events in an overlay social-physical network H, compute the probability, and then use a mean-field approach to solve for the survival probability of a branching process. Interestingly, their results show agreement between the analytical results and the simulations, confirming the validity of their analysis. Moreover, they show how content might impact the dynamics of complex contagions over the same network; to better demonstrate the effect of content on the probability and size of global cascades, they consider a different experimental set-up with three different cases, in which they fix all parameters except the content parameter c. The formulation presented raises many new questions in the field. Following the same reasoning, Brummitt, Lee, & Goh [9] study the impact of multiplex networks on cascade dynamics in the same threshold model. Here, however, they generalize the model so that nodes activate if a sufficiently large fraction of their neighbors in any layer is active. They conclude that the interplay among multiple kinds of interactions, the multiplexity, can generically increase a network's vulnerability to global cascades in a threshold model, and they expect the impact of multiplexity on network dynamics to be widespread. Dickison, Havlin, & Stanley [11] worked with two interconnected networks, defined two different interconnected network regimes, strongly and weakly coupled, and found the interaction strength value separating these two regimes. For their model, they considered the case of only two interconnected networks of equal size, but the model can easily be extended to an arbitrary number of networks of any size. This method generates random, uncorrelated, interconnected network systems with specified inter- and intra-network degree distributions. The model they use is the SIR epidemic model, the one that we use in our work, to study the effects of interconnected network structure on the epidemic threshold, and Monte Carlo simulations are performed to verify the results.
In conclusion, in strongly coupled network systems epidemics always occur across the entire interacting network system, with the presence of interconnections enhancing epidemic spreading. In weakly coupled network systems, a mixed phase exists where epidemics do not always occur across the full interconnected network system, and interconnections affect epidemic spreading only across the less interconnected network. In the next paper we read [12], the researchers simplify matters by omitting all aspects of cascading and treat percolation on interdependent networks as an epidemic spreading process, in complete analogy to ordinary percolation. The paper deals only with locally tree-like random networks, for which mean-field theory based on generating functions becomes exact in the large-system limit. In summary, they consider percolation on various interdependent or dependency networks, pointing out the close analogy to epidemic spreading on single networks without these dependencies. At the same level, another paper [8] presents a bond percolation formalism for multiple networks with an arbitrary joint degree distribution, where nodes have explicit properties associated with the type they belong to. They introduce the multiple networks and define several quantities of interest. The formalism is then developed to obtain the occupied degree and excess degree distributions, the small component sizes, the percolation threshold, and the giant and (average) small component sizes. They also show that their formalism corresponds to a generalization of existing approaches, describing the heterogeneous bond percolation of multiple networks. What intrigued us in the following paper [13] is that the authors develop a simple, dynamical model of load shedding on sparsely interconnected networks. They study Bak–Tang–Wiesenfeld sandpile dynamics on networks derived from real, interdependent power grids and on sparsely coupled, random regular graphs that approximate the real topologies. They use a multiple branching process approximation and simulations to derive, at a heuristic level, how interdependence affects cascades of load. The techniques developed there advance the theoretical machinery for dynamical processes on multiple networks, as well as the heuristic understanding of how interdependence and incentives affect large cascades of load in infrastructure. The model used is the well-studied sandpile model. Sandpile models have been studied on isolated networks, including Erdős–Rényi
graphs, scale-free graphs, and graphs generated by the Watts–Strogatz model on one- and two-dimensional lattices. The authors combine a mathematical framework for multiple networks with models of sandpiles on isolated networks to derive a multiple branching process approximation of cascades of load between simple interacting networks and between real power grids. They then show that some interdependence is beneficial to a network, since it mitigates its largest avalanches by diverting load to neighboring networks. This work also advances the mathematical understanding of dynamical processes on multiple networks. Their expectation is that the computational techniques used there to solve multidimensional generating function equations, such as multidimensional Lagrange inversion, will find other uses in percolation and cascades in multiple networks. Moreover, cascades in social networks (or in an epidemiological model, as in our work) may require networks with triangles or other subgraphs added; inverting the resulting multidimensional generating function equations for dynamics on such networks would require similar multiple-network techniques to those developed there. Although a large number of researchers have focused on finding new techniques to detect and eliminate threats, great importance is also given to the development of theoretical models capable of predicting the extent of the spread of threats to vulnerable networks. In 2000, Wang, Knight, and Elder proposed and analyzed a virus propagation model for a class of tree-shaped networks [15]. According to this model, the viruses produce copies of themselves and spread through the network at a fixed pace, without any mediation by users being necessary. Other researchers, such as Mannan and van Oorschot [16], studied particular viruses that spread through the users of a communication network and, by summarizing the main characteristics of these networks, drew conclusions about the spread of threats in them.

Background

In this part of the thesis we introduce the background material that provides the tools for the rest of the work. There are introductions to:

o modeling & complex systems
o networks
o graph models and
o epidemiological modeling.

Introduction on modeling & complex systems

From the beginning and throughout the development of science, the problem of modeling different phenomena appeared very early. Modeling a phenomenon, or better, developing a model that simulates its structural and functional characteristics, helps us to understand the phenomenon better, which may well be an issue requiring further thorough study. Through the modeling of a phenomenon, researchers have the opportunity to represent its dynamics with mathematical relationships and equations, and thus to study the phenomenon in a more profound and detailed way. These mathematical models employ equations and algorithms, allowing a more in-depth study of how the preset parameters affect the dynamics of the system's behavior during the study. It is obvious that the whole practice of modeling has changed radically during recent decades with the introduction of computers into all sciences, such as mathematics, physics and biology.
The mass deployment and use of computer systems has enabled the study of complex systems whose mathematical description requires non-linear equations that cannot be solved analytically. In this sense the contribution of computers is immense, as they allow these equations to be solved numerically using methods developed in separate branches of computational mathematics and numerical analysis. The use of computers thus gave a new direction and perspective to the design and analysis of mathematical models. Their contribution is even more important because it enables us to develop computational models of various phenomena. Researchers have to deal with common problems and limitations regardless of the type or nature of the system they study and model. The main problem in developing a model is related to the number and nature of the parameters the researcher chooses to include, as well as the weight given to each parameter. This question is fundamental because it determines the scope and realism of the resulting model. It is clear that an overly simplistic model of a complex system risks leaving out some basic system parameters, so that the model does not exhibit the desired realistic behavior. On the other hand, a fairly complex model that includes a large number of parameters may be more realistic, but increasing the complexity of the model and the number of parameters that characterize it makes it harder for other scientists to understand. This can increase the likelihood of errors and, finally, consumes much more computing power when studying the model; in many cases, due to the high degree of complexity, specialized supercomputers are required. "Complex systems" are the subject of a variety of disciplines and professional methods. The equations from which sophisticated models are developed usually come from mathematics, physics and information theory, and represent organized but unpredictable behaviors of physical systems that are considered essentially complex. Physical manifestations of these systems cannot be described without reference to the physical object being represented, which is then included in "the system" as a mathematical model. The variety of models that follow this approach to complexity is known as "complex systems". These systems are used for modeling processes in computer science, biology, economics, physics and many other fields. This science is also called the "Theory of Complex Systems" or the "Science of Complexity". A variety of abstract theoretical models of complex systems is studied as a field of mathematics. The main problem with complex systems is the difficulty of modeling and simulating them. From this perspective, complex systems are defined in various research frameworks based on different properties. Since all complex systems have many interrelated components, the science and theory of networks are important aspects of their study. The question that arises is what complex systems are and what properties characterize them. Some examples of such systems are helpful; their purpose is to develop an initial understanding of what makes these systems complex. Once we begin to describe complex systems, a second step is to identify their common elements. We could list some of the characteristics of complex systems and assign each of them a feature, creating a first method of classification or description:

o Identification
o Interaction
o Formation / function
o Diversity
o Environment
o Activities

Complex systems have the following characteristics:

o Their limits are difficult to define, because it can be hard to decide where the boundaries of a complex system lie. This decision ultimately depends on the observer.
o Complex systems are usually open systems. In other words, they are often far from energy equilibrium, but this flow can follow a fixed pattern.
o Complex systems can have memory. Because they are dynamic systems that change over time, previous states can affect the current state; more specifically, complex systems often exhibit hysteresis.
o Complex systems can be nested: the elements of a complex system may themselves be complex systems.
o Dynamic network multiplicity: besides the coupling rules, the dynamic network of a complex system is important. Small-world or scale-free networks with many local interactions are often used, and natural complex systems often exhibit such topologies.
o They can produce emergent phenomena: complex systems may exhibit emergent behaviors, meaning that, while the results may be deterministic, they can have properties that can only be studied at a higher level.
o Relationships are nonlinear. In practice this means that a small disturbance can cause a large effect (as in the butterfly effect), an effect proportional to the cause, or even no effect at all. In linear systems, the effect is always directly proportional to the cause.
o Relationships contain feedback loops. Both positive and negative feedback are common in complex systems. [1][2]

Introduction on networks

The transmission of information through computer systems and the Internet has in recent years become a very important and critical subject of research and study. As increasingly complex pervasive computing systems are developed and large unstructured wireless mobile networks are created, interest has shifted to research on the easiest and simplest way of transmitting information through information systems.

Complex Networks

In recent years there has been great interest in complex networks. A complex network is a network with non-trivial topological features. Most networks that occur in nature can be classified as such, as they exhibit characteristics that do not appear in simple networks.
Examples of such networks are the web, traffic networks and epidemiological networks. Several models have been proposed for these networks that describe the overall macroscopic properties of real networks. One aspect of complexity is associated with the structure of the system. The most important mathematical technique for representing the structural relationships between the elements of natural and social systems is graph theory. Graph theory offers a natural way to represent systems as individual nodes with connections between them, and it has proved very successful for the analysis of structures in many domains. Graphs and the applications of this theory to natural systems in various sectors and industries gave birth to the "theory of complex networks".

Figure 1: An example of a single Erdős–Rényi random network with N=100

Complex structures can describe a wide variety of systems of high technological and intellectual importance. For example, the Internet is a huge virtual network of websites linked by hyperlinks. These systems are just some of the many examples that recently led the scientific community to explore the mechanisms that determine the topology of complex networks.

Single Networks

A network is a collection of elements, called vertices or nodes, with links between them called edges or links. Systems in the form of networks abound in the world. Tools of complex network analysis have been applied to a variety of systems in many areas, spanning a wide range of applications. Examples are the Web, the Internet, financial networks, epidemiological networks, sexual contact networks, social networks and many others. These tools offer a new approach to the treatment of nature and society and can go beyond the usual statistical analysis, revealing qualities that go unnoticed by conventional means. Nowadays we have witnessed major achievements in network research, with the emphasis shifting from the analysis of small, simple graphs and the properties of individual vertices or edges to the examination of the statistical properties of large-scale networks. This new approach has been driven largely by the availability of computers and communication networks that allow us to collect and analyze data on a scale far greater than was previously possible. Studies used to examine networks with perhaps dozens or hundreds of nodes, but now it is not uncommon to study networks with millions or even billions of nodes. This change of scale brought about a corresponding change in our analytical approach. One reason why the approach to the study of networks has changed recently, although it is usually underestimated, is the following. For networks with tens or hundreds of nodes it is relatively simple to draw the network as an actual image of points and lines and to answer specific questions about its structure by inspecting this image. This was one of the primary methods of network analysis. The human eye is an analytical tool of remarkable power, and studying a drawn network by eye is an excellent way to understand its structure. With a network of one million or one billion vertices, however, this approach is useless. Nobody can make sense of a picture with a million vertices, even with modern 3D displays, so direct analysis by eye brings no gain. The recent development of statistical methods for quantifying networks is largely an attempt to find something that plays the role that the eye played in the network analysis of the twentieth century. These methods can thus help to answer the question "How can I describe something that looks like a network when I really cannot see it?"

Formally, a graph is an ordered pair G = (V, E) consisting of a set V of vertices or nodes and a set E of edges or connections, which are two-element subsets of V (i.e., an edge connects two vertices, and the relationship is represented as an unordered pair of vertices). To avoid ambiguity, this type of graph is described as undirected and simple. A set of vertices joined by edges is the simplest form of network, and there are many ways in which networks can be more complex than that. For example, there may be more than one kind of vertex in a network, or more than one type of edge, and vertices or edges may have various properties, numerical or otherwise, associated with them. The connections of a network can be directed or undirected. The maximum number of connections that can exist in a network depends on the number of nodes. If a network has N nodes, each node can have at most N − 1 links, one to every other node except itself. Therefore, there are N(N − 1) possible connections, counting each one twice. Consequently, if the network is undirected the maximum number of connections is N(N − 1)/2, while if it is directed it is N(N − 1), since in a directed graph two nodes may be associated with a pair of links. The number of links of each node is called its degree k. The degree is not necessarily equal to the number of neighboring nodes, since there may be more than one edge between two vertices. The degree k is sometimes referred to as the connectivity of a node, although it is best to avoid this usage, because the word connectivity has another meaning in graph theory. A directed graph has both an in-degree and an out-degree for each vertex, which are the numbers of incoming and outgoing edges respectively. The links of the network may also have weights. The weight wij of a link ij is a quantity that can be used, among other things, to represent the importance of the link ij in the network. There are two ways to represent a graph G = (V, E): as a collection of neighborhood lists, the adjacency lists, or as an adjacency matrix. A graph G that consists of n nodes and m links may be represented by an adjacency matrix A(n, n). This is an n×n matrix whose element aij is 0 if nodes i and j are not connected and w otherwise; if the network has no weights on its connections, then w = 1. For an undirected graph the matrix is symmetric, aij = aji; for a directed graph the elements aij are in general different from the elements aji. The diagonal elements aii represent links from a node to itself, whenever such a connection is allowed and has meaning. In graph theory, an edge list is a representation of all the edges of a graph in a list. If the graph is undirected, each record is a set or multiset of two nodes containing the two ends of the corresponding edge; if it is directed, each record is a tuple of nodes, one being the node where the connection starts and the other the node where it ends. In computer science, an adjacency list is a data structure for representing graphs: for each node of the graph it holds a list of all other nodes to which that node is connected. An adjacency list representation is usually preferred because it provides a compact way to represent sparse graphs, i.e. those for which |E| is much smaller than |V|². The adjacency matrix, however, may be preferred when the graph is dense, i.e. |E| is close to |V|², or when we need to be able to tell quickly whether an edge connects two specific vertices.

Multiplex-Multilayer Networks

Networks such as epidemiological networks or social networks belong to the class of networks that we call multiplex. Multiplex networks are networks organized in layers, with connections between layers; the interconnections between layers exist only between a node and its counterpart in the other layer, i.e. the same node. These networks are interesting because they represent the next step in the study of the complexity of networked systems: non-trivial correlations emerge between different layers; earlier studies on percolation in interdependent networks show that coupling networks can produce unexpected emergent behaviors; the understanding of dynamical processes can reveal new characteristics due to the multiplex structure; and the theory of multiplex networks must cover the existing theory developed for multilayer networks. [5]

Figure 2: Example of a two-layer multiple Barabási–Albert random network with N=100

Diffusion in Networks

Diffusion is a process by which behaviors such as epidemics, information, viruses, or even gossip spread over networks, in particular over epidemic networks. Diffusion in epidemics and in computer systems, whether in operation or in design, is perhaps the most critical factor for their proper management and utilization. This is underlined both by the huge sums invested in this direction and by the significance these networks have for daily human activity. Thus, various types of algorithms have been proposed, developed and studied for diffusion.
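To make the multiplex structure sketched in Figure 2 more concrete, the short R sketch below shows one way a two-layer multiplex could be assembled with the "igraph" package (see Appendix): both layers share the same node set, a node is identified with its counterpart in the other layer simply by reusing the same vertex index, and the flattened view of the multiplex is obtained by merging the edge lists of the layers. The graph sizes, the choice of two Barabási–Albert layers and the "layer" attribute name are illustrative assumptions; older igraph versions expose the same functionality under names such as barabasi.game and get.edgelist.

```r
library(igraph)

n <- 100                               # nodes shared by both layers (example value)

# Layer 1 and layer 2 as two independent Barabási–Albert graphs over the same
# node identities (vertex i in layer 1 is the counterpart of vertex i in layer 2).
layer1 <- sample_pa(n, directed = FALSE)
layer2 <- sample_pa(n, directed = FALSE)

# Flattened view of the multiplex: one graph over the shared node set that
# contains the edges of both layers, tagged with the layer they came from.
multiplex <- make_empty_graph(n, directed = FALSE)
multiplex <- add_edges(multiplex, c(t(as_edgelist(layer1))), layer = 1)
multiplex <- add_edges(multiplex, c(t(as_edgelist(layer2))), layer = 2)

table(E(multiplex)$layer)              # number of edges contributed by each layer
```

Keeping a "layer" attribute on the merged edges is one simple design choice: a diffusion process can then either ignore it (treating the multiplex as a single flattened network, as in the experiments later in this thesis) or treat edges from different layers differently.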
Networks and Epidemiology

Networks and the epidemiology of directly transmissible infectious diseases are fundamentally connected. The foundations of epidemiology and the early epidemic models were based on widely and randomly mixed populations, but in practice every individual has a certain number of contacts through which contamination may occur. The sum of all these contacts forms a mixed network. Knowledge of the network structure allows models to compute the dynamics of the epidemic at the population level from the behavior of individual infections. Various forms of networks have been developed using computers, and each of these idealized networks can be identified by how the individuals are distributed in space and how they communicate. As a consequence, by simplifying and clarifying the various complex processes involved, we make it possible to create a network for real populations. [5] In general, epidemiology tries to explain what happens to humankind, especially with respect to the diseases that appear. In particular, it describes a scientific methodology in biology for studying the nature, prevalence and causes of the factors that cause a disease. This science provides a method for understanding and responding to a disease as it spreads in a population. Epidemiology uses mathematical models to quantify, characterize and predict the spread and impact of disease. A demographic analysis is usually applied to determine the relationship between the disease and the general public. The role of those involved in epidemiology is to destroy or damage this relationship in order to prevent contamination of the population. The main objective of epidemiology is to prevent the spread of disease and to prevent possible future recurrences. [6]

Introduction to graph models

Modeling networks, such as epidemiological networks, helps us understand them better; it enables their simulation and analysis, as well as the forecasting of the behavior of processes taking place in them, such as the diffusion and recovery of information. In graph theory there are different models of networks, called graph models, some of which are the following: the Erdős–Rényi model,
the Watts–Strogatz model, the Barabási–Albert model, etc. In this work, two of the aforementioned models are employed. Specifically, we use the Erdős–Rényi model and the Barabási–Albert model. Below, a short description of each model is provided.

Erdős–Rényi model

The observation that networks with complex topology and unknown organizational principles often appear random promoted the study of graphs with random distribution. Random graphs therefore became a well-developed area and a rich source of ideas. The formal theory of random graphs was introduced by Paul Erdős and Alfréd Rényi, after Erdős found that probabilistic methods were often useful for attacking problems in graph theory, and it is used regularly in the study of complex networks. Erdős and Rényi proposed one of the first network models, the random graph. This model is characterized by two parameters: the number of edges and the connection probability. Each pair of vertices is connected with equal probability, independently of the other pairs. The generation of random graphs based on Erdős and Rényi, a simple model, proved very useful for early modeling. The degree distribution of this model follows a Poisson distribution, whereas almost all real networks have power-law degree distributions. To overcome this problem, the original model has been extended to harmonize with the true nature of networks. The resulting model works as follows: given the number of nodes in the system and an exponent, each node is initially assigned a degree drawn from a power-law distribution with that exponent; then two nodes are repeatedly selected at random and an edge is inserted between them. This way of building the graph can lead to disconnected graphs; it turns out, however, that for large N there is always a giant component. In their first article on random graphs, Erdős and Rényi define a random graph as N nodes connected by n edges, chosen randomly from the N(N−1)/2 possible edges. In total there are C(N(N−1)/2, n) graphs with N nodes and n edges, forming a probability space in which every realization is equally likely. The theory of random graphs studies the properties of this probability space as N → ∞. Several properties of these random graphs can be determined using probabilistic arguments. From this standpoint, Erdős and Rényi used the definition that almost every graph has a property Q if the probability of having Q approaches 1 as N → ∞. The construction of a random graph is often called evolution: starting with a set of N isolated vertices, the graph develops by the successive addition of random edges. The graphs obtained at different stages of this process correspond to ever larger connection probabilities p, until eventually a fully connected graph is obtained. The main goal of random graph theory is to determine at what connection probability p a particular property of the graph becomes very likely to arise. The greatest discovery of Erdős and Rényi was that many important properties of random graphs appear suddenly. That is, at a given probability either almost every graph has some property Q (e.g. every pair of nodes is connected by a path of successive edges) or, on the contrary, almost no graph has it. The transition from a property being very unlikely to being very likely is usually swift. Since the pioneering article by Erdős and Rényi, much work has focused on the existence and uniqueness of the minimum and maximum degree in a random graph. The results show that for a large range of p both the maximum and the minimum degree are determined and finite, and the maximum degree of almost all random graphs has the same order of magnitude as the average degree. Thus, despite the fact that the position of the edges is random, a typical random graph is rather homogeneous, with the majority of nodes having about the same number of edges. [3]

Figure 3: A single Erdős–Rényi random network with N=100

Barabási–Albert model

The origin of the power-law degree distribution observed in networks was first addressed by Barabási and Albert, who argued that the scale-free nature of real networks is rooted in two general mechanisms shared by many real networks, namely growth and preferential attachment. These two mechanisms inspired the introduction of the Barabási–Albert model, which was the first to lead to a network with a power-law degree distribution. Their research was based on the fact that the network models developed until then failed to take into account two main characteristics of most real networks:

o Real networks are open, dynamic structures, with new nodes continuously added to the network, in contrast to the previously existing models, which were static, i.e. the total number of nodes was fixed.
o In both the random graph and the small-world models, connections are formed with the same probability, which does not reflect reality.

Growth and the selection of which existing nodes new nodes connect to are the key components for creating scale-free networks. These two components reflect the fact that most networks grow continuously through the addition of new nodes, and that new nodes preferentially form links to existing nodes that already have a large number of connections. With this approach, networks are described as open systems that develop through the continuous addition of new nodes, starting from a few nodes and increasing their number during the lifetime of the network. The Barabási–Albert model introduces the rule of preferential attachment, so that the probability of connecting to a node depends on the degree of that node. Thus, a new node is more likely to be connected to a node with a high degree than to a node with a low degree. The algorithm of the Barabási–Albert model is as follows:

o Growth: starting from a small number m0 of nodes, at each time step we add a new node with m ≤ m0 edges connecting it to m
different nodes that already exist in the system.
o Preferential attachment: when selecting the nodes to which the new node connects, it is assumed that the probability P that the new node will be connected to node i depends on the degree ki of node i, such that:

P(ki) = ki / Σj kj

After t time steps this results in a network with N = t + m0 nodes and mt edges. Numerical simulations have shown that this network evolves into a scale-invariant state in which the probability that a node has k edges follows a power law with exponent g = 3. The scaling exponent is independent of m, the only parameter of the model. [4]

Figure 4: A single Barabási–Albert random network with N=100

Epidemiological Modeling

The SIR model, also known as the classic general epidemiological model, adds an additional state called removed. The removed state represents hosts that are in one of the following situations:

o they have recovered from the infection and can no longer be re-infected,
o they have been quarantined and withdrawn from circulation, or
o they have died from the infection.

In this model there are therefore three possible states, Susceptible, Infectious and Removed, and two allowed transitions between them. The model describes the spread of a virus through a set of three differential equations. With

S: the number of susceptible individuals,
I: the number of infected individuals,
R: the number of recovered (removed) individuals,
β: the rate at which a susceptible individual with a single infected neighbor becomes infected, and
γ: the rate of recovery of an infected individual,

the equations take the standard form

dS/dt = −βSI
dI/dt = βSI − γI
dR/dt = γI

The total initial population that is vulnerable is finite (N) and remains constant, so at every moment in time

N = S(t) + I(t) + R(t).

Introducing the relative removal rate, ρ = γ/β, the equation for I in the system above can be rewritten as

dI/dt = βI(S − ρ).

Because the population is finite and each host can be infected only once, the epidemic will eventually stop; when this happens, all hosts in the population will either still be vulnerable to infection or have been removed. Looking at the last equation, one can observe an interesting property of the SIR model. Since I(t) > 0 and β ≥ 0, we have

dI/dt > 0 only if S > ρ.

This means that there will be no epidemic unless the initial number of susceptibles exceeds the critical value ρ. This discovery came from the study of infection transmission rates in networks and essentially led researchers to identify a threshold that marks the dividing line between a disease and an epidemic. [7]

Approaches – Analysis

The analysis of an epidemiological model on multiple networks is not easy. To obtain the results and conclusions that we need, this work starts with short experiments on single networks and then applies the same experiments to multiple networks. The purpose of the simulation is to investigate how an infection spreads in various forms of multiple networks until it occupies the majority of the nodes, in which case it is called an epidemic. [6][9] As mentioned above, the experiments are performed in the programming language R. The code written in this language simulates the epidemiological model SIR. In this effort we focus on three basic points [10]:

1. the different topologies (architectures) of complex multiple networks,
2. the number of nodes N, and
3. the probability of the spreading of the virus, p.

R contains packages that are useful for conducting our experiments. In our code we use the package "igraph" (see Appendix), with which we begin our tests on a single network; for the multiple networks, we created code that implements the SIR model on multiple networks. Our code follows a simulation process whose steps are listed after the sketch below.
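To make the procedure more concrete, the following R sketch shows one minimal way such a simulation could be organized with igraph: two random layers are generated and merged through their edge lists, a fraction of the population is initially infected, and the node states are updated step by step before the S, I and R curves are plotted. This is an illustrative discrete-time approximation under the stated parameter values, not the exact continuous-time Markov chain code used for the experiments; the variable names (layer1, state, history, etc.) and the choice of two ER layers are assumptions made for the example.

```r
library(igraph)

set.seed(1)                      # illustrative seed for reproducibility

n      <- 100                    # nodes per layer (example value)
beta   <- 0.15                   # infection rate per infected neighbour
gamma  <- 0.15                   # recovery rate (roughly 1/7 per day)
p_init <- 0.07                   # initial infection probability

## Two ER layers over the same node set, merged through their edge lists
layer1 <- sample_gnp(n, 1/20)
layer2 <- sample_gnp(n, 1/20)
g <- make_empty_graph(n, directed = FALSE)
g <- add_edges(g, c(t(as_edgelist(layer1))))
g <- add_edges(g, c(t(as_edgelist(layer2))))

## States: 0 = susceptible, 1 = infected, 2 = recovered
state   <- ifelse(runif(n) < p_init, 1, 0)
t_max   <- 50
history <- data.frame(t = 0, S = sum(state == 0),
                      I = sum(state == 1), R = sum(state == 2))

for (step in 1:t_max) {
  new_state <- state
  for (v in 1:n) {
    if (state[v] == 0) {
      ## count infected neighbours; infection with prob. 1 - (1 - beta)^k
      k <- sum(state[as.numeric(neighbors(g, v))] == 1)
      if (k > 0 && runif(1) < 1 - (1 - beta)^k) new_state[v] <- 1
    } else if (state[v] == 1 && runif(1) < gamma) {
      new_state[v] <- 2          # recovery
    }
  }
  state   <- new_state
  history <- rbind(history,
                   data.frame(t = step, S = sum(state == 0),
                              I = sum(state == 1), R = sum(state == 2)))
}

## SIR plot: susceptible (red), infected (green), recovered (blue)
matplot(history$t, history[, c("S", "I", "R")], type = "l", lty = 1,
        col = c("red", "green", "blue"), xlab = "time", ylab = "individuals")
legend("right", legend = c("S", "I", "R"),
       col = c("red", "green", "blue"), lty = 1)
```

The same skeleton extends to three or four layers by generating additional layers and appending their edge lists to g, which mirrors the "make two (three or four) graphs" and "get the edge lists" steps in the list that follows.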
We construct a stochastic process based on a continuous-time Markov chain (see Appendix):

o setting the parameters beta & gamma
o declaring them global & setting the time step
o randomly selecting a percentage of infected people (infection probability)
o setting the size of the population
o updating the current state & storing it in the accumulator
o making two (three or four) graphs & plotting them
o getting the edge lists from the networks
o creating the new graph which contains all the graphs
o adding the edges to the new graph and plotting it
o applying the main equation N = S + I + R
o setting the time span of the simulation
o starting the simulation
o defining the data frame
o setting the probability function that calculates the transitions based on state, adjacency and time
    o process
    o counting the number of infected neighbors
    o state indexes are offset by 1
o running the simulation
o updating time
o writing the output
o plotting the infection function (SIR plot)

Single Networks

Before analyzing SIR on multiple networks, we need to show how an SIR model behaves on a single network. For this purpose, we begin our experiments with simple Erdős–Rényi and Barabási–Albert networks. [][] Many experiments with several different parameters have been performed, but only the most important ones are presented. For brevity, we refer to the Erdős–Rényi and Barabási–Albert random networks as ER and BA networks. First, we explain what we measure in our numerical solutions, we assign values to certain parameters, and then we plot the SIR diagram and explain the results of our experiment. Finally, we discuss the results. The table below gives the values that we use:

Functions                     Values
Beta (β)                      0.15, 0.25
Gamma (γ)                     0.15
Nodes (N)                     100, 1000
Infection probability (p)     0.02, 0.07, 0.15, 0.25

Table 1: Values per experiment

As we can observe, in our experiments we vary the probability with which the virus randomly infects our population. We use two different numbers of nodes, and likewise two values of beta. What we keep fixed is the value of gamma, because this value follows from the number of days we want the virus to need for a full recovery. Since viruses need about 5 to 10 days for a complete cure, we took the average number of days (7 days), and the gamma value results from γ = 1/7 ≈ 0.14, which we round to 0.15. Again, for brevity we refer to the parameters nodes, beta, gamma and infection probability as N, β, γ and p respectively. In our experiments, the results are summarized in SIR plots. In such a plot, the red line refers to the susceptible people, the green one to the infected and the blue one to the recovered.

Figure: An example of an SIR plot

Erdős–Rényi Experiment

We start our experiments with a network of N=100 nodes. In Figure 5 below, we show the random network that we have created.

Figure 5: ER Random Network with N=100

We set the parameters, and in the next figure we have four different plots, one for each infection probability (0.02, 0.07, 0.15 & 0.25), with beta equal to 0.15.

Figure 6: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

We have done the same experiment with the same infection probabilities (0.02, 0.07, 0.15 & 0.25), but now with beta equal to 0.25.
Figure 7: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

As we can see, when we increase the infection probability and beta, the virus spreads faster in each simulation. This is something we expected, since if we increase the proportion of initially infected people and also increase the average infection rate, the virus will spread much faster than normal. In addition, we observe a difference in the infection peaks of the simulations: as we increase the infection probability, the peak gets higher and people recover faster. Moreover, the plots change for different values of beta due to the difference in the infection rate. We continue our experiments with N=1000, keeping the same values for the other parameters. In Figure 8, we show the random network that we created.

Figure 8: ER Random Network with N=1000

Below are the SIR plots for each simulation with β=0.15. The same experiments were done with β equal to 0.25.

Figure 9: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 10: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

With a larger population, the spread of the virus is slower than in a small sample. But in this case, as in the previous simulations for N=100, diffusion becomes larger as we increase the probability and beta. Moreover, we can see a difference in the rate of infection: for each simulation, the more we increase the infection probability, the higher the peak of infectious people. As for the recovered, we observe a steady recovery in each of the SIR plots. For the different values of beta we cannot see a big difference in our plots.

Barabási–Albert Experiment

As mentioned above, the experiments are divided between two different types of networks. Here, we set up a BA random network with N=100. In Figure 11, we can see the random network that we created.

Figure 11: BA Random Network with N=100

As in the previous section, we set the parameters, and in the next figure we have four different plots, one for each infection probability (0.02, 0.07, 0.15 & 0.25), with beta equal to 0.15.

Figure 12: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

We do the same for beta equal to 0.25, keeping the same values for the infection probabilities.

Figure 13: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

As in the single ER random network, when we increase the infection probability and beta, diffusion gets larger in each simulation, which is again what we expected, just as in the previous section's experiments. In our experiments with BA random networks, we observe a clear difference in the SIR plots of each simulation compared to ER, because of the difference in topology between a BA and an ER network. It is noticeable that around time step 10 the number of susceptible and recovered people becomes stable. This happens because after time step 10 some people may be dead, so they can no longer spread the virus. In addition, we see changes in the plots for different values of beta, caused by the change in the rate of infection. Next, our experiments are done with the same type of network but with N=1000, and the same values for the other parameters. In Figure 14, we show the random network.
Figure 14: BA Random Network with N=1000

The next two figures (Figure 15 & Figure 16) show the SIR plots for β equal to 0.15 and 0.25 respectively.

Figure 15: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 16: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

As in the previous simulations, with a larger population diffusion is smaller than in a smaller one. And again, as we increase the probability (p) and beta (β), the virus spreads faster. In addition, we cannot observe big differences between the networks with 100 and 1000 nodes, so our SIR plots look the same for each experiment.

Multiple Networks

After conducting experiments with single random networks, we continue experimenting on multiple networks, so that we can show how differently an SIR model behaves in a multiple one. To begin with, we conduct experiments on multiple ER and BA networks with two, three and four layers of the same kind of random network, and in addition we conduct an experiment with a two-layer network that combines two different random networks. We have to mention that for the three- and four-layer multiple networks we only performed experiments with N=100, because with more nodes we could not obtain good enough results, so only the most important ones are presented. In addition, for all our experiments we use the value 1/20 for the probability of drawing an edge between two arbitrary vertices in the ER graphs, and for BA we created undirected graphs. First we explain what we measure in our numerical solutions, we assign values to certain parameters, and then we plot the SIR diagram and explain the results of our experiment, presenting our results at the end.

Functions                     Values
Beta (β)                      0.15, 0.25
Gamma (γ)                     0.15
Nodes (N) (per layer)         100, 1000 (2-layer)
                              100 (3-layer & 4-layer)
                              150-200 & 500-1000 (combined 2-layer)
Infection probability (p)     0.02, 0.07, 0.15, 0.25

Table 2: Values per experiment

Two-Layers Experiment

In our first experiment on multilayer networks, we have two layers of ER networks, in which we spread the virus randomly with infection probability p. In Figure 17, we can observe the two-layer ER random network with 100 nodes.

Figure 17: Two-layer ER Random Network with N=100

As in the single-network experiments, we perform the same simulation for the multiple one, using the parameter values given in Table 2.

Figure 18: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 19: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

Figures 18 & 19 show a different result in a multilayer network than in the single one. As we can observe, the virus spreads more quickly, and, as in the single networks, when we increase p and β the population gets sick faster. We can notice in our SIR plots that the highest peak of infected people occurs when the infection probability is equal to 0.25. Also, there is a steady increase of the recovered population in each simulation. In addition, we do not see significant changes when we use a different value for beta. We continue the experiment with a bigger population (N=1000), with the same type of random network and the same parameters.
Figure 18: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 19: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

Figures 18 & 19 show a different result for the multilayer network than for the single one: the virus spreads more quickly. As in the single networks, increasing p and β makes the population become sick faster. The highest peak of infected people occurs when the infection probability equals 0.25, and the recovered population grows steadily in each simulation. We do not see significant changes for the different values of beta.

We continue the experiment with a larger population (N=1000), with the same type of random network and the same parameters.

Figure 20: ER Random Network with N=1000

Figure 21: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 22: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

In these simulations we again observe that diffusion is smaller with a larger population than with a smaller one, and that increasing the infection probability and beta makes the virus spread faster. We have not seen large differences between the networks with 100 and 1000 nodes; the SIR plots are almost the same for each experiment with a different number of nodes. The highest peak of infected people occurs for infection probability 0.25, and the recovery of the population becomes more pronounced as we increase the probability. For the change of beta, we see only small differences in each simulation.

In the next experiments, we want to see whether the same happens with a two-layer BA random network. Below is a figure of the random network, followed by the SIR plots of the virus spreading.

Figure 23: BA two-layer Random Network with N=100

Figure 24: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 25: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

The BA multilayer network gives results similar to the ER one. Figure 24 and Figure 25 show how much faster the spreading is in the two-layer network than in the single one seen in Figures 12 & 13. The peak of infected people grows as we increase the infection probability, and the recovered population grows steadily in each plot. Differences appear when we change the beta value, since it is the contact rate at which the virus is spread.

We repeat the same experiment for a population of 1000 nodes. Figure 26 shows the random network and Figures 27 and 28 show the SIR plots of the simulation.

Figure 26: Two-layer BA Random Network with N=1000

Figure 27: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 28: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

Once again, the SIR plots confirm that diffusion is larger in a multiple network than in a single one. The peak of infected people grows as we increase the infection probability, while the recovered population changes only slightly between simulations; for higher infection probabilities the population recovers faster. If we change beta from 0.15 to 0.25, both the infection and the recovery become faster.

Three-Layers Experiment

A multiple network can have more than two layers. As mentioned before, we also run simulations with three and four layers, beginning with the three-layer network. We want to examine whether a network with more layers diffuses a virus better than one with fewer. Below are the figures of the ER network and the SIR plots obtained with the same values as for the two-layer one.

Figure 29: Three-layer ER Random Network with N=100

Figure 30: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 31: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

What we wanted to observe is whether a network with more layers spreads a virus faster than one with fewer layers.
Our plots confirm our expectation that diffusion is more rapid in a multilayer network than in a single one. In the SIR plots for the ER random two-layer networks, the recovery is faster and the peaks of infected people differ as we increase the infection probability. There is only a small difference when we change the beta value.

We ran the same experiment for a BA random network. Figures 32, 33 & 34 show the network and the SIR plots.

Figure 32: Three-layer BA Random Network with N=100

Figure 33: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 34: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

The results are the same as for the two-layer ER network: the virus spreads faster in this network than in a single one, and the spreading becomes more rapid as we increase the parameter values. A first observation from the SIR plots is that with the maximum infection probability the highest peak of infected people occurs earlier than with smaller infection probabilities. The recovered people become healthy at a steady pace.

Four-Layers Experiment

In the next experiment, we use the same parameters and values in four-layer random networks. The following figures (Figures 35-40) show the ER & BA random networks with N=100 together with the SIR plots from the simulations.

Figure 35: Four-layer ER Random Network with N=100

Figure 36: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 37: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

In an ER four-layer random network, the highest peak of infected people occurs sooner than in the previous experiments on single, two-layer and three-layer networks. Moreover, changing the contact rate (beta) makes a difference in the SIR plots: when beta is equal to 0.25, the infection spreads faster through the population.

Figure 38: Four-layer BA Random Network with N=100

Figure 39: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 40: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

The results are the same as for the two-layer and three-layer BA networks: the virus spreads faster in this network than in a single one, and the spreading becomes more rapid as we increase the parameter values. As with the ER random graphs, the highest peak of infected people occurs sooner, and the change of beta again makes a difference in the SIR plots; when beta is equal to 0.25, the infection spreads faster through the population.

Combination of random networks in a multiple one (two-layer)

Finally, our last experiment combines two different random networks in a two-layer network. First we simulate an ER random network with N=150 together with a BA random network with N=200, followed by a simulation with a BA random network with N=150 and an ER random network with N=200.
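In these simulations, the virus is passed from infectious nodes to their neighbours with infection probability p, and infectious nodes recover at rate gamma. A minimal sketch of one possible discrete-time update of this kind, in R with igraph, is shown below; it is an illustration under these assumptions, not the thesis' actual simulation code, and it omits the role of beta and the coupling between layers.

```r
library(igraph)

# One discrete-time SIR update on graph g.
# `state` is a character vector of length vcount(g) with values "S", "I" or "R".
sir_step <- function(g, state, p_infect, gamma) {
  new_state <- state
  for (v in which(state == "I")) {
    nbrs <- as.integer(neighbors(g, v))
    susceptible <- nbrs[state[nbrs] == "S"]
    newly_infected <- susceptible[runif(length(susceptible)) < p_infect]
    new_state[newly_infected] <- "I"
    if (runif(1) < gamma) new_state[v] <- "R"   # recovery of node v
  }
  new_state
}

# Example run: 50 steps on an ER network, starting from a single infected node
g <- erdos.renyi.game(100, p.or.m = 1/20, type = "gnp")
state <- rep("S", vcount(g))
state[1] <- "I"
for (t in 1:50) state <- sir_step(g, state, p_infect = 0.07, gamma = 0.15)
table(state)   # counts of susceptible, infected and recovered nodes
```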
In the following figures, we demonstrate the combined multilayer networks and the SIR plots that emerged from the simulations.

Figure 41: ER Random Network with N=150 and BA Random Network with N=200

Figure 42: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 43: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

The SIR plots above differ when we change the beta value, which is explained by the increased contact rate.

Figure 44: BA Random Network with N=150 and ER Random Network with N=200

Figure 45: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 46: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

In the figures above, two different types of random networks with different numbers of nodes are combined. We observe small differences, as we did in the multiple networks built from the same type of random network. Moreover, increasing beta makes the spreading slightly faster.

We continue the experiments with an ER random network with N=500 and a BA random network with N=1000, followed by a simulation with a BA random network with N=500 and an ER random network with N=1000. In the following figures, we demonstrate the combined multilayer networks and the SIR plots that emerged from the simulations.

Figure 47: ER Random Network with N=500 and BA Random Network with N=1000

Figure 48: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 49: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

As in the previous experiments, we see some differences in the infection peak when we change beta, and the recovery of the population happens at a steady pace.

Figure 50: BA Random Network with N=500 and ER Random Network with N=1000

Figure 51: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.15

Figure 52: SIR plots with infection rates 0.02, 0.07, 0.15, 0.25, gamma=0.15 & beta=0.25

In this last experiment, we swapped the numbers of nodes between the two types of random networks. Because of the different topology of the BA and ER models, we observe differences in the figures above: the more we increase the infection probability, the sooner the highest peak of infected people occurs. As in the previous experiments, the recovery happens at a steady pace, but slightly faster as we increase the infection probability and beta. It is interesting that this simulation shows a virus spreading across two different types of networks. The main point is that the nature of each network affects the spread and can make it faster or slower, but, as in our previous examples, the higher the parameter values, the quicker the spread of the virus.

Discussion

In this part of the thesis, we attempt to provide a more in-depth analysis of our results. We have already shown how the results are affected by the changes made in the experiments: we have seen how they differ after changes to the parameters or because of the topology of each network. Below, we discuss these results and how such alterations change them. The discussion is divided into sections on changes in time, infection probability, beta, number of nodes and the topology of Erdős-Rényi and
Barabási-Albert networks.

About time

Time was crucial for our experiments and most of our results depend on it. The following figure combines the experiments on networks with different numbers of layers, showing how the diffusion time changes for ER random networks with 100 nodes and beta equal to 0.15.
[Bar chart: number of time steps (0-25) needed per number of layers (one, two, three, four), with one bar series per infection probability (0.02, 0.07, 0.15, 0.25).]
Figure 47: Time per step for infection probabilities 0.02, 0.07, 0.15, 0.25 & beta=0.25

The results of the first experiment on single networks, and then on two-, three- and four-layer networks, support the hypothesis that the rate of diffusion of the virus increases together with the infection probability. With an infection probability of 0.25, the highest value we used, the virus also spread the fastest. Since time is our measure of how fast a virus spreads, the greatest rate of diffusion is observed in the four-layer multiple network. One explanation is that each node has more connections, so the virus can be transferred more easily. Further experiments that we performed with changed beta can be seen in the figures of the previous section. The same behaviour is observed, correspondingly, in the BA networks.

About beta

In the following figure, we can observe the difference in spreading time that occurs when we change the beta value in BA random networks (single, two-, three- and four-layer ones).
[Bar chart: number of time steps (0-18) needed per number of layers (one, two, three, four) for infection probability 0.07, with beta=0.15 and beta=0.25 as separate series.]
Figure 48: Time per step for infection probabilities 0.02, 0.07, 0.15, and 0.25 with beta=0.15 and beta=0.25

When we increase beta, the spreading becomes faster in each experiment. Beta is the parameter that controls how often a susceptible-infected contact results in a new infection, so further experiments with different values of these parameters could change our results dramatically. For our experiments we used the values 0.15 and 0.25, which we considered the most suitable for a virus spreading. As with the previous comparison, the corresponding behaviour is observed in the ER networks.

About number of nodes

Another interesting observation from our experiments was the rapid spreading of the virus when we had more nodes. In the following figure, we can see the change depending on the number of nodes in each random network (single and multiple).
[Bar chart: spreading of the virus per number of layers (one, two, three, four), comparing networks with 100 nodes and with 1000 nodes.]
Figure 49: Spreading of virus with nodes equal to 100 and 1000 for infection probability 0.07 with beta=0.25

In this case, we find that the more nodes we have, the quicker and easier the spreading of the virus is. Of course, this depends on how the random network is created; it is important to know which type of random network we have, as well as the distribution and the connections of the individual nodes. As we observed, the number of nodes in a network and the number of connections of each node are parameters that significantly affect the spread of an infection. Increasing the number of nodes while keeping the number of connections per node fixed, more steps are needed for the infection to spread throughout the network, which is expected since only the size of the network changes. Conversely, increasing the number of connections per node while keeping the number of nodes fixed, the required number of steps is reduced and the infection spreads rapidly across the random network. This remark is reasonable, especially for ER random graphs, because increasing the number of connections at each node gives the infection more paths along which to propagate.

About infection probability

An important property of these networks is that the probability of connecting nodes affects the rate at which an infection spreads: reducing the connection probability P translates into an increase in the number of steps required for the whole network to become contaminated. This is a logical observation, because when connections decrease, the possible paths are limited and the contamination spreads with more difficulty. In the following figure, we can see the spreading of the virus in each layer for each value of the infection probability used in our experiments.
[Bar chart: spreading of the virus per number of layers (one, two, three, four), with one series per infection probability (0.02, 0.07, 0.15, 0.25).]
Figure 50: Spreading of virus on an ER random network for each layer with infection probability 0.02, 0.07, 0.15, and 0.25 with beta=0.25

About ER & BA Network Topologies

We selected two models for single networks and two models for complex networks with different numbers of layers. The more complex a network was, the greater the ease with which an infection spread, while single networks made the spread of an infection more difficult. The two topologies can be compared to determine which of them is more resistant to the spread of infections. Comparing them, we found that ER graphs and BA graphs with a small number of nodes do not differ much for different values of the infection probability P and beta, although the ER graph shows a small increase in the number of spreading steps. As the number of nodes increases, the differences between the two graphs become more pronounced, especially for the ER graphs. Figure 51 shows the difference between a single and a two-layer BA random network.

Figure 51: Difference between a single BA and two-layer BA random network with nodes equal 1000 for infection probability 0.02 with beta=0.15

When the number of layers increases, the BA random graphs show slightly more difficulty with regard to the spread of the infection. We can say that for a small number of nodes and for various values of the probability p, the choice between the two models does not really matter, but for a larger number of nodes and layers we prefer ER random graphs, while for smaller ones BA random graphs. The following figure shows the difference.

Figure 52: Difference between a two, three and four layer BA random network with nodes equal 100 for infection probability 0.07 with beta=0.25

Lastly, we discussed the combination of two different random networks. In our case we had a multiple network with two layers, one an ER random network and the other a BA one. What we discovered is that each network has its own dynamics and its own peculiarities, which must be taken into account if one wants reliable conclusions about how a virus spreads in such a network. In a combined network of this kind, we observe that the infection spreads rapidly for a mixed number of nodes and, as the number of nodes increases, the infection spreads faster. Therefore, we can say that this topology is suitable for networks with a large number of nodes. Below is a figure with a comparison of a single ER, a single BA and a combined network, where the differences can be seen.

Figure 53: A comparison of a single ER, a single BA, a combination of them with nodes equal 1000 for ER & BA (500-1000 nodes) and a combination of them with nodes equal 1000 for BA & ER (500-1000 nodes), with P=0.07 and β=0.15

Conclusion

In this part of the thesis, we draw our conclusions and propose some future extensions. To summarize, this thesis started by explaining in detail what random networks and the SIR epidemiological model are. We then explained what multilayer networks are, showed how the SIR model is projected onto multilayer networks, and illustrated what happens as a result. We could see that an SIR model is important for modeling infectious diseases, as it computes the number of people in each compartment of a population over a given period of time. A simple simulation in a continuous-time Markov chain based on this SIR model was constructed to study the spreading of a virus in a population.
For that, we built multilayer networks with two, three and four layers, and we also combined two different random networks in one multiplex network, in order to see how the SIR model works there and to observe the results. In the results we then presented and explained several different approaches. We discovered that the topology of a network plays a key role in the spread of an infection. The networks selected were two models of simple networks and two models of multilayer networks with different topologies. In the simple networks it was easy to spread the infection, but it was slower compared to the multilayer ones. The total number of nodes, the number of connections and the probability of infection all affect the spread of a virus: because multilayer networks have more connections, and as the values of beta and of the probability of spreading the virus grow, a greater diffusion occurs. The selection of the random networks (in our case ER and BA networks) and the way they are connected also plays an important role. The choice of interconnection and the status of the nodes are essential; the need for better protection in real life makes the choice of network topology crucial for obtaining better results in studies of viruses and the spread of disease in the near future.

Further experiments could be done on the epidemiological models. At a time when the population grows and technology evolves ever more rapidly, more applications in the science of biology are necessary. The analysis of epidemiological models often deals with networks of several hundred nodes; therefore it is important to develop new algorithms that scale to sizes that characterize a realistic population. Even further, in the more distant future, models such as SIR could be applicable in the field of computer science, since it is not unlikely that an infected computer could immediately recover by itself.

Acknowledgements

I would like to thank my supervisor Matteo Magnani for his help and guidance during my thesis research. I would also like to thank my reviewer, Professor Christian Rohner, for all the useful ideas he shared with me and for all the things I learned during our cooperation. I also thank the members and professors of my department in Computational Science for their support and help. I would like to thank my friends in Uppsala and Thessaloniki for insightful ideas, help and support. Last, but certainly not least, I would like to thank my family for their continued assistance and support during the writing of my thesis dissertation.

Appendix

About the R programming language

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed, publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

About the "igraph" package

"Igraph" is a network analysis package with a collection of network analysis tools, with the emphasis on efficiency, portability and ease of use. It has routines for simple graphs and network analysis. Moreover, it can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality indices and much more. The main goals of the "igraph" library are to provide a set of data types and functions for 1) pain-free implementation of graph algorithms, 2) fast handling of large graphs with millions of vertices and edges, and 3) rapid prototyping via high-level languages like R. "Igraph" is open source and free, and it can be used from GNU R, Python and C/C++. The creator of the package is Gabor Csardi and its official page is http://igraph.org/

Stochastic Process & Markov Chains for SIR

S(t), I(t), and R(t) denote discrete random variables for the number of susceptible, infected, and immune individuals at time t, respectively. This SIR epidemic model is a bivariate process because there are two independent random variables, S(t) and I(t); the random variable $R(t) = N - S(t) - I(t)$. The bivariate process $\{(S(t), I(t))\}_{t=0}^{\infty}$ has a
joint probability function given by

$p_{(s,i)}(t) = \mathrm{Prob}\{S(t) = s,\ I(t) = i\}.$

This bivariate process has the Markov property and is time-homogeneous. Transition probabilities can be defined based on the assumptions in the SIR deterministic formulation. First, assume that $\Delta t$ can be chosen sufficiently small such that at most one change in state occurs during the time interval $\Delta t$. In particular,
there can be a new infection, a birth, a death, or a recovery. The transition probabilities are

$P_{(s+k,\,i+j),(s,\,i)}(\Delta t) = \mathrm{Prob}\{(\Delta S, \Delta I) = (k, j) \mid (S(t), I(t)) = (s, i)\},$

where $\Delta S = S(t+\Delta t) - S(t)$. Hence, $P_{(s+k,\,i+j),(s,\,i)}(\Delta t)$ is equal to:

- $\beta i s / N \,\Delta t$ for $(k, j) = (-1, 1)$
- $\gamma i \,\Delta t$ for $(k, j) = (0, -1)$
- $b i \,\Delta t$ for $(k, j) = (1, -1)$
- $b (N - s - i) \,\Delta t$ for $(k, j) = (1, 0)$
- $1 - \beta i s / N \,\Delta t - [\gamma i + b(N - s)]\,\Delta t$ for $(k, j) = (0, 0)$
- $0$, otherwise

The time step $\Delta t$ must be chosen sufficiently small such that each of the transition probabilities lies in the interval [0, 1]. [14]

References

[1] Y. Bar-Yam, Dynamics of Complex Systems (Addison-Wesley, Reading, Massachusetts, 1997), 1st ed. 2-9
[2] N. Boccara, Modelling Complex Systems (Springer-Verlag, New York, 2004), 1st ed. 3-4, 12
[3] P. Erdős and A. Rényi, Publ. Math. (Debrecen) 6, 290 (1959). 23-25, 62-66, 78
[4] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002). 13-15, 21-25, 28-30, 61, 77-78, 93, 108-111
[5] Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer Networks. arXiv, 37. Physics and Society.
[6] Yağan, O., & Gligor, V. (2012). Analysis of complex contagions in random multiplex
networks. Physical Review E, 86(3), 11. Physics and Society. doi:10.1103/PhysRevE.86.036103
[7] Herbert W. Hethcote, The Mathematics of Infectious Diseases, SIAM Review, Vol. 42, No. 4, pp. 599-653, 2000.
[8] Allard, A., Noël, P.-A., Dubé, L. J., & Pourbohloul, B. (2009). Heterogeneous bond
percolation on multitype networks with an application to epidemic dynamics. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 79(3 Pt 2), 036113. Populations and Evolution; Statistical Mechanics. doi:10.1103/PhysRevE.79.036113
[9] Brummitt, C., Lee, K.-M., & Goh, K.-I. (2012). Multiplexity-facilitated cascades in networks. Physical Review E, 85(4), 5. Physics and Society; Disordered Systems and Neural Networks; Statistical Mechanics. doi:10.1103/PhysRevE.85.045102
[10] De Domenico, M., Solé-Ribalta, A., Cozzo, E., Kivelä, M., Moreno, Y., Porter, M. A., ... Arenas, A. (2013). Mathematical Formulation of Multilayer Networks, 15. Physics and Society; Disordered Systems and Neural Networks; Mathematical Physics. doi:10.1103/PhysRevX.3.041022
[11] Dickison, M., Havlin, S., & Stanley, H. (2012). Epidemics on interconnected networks. Physical Review E, 85(6), 066109. doi:10.1103/PhysRevE.85.066109
[12] Son, S.-W., Bizhani, G., Christensen, C., Grassberger, P., & Paczuski, M. (2012). Percolation theory on interdependent networks based on epidemic spreading. EPL (Europhysics Letters), 97(1), 16006. Data Analysis, Statistics and Probability; Disordered Systems and Neural Networks. doi:10.1209/0295-5075/97/16006
[13] Brummitt, C. D., D'Souza, R. M., & Leicht, E. A. (2012). Suppressing cascades of load in interdependent networks. Proceedings of the National Academy of Sciences of the United States of America, 109(12), E680-9. doi:10.1073/pnas.1110586109
[14] Linda J. S. Allen (2008), An Introduction to Stochastic Epidemic Models, Department of Mathematics and Statistics, Texas Tech University, pp. 10-11.
[15] C. Wang, J. Knight, and M. Elder (2000), Computer viral infection and the effect of immunization, 16th Annual Computer Security Applications Conference, New Orleans, LA.
[16] M. Mannan and P. van Oorschot (2005), Instant Messaging Worms, Analysis and Countermeasures, ACM Workshop on Rapid Malcode.