DISS. ETH NO.

MECHANISMS OF INTERNET EVOLUTION & CYBER RISK

A dissertation submitted to ETH ZURICH for the degree of Doctor of Sciences

presented by THOMAS-QUENTIN MAILLART, MSc EPFL, born on January 6th 1981 in Colmar, France, Citizen of France

accepted on the recommendation of Prof. Dr. Didier Sornette, examiner, Prof. Dr. Georg von Krogh, co-examiner, Prof. Dr. Stefan Bechtold, co-examiner

2011

Summary

The Internet is probably the greatest communication tool ever invented. Most of its present-day functionalities have been designed by a multitude of entities – individuals, companies, universities, governments – with no central organization. This bottom-up organization has deep implications for the evolution of the Internet itself. In this thesis, the mechanisms of Internet development are investigated, in particular individual and collective contributions to the most complex and adaptive man-made system ever achieved. Most Internet innovations have been achieved through software development, which consists partly of original work and often of the reuse of existing source code written by others. Overall, software forms a complex directed network of modules that require other modules to work. The connectivity of this network is found to exhibit Zipf's law, a ubiquitous empirical regularity found in many natural and social systems, thought to result from proportional growth. We establish empirically the usually assumed ingredients of the stochastic proportional growth models that have previously been conjectured to be at the origin of Zipf's law. For that, we use exceptionally detailed data on the evolution of open source software packages in the Debian Linux distribution, which offers a remarkable example of a growing, complex, self-organizing adaptive system. The creation of new packages and the exit of obsolete ones characterize the Schumpeterian nature of knowledge reuse in software and, as a result, in the development of the Internet.

The evolution of the Internet is also constrained by its interactions with the humans who shape it. As with many technological, economic and social phenomena, the Internet is controlled by how humans organize their daily tasks in response to both endogenous and exogenous stimulations. Queueing theory is believed to provide a generic explanation for the often-observed power-law distributions of waiting times before a task is fulfilled. However, the general validity of the power law and the nature of other regimes remain unsettled. We identify the existence of several additional regimes characterizing the time required for a population of Internet users to execute a given task after receiving a message, such as updating a browser. Depending on the under- or over-utilization of time by the population of users and the strength of their response to perturbations, the pure power law is found to be coextensive with an exponential regime (tasks are performed without too much delay) and with a crossover to an asymptotic plateau (some tasks are never performed). Thus, characterizing the availability and efficiency of humans in their interactions with Internet systems is key to understanding and predicting its future evolution.

Among the individuals who shape the Internet, programmers are particularly important because they produce software that enables new functionalities. This work often requires many developers to cooperate in order to find the best designs and to correct mistakes. Their work is therefore a place of intense exchange and interaction.
In particular, open source software plays a crucial role in the development of new applications in a self-organized manner. The production of collective goods often requires tremendous efforts over long periods before they become relevant and useful. Open source software development can be modeled as a self-excited conditional Poisson process, in which past actions trigger – with some probability and memory – future actions and the joining of new developers. In many large – and successful – projects, these open source “epidemics” are found to be critical, hence just active enough to be sustainable.

The main drawback of self-organization is the possibility for some people to develop malicious software and use it with criminal intentions. While the Internet brings useful innovation, it is also a land of risks and uncertainty. To understand cyber risk mechanisms as a component of Internet evolution, their statistical properties have to be established. Cyber risk exhibits a stable power-law tail distribution of damage, proxied by personal identity losses. There is also evidence for a size effect, such that the largest possible losses per event grow faster than linearly with the size of the targeted organisations.

From a risk management perspective, it would be desirable to have proper infrastructures for monitoring the evolution of the Internet. The Internet is a complex social world with people and organisations engaging in intense communication and thus generating numerous information transactions. Formally, these transactions can be tracked at the Internet Protocol (IP) level, for instance with a “sniffer” on the link. While gathering, cleaning and storing the data is already an issue, analyzing them is a great challenge. From an Internet security perspective, finding and characterizing security anomalies is cumbersome work that cannot scale to the terabytes of data generated by large-scale networks. For that, a generalized entropy method called the Traffic Entropy Spectrum (TES) has been developed and patented. It allows straightforward visual recognition of security anomalies as well as their classification by machine learning. TES is a convenient tool for real-time Internet security monitoring. In the future, it could also be used for social monitoring at the IP transmission level, for instance to better assess future Internet infrastructure requirements.

Résumé

Internet est probablement le meilleur outil de communication qui ait été inventé. La plupart de ses fonctionnalités ont été conçues par une multitude d'entités – personnes, entreprises, universités, gouvernements – sans organisation centralisée. Cette approche “bottom-up” a des implications profondes sur l'évolution d'Internet. Dans cette thèse, les mécanismes du développement d'Internet sont explorés, en particulier les contributions individuelles et collectives au système le plus complexe et adaptatif que l'Homme ait jamais réalisé. Sur Internet, la plupart des innovations ont été faites à travers le développement de logiciels, qui consiste en un mélange de travail original et de réutilisation de code source existant déjà écrit par d'autres. L'univers du logiciel forme donc un réseau complexe et dirigé de modules qui ont besoin d'autres modules pour fonctionner. La connectivité de ce réseau a la caractéristique de suivre une loi de Zipf, qui est une régularité empirique très répandue que l'on trouve dans une multitude de systèmes naturels et sociaux, et dont on pense qu'elle est le résultat d'un mécanisme d'accroissement proportionnel.
Nous avons établi la validité de cette loi comme un processus stochastique multiplicatif ainsi que conjecturé par la communauté scientifique. Pour cela, nous avons utilisé des données détaillées sur l’évolution des modules “open source” qui forment la distribution Debian Linux, et qui est un exemple remarquable de système émergent et adaptif. La création de nouveaux modules et la disparition de ceux devenus obsolètes caractérise la “destruction créatrice” – selon Schumpeter – de la réutilisation du savoir dans l’univers logiciel, et en conséquence pour le développement d’Internet. L’évolution d’Internet est aussi contrainte par l’interaction avec les hommes qui le façonnent. Comme pour beaucoup de phénomènes technologiques, économiques et sociaux, l’Internet est contrôlé par l’organisation et la gestion des tâches quotidiennes en réponse à la fois à des stimulations endogènes et exogènes. La théorie des files d’attentes semble fournir une réponse générique aux distributions en loi de puissance des temps d’attente avant qu’une tâche ne soit complètement exécutée. Cependant, la validité de la loi de puissance et la nature d’autres régimes observés n’est pas complètement établie. Nous avons identifié l’existence de régimes supplémentaires qui caractérisent le temps requis pour une population d’utilisateurs d’Internet pour exécuter une certaine tâche, comme par exemple mettre à jour le navigateur Web. En fonction de la sous- ou sur-utilisation du temps par la population d’utilisateurs et l’intensité de leur réponse aux perturbations, le régime pur en loi de puissance peut co-exister avec un régime exponentiel dans lequel les tâches sont réalisées sans trop de délai. Il peut aussi co-exister avec une déviation asymptotique vers un plateau. Dans ce cas, certaines tâches ne sont jamais exécutées. Dans tous les cas, la disponibilité et l’efficacité des hommes dans leurs interactions avec Internet est une clé pour comprendre et prédire son évolution future. Parmi les personnes qui façonnent Internet, les programmeurs ont une importance particulière car ils produisent les logiciels qui permettent les nouvelles fonctionnalités. Ce travail requiert souvent que les développeurs travaillent ensemble pour trouver les meilleurs designs mais aussi pour corriger les erreurs. C’est pourquoi, leur travail est un creuset v d’échange et d’interaction intenses. En particulier, les logiciels open source jouent un role crucial pour l’émergence de nouvelles applications. La production de bien collectifs requiert souvent des efforts très importants sur de longues périodes pour devenir utiles. Le développement de logiciels open source peut être modelé comme un processus de Poisson et conditionnel auto-excité, dans lequel les actions passées déclenchent – avec certaines probabilité et mémoire – les actions futures et l’engagement de nouveaux développeurs. Dans beaucoup de grands projets open source qui sont des réussites, ces épidémies de développement sont en régime critique, c’est-à-dire assez actives pour durer. Le principal problème avec l’auto-organisation est la possibilité pour certains d’écrire des logiciels malveillants et de les utiliser à des fins criminelles. Bien qu’Internet offre des innovations utiles à chacun, le réseau est aussi un monde incertain et risqué. Pour comprendre les mécanismes du cyber-risque comme une composante de l’évolution d’Internet, ses propriétés statistiques doivent être établies. 
Le risque cyber est caractérisé par une distribution des dommages (approximés par les vols d’identité) avec une queue en loi de puissance. On trouve aussi l’existence d’un effet de taille, en cela que les plus grandes pertes possibles par événement croissent de manière super-linéaire avec la taille des organisations visées. Dans une perspective de gestion des risques, on aimerait avoir des infrastructures adéquates pour surveiller l’évolution d’Internet. Internet est monde complexe et social avec des gens et des organisations qui communiquent de manière intense et qui par conséquent génèrent un grand nombre de transactions sur le réseau. D’un point de vue formel, ces transactions peuvent être tracées au niveau du protocole Internet (IP), par l’intermediaire d’un “sniffer” sur le câble de réseau. Cependant, acquérir, nettoyer et stocker les données de surveillance est déjà un défi en soi, mais les analyser est un autre challenge. En ce qui concerne la sécurité d’Internet, trouver et caractériser les anomalies est un travail très difficile qui ne peut pas être effectué à l’échelle des grands réseaux de communication à des coûts raisonnables. Pour cela, nous avons développé et breveté une méthode basée sur l’entropie généralisée, appelée Traffic Entropy Spectrum (TES). Elle permet une reconnaissance visuelle rapide ainsi qu’une classification automatique des anomalies. TES est un outil efficace pour le monitoring de la sécurité en temps réel. Dans le futur, TES pourrait aussi être utilisé pour mieux comprendre les comportements sociaux en prenant l’information au niveau de la couche de transmission IP. Une application pourrait être de mieux anticiper les besoins futurs en infrastructures pour Internet. vi Acknowledgements This thesis was carried under the supervision of Didier Sornette and co-supervision of Georg von Krogh, whom I owe my sincerest gratitude for their outstanding academic coaching. ETH Zurich has been a very nice environment with opportunities for exchange and cross-nurturing of ideas across disciplines and for intense “self-organized” collaboration opportunities. Part of this thesis has been supported by the Swiss National Science Foundation, the D-MTEC Foundation and the Centre for Coping with Crises in Socio-economic Systems (CCSS). The received financial support is gratefully acknowledged. By chronological order, thanks go first to my parents Roselyne and Jean-Claude Maillart who supported and sponsored all the necessary studies that would later allow me engage into PhD studies. I also acknowledge early exposition to risk management and insurance provided by Lauren Clarke. Before joining ETH Zurich and while doing cyber security business, I had the chance to meet Patrick Amon and Arjen Lenstra, whose work made me understand that some problems are very hard to solve and require patience in addition to an intense commitment. This thesis would have not even started without my friend Marc Vogt, who arranged an improbable initial interview with Didier Sornette. My thoughts go also to all the Entrepreneurial Risks group, especially to Heidi Demuth for making paperwork surely less complex that it is, to Georges Harras and Moritz Hetzer for intense discussions, to Ryan Woodard and Maxim Fedorovsky for teaching Python and supporting for computer-related problems. Special thanks go to Gilles Daniel et Riley Crane who were the social engine when I joined the Chair of Entrepreneurial Risks in 2007. 
I will remember great moments of discussion with the people at the Chair of Strategic Management and Innovation at the coffee machine and at the Monte Verita conference in 2010. I also acknowledge the help of numerous master students, in particular Thomas Frendo for his work on open source software. I would like to thank the team of Bernhard Plattner at the Electrical Engineering department – namely Daniela Brauckhoff, Stefan Frei and Bernhard Tellenbach – who warmly welcomed fruitful collaborations. My thoughts go also to Thomas Duebendorfer at Google Switzerland. Thanks go also to Barbara van Schewick, who invited me to visit the Centre for Internet and Society at Stanford University in 2009. These three years learning research would never have been so rich without the passionate supervision and friendship of Didier Sornette, who taught me far more than I could have expected. In particular, I learned that no job can be better than when done with joy and playfulness. Special thanks go to Marie Schaer for her love and support. Her passion for research helped me seriously consider starting a PhD. Her academic experience has also been a precious guide to avoid many pitfalls that necessarily arose over time.

Vita

1981 – 1999 Born and raised in Colmar, France,
1997 – 1999 Scientific baccalaureate, lycée Bartholdi, Colmar, France,
1999 – 2005 Master of science in civil engineering, EPFL, Lausanne,
2002 – 2003 Exchange student, Technische Universität (TU), Berlin,
2005 – 2006 Project Manager, ilion Security S.A., Geneva,
2006 – 2007 Co-Founder and Manager, IRIS Solution S.A., Geneva,
2007 – 2011 Ph.D. candidate and teaching assistant, Department of Management, Technology and Economics, ETH Zurich.

Contents

Summary iii
Résumé v
Acknowledgements vii
Vita ix
1 Introduction 1
1.1 The “End-to-End” Argument 2
1.2 Emergence of Applications 3
1.3 The Hacker Community 4
1.4 Code is Law 7
1.5 Research Question 7
2 Background 9
2.1 Measuring and Modeling the Global Internet 10
2.1.1 Measuring the Global Internet 10
2.1.2 Internet Models 11
2.1.3 Internet Robustness 12
2.1.4 Virtual and Social Networks 12
2.2 Modularity & Knowledge Reuse 13
2.2.1 Modularity and Industrial Design 14
2.2.2 Knowledge Reuse: A Complex Modular System 15
2.3 Individual and Collective Human Dynamics 17
2.3.1 Human Timing 17
2.3.2 Collective Behaviors 18
2.4 Internet Security & Cyber Risk 22
2.5 Monitoring Internet Evolution 24
2.5.1 Self-Similarity of Traffic Time Series 24
2.5.2 Network Anomaly Detection 25
3 Discussion & Perspectives 27
3.1 Main Contributions 28
3.2 Human Contributions and Dynamics of Innovation 29
3.2.1 Multivariate Hawkes Processes 29
3.2.2 Micro Mechanisms of Cooperation in Empirical Data 30
3.2.3 Code Mutations to Measure Probability of Innovation Occurrence 31
3.3 Beyond the Zipf's Law and Proportional Growth 32
3.3.1 Deviations from the Zipf's Law 32
3.3.2 Coexistence of Multiple Proportional Growth Regimes 33
3.4 Cyber Risk 34
3.4.1 The Innovative Nature of Cyber Risk 34
3.4.2 Economics of Cyber Crime 35
3.4.3 Vulnerabilities versus Damage 37
Concluding Remarks 39

Chapter 1
Introduction

The Internet is an amazing communication system, which has taken a prominent place in people's lives. With Arpanet – the first implementation of the Internet – a long-range communication was established between the University of California, Los Angeles (UCLA) and the Stanford Research Institute on October 29, 1969. More than twenty years later, in 1992, the U.S. Congress allowed commercial activity on a network that had so far mainly been used for educational and research purposes. By 2010, the Internet had become a network of more than 700 million active computers (see Figure 1.1) [1] and almost 2 billion users who constantly search, exchange and produce information, share pictures and videos, buy and sell goods, etc. [2]. To enable all these functionalities, several billion lines of source code1 have been produced, often by companies but also by individuals or groups of individuals (see [3] for an estimation of open source code). Furthermore, in many cases, major innovations have emerged through the action of highly motivated individuals. At each technological step, major companies have been forced to adapt or to lose significant influence on the information and communication technology (ICT) market due to unexpected competition. For instance, the invention and rise of Google as a new search engine in the late 1990s reshuffled the ICT market in only a few years. Nowadays, Facebook is more and more challenging Google. How is it possible that David so often beats Goliath? Is it a matter of chance only, or are there some rules that trigger this massive bottom-up innovation? Actually, it can be shown that some formal and informal rules, set at the early stages of Internet development and later on, enabled the innovation potential of the Internet.

1 The source code is any collection of statements or declarations written in some human-readable computer programming language. Source code is the means most often used by programmers to specify the actions to be performed by a computer.

Fig. 1.1: Super-linear evolution of the number of computers connected to the Internet, recorded by the Internet Systems Consortium [1].

1.1 The “End-to-End” Argument

When the Internet was being designed in the seventies, much discussion was going on to decide how communication between computers should be handled above the physical link (e.g. Ethernet cables, phone lines).
A very important problem at the time was to cope with datagram2 transmission errors due to the lack of reliability. At first, it was thought that the routing protocol, which handles datagram transmission, should also include error control. This protocol was called the Internetwork Transmission Control Protocol (ITCP). It included datagram transfer and error control features. In January 1978, the decision was taken to split the two functionalities into two layers. The Internet Protocol (IP) would be dedicated to routing packets on the network in a hop-by-hop manner, through a succession of routers. An additional layer – the transport layer – was created, with the possibility of implementing several protocols. Among them, two protocols are widely used today: the Transmission Control Protocol (TCP) handles error control, while the User Datagram Protocol (UDP) offers an unreliable datagram service but faster transmission. The main reason for not implementing systematic error control was indeed performance. TCP requires a dialog between the sending and destination hosts to acknowledge proper transmission, while UDP just sends packets with no control. Figure 1.2 shows a schematic representation of the actual layered architecture of the Internet, with its characteristic “hourglass” shape and the Internet Protocol (IP) as a unique point of communication between multiple physical links and multiple transport protocols and applications. A few years later, the Internet “founding fathers” realized they had achieved a fundamental design principle, called the “end-to-end” argument, which states that application-specific functionality usually cannot – and preferably should not – be implemented in the lower layers of the network, the network's core. Instead, a function should only be implemented in a network layer if it can be completely and correctly implemented at that layer and used by all clients of that layer [4].

2 Datagram: basic transfer unit associated with a packet-switched network in which the delivery, arrival time and order are not guaranteed. A datagram consists of header and data areas, where the header contains information sufficient for routing from the originating equipment to the destination without relying on prior exchanges between the equipment and the network. The source and destination addresses as well as a type field are found in the header of a datagram.

Fig. 1.2: Internet Layering

In summary, the more specific a function, the higher in the layering of the Internet architecture it should be implemented, so that the whole system is not constrained by rules that do not concern all stakeholders. This design principle has fundamental consequences for the use of the Internet. For instance, the impossibility of discriminating packet transmission according to function, origin and destination has, until recently, enforced equal treatment of all stakeholders, from individuals to the largest companies and organisations3. Thus, the “end-to-end” argument is thought to be the most fundamental design choice of the Internet, which has paved the way for massive bottom-up Internet application development.

1.2 Emergence of Applications

While the Internet was being developed, a computer revolution happened. In the late 1970s and early 1980s, the first personal computers were commercialized and popularized.
For the first time, anybody could buy a computer, learn programming languages (mainly BASIC and FORTRAN at the time) and write executable programs at home.

3 Outlawing discrimination on the Internet is the goal of the proponents of network neutrality, a bill that has raised a heated debate at the U.S. Congress in the last decade and is still unsettled. Their main argument is that discrimination would drastically reduce the opportunities for bottom-up innovation [5].

Somehow, like the hourglass structure of the Internet, the MS-DOS operating system and, later on, Microsoft Windows provided a platform on which software could easily be written by third parties. Furthermore, portability eased the diffusion of software along with the tremendous growth of the personal computer market (see Figure 1.1). Nowadays, we witness a similar explosion of mobile applications along with the spread of IP-enabled smartphones. The prominent provider to date is the Apple App Store for the iPhone and subsequently for the iPad, which started in July 2008 with 500 applications available for download. By September 2010, this number had exploded to more than 250'000 applications. The strategy is to lower the entry barriers to program development even further and to maximize portability. In order to attract more developers, Google has recently launched the Google App Inventor for those “who are not developers” [6]. The emergence of bottom-up produced applications and even devices has been thoroughly documented by Zittrain [7]. He also explained why bottom-up development of applications is threatened by various industry arguments, in particular security and copyright enforcement. However, reducing the success of the “generative Internet” to technical features would be overly simplistic. Source code gets written because people – alone and within organisations – commit to doing it, and because there is a need.

1.3 The Hacker Community

The term “hack” was probably first introduced at the Massachusetts Institute of Technology (MIT) Tech Model Railroad Club (TMRC), which is one of the most famous model railroad clubs in the world. The club's members shared a passion for finding out how things worked and then mastering them. Members had a self-governed social organisation and disliked authority. In particular, they were initially drawn to the first multi-million dollar computers that were installed in the late 1950s, with the goal of understanding how they worked. Many of these hackers were freshmen or even high-school students, who spent nights coding and considered lectures a waste of time. The hacking culture then spread through the academic community and became very strong at the University of California, Berkeley, Stanford University, and Carnegie Mellon University in Pittsburgh. This community was disparate and many subcultures existed, but they all shared some common traits: creating and sharing software and hardware, placing a high value on freedom of inquiry, hostility to secrecy, information-sharing as both an ideal and a practical strategy, upholding the right to fork4, an emphasis on rationality, distaste for authority, and playful cleverness. In the seventies, the first personal computers were invented by the “hardware hacking” movement and the Homebrew Computer Club in Stanford.

4 Fork: In software engineering, a project fork happens when developers take a legal copy of source code from one software package and start independent development on it, creating a distinct piece of software. The term fork derives from the use of the term in computer operating systems, where it refers to the creation of a copy of a running program (process) by forking a process, which creates two identical processes (like cell division in living systems); the two are then independent, and may proceed to do different tasks, as the program dictates (Wikipedia).

Steve Wozniak's Apple I – sold at a price of $666 in 1976 – was the first computer to be adopted by households [8]. Part of the hacker's soul was the imperative need to explore how things worked, with a preference for inaccessible things, like the phone companies' networks or simply secrets kept behind closed doors. As a result, many hackers have been involved in breaking into systems, not for criminal purposes, but rather to challenge security as a kind of illegitimate authority, to satisfy their own curiosity, and to learn how hidden things work. However, when the Internet started to democratize in the nineties, with companies and individual users exchanging emails and surfing the nascent World Wide Web, the first serious security issues appeared, with massive worm (self-replicating virus) and denial-of-service attacks. While some of these attacks were perpetrated with malicious intentions, most were designed to demonstrate the lack of security of companies and organisations, as well as flaws in software, which was mainly closed source at the time. Following the first large-scale attacks, some hackers were sued for having broken into private – and supposedly secure – systems (see the Hacker Manifesto reproduced on the next page). For hackers, the concept of closed source software has always run against their ideals, but their main claim against it was that secrecy prevented auditing, and thus improvement toward the reduction of vulnerabilities. Software vendors claimed that code secrecy and vulnerability non-disclosure would prevent the finding and exploiting of flaws, while hackers have been proponents of full disclosure, which means that when a vulnerability is found, it should be made publicly available, thus forcing software vendors to react fast, develop patches and deploy them to their customers [9]. Nowadays, the full-disclosure strategy has been widely accepted as the best way to reduce software vulnerabilities. In 1985, in order to promote the already declining hacking culture, Richard Stallman – the last “true hacker” – created the Free Software Foundation and popularized the concept of copyleft, a legal mechanism to protect the modification and redistribution rights of free software, which has enabled a regulatory framework for Free/Libre/Open Source Software, thereafter called Open Source Software (OSS). The GNU General Public License (GPL) was written in 1989 and ensures the freedom to reuse (copyleft), with viral propagation to work derived from GPL-licensed originals, which can only be distributed under the same license terms. As of today, more than 500'000 OSS projects have been referenced [10]. Beyond the ideology, the hacker culture has been widely recognized as useful, even for commercial purposes. When Lego released Mindstorms – a line of programmable robotics toys compatible with traditional Lego bricks – they first targeted kids as the primary market. After a few months, they discovered not only that engineers in Silicon Valley had widely adopted it, but also that their hacks significantly enhanced its performance.
A users community was born, which helps the development and improvement of Mindstorm products [11, 12]. Nowadays, the hacking culture is predominant in Internet startups, and recruitment is made to attract such profiles, with comfortable monetary incentives and dedicated working environment. 1. Introduction 6 The Hacker Manifesto by +++The Mentor+++ Written January 8, 1986 Another one got caught today, it's all over the papers. "Teenager Arrested in Computer Crime Scandal", "Hacker Arrested after Bank Tampering"... Damn kids. They're all alike. But did you, in your three-piece psychology and 1950's technobrain, ever take a look behind the eyes of the hacker? Did you ever wonder what made him tick, what forces shaped him, what may have molded him? I am a hacker, enter my world... Mine is a world that begins with school... I'm smarter than most of the other kids, this crap they teach us bores me... Damn underachiever. They're all alike. I'm in junior high or high school. I've listened to teachers explain for the fifteenth time how to reduce a fraction. I understand it. "No, Ms. Smith, I didn't show my work. I did it in my head..." Damn kid. Probably copied it. They're all alike. I made a discovery today. I found a computer. Wait a second, this is cool. It does what I want it to. If it makes a mistake, it's because I screwed it up. Not because it doesn't like me... Or feels threatened by me.. Or thinks I'm a smart ass.. Or doesn't like teaching and shouldn't be here... Damn kid. All he does is play games. They're all alike. And then it happened... a door opened to a world... rushing through the phone line like heroin through an addict's veins, an electronic pulse is sent out, a refuge from the day-to-day incompetencies is sought... a board is found. "This is it... this is where I belong..." I know everyone here... even if I've never met them, never talked to them, may never hear from them again... I know you all... Damn kid. Tying up the phone line again. They're all alike... You bet your ass we're all alike... we've been spoon-fed baby food at school when we hungered for steak... the bits of meat that you did let slip through were pre-chewed and tasteless. We've been dominated by sadists, or ignored by the apathetic. The few that had something to teach found us willing pupils, but those few are like drops of water in the desert. This is our world now... the world of the electron and the switch, the beauty of the baud. We make use of a service already existing without paying for what could be dirt-cheap if it wasn't run by profiteering gluttons, and you call us criminals. We explore... and you call us criminals. We seek after knowledge... and you call us criminals. We exist without skin color, without nationality, without religious bias... and you call us criminals. You build atomic bombs, you wage wars, you murder, cheat, and lie to us and try to make us believe it's for our own good, yet we're the criminals. Yes, I am a criminal. My crime is that of curiosity. My crime is that of judging people by what they say and think, not what they look like. My crime is that of outsmarting you, something that you will never forgive me for. I am a hacker, and this is my manifesto. You may stop this individual, but you can't stop us all... after all, we're all alike. 1. Introduction 1.4 7 Code is Law In [13], Lessig proposed that developers are the “lawmakers” of the Internet. What they code determines most of the actions that can be performed on the Internet. 
Indeed, if a program does not provide a desired functionality or restricts copying through digital rights management, a user will never benefit from that functionality. Only a developer – with the rights and the skills to change the source code – can change the program by adding functionalities, removing restrictions, etc. Law scholars have called this phenomenon “Lex Informatica” or “Code is Law” [14]. Actually, many similar examples can be found in nature and in man-made infrastructures: obviously, the world we live in is constrained by physical laws, such as gravitation, and regulation can be obtained by exploiting such constraints. For instance, a simple example of traffic regulation is a speed bump installed on a street to slow cars down. Many similar examples can be found in the tangible world.

1.5 Research Question

Therefore, in order to understand the mechanisms of Internet evolution, one must recognize (i) the importance of source code, (ii) the people who produce it and (iii) the adoption of this code by users. Indeed, a popular piece of software will have much more influence than a program used by only a few people. As will be shown later, software adoption by users has critical consequences for Internet security. Millions of individuals and companies write source code for personal or commercial purposes. Under some circumstances (e.g. open source software), this code can be reused by others. Software must also be adapted to meet evolving needs. For large software projects, several programmers generally work together to cope with development and maintenance. Thus, the Internet at runtime is mainly the result of a maelstrom of source code mutations triggered by developers – within companies or open source communities – with heterogeneous needs, desires, ideologies and skills, which can lead to brilliant innovations or to the malicious exploitation of weaknesses. Uncovering the complex mechanisms leading to innovation and insecurity at the same time is the goal of the present thesis. It will be shown that the Internet is a complex adaptive system, where the interaction between technical features (source code, software) and humans (developers, users) plays a central role in its evolution.

Chapter 2
Background

The Internet has received much attention from researchers since its inception. Contributions have been made by computer scientists, who have mainly been concerned with improving the technology; by physicists, who are concerned with complexity and emergent properties; and by social scientists and economists, who found a tremendous in vivo social laboratory to understand and model social ties. Early on, the networked nature of the Internet was recognized, and it triggered tremendous advancements in, and applications of, graph theory (Section 2.1). Developments in knowledge reuse and modularity in management science have given insights into the mechanisms of Internet innovation (Section 2.2). By construction, the evolution of the Internet is completely tied to human dynamics. Therefore, we shall also review developments in individual and collective dynamics (Section 2.3). Finally, Internet security has been a critical concern for safeguarding the development of online tools, and has received much attention from technical and economic viewpoints (Sections 2.4 and 2.5).

2.1 Measuring and Modeling the Global Internet

The research concerning the Internet as a complex system has mostly been done by adopting a complex network modeling approach to the problem.
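As a concrete, minimal illustration of this approach (a toy sketch only, not the data analysis performed in this thesis), the snippet below grows a network by proportional growth, i.e. “preferential attachment”, the generating mechanism discussed in Section 2.1.2, and then measures its degree distribution, which develops the heavy tail characteristic of the scale-free Internet maps reviewed below.

```python
import random
from collections import Counter

def preferential_attachment_graph(n_nodes, m_links, seed=42):
    """Grow a toy graph by proportional growth: each new node attaches
    m_links times to existing nodes chosen with probability proportional
    to their current degree (multi-edges are allowed for simplicity)."""
    random.seed(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]                    # node k appears deg(k) times
    for new in range(2, n_nodes):
        for old in [random.choice(endpoints) for _ in range(m_links)]:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

def degree_distribution(edges):
    """Number of nodes having each degree -- the quantity whose heavy
    tail signals a 'scale-free' network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

if __name__ == "__main__":
    dist = degree_distribution(preferential_attachment_graph(10000, 2))
    for k in sorted(dist)[:10]:
        print(k, dist[k])   # counts decay roughly as a power law ~ k^(-3)
```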
2.1.1 Measuring the Global Internet

In the late nineties, when the Internet became popular and was growing fast, researchers and engineers were concerned with measuring its size. The most pressing questions at the time were whether the network would be able to sustain its growth and whether engineers would be able to manage it. For that, several studies aimed at tracking and visualizing the large-scale topology and/or performance of the Internet and at providing Internet maps at different resolution scales [15]. A first approach consisted in mapping connections between autonomous systems (AS), which are autonomously administered domains of the Internet [16]. This approach is limited by the heterogeneity of functions that each AS can take: transit (backbone functions), stub (local delivery) or multihomed (combination of various functions).

Fig. 2.1: The Internet domain structure. Filled nodes represent individual routers. Hollow and shaded regions correspond to stub and transit autonomous systems (AS), respectively. Reproduced from Pastor-Satorras and Vespignani [17].

Fig. 2.2: Two-dimensional image of a router-level Internet map collected by H. Burch and B. Cheswick. Reproduced from Pastor-Satorras and Vespignani [17].

Following recent developments in network theory, it was soon found that the Internet had characteristic properties: (i) a small average shortest path length among vertices compared to the size of the network, i.e. small-world properties [18], and (ii) clustering and hierarchy through a heavy-tailed distribution of router connectivity, i.e. scale-free properties [19] (see [17], p. 48, for an exhaustive review).

2.1.2 Internet Models

Given that characteristic features of the Internet had been uncovered, interest shifted to models, and more specifically to generating mechanisms able to replicate the evolution of the Internet. First, it was confirmed that the Internet could not be a random “Erdős–Rényi” graph [20], because the latter cannot exhibit scale-free properties although it can be small-world. In 1999, a first growing-network model – “preferential attachment” – was introduced to explain the growth of the World Wide Web [21]; it is nothing else than the proportional growth model proposed by Simon in 1955 [22]. This model has the advantage of explaining the emergence of scaling properties in complex systems. Despite several improvements and massive reuse by the scientific community, this model has never been properly validated. Moreover, many other mechanisms can generate power laws, including self-organized criticality [23] and highly optimized tolerance (HOT) [24] (see ch. 14 in [25] for a review of these mechanisms).

2.1.3 Internet Robustness

In the quest for better control over the Internet, tolerance to errors (random failures) and malicious attacks (see Section 2.4) has been investigated, mainly from a theoretical viewpoint, using random and targeted node removal [26, 27]. Percolation theory [29] was used to model cascading failures, in particular in scale-free networks. However, detailed empirical investigation has shown that theoretical models generally fail to capture the real structure of the Internet, as well as its single points of failure, notably because the Internet has been optimized – by way of engineering – to be robust against outages and attacks [28].

2.1.4 Virtual and Social Networks

The Internet is called a “network of networks”.
This expression relates to the very loose and changing nature of the Internet and its functions. For telecommunication engineers the Internet is the routing network, for users it can be the World Wide Web, their social network (e.g. Linkedin, Facebook). Also, each layer is deeply influenced by the others: on the one hand, the application layer is bounded by the link, the Internet Protocol (IP) and the Transport layers and on the other hand the link and the IP layers cannot neglect the actions taken at the above layers, for maintenance, security and maintenance reasons (see Section 1.1). Among the multiple networks operating at the application layer, four have received much attention: • The most famous is the World Wide Web (Web), invented by Tim Berner-Lee in 1990 at the European Organization for Nuclear Research (CERN) along with the first Web browser program. The basic structure of the Web is a network of directed hyperlinks1 between Webpages. The Web has rapidly grown to become the main medium to publish and share information, engage into e-business. Like the Internet, its structure has been found to be scale-free – with a power law distribution of incoming links [30], and mechanisms have been proposed to explain this growth [21, 31], and again with many improvements, like fitness models [32]. It is worth noting that research on complex networks has deeply inspired the pagerank search algorithm for Google search engine [33]. Using the Web, the first studies on communities have been performed on human dynamics [34], social structures , which are still ongoing nowadays and refined thanks to improved tools for social networking (see [35] and references therein). 1 Hyperlink: reference to a document that the reader can directly follow. The reference points to a whole document or to a specific element within a document. Tim Berners-Lee saw the possibility of using hyperlinks to link any unit of information to any other unit of information over the Internet. Hyperlinks were therefore integral to the creation of the World Wide Web. Web pages are written in the hypertext mark-up language HTML (source: Wikipedia). 2. Background 13 • Email communication networks have also received some attention early on, especially for Worm spreading [36, 37] and email community networks with self-similar organization [38] and long time memory processes in email treatment dynamics [39]. • Peer-to-Peer (P2P) systems are ad-hoc networks built at the application level, with their own routing network. Each peer – usually a home computer – is at the time a client and a server, respectively asking for data and sending data to other peers. “Pure” P2P networks have no central command, and each peer is authenticated by its neighbors. Famous P2P networks are Skype (online phone system), and filesharing networks (e.g. Gnutella), mainly used for music and movie sharing. When these networks started to develop, much concern was raised on their scalability, namely because the amount of data transfered became rapidly very large [41, 42]. • Instant Messaging has also received some attention. This communication network has been found to be scale-free [43] and small-world, with age and gender homophily [44]. • Investigations of Massive Multiplayer Online Games (MMOGs) [45, 46, 47], and their social networks has recently started with strong focus on homophily [48, 49]. • Open Source Software (OSS) community is probably one of the oldest online communities, as a transposition of the hacker ethic to the Internet. 
Naturally, its social structure has also been investigated with traditional network metrics (clustering, degree distribution, shortest path, etc. [50]), using web forums, email networks and communication infrastructures [51], as well as version control [52]. This research is mainly framed in the management and economics sciences, because OSS has been recognized as an archetypical example of collective action [53].

2.2 Modularity & Knowledge Reuse

As discussed above, many Internet features can be modeled by way of physical, logical and social networks with various properties. However, their underpinning goal remains unique: exchanging information and, in many cases, sharing knowledge. This has rapidly led to two consequences: first, given increasing storage capabilities, more knowledge has become available over the years, and second, because it is often free, knowledge has been reused. For instance, hyperlinks make it possible to reuse knowledge from a third-party Web page, thus avoiding a hard copy of that page. Linking also clearly helps keep content up to date, in case the cited page changes. Another case is software, in particular open source software (OSS), studied in this thesis as a special kind of knowledge, for which parts of the source code are reused by others. Many examples in natural and social systems display modularity as a kind of organisation, which can be top-down or rather self-organized. In The Architecture of Complexity (1962) [54], Herbert Simon showed the advantages of modularity with a parable of two watchmakers who organize their work – building a watch – in two different manners: one follows a purely sequential assembly process and the other introduces modularity into the building process. The second process has proved to be more resilient to perturbations. This parable has been one of the first conceptualizations of modularity related to innovation and complexity. In the broad context of knowledge and technology reuse, modularity is ubiquitous in the structure of the Internet itself (see Section 1.1), in personal computers and software, but also in biological systems [55].

The Watchmakers Parable

There once were two watchmakers, named Hora and Tempus, who made very fine watches. The phones in their workshops rang frequently and new customers were constantly calling them. However, Hora prospered while Tempus became poorer and poorer. In the end, Tempus lost his shop. What was the reason behind this? The watches consisted of about 1000 parts each. The watches that Tempus made were designed such that, when he had to put down a partly assembled watch, it immediately fell into pieces and had to be reassembled from the basic elements. Hora had designed his watches so that he could put together sub-assemblies of about ten components each, and each subassembly could be put down without falling apart. Ten of these subassemblies could be put together to make a larger sub-assembly, and ten of the larger sub-assemblies constituted the whole watch.

Herbert Simon, The Architecture of Complexity (1962).

2.2.1 Modularity and Industrial Design

In well-controlled environments, modularity can be implemented as a process. For instance, the industrial process behind the production of a good or a service can be modularized to make it more efficient and less fragile [56].
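Simon's parable can be made quantitative with a short Monte Carlo sketch (an illustration only; the interruption probability of 1% per added part is an assumed, commonly quoted value): it counts how many part placements each watchmaker needs to complete one watch.

```python
import random

P_INTERRUPT = 0.01   # assumed probability of an interruption per part added

def assemble_unit(n_parts):
    """Return the number of part placements spent until a unit of n_parts
    is completed, restarting from scratch after each interruption."""
    placements = 0
    while True:
        for _ in range(n_parts):
            placements += 1
            if random.random() < P_INTERRUPT:
                break           # unfinished unit falls apart, start over
        else:
            return placements   # all n_parts added without interruption

def tempus():
    """Tempus: one monolithic assembly of 1000 parts."""
    return assemble_unit(1000)

def hora():
    """Hora: 100 subassemblies of 10 parts, 10 larger assemblies of 10
    subassemblies, 1 final assembly of 10 larger ones = 111 stable units."""
    return sum(assemble_unit(10) for _ in range(111))

if __name__ == "__main__":
    random.seed(1)
    print("Tempus:", tempus(), "placements for one watch")
    print("Hora:  ", hora(), "placements for one watch")
    # Tempus typically needs of the order of a million placements,
    # Hora of the order of a thousand.
```

The simulation makes the resilience argument explicit: an interruption costs Tempus the whole partial assembly, whereas it costs Hora at most one ten-part subassembly.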
In [57], Baldwin and Clark describe several mechanisms, called modular operators, as a list of things that designers can do to a modular system: (i) splitting a design into modules, (ii) substituting one module design for another, (iii) augmenting by adding a new module to the system, (iv) excluding a module from the system, (v) inverting to create new design rules and (vi) porting a module to another system. Modularity has played a major role in the design of computer systems, both hardware and software. For instance, the central processing unit (CPU) is the basic “Lego” of all computers, which is itself made of millions of transistors2 that are necessary to perform all the mathematical operations required by software execution. Most personal computers (PC) are made of components (e.g. processor, mainboard, hard disk, RAM, keyboard, mouse, screen), which are designed to work together but can be separately replaced and upgraded. Modularity may also apply to abstract systems, such as knowledge and design processes. An example of the effects of modularity in open source software (OSS) has been proposed by MacCormack et al. [58] as well as by Challet and Lombardoni [59], for dependencies between Red Hat Linux packages and between files in source code, respectively. Both recognized the importance of propagation costs: a change in a module may trigger consequences in several other modules. MacCormack et al. [58] showed that when a piece of software (Mozilla) undergoes massive re-engineering toward more modularity, its propagation costs are slightly reduced, making maintenance simpler (see Figure 2.3).

2 A transistor is an electronic component with at least three terminals for connection to an external circuit. It is mainly used to amplify and switch electronic signals. A voltage or current applied to one pair of the transistor's terminals changes the current flowing through another pair of terminals (source: Wikipedia).

Fig. 2.3 (left panel: Mozilla.19980408, a 1684 × 1684 matrix; right panel: Mozilla.19981211, a 1508 × 1508 matrix): Design Structure Matrix (DSM) of Mozilla showing the relation between modules (source files) in the code. Each point represents a reuse from one module (columns) by another (rows). Modules have been ordered by clusters of reuse (squares on the diagonal line). The left (resp. right) panel shows modularity before (resp. after) a large re-engineering in order to make the code more modular for an open source model. Actually, Mozilla was created out of a proprietary software – Netscape – that America Online (AOL) decided to release as open source. After the re-engineering operation, the number of modules has decreased, and component reuse is better organized around well-identified clusters. (Reproduced from MacCormack, A. et al. (1999) [58].)

2.2.2 Knowledge Reuse: A Complex Modular System

The conditions and the processes that give rise to modularity can be relaxed to apply to complex systems. Indeed, biological systems are less the result of a sophisticated top-down design process than of an evolutionary adaptation to a changing environment. In some sense, this is nothing else than a “trial and error” process, which leads to self-organization and innovation. For that, organisms as well as humans never start from scratch, but – rather naturally – reuse components (resp. resources) available in their environment [55]. The process of component reuse and integration has been investigated in the context of firms [60, 61]. Considering the Internet, knowledge reuse has become the norm.
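The notion of propagation cost used by MacCormack et al. can be made concrete with a minimal sketch in the spirit of their Design Structure Matrix analysis (the dependency matrices below are toy examples, not the Mozilla data of Figure 2.3): the visibility matrix is obtained by accumulating powers of the dependency matrix, and the propagation cost is its density, i.e. the fraction of module pairs through which a change can propagate.

```python
import numpy as np

def propagation_cost(dep):
    """Propagation cost in the spirit of MacCormack et al.: density of the
    visibility matrix V = sum_{k=0..n} dep^k (binarized), i.e. the fraction
    of ordered module pairs (i, j) such that a change in j can reach i
    through some chain of dependencies (including i = j)."""
    dep = (np.asarray(dep) != 0).astype(int)
    n = dep.shape[0]
    visibility = np.eye(n, dtype=int)       # k = 0: every module sees itself
    reach = np.eye(n, dtype=int)
    for _ in range(n):                      # dependency chains of length 1..n
        reach = (reach @ dep > 0).astype(int)
        visibility = ((visibility + reach) > 0).astype(int)
    return visibility.sum() / float(n * n)

if __name__ == "__main__":
    # dep[i][j] = 1 if module i depends on (reuses) module j.
    chain = [[0, 1, 0, 0],      # 0 -> 1 -> 2 -> 3: changes propagate far
             [0, 0, 1, 0],
             [0, 0, 0, 1],
             [0, 0, 0, 0]]
    modular = [[0, 1, 0, 0],    # two independent pairs: changes stay local
               [0, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]]
    print("chain   :", propagation_cost(chain))    # 0.625
    print("modular :", propagation_cost(modular))  # 0.375
```

On real systems such as the Mozilla DSMs of Figure 2.3, the same quantity is computed on matrices with well over a thousand source files per side.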
The multitude of hyperlinks on the World Wide Web is probably the best example of knowledge reuse as a giant citation network. Similarly, the introduction of the copyleft license (see Section 1.3) has enabled the reuse of source code at large scale by the community of developers. In the particular context of OSS development, the organization in worldwide communities or projects, the Internet-based communication between developers and the open code base introduce many opportunities for the reuse of knowledge. Haefliger et al. [62] found that developers reused software for three reasons: (i) to integrate functionality quickly, (ii) to focus on writing certain parts of the code rather than others, and (iii) to mitigate development costs through code reuse. However, everything comes at a cost, and in the case of codified knowledge (i.e. source code), integration costs may be up to 200% of development costs [63]. Moreover, the absence of an incentive mechanism – generally required for reuse in firms [64] – might inhibit actual reuse [65]. Considering explicit knowledge reuse, the tree of dependencies can be investigated by measuring calls to external code, i.e. code from another module3. At large scales, the network of dependencies can be captured by inspecting the tree of dependencies in a Linux distribution, which aggregates several thousands of open source projects in a comprehensive tree. These data can easily be extracted, and the distribution of reuse is found to be heavy-tailed [59, 66]. However, the dynamics of reuse in the context of a large ecosystem of source code development have remained unexplored so far. Indeed, complex adaptive systems, such as large-scale software reuse networks, are not centrally managed and rather obey evolutionary rules. Hence, modularity, and moreover its dynamical properties, remain poorly understood so far, although they have important consequences for understanding how technological innovation emerges.

3 Note: Dependencies can be analyzed either by their structure (which source file calls external code?) or by their function (how many times is external code called during execution?).

2.3 Individual and Collective Human Dynamics

The Internet is a complex adaptive system driven by humans. Therefore, every change is the result of individual actions, their contingencies and how people react alone or collectively to various stimuli. Our concern is with the human features that have consequences for the mechanisms of Internet evolution. For that, we review previous work on priority queueing and the long-range memory processes it generates, as a fundamental variable of human dynamics. Also, many Internet-based social networks exhibit collective behaviors as a result of cross-stimulation between individuals. These complex dynamics are critical for the Internet as a system that has emerged from massive collaboration between individuals.

Fig. 2.4: The correspondence patterns of Darwin and Einstein. a. Historical record of the number of letters sent (Darwin, black; Einstein, green) and received (Darwin, red; Einstein, blue) each year by the two scientists. An anomalous drop in Einstein's correspondence marks the Second World War period (1939–45, boxed). Arrows, birth dates of Darwin (left) and Einstein (right). b. and c. Distribution of response times to letters by Darwin and Einstein, respectively. Note that both distributions are well approximated with a power-law tail that has an exponent 3/2, the best fit over the whole data for Darwin giving 1.45 ± 0.1 and for Einstein 1.47 ± 0.1.
(reproduced from Oliveira and Barabási, Nature (2005) [69]).

2.3.1 Human Timing

Many Internet dynamics are controlled by human behaviors and by the way people organize and manage their own time. Recent studies of various social systems have established the remarkable fact that the distribution Q(t) of waiting times between the presentation of a message and the ensuing action has a power-law asymptotic of the form Q(t) ∼ 1/t^α, with an exponent α often found to be smaller than 2. Examples include the distribution of waiting times until a message is answered in emails [39] and in other human activity patterns, like web browsing, library visits, or stock trading [68]. Fig. 2.4 shows the distribution of waiting times before correspondence was answered by Darwin and Einstein, respectively [69]. These observations can be rationalized by simple priority queueing models that describe how the flow of tasks falling on (and/or self-created by) humans is executed with an arbitrary priority [68, 69, 70, 71]. Assuming that the average rate λ of task arrivals is larger than the average rate µ for executing them, and using a standard stochastic queueing model wherein tasks are selected for execution on the basis of random continuous priority values, Grinstein and Linsker derived the exact overall probability per unit time, pdf(t), that a given task sits in the queue for a time t before being executed [72]:

pdf(t) ∼ (1/t^{5/2}) e^{−t/t0}, for µ > λ,   (2.1)

pdf(t) ∼ 1/t^{3/2}, for µ ≤ λ,   (2.2)

where λ (resp. µ) is the rate of incoming (resp. executed) tasks and t0 is the scaling time of the exponential crossover. Grinstein and Linsker showed that the distribution (2.2) is independent of the specific shape of the distribution of priority values among individuals [73]. The value of the exponent p = 3/2 is compatible with previously reported numerical simulations [69, 68] and with most, but not all, of the empirical data. While Grinstein and Linsker [72, 73] could derive exact solutions for priority queueing, the model lacks empirical validation. Unfortunately, while some evidence confirms the analytical results for µ ≤ λ [69, 78] and for µ > λ [79], significant deviations from the canonical exponents can be observed. For instance, the probability density function (pdf) of waiting times before people answer an email is found to be a power law with exponent α ≈ 1 [39, 71]. Moreover, distributions often exhibit power-law behavior with asymptotic crossovers to an exponential (resp. plateau) distribution. Saichev and Sornette proposed that the standard priority queueing model can be extended with incoming (resp. outgoing) task rates that vary slowly over time when µ ≤ λ. In this case, the distribution of waiting times can depart from the power law with exponent 1/2, exhibiting exponents varying from 0.3 to ∞ [74].

2.3.2 Collective Behaviors

Among human dynamics, collective behaviors play a fascinating role. How do people get influenced by others? When do herding effects give rise to social epidemics? More generally, the question is how people influence each other and how the action of an individual triggers (resp. is triggered by) the action(s) of others. To account for these complex and intricate causal dynamics, Sornette et al. proposed a coarse-grained approach to detect and categorize epidemics [75, 76].
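The priority-queueing mechanism analysed in Section 2.3.1 above can be illustrated with a minimal discrete-time Monte Carlo sketch (an illustration only, with assumed Bernoulli arrivals and executions and uniform random priorities; it is not the exact derivation of [72]): at λ = µ the simulated waiting times develop a heavy tail consistent with Eq. (2.2), while µ > λ produces an exponential cutoff as in Eq. (2.1).

```python
import random

def simulate_waiting_times(lam, mu, steps, seed=0):
    """Discrete-time priority queue: at each step a task arrives with
    probability lam; with probability mu the highest-priority pending
    task is executed. Each task carries a uniform random priority.
    Returns the waiting times of executed tasks."""
    random.seed(seed)
    queue = []            # list of (priority, arrival_time)
    waits = []
    for t in range(steps):
        if random.random() < lam:
            queue.append((random.random(), t))
        if queue and random.random() < mu:
            i = max(range(len(queue)), key=lambda k: queue[k][0])
            prio, arrived = queue.pop(i)
            waits.append(t - arrived)
    return waits

if __name__ == "__main__":
    waits = simulate_waiting_times(lam=0.3, mu=0.3, steps=50_000)
    # Crude log-binned histogram: at lam = mu the counts decay roughly as
    # a power law, consistent with Eq. (2.2); rerunning with mu > lam
    # adds an exponential cutoff, consistent with Eq. (2.1).
    edges = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
    for lo, hi in zip(edges[:-1], edges[1:]):
        n = sum(1 for w in waits if lo <= w < hi)
        print(f"{lo:5d}-{hi:5d}: {n / (hi - lo):10.2f} per unit waiting time")
```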
2.3.2 Collective Behaviors

Among human dynamics, collective behaviors play a fascinating role. How do people get influenced by others? When do herding effects give rise to social epidemics? More generally, this relates to how people influence each other and how the action of an individual triggers (resp. is triggered by) the action(s) of others. To account for these complex and intricate causal dynamics, Sornette et al. proposed a coarse grained approach to detect and categorize epidemics [75, 76]. A first validation of the endogenous versus exogenous shocks theory was performed using the dynamics of book sales on Amazon [77], and further confirmed by a systematic classification of YouTube videos [78].

To account for complex triggering effects, it is convenient to use a self-excited conditional Poisson process, which basically states that, for a given system, each agent is subjected to endogenous shocks, which are triggered between agents with a given memory function, and to exogenous shocks occurring as a renewal process [80]. It is mathematically formulated as follows,

λ(t) = V(t) + Σ_{i, t_i ≤ t} µ_i φ(t − t_i) ,   (2.3)

where µ_i is the number of potential persons who will be influenced directly, over all future times after t_i, by person i who acted at the previous time t_i. Thus, the existence of well-connected individuals can be accounted for with large values of µ_i. V(t) is the exogenous source, which captures all spontaneous views that are not triggered by epidemic effects on the network. The memory kernel is given by

φ(t − t_i) ∼ 1 / (t − t_i)^(1+θ) ,  with 0 < θ < 1 .   (2.4)

For θ = 0.5, the standard priority queueing described above is recovered. Equation (2.3) can be solved using a mean-field approximation for various values of ⟨µ⟩. For ⟨µ⟩ > 1 the process is supercritical, with exponential growth [75]. Below and at criticality (⟨µ⟩ ≤ 1), the following distinct regimes are obtained [76], presented below and in Figure 2.5 in the context of YouTube videos [78]:

• Exogenous sub-critical. If the network is not "ripe" (that is, when connectivity and spreading propensity are relatively small), corresponding to the case when the mean value ⟨µ_i⟩ of µ_i is less than 1, then the activity generated by an exogenous event at time t_c does not cascade beyond the first few generations, and the activity is proportional to the direct (or "bare") memory kernel φ(t − t_c):

A_bare(t) ∼ 1 / (t − t_c)^(1+θ) ,   (2.5)

with t_c the time of the initial exogenous shock.

• Exogenous critical. If instead the network is "ripe" for a particular video, i.e. ⟨µ_i⟩ is close to 1, then the bare response is renormalized as the spreading is propagated through many generations of viewers influencing viewers influencing viewers, and the theory predicts the activity to be described by [76]:

A_ex-c(t) ∼ 1 / (t − t_c)^(1−θ) ,   (2.6)

with t_c the time of the initial exogenous shock.

• Endogenous critical. If, in addition to being "ripe", the burst of activity is not the result of an exogenous event, but is instead fueled by endogenous (word-of-mouth) growth, the bare response is renormalized, giving the following time dependence for the view count before and after the peak of activity:

A_en-c(t) ∼ 1 / |t − t_c|^(1−2θ) ,   (2.7)

with t_c the critical time at which the epidemic peaks.

• Endogenous sub-critical. Here the response is largely driven by fluctuations, with no clean bursts of activity:

A_en-sc(t) ∼ η(t) ,   (2.8)

where η(t) is a noise process.

While these results describe social epidemics rather well, Crane and Sornette recall that less than 10% of videos display these patterns. Most videos are driven by stochastic fluctuations, which are assumed to be noise. This might also be a less clean signal, or reflect more complicated, evolving social networks [81].
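Equation (2.3) is conveniently simulated through its equivalent branching (cluster) representation [80], which is what the sketch below does (parameter values are illustrative only): exogenous "immigrant" events arrive as a Poisson process of rate V, and every event independently triggers a Poisson(⟨µ⟩) number of first-generation offspring, with waiting times drawn from the power-law kernel of Eq. (2.4). For ⟨µ⟩ < 1 the cascade is subcritical and the total number of events stays finite.

    import numpy as np

    def powerlaw_wait(rng, theta=0.5, c=1.0):
        # waiting time with tail ~ 1/t**(1+theta), cf. Eq. (2.4)
        u = 1.0 - rng.random()            # u in (0, 1]
        return c * (u ** (-1.0 / theta) - 1.0)

    def simulate_hawkes(V=0.1, n=0.9, theta=0.5, t_max=1000.0, seed=2):
        rng = np.random.default_rng(seed)
        # exogenous (immigrant) events: Poisson process of rate V on [0, t_max]
        arrivals = np.cumsum(rng.exponential(1.0 / V, size=int(3 * V * t_max) + 10))
        events = [t for t in arrivals if t <= t_max]
        todo = list(events)
        # each event triggers Poisson(n) offspring; n < 1 keeps the cascade subcritical
        while todo:
            parent = todo.pop()
            for _ in range(rng.poisson(n)):
                child = parent + powerlaw_wait(rng, theta)
                if child <= t_max:
                    events.append(child)
                    todo.append(child)
        return np.sort(np.array(events))

    times = simulate_hawkes()
    print(f"{len(times)} events; naive expectation V*t_max/(1-n) = {0.1 * 1000 / (1 - 0.9):.0f}")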
In some cases, the nature of the task might also change the structure of the response. While social epidemics seem to spread according to these dynamics, buying a book or viewing a video recommended by one's friends does not require much time or long-term commitment. These actions are also unique, since people usually don't buy (resp. watch) the same book (resp. movie) twice. It would therefore be interesting to find similar patterns for dynamics involving a social network of software developers.

Fig. 2.5: A schematic view of the four categories of collective dynamics: Endogenous-subcritical (Upper Left), Endogenous-critical (Upper Right), Exogenous-subcritical (Lower Left), and Exogenous-critical (Lower Right). The theory predicts the exponent of the power law describing the response function conditioned on the class of the disturbance (exogenous/endogenous) and the susceptibility of the network (critical/subcritical). Also shown schematically in the pie chart is the fraction of views contained in the main peak relative to the total number of views for each category. This is used as a simple basis for sorting the time series into three distinct groups for further analysis of the exponents (reproduced from Crane and Sornette, PNAS (2008) [78]).

2.4 Internet Security & Cyber Risk

Security has been an issue almost since the inception of the Internet (see the box below for an overview). However, implementing security at the Internet Protocol level would have violated the "end-to-end" argument, because security is not required by all applications. Consequently, security should only be implemented at the application layer, and it is therefore heterogeneous over the Internet, according to the perceived needs of each application. In addition, security is conditioned by the reliability of source code and by the way people behave with software.

Milestones of Internet Threats. Almost since its inception and public release, the security of the Internet has been found to be insufficient, and scary scenarios have been discussed, up to simply considering the complete collapse of the system. The track record is not reassuring:
• 1971: first virus (Creeper, on the Arpanet);
• 1988: the Morris Worm infects 10 percent of the Internet (60'000 machines at the time);
• 2000: the "ILOVEYOU" worm causes 5.5 billion dollars of damage in 24 hours, with over 50 million computers infected;
• 2007: Estonian governmental information systems are subjected to an attack from obscure Russian activists;
• 2005 and 2007: massive power outages in Brazil are found to be the result of cyber attacks;
• June 2010: a worm (Stuxnet) is found to be designed to attack the control command (SCADA) of industrial systems, and in particular the first Iranian nuclear plant, which had been in operation for only a couple of months.

From an evolutionary perspective, Internet (in)security (see footnote 4) is an interesting view angle from which to observe the capacity of adaptation of Internet components – mainly software and users – in the presence of threats, and the evolution of malicious attacks against adapting components. At the aggregate level, little research has been conducted on the real effects of Internet (in)security. The most significant report is based on yearly surveys by the Computer Security Institute (CSI) and the Federal Bureau of Investigation (FBI) [82]. Major computer security and antivirus companies report quarterly and yearly on cybercrime. In [83], Anderson reported on the costs incurred by people who suffered identity theft. In 2006, the reinsurance company Swiss Re reported that, among all possible risks, the largest corporations consider computer-based risk as the highest priority risk in all major countries by level of concern, and as second in priority as an emerging risk [84, 85]. However, cross-checking these figures and putting them in perspective with insecurity mechanisms remains impossible.

Footnote 4: Stefan Frei first coined the term Internet (in)security to stress that the default state of the Internet is insecure rather than secure.
For that, the economics of information security has recognized the importance of incentives, respectively to protect infrastructures and to attack Internet systems. Efforts have been undertaken to understand under which conditions information systems might be subjected to misuse [86]. Based on that, some theoretical risk scenarios have been developed for risk management and insurance purposes [87, 88, 89, 90, 91]. However, these models generally fail to be calibrated against empirical figures.

Fig. 2.6: Lifecycle of a vulnerability defined by distinctive events: (i) creation, (ii) discovery, (iii) exploit, (iv) disclosure, (v) patch release (by the software editor), (vi) patch installation (by the user). This sequence creates three risk "time windows": (i) after discovery and before disclosure, (ii) after disclosure and before patch release and (iii) after patch release and before patch installation. The exact sequence of events varies between vulnerabilities (reproduced from Frei et al. 2009 [94]).

The study of vulnerabilities as a driver of insecurity has also received much attention. Over the years, the number of vulnerabilities found in software has exploded, and it is established that they are an important driver of attacks, because they open sudden insecurity gaps that usually need some time to be filled and thus create opportunities for attacks [92]. As a result, a black market for vulnerabilities and exploits has appeared [93, 94, 95]. But, due to the lack of available empirical data, it remains unclear how this black market really works. Interestingly, analyzing the legacy of vulnerabilities, Frei et al. showed some evidence that they are abundant, which means that security breaches have little chance of decreasing in the future, unless economic incentives to perpetrate attacks are drastically reduced [96]. This provides a darker image of the situation than the one – already pessimistic – given by Brian Snow, formerly of the U.S. National Security Agency (NSA), who claimed that security would be achieved by drastically reducing the number of vulnerabilities [97]; this appears somewhat unrealistic today, considering the explosion of source code production.

Figure 2.6 shows the typical lifecycle of a vulnerability with the associated risks. In [92] and [98], Frei et al. showed that the three steps reducing risk – disclosure, patch release and patch installation – are fulfilled according to fat-tailed distributions of waiting times, which means that, even if each step is usually completed relatively fast, waiting times can be arbitrarily long, thus creating huge opportunities for malicious exploitation. It can be speculated that the origins of these contingencies are manifold: technical [59], human, or a mix of both factors [100]. However, it is now clear that vulnerabilities – being developed, traded, used by black hats and corrected by software editors – are the cornerstone of Internet (in)security. As a consequence, there is a clearly established link between insecurity and software development and use. It confirms the "code is law" statement, and understanding the threat mechanisms of the Internet requires adopting a transversal view that integrates economics (incentives respectively to attack and to raise protection barriers) as well as technical and human contingencies.
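The three risk windows of Figure 2.6 translate directly into simple measurements once the lifecycle events of a vulnerability are time-stamped. The sketch below (with hypothetical dates, not real vulnerability data) computes the duration of each window; aggregating such durations over many vulnerabilities is what reveals the fat-tailed waiting times reported in [92, 98].

    from datetime import date

    # hypothetical lifecycle of a single vulnerability (cf. Fig. 2.6)
    lifecycle = {
        "discovery":     date(2009, 1, 10),
        "disclosure":    date(2009, 3, 2),
        "patch_release": date(2009, 3, 20),
        "patch_install": date(2009, 6, 15),   # by a given user (or user percentile)
    }

    windows = {
        "(i)   discovery -> disclosure":     lifecycle["disclosure"] - lifecycle["discovery"],
        "(ii)  disclosure -> patch release": lifecycle["patch_release"] - lifecycle["disclosure"],
        "(iii) patch release -> install":    lifecycle["patch_install"] - lifecycle["patch_release"],
    }
    for name, delta in windows.items():
        print(f"{name:35s} {delta.days:4d} days")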
2.5 Monitoring Internet Evolution

Although code is law on the Internet, it does not predict what initiatives people may take within the degrees of freedom offered by the software corpus. The only way to measure the evolution of Internet activity in an exhaustive manner is to sample traffic at the Internet Protocol (IP) layer, because all upper layers can then be thoroughly captured (see Figure 1.2). Traffic analysis started in the late nineties and offers much potential for Internet measurement, but it is still in its infancy.

2.5.1 Self-Similarity of Traffic Time Series

The behavior of Internet traffic has been analyzed since the early days of the Internet, supported by packet capture tools such as tcpdump [101]. Before 1995, the usual assumption in traffic characterization was that packet arrival and size (see footnote 5) distributions have a Poisson nature, i.e. the probability that a certain number of packets arrives in fixed non-overlapping time intervals follows a Poisson distribution. This corresponds to a memoryless random process with exponentially decaying autocorrelation. However, Internet traffic exhibits different behaviors: it actually has long memory [102] and a self-similar structure [103]. Figure 2.7 shows three time series of the same traffic, where the second (resp. third) plot is a zoom-in of a portion of the first (resp. second) by one order of magnitude (x10). The three plots look very similar – with bursty behavior – while in the case of memoryless processes, larger resolution plots would smooth out the fluctuations.

Footnote 5: Internet packet (or datagram): the basic transfer unit of data over the Internet. It consists of a header with routing information, and a data payload, which contains a "slice" of information.

Fig. 2.7: Time series of TCP traffic between Digital Equipment Corporation and the rest of the world [104]. Each plot represents the number of packets recorded on a link. Although the time series is magnified by an order of magnitude between each plot, the three plots look similar. This is one of the many examples of scale-free traffic on the Internet.

There are many possible reasons for self-similar traffic. The first refers to the method of Mandelbrot [105]: in very simple terms, long range dependence can be obtained by the construction of a renewal process (i.e. the inter-arrival times of events are Poissonian), with a heavy-tailed distribution of transferred file sizes [106]. A different scenario for self-similarity has been proposed by invoking the presence of a phase transition. In this case, the Internet is modeled as a network of hosts that generate traffic (computers) or forward packets (routers). By applying the rules of Internet routing (mainly hop-by-hop transmission, best effort and no discrimination between packets), one can show that when the generated traffic increases slowly, the system experiences a sudden transition from a free phase (fluid traffic) to "busy" traffic with large delays in packet delivery [107, 108, 109, 110, 111]. However, this model has not found experimental support so far. A third explanation is a direct consequence of individual and collective human dynamics (see Section 2.3); in this case, traffic can no longer be seen as a renewal process, as postulated in the first explanation. Altogether, it is reasonable to postulate that self-similarity and long range dependence result from a combination of these three mechanisms.
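As an illustration of how self-similarity is diagnosed in practice, the sketch below applies the classical aggregated-variance method to a synthetic count series (it is one of several standard estimators, not necessarily the one used on the trace of Fig. 2.7): for a self-similar series, the variance of the m-aggregated series decays as m^(2H-2) with a Hurst exponent H > 0.5, whereas a memoryless, Poisson-like series yields H ≈ 0.5.

    import numpy as np

    def hurst_aggregated_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
        """Estimate the Hurst exponent H from the variance of block-averaged series."""
        x = np.asarray(x, dtype=float)
        variances = []
        for m in block_sizes:
            n_blocks = len(x) // m
            block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
            variances.append(block_means.var())
        slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
        return 1.0 + slope / 2.0              # Var ~ m**(2H - 2)  =>  H = 1 + slope/2

    rng = np.random.default_rng(0)
    poisson_counts = rng.poisson(10.0, size=2**16)    # memoryless reference "traffic"
    print("H (Poisson packet counts) ≈", round(hurst_aggregated_variance(poisson_counts), 2))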
2.5.2 Network Anomaly Detection

Beyond the classification of regular traffic structure [112, 113], many traffic anomalies are constantly found on the Internet. Therefore, tools have been developed to capture them. In particular, anomaly detection in large scale Internet networks has received much attention, with various pattern recognition methods [114, 115]. This research has two main motivations. The first is security and continuity of operations, which call for better anticipation of large disruptive events at large scales. The second is more conceptual: traffic flowing on the Internet has evolved over time with the appearance of new applications. For instance, the development of video streaming has completely changed the nature of the traffic, as well as the amount of data exchanged. The nature of the traffic is also heterogeneous, according to application specifications and the way people use them. The sudden emergence and rapid adoption (e.g. by word of mouth) of a new technology or a new habit significantly changes the nature of Internet traffic. For instance, the development of peer-to-peer networks had consequences for bandwidth management and forced many Internet service providers to adapt their traffic management. In a famous talk [116], David Clark – one of the Internet founding fathers – explained that anticipating and addressing new kinds of needs is key for the future of the Internet. Nowadays, the Internet is no longer only a communication system: it incorporates more and more components of our societies, with economic and social features (e.g. e-business, social networks, cyber emotions) and geopolitical dimensions (e.g. cyber conflicts, safeguard of sovereignty). Even though the Internet seems transparent from a user's point of view, the infrastructure must constantly be adapted to these new functionalities and their impact on the backbone. Change has been so radical over the years that some network engineers call for a "new Internet" that would be able to thoroughly address present and future socio-economic needs [117]. For that, Internet monitoring is important because it allows observing the in-vivo evolution of the Internet by capturing all Internet layers (as opposed to the Web, peer-to-peer networks, email and social networks, which are specific applications on the Internet).
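Entropy-based summaries of traffic feature distributions are one common family of features for such pattern recognition, and they are the building block of the generalized entropy metrics used later in this thesis (cf. the article "Beyond Shannon: Characterizing Internet Traffic with Generalized Entropy Metrics" reproduced in Chapter 3). The minimal sketch below (an illustration of the general idea, not the patented implementation) computes the Tsallis generalized entropy of a synthetic "packets per source" distribution for several orders q: low orders are sensitive to the bulk of the distribution, while high orders emphasize dominant sources, which is why a spectrum of orders can separate different classes of anomalies.

    import numpy as np

    def tsallis_entropy(counts, q):
        """Generalized (Tsallis) entropy of order q; q -> 1 recovers Shannon entropy."""
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        if q == 1.0:
            return -np.sum(p * np.log(p))
        return (1.0 - np.sum(p ** q)) / (q - 1.0)

    rng = np.random.default_rng(0)
    baseline = rng.zipf(2.0, size=5000)                       # heavy-tailed packets-per-source counts
    attack = np.concatenate([baseline, np.full(50, 10_000)])  # a few sources suddenly dominate

    for q in (0.5, 1.0, 2.0, 4.0):
        print(f"q={q:3.1f}  baseline={tsallis_entropy(baseline, q):9.3f}"
              f"  with anomaly={tsallis_entropy(attack, q):9.3f}")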
Chapter 3
Results

The Internet is a complex adaptive system with characteristics that are the result of software execution (i.e. code is law) and of the way people use and extend it. To account for Internet evolutionary mechanisms, five studies are presented here, with a focus on the rationalisation of empirical facts, extracted from data sources that allow key dynamics at work to be captured, around inter-related topics: software evolution, individual and collective human dynamics, and cyber risks. The following articles are part of the present dissertation and are reproduced thereafter:

1. Maillart, T., Sornette, D., Spaeth, S. and von Krogh, G., Empirical Tests of Zipf's Law Mechanism in Open Source Linux Distribution, Physical Review Letters 101, 218701 (2008).
2. Maillart, T., Sornette, D., Frei, S., Duebendorfer, T. and Saichev, A., Quantification of Deviations from Rationality with Heavy-tails in Human Dynamics, Physical Review E (2011).
3. Maillart, T. and Sornette, D., Epidemics and Cross-Excitation in Open Source Software Development, working paper (2011).
4. Maillart, T. and Sornette, D., Heavy-Tailed Distribution of Cyber Risks, Eur. Phys. J. B 75, 357-364 (2010).
5. Tellenbach, B., Burkhart, M., Maillart, T. and Sornette, D., Beyond Shannon: Characterizing Internet Traffic with Generalized Entropy Metrics, Lecture Notes in Computer Science: Passive and Active Network Measurement 5448, 239-248 (2009).

Chapter 4
Discussion & Perspectives

The Internet is a complex adaptive system in which technology and human dynamics are strongly interdependent. This cross-nurturing – with each individual being a potential actor of technological development – has never been so strong before, and largely contributes to the unpredictable evolution of the Internet. To account for this complexity, the most significant stylized facts have been rationalized and backed by robust empirical validation. Given that each article includes its own discussion, this last part elaborates a series of links between these findings, and perspectives for the future of Internet research are delineated.

4.1 Main Contributions

In this thesis, five contributions shed light on the mechanisms of Internet evolution, with a focus on software, human actions and cyber risk.

1. Self-Organization and Proportional Growth in Innovation Networks. Source code is a special kind of knowledge that can also be executed as software, hence as a product. In the case of open source software, it forms a complex network of components found to obey a multiplicative proportional growth stochastic process, which is constantly fed by an entry flow of new components, and whose volatility is greater than the deterministic part. These ingredients allow even the biggest components to fail and disappear as a consequence of pure randomness. These results provide a zero-order and coarse grained mechanism for Schumpeterian "creative destruction" in networks of knowledge reuse [118].

2. Deviations from Rationality in Human Dynamics. Using a large dataset of human responses to a proposed task (a browser update) and of the time required to perform it, we propose an economic model of time use as a non-storable resource, and show that individuals can significantly deviate from rational time allocation. This further confirms and validates that human dynamics exhibit long memory processes and heavy-tailed distributions. These results also have important implications for cyber risk.

3. Collective Dynamics in Open Source Software Development. Software development is a complex task, often involving many programmers. In particular, open source software (OSS) development is a collective action [119], with developers committed to solving problems and achieving a common goal. It was found that the activity of developers over time is auto- and cross-excited, with dynamics comparable to social epidemics.
4. Heavy-Tailed Distribution of Cyber Risks. While much Internet security research has focused on technical aspects, cyber risk can only be understood by quantifying damage. Using personal identity losses as a proxy for damage, we show that the distribution of cyber risk follows a power law with exponent less than 1. Therefore, the mean and the variance of the distribution are not defined, and ever more extreme events can be expected over time. In addition, maximum damage scales with the size of the targeted organizations.

5. Efficient Tool for Monitoring Large Scale Networks. Thoroughly monitoring the Internet as a large scale network, including all transmissions at the various application layers, is necessary to better forecast the evolution of the global network and therefore anticipate large changes that could harm the Internet. For that, a comprehensive and scalable tool is proposed to automatically detect and classify anomalies – cyber attacks in particular – occurring on large scale networks. New anomalous signatures shall reflect the evolution of the Internet and security landscapes.

The evolution of the Internet is timed by individual and collective human actions, like writing (resp. reusing) software source code, correcting or exploiting vulnerabilities, improving source code, and updating software. These actions create innovation and trigger weaknesses, but also the means to cope with them and thus make the Internet more robust, until components become obsolete and are replaced by new ones. It is tempting to compare code changes to mutations. Both in biology and in technology, there are many examples of innovation steps in which an organism (resp. a technology) suffering from a change in its environment starts to adapt by increasing its rate of mutations in order to sustain its homeostasis (see footnote 1). It is well known that organisms accelerate their mutation rate in reaction to changes of environmental conditions such as sun radiation, radioactivity or pollution [120, 121]. Similarly, when a new version of an operating system is released, most applications have to update, either to keep delivering the same service and retain their customers, or to deliver a better service and increase market share.

Footnote 1: Homeostasis: property of a system, either open or closed, that regulates its internal environment and tends to maintain a stable, constant condition. Multiple dynamic equilibrium adjustment and regulation mechanisms make homeostasis possible (Wikipedia).

4.2 Human Contributions and Dynamics of Innovation

The above examples borrowed from biology and technology show the fragility of ecosystems, with potential cascading effects. While change in biology is thought to be the result of random mutations and a selection process, technological changes are the result of the actions of rationally bounded – yet thinking – humans. Moreover, individuals tend to work together and develop collective action in order to achieve common goals [119]. One of the fundamental ingredients for collective action to happen is social skills, which are thought to be the result of an evolutionary process, as proposed by Dunbar in the social brain hypothesis [122]. Collective action and social skills must also be put in perspective with the emergence of cooperation in many systems [123]. Indeed, research marrying game theoretical cooperation models and social networks has gained some attention recently [124, 125].

4.2.1 Multivariate Hawkes Processes

In study 3, we established evidence of cross-excitation, along with long-memory triggering between activity events, for open source software, which is an example of collective action. Developers exhibit epidemic-like dynamics of contribution, which could be tested against a mean-field solution of a self-excited conditional Poisson process.
However, this model does not account for cross-excitation between agents. Indeed, in study 3, evidence of cross-excitation was found without applying this model. Therefore, it appears that OSS developers are mutually excited rather than only self-excited, with the actions of one developer (resp. a group of developers) influencing other developers. The multivariate Hawkes process generalizes (2.3) into the following form for the conditional Poisson intensity of an event of type j among a set of m possible types [126]:

λ_j(t | H_t) = λ_j^0(t) + Σ_{k=1}^{m} Λ_{kj} ∫_{(−∞,t)×R} h_j(t − s) g_k(x) N_k(ds × dx) ,   (4.1)

where H_t denotes the whole past history, λ_j^0 is the rate of spontaneous (exogenous) events of type j, i.e. the source of immigrants of type j, and Λ_{kj} is the (k, j) element of the matrix of couplings between the different types, which quantifies the ability of a type-k event to trigger a type-j event. Specifically, the value of an element Λ_{kj} is just the average number of first-generation events of type j triggered by an event of type k. The memory kernel h_j(t − s) gives the probability that an event of type k that occurred at time s < t will trigger an event of type j at time t: h_j(t − s) is nothing but the distribution of waiting times between the impulse of the type-k event, which impacted the system at time s, and the reaction of the system with an event of type j, this time t − s being a random variable distributed according to h_j. The fertility (or productivity) law g_k(x) of events of type k with mark x quantifies the total average number of first-generation events of any type triggered by an event of type k. Here, the following standard notation has been used: ∫_{(−∞,t)×R} f(s, x) N(ds × dx) := Σ_{k | t_k < t} f(t_k, x_k) [127].

The matrix Λ_{kj} embodies both the topology of the network of interactions between the different types and the coupling strength between elements. In particular, Λ_{kj} includes the information contained in the adjacency matrix of the underlying network. Analogous to the condition n < 1 (subcritical regime) for the stability and stationarity of the monovariate Hawkes process, the condition for the existence and stationarity of the process defined by (4.1) is that the spectral radius of the matrix Λ_{kj} be less than 1. Recall that the spectral radius of a matrix is nothing but the largest modulus of its eigenvalues. The multivariate Hawkes process generalization opens large perspectives for modeling and understanding dynamics in social systems and influence between people (resp. groups of people). It has the potential to cope with one major issue faced by network experts, who are realizing that the naive view of a static social network is limited and often inadequate to describe fast evolving systems, preventing proper model validation.
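The stationarity condition just stated is easy to check numerically once the branching matrix has been estimated. The sketch below (with an arbitrary, illustrative 3x3 coupling matrix, not fitted to any data) computes the spectral radius of Λ and reports whether the corresponding multivariate Hawkes process of Eq. (4.1) is subcritical.

    import numpy as np

    # arbitrary illustrative coupling matrix between three types of developers:
    # Lambda[k, j] = average number of type-j events triggered by one type-k event
    Lambda = np.array([
        [0.30, 0.20, 0.05],
        [0.10, 0.40, 0.10],
        [0.05, 0.25, 0.20],
    ])

    spectral_radius = np.abs(np.linalg.eigvals(Lambda)).max()
    regime = "subcritical (stationary)" if spectral_radius < 1 else "supercritical"
    print(f"spectral radius = {spectral_radius:.3f} -> {regime}")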
4.2.2 Micro Mechanisms of Cooperation in Empirical Data

According to game theory, two or more agents can interact and play various games according to some pay-off functions. Among them, the prisoner's dilemma is the most famous. When it is repeated several times, the game is called the iterated prisoner's dilemma, which offers room to program agents to play various strategies against each other. This was the aim of the famous tournament organized by Axelrod in the early eighties, with many teams proposing an agent endowed with a set of rules defining the strategy to play against other agents [123]. Many successful strategies were designed to be nice and to cooperate as long as the other agent cooperates, and to retaliate if she defects. Game theoretical approaches have been successful in explaining cooperation, mainly in computer simulations and lab experiments. However, cooperation has not been documented directly from field experiments. A major reason is certainly that, in real life, the game is completely asynchronous, with strong memory effects, and individuals usually do not judge their counterpart on a single defection, but rather on an array of negative signals over time. To account for these memory effects, it is probably necessary to relax the model formulation. In this context, it is worth noting the conceptual proximity between strategies for playing the iterated prisoner's dilemma and the multivariate conditional excitation process presented above. Actually, the cross-excitation parameter Λ_{kj} is the sensitivity of j to k's actions, hence the propensity of agent j to react positively (resp. negatively) to the activity of agent k, seen as a succession of contributions over time, in the context of collective action.

4.2.3 Code Mutations to Measure the Probability of Innovation Occurrence

As described above, if we assume that source code evolution is the result of random "try and fail" mutations, locally and at short time scales, then the mutation rate (number of changes per unit of time) is determinant for measuring the resilience of an innovation network, and also gives a benchmark to quantify the rate of resource consumption for the maintenance and adaptation of all components. The more adaptation is needed, the more mutations, the more development resources, and the more time required by a team of developers. The possible interactions are twofold: either (i) a certain rate of mutation of one component imposes some stress on the rest of the network, or (ii) a component reacts to this stress in order to adapt to a new situation. Therefore, the intensity of change of one component in the open source network, and its consequences on neighbouring nodes, can be measured: the reactions of all other components can be investigated by following the paths of dependencies. Again, to uncover causal dependencies, it is necessary to account for human timing. In this setting, a significant presence of loops is expected, due to the numerous cross dependencies existing between components. To account for them, a systematic measure of the coupling between the components should be made, with special care to determine whether the coupling between component changes is dampening (negative feedback loops) or rather reinforcing (positive feedback loops). This approach is valuable to predict in which parts of the code developers are going to intervene in order to maintain the homeostasis of the whole system. This view angle is closely related and complementary to measuring the pure dynamics of the activity of developers. Indeed, it incorporates a network dimension by asking the question: "In which parts of the code do developers tend to code together, and at the same time?" The self- (resp. mutually) excited conditional Poisson process can indeed be extended to several dimensions, including memory not only in time but also in space.
The two-dimensional self-excited conditional Poisson process is formulated with a bivariate memory kernel φ(t − t_i, r − r_i), with r the position in the considered space. In the case of open source software, r would be a file or a module in the source code of a project.

4.3 Beyond Zipf's Law and Proportional Growth

The proportional growth stochastic process is a "zero-order" mechanism in the sense that it can capture the general behavior of many self-organized complex systems, in particular innovation networks. However, a direct consequence of self-organization is the presence of correlations between innovations, each of them being challenged by the rewiring of several other components, with various consequences. In an ecosystem, the extinction of a species from a food web forces several other species to adapt and find other ways to sustain themselves. Obviously, innovation networks are not only correlated but also prone to important cascading effects that can affect, directly or indirectly, the connectivity between components. Therefore, gaining insight into rewiring cascades requires establishing robust statistical causal relations between all events at various micro- and mesoscales. Additionally, special attention should be devoted to the context (the local structure of the network) in which new innovations appear to fill a niche. For that, three research directions are proposed, according to a "zoom-in" strategy, in order to uncover the sub-mechanisms of proportional growth.

4.3.1 Deviations from Zipf's Law

While many complex systems are thought to be the result of proportional growth, some exhibit significant deviations from Zipf's law, which is a special power law p(x) ∼ 1/x^(1+µ) with exponent µ = 1. Saichev et al. [128] showed that, for this special exponent to appear, the following balance condition must hold,

r − h = d + c_0 ,   (4.2)

with r the growth rate of each component, h the hazard rate for death or dismissal, d the growth rate of new entrants and c_0 the growth rate of the sizes of newborn entities. Thus, the proportional growth must be balanced by the entry flow (creation) and the hazard rate (destruction). Furthermore, Malevergne et al. argue that Zipf's law reflects an economic optimum, hence a healthy, mature and sustainable ecosystem [129]. However, deviations from µ = 1 can be found for ecosystems that have not reached maturity yet. Recently, using a dataset from an online collaboration platform (see footnote 2) and counting the number of active users, Zhang and Sornette showed the example of a system with µ = 0.7 ± 0.1 < 1, which is the signature of an ecosystem where the power of the few dominates, while not enough new projects are created [130]. Therefore, deviations from Zipf's law can tell a lot about the "health" of an ecosystem.

Footnote 2: Amazee is a Zurich-based Internet company, which offers a platform for creating and maintaining collaboration projects.
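The role of the balance between proportional growth, entry and exit can be illustrated with a toy simulation. The sketch below (parameters are arbitrary; the resulting tail exponent depends on how well the balance condition (4.2) is satisfied) grows entities multiplicatively, removes them with a small hazard rate, injects newborn entities of unit size at each step, and then estimates the tail exponent µ of the resulting size distribution with a crude Hill estimator.

    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [1.0]
    hazard, n_new, steps = 0.01, 5, 2000     # exit rate, entrants per step, number of steps

    for _ in range(steps):
        s = np.array(sizes)
        s = s * np.exp(rng.normal(0.0, 0.15, size=s.size))   # proportional (Gibrat) growth
        s = s[rng.random(s.size) > hazard]                    # random exits
        sizes = list(s) + [1.0] * n_new                       # entry of newborn unit-size entities

    # crude Hill estimate of the tail exponent mu on the largest ~5% of entities
    tail = np.sort(np.array(sizes))[-max(10, len(sizes) // 20):]
    mu_hat = 1.0 / np.mean(np.log(tail[1:] / tail[0]))
    print(f"{len(sizes)} entities; Hill estimate of the tail exponent mu ≈ {mu_hat:.2f}")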
In terms of Internet innovation, this resonates with the claim that violation of the "end-to-end" argument (cf. Section 1.1), with more control exerted by large broadband providers, might prevent innovation by start-up firms (cf. Section 1.2). Assuming that we could precisely measure the size of Internet firms, looking at the distribution of their turnover (resp. number of patents awarded) and at the contribution of each size percentile of companies to the whole ICT economy would be useful to assert or to refute this claim. In the former case, one would find µ < 1, with the possible existence of "dragon-kings" [131], and in the latter case the exponent would be greater than or equal to 1. Therefore, Zipf's law, and deviations from it, can tell a lot about the state of an ecosystem. This measurement tool needs further empirical validation, but it would be invaluable for detecting clusters of new creation versus reuse in innovation networks.

4.3.2 Coexistence of Multiple Proportional Growth Regimes

At mesoscopic scales, behaviors can differ and exhibit different growth rates, with clusters that presumably grow faster than others. Creative destruction may also be measured at these scales. For instance, in the context of the recovery of Eastern economies after communism, Challet et al. [132] described the so-called "J-shape" of the economy, which reads

W(t) = W(t_0) [ f e^(λ₊ (t − t_0)) + (1 − f) e^(λ₋ (t − t_0)) ] ,   (4.3)

where W(t_0) is the initial GDP, f is the fraction of the economy that grows at rate λ₊, while the rest of the economy (1 − f) deflates at rate λ₋. This model rationalizes the repeating pattern of economies in reconversion. It can be generalized to an arbitrary number of fractions of the economy growing (resp. deflating) at different rates,

W(t) = Σ_{k=1}^{n} W(t_k) f_k e^(λ_k (t − t_k)) ,  with Σ_k f_k = 1 ,   (4.4)

where k denotes the regimes occurring in the various segments of the economy during the considered period, and t_k is the time of the change of regime for each k. However, a main difference is the asynchrony between shocks. Indeed, in the context of a shock such as a change of economic regime from socialism to capitalism, the growing and the declining components start at the same time, whereas in many situations decline is not necessarily synchronized with the emergence of a new economic sector.

4.4 Cyber Risk

Cyber risk is a major emerging threat against all socio-economic activities occurring on the Internet. Nowadays, the global network has become the spinal cord of modern societies, which heavily rely on it. However, the Internet has proven to be largely exposed to malicious attacks, which take advantage of unreliable software and, to some extent, of the naivety of people. In spite of this, exposure to cyber risk has remained hardly measurable, because of the complexity of the task, but not only. Over the years, the Internet has constantly exhibited serious weaknesses, which have prevented efficient security of operations and of information against malicious attacks. In the meantime, the Internet has become the backbone of modern economies, allowing real-time communication and transactions, social networking and ubiquitous computing, among many other invaluable tools. Unfortunately, there are tangible indications that the Internet will become a very significant conflict area. Presently, and arguably much more in the near future, criminals and military powers are building on innumerable flaws in software to gain power over all possible types of infrastructures, such as electric networks, telecommunications, GPS, sensitive energy producing plants and military complexes. In addition, politicians and interest groups are increasingly using the capabilities of social networks to influence populations. The world has recently witnessed an unprecedented acceleration of cyber threat intensity. In summer 2010, a malware was found in the control command of the newly built Iranian nuclear plant Bushehr I.
This worm of a new kind was designed to compromise SCADA (supervisory control and data acquisition) systems, which control a wide array of critical infrastructures including water treatment and distribution, oil and gas pipelines, and electrical power transmission and distribution. In the case of Bushehr I, it was established that this malware had the potential to send erroneous commands and provoke nothing less than the destruction of the nuclear plant. There are mainly two entry points for malicious operations on the Internet: (i) software vulnerabilities, which allow attackers to take control of computer infrastructures at various levels, and (ii) the capacity offered by the World Wide Web to spread information. The latter is concretely equivalent to reputation risk, regardless of whether the information is true or defamatory, laudatory or critical. All these operations are indeed closely tied to the evolution of the Internet, and more precisely to the mechanisms of software update by developers and users. Because of the evolutionary capabilities of these threats, monitoring cyber risks remains extremely difficult. Moreover, attacks can occur at any layer of the Internet. Therefore, anomaly detection must be implemented at the Internet Protocol layer in order to have a chance to capture all behaviors at higher levels.

4.4.1 The Innovative Nature of Cyber Risk

In a recent paper on Information War and Security Policies for Switzerland, it was reported that cyber wars would prosper on innovation [133]. In substance, threats also undergo a creative destruction process: some attacks that used to be relevant yesterday have been replaced by new ones, following the new functionalities (resp. new vulnerabilities) offered by innovation. Therefore, it should be anticipated that cyber risk landscapes will change quite fast. However, the link between the damage suffered and vulnerabilities is not really established yet. This gap actually prevents the development of actuarial models and, as a result, of insurance. Moreover, the economic and innovative dimensions play a fundamental role for Internet (in)security. We argue that Internet security requires a shift of paradigm from a pure – and somewhat naive – technical point of view ("design a secure system") toward a broad view that encompasses the contribution of humans – and their natural economic incentives – to the development of threats. In that sense, study 2 demonstrates the importance of user behaviors. Also, studies 1 and 3 are clearly related to how source code maintenance constrains software reliability.

Fig. 4.1: Schematic representation of a cyber risk landscape. Right to left: (i) New vulnerabilities happen randomly and concern a given number of computers; over time, the cumulative number of hosts hit by a vulnerability increases. (ii) To counterbalance vulnerabilities, patches are released; however, having all users install a patch is a long memory process, as shown in study 2. (iii) The net risk landscape is spiky but, nevertheless, the number of vulnerable hosts increases over time, again as a result of the long memory process.

4.4.2 Economics of Cyber Crime

To better forecast cyber crime and predict which targets attacks might focus on, the economic framework is insightful. Assuming that cyber criminals are rational agents and that the market cannot be completely addressed, they would invest in "low hanging fruits" rather than in attacks with marginal return.
Therefore, one can expect that the cybercrime community, at the aggregate level, measures risk versus potential return when engaging in any action, i.e. performs a cost-benefit analysis. Unfortunately, vulnerability landscapes evolve very fast, and so does the risk. Indeed, study 2 established the long memory process of updating. Figure 4.1 shows a representation of such a landscape, with spikes of vulnerabilities. In this context, a natural question arises and calls for further research: how do cyber criminals exploit these vulnerability spikes in order to maximize the number of compromised computers?

Figure 4.2 shows the complementary cumulative distribution function (ccdf) of vulnerabilities per software editor per year, presented in double logarithmic scale. Obviously, the ccdf is heavy-tailed, with a few companies (e.g. Microsoft, Oracle) accounting for the majority of the vulnerabilities discovered. More precisely, the ccdf is a power law, with exponents between 1.0 and 1.25. There are two possible explanations for this distribution. On the one hand, larger software might exhibit more vulnerabilities because maintenance costs increase with software complexity. On the other hand, the number of vulnerabilities may just reflect the market share of each software editor, which is very probably also heavy-tailed, since Zipf's law applies to the size distribution of firms, or to market shares. It can also be a combination of both effects, but the latter is more plausible, since vulnerabilities are abundant in code [96]. Again, deviation from Zipf's law with exponent equal to 1 tells us about the health of the vulnerability ecosystem [129]. Here, the exponent is equal to 1 around year 2000 and slowly increases over time, which shows that more software is concerned by fewer vulnerabilities.

Fig. 4.2: Distribution (rank ordering, unnormalized ccdf) of vulnerabilities per software editor per year [vulnerabilities/year]. The graph displays a straight line in double logarithmic scale, which is the signature of a power law. The slope slowly decreases from −1.0 ± 0.1 to −1.25 ± 0.1 over time (figure prepared with Stefan Frei).

4.4.3 Vulnerabilities versus Damage

In study 4, we showed that damage scales with the size of organisations, and some hypotheses have been proposed to explain why large organisations are the most exposed. However, the causal link between vulnerable software and real damage (e.g. financial loss, damage to reputation) has not been formally established. Qualitatively, this link can be established in many, but not all, cases. Among the best examples are botnets (see footnote 3): about 9% of the personal computers (PCs) in operation and connected to the Internet are infected by malware, and one third of these computers seem to be part of a botnet [134]. But the real damage incurred by the owners of compromised PCs remains unclear so far.

Footnote 3: Botnet: a network of computers – often compromised by malicious software – that run autonomously to perform one or several distributed tasks. Botnets are mainly used to send spam, conduct distributed denial of service (DDoS) attacks and crack passwords. They are also used for stealing identities on compromised hosts.

Concluding Remarks

The Internet is probably the most complex man-made adaptive system.
It is the result of innumerable contributions by individuals and organisations, with extraordinary heterogeneity and self-organization, with cooperation and intense competition. Because no central government had – or took – the power to decide what is good for everyone on the Internet, motivated contributors could freely invent and compete to make the best software. The community of hackers and their thirst for exploring new horizons have had two major consequences for the Internet: on the one hand, they have triggered unprecedented bottom-up code production; on the other hand, their capacity to "hack" into supposedly secure systems has been an early warning of the cyber attacks that we will experience more and more in the future, as the Internet becomes ubiquitous and critical to our societies. If no major disruption occurs, the Internet will evolve with more software being produced, along with more insecurity. The key issue will be to find an acceptable balance between innovation and security. On the one hand, the concern is certainly not having too much innovation, but rather having too much security. Western societies – at least their politicians, in their speeches – are obsessed with security, and while the Internet is a place of relative freedom and free speech for people, technology could certainly be used to prevent, or at least to reduce, free expression and at the same time innovation, which is the result of intensive knowledge reuse and ad-hoc social networks. Not only has China raised the level of control over the Internet in the past decade; many Western countries have done so as well. On the other hand, the dramatic lack of security opens the door to frightening cyber conflicts of a new kind, and even to the challenging of dominant states by a handful of individuals. Therefore, understanding the mechanisms of Internet evolution is crucial to sustain the best conditions for enjoying future innovation, and to allow people to live in peace with no limitation on the advantages gained from an open Internet.

In this thesis, five mechanisms have been explored: (i) knowledge reuse, which is ubiquitous on the Internet, (ii) long memory processes in human dynamics, (iii) collective action as a triggering process, (iv) the catastrophic nature of cyber risk and its implications, and (v) how to detect and characterize Internet anomalies. Although not everything has been understood yet, these results provide testable and tested mechanisms at work. Furthermore, they can all rather well be rationalized from an economic perspective. On the one hand, they emphasize the importance of human contributions and of work organisation based on technical mutualization (knowledge reuse) and human cooperation (collective action in open source software) for the evolution of the Internet. On the other hand, enormous technical and human contingencies make the Internet imperfect and somewhat insecure, forcing small and big Internet components to face their weaknesses and the subsequent dangers, and therefore to adapt in order to survive.

Bibliography

Introduction

[1] Internet Systems Consortium, http://www.isc.org/solutions/survey.
[2] Internet World Stats, http://www.internetworldstats.com/stats.htm.
[3] Black Duck Koders, http://www.koders.com/.
[4] Saltzer, J., Reed, D. and Clark, D. (1984), End-To-End Arguments in System Design, ACM Transactions on Computer Systems, 2, 277-288.
[5] van Schewick, B. (2010), The Architecture of Innovation, The MIT Press.
[6] Google App Inventor, http://appinventor.googlelabs.com/about/ [7] Zittrain, J. (2008), The Future of the Internet–And How to Stop It?, Yale University Press. [8] Levy, S. (2001), Hackers: Heroes of the Computer Revolution, Penguin Books, London. [9] Schneier, B. (2001), Full Disclosure, Crypto-Gram Newsletter, (http://www.schneier. com/crypto-gram-0111.html). [10] http://www.ohlo.net. [11] Antorini, Y. M. (2007), Brand community innovation - An intrinsic case study of the adult fans of LEGO community, Ph.D. Thesis Copenhagen, Copenhagen Business School. [12] Spaeth, S., Stuermer, M., and von Krogh, G. F. (2010), Enabling Knowledge Creation through Outsiders: Towards a Push Model of Open Innovation, International Journal of Technology Management 52(3/4), 411-431. [13] Lessig, L., (2006), Code: And Other Laws of Cyberspace, Version 2.0., Basic Books. [14] Reidenberg, J. (1998), Lex Informatica: The Formulation of Information Policy Rules Through Technology, Texas Law Review 76, 553. 107 Bibliography 108 Background [15] Murray, M., Claffy, K.C. (2001), Measuring the Immeasurable: Global Internet Measurement Infrastructure, Workshop on Passive and Active Measurements. [16] Zegura, E.W., Calvert, K.L., Donahoo, M.J. (1997), A Quantitative Comparison of Graph-Based Models for Internet Topology, IEEE ACM Transactions on Networking 5, 6. [17] Pastor-Satorras, R., Vespignani, A. (2004), Evolution and Structure of the Internet, Cambridge University Press. [18] Watts, D.J. and Strogatz, S.H. (1998), Collective Dynamics of “small-world” networks, Nature 393, 440-442. [19] Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999), On power-law relationship of the Internet topology, Comput. Commun. Rev.29, 251-263. [20] Erd¨ös, P. and Renyi (1959), On random graphs, Publicationes Mathematicae 6, 290297. [21] Barabási, A.-L., and Albert, R. (1999), Emergence of scaling in random networks, Science 286, 509-512. [22] Simon, H. A. (1955), On a class of skew distribution functions, Biometrika 52, 425-440. [23] Bak, P., Sneppen, K. (1993), Punctuated Equilibrium and Criticality in a Simple Model of Evolution, Phys. Rev. Letters 71, 24. [24] Carlson, J.M. and Doyle, J. (2000), Highly Optimized Tolerance: Robustness and Design in Complex Systems, Physical Review Letters 84, 2529-2532. [25] Sornette, D. (2004), Critical Phenomena in Natural Sciences 2nd ed., Springer Series in Synergetics, Heidelberg. [26] Albert, R., Jeong, H. and Barabási, A.-L. (2000), Error and attack tolerance of complex networks, Nature 406, 378-381. [27] Broido, A. and Claffy, K. (2002), Topological resilience in IP and AS graphs, http: //caida.org/analysis/topology/resilience/index.xml. [28] Doyle, J. et al. (2005), The “robust yet fragile” nature of the Internet, Proceedings of the National Academy of Sciences41, 14497-14502. [29] Stauffer, D. and Aharony, A. (1994), Introduction to Percolation Theory 2nd edn, Taylor & Francis, London. [30] Albert R., Jeong, H., Barábasi, A.-L. (1999), Internet - Diameter of the World-Wide Web. Nature401, 130-131. Bibliography 109 [31] Huberman, B.A., Adamic, L.A. (1999), Growth dynamics of the World-Wide Web, Nature 401, 130-131. [32] Bianconi, G. and Barábasi, A.-L. (2001), Competition and Multiscaling in Evolving Networks, Europhys. Lett. 54, 436-442. [33] Page, L. and Brin, S. and Motwani, R. and Winograd, T. (1999) The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab. [34] Huberman, B. A., Pirolli, P., Pitkow, J. & Lukose, R. M. 
(1998), Strong regularities in world wide web surfing, Science 280, 95- 97. [35] Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W. (2009) Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, Internet Mathematics 6, 29-123. [36] Newman M.E.J., Forrest, S. and Balthrop, J. (2002), Email networks and the spread of computer viruses, Phys. Rev. E. 66, 035101. [37] Zou, C.C. and Towsley, D. and Gong, W. (2005), Email worm modeling and defense, Proc. 13th IEEE Conf. on Computer Communications and Networks, 2004, 409–414. [38] Guimerá, R., Danon, L., Diaz-Guilera, A., Girault, F. and Arenas, A. (2003), Selfsimilar community structure in a network of human interactions, Phys. Rev. E 68, 6. [39] Eckmann, J.-P., E. Moses and D. Sergi (2010) Entropy of dialogues creates coherent structures in e-mail traffic, Proc. Nat. Acad. Sci. USA 101(40), 14333-14337. [40] Asur, S., Huberman, B. (2010), Predicting the Future With Social Media, http:// arxiv.org/1003.5699. [41] Saroui, S., Gummadi, P.K., and Gribble, S.D. (2002), A Measurement Study of Peerto-Peer File Sharing Systems, Proc. of Multimedia Computing and Networking. [42] Ripeanu, M., Foster, L. and Iamnitchi, A. (2002), Mapping the Gnutella Network: Properties of Large-scale Peer-to-Peer Systems and Implications for System Design, IEEE Internet Computing Journal 6, 50-57. [43] Smith, R.D. (2002), Instant Messaging as a Scale-Free Network, http://arxiv.org/ abs/condmat/0206378. [44] Leskovec, J., Horwitz, E (2008), Planetary-Scale Views on a Large Instant-Messaging Network, WWW’08. [45] Castronova, E. (2005), Synthetic Worlds: The Business and Culture of Online Games, Univ of Chicago Press, Chicago. [46] Castronova, E. (2006) On the research value of large games, Games Cult 1,163–186. [47] Bainbridge, W.S. (2007), The scientific research potential of virtual worlds, Science 317,472–476. Bibliography 110 [48] Szell, M. and Thurner S. (2010), Measuring social dynamics in a massive multiplayer online game, Soc. Netw.. [49] Szell, M., Lambiotte, R. and Thurner, S. (2010), Multirelational organization of largescale social networks in an online world, Proc. Nat. Acad. Sci. USA 107, 13636-13641. [50] Newman, M.E.J. (2003), The Structure and Function of Complex Networks, SIAM Review 45(2), 167-256. [51] Crowston, K., Howison, J. (2005), The social structure of open source software development, First Monday. [52] Spaeth, S. (2005), Coordination in Open Source Projects, HSG Dissertation no. 3110, University of Saint Gallen. [53] Lerner, J., Tirole J. (2001), The open source movement: Key research questions, European Economic Review 45, 819-826. [54] Simon, H. The Architecture of Complexity, Proc. Am. Phil. Society 106, 467-482, (1962) [55] Hartwell, L.H. et al., From molecular to modular cell biology, Nature, 402, (1999) [56] Baldwin, C.Y., Clark, K.B., (1997), Managing in an age of modularity. Harvard Business Review 75(5), 84–93. [57] Baldwin, C.Y., Clark, K.B., (1999), Design Rules: The Power of Modularity (Volume 1), MIT Press, Cambridge. [58] MacCormack, A. and Rusnak, J. and Baldwin, C.Y., (2006) Exploring the structure of complex software designs: An empirical study of open source and proprietary code, Management Science 52(7), 1015. [59] Challet, D. and A. Lombardoni (2004), Bug propagation and debugging in asymmetric software structures Phys. Rev. E 70, 046109. [60] Majchrzak, A. et al., (2004) Knowledge reuse for innovation, Management Science 50, 174-188. [61] Grant, R. 
(1996), Prospering in Dynamically-competitive Environments: Organizational Capability as Knowledge Integration, Organization Science 7(4). [62] Haefliger, S., von Krogh, G., Spaeth, S. (2008), Code Reuse in Open Source Software, Management Science 54(1), 180-193. [63] Tracz, W. (1995), Confession of a Used-Program Salesman: Lessons Learned, ACM SIGSOFT Software Engineering Notes 20, 11-13. [64] Lynex, A. and Layzell, P. (1998), Organisational considerations for software reuse, Annals of Software Engineering 5, 105-124. Bibliography 111 [65] von Krogh, G., Speath, S. and Haefliger, S. (2005), Knowledge reuse in Open Source Software: An Exploratory Study of 15 Open Source Projects, Proc. 38th Hawaii Int. Conf. System Sciences (HICSS’05). [66] Spaeth, S., Stuermer, M., Haefliger, S. and von Krogh, G. (2007), Sampling in Open Source Software Development: The Case for Using the Debian GNU/Linux Distribution, Proceedings of the 40th Annual Hawaii International Conference on System Science (HICSS’O7). [67] Baldwin, C.Y., Clark, K.B. (2007), The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?, Management Science 52, 1116-1127. [68] Vazquez, A., Oliveira, J. G. ,Dezso, Z. , Goh, K. I., Kondor, I. and Barabási, A.-L. (2006) Modeling bursts and heavy tails in human dynamics, Phys. Rev. E 73, 036127. [69] Oliveira, J.G. and Barabási, A.-L. (2005) Darwin and Einstein correspondence patterns, Nature 437, 1251. [70] Cobham, A., (1954), Priority assignment in waiting line problems, Operations Research 2(1), 70-76. [71] Barabási, A.-L. (2005), The origin of bursts and heavy tails in human dynamics, Nature, 435, 207. [72] Grinstein, G. and Linsker, R. (2006), Biased diffusion and universality in model queues. Phys. Rev. Lett. 97, 130201. [73] Grinstein, G. and Linsker, R. (2008), Power-Law and Exponential Tails in a Stochastic, Priority-Based, Model Queue, Phys. Rev. E 77, 012101. [74] Saichev, A. and Sornette, D., (2009) Effects of Diversity and Procrastination in Priority Queuing Theory: the Different Power Law Regimes, Phys. Rev. E 81, 016108. [75] Helmstetter, A., Sornette, D (2002), Subcritical and supercritical regimes in epidemic models earthquake aftershocks, J. Geophys. Res. 107 (B10), 2237, [76] Sornette, D., Helmstetter, A. (2003), Endogenous versus exogenous shocks in systems with memory, Physica A 318, 577-591. [77] Sornette, D., Deschatres, F., Gilbert, T. and Ageon, Y. (2004), Endogenous Versus Exogenous Shocks in Complex Networks: an Empirical Test Using Book Sale Ranking, Phys. Rev. Lett. 93, 228701. [78] Crane, R. and Sornette, D. (2008), Robust dynamic classes revealed by measuring the response function of a social system, Proc. Nat. Acad. Sci. USA 105(41), 15649-15653. [79] Crane, R., Schweitzer, F. and Sornette, D., (2010), New Power Law Signature of Media Exposure in Human Response Waiting Time Distributions, Phys. Rev. E 80, 056101. Bibliography 112 [80] Hawkes, A.G., Oakes, D. (1974) A cluster representation of a self-exciting process. J. Appl. Prob 11, 493-503. [81] Zhenyuan Zhao, J. P. Calderón, Chen Xu, Guannan Zhao, Dan Fenn, Didier Sornette, Riley Crane, Pak Ming Hui, and Neil F. Johnson (2010), Phys. Rev. E 81, 056107. [82] CSI/FBI Computer Crime and Security Survey, http://gocsi.com/survey. (yearly edition). [83] Anderson, K., Durbin, E. and Salinger, M. (2008), Identity Theft Journal of Economic Perspectives 2,171-192. [84] Swiss Re (2006), Swiss Re Corporate Survey 2006 Report. 
[85] Swiss Re (2007), Natural catastrophes and man-made disasters 2006, Sigma Report No. 2/2007.
[86] Anderson, R. and Moore, T. (2006), The Economics of Information Security, Science 314, 610-613.
[87] Majuca, R.P., Yurcik, W., Kesan, J. (2006), The Evolution of Cyberinsurance, http://arxiv.org/abs/cs.CR/0601020.
[88] Böhme, R., Kataria, G. (2006), Models and Measures for Correlation in Cyber-Insurance, 5th Workshop on the Economics of Information Security (WEIS).
[89] Mukhopadhyay, A. et al. (2006), e-Risk Management with Insurance: A framework using Copula aided Bayesian Belief Networks, Proc. HICSS'2006.
[90] Mukhopadhyay, A. et al. (2007), Insuring big losses due to security breaches through Insurance: A business model, Proc. HICSS'2007.
[91] Herath, H., Herath, T. (2007), Cyber-Insurance: Copula Pricing Framework and Implications for Risk Management, 6th Workshop on the Economics of Information Security (WEIS).
[92] Frei, S., May, M., Fiedler, U., Plattner, B. (2006), Large Scale Vulnerability Analysis, ACM SIGCOMM 2006 Workshop.
[93] Radianti, J., Gonzalez, J. (2007), A Preliminary Model of the Vulnerability Black Market, 25th Int. System Dynamics Conference, Boston.
[94] Frei, S., Schatzmann, D., Plattner, B., Trammell, B. (2009), Modelling the Security Ecosystem - The Dynamics of (In)Security, Workshop on the Economics of Information Security (WEIS).
[95] Frei, S. (2009), Security Econometrics - The Dynamics of (In)Security, DISS. ETH NO. 18197.
[96] Frei, S. et al. (2010), Software Vulnerabilities are Abundant, submitted.
[97] Snow, B. (2005), We need assurance!, Computer Security Applications Conference.
[98] Frei, S., Duebendorfer, T., Plattner, B. (2009), Firefox (In)Security Update Dynamics Exposed, ACM SIGCOMM Computer Communication Review.
[99] Challet, D., Solomon, S. and Yaari, G. (2009), The Universal Shape of Economic Recession and Recovery after a Shock, Economics: The Open-Access, Open-Assessment E-Journal 3, 36.
[100] Schneier, B. (2008), The Psychology of Security, Proc. Cryptology in Africa (AFRICACRYPT 2008).
[101] Jacobson, V., Leres, C., McCanne, S. (1989), tcpdump, Lawrence Berkeley Laboratory, Berkeley, CA.
[102] Paxson, V. and Floyd, S. (1995), Wide Area Traffic: The Failure of Poisson Modeling, IEEE/ACM Transactions on Networking 3, 601-615.
[103] Willinger, W., Paxson, V., Taqqu, M.S. (1998), Self-similarity and Heavy Tails: Structural Modeling of Network Traffic, Statistical Techniques and Applications.
[104] Internet Traffic Archive (1995), http://ita.ee.lbl.gov/html/traces.html.
[105] Mandelbrot, B.B. (1969), Long run linearity, locally Gaussian processes, H-spectra and infinite variances, Intern. Econom. Rev. 10, 82-113.
[106] Park, K., Kim, G. and Crovella, M. (1996), On the relationship between file sizes, transport protocols, and self-similar network traffic, Proc. IEEE International Conference on Network Protocols, 171-180.
[107] Ohira, T. and Sawatari, R. (1998), Phase Transition in a Computer Network Traffic Model, Phys. Rev. E 58, 193-195.
[108] Takayasu, M., Fukuda, K. and Takayasu, H. (1999), Application of Statistical Physics to the Internet Traffics, Physica A 274, 140-148.
[109] Fukuda, K., Takayasu, H., Takayasu, M. (2000), Origin of Critical Behavior in Ethernet Traffic, Physica A 287, 289-301.
[110] Solé, R.V. and Valverde, S. (2001), Information Transfer and Phase Transitions in a Model of Internet Traffic, Physica A 289, 595-605.
[111] Valverde, S. and Solé, R.V. (2002), Self-organized Critical Traffic in Parallel Computer Networks, Physica A 312, 636-648.
[112] Thompson, K., Miller, G.J., Wilder, R. (1997), Wide-Area Internet Traffic Patterns and Characteristics, IEEE Network.
[113] Moore, A.W., Zuev, D. (2005), Internet Traffic Classification using Bayesian Analysis Techniques, Proc. ACM SIGMETRICS'05.
[114] Barford, P., Kline, J., Plonka, D., Ron, A. (2002), A Signal Analysis of Network Traffic Anomalies, Proc. 2nd ACM SIGCOMM Workshop on Internet Measurement.
[115] Lakhina, A., Crovella, M., Diot, C. (2004), Diagnosing Network-Wide Traffic Anomalies, ACM SIGCOMM Computer Communication Review 34.
[116] Clark, D. (2008), The Internets we did not build, Talk at IPAM (UCLA), available at http://www.ischool.berkeley.edu/newsandevents/events/sl20090304.
[117] Clean Slate, Stanford University, http://cleanslate.stanford.edu/about_cleanslate.php.

Discussion & Perspectives

[118] Schumpeter, J. (1934), Theory of Economic Development, Harvard University Press, Cambridge.
[119] Ostrom, E. (2007), Collective action and local development processes, Sociologica, doi:10.2383/25950.
[120] Rosenberg, S.M. (2001), Evolving responsively: adaptive mutation, Nature Reviews Genetics 2, 504-515.
[121] Galhardo, R.S., Hastings, P.J., Rosenberg, S.M. (2007), Mutation as a stress response and the regulation of evolvability, Crit. Rev. Biochem. Mol. Biol. 42(5), 399-435.
[122] Dunbar, R.I.M. (1998), The social brain hypothesis, Evol. Anthropol. 6, 178-190.
[123] Axelrod, R. (1984), The evolution of cooperation, Basic Books.
[124] Ohtsuki, H., Hauert, C., Lieberman, E., Nowak, M.A. (2006), A simple rule for the evolution of cooperation on graphs and social networks, Nature 441, 502-505.
[125] Hanaki, N., Peterhansl, A., Dodds, P.S. and Watts, D.J. (2007), Cooperation in evolving social networks, Management Science 53, 1036-1050.
[126] Liniger, T.J. (2009), Multivariate Hawkes Processes, PhD Diss. ETH No. 18403, ETH Zurich.
[127] Saichev, A. and Sornette, D. (2011), Multivariate Self-Excited Epidemic Processes, working paper.
[128] Saichev, A., Malevergne, Y. and Sornette, D. (2009), Theory of Zipf's Law and Beyond, Lecture Notes in Economics and Mathematical Systems 632, Springer.
[129] Malevergne, Y., Saichev, A. and Sornette, D., Maximum sustainable growth diagnosed by Zipf's law, submitted to American Economic Review (http://ssrn.com/abstract=1083962).
[130] Zhang, Q. and Sornette, D. (2010), Predicted and Verified Deviation from Zipf's Law in Growing Social Networks, submitted, http://arxiv.org/abs/1007.2650.
[131] Sornette, D. (2009), Dragon-Kings, Black Swans and the Prediction of Crises, International Journal of Terraspace Science and Engineering 1(3), 1-17.
[132] Challet, D., Solomon, S. and Yaari, G. (2009), The Universal Shape of Economic Recession and Recovery after a Shock, Economics: The Open-Access, Open-Assessment E-Journal 3, 36.
[133] Vernez, G. (2009), Information Warfare and National Security Policy in Switzerland, MAS ETH Security Policy and Crisis Management.
[134] BBC article on Vinton Cerf's WEF talk, http://news.bbc.co.uk/2/hi/business/6298641.stm (06.01.2009).