
DISS. ETH NO.
MECHANISMS OF INTERNET EVOLUTION
& CYBER RISK
A dissertation submitted to
ETH ZURICH
for the degree of
Doctor of Sciences
presented by
THOMAS-QUENTIN MAILLART
MSc EPFL
born on January 6th 1981 in Colmar, France
Citizen of France
accepted on the recommendation of
Prof. Dr. Didier Sornette, examiner
Prof. Dr. Georg von Krogh, co-examiner
Prof. Dr. Stefan Bechtold, co-examiner
2011
Summary
The Internet is probably the greatest communication tool yet invented. Most of its present functionalities have been designed by a multitude of entities – individuals, companies, universities, governments – with no central organization. This bottom-up organization has deep implications for the evolution of the Internet itself. In this thesis, the mechanisms of Internet development are investigated, in particular the individual and collective contributions to the most complex and adaptive man-made system ever built.
Most Internet innovations have been achieved through software development, which consists partly of original work and often of the reuse of existing source code written by others. Overall, software forms a complex directed network of modules that require other modules to work. The connectivity of this network is found to follow Zipf's law, a ubiquitous empirical regularity observed in many natural and social systems and thought to result from proportional growth. We establish empirically the usually assumed ingredients of the stochastic proportional growth models that have been conjectured to be at the origin of Zipf's law. For that, we use exceptionally detailed data on the evolution of open source software packages in the Debian Linux distribution, which offers a remarkable example of a growing, complex, self-organizing adaptive system. The creation of new packages and the exit of obsolete ones characterize the Schumpeterian nature of knowledge reuse in software and, as a result, in the development of the Internet.
The evolution of the Internet is also bounded by its interactions with the humans who shape it. As for many technological, economic and social phenomena, the Internet is controlled by how humans organize their daily tasks in response to both endogenous and exogenous stimuli. Queueing theory is believed to provide a generic explanation for the often observed power-law distributions of waiting times before a task is fulfilled. However, the general validity of the power law and the nature of other regimes remain unsettled. We identify several additional regimes characterizing the time required for a population of Internet users to execute a given task after receiving a message, such as updating a browser. Depending on the under- or over-utilization of time by the population of users and the strength of their response to perturbations, the pure power law is found to be coextensive with an exponential regime (tasks are performed without too much delay) and with a crossover to an asymptotic plateau (some tasks are never performed). Thus, characterizing the availability and efficiency of humans in their interactions with Internet systems is key to understanding and predicting its future evolution.
Among the individuals who shape the Internet, programmers are particularly important because they produce the software that enables new functionalities. This work often requires many developers to cooperate in order to find the best designs and correct mistakes. Their work is therefore a locus of intense exchange and interaction. In particular, open source software plays a crucial role in the development of new applications in a self-organized manner. The production of collective goods often requires tremendous efforts over long periods before becoming relevant and useful. Open source software development can be modeled as a self-excited conditional Poisson process, in which past actions trigger – with some probability and memory – future actions and the joining of new developers. In many large – and successful – projects, these open source “epidemics” are found to be critical, hence just active enough to be sustainable.
The main drawback of self-organization is the possibility for some people to develop malicious software and use it with criminal intent. While the Internet brings useful innovation, it is also a land of risks and uncertainty. To understand cyber risk mechanisms as a component of Internet evolution, their statistical properties have to be established. Cyber risk exhibits a stable power-law tail distribution of damage, proxied by personal identity losses. There is also evidence for a size effect, such that the largest possible losses per event grow faster than linearly with the size of the targeted organisations.
From a risk management perspective, it would be desirable to have proper infrastructures for monitoring the evolution of the Internet. The Internet is a complex social world in which people and organisations engage in intense communication and thus generate numerous information transactions. Formally, these transactions can be tracked at the Internet Protocol (IP) level, for instance with a “sniffer” on the link. While gathering, cleaning and storing these data is already an issue, analyzing them is an even greater challenge. From an Internet security perspective, finding and characterizing security anomalies is cumbersome work that cannot scale to the terabytes of data generated by large-scale networks. For that purpose, a generalized entropy method called the Traffic Entropy Spectrum (TES) has been developed and patented. It allows straightforward visual recognition of security anomalies as well as machine-learning classification. TES is a convenient tool for real-time Internet security monitoring. In the future, it could also be used for social monitoring at the IP transmission level, for instance to better assess future Internet infrastructure requirements.
Résumé
Internet est probablement le meilleur outil de communication qui ait été inventé. La plupart de ses fonctionnalités ont été conçues par une multitude d’entités – personnes, entreprises, universités, gouvernements – sans organisation centralisée. Cette approche “bottom-up” a des implications profondes sur l’évolution d’Internet. Dans cette thèse, les mécanismes du développement d’Internet sont explorés, en particulier les contributions individuelles et collectives au système le plus complexe et adaptatif que l’Homme ait achevé.
Sur Internet, la plupart des innovations ont été faites à travers le développement de logiciels,
qui consiste en un mélange de travail original et de réutilisation de code source existant
déjà fait par d’autres. L’univers du logiciel forme donc un réseau complexe et dirigé de
modules qui ont besoin d’autres modules pour fonctionner. La connectivité de ce réseau a la caractéristique de suivre une loi de Zipf, qui est une régularité empirique omniprésente que l’on trouve dans une multitude de systèmes naturels et sociaux, et dont on pense qu’elle est le résultat d’un mécanisme d’accroissement proportionnel. Nous avons établi la validité de cette loi comme résultat d’un processus stochastique multiplicatif, ainsi que l’avait conjecturé la communauté scientifique.
scientifique. Pour cela, nous avons utilisé des données détaillées sur l’évolution des modules
“open source” qui forment la distribution Debian Linux, et qui est un exemple remarquable
de système émergent et adaptatif. La création de nouveaux modules et la disparition de
ceux devenus obsolètes caractérise la “destruction créatrice” – selon Schumpeter – de la
réutilisation du savoir dans l’univers logiciel, et en conséquence pour le développement
d’Internet.
L’évolution d’Internet est aussi contrainte par l’interaction avec les hommes qui le façonnent.
Comme pour beaucoup de phénomènes technologiques, économiques et sociaux, l’Internet
est contrôlé par l’organisation et la gestion des tâches quotidiennes en réponse à la fois à
des stimulations endogènes et exogènes. La théorie des files d’attente semble fournir une
réponse générique aux distributions en loi de puissance des temps d’attente avant qu’une
tâche ne soit complètement exécutée. Cependant, la validité de la loi de puissance et
la nature d’autres régimes observés n’est pas complètement établie. Nous avons identifié
l’existence de régimes supplémentaires qui caractérisent le temps requis pour une population
d’utilisateurs d’Internet pour exécuter une certaine tâche, comme par exemple mettre à jour
le navigateur Web. En fonction de la sous- ou sur-utilisation du temps par la population
d’utilisateurs et l’intensité de leur réponse aux perturbations, le régime pur en loi de
puissance peut co-exister avec un régime exponentiel dans lequel les tâches sont réalisées
sans trop de délai. Il peut aussi co-exister avec une déviation asymptotique vers un plateau.
Dans ce cas, certaines tâches ne sont jamais exécutées. Dans tous les cas, la disponibilité
et l’efficacité des hommes dans leurs interactions avec Internet sont une clé pour comprendre
et prédire son évolution future.
Parmi les personnes qui façonnent Internet, les programmeurs ont une importance
particulière car ils produisent les logiciels qui permettent les nouvelles fonctionnalités. Ce
travail requiert souvent que les développeurs travaillent ensemble pour trouver les meilleurs
designs mais aussi pour corriger les erreurs. C’est pourquoi leur travail est un creuset d’échange et d’interaction intenses. En particulier, les logiciels open source jouent un rôle crucial pour l’émergence de nouvelles applications. La production de biens collectifs requiert souvent des efforts très importants sur de longues périodes pour devenir utiles. Le
développement de logiciels open source peut être modélisé comme un processus de Poisson conditionnel auto-excité, dans lequel les actions passées déclenchent – avec une certaine probabilité et mémoire – les actions futures et l’engagement de nouveaux développeurs.
Dans beaucoup de grands projets open source qui sont des réussites, ces épidémies de
développement sont en régime critique, c’est-à-dire assez actives pour durer.
Le principal problème avec l’auto-organisation est la possibilité pour certains d’écrire
des logiciels malveillants et de les utiliser à des fins criminelles. Bien qu’Internet
offre des innovations utiles à chacun, le réseau est aussi un monde incertain et risqué.
Pour comprendre les mécanismes du cyber-risque comme une composante de l’évolution
d’Internet, ses propriétés statistiques doivent être établies. Le risque cyber est caractérisé
par une distribution des dommages (approximés par les vols d’identité) avec une queue
en loi de puissance. On trouve aussi l’existence d’un effet de taille, en cela que les plus
grandes pertes possibles par événement croissent de manière super-linéaire avec la taille des
organisations visées.
Dans une perspective de gestion des risques, on aimerait avoir des infrastructures adéquates
pour surveiller l’évolution d’Internet. Internet est un monde complexe et social avec des gens
et des organisations qui communiquent de manière intense et qui par conséquent génèrent
un grand nombre de transactions sur le réseau. D’un point de vue formel, ces transactions
peuvent être tracées au niveau du protocole Internet (IP), par l’intermédiaire d’un “sniffer”
sur le câble de réseau. Cependant, acquérir, nettoyer et stocker les données de surveillance
est déjà un défi en soi, mais les analyser est un autre challenge. En ce qui concerne la sécurité
d’Internet, trouver et caractériser les anomalies est un travail très difficile qui ne peut pas
être effectué à l’échelle des grands réseaux de communication à des coûts raisonnables.
Pour cela, nous avons développé et breveté une méthode basée sur l’entropie généralisée,
appelée Traffic Entropy Spectrum (TES). Elle permet une reconnaissance visuelle rapide
ainsi qu’une classification automatique des anomalies. TES est un outil efficace pour le
monitoring de la sécurité en temps réel. Dans le futur, TES pourrait aussi être utilisé
pour mieux comprendre les comportements sociaux en prenant l’information au niveau de
la couche de transmission IP. Une application pourrait être de mieux anticiper les besoins
futurs en infrastructures pour Internet.
Acknowledgements
This thesis was carried out under the supervision of Didier Sornette and the co-supervision of Georg von Krogh, to whom I owe my sincerest gratitude for their outstanding academic coaching. ETH Zurich has been a very nice environment, with opportunities for exchange and cross-fertilization of ideas across disciplines and for intense “self-organized” collaboration.
Part of this thesis has been supported by the Swiss National Science Foundation, the
D-MTEC Foundation and the Centre for Coping with Crises in Socio-economic Systems
(CCSS). The received financial support is gratefully acknowledged.
In chronological order, thanks go first to my parents Roselyne and Jean-Claude Maillart, who supported and sponsored all the studies that would later allow me to engage in PhD studies. I also acknowledge the early exposure to risk management and insurance provided by Lauren Clarke. Before joining ETH Zurich, while running a cyber security business, I had the chance to meet Patrick Amon and Arjen Lenstra, whose work made me understand that some problems are very hard to solve and require patience in addition to intense commitment. This thesis would not even have started without my friend Marc Vogt, who arranged an improbable initial interview with Didier Sornette.
My thoughts go also to the whole Entrepreneurial Risks group, especially to Heidi Demuth for making paperwork seem less complex than it is, to Georges Harras and Moritz Hetzer for intense discussions, and to Ryan Woodard and Maxim Fedorovsky for teaching me Python and for their support with computer-related problems. Special thanks go to Gilles Daniel and Riley Crane, who were the social engine when I joined the Chair of Entrepreneurial Risks in 2007. I will remember great moments of discussion with the people at the Chair of Strategic Management and Innovation, at the coffee machine and at the Monte Verità conference in 2010. I also acknowledge the help of numerous master students, in particular Thomas Frendo for his work on open source software. I would like to thank the team of Bernhard Plattner at the Electrical Engineering department – namely Daniela Brauckhoff, Stefan Frei and Bernhard Tellenbach – who warmly welcomed fruitful collaborations. My thoughts go also to Thomas Duebendorfer at Google Switzerland. Thanks go also to Barbara van Schewick, who invited me to visit the Center for Internet and Society at Stanford University in 2009.
These three years of learning research would never have been so rich without the passionate supervision and friendship of Didier Sornette, who taught me far more than I could have expected. In particular, I learned that no job can be better than when it is done with joy and playfulness.
Special thanks go to Marie Schaer for her love and support. Her passion for research helped
me seriously consider starting a PhD. Her academic experience has also been a precious guide for avoiding many of the pitfalls that inevitably arise over time.
vii
viii
Vita
1981 – 1999   Born and raised in Colmar, France
1997 – 1999   Scientific baccalaureate, Lycée Bartholdi, Colmar, France
1999 – 2005   Master of Science in Civil Engineering, EPFL, Lausanne
2002 – 2003   Exchange student, Technische Universität (TU) Berlin
2005 – 2006   Project Manager, ilion Security S.A., Geneva
2006 – 2007   Co-Founder and Manager, IRIS Solution S.A., Geneva
2007 – 2011   Ph.D. candidate and teaching assistant, Department of Management, Technology and Economics, ETH Zurich
Contents
Summary                                                           iii
Résumé                                                              v
Acknowledgements                                                  vii
Vita                                                               ix

1 Introduction                                                      1
  1.1 The “End-to-End” Argument                                     2
  1.2 Emergence of Applications                                     3
  1.3 The Hacker Community                                          4
  1.4 Code is Law                                                   7
  1.5 Research Question                                             7

2 Background                                                        9
  2.1 Measuring and Modeling the Global Internet                   10
      2.1.1 Measuring the Global Internet                          10
      2.1.2 Internet Models                                        11
      2.1.3 Internet Robustness                                    12
      2.1.4 Virtual and Social Networks                            12
  2.2 Modularity & Knowledge Reuse                                 13
      2.2.1 Modularity and Industrial Design                       14
      2.2.2 Knowledge Reuse: A Complex Modular System              15
  2.3 Individual and Collective Human Dynamics                     17
      2.3.1 Human Timing                                           17
      2.3.2 Collective Behaviors                                   18
  2.4 Internet Security & Cyber Risk                               22
  2.5 Monitoring Internet Evolution                                24
      2.5.1 Self-Similarity of Traffic Time Series                 24
      2.5.2 Network Anomaly Detection                              25

3 Discussion & Perspectives                                        27
  3.1 Main Contributions                                           28
  3.2 Human Contributions and Dynamics of Innovation               29
      3.2.1 Multivariate Hawkes Processes                          29
      3.2.2 Micro Mechanisms of Cooperation in Empirical Data      30
      3.2.3 Code Mutations to Measure Probability of Innovation Occurrence   31
  3.3 Beyond the Zipf's Law and Proportional Growth                32
      3.3.1 Deviations from the Zipf's Law                         32
      3.3.2 Coexistence of Multiple Proportional Growth Regimes    33
  3.4 Cyber Risk                                                   34
      3.4.1 The Innovative Nature of Cyber Risk                    34
      3.4.2 Economics of Cyber Crime                               35
      3.4.3 Vulnerabilities versus Damage                          37

Concluding Remarks                                                 39
Chapter 1
Introduction
The Internet is an amazing communication system, which has taken a prominent place in people's lives. With Arpanet – the first implementation of the Internet – long-range communication was established between the University of California, Los Angeles (UCLA) and the Stanford Research Institute on October 29, 1969. More than twenty years later, in 1992, the U.S. Congress allowed commercial activity on a network that had until then mainly been used for educational and research purposes. By 2010, the Internet had become a network of more than 700 million active computers (see Figure 1.1) [1] and almost 2 billion users who constantly search, exchange and produce information, share pictures and videos, buy and sell goods, etc. [2]. To enable all these functionalities, several billion lines of source code1 have been produced, often by companies but also by individuals or groups of individuals (see [3] for an estimation of open source code).
Furthermore, in many cases, major innovations have emerged from the action of highly motivated individuals. At each technological step, major companies have been forced to adapt or to lose significant influence on the information and communication technology (ICT) market due to unexpected competition. For instance, the invention and rise of Google as a new search engine in the late 1990s reshuffled the ICT market in only a few years. Nowadays, Facebook is increasingly challenging Google. How is it possible that David so often beats Goliath? Is it a matter of chance only, or are there rules that trigger this massive bottom-up innovation? It can be shown that some formal and informal rules, set at the early stages of Internet development and later on, enabled the innovation potential of the Internet.
1 The source code is any collection of statements or declarations written in some human-readable computer programming language. Source code is the means most often used by programmers to specify the actions to be performed by a computer.
Fig. 1.1: Super-linear evolution of computers connected to the Internet, recorded by the Internet
Systems Consortium [1].
1.1 The “End-to-End” Argument
When the Internet was being designed in the seventies, much discussion was going on about how communication between computers should be handled above the physical link (e.g. ethernet cables, phone lines). A very important problem at the time was to cope with datagram2 transmission errors due to the lack of reliability. At first, it was thought that the routing protocol, which handles datagram transmission, should also include error control. This protocol was called the Internetwork Transmission Control Protocol (ITCP); it included both datagram transfer and error control features. In January 1978, the decision was taken to split the two functionalities into two layers. The Internet Protocol (IP) would be dedicated to routing packets on the network in a hop-by-hop manner, through a succession of routers. An additional layer – the transport layer – was created, with the possibility of implementing several protocols. Among them, two protocols are widely used today: the Transmission Control Protocol (TCP) handles error control, while the User Datagram Protocol (UDP) offers an unreliable datagram service but faster transmission. The main reason for not implementing systematic error control was indeed performance: TCP requires a dialog between the sending and destination hosts to acknowledge proper transmission, while UDP just sends packets with no control. Figure 1.2 shows a schematic representation of the layered architecture of the Internet, with its characteristic “hourglass” form and the Internet Protocol (IP) as the unique point of communication between multiple physical links and multiple transport protocols and applications.
2 Datagram: basic transfer unit associated with a packet-switched network in which the delivery, arrival time and order are not guaranteed. A datagram consists of header and data areas, where the header contains information sufficient for routing from the originating equipment to the destination without relying on prior exchanges between the equipment and the network. The source and destination addresses as well as a type field are found in the header of a datagram.
A few years later, the Internet “founding fathers” realized they had achieved a fundamental design principle, called the “end-to-end” argument, which states that application-specific functionality usually cannot – and preferably should not – be implemented in the lower layers of the network, the network's core. Instead, a function should only be implemented in a network layer if it can be completely and correctly implemented at that layer and used by all clients of that layer [4].
Fig. 1.2: Internet Layering
In summary, the more specific a function, the higher in the layering of the Internet architecture it should be implemented, so that the whole system is not constrained by rules that do not concern all stakeholders. This design principle has fundamental consequences for the use of the Internet. For instance, the impossibility of discriminating packets according to their function, origin and destination has, until recently, enforced equal treatment of all stakeholders, from individuals to the largest companies and organisations3. Thus, the “end-to-end” argument is thought to be the most fundamental design choice of the Internet, which has paved the way for massive bottom-up Internet application development.
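To make the consequences of this layering concrete at the programming level, the sketch below contrasts TCP's connection-oriented, acknowledged byte stream with UDP's unacknowledged datagrams, using Python's standard socket module. It is a minimal illustration only: the loopback address and port are arbitrary and assume that some listener is running there.

```python
import socket

HOST, PORT = "127.0.0.1", 9999   # illustrative endpoint, assumed to be listening

# TCP: the stack performs a handshake, acknowledges segments and retransmits
# lost ones, so the application sees a reliable, ordered byte stream.
def tcp_send(payload: bytes) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))   # three-way handshake with the destination host
        s.sendall(payload)        # delivery and ordering are handled by TCP

# UDP: each datagram is handed to IP as-is; there is no handshake, no
# acknowledgement and no retransmission -- coping with loss is left to the
# application, in exchange for lower latency.
def udp_send(payload: bytes) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, (HOST, PORT))
```

The error control visible here – a handshake and transparent retransmission versus fire-and-forget datagrams – is precisely the functionality that was deliberately kept out of IP and pushed toward the end hosts.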
1.2 Emergence of Applications
While the Internet was being developed, a computer revolution happened. In the late
1970s and early 1980s, the first personal computers were commercialized and popularized.
For the first time, anybody could buy a computer, learn programming languages (mainly
3 Outlawing discrimination on the Internet is the goal of the proponents of network neutrality, a proposal that has raised heated debate in the U.S. Congress over the last decade and is still unsettled. Their main argument is that discrimination would drastically reduce the opportunities for bottom-up innovation [5].
BASIC and FORTRAN at the time) and write executable programs at home. Somewhat like the hourglass structure of the Internet, the MS-DOS operating system, and later Microsoft Windows, provided a platform on which software could easily be written by third parties. Furthermore, portability eased the diffusion of software along with the tremendous growth of the personal computer market (see Figure 1.1).
Nowadays, we witness a similar explosion of mobile applications along with the spread of IP-enabled smartphones. The prominent provider to date is the Apple App Store for the iPhone and subsequently the iPad, which started in July 2008 with 500 applications available for download. By September 2010, this number had exploded to more than 250'000 applications. The strategy is even to lower the entry barriers to program development and maximize portability. In order to attract more developers, Google has recently launched the Google App Inventor for those “who are not developers” [6].
The emergence of bottom-up produced applications and even devices has been thoroughly documented by Zittrain [7]. He also explained why bottom-up development of applications is threatened by various industry arguments, in particular security and copyright enforcement. However, reducing the success of the “generative Internet” to technical features would be very reductive. Source code gets written because people – alone and within organisations – commit to writing it, and because there is a need.
1.3 The Hacker Community
The term “hack” was probably first introduced at the Massachusetts Institute of Technology (MIT) Tech Model Railroad Club (TMRC), one of the most famous model railroad clubs in the world. The club's members shared a passion for finding out how things worked and then mastering them. Members had a self-governing social organisation and disliked authority. In particular, they were initially drawn to the first multi-million dollar computers that were installed in the late 1950s, with the goal of understanding how they worked. Many of these hackers were freshmen or even high-school students, who used to spend nights coding and considered lectures a waste of time. The hacking culture then spread in the academic community and became very strong at the University of California, Berkeley, Stanford University, and Carnegie Mellon University in Pittsburgh. This community was disparate and many subcultures existed, but they all shared some common traits: creating and sharing software and hardware, placing a high value on freedom of inquiry, hostility to secrecy, information-sharing as both an ideal and a practical strategy, upholding the right to fork4, an emphasis on rationality, distaste for authority, and playful cleverness. In the seventies, the first personal computers were invented by the “hardware hacking” movement
4 Fork: In software engineering, a project fork happens when developers take a legal copy of source code from one software package and start independent development on it, creating a distinct piece of software. The term fork derives from the use of the term in computer operating systems, where it refers to the creation of a copy of a running program (process) by forking a process, which creates two identical processes (like cell division in living systems); the two are then independent, and may proceed to do different tasks, as the program dictates (Wikipedia).
and the Homebrew Computer Club in Stanford. Steve Wozniak's Apple I – sold at a price of $666 in 1976 – was the first computer to be adopted by households [8].
Part of the hacker's soul was the imperative need to explore how things worked, with a preference for inaccessible things, like the phone companies' networks or simply secrets kept behind closed doors. As a result, many hackers have been involved in breaking into systems, not for criminal purposes, but rather to challenge security as a kind of illegitimate authority, to satisfy their own curiosity and to learn how hidden things work. However, when the Internet started to democratize in the nineties, with companies and individual users exchanging emails and surfing the nascent World Wide Web, the first serious security issues appeared, with massive worm (self-replicating virus) and denial-of-service attacks. While some of these attacks were perpetrated with malicious intent, most were designed to demonstrate the lack of security of companies and organisations, as well as flaws in software, which was mainly closed source at the time. Following the first large-scale attacks, some hackers were sued for having broken into private – and supposedly secure – systems (see the Hacker Manifesto reproduced on the next page). For hackers, the concept of closed source software has always been against their ideal, but their main claim against it was that secrecy prevented auditing and thus improvement toward the reduction of vulnerabilities. Software vendors claimed that code secrecy and vulnerability non-disclosure would prevent the finding and exploitation of flaws, while hackers have been proponents of full disclosure, which means that when a vulnerability is found, it should be made publicly available, thus forcing software vendors to react fast, develop patches and deploy them to their customers [9]. Nowadays, the full-disclosure strategy has been widely accepted as the best way to reduce software vulnerabilities.
In 1985, in order to promote the already declining hacking culture, Richard Stallman – the last “true hacker” – created the Free Software Foundation and popularized the concept of copyleft, a legal mechanism to protect the modification and redistribution rights for free software, which has enabled a regulatory framework for Free/Libre/Open Source Software, thereafter called Open Source Software (OSS). The GNU General Public License (GPL) was written in 1989 and ensures the freedom to reuse (copyleft), with viral propagation to works derived from GPL-licensed originals, which can only be distributed under the same license terms. As of today, more than 500'000 OSS projects have been referenced [10].
Beyond the ideology, the hacker culture has been widely recognized as useful even for commercial purposes. When Lego Inc released their Mindstorm – a line of programmable robotics toys compatible with traditional Lego bricks – they first targeted kids as the primary market. After a few months, they discovered not only that engineers in Silicon Valley had widely adopted it, but also that their hacks significantly enhanced its performance. A user community was born, which helps the development and improvement of Mindstorm products [11, 12]. Nowadays, the hacking culture is predominant in Internet startups, and recruitment is tailored to attract such profiles, with comfortable monetary incentives and dedicated working environments.
The Hacker Manifesto
by
+++The Mentor+++
Written January 8, 1986
Another one got caught today, it's all over the papers. "Teenager Arrested in Computer
Crime Scandal", "Hacker Arrested after Bank Tampering"...
Damn kids. They're all alike.
But did you, in your three-piece psychology and 1950's technobrain, ever take a look
behind the eyes of the hacker? Did you ever wonder what made him tick, what forces
shaped him, what may have molded him?
I am a hacker, enter my world...
Mine is a world that begins with school... I'm smarter than most of the other kids, this
crap they teach us bores me...
Damn underachiever. They're all alike.
I'm in junior high or high school. I've listened to teachers explain for the fifteenth time
how to reduce a fraction. I understand it. "No, Ms. Smith, I didn't show my work. I did it
in my head..."
Damn kid. Probably copied it. They're all alike.
I made a discovery today. I found a computer. Wait a second, this is cool. It does what I
want it to. If it makes a mistake, it's because I screwed it up. Not because it doesn't like
me... Or feels threatened by me.. Or thinks I'm a smart ass.. Or doesn't like teaching
and shouldn't be here...
Damn kid. All he does is play games. They're all alike.
And then it happened... a door opened to a world... rushing through the phone line like
heroin through an addict's veins, an electronic pulse is sent out, a refuge from the
day-to-day incompetencies is sought... a board is found. "This is it... this is where I
belong..." I know everyone here... even if I've never met them, never talked to them,
may never hear from them again... I know you all...
Damn kid. Tying up the phone line again. They're all alike...
You bet your ass we're all alike... we've been spoon-fed baby food at school when we
hungered for steak... the bits of meat that you did let slip through were pre-chewed and
tasteless. We've been dominated by sadists, or ignored by the apathetic. The few that
had something to teach found us willing pupils, but those few are like drops of water in
the desert.
This is our world now... the world of the electron and the switch, the beauty of the baud.
We make use of a service already existing without paying for what could be dirt-cheap if
it wasn't run by profiteering gluttons, and you call us criminals. We explore... and you
call us criminals. We seek after knowledge... and you call us criminals. We exist without
skin color, without nationality, without religious bias... and you call us criminals. You
build atomic bombs, you wage wars, you murder, cheat, and lie to us and try to make us
believe it's for our own good, yet we're the criminals.
Yes, I am a criminal. My crime is that of curiosity. My crime is that of judging people by
what they say and think, not what they look like. My crime is that of outsmarting you,
something that you will never forgive me for.
I am a hacker, and this is my manifesto. You may stop this individual, but you can't stop
us all... after all, we're all alike.
1.4 Code is Law
In [13], Lessig proposed that developers are the “lawmakers” of the Internet. What they code determines most of the actions that can be performed on the Internet. Indeed, if a program does not provide a desired functionality or restrains copying through digital rights management, a user will never benefit from this functionality. Only a developer – with the rights and the skills to change the source code – can change the program by adding functionalities, removing restrictions, etc. Law scholars have called this phenomenon “Lex Informatica” or “Code is Law” [14]. Many similar examples can be found in nature and in man-made infrastructures: the world we live in is constrained by physical laws, such as gravitation, and regulation can be obtained by exploiting them. A simple example is a speed bump installed on a street to slow cars; many similar examples can be found in the tangible world.
1.5 Research Question
Therefore, in order to understand the mechanisms of Internet evolution, one must recognize (i) the importance of source code, (ii) the people who produce it and (iii) the adoption of this code by users. Indeed, a popular piece of software will have much more influence than a program used only by a few people. As will be shown later, software adoption by users has critical consequences for Internet security.
Millions of individuals and companies write source code for personal or commercial purposes. Under some circumstances (e.g. open source software), this code can be reused by others. Software must also be adapted to meet evolving needs. For large software projects, several programmers generally work together to cope with development and maintenance. Thus, the Internet at runtime is mainly the result of a maelstrom of source code mutations triggered by developers – within companies or open source communities – with heterogeneous needs, desires, ideologies and skills, which can lead to brilliant innovations or to malicious exploitation of weaknesses.
Uncovering the complex mechanisms leading simultaneously to innovation and insecurity is the goal of the present thesis. It will be shown that the Internet is a complex adaptive system, in which the interaction between technical features (source code, software) and humans (developers, users) plays a central role in its evolution.
Chapter 2
Background
The Internet has received much attention from researchers since its inception. Contributions have been made by computer scientists, who have been mainly concerned with improving the technology; by physicists, who are concerned with complexity and emergent properties; and by social scientists and economists, who found a tremendous in vivo social laboratory to understand and model social ties. Early on, the networked nature of the Internet was recognized and triggered tremendous advancements and applications of graph theory (Section 2.1). Developments in knowledge reuse and modularity in management science have given insights into the mechanisms of Internet innovation (Section 2.2). By construction, the evolution of the Internet is completely tied to human dynamics; therefore, we shall also review developments in individual and collective dynamics (Section 2.3). Finally, Internet security has been a critical concern for safeguarding the development of online tools, and has received much attention from technical and economic viewpoints (Sections 2.4 and 2.5).
2.1 Measuring and Modeling the Global Internet
The research concerning the Internet as a complex system has mostly been done by adopting
a complex network modeling approach to the problem.
2.1.1 Measuring the Global Internet
In the late nineties, when the Internet became popular and was growing rapidly, researchers and engineers were concerned with measuring its size. The most pressing question at the time was whether the network would be able to sustain its growth and whether engineers would be able to manage it. To this end, several studies aimed at tracking and visualizing the Internet's large-scale topology and/or performance, providing Internet maps at different resolution scales [15]. A first approach consisted in mapping connections between autonomous systems (AS), which are autonomously administered domains of the Internet [16]. This approach is limited by the heterogeneity of functions that each AS can take: transit (backbone functions), stub (local delivery) or multihomed (combination of various functions).
Fig. 2.1: The Internet domain structure. Filled nodes represent individual routers. Hollow regions and shaded regions correspond to stub and transit autonomous systems (AS), respectively. Reproduced from Pastor-Satorras and Vespignani [17].
Fig. 2.2: Two-dimensional image of a router level Internet map collected by H. Burch and B.
Cheswick. Reproduced from Pastor-Satorras and Vespignani [17].
Following recent developments in network theory, it was soon found that the Internet had characteristic properties: (i) a short average path length among vertices compared to the size of the network, i.e. small-world properties [18], and (ii) clustering and hierarchy through a heavy-tailed distribution of router connectivity, i.e. scale-free properties [19] (see [17], p. 48, for an exhaustive review).
2.1.2 Internet Models
Once these characteristic features of the Internet were uncovered, interest shifted to models, and more specifically to generating mechanisms intended to replicate the evolution of the Internet. First, it was confirmed that the Internet could not be a random Erdős–Rényi graph [20], because the latter cannot exhibit scale-free properties, although it can be small-world. In 1999, a first growing network model – “preferential attachment” – was introduced to explain the growth of the World Wide Web [21]; it is nothing else than the proportional growth model proposed by Simon in 1955 [22]. This model has the advantage of explaining the emergence of scaling properties in complex systems. Despite several improvements and massive reuse by the scientific community, this model has never been properly validated. Moreover, many other mechanisms can generate power laws, including self-organized criticality [23] and highly optimized tolerance (HOT) [24] (see ch. 14 in [25] for a review of these mechanisms).
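As a toy illustration of the proportional-growth mechanism behind preferential attachment (and not the validation procedure developed in this thesis), the following Python sketch grows a network in which each new node attaches to existing nodes with probability proportional to their current degree, and then inspects the resulting degree distribution; all parameter values are arbitrary.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes=10_000, m=2, seed=42):
    """Grow a network where each new node attaches m edges to existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # 'stubs' lists every node once per incident edge, so uniform sampling
    # from it implements degree-proportional (proportional-growth) selection.
    stubs = [i for i in range(m + 1) for _ in range(m)]   # small seed clique
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            stubs.extend((new, t))
    return Counter(stubs)   # node -> degree

degree = preferential_attachment()
# A few hub nodes acquire very large degrees while most nodes keep degree m:
# the heavy-tailed (scale-free) signature discussed in the text.
print("five largest degrees:", [d for _, d in degree.most_common(5)])
print("median degree       :", sorted(degree.values())[len(degree) // 2])
```

Plotting the degree counts on doubly logarithmic axes would show the approximately straight line expected for a power-law distribution.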
2.1.3 Internet Robustness
In the quest for better control over the Internet, tolerance to errors (random failures) and malicious attacks (see Section 2.4) has been investigated, mainly from a theoretical viewpoint, using random and targeted node removal [26, 27]. Percolation theory [29] was used to model cascading failures, in particular in scale-free networks. However, detailed empirical investigation has shown that theoretical models generally fail to capture the real structure of the Internet, as well as its single points of failure, notably because the Internet has been optimized – through engineering – to be robust against outages and attacks [28].
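The asymmetry between error tolerance and attack vulnerability can be illustrated with a rough numerical experiment (a sketch only, assuming the third-party networkx package is available): remove the same fraction of nodes either at random or in decreasing order of degree, and compare the size of the largest remaining connected component.

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(n=5_000, m=2, seed=1)   # scale-free test graph
k = int(0.05 * G.number_of_nodes())                  # remove 5% of the nodes

def giant_component_after_removal(graph, nodes_to_remove):
    """Size of the largest connected component once the given nodes are removed."""
    H = graph.copy()
    H.remove_nodes_from(nodes_to_remove)
    return max(len(c) for c in nx.connected_components(H))

random_nodes = random.Random(2).sample(list(G.nodes), k)    # random failures
hubs = sorted(G.nodes, key=G.degree, reverse=True)[:k]      # targeted attack on hubs

print("after random failures :", giant_component_after_removal(G, random_nodes))
print("after targeted attack :", giant_component_after_removal(G, hubs))
```

On such synthetic scale-free graphs, the giant component barely shrinks under random failures but collapses when the hubs are removed – the theoretical contrast that the empirical studies cited above qualify for the real Internet.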
2.1.4 Virtual and Social Networks
The Internet is called a “network of networks”. This expression relates to the very loose and changing nature of the Internet and its functions. For telecommunication engineers, the Internet is the routing network; for users, it can be the World Wide Web or their social network (e.g. LinkedIn, Facebook). Also, each layer is deeply influenced by the others: on the one hand, the application layer is bounded by the link, Internet Protocol (IP) and transport layers; on the other hand, the link and IP layers cannot neglect the actions taken at the layers above, for maintenance and security reasons (see Section 1.1).
Among the multiple networks operating at the application layer, several have received much attention:
• The most famous is the World Wide Web (Web), invented by Tim Berners-Lee in 1990 at the European Organization for Nuclear Research (CERN), along with the first Web browser program. The basic structure of the Web is a network of directed hyperlinks1 between Web pages. The Web has rapidly grown to become the main medium to publish and share information and to engage in e-business. Like the Internet, its structure has been found to be scale-free – with a power-law distribution of incoming links [30] – and mechanisms have been proposed to explain this growth [21, 31], again with many improvements, like fitness models [32]. It is worth noting that research on complex networks has deeply inspired the PageRank search algorithm of the Google search engine [33]. Using the Web, the first studies on communities have been performed on human dynamics [34] and social structures, which are still ongoing nowadays and are refined thanks to improved tools for social networking (see [35] and references therein).
1 Hyperlink: a reference to a document that the reader can directly follow. The reference points to a whole document or to a specific element within a document. Tim Berners-Lee saw the possibility of using hyperlinks to link any unit of information to any other unit of information over the Internet. Hyperlinks were therefore integral to the creation of the World Wide Web. Web pages are written in the hypertext mark-up language HTML (source: Wikipedia).
• Email communication networks have also received some attention early on, especially for worm spreading [36, 37], email community networks with self-similar organization [38], and long-memory processes in email handling dynamics [39].
• Peer-to-Peer (P2P) systems are ad-hoc networks built at the application level, with their own routing network. Each peer – usually a home computer – is at the same time a client and a server, respectively asking for data from and sending data to other peers. “Pure” P2P networks have no central command, and each peer is authenticated by its neighbors. Famous P2P networks are Skype (online phone system) and filesharing networks (e.g. Gnutella), mainly used for music and movie sharing. When these networks started to develop, much concern was raised about their scalability, notably because the amount of data transferred rapidly became very large [41, 42].
• Instant Messaging has also received some attention. This communication network
has been found to be scale-free [43] and small-world, with age and gender homophily
[44].
• Investigations of Massively Multiplayer Online Games (MMOGs) [45, 46, 47] and their social networks have recently started, with a strong focus on homophily [48, 49].
• The Open Source Software (OSS) community is probably one of the oldest online communities, as a transposition of the hacker ethic to the Internet. Naturally, its social structure has also been investigated with traditional network metrics (clustering, degree distribution, shortest path, etc. [50]), using web forums, email networks and communication infrastructures [51], as well as version control data [52]. This research is mainly framed in management science and economics, because OSS has been recognized as an archetypal example of collective action [53].
2.2 Modularity & Knowledge Reuse
As discussed above, many Internet features can be modeled as physical, logical or social networks with various characteristics. However, their underlying goal remains the same: exchanging information and, in many cases, sharing knowledge. This has rapidly led to two consequences: first, given increasing storage capabilities, more knowledge has become available over the years, and second, because it is often free, knowledge has been reused. For instance, hyperlinks make it possible to reuse knowledge from a third-party Web page, thus avoiding a hard copy of this page. Linking also helps keep content up to date in case the cited page changes. Another case is software, in particular open source software (OSS), studied in this thesis as a special kind of knowledge, parts of whose source code are reused by others. Many natural and social systems display modularity as a kind of organisation, which can be top-down or rather self-organized.
In The Architecture of Complexity (1962) [54], Herbert Simon showed the advantages of
modularity with a parable of two watchmakers who organize their work – building a watch
– in two different manners: one uses a purely sequential assembly process, while the other introduces modularity in the building process. The second process proves to be more resilient to perturbations. This parable has been one of the first conceptualizations of modularity in relation to innovation and complexity. In the broad context of knowledge and technology reuse, modularity is ubiquitous in the structure of the Internet itself (see Section 1.1), in personal computers and software, but also in biological systems [55].
The Watchmakers Parable
There once were two watchmakers, named Hora and Tempus, who made very fine
watches. The phones in their workshops rang frequently and new customers were
constantly calling them. However, Hora prospered while Tempus became poorer and
poorer. In the end, Tempus lost his shop. What was the reason behind this? The watches
consisted of about 1000 parts each. The watches that Tempus made were designed such
that, when he had to put down a partly assembled watch, it immediately fell into pieces
and had to be reassembled from the basic elements. Hora had designed his watches so
that he could put together sub-assemblies of about ten components each, and each subassembly could be put down without falling apart. Ten of these subassemblies could be
put together to make a larger sub-assembly, and ten of the larger sub-assemblies constituted the whole watch.
Herbert Simon, The Architecture of Complexity (1962).
2.2.1 Modularity and Industrial Design
In well-controlled environments, modularity can be implemented as a process. For instance, the industrial process behind the production of a good or a service can be modularized to make it more efficient and less fragile [56]. In [57], Baldwin and Clark describe several mechanisms, called modular operators, as a list of things that designers can do to a modular system: (i) splitting a design into modules, (ii) substituting one module design for another, (iii) augmenting by adding a new module to the system, (iv) excluding a module from the system, (v) inverting to create new design rules and (vi) porting a module to another system. Modularity has played a major role in the design of computer systems, both hardware and software. For instance, the central processing unit (CPU) is the basic “Lego” of all computers, itself made of millions of transistors2 that are necessary to perform all the mathematical operations required by software execution. Most personal computers (PC)
2 A transistor is an electronic component with at least three terminals for connection to an external circuit. It is mainly used to amplify and switch electronic signals. A voltage or current applied to one pair of the transistor's terminals changes the current flowing through another pair of terminals (source: Wikipedia).
are made of components (e.g. processor, mainboard, hard disk, RAM, keyboard, mouse, screen), which are designed to work together but can be separately replaced and upgraded. Modularity may also apply to abstract systems, such as knowledge and design processes.
An example of the effects of modularity in open source software (OSS) has been proposed by MacCormack et al. [58] as well as by Challet and Lombardoni [59], for dependencies between Red Hat Linux packages and between files in source code, respectively. Both recognized the importance of propagation costs: a change in a module may have consequences for several other modules. MacCormack et al. [58] showed that when a piece of software (Mozilla) undergoes massive re-engineering toward more modularity, its propagation costs are slightly reduced, making maintenance simpler (see Figure 2.3).
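The propagation cost used in this line of work is, roughly, the density of the transitive closure of the dependency matrix: the fraction of ordered module pairs for which a change in one module can reach the other through some chain of dependencies. The snippet below is a minimal sketch of that idea, not necessarily the exact metric of [58].

```python
import numpy as np

def propagation_cost(D: np.ndarray) -> float:
    """Fraction of ordered module pairs (i, j), i != j, such that a change in
    module j can reach module i through a chain of dependencies.
    D[i, j] = 1 means that module i directly depends on module j."""
    n = D.shape[0]
    reach = ((D != 0) | np.eye(n, dtype=bool)).astype(int)
    # Repeated squaring of the reachability matrix yields the transitive
    # closure in O(log n) matrix products.
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        reach = ((reach @ reach) > 0).astype(int)
    off_diagonal = reach * (1 - np.eye(n, dtype=int))
    return off_diagonal.sum() / (n * (n - 1))

# Toy example: a chain a -> b -> c has two direct dependencies but three
# reachable ordered pairs, (a,b), (b,c) and (a,c), hence a cost of 3/6 = 0.5.
chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
print(propagation_cost(chain))   # 0.5
```

Re-engineering toward modularity, as in the Mozilla case of Figure 2.3, amounts to reorganizing the matrix so that reachability stays confined within small clusters, which lowers this number.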
Fig. 2.3: Design Structure Matrix (DSM) of Mozilla showing the relations between modules (source files) in the code, for two snapshots of the code base (left panel: Mozilla.19980408; right panel: Mozilla.19981211). Each point represents a reuse of one module (columns) by another (rows). Modules have been ordered by clusters of reuse (squares on the diagonal). The left (resp. right) panel shows modularity before (resp. after) a large re-engineering intended to make the code more modular for an open source model. Mozilla was created out of a proprietary software – Netscape – that America Online (AOL) decided to release as open source. After the re-engineering operation, the number of modules decreased, and component reuse is better organized around well-identified clusters. (Reproduced from MacCormack, A. et al. (1999) [58].)
2.2.2 Knowledge Reuse: A Complex Modular System
The conditions and the processes that give rise to modularity can be relaxed to apply to complex systems. Indeed, biological systems are less the result of a sophisticated top-down design process than of an evolutionary adaptation to a changing environment. In some
sense, this is nothing else than a trial-and-error process, which leads to self-organization and innovation. For that, organisms as well as humans never start from scratch but – rather naturally – reuse components (resp. resources) available in their environment [55]. The process of component reuse and integration has been investigated in the context of firms [60, 61]. On the Internet, knowledge reuse has become the norm. The multitude of hyperlinks on the World Wide Web is probably the best example of knowledge reuse, as a giant citation network. Similarly, the introduction of the copyleft license (see Section 1.3) has enabled the reuse of source code at large scales by the community of developers. In the particular context of OSS development, the organization in worldwide communities or projects, the Internet-based communication between developers and the open code base introduce many opportunities for the reuse of knowledge. Haefliger et al. [62] found that developers reuse software for three reasons: (i) to integrate functionality quickly, (ii) to be able to write certain parts of the code rather than others and (iii) to mitigate development costs through code reuse. However, everything comes at a cost, and in the case of codified knowledge (i.e. source code), integration costs may be up to 200% of development costs [63]. Moreover, the absence of an incentive mechanism – generally required for reuse in firms [64] – might inhibit actual reuse [65].
Considering explicit knowledge reuse, the tree of dependencies can be investigated by measuring calls to external code, i.e. code from another module3. At large scales, the network of dependencies can be captured by inspecting the dependency tree of a Linux distribution, which aggregates several thousands of open source projects in a comprehensive tree. These data can be easily extracted, and the distribution of reuse is found to be heavy-tailed [59, 66].
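As a rough sketch of how such data can be extracted (the file path and the simplified handling of version constraints and alternatives are assumptions, not the exact procedure used in this thesis), the snippet below parses a Debian 'Packages' index and counts, for each package, how many other packages declare a direct dependency on it.

```python
import re
from collections import Counter

def dependency_in_degree(packages_file="Packages"):
    """Rough parser for a Debian 'Packages' index: returns, for each package,
    the number of other packages that list it in their Depends field."""
    in_degree = Counter()
    current = None
    with open(packages_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("Package:"):
                current = line.split(":", 1)[1].strip()
            elif line.startswith("Depends:") and current:
                # Version constraints "(>= 1.2)" are dropped and alternatives
                # "a | b" are both counted -- a deliberate simplification.
                for dep in re.split(r"[,|]", line.split(":", 1)[1]):
                    name = dep.split("(")[0].strip()
                    if name:
                        in_degree[name] += 1
    return in_degree

reuse = dependency_in_degree()
# Rank-frequency view: an approximately straight line on log-log axes is the
# heavy-tailed (Zipf-like) signature of reuse discussed in the text.
for rank, (pkg, k) in enumerate(reuse.most_common(10), start=1):
    print(rank, pkg, k)
```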
However, the dynamics of reuse in the context of a large ecosystem of source code development have remained unexplored so far. Indeed, complex adaptive systems, such as large-scale software reuse networks, are not centrally managed and rather obey evolutionary rules. Hence, modularity and, even more so, its dynamical properties remain poorly understood, although they have important consequences for understanding how technological innovation emerges.
3 Note: dependencies can be analyzed either by their structure (which source file calls external code?) or by their function (how many times is external code called during execution?).
2.3 Individual and Collective Human Dynamics
The Internet is a complex adaptive system driven by humans. Therefore, every change is the result of individual actions, their contingencies, and how people react, alone or collectively, to various stimuli. Our concern is with the human features that have consequences for the mechanisms of Internet evolution. For that, we review previous work on priority queueing and the long-range memory processes it generates, as a fundamental ingredient of human dynamics. Also, many Internet-based social networks exhibit collective behaviors as a result of cross-stimulation between individuals. These complex dynamics are critical for the Internet as a system that has emerged from massive collaboration between individuals.
Fig. 2.4: The correspondence patterns of Darwin and Einstein. a. Historical record of the
number of letters sent (Darwin, black; Einstein, green) and received (Darwin, red; Einstein, blue)
each year by the two scientists. An anomalous drop in Einstein’s correspondence marks the Second
World War period (1939–45, boxed). Arrows, birth dates of Darwin (left) and Einstein (right).
b. and c. Distribution of response times to letters by Darwin and Einstein, respectively. Note
that both distributions are well approximated with a power-law tail that has an exponent 3/2, the
best fit over the whole data for Darwin giving 1.45 ± 0.1 and for Einstein 1.47 ± 0.1. (reproduced
from Oliveira and Barabási, Nature (2005) [69]).
2.3.1 Human Timing
Many Internet dynamics are controlled by human behaviors and the way they organize
and manage their own time. Recent studies of various social systems have established the
remarkable fact that the distribution Q(t) of waiting times between the presentation of a message and the ensuing action has a power-law asymptotic of the form Q(t) ∼ 1/t^α, with an exponent α often found to be smaller than 2. Examples include the distribution of waiting times until a message is answered in emails [39] and in other human activity patterns, like web browsing, library visits, or stock trading [68]. Fig. 2.4 shows the distributions of waiting times before correspondence was answered by Darwin and Einstein, respectively [69]. These observations can be rationalized by simple priority queueing models that describe how the flow of tasks falling on (and/or self-created by) humans is executed with an arbitrary priority [68, 69, 70, 71]. Denoting by λ the average rate of task arrivals and by µ the average rate at which tasks are executed, and using a standard stochastic queueing model wherein tasks are selected for execution on the basis of random continuous priority values, Grinstein and Linsker derived the exact overall probability per unit time, pdf(t), that a given task sits in the queue for a time t before being executed [72]:
given task sits in the queue for a time t before being executed [72]:
\[ \mathrm{pdf}(t) \sim \frac{1}{t^{5/2}}\, e^{-t/t_0}, \qquad \text{for } \mu > \lambda, \tag{2.1} \]
\[ \mathrm{pdf}(t) \sim \frac{1}{t^{3/2}}, \qquad \text{for } \mu \leq \lambda, \tag{2.2} \]
where λ (resp. µ) is the rate of incoming (resp. executed) tasks and t_0 is the characteristic time of the exponential crossover. Grinstein and Linsker showed that the distribution (2.2) is independent of the specific shape of the distribution of priority values among individuals
[73]. The value of the exponent p = 3/2 is compatible with previously reported numerical simulations [68, 69] and with most, but not all, of the empirical data. While Grinstein and Linsker [72, 73] could derive exact solutions for priority queueing, the model lacks empirical validation. Unfortunately, while some evidence confirms the analytical results for µ ≤ λ [69, 78] and for µ > λ [79], significant deviations from the canonical exponents are observed. For instance, the probability density function (pdf) of waiting times before people answer an email is found to be a power law with exponent α ≈ 1 [39, 71]. Moreover, empirical distributions often exhibit power-law behavior with asymptotic crossovers to an exponential (resp. plateau) regime. Saichev and Sornette proposed to extend the standard priority queueing model with incoming (resp. outgoing) task rates that vary slowly over time when µ ≤ λ. In this case, the distribution of waiting times can depart from the power law with exponent 1/2, exhibiting exponents varying from 0.3 to ∞ [74].
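To make the queueing mechanism behind Eqs. (2.1) and (2.2) concrete, the following minimal Python sketch simulates a discrete-time version of a continuous-priority queue; the discretization and all parameter values are illustrative assumptions, not the exact setup of [72].

```python
# Minimal discrete-time sketch of a continuous-priority queue in the spirit of
# Grinstein and Linsker [72]; the discretization and parameter values are
# illustrative assumptions. Tasks arrive at rate lam with a random priority;
# at rate mu the highest-priority waiting task is executed. The recorded
# waiting times develop a heavy tail (~t^(-3/2) for mu <= lam, exponentially
# truncated for mu > lam).
import heapq
import math
import random
from collections import Counter

def simulate(lam=0.3, mu=0.25, steps=1_000_000, seed=1):
    random.seed(seed)
    queue = []                 # max-heap via negative priorities: (-prio, arrival)
    waiting_times = []
    for t in range(steps):
        if random.random() < lam:                    # a new task arrives
            heapq.heappush(queue, (-random.random(), t))
        if queue and random.random() < mu:           # execute the top-priority task
            _, arrival = heapq.heappop(queue)
            waiting_times.append(t - arrival)
    return waiting_times

if __name__ == "__main__":
    wt = simulate()
    # crude log-binned histogram of waiting times to eyeball the power-law tail
    bins = Counter(int(4 * math.log10(w + 1)) for w in wt)
    for b in sorted(bins):
        print(f"t ~ 10^{b / 4:.2f}: {bins[b]}")
```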
2.3.2 Collective Behaviors
Among human dynamics, collective behaviors play a fascinating role. How do people get influenced by others? When do herding effects give rise to social epidemics? More generally, the question is how people influence each other and how the action of one individual triggers (resp. is triggered by) the actions of others. To account for these complex and intricate causal
dynamics, Sornette et al. proposed a coarse-grained approach to detect and categorize epidemics [75, 76]. A first validation of the endogenous versus exogenous shocks theory was performed using the dynamics of book sales on Amazon [77], and further confirmed by a systematic classification of YouTube videos [78]. To account for complex triggering effects, it is convenient to use a self-excited conditional Poisson process, which states that, for a given system, each agent is subjected to endogenous shocks triggered by other agents through a given memory function and to exogenous shocks occurring as a renewal process [80]. It is mathematically formulated as follows,
\[ \lambda(t) = V(t) + \sum_{i,\, t_i \leq t} \mu_i\, \phi(t - t_i), \tag{2.3} \]
where µ_i is the number of persons who will be influenced directly, over all future times after t_i, by person i who acted at time t_i. Thus, the existence of well-connected individuals is accounted for by large values of µ_i. V(t) is the exogenous source, which captures all spontaneous views that are not triggered by epidemic effects on the network. The memory kernel is given by,
\[ \phi(t - t_i) \sim \frac{1}{(t - t_i)^{1+\theta}}, \qquad \text{with } 0 < \theta < 1. \tag{2.4} \]
For θ = 0.5, the standard priority queueing described above is recovered. Equation (2.3) can be solved using a mean-field approximation for various values of ⟨µ⟩. For ⟨µ⟩ > 1, the process is supercritical with exponential growth [75]. For ⟨µ⟩ ≤ 1, four distinct regimes are obtained [76], depending on whether the triggering shock is exogenous or endogenous and on whether the network is critical (⟨µ⟩ close to 1) or sub-critical (⟨µ⟩ well below 1); they are presented below and in Figure 2.5 in the context of YouTube videos [78]:
• Exogenous sub-critical. When the network is not “ripe” (that is, when connectivity and spreading propensity are relatively small), corresponding to the case when the mean value ⟨µ_i⟩ of µ_i is less than 1, the activity generated by an exogenous event at time t_c does not cascade beyond the first few generations, and the activity is proportional to the direct (or “bare”) memory kernel:
\[ A_{\text{bare}}(t) \sim \frac{1}{(t - t_c)^{1+\theta}}, \tag{2.5} \]
with t_c the time of the initial exogenous shock.
• Exogenous critical. If instead the network is “ripe” for a particular video, i.e., ⟨µ_i⟩ is close to 1, then the bare response is renormalized as the spreading is propagated through many generations of viewers influencing viewers influencing viewers, and the theory predicts the activity to be described by [76]:
\[ A_{\text{ex-c}}(t) \sim \frac{1}{(t - t_c)^{1-\theta}}, \tag{2.6} \]
with t_c the time of the initial exogenous shock.
• Endogenous critical. If, in addition to being “ripe”, the burst of activity is not the result of an exogenous event but is instead fueled by endogenous (word-of-mouth) growth, the bare response is renormalized, giving the following time dependence for the view count before and after the peak of activity:
\[ A_{\text{en-c}}(t) \sim \frac{1}{|t - t_c|^{1-2\theta}}, \tag{2.7} \]
with t_c the critical time at which the epidemic peaks.
• Endogenous sub-critical. Here the response is largely driven by fluctuations, and no clean burst of activity emerges:
\[ A_{\text{en-sc}}(t) \sim \eta(t), \tag{2.8} \]
where η(t) is a noise process.
While these results describe social epidemics rather well, Crane and Sornette recall that less than 10% of videos display these patterns. Most videos are driven by stochastic fluctuations, which are assumed to be noise; the signal might also be less clean, or reflect more complicated, evolving social networks [81]. In some cases, the nature of the task might also change the structure of the response. While social epidemics seem to spread according to these dynamics, buying a book or viewing a video recommended by one's friend(s) does not require much time or long-term commitment. These actions are also unique, since people usually do not buy (resp. watch) the same book (resp. movie) twice. It would therefore be interesting to find similar patterns for dynamics involving a social network of software developers.
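A minimal numerical illustration of the self-excited process of Eqs. (2.3) and (2.4) can be written as follows; the discrete-time scheme and all parameter values are illustrative assumptions, not the procedure used in [75, 76, 77, 78].

```python
# Discrete-time sketch of the self-excited conditional Poisson process of
# Eq. (2.3) with the power-law memory kernel of Eq. (2.4). Parameters are
# illustrative; <mu> < 1 keeps the process sub-critical.
import numpy as np

def simulate_epidemic(T=3000, V=0.05, mean_mu=0.9, theta=0.4, seed=0):
    rng = np.random.default_rng(seed)
    # power-law kernel, normalized so that mu_i is the average number of
    # first-generation events directly triggered by event i
    taus = np.arange(1, T, dtype=float)
    phi = taus ** -(1.0 + theta)
    phi /= phi.sum()
    event_times, event_mus = [], []
    activity = np.zeros(T, dtype=int)
    for t in range(1, T):
        # conditional intensity: exogenous source + sum over past events
        lam = V + sum(mu * phi[t - ti - 1] for ti, mu in zip(event_times, event_mus))
        n = rng.poisson(lam)                      # actions occurring during [t, t+1)
        activity[t] = n
        event_times.extend([t] * n)
        event_mus.extend(rng.exponential(mean_mu, size=n))
    return activity

if __name__ == "__main__":
    act = simulate_epidemic()
    print("total events:", act.sum(), "peak activity:", act.max())
```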
Fig. 2.5: A schematic view of the four categories of collective dynamics: Endogenous-subcritical
(Upper Left), Endogenous-critical (Upper Right), Exogenous-subcritical (Lower Left), and
Exogenous-critical (Lower Right). The theory predicts the exponent of the power law describing
the response function conditioned on the class of the disturbance (exogenous/endogenous) and the
susceptibility of the network (critical/subcritical). Also shown schematically in the pie chart is
the fraction of views contained in the main peak relative to the total number of views for each
category. This is used as a simple basis for sorting the time series into three distinct groups for
further analysis of the exponents (reproduced from Crane and Sornette, PNAS (2008) [78]).
2.4 Internet Security & Cyber Risk
Security has been an issue almost since the inception of the Internet (see box below for an overview). However, implementing security at the Internet Protocol level would have violated the “end-to-end” argument, because security is not required by all applications. Security should therefore only be implemented at the application layer. As a result, security is heterogeneous over the Internet, according to the perceived needs of each application. In addition, security is conditioned by the reliability of source code and by the way people behave with software.
Milestones of Internet Threats
Almost since its inception and public release, the security of the Internet has been found to be insufficient, and scary scenarios have been discussed, up to the complete collapse of the system. The track record is not reassuring. Early 1971: first virus (Creeper on Arpanet); 1988: the Morris Worm infects 10 percent of the Internet (60,000 machines at the time); 2000: the “ILOVEYOU” worm causes 5.5 billion dollars of damage in 24 hours, with over 50 million computers infected; 2007: Estonian governmental information systems are subjected to an attack by obscure Russian activists; 2005 and 2007: massive power outages in Brazil are found to be the result of cyber attacks; June 2010: a worm (Stuxnet) is found to be designed to attack the control command (SCADA) systems of industrial installations, in particular the first Iranian nuclear plant, which had been in operation for only a few months.
From an evolutionary perspective, Internet (in)security⁴ is an interesting vantage point from which to observe the capacity of adaptation of Internet components – mainly software and users – in the presence of threats, and the evolution of malicious attacks in response to adapting components. At the aggregate level, little research has been conducted on the real effects of Internet (in)security. The most significant report is based on yearly surveys by the Computer Security Institute (CSI) and the Federal Bureau of Investigation (FBI) [82]. Major computer security and antivirus companies report quarterly and yearly on cybercrime. In [83], Anderson reported on the costs for people who suffered identity theft. In 2006, the reinsurance company Swiss Re reported that, among all possible risks, the largest corporations consider computer-based risk as the highest-priority risk in all major countries by level of concern, and as second in priority as an emerging risk [84, 85]. However, cross-checking these figures and putting them in perspective with insecurity mechanisms remains impossible. For that, the economics of information security has recognized the importance of
incentives to protect infrastructures and to attack Internet systems, respectively. Efforts have been undertaken to understand under which conditions information systems might be subjected to misuse [86]. Based on that, theoretical risk scenarios have been developed for risk management and insurance purposes [87, 88, 89, 90, 91]. However, these models generally fail to be calibrated against empirical figures.
⁴ Stefan Frei first coined the term Internet (in)security to stress that the default state of the Internet is insecure rather than secure.
Fig. 2.6: Lifecycle of a vulnerability defined by distinctive events: (i) creation, (ii) discovery,
(iii) exploit, (iv) disclosure, (v) patch release (by the software editor), (vi) patch installation
(by the user). This sequence creates three risk “time windows”: (i) after discovery and before
disclosure, (ii) after disclosure and before patch release and (iii) after patch release and before
patch installation. The exact sequence of events varies between vulnerabilities (reproduced from
Frei et al. 2009 [94]).
The study of vulnerabilities as a driver of insecurity has also received much attention. Over the years, the number of vulnerabilities found in software has exploded, and it is established that they are an important driver of attacks, because they open sudden insecurity gaps that usually need some time to be filled and thus create opportunities for attacks [92]. As a result, a black market for vulnerabilities and exploits has appeared [93, 94, 95]. However, due to the lack of available empirical data, it remains unclear how this black market really works. Interestingly, analyzing the legacy of vulnerabilities, Frei et al. provided evidence that they are abundant, which means that security breaches have little chance of decreasing in the future, unless the economic incentives to perpetrate attacks are drastically reduced [96]. This provides a darker image of the situation than the one – already pessimistic – given by Brian Snow, formerly of the U.S. National Security Agency (NSA), who claimed that security would be achieved by drastically reducing the number of vulnerabilities [97], which appears somewhat unrealistic today, considering the explosion of source code production.
Figure 2.6 shows the typical lifecycle of a vulnerability with the associated risks. In [92] and [98], Frei et al. showed that the three steps reducing risk – disclosure, patch release and patch installation – are fulfilled according to fat-tailed distributions of waiting times, which means that, even though each step is usually completed relatively fast, waiting times can be arbitrarily long, thus creating huge opportunities for malicious exploitation. It can be speculated that the origin of these contingencies is manifold: technical [59], human, or a mix of both factors [100]. However, it is now clear that vulnerabilities – being developed, traded, used by black hats and corrected by software editors – are the cornerstone of Internet (in)security. As a consequence, there is a clearly established link between insecurity and software development and use. This confirms the “code is law” statement, and understanding the threat mechanisms of the Internet requires adopting a transversal view that integrates economics (incentives respectively to attack and to raise protection barriers) as well as technical and human contingencies.
2.5 Monitoring Internet Evolution
Although code is law on the Internet, it does not predict what initiatives people may take within the degrees of freedom offered by the software corpus. The only way to measure the evolution of Internet activity in an exhaustive manner is to sample traffic at the Internet Protocol (IP) portability layer, because all upper layers can then be thoroughly captured (see Figure 1.2). Traffic analysis started in the late nineties and offers much potential for Internet measurement, but it is still in its infancy.
2.5.1 Self-Similarity of Traffic Time Series
The behavior of Internet traffic has been analyzed since the early days of the Internet, supported by packet capture tools such as tcpdump [101]. Before 1995, the usual assumption in traffic characterization was that packet arrival and size⁵ distributions have a Poisson nature, i.e. the probability that a certain number of packets arrives in fixed non-overlapping time intervals follows a Poisson distribution. This corresponds to a memoryless random process with exponentially decaying autocorrelation. However, Internet traffic exhibits a different behavior: it has long memory [102] and a self-similar structure [103]. Figure 2.7 shows three time series of the same traffic, where the second (resp. third) plot is a zoom-in of a portion of the first (resp. second) by one order of magnitude (x10). The three plots look very similar – with bursty behavior – whereas, for memoryless processes, aggregation over larger time scales would smooth out the fluctuations.
There are many possible reasons for self-similar traffic. The first refers to the method of Mandelbrot [105]. In very simple terms, long-range dependence can be obtained by the construction of a renewal process (i.e. with independent and identically distributed inter-arrival times) with a heavy-tailed distribution of transferred file sizes [106]. A different scenario for self-similarity has been proposed by invoking the presence of a phase transition. In this case, the Internet is modeled as a network of hosts that generate traffic (computers) or forward packets (routers). By applying the rules of Internet routing (mainly hop-by-hop transmission, best effort and no discrimination between packets), one can show that, as the generated traffic increases slowly, the system experiences a sudden transition from a free phase (fluid traffic) to a “busy” phase with large delays in packet delivery [107, 108, 109, 110, 111]. However, this model has not found experimental support so far. A third explanation is a direct consequence of individual and collective human dynamics (see Section 2.3). In this case, traffic can no longer be seen as a renewal process, as postulated in the first explanation. Altogether, it is reasonable to postulate that self-similarity and long-range dependence result from a combination of the three mechanisms.
⁵ Internet packet (or datagram): the basic transfer unit of data over the Internet. It consists of a header with routing information and a data payload, which contains a “slice” of the transmitted information.
Fig. 2.7: Time series of TCP traffic between Digital Equipment Corporation and the rest of the world [104]. Each plot represents the number of packets recorded on a link. Although the time series is magnified by an order of magnitude between each plot, the three plots look similar. This is one of the many examples of scale-free traffic on the Internet.
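The first mechanism above (heavy-tailed transfer sizes aggregated into long-range dependent traffic) can be illustrated with the following sketch; the source model and all parameter values are illustrative assumptions, not the analysis of [103, 106].

```python
# Illustrative sketch: aggregate many ON/OFF sources with heavy-tailed (Pareto)
# ON periods and check that the variance of the aggregated traffic decays more
# slowly than 1/m, the signature of long-range dependence / self-similarity.
import numpy as np

def onoff_traffic(n_sources=50, T=200_000, alpha=1.5, seed=0):
    """Return a time series of packet counts from superposed ON/OFF sources."""
    rng = np.random.default_rng(seed)
    traffic = np.zeros(T)
    for _ in range(n_sources):
        t = 0
        while t < T:
            on = int(rng.pareto(alpha) + 1)        # heavy-tailed ON period
            off = int(rng.exponential(10.0) + 1)   # light-tailed OFF period
            traffic[t:t + on] += 1.0               # one packet per tick while ON
            t += on + off
    return traffic

def variance_plot(x, scales=(1, 4, 16, 64, 256)):
    """Variance of the aggregated series at several block sizes m."""
    for m in scales:
        blocks = x[: len(x) // m * m].reshape(-1, m).mean(axis=1)
        # for a memoryless process var ~ 1/m; a slower decay indicates
        # self-similarity with Hurst exponent H > 1/2
        print(f"m={m:4d}  var={blocks.var():.4f}")

if __name__ == "__main__":
    variance_plot(onoff_traffic())
```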
2.5.2 Network Anomaly Detection
Beyond the classification of the regular structure of traffic [112, 113], many traffic anomalies are constantly found on the Internet. Therefore, tools have been developed to capture them. In particular, anomaly detection in large-scale Internet networks has received much attention
with various methods of pattern recognition [114, 115]. This research has two main motivations. The first is security and operational continuity, which call for better anticipation of large disruptive events at large scales. The second is more conceptual: traffic flowing on the Internet has evolved over time with the appearance of new applications. For instance, the development of video streaming has completely changed the nature of the traffic, as well as the amount of data exchanged. The nature of the traffic is also heterogeneous according to application specifications and to how people use them. The sudden emergence and rapid adoption (e.g. by word of mouth) of a new technology or a new habit changes the nature of Internet traffic significantly. For instance, the development of peer-to-peer networks had consequences on bandwidth management and forced many Internet service providers to adapt their traffic management. In a famous talk [116], David Clark – one of the Internet founding fathers – explained that anticipating and addressing new kinds of needs is key for the future of the Internet. Nowadays, the Internet is no longer only a communication system: it incorporates more and more components of our societies, including economic and social features (e.g. e-business, social networks, cyber emotions) and geopolitical dimensions (e.g. cyber conflicts, safeguard of sovereignty). Even though the Internet seems transparent from a user's point of view, the infrastructure must constantly be adapted to these new functionalities and their impact on the backbone. Change has been so radical over the years that some network engineers call for a “new Internet” that would be able to thoroughly address present and future socio-economic needs [117]. For that, Internet monitoring is important because it allows observing the in-vivo evolution of the Internet by capturing all Internet layers (in contrast to the Web, peer-to-peer networks, email and social networks, which are specific applications on the Internet).
Chapter 3
Results
The Internet is a complex adaptive system whose characteristics are the result of software execution (i.e. code is law) and of the way people use and extend it. To account for Internet evolutionary mechanisms, five studies are presented here, focusing on the rationalisation of empirical facts extracted from data sources that allow capturing key dynamics at work, around the inter-related topics of software evolution, individual and collective human dynamics, and cyber risks.
The following articles are part of the present dissertation and reproduced thereafter:
1. Maillart, T., Sornette, D., Spaeth, S. and von Krogh, G., Empirical tests of Zipf’s Law
Mechanism in Open Source Linux Distribution, Physical Review Letters 101, 218701
(2008).
2. Maillart, T., Sornette, D., Frei, S., Duebendorfer, T. and Saichev, A., Quantification of Deviations from Rationality with Heavy-tails in Human Dynamics, Physical Review E (2011).
3. Maillart, T. and Sornette, D., Epidemics and Cross-Excitation in Open Source
Software Development, working paper (2011).
4. Maillart, T. and Sornette, D., Heavy-Tailed Distribution of Cyber Risks, Eur. Phys.
J. B 75, 357-364 (2010).
5. Tellenbach, B., Burkhart, M., Maillart, T. and Sornette, D., Beyond Shannon: Characterizing Internet Traffic with Generalized Entropy Metrics, Lecture Notes in Computer Science: Passive and Active Network Measurement 5448, 239-248 (2009).
Chapter 4
Discussion & Perspectives
The Internet is a complex adaptive system in which technology and human dynamics are strongly interdependent. This cross-nurturing – with each individual being a potential actor of technological development – has never been so strong before, and largely contributes to the unpredictable evolution of the Internet. To account for this complexity, the most significant stylized facts have been rationalized and backed by robust empirical validation. Given that each article includes its own discussion, this last part elaborates on a series of links between these findings, and perspectives for the future of Internet research are delineated.
4.1 Main Contributions
In this thesis, five contributions shed light on the mechanisms of Internet evolution, with
focus on software, human actions and cyber risk.
1. Self Organization and Proportional Growth in Innovation Networks
Source code is a special kind of knowledge that can also be executed as software, hence as a product. In the case of open source software, it forms a complex network of components found to obey a multiplicative proportional growth stochastic process, which is constantly fed by an entry flow of new components and whose volatility is greater than the deterministic part. These ingredients allow even the biggest components to fail and disappear as a consequence of pure randomness. These results provide a zero-order, coarse-grained mechanism for Schumpeterian “creative destruction” in networks of knowledge reuse [118].
2. Deviation from Rationality in Human Dynamics
Using a large dataset of human responses to a proposed task (a browser update) and of the times required to perform it, we propose an economic model of time use as a non-storable resource, and show that individuals can significantly deviate from rational time allocation. This further confirms and validates that human dynamics exhibit long memory processes and heavy-tailed distributions. These results also have important implications for cyber risk.
3. Collective Dynamics in Open Source Software Development
Software development is a complex task, often involving many programmers. In particular, open source software (OSS) development is a collective action [119], with developers committed to solving problems and achieving a common goal. It was found that the activity of developers over time is auto- and cross-excited, with dynamics comparable to social epidemics.
4. Heavy-Tailed Distribution of Cyber Risks
While much Internet security research has focused on technical aspects, cyber risk can
only be understood by quantifying damage. Using personal identities as a proxy for
loss, we show that the distribution of cyber risk follows a power law with exponent
less than 1. Therefore, the mean and the variance of the distribution are not defined
and more extreme events can be expected over time. In addition, maximum damage
scales with the size of targeted organizations.
5. Efficient Tool for Monitoring Large Scale Networks
Thoroughly monitoring the Internet as a large-scale network, including all transmissions at the various application layers, is necessary to better forecast the evolution of the global network and therefore anticipate large changes that could harm the Internet. For that, a comprehensive and scalable tool is proposed to automatically detect and classify anomalies – cyber attacks in particular – occurring on large-scale networks. New anomaly signatures are expected to reflect the evolution of the Internet and of the security landscape.
The evolution of the Internet is timed by individual and collective human actions, like writing (resp. reusing) software source code, correcting or exploiting vulnerabilities, improving source code, and updating software. These actions create innovation and trigger weaknesses, but also the means to cope with them and thus make the Internet more robust, until components become obsolete and are replaced by new ones. It is tempting to compare code changes to mutations. Both in biology and in technology, there are many examples of innovation steps in which an organism (resp. technology) suffering from a change in its environment starts to adapt by increasing its rate of mutations in order to sustain its homeostasis.¹ It is well known that organisms accelerate their mutation rate in reaction to changes of environmental conditions such as solar radiation, radioactivity or pollution [120, 121]. Similarly, when a new version of an operating system is released, most applications have to be updated to keep delivering the same service and retain their customers, or a better service to increase market share.
¹ Homeostasis: the property of a system, either open or closed, that regulates its internal environment and tends to maintain a stable, constant condition. Multiple dynamic equilibrium adjustment and regulation mechanisms make homeostasis possible (Wikipedia).
4.2 Human Contributions and Dynamics of Innovation
The above examples borrowed from biology and technology show the fragility of ecosystems, with potential cascading effects. While change in biology is thought to be the result of random mutations and a selection process, technological changes are the result of the actions of rationally bounded – yet thinking – humans. Moreover, individuals tend to work together and develop collective action in order to achieve common goals [119]. One of the fundamental ingredients for collective action to happen is social skills, which are thought to be the result of an evolutionary process, as proposed by Dunbar in the social brain hypothesis [122]. Collective action and social skills must also be put in perspective with the emergence of cooperation in many systems [123]. Indeed, research marrying game-theoretic cooperation models and social networks has gained some attention recently [124, 125].
4.2.1 Multivariate Hawkes Processes
In study 3, we could establish evidence of cross-excitation, along with long-memory triggering, between activity events in open source software development, which is an example of collective action. Developers exhibit epidemic-like dynamics of contribution, which could be tested against a mean-field solution of a self-excited conditional Poisson process. However, this model does not account for cross-excitation between agents. Indeed, in study 3, evidence of cross-excitation was found without applying this model. Therefore, it appears that OSS developers are mutually excited rather than only self-excited, with the actions of one developer (resp. a group of developers) influencing other developers. The multivariate Hawkes process generalizes (2.3) into the following form for the conditional Poisson intensity of an event of type j among a set of m possible types [126]:
\[ \lambda_j(t \,|\, H_t) = \lambda_j^0(t) + \sum_{k=1}^{m} \Lambda_{kj} \int_{(-\infty,t)\times\mathbb{R}} h_j(t-s)\, g_k(x)\, N_k(ds \times dx), \tag{4.1} \]
where H_t denotes the whole past history, λ_j^0(t) is the rate of spontaneous (exogenous) events of type j, i.e. the source of immigrants of type j, and Λ_{kj} is the (k, j) element of the matrix of coupling between the different types, which quantifies the ability of an event of type k to trigger an event of type j. Specifically, the value of Λ_{kj} is just the average number of first-generation events of type j triggered by an event of type k. The memory kernel h_j(t − s) gives the probability that an event of type k that occurred at time s < t will trigger an event of type j at time t. The function h_j(t − s) is nothing but the distribution of waiting times: the impulse of an event of type k impacts the system at time s, and the system takes a certain time t − s to react with an event of type j, this time being a random variable distributed according to h_j(t − s). The fertility (or productivity) law g_k(x) of events of type k with mark x quantifies the total average number of first-generation events of any type triggered by an event of type k. Here, the following standard notation has been used [127]:
\[ \int_{(-\infty,t)\times\mathbb{R}} f(s, x)\, N(ds \times dx) := \sum_{i \,|\, t_i < t} f(t_i, x_i). \]
The matrix Λ_{kj} embodies both the topology of the network of interactions between the different types and the coupling strength between elements. In particular, Λ_{kj} includes the information contained in the adjacency matrix of the underlying network. Analogously to the condition n < 1 (subcritical regime) for the stability and stationarity of the monovariate Hawkes process, the condition for the existence and stationarity of the process defined by (4.1) is that the spectral radius of the matrix Λ_{kj} be less than 1. Recall that the spectral radius of a matrix is the largest of the absolute values of its eigenvalues.
The multivariate Hawkes process generalization opens broad perspectives for modeling and understanding dynamics in social systems and influence between people (resp. groups of people). It has the potential to address one major issue faced by network experts, who are realizing that the naive view of a static social network is limited and often inadequate to describe fast-evolving systems, preventing proper model validation.
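As a numerical illustration of Eq. (4.1) (without marks, with an exponential memory kernel, and with illustrative parameter values that are assumptions, not fitted quantities), the following sketch simulates a bivariate Hawkes process in discrete time and checks the stability condition on the spectral radius of the coupling matrix.

```python
# Minimal discrete-time sketch of a bivariate Hawkes process: two event types
# excite themselves and each other through the coupling matrix Lambda and an
# exponential memory kernel. Parameters are illustrative.
import numpy as np

LAMBDA0 = np.array([0.02, 0.01])            # spontaneous (exogenous) rates
LAMBDA = np.array([[0.3, 0.4],              # LAMBDA[k, j]: mean number of type-j
                   [0.2, 0.3]])             # events triggered by one type-k event
TAU = 20.0                                  # decay time of the exponential kernel

def simulate(T=20_000, seed=0):
    rng = np.random.default_rng(seed)
    # stability requires the spectral radius of LAMBDA to be < 1
    rho = max(abs(np.linalg.eigvals(LAMBDA)))
    assert rho < 1, f"supercritical coupling (spectral radius {rho:.2f})"
    excitation = np.zeros(2)                # current endogenous intensity per type
    counts = np.zeros((T, 2), dtype=int)
    for t in range(T):
        intensity = LAMBDA0 + excitation
        n = rng.poisson(intensity)          # events of each type in [t, t+1)
        counts[t] = n
        # exponential kernel: decay past excitation, add the new events' contribution;
        # n @ LAMBDA gives, for each target type j, sum_k n_k * Lambda[k, j]
        excitation = excitation * np.exp(-1.0 / TAU) + (n @ LAMBDA) / TAU
    return counts

if __name__ == "__main__":
    c = simulate()
    print("events per type:", c.sum(axis=0))
```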
4.2.2 Micro Mechanisms of Cooperation in Empirical Data
According to game theory, two or more agents can interact and play various games according to some pay-off functions. Among them, the prisoner's dilemma is the most famous. When it is repeated several times, the game is called the iterated prisoner's dilemma. This offers room to program agents to play various strategies against each other. This was the aim of the famous tournament organized by Axelrod in the early eighties, with many teams proposing their agent endowed with a set of rules defining the strategy to play against other agents [123]. Many successful strategies were designed to be nice, to cooperate if the other agent cooperates, and to retaliate if she defects. Game-theoretic approaches have been successful in explaining cooperation, mainly in computer simulations and laboratory experiments. However, cooperation has not been documented directly from field experiments. A major reason is certainly that, in real life, the game is completely asynchronous with strong memory
effects, and individuals usually do not make a judgement on their counterpart after only one defection, but rather on an array of negative signals over time. To account for these memory effects, it is probably necessary to relax the model formulation. In this context, it is worth noting the conceptual proximity between strategies for playing the iterated prisoner's dilemma and the multivariate conditional excitation process presented above. Indeed, the cross-excitation parameter Λ_{kj} is the sensitivity of j to k's actions, hence the propensity of agent j to react positively (resp. negatively) to the activity of k, seen as a succession of contributions over time, in the context of collective action.
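As a toy illustration of the strategies discussed above (a hypothetical example with the standard payoff values, not an analysis carried out in this thesis), the following sketch plays tit-for-tat against an always-defecting agent in an iterated prisoner's dilemma:

```python
# Toy iterated prisoner's dilemma: tit-for-tat versus always-defect, with the
# standard payoff matrix (T=5, R=3, P=1, S=0). Purely illustrative.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's last move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b = [], []           # each entry: (own move, opponent move)
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strategy_a(history_a), strategy_b(history_b)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        history_a.append((a, b))
        history_b.append((b, a))
    return score_a, score_b

if __name__ == "__main__":
    print(play(tit_for_tat, always_defect))   # expected: (9, 14) over 10 rounds
```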
4.2.3 Code Mutations to Measure Probability of Innovation Occurrence
As described above, if we assume that source code evolution is the result of random “try and fail” mutations, locally and at short time scales, then the mutation rate (number of changes per unit of time) is a key quantity for measuring the resilience of an innovation network; it also gives a benchmark to quantify the rate of resource consumption for the maintenance and adaptation of all components. The more adaptation is needed, the more mutations, the more development resources, and the more time required by a team of developers.
The possible interactions are twofold: either (i) a certain rate of mutation of one component imposes some stress on the rest of the network, or (ii) a component reacts to this stress in order to adapt to a new situation. Therefore, the intensity of change of one component in the open source network and its consequences on neighbouring nodes can be measured: the reactions of all other components can be investigated by following the paths of dependencies. Again, to uncover causal dependencies, it is necessary to account for human timing. In this setting, a significant presence of loops is expected, due to the numerous cross-dependencies existing between components. To account for them, a systematic measure of the coupling between the components should be made, with special care to determine whether the coupling between component changes is dampening (negative feedback loops) or rather reinforcing (positive feedback loops). This approach is valuable to predict in which parts of the code developers are going to intervene in order to maintain the homeostasis of the whole system.
This vantage point is closely related and complementary to measuring the pure dynamics of developer activity. Indeed, it incorporates a network dimension by asking the question: “In which parts of the code do developers tend to code together and at the same time?” The self- (resp. mutually) excited conditional Poisson process can indeed be extended to several dimensions, including memory not only in time but also in space. The two-dimensional self-excited conditional Poisson process is formulated with a bivariate memory kernel φ(t − t_i, r − r_i), with r the position in the considered space. In the case of open source software, r would be a file or a module in the source code of a project.
4.3 Beyond Zipf's Law and Proportional Growth
The proportional growth stochastic process is a “zero-order” mechanism, in the sense that it can capture the general behavior of many self-organized complex systems, in particular innovation networks. However, a direct consequence of self-organization is the presence of correlations between innovations, each of them being challenged by the rewiring of several other components, with various consequences. In an ecosystem, the extinction of a species from a food web forces several other species to adapt and find other ways to sustain themselves. Obviously, innovation networks are not only correlated but also prone to important cascading effects that can directly or indirectly affect the connectivity between components.
Therefore, gaining insight into rewiring cascades requires establishing robust statistical causal relations between all events at various micro- and mesoscales. Additionally, special attention should be devoted to the context (local structure of the network) in which new innovations appear to fill a niche. For that, two research directions are proposed according to a “zoom-in” strategy, in order to uncover proportional growth sub-mechanisms.
4.3.1 Deviations from Zipf's Law
While many complex systems are thought to be the result of proportional growth, some exhibit significant deviations from Zipf's law, which is a special power law p(x) ∼ 1/x^{1+µ} with exponent µ = 1. Saichev et al. [128] showed that for this special exponent to appear, the following balance condition must hold,
\[ r - h = d + c_0, \tag{4.2} \]
with r the growth rate of each component, h the hazard rate for death or dismissal, d the growth rate of the number of new entrants and c_0 the growth rate of the sizes of newborn entities. Thus, proportional growth must be balanced by the entry flow (creation) and the hazard rate (destruction). Furthermore, Malevergne et al. argue that Zipf's law reflects an economic optimum, hence a healthy, mature and sustainable ecosystem [129]. However, deviations from µ = 1 can be found for ecosystems that have not reached maturity yet. Recently, using a dataset from an online collaboration platform² and counting the number of active users, Zhang and Sornette showed the example of a system with µ = 0.7 ± 0.1 < 1, which is the signature of an ecosystem where the power of the few dominates, while not enough new projects are created [130]. Therefore, deviations from Zipf's law can tell a lot about the “health” of an ecosystem. In terms of Internet innovation, this resonates with the claim that violations of the “end-to-end” argument (c.f. Section 1.1), with more control by large broadband providers, might prevent innovation by start-up firms (c.f. Section 1.2). Assuming that we could precisely measure the size of Internet firms, looking at the distribution of their turnover (resp. number of patents awarded), and at the contribution of each size percentile of companies to the whole ICT economy, would be useful to support or refute the claim. In the former case, one would find µ < 1 with the possible existence of “dragon-kings” [131], and in the latter case the exponent would be greater than or equal to 1. Therefore, Zipf's law and deviations from it can tell a lot about the state of an ecosystem. This measurement tool needs further empirical validation, but would be invaluable to detect clusters of new creation versus reuse in innovation networks.
² Amazee is a Zurich-based Internet company, which offers a platform for creating and maintaining collaboration projects.
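The connection between proportional growth, entry and exit flows, and the tail exponent µ can be explored numerically with the following sketch; the growth, birth and death parameters are illustrative assumptions, not a calibration of [128, 129, 130].

```python
# Illustrative sketch: proportional (Gibrat) growth with a steady inflow of new
# entities and exit of the smallest ones produces a heavy-tailed size
# distribution; the Hill estimator gives a rough tail exponent mu.
import numpy as np

def simulate(steps=3000, births_per_step=5, drift=0.0, vol=0.15, seed=0):
    rng = np.random.default_rng(seed)
    sizes = []
    for _ in range(steps):
        if sizes:
            arr = np.array(sizes)
            # multiplicative (proportional) growth, volatility dominating drift
            arr = arr * np.exp(drift + vol * rng.standard_normal(arr.size))
            # entities falling below a minimum size exit (Schumpeterian destruction)
            sizes = list(arr[arr > 0.5])
        sizes.extend([1.0] * births_per_step)       # entry flow of newborn entities
    return np.array(sizes)

def hill_exponent(x, tail_fraction=0.05):
    """Rough Hill estimate of the tail exponent mu of P(X > x) ~ x^(-mu)."""
    x = np.sort(x)[::-1]
    k = max(int(len(x) * tail_fraction), 10)
    tail = x[:k]
    return 1.0 / np.mean(np.log(tail / tail[-1]))

if __name__ == "__main__":
    s = simulate()
    print(f"{len(s)} entities, estimated tail exponent mu ~ {hill_exponent(s):.2f}")
```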
4.3.2 Coexistence of Multiple Proportional Growth Regimes
At mesoscopic scales, behaviors can differ and exhibit different growth rates, with clusters that presumably grow faster than others. Creative destruction may also be measured at these scales. For instance, in the context of the recovery of Eastern economies after communism, Challet et al. [132] described the so-called “J-shape” of the economy, which reads
\[ W(t) = W(t_0)\left[ f\, e^{\lambda_+ (t - t_0)} + (1 - f)\, e^{\lambda_- (t - t_0)} \right], \tag{4.3} \]
where W(t_0) is the initial GDP and f is the fraction of the economy that grows at rate λ_+, while the rest of the economy (1 − f) deflates at rate λ_−. This model rationalizes the recurring pattern of economies in reconversion. It can be generalized to an arbitrary number of fractions of the economy growing (resp. deflating) at different rates,
\[ W(t) = \sum_{k=1}^{n} W(t_k)\, f_k\, e^{\lambda_k (t - t_k)}, \qquad \text{with} \quad \sum_k f_k = 1, \tag{4.4} \]
where k indexes the regimes occurring in all segments of the economy over the considered period, and t_k is the time of the change of regime for each k. However, a main difference is the asynchrony between shocks. Indeed, in the context of a shock such as a change of economic regime from socialism to capitalism, the growing and the declining components start at the same time. However, in many situations it might appear that decline is not necessarily synchronized with the emergence of a new economic sector.
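The behavior of Eqs. (4.3) and (4.4) can be illustrated numerically; in the sketch below, all parameter values are illustrative, and the multi-regime version lets each segment start at its own time t_k to mimic the asynchrony discussed above.

```python
# Small numerical illustration of the "J-shape" of Eq. (4.3) and its
# multi-regime generalization (4.4); parameter values are illustrative.
import numpy as np

def j_shape(t, W0=100.0, f=0.2, lam_plus=0.08, lam_minus=-0.15, t0=0.0):
    """Two-sector economy: a fraction f grows while the rest deflates."""
    return W0 * (f * np.exp(lam_plus * (t - t0)) +
                 (1 - f) * np.exp(lam_minus * (t - t0)))

def multi_regime(t, segments):
    """segments: list of (W_k, f_k, lam_k, t_k); the fractions f_k must sum to 1.
    Each segment is frozen at its initial value before its own start time t_k
    (a simplifying assumption to represent asynchronous regime changes)."""
    assert abs(sum(f for _, f, _, _ in segments) - 1.0) < 1e-9
    return sum(Wk * fk * np.exp(lam * np.clip(t - tk, 0.0, None))
               for Wk, fk, lam, tk in segments)

if __name__ == "__main__":
    years = np.arange(0, 21)
    W = j_shape(years)
    # the aggregate first declines, then recovers once the growing sector dominates
    print("minimum reached in year", years[np.argmin(W)])
```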
4.4 Cyber Risk
Cyber risk is a major emerging threat against all socio-economic activities occurring on the Internet. Nowadays, the global network has become the spinal cord of modern societies, which heavily rely on it. However, the Internet has proven to be largely exposed to malicious attacks, which take advantage of unreliable software and, to some extent, of the naivety of people. In spite of this, exposure to cyber risk has remained hardly measurable, because of the complexity of the task, but not only for that reason.
Over the years, the Internet has constantly exhibited serious weaknesses, which have prevented the efficient security of operations and of information against malicious attacks. In the meantime, the Internet has become the backbone of modern economies, allowing real-time communication and transactions, social networking and ubiquitous computing, and many other invaluable tools. Unfortunately, there are tangible indications that the Internet will become a very significant conflict area. Presently, and arguably much more in the near future, criminals and military powers are building on innumerable flaws in software to gain power over all possible types of infrastructure, such as electric networks, telecommunications, GPS, sensitive energy-producing plants and military complexes. In addition, politicians and interest groups are increasingly using the capabilities of social networks to influence populations. The world has recently witnessed an unprecedented acceleration of cyber threat intensity. In summer 2010, malware was found in the control command of the newly built Iranian nuclear plant Bushehr I. This worm of a new kind was designed to compromise SCADA (supervisory control and data acquisition) systems controlling a wide array of critical infrastructures, including water treatment and distribution, oil and gas pipelines, and electrical power transmission and distribution. In the case of Bushehr I, it was established that this malware had the potential to send erroneous commands and provoke nothing less than the destruction of the nuclear plant.
There are mainly two entry points for malicious operations on the Internet: (i) software vulnerabilities, which allow taking control of computer infrastructures at various levels, and (ii) the capacity offered by the World Wide Web to spread information. The latter is concretely equivalent to reputation risk, regardless of whether the information spread is true or defamatory, laudatory or critical. All these operations are indeed closely tied to the evolution of the Internet, and more precisely to the mechanisms of software update by developers and users. Because of the evolutionary capabilities of these threats, monitoring cyber risks remains extremely difficult. Moreover, attacks can occur at any layer of the Internet. Therefore, anomaly detection must be implemented at the Internet Protocol layer in order to have a chance to capture all behaviors at higher levels.
4.4.1 The Innovative Nature of Cyber Risk
In a recent paper on Information War and Security Policies for Switzerland, it was reported that cyber wars would prosper on innovation [133]. In substance, threats also undergo a creative destruction process: some attacks that used to be relevant yesterday have been replaced by new ones, according to the new functionalities (resp. new vulnerabilities) offered by innovation. Therefore, it should be anticipated that cyber risk landscapes will change quite fast. However, the link between damage suffered and vulnerability is not really established yet. This gap actually prevents the development of actuarial models and, as a result, of insurance.
Moreover, the economic and innovative dimensions play a fundamental role for Internet (in)security. We argue that Internet security requires a paradigm shift from a pure – and somewhat naive – technical point of view (“design a secure system”) toward a broad view that encompasses the contribution of humans – and their natural economic incentives – to the development of threats. In that sense, study 2 demonstrates the importance of user behaviors. Also, studies 1 and 3 are clearly related to how source code maintenance constrains software reliability.
Fig. 4.1: Schematic representation of a cyber risk landscape. From right to left: (i) New vulnerabilities appear randomly and concern a given number of computers; over time, the cumulative number of hosts hit by a vulnerability increases. (ii) To counterbalance vulnerabilities, patches are released; however, having all users install a patch is a long-memory process, as shown in study 2. (iii) The net risk landscape is spiky but, nevertheless, the number of vulnerable hosts increases over time, again as a result of the long-memory process.
4.4.2 Economics of Cyber Crime
To better forecast cyber crime and predict which targets attacks might focus on, an economic framework is insightful. Assuming that cyber criminals are rational agents and that the market cannot be completely addressed, they would invest in “low hanging fruits” rather than in attacks with marginal returns. Therefore, one can expect that the cybercrime community, at the aggregate level, weighs risk against potential return when engaging in any action, i.e. performs a cost-benefit analysis. Unfortunately, vulnerability landscapes evolve very fast, and so does the risk. Indeed, study 2 established the long-memory process of updating. Figure 4.1 shows a representation of such a landscape, with spikes of vulnerabilities. In this context, a natural question arises and calls for further research: how
do cyber criminals exploit these vulnerability spikes in order to maximize the number of compromised computers?
Figure 4.2 shows the complementary cumulative distribution function (ccdf) of vulnerabilities per software editor per year, presented in double logarithmic scale. Obviously, the ccdf is heavy-tailed, with a few companies (e.g. Microsoft, Oracle) accounting for the majority of vulnerabilities discovered. More precisely, the ccdf is a power law, with exponents between 1.0 and 1.25. There are two possible explanations for this distribution. On the one hand, larger software might exhibit more vulnerabilities, because the cost of maintenance increases with software complexity. On the other hand, the number of vulnerabilities may just reflect the market share of each software editor, a distribution which is very probably also heavy-tailed, since Zipf's law applies to the size distribution of firms, or market shares. It can also be a combination of both effects, but the latter is more plausible, since vulnerabilities are abundant in code [96]. Again, deviation from Zipf's law with exponent equal to 1 tells us about the health of the vulnerability ecosystem [129]. Here, the exponent is equal to 1 around the year 2000 and slowly increases over time, which indicates that vulnerabilities become spread over more software editors, each accounting for a smaller share.
Fig. 4.2: Distribution (rank ordering) of vulnerabilities per software editor per year. The graph displays a straight line in double logarithmic scale, which is the signature of a power law. The slope slowly decreases from −1.0 ± 0.1 to −1.25 ± 0.1 over time (with Stefan Frei).
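A rank-ordering plot with a power-law fit, in the spirit of Figure 4.2, can be produced along the following lines; the vulnerability counts used here are hypothetical placeholders, not the data behind the figure.

```python
# Minimal sketch of a rank-ordering (unnormalized CCDF) analysis of
# vulnerabilities per software editor. The input list is a placeholder for
# yearly counts extracted from a vulnerability database.
import numpy as np
import matplotlib.pyplot as plt

def rank_ordering_plot(counts):
    """Plot counts versus rank in double logarithmic scale and fit the slope."""
    x = np.sort(np.asarray(counts, dtype=float))[::-1]   # largest first
    ranks = np.arange(1, len(x) + 1)
    # least-squares fit of log(count) vs log(rank); a straight line with
    # slope close to -1 is the signature of Zipf's law
    slope, intercept = np.polyfit(np.log(ranks), np.log(x), 1)
    plt.loglog(ranks, x, "o", label="data")
    plt.loglog(ranks, np.exp(intercept) * ranks ** slope,
               "-", label=f"slope ~ {slope:.2f}")
    plt.xlabel("rank")
    plt.ylabel("vulnerabilities per software editor [vulns/year]")
    plt.legend()
    plt.show()

if __name__ == "__main__":
    # hypothetical counts, for illustration only
    example = [320, 150, 90, 60, 45, 30, 22, 15, 10, 8, 6, 5, 4, 3, 2, 2, 1, 1]
    rank_ordering_plot(example)
```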
4.4.3 Vulnerabilities versus Damage
In study 4, we showed that damage scales with the size of organisations, and some hypotheses have been proposed to explain why large organisations are the most exposed. However, the causal link between vulnerable software and real damage (e.g. financial loss, damage to reputation) has not been formally established. Qualitatively, this link can be established in many, but not all, cases. Among the best examples are botnets:³ about 9% of personal computers (PCs) in operation and connected to the Internet are infected by malware, and one third of these computers seem to be part of a botnet [134]. But the real damage incurred by the owners of compromised PCs remains unclear so far.
³ Botnet: a network of computers – often compromised by malicious software – that run autonomously to perform one or several distributed tasks. Botnets are mainly used to send spam, conduct distributed denial of service (DDoS) attacks, and crack passwords. They are also used for stealing identities on compromised hosts.
Concluding Remarks
The Internet is probably the most complex man-made adaptive system. It is the result of innumerable contributions by individuals and organisations, with extraordinary heterogeneity and self-organization, with cooperation and intense competition. Because no central government had – or did take – the power to decide what is good for everyone on the Internet, motivated contributors could freely invent and compete to make the best software. The community of hackers and their thirst for exploring new horizons has had two major consequences for the Internet: on the one hand, it has triggered unprecedented bottom-up code production; on the other hand, their capacity to “hack” into supposedly secure systems has, early on, been a warning for the cyber attacks that we will experience more of in the future, as the Internet becomes ubiquitous and critical to our societies. If no major disruption occurs, the Internet will evolve with more software being produced, along with more insecurity. The key issue will be to find an acceptable balance between innovation and security. On the one hand, the concern is certainly not having too much innovation, but rather having too much security. Western societies – at least their politicians in their speeches – are obsessed with security, and while the Internet is a place of relative freedom and free speech, technology could certainly be used to prevent, or at least to reduce, free expression and at the same time innovation, which is the result of intensive knowledge reuse and ad-hoc social networks. Not only has China raised its level of control over the Internet in the past decade; many Western countries have done so as well. On the other hand, the dramatic lack of security is an open door to frightening cyber conflicts of a new kind, and even to the challenge of dominant states by a handful of individuals.
Therefore, understanding the mechanisms of Internet evolution is crucial to sustain the best conditions for enjoying future innovation, and to allow people to live in peace with no
limitation on the advantages gained from an open Internet. In this thesis, five mechanisms have been explored: (i) knowledge reuse, which is ubiquitous on the Internet, (ii) long-memory processes in human dynamics, (iii) collective action as a triggering process, (iv) the catastrophic nature of cyber risk and its implications, and (v) how to detect and characterize Internet anomalies. Although not everything has been understood yet, these results provide testable and tested mechanisms at work. Furthermore, they can all rather well be rationalized from an economic perspective. On the one hand, they emphasize the importance, for the evolution of the Internet, of human contribution and of work organisation based on technical mutualization (knowledge reuse) and human cooperation (collective action in open source software). On the other hand, enormous technical and human contingencies make the Internet imperfect and somewhat insecure, forcing small and big Internet components to face their weaknesses and the associated dangers, and therefore to adapt in order to survive.
Bibliography
Introduction
[1] Internet Systems Consortium, http://www.isc.org/solutions/survey.
[2] Internet World Stats, http://www.internetworldstats.com/stats.htm.
[3] Black Duck Koders, http://www.koders.com/.
[4] Saltzer, J., Reed, D. and Clark, D. (1984), End-To-End Arguments in System Design,
ACM Transactions on Computer Systems, 2, 277-288.
[5] van Schewick, B. (2010) The Architecture of Innovation, The MIT Press.
[6] Google App Inventor, http://appinventor.googlelabs.com/about/
[7] Zittrain, J. (2008), The Future of the Internet–And How to Stop It?, Yale University
Press.
[8] Levy, S. (2001), Hackers: Heroes of the Computer Revolution, Penguin Books, London.
[9] Schneier, B. (2001), Full Disclosure, Crypto-Gram Newsletter, (http://www.schneier.com/crypto-gram-0111.html).
[10] http://www.ohlo.net.
[11] Antorini, Y. M. (2007), Brand community innovation - An intrinsic case study of
the adult fans of LEGO community, Ph.D. Thesis Copenhagen, Copenhagen Business
School.
[12] Spaeth, S., Stuermer, M., and von Krogh, G. F. (2010), Enabling Knowledge Creation
through Outsiders: Towards a Push Model of Open Innovation, International Journal
of Technology Management 52(3/4), 411-431.
[13] Lessig, L., (2006), Code: And Other Laws of Cyberspace, Version 2.0., Basic Books.
[14] Reidenberg, J. (1998), Lex Informatica: The Formulation of Information Policy Rules
Through Technology, Texas Law Review 76, 553.
Background
[15] Murray, M., Claffy, K.C. (2001), Measuring the Immeasurable: Global Internet
Measurement Infrastructure, Workshop on Passive and Active Measurements.
[16] Zegura, E.W., Calvert, K.L., Donahoo, M.J. (1997), A Quantitative Comparison of
Graph-Based Models for Internet Topology, IEEE ACM Transactions on Networking 5,
6.
[17] Pastor-Satorras, R., Vespignani, A. (2004), Evolution and Structure of the Internet,
Cambridge University Press.
[18] Watts, D.J. and Strogatz, S.H. (1998), Collective Dynamics of “small-world” networks,
Nature 393, 440-442.
[19] Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999), On power-law relationship of the
Internet topology, Comput. Commun. Rev.29, 251-263.
[20] Erdős, P. and Rényi, A. (1959), On random graphs, Publicationes Mathematicae 6, 290-297.
[21] Barabási, A.-L., and Albert, R. (1999), Emergence of scaling in random networks,
Science 286, 509-512.
[22] Simon, H. A. (1955), On a class of skew distribution functions, Biometrika 52, 425-440.
[23] Bak, P., Sneppen, K. (1993), Punctuated Equilibrium and Criticality in a Simple Model
of Evolution, Phys. Rev. Letters 71, 24.
[24] Carlson, J.M. and Doyle, J. (2000), Highly Optimized Tolerance: Robustness and
Design in Complex Systems, Physical Review Letters 84, 2529-2532.
[25] Sornette, D. (2004), Critical Phenomena in Natural Sciences 2nd ed., Springer Series
in Synergetics, Heidelberg.
[26] Albert, R., Jeong, H. and Barabási, A.-L. (2000), Error and attack tolerance of complex
networks, Nature 406, 378-381.
[27] Broido, A. and Claffy, K. (2002), Topological resilience in IP and AS graphs, http://caida.org/analysis/topology/resilience/index.xml.
[28] Doyle, J. et al. (2005), The “robust yet fragile” nature of the Internet, Proceedings of the National Academy of Sciences 41, 14497-14502.
[29] Stauffer, D. and Aharony, A. (1994), Introduction to Percolation Theory 2nd edn,
Taylor & Francis, London.
[30] Albert, R., Jeong, H. and Barabási, A.-L. (1999), Internet - Diameter of the World-Wide Web, Nature 401, 130-131.
[31] Huberman, B.A., Adamic, L.A. (1999), Growth dynamics of the World-Wide Web,
Nature 401, 130-131.
[32] Bianconi, G. and Barabási, A.-L. (2001), Competition and Multiscaling in Evolving
Networks, Europhys. Lett. 54, 436-442.
[33] Page, L. and Brin, S. and Motwani, R. and Winograd, T. (1999) The PageRank
Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[34] Huberman, B. A., Pirolli, P., Pitkow, J. & Lukose, R. M. (1998), Strong regularities in world wide web surfing, Science 280, 95-97.
[35] Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W. (2009) Community Structure
in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined
Clusters, Internet Mathematics 6, 29-123.
[36] Newman M.E.J., Forrest, S. and Balthrop, J. (2002), Email networks and the spread
of computer viruses, Phys. Rev. E. 66, 035101.
[37] Zou, C.C. and Towsley, D. and Gong, W. (2005), Email worm modeling and defense,
Proc. 13th IEEE Conf. on Computer Communications and Networks, 2004, 409–414.
[38] Guimerá, R., Danon, L., Diaz-Guilera, A., Girault, F. and Arenas, A. (2003), Self-similar community structure in a network of human interactions, Phys. Rev. E 68, 6.
[39] Eckmann, J.-P., E. Moses and D. Sergi (2010) Entropy of dialogues creates coherent
structures in e-mail traffic, Proc. Nat. Acad. Sci. USA 101(40), 14333-14337.
[40] Asur, S., Huberman, B. (2010), Predicting the Future With Social Media, http://arxiv.org/1003.5699.
[41] Saroui, S., Gummadi, P.K., and Gribble, S.D. (2002), A Measurement Study of Peer-to-Peer File Sharing Systems, Proc. of Multimedia Computing and Networking.
[42] Ripeanu, M., Foster, L. and Iamnitchi, A. (2002), Mapping the Gnutella Network:
Properties of Large-scale Peer-to-Peer Systems and Implications for System Design,
IEEE Internet Computing Journal 6, 50-57.
[43] Smith, R.D. (2002), Instant Messaging as a Scale-Free Network, http://arxiv.org/abs/cond-mat/0206378.
[44] Leskovec, J., Horwitz, E (2008), Planetary-Scale Views on a Large Instant-Messaging
Network, WWW’08.
[45] Castronova, E. (2005), Synthetic Worlds: The Business and Culture of Online Games,
Univ of Chicago Press, Chicago.
[46] Castronova, E. (2006) On the research value of large games, Games Cult 1,163–186.
[47] Bainbridge, W.S. (2007), The scientific research potential of virtual worlds, Science
317,472–476.
[48] Szell, M. and Thurner, S. (2010), Measuring social dynamics in a massive multiplayer online game, Soc. Netw.
[49] Szell, M., Lambiotte, R. and Thurner, S. (2010), Multirelational organization of large-scale social networks in an online world, Proc. Nat. Acad. Sci. USA 107, 13636-13641.
[50] Newman, M.E.J. (2003), The Structure and Function of Complex Networks, SIAM
Review 45(2), 167-256.
[51] Crowston, K., Howison, J. (2005), The social structure of open source software
development, First Monday.
[52] Spaeth, S. (2005), Coordination in Open Source Projects, HSG Dissertation no. 3110,
University of Saint Gallen.
[53] Lerner, J., Tirole J. (2001), The open source movement: Key research questions,
European Economic Review 45, 819-826.
[54] Simon, H. (1962), The Architecture of Complexity, Proc. Am. Phil. Society 106, 467-482.
[55] Hartwell, L.H. et al. (1999), From molecular to modular cell biology, Nature 402.
[56] Baldwin, C.Y., Clark, K.B., (1997), Managing in an age of modularity. Harvard
Business Review 75(5), 84–93.
[57] Baldwin, C.Y., Clark, K.B., (1999), Design Rules: The Power of Modularity (Volume
1), MIT Press, Cambridge.
[58] MacCormack, A. and Rusnak, J. and Baldwin, C.Y., (2006) Exploring the structure
of complex software designs: An empirical study of open source and proprietary code,
Management Science 52(7), 1015.
[59] Challet, D. and Lombardoni, A. (2004), Bug propagation and debugging in asymmetric software structures, Phys. Rev. E 70, 046109.
[60] Majchrzak, A. et al., (2004) Knowledge reuse for innovation, Management Science 50,
174-188.
[61] Grant, R. (1996), Prospering in Dynamically-competitive Environments: Organizational Capability as Knowledge Integration, Organization Science 7(4).
[62] Haefliger, S., von Krogh, G., Spaeth, S. (2008), Code Reuse in Open Source Software,
Management Science 54(1), 180-193.
[63] Tracz, W. (1995), Confession of a Used-Program Salesman: Lessons Learned, ACM
SIGSOFT Software Engineering Notes 20, 11-13.
[64] Lynex, A. and Layzell, P. (1998), Organisational considerations for software reuse,
Annals of Software Engineering 5, 105-124.
[65] von Krogh, G., Spaeth, S. and Haefliger, S. (2005), Knowledge reuse in Open Source
Software: An Exploratory Study of 15 Open Source Projects, Proc. 38th Hawaii Int.
Conf. System Sciences (HICSS’05).
[66] Spaeth, S., Stuermer, M., Haefliger, S. and von Krogh, G. (2007), Sampling in Open
Source Software Development: The Case for Using the Debian GNU/Linux Distribution,
Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS'07).
[67] Baldwin, C.Y. and Clark, K.B. (2006), The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?, Management Science 52, 1116-1127.
[68] Vazquez, A., Oliveira, J.G., Dezso, Z., Goh, K.-I., Kondor, I. and Barabási, A.-L. (2006), Modeling bursts and heavy tails in human dynamics, Phys. Rev. E 73, 036127.
[69] Oliveira, J.G. and Barabási, A.-L. (2005) Darwin and Einstein correspondence
patterns, Nature 437, 1251.
[70] Cobham, A., (1954), Priority assignment in waiting line problems, Operations Research
2(1), 70-76.
[71] Barabási, A.-L. (2005), The origin of bursts and heavy tails in human dynamics, Nature 435, 207.
[72] Grinstein, G. and Linsker, R. (2006), Biased diffusion and universality in model queues.
Phys. Rev. Lett. 97, 130201.
[73] Grinstein, G. and Linsker, R. (2008), Power-Law and Exponential Tails in a Stochastic,
Priority-Based, Model Queue, Phys. Rev. E 77, 012101.
[74] Saichev, A. and Sornette, D. (2010), Effects of Diversity and Procrastination in Priority Queuing Theory: the Different Power Law Regimes, Phys. Rev. E 81, 016108.
[75] Helmstetter, A. and Sornette, D. (2002), Subcritical and supercritical regimes in epidemic models of earthquake aftershocks, J. Geophys. Res. 107(B10), 2237.
[76] Sornette, D., Helmstetter, A. (2003), Endogenous versus exogenous shocks in systems
with memory, Physica A 318, 577-591.
[77] Sornette, D., Deschatres, F., Gilbert, T. and Ageon, Y. (2004), Endogenous Versus
Exogenous Shocks in Complex Networks: an Empirical Test Using Book Sale Ranking,
Phys. Rev. Lett. 93, 228701.
[78] Crane, R. and Sornette, D. (2008), Robust dynamic classes revealed by measuring the
response function of a social system, Proc. Nat. Acad. Sci. USA 105(41), 15649-15653.
[79] Crane, R., Schweitzer, F. and Sornette, D., (2010), New Power Law Signature of Media
Exposure in Human Response Waiting Time Distributions, Phys. Rev. E 80, 056101.
[80] Hawkes, A.G. and Oakes, D. (1974), A cluster representation of a self-exciting process, J. Appl. Prob. 11, 493-503.
[81] Zhao, Z., Calderón, J.P., Xu, C., Zhao, G., Fenn, D., Sornette, D., Crane, R., Hui, P.M. and Johnson, N.F. (2010), Phys. Rev. E 81, 056107.
[82] CSI/FBI Computer Crime and Security Survey, http://gocsi.com/survey. (yearly
edition).
[83] Anderson, K., Durbin, E. and Salinger, M. (2008), Identity Theft, Journal of Economic Perspectives 22(2), 171-192.
[84] Swiss Re (2006), Swiss Re Corporate Survey 2006 Report.
[85] Swiss Re (2007), Natural catastrophes and man-made disasters 2006, Sigma Report
No 2/2007.
[86] Anderson, R. and Moore, T. (2006), The Economics of Information Security, Science
314, 610-613.
[87] Majuca, R.P., Yurcik, W., Kesan, J. (2006), The Evolution of Cyberinsurance, http:
//arxiv.org/abs/cs.CR/0601020.
[88] Böhme, R. and Kataria, G. (2006), Models and Measures for Correlation in Cyber-Insurance, Proc. 5th Workshop on the Economics of Information Security (WEIS).
[89] Mukhopadhyay, A. et al. (2006), e-Risk Management with Insurance: A framework using Copula aided Bayesian Belief Networks, Proc. HICSS'2006.
[90] Mukhopadhyay, A. et al. (2007), Insuring big losses due to security breaches through Insurance: A business model, Proc. HICSS'2007.
[91] Herath, H. and Herath, T. (2007), Cyber-Insurance: Copula Pricing Framework and Implications for Risk Management, Proc. 6th Workshop on the Economics of Information Security (WEIS).
[92] Frei, S., May, M., Fiedler, U. and Plattner, B. (2006), Large-Scale Vulnerability Analysis, ACM SIGCOMM 2006 Workshop.
[93] Radianti, J., Gonzalez, J. (2007), A Preliminary Model of the Vulnerability Black
Market, 25th Int. System Dynamics Conference, Boston.
[94] Frei, S., Schatzmann, D., Plattner, B. and Trammell, B. (2009), Modelling the Security Ecosystem - The Dynamics of (In)Security, Workshop on the Economics of Information Security (WEIS).
[95] Frei, S. (2009), Security Econometrics - The Dynamics of (In)Security, Diss. ETH No. 18197.
[96] Frei, S. et al. (2010), Software Vulnerabilities are Abundant, submitted.
[97] Snow, B. (2005), We Need Assurance!, Proc. Annual Computer Security Applications Conference (ACSAC).
[98] Frei, S., Duebendorfer, T. and Plattner, B. (2009), Firefox (In)Security Update Dynamics Exposed, ACM SIGCOMM Computer Communication Review.
[99] Challet, D. , Solomon, S., and Yaari, G. (2009). The Universal Shape of Economic
Recession and Recovery after a Shock. Economics: The Open-Access, Open-Assessment
E-Journal 3, 36.
[100] Schneier, B. (2008), The Psychology of Security, Proc. First International Conference on Cryptology in Africa (AFRICACRYPT 2008).
[101] Jacobson, V., Leres, C., McCanne, S. (1989), tcpdump, Lawrence Berkeley Laboratory,
Berkeley, CA.
[102] Paxson, V. and Floyd, S. (1995), Wide Area Traffic: The Failure of Poisson Modeling, IEEE/ACM Transactions on Networking 3, 601-615.
[103] Willinger, W., Paxson, V., Taqqu, M.S. (1998), Self-similarity and Heavy Tails:
Structural Modeling of Network Traffic, Statistical Techniques and Applications.
[104] Internet Traffic Archive http://ita.ee.lbl.gov/html/traces.html (1995).
[105] Mandelbrot, B.B. (1969), Long run linearity, locally Gaussian Processes, H-spectra
and infinite variances, Intern. Econom. Rev. 10, 82-113.
[106] Park, K., Kim, G. and Crovella, M. (1996), On the relationship between file
sizes, transport protocols, and self-similar network traffic, Proc. IEEE International
Conference on Network Protocols, 171-180.
[107] Ohira, T. and Sawatari, R. (1998), Phase Transition in a Computer Network Traffic
Model, Phys. Rev. E 58, 193-195.
[108] Takayasu, M., Fukuda, K., and Takayasu, H. (1999), Application of Statistical Physics
to the Internet Traffics, Physica A 274, 140-148.
[109] Fukuda, K., Takayasu, H., Takayasu, M. (2000), Origin of Critical Behavior in
Ethernet Traffic, Physica A 287, 289-301.
[110] Solé, R.V. and Valverde, S. (2001), Information Transfer and Phase Transitions in a
Model of Internet Traffic, Physica A 289, 595-605.
[111] Valverde, S. and Solé, R.V. (2002), Self-organized Critical Traffic in Parallel Computer Networks, Physica A 312, 636-648.
[112] Thompson, K., Miller, G.J., Wilder, R. (1997), Wide-Area Internet Traffic Patterns
and Characteristics, IEEE Network.
[113] Moore, A.W., Zuev, D. (2005), Internet Traffic Classification using Bayesian Analysis
Techniques, Proc. ACM SIGMETRICS’05.
[114] Barford, P., Kline, J., Plonka, D., Ron, A. (2002), A Signal Analysis of Network
Traffic Anomalies, Proc. 2nd ACM SIGCOMM Workshop on Internet Measurement.
[115] Lakhina, A., Crovella, M., Diot, C. (2004) Diagnosing Network-Wide Traffic
Anomalies, ACM SIGCOMM Computer Communication Review, 34.
[116] Clark, D. (2008) The Internets we did not build, Talk at IPAM (UCLA), available at
http://www.ischool.berkeley.edu/newsandevents/events/sl20090304.
[117] Clean Slate, Stanford University, http://cleanslate.stanford.edu/about_cleanslate.php.
[118] Schumpeter, J. (1934), Theory of Economic Development, Harvard University Press,
Cambridge.
[119] Ostrom, E. (2007) Collective action and local development processes, Sociologica,
doi:10.2383/25950.
[120] Rosenberg, S.M. (2001), Evolving responsively: adaptive mutation, Nature Reviews Genetics 2, 504-515.
[121] Galhardo, R.S., Hastings, P.J., Rosenberg, S.M. (2007), Mutation as a stress response
and the regulation of evolvability, Crit. Rev. Biochem. Mol. Biol. 42(5), 399-435.
[122] Dunbar, R. I. M., (1998), The social brain hypothesis. Evol. Anthropol. 6, 178–190.
[123] Axelrod, R. (1984), The evolution of cooperation, Basic Books.
[124] Ohtsuki, H., Hauert, C., Lieberman, E., Nowak, M.A. (2006), A simple rule for the
evolution of cooperation on graphs and social networks, Nature 441, 502-505.
[125] Hanaki, N., Peterhansl, A., Dodds, P.S. and Watts, D.J. (2007), Cooperation in
evolving social networks, Management Science 53,1036-1050.
[126] Liniger, T.J. (2009), Multivariate Hawkes Processes, PhD Diss. ETH NO. 18403, ETH
Zurich.
[127] Saichev, A. and Sornette, D. (2011), Multivariate Self-Excited Epidemic Processes,
working paper.
[128] Saichev, A., Malevergne, Y. and Sornette, D. (2009), Theory of Zipf's Law and Beyond, Lecture Notes in Economics and Mathematical Systems 632, Springer.
[129] Malevergne, Y., Saichev, A. and Sornette, D., Maximum sustainable growth diagnosed by Zipf's law, submitted to American Economic Review (http://ssrn.com/abstract=1083962).
[130] Zhang, Q. and Sornette, D. (2010) Predicted and Verified Deviation from Zipf’s Law
in Growing Social Networks, submitted http://arxiv.org/abs/1007.2650.
[131] Sornette, D. (2009) Dragon-Kings, Black Swans and the Prediction of Crises,
International Journal of Terraspace Science and Engineering 1 (3), 1-17.
[132] Challet, D., Solomon, S. and Yaari, G. (2009), The Universal Shape of Economic Recession and Recovery after a Shock, Economics: The Open-Access, Open-Assessment E-Journal 3, 36.
[133] Vernez, G. (2009), Information Warfare and National Security Policy in Switzerland,
MAS ETH Security Policy and Crisis Management.
[134] BBC article on Vinton Cerf’s WEF talk, http://news.bbc.co.uk/2/hi/business/
6298641.stm (06.01.2009).
Curriculum Vitae