Sensibility analysis of BGP convergence
and scalability using network simulation
System Development Project (Systementwicklungsprojekt, SEP)
Institut für Informatik
Technische Universität München
85748 Garching bei München
Supervisor: Prof. Anja Feldmann, PhD
Advisor: Olaf Maennel
by
Wolfgang Mühlbauer
Submission date: August 30, 2004
Abstract
The Border Gateway Protocol (BGP) is the de facto standard for routing between autonomous systems in the Internet. Instabilities in the topology, such as a failing link, can
considerably delay convergence. It is therefore necessary to gain a better understanding of the global dynamics and underlying mechanisms of BGP.
In this work we perform a sensitivity analysis of convergence times and the number of exchanged
updates with respect to the settings of BGP parameters. In particular, we investigate the influence of the Minimum
Route Advertisement Interval (MRAI) timer.
Further experiments shed light on how updates propagate after the failure of
a link. We examine scalability questions such as how many autonomous systems are affected by the instability
and how far update messages spread from the broken link.
All experiments are conducted using the SSFNet network simulator.
Contents
1 Introduction
1.1 Motivation
1.2 Goals of this Study
1.2.1 Influence of MRAI Timer Settings
1.2.2 Propagation of updates
1.3 Guide to the Reader
2 Using the SSFNet Simulator
2.1 General Overview
2.2 Extensions to SSFNet
2.3 Generation of DML Files
2.3.1 Subgraph Extraction
2.3.2 Automatic Generation of the DML File
2.4 Simulator Output
3 Setting up the Experiments
3.1 Simulation Topologies
3.1.1 Middle Topology
3.1.2 Topology 1140
3.1.3 Topology 7774
3.2 Generation of Link Failures
3.2.1 Link Categories
3.2.2 Failure Scenarios in the Experiments
3.3 Analysis of the Simulation Results
3.4 Taken Experiments
3.4.1 Investigation of MRAI Timer Effects
3.4.2 Investigation of Update Propagation
4 Simulation results
4.1 Influence of MRAI Timer Settings
4.1.1 Varying the MRAI Timer Values
4.1.2 Per-peer and per-prefix MRAI Timers
4.2 Propagation of Updates
4.2.1 Experiment Description
4.2.2 Number of affected ASes after a link failure
4.2.3 Propagation Radius
5 Conclusions and directions for future work
List of Figures
List of Tables
Bibliography
Chapter 1
Introduction
1.1 Motivation
“QWERTYUIOP!” This strange-looking collection of characters is said to be the content of
the first electronic missive, sent by the engineer Ray Tomlinson in 1971 from one computer to
another computer sitting right beside it. Of course, Tomlinson and the other network pioneers
of that time could not anticipate the tremendous development of networking that resulted in a
worldwide mesh of connections, the Internet.
Complex issues arise with the increasing size of networks. Take routing in
the Internet as an example: how do packets find their way to a specific destination in this distributed
environment, where no router knows the global topology and all available network
links? It is the task of so-called intra-domain routing protocols (e.g. OSPF, RIP) and inter-domain routing protocols (BGP) to solve this problem.
However, existing routing protocol implementations are far from perfect. The Border
Gateway Protocol (BGP), which is responsible for maintaining connectivity between autonomous
systems (ASes) in the Internet, sometimes cannot prevent considerable delays in the convergence process after instabilities have occurred in the network.
Unfortunately, the underlying mechanisms of BGP are not yet understood well enough to
improve specific aspects of the existing protocol implementation. Therefore, a careful
analysis of the status quo is indispensable.
1.2 Goals of this Study
The main objective of this work is to explore the scalability of BGP and the influence
of configuration parameters on convergence times and the number of exchanged protocol
messages by using the SSFNet simulator. For this purpose, we concentrated on two
aspects, which are introduced briefly below: the settings of the MRAI timer and the
propagation of updates.
1.2.1 Influence of MRAI Timer Settings¹
In order to generate their routing tables, BGP-speaking routers exchange messages in a
similar way to other distance vector protocols. These advertisement messages are
rate-limited using timers whose duration is the Minimum Route Advertisement Interval
(MRAI). Whenever a router advertises a route for a certain destination to a neighbor
autonomous system (AS), a new instance of this timer is started. The router may then
not send another advertisement concerning this destination to that neighbor until
the associated timer has expired after MRAI seconds.
This rate limiting is intended to dampen some of the oscillations inherent in a distance
vector protocol. While waiting for an MRAI timer to expire, a BGP router does not expose
its connected neighbor ASes to every intermediate step in finding the best path to a certain
destination. Thus rate limiting can be expected to reduce the number of updates needed for
convergence, at the cost of delaying the messages that are sent.
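The timer rule described above can be illustrated with a minimal Python sketch. It is not taken from SSFNet: the class name `MraiLimiter`, the method `try_send` and the 30-second interval are assumptions chosen for illustration, and the sketch only decides whether an advertisement may be sent, without queueing the held updates.

```python
from dataclasses import dataclass, field


@dataclass
class MraiLimiter:
    """Per-(neighbor, destination) rate limiter; all names are illustrative."""
    mrai: float = 30.0                       # interval in seconds (assumed value)
    _expires: dict = field(default_factory=dict)

    def try_send(self, now: float, neighbor: str, prefix: str) -> bool:
        """Return True if an advertisement may be sent at time `now`."""
        key = (neighbor, prefix)
        if now < self._expires.get(key, float("-inf")):
            return False                     # timer still running: hold the update
        self._expires[key] = now + self.mrai  # start a new timer instance
        return True


limiter = MraiLimiter(mrai=30.0)
sent = [limiter.try_send(t, "AS2", "0.0.1.0/26") for t in (0.0, 10.0, 29.9, 30.0)]
print(sent)  # [True, False, False, True]
```

Only the first attempt and the one made after the timer has expired are allowed, so the neighbor sees at most one advertisement per destination within each MRAI period.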
One main objective of this work is to perform a sensitivity analysis on the parameters of
the MRAI timer.
1.2.2 Propagation of updates
After a link has broken in a given network, update messages must be exchanged between the autonomous systems
(ASes) until new best paths for all affected routes have been installed.
It can be assumed that, in general, some ASes will not notice the instability, meaning
that they do not receive any BGP update messages. This might happen if prefixes which were
routed over the broken link are redirected to new paths that
do not differ completely from the original path but share some nodes (ASes)
with it.
Investigating the propagation of updates involves making a statement about the number or ratio
of ASes that receive update messages as a consequence of a broken link, and about the propagation
radius. By propagation radius we mean the distance that updates spread within the topology,
starting from the source of the instability. Put more simply: how far away from the broken
link can the instability still be observed?
Altogether, this second main aspect of our work could be of great importance for drawing
conclusions about the scalability of BGP.
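Both quantities can be made concrete with a small Python sketch. It rests on assumptions that are not stated in this report: distance is counted in AS hops from the nearer endpoint of the failed link, the AS graph is connected, and the function names and the five-AS example are hypothetical.

```python
from collections import deque


def hop_distances(adjacency, sources):
    """Breadth-first search: AS-hop distance from the nearest source AS."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def propagation_stats(adjacency, failed_link, receivers):
    """Return (ratio of ASes that received updates, propagation radius)."""
    dist = hop_distances(adjacency, list(failed_link))
    ratio = len(receivers) / len(adjacency)
    radius = max((dist[a] for a in receivers), default=0)
    return ratio, radius


# Hypothetical chain of five ASes 1-2-3-4-5; link (2, 3) fails and
# ASes 1 to 4 receive at least one update.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(propagation_stats(adjacency, (2, 3), {1, 2, 3, 4}))  # (0.8, 1)
```

In this example 80% of the ASes observe the instability, and no update is seen more than one AS hop away from the broken link.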
1.3 Guide to the Reader
This document is structured as follows: Chapter 2 describes the usage of
SSFNet and the extensions made to this network simulator. After giving an overview of how
the experiments were conducted in Chapter 3, we present the results of our simulations in
Chapter 4. We close with a short summary and some suggestions for future work in
this area.
¹ This section has been adopted from [1] with marginal modifications.
Chapter 2
Using the SSFNet Simulator
2.1 General Overview
Examining the dynamics of updates in a distributed protocol like BGP is a challenging task. For a complete understanding and analysis, it is desirable to have a global view of, and
control over, all routers involved in the protocol communication. In the case of BGP, knowledge
of all messages sent or received by a BGP speaker makes it possible to determine which prefixes
are advertised to a neighbor and thus to deduce the routing in the inter-domain topology.
Additionally, it is possible to impose specific events such as link failures or the advertisement of
new prefixes. Simulation is the easiest way to achieve this global view and
control.
We decided to use the Scalable Simulation Framework (SSF) [2] for three main reasons.
First, this framework already provides an implementation of the BGP4 protocol, made
available by B.J. Premore [1]. Second, members of the research group of Prof. Anja Feldmann
have extended the basic BGP implementation with many new features in the
past (see section 2.2). Finally, the modular and concise structure of SSF, compared with other
network simulators such as ns-2, makes it easier to add the new features
needed for our investigations.
Figure 2.1 gives a general overview of the SSF network simulator. It consists of a
discrete-event simulation kernel that deals with all fundamental aspects of a simulation.
Based on this kernel, SSFNet provides a collection of Java-based components which model the Internet protocols and network topologies and which can easily be
extended with new features and new components. Within the SSFNet packages, several parts can be distinguished, of which the most important are SSF.OS and SSF.Net.
Whereas SSF.Net models the network topologies (links, nodes, network connectivity),
SSF.OS is responsible for modeling the numerous protocols (e.g. TCP, BGP, OSPF). The
configuration of the simulation parameters, the network topologies used and the simulation
dynamics are defined in text files written in the Domain Modeling Language (DML)
syntax [2].
[Figure 2.1: Structural overview of the SSFNet simulator. Boxes: SSF (simulation kernel); SSFNet (simulation models), based on SSF; SSF.OS (protocol simulation) and SSF.Net (topology simulation), each part of SSFNet; DML files (configuration files), which configure SSFNet.]
Assuming that all required protocols and features are already implemented in
SSFNet, the major task is to generate the DML files. The following excerpt illustrates
the concept of the Domain Modeling Language in a very simple manner.
Net [
  host [
    id 1
    interface [id 0 bitrate 100000000 latency 0.0]
    graph [
      ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
      ProtocolSession [name ip use SSF.OS.IP]
    ]
  ]
  host [
    id 2
    interface [id 0 bitrate 100000000 latency 0.0]
    graph [
      ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
      ProtocolSession [name ip use SSF.OS.IP]
    ]
  ]
  link [attach 1(0) attach 2(0) delay 0.002]
]
This DML snippet describes a network with two hosts, each running a TCP session over the
IP protocol. Host 1 (id 1) is connected to host 2 (id 2) by a link with a delay of 0.002 seconds.
Because DML files grow very large as the size and complexity of the simulation
topologies increase, they are generated automatically in most cases (see section 2.3). The user
does not have to assign unique identifiers to hosts and interfaces (NHI
addresses) or their corresponding IP addresses, because the simulator does this itself.
Issuing the following command will run the simulation for 2000 seconds, provided that all path
variables have been set correctly (refer to [2]) and that the configuration has been written
into the DML file myModel.dml.
java SSF.Net.Net 2000 myModel.dml
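Since such files are usually generated rather than written by hand, the two-host excerpt above can also be produced programmatically. The following Python sketch only reproduces that excerpt; the helper names `host_block` and `two_host_net` are hypothetical and are not part of SSFNet or of the tools described in section 2.3.

```python
def host_block(host_id, bitrate=100_000_000, latency=0.0):
    """Return the DML text for one host with a TCP/IP protocol graph."""
    return "\n".join([
        "  host [",
        f"    id {host_id}",
        f"    interface [id 0 bitrate {bitrate} latency {latency}]",
        "    graph [",
        "      ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]",
        "      ProtocolSession [name ip use SSF.OS.IP]",
        "    ]",
        "  ]",
    ])


def two_host_net(delay=0.002):
    """Return a DML Net block with hosts 1 and 2 joined by one link."""
    parts = [
        "Net [",
        host_block(1),
        host_block(2),
        f"  link [attach 1(0) attach 2(0) delay {delay}]",
        "]",
    ]
    return "\n".join(parts)


dml_text = two_host_net()
print(dml_text)
```

Writing `dml_text` to myModel.dml would yield a file with the same content as the excerpt shown earlier in this section.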
2.2 Extensions to SSFNet
As mentioned in the last section, the SSFNet package already includes an implementation of BGP4
(primary author B.J. Premore). For our work we used SSFNet version
1.4, which closely follows the recommendations of RFC 1771 but still lacks
some BGP functionality such as route flap damping. More detailed information and a summary
of the implemented features can be found in [2] and [3].
However, members of the research group of Prof. Anja Feldmann have added numerous features
to the BGP implementation of SSFNet over the last few years. In his
diploma thesis “Analysis of OSPFv2-BGP4 Interactions Using the SSFNet Simulator” [4],
Hagen Böhm extended SSFNet with the OSPFv2 protocol, a scanning process for BGP
and the possibility of simulating link failures. Another diploma thesis, by Andreas Hartl [3],
investigated the dynamics of BGP updates in realistic topologies, which made it necessary to add
new functionality to the existing BGP implementation in SSFNet. The
most important changes were:
• MRAI timer: Because Cisco routers form the overwhelming majority in today's networks, the MRAI timer implementation was modified so that it exactly models the
behavior of Cisco BGP speakers. The required changes were a normally distributed
MRAI timer value (taking values between 25s and 31s) and the use of per-peer timers
instead of per-prefix MRAI timers (Cisco routers keep a separate timer only for
each neighbor, not for each prefix advertised to a neighbor).
• Best Path Selection Process: Here again, the strategy that Cisco routers use to find the “best” route to a
destination differs from that of the RFC specification. In particular,
more emphasis is placed on the length of the AS_PATH attribute in the tie-breaking
mechanism.
• Community Values: The SSFNet BGP implementation was modified so that it understands BGP community values as specified in RFC 1997. With the help of community
values, it is possible to reproduce peering or customer-provider relations between autonomous systems (refer to [5] and [6] for more information on AS relationships).
• Workload Generation: Factors like the number of BGP sessions on a host, the size of
the routing tables and the number of updates in the input queue were taken into account
in order to create a more realistic workload.
A comprehensive description of these adaptations can be found in [3].
For the investigation of BGP convergence and scalability, a mechanism was needed to create
instabilities in a given topology. Modified SSFNet classes, implemented by Hagen Böhm,
were merged with the normal BGP implementation, making it possible to configure link failures
conveniently in the DML files. Though not used for the simulations in this work, we also
added the route flap damping mechanism (taken from a later SSFNet version) and the option
of inserting “dummy prefixes” into the network.
2.3 Generation of DML Files
The configuration of the topology, the simulation parameters and the simulation dynamics are
all defined in DML files. The main objective of this work is to examine BGP update
dynamics in realistic topologies. Given the high complexity of realistic networks, however, it becomes
increasingly difficult to build the input files manually. That is why we need a way
to create DML files automatically. The general procedure is depicted in Figure 2.2.
[Figure 2.2: Generation of DML files. Steps: AS relationship pairs (extracted from authentic BGP routing tables) → AS relationship pairs (extracted subgraph) → DML configuration file (simulator input).]
The automated generation of DML files starts from so-called AS relationship pairs.
These pairs are extracted from authentic BGP tables (for example from RIPE), together with some kind of seed
information (e.g. a tier1 provider), by a tool written by Arne Wichmann [7], and they reflect the
commercial relationships between interconnected ASes. For our study we distinguish between
provider-customer relationships (a customer pays its provider for connectivity to the rest of the
Internet) and peering links (neighboring ASes agree to exchange traffic free of charge). The
following excerpt shows a possible AS relationship pairs input:
1234 > 2401
2401 = 3110
3110 < 1234
This short listing denotes that AS 1234 is a provider of AS 2401 (>), AS 2401 shares a peering
link with AS 3110 (=) and AS 3110 is a customer of AS 1234 (<).
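The notation just described can be read with a few lines of Python. This is only a sketch under the assumption that every input line has exactly the form `AS symbol AS`; the function name `parse_as_pairs` is hypothetical and the real tools may accept a richer format.

```python
def parse_as_pairs(text):
    """Parse lines such as '1234 > 2401' into provider-customer and peering sets."""
    provider_customer, peers = set(), set()
    for line in text.splitlines():
        if not line.strip():
            continue                              # skip empty lines
        left, rel, right = line.split()
        a, b = int(left), int(right)
        if rel == ">":
            provider_customer.add((a, b))         # a is provider of b
        elif rel == "<":
            provider_customer.add((b, a))         # a is customer of b
        elif rel == "=":
            peers.add(frozenset((a, b)))          # undirected peering link
        else:
            raise ValueError(f"unknown relationship symbol: {rel!r}")
    return provider_customer, peers


pc, peers = parse_as_pairs("1234 > 2401\n2401 = 3110\n3110 < 1234")
print(sorted(pc))  # [(1234, 2401), (1234, 3110)]
```

Normalizing `<` and `>` to a single (provider, customer) orientation means that later steps only have to deal with one direction.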
Unfortunately, it is not possible to use the complete topology graph resulting from all AS pairs obtained from
RIPE or other BGP sources. Because such a sample
network demands too much memory in the simulations, we extract a complete subtree of a specified AS from the original
graph. The Topology Extraction tool by Andreas Hartl [3] is explained briefly in 2.3.1.
After extracting a subgraph, we have a reduced topology with fewer ASes. However, the
input is still in the syntax of the original AS relationship pairs. It is the task of the Topology
Conversion tool to convert this “AS-Pairs syntax” into the DML format, which can
then be passed to the SSFNet simulator as input. A short description of this tool together
with our extensions can be found in 2.3.2.
2.3.1 Subgraph Extraction
In order to reduce the complexity and size of the sample networks used for our simulations,
a subtree of a specified AS can be extracted from the original graph given in the form of
AS pairs. The basic idea of the Topology Extraction tool [3] is to first perform a kind
of depth-first search up to a certain depth, which finds all core ASes
that are no more than a specified number of AS hops away from the starting AS (the tool
refers to this parameter as “number of AS hops”).
Afterwards it searches all paths between the core ASes up to a specified length (referred to as
“maximum path length” by the tool), adding all intermediate ASes on these paths which have
not been visited yet. It should be mentioned that the extraction of the core ASes takes into
account certain redistribution policies arising from commercial relationships
(peering, customer-provider). Taking into account all paths between the core ASes up to
a certain length ensures that most propagation effects that can be observed in the complete network should also appear in our extracted
topology.
For a comprehensive explanation of the subtree extraction, we refer to section 4.2.1 of [3].
The extraction tool asks for all parameters and is started by typing:
make extract
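The two extraction steps can be sketched in Python as follows. This is a simplified illustration and not the Topology Extraction tool itself: it ignores the redistribution policies mentioned above, uses a bounded breadth-first search to find the core ASes, and assumes that the path length is counted in AS hops. All function names are hypothetical.

```python
from collections import deque


def core_ases(adj, start, as_hops):
    """ASes at most `as_hops` hops from `start` (bounded breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == as_hops:
            continue                       # do not expand beyond the hop limit
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def extract_subgraph(adj, start, as_hops, max_path_len):
    """Core ASes plus every AS on a simple path of at most `max_path_len`
    hops between two core ASes."""
    core = core_ases(adj, start, as_hops)
    keep = set(core)

    def walk(path):
        node = path[-1]
        if len(path) > 1 and node in core:
            keep.update(path)              # path ends at a core AS: keep it all
        if len(path) - 1 == max_path_len:
            return
        for nxt in adj[node]:
            if nxt not in path:            # simple paths only
                walk(path + [nxt])

    for c in core:
        walk([c])
    return keep


# Hypothetical graph: AS 4 lies on a path between core ASes 2 and 3, AS 5 does not.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(sorted(extract_subgraph(adj, start=1, as_hops=1, max_path_len=2)))  # [1, 2, 3, 4]
```

The exhaustive path search grows quickly with the path length, which is acceptable for a small illustration but would need pruning for graphs of realistic size.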
2.3.2 Automatic Generation of the DML File
Figure 2.2 showed that we generate the DML files needed for our simulations on the basis of
AS relationship pairs. Even if a subtree is extracted from the complete topology, the input is
still in the form of AS pairs and must be converted to the DML syntax before SSFNet can
use it. This task is performed by another tool, called Topology Converter.
A detailed description of the functionality of this tool can be found in section 4.2.2 of [3].
When generating DML files from AS pairs, it is possible to distinguish between
two important parts: the external structure of the topology (links between ASes) and the
internal structure (I-BGP mesh within an AS). Whereas the external structure is built from
the information given in the form of AS pairs, the interior of an AS is generated according
to the wishes of the user. For example, the user can determine the number of route reflectors
within an AS or the number of border routers which connect to other customer, provider and
peering ASes. However, the internal structures will look similar for all ASes in the topology;
it is not possible to generate a different I-BGP mesh for each AS.
The original Topology Converter was modified slightly. The changed version will be called
Topology Generator and contains the following additional features:
• Per-prefix MRAI, WRATE, SSLD: The user is asked whether to use per-prefix MRAI timers instead of per-peer MRAI timers and whether to activate WRATE
(withdrawal rate limiting) and SSLD (sender side loop detection).
• Route Flap Damping: The route flap damping mechanism of a later SSFNet version
(1.5) was merged into the SSFNet implementation used here (version 1.4). If
the modified SSFNet version is used, the Topology Generator tool can enable route flap
damping with different parameter settings (default Cisco or Juniper settings, or manual
specification of the parameters). However, route flap damping was not used
for the simulations in this study.
• Dummy Prefixes: It is possible to insert a specified number of dummy
prefixes, starting from a dummy AS, into the network. In this way,
the BGP routing tables become larger and have more entries.
• Link Failures: The specification of link failures has been extended. It is now possible
to define the number of links to fail, a time interval within which the link failures occur at
a random time, and a time when all broken links are supposed to recover.
The Topology Conversion or Topology Generator tool is started by typing make convert in
the appropriate directory.
2.4 Simulator Output
An essential part of this study consists in analyzing the simulation output and drawing conclusions based on the results of the simulation. During a simulation run, all sent and received
updates are logged, together with the sending or receiving time, the sender and
receiver, the type of the update message, the affected prefix and the AS_PATH attribute of
the BGP message. The following two lines give an idea of the logged data:
45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)
45.710960373 receive 4:10 4:2 rte 0.0.1.0/26 (3 1)
Both lines belong to the same BGP message from interface 4:10 to 4:2. The first
line shows the time when the message was sent by the source; the second line shows the
arrival time at the destination. Here the prefix 0.0.1.0/26 is advertised within AS 4, because
sender and receiver are both part of AS 4 (4:10 and 4:2). Originally, the prefix was announced
by AS 3 and has propagated via AS 1 to AS 4, which is indicated by the AS_PATH attribute
(3 1).
With all this information it is possible to get the desired complete view of all
BGP speakers in the network, which enables us to perform a comprehensive analysis of update
dynamics.
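Log lines of this form can be turned into structured records for the later analysis. The Python sketch below assumes that every line follows exactly the layout of the two example lines; the record type `UpdateRecord`, its field names and the function `parse_line` are hypothetical and not part of SSFNet.

```python
import re
from typing import NamedTuple


class UpdateRecord(NamedTuple):
    time: float       # simulation time in seconds
    event: str        # "send" or "receive"
    src: str          # sender identifier as logged, e.g. "4:10"
    dst: str          # receiver identifier as logged, e.g. "4:2"
    kind: str         # message type field as logged, e.g. "rte"
    prefix: str       # affected prefix
    as_path: tuple    # AS numbers in the order they were logged


LINE = re.compile(
    r"^(\S+)\s+(send|receive)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\(([^)]*)\)\s*$"
)


def parse_line(line):
    """Parse one log line; raise ValueError if it does not match the layout."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    t, event, src, dst, kind, prefix, path = m.groups()
    return UpdateRecord(float(t), event, src, dst, kind, prefix,
                        tuple(int(a) for a in path.split()))


sent = parse_line("45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)")
received = parse_line("45.710960373 receive 4:10 4:2 rte 0.0.1.0/26 (3 1)")
print(sent.as_path)                 # (3, 1)
print(received.time - sent.time)    # transit time of this message in seconds
```

Pairing the send and receive records of one message, as done here, gives the transit time of a single update; the same records can be grouped by prefix or by AS for the analyses in the following chapters.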
Chapter 3
Setting up the Experiments
Whereas the last chapter dealt with all relevant aspects of the network simulator
SSFNet, this chapter is dedicated to the setup of the experiments. It is essential to
know how an experiment was conducted, what assumptions were made, what testing
environment was used, etc. The sections below explain in detail our investigations of the
MRAI timer influence on convergence times and the propagation of updates.
3.1 Simulation Topologies
Running simulations with the SSFNet simulator requires an input file in the DML format.
The DML files describe not only the simulation parameters and dynamics (e.g. link
failures) but also the network topology, i.e. the graph of ASes. One of the main objectives
of this study is to examine BGP behavior in realistic networks which approximate the structure
of the Internet as closely as possible. For that reason, choosing the simulation topologies is a
critical task.
In order to verify the correctness of our extensions to SSFNet and of certain auxiliary Perl scripts,
we developed some simple test networks, which are of no further importance for the results
of this study. The more complex and realistic networks were all generated automatically
as described in section 2.3. We now introduce the topologies relevant for conducting the
experiments.
3.1.1 Middle Topology
The so-called Middle Topology (taken from [3]) is pictured in Figure 3.1. This topology was
created manually by specifying the commercial relationships between the ASes in the AS pair
format and then running the Topology Conversion tool (see 2.3.2). In contrast to the next two
topologies, it was not generated from AS relationship pairs from RIPE or other Internet
sources and is therefore a more synthetic network.
[Figure 3.1: Middle Topology (green lines are peering links). The graph contains AS 1 to AS 15.]
Nonetheless, it already shows some characteristics that can also be found in realistic networks.
For example, the graph contains a certain hierarchy of top-level tier 1 ASes
(here AS 1, 2 and 3), ASes in the middle of the graph (AS 4 to 9) and
ASes at the bottom of the graph (AS 10 to 15), which we frequently refer to as stub ASes.
It turns out that this distinction between different levels (tiers) makes sense in the Internet,
too. Furthermore, there are some ASes in the graph which are multi-homed, meaning that
they are connected to more than one provider.
For the Middle topology as well as for the other ones used in this work, it must be pointed
out that the number of external links between a pair of ASes is varied according to the needs
of the specific experiment. However, this will be mentioned clearly in each case.
3.1.2 Topology 1140
The main objective of investigating BGP update dynamics in a realistic environment requires
that more or less realistic test networks are used. For this purpose, a subtree of a small
German ISP [8] was extracted with the Topology Extraction tool (see 2.3.1) based on the
commercial relationships between ASes measured in 2003 by [7]. Due to memory limitations,
it was necessary to restrict the “number of AS hops” to one (for finding the core ASes) and
the “maximum path length” to five ASes. Table 3.1 summarizes some facts about the extracted
network:
# ASes   # external links   graph degree (avg)   # core ASes
95       1145               24.1                 5
Table 3.1: Properties of Topology 1140
Altogether we obtain 95 ASes, where each AS is composed of several routers organized in an
I-BGP mesh. Under the assumption that every pair of ASes is connected by only one link, we
obtain 1145 external links, leading to an average graph degree (average number of neighbors
for each AS) of 24.1. Though the topology graph seems to be highly meshed, the extraction
tool finds only 5 core ASes in the first step. The remaining conversion from the extracted
subgraph to the DML syntax is done with the Topology Conversion tool.
3.1.3 Topology 7774
The procedure for generating Topology 7774 is basically the same as for Topology 1140. It
mainly differs in the AS relationship pairs used [7], which here date from April 2004 and are thus
more up to date. The extraction was started from AS 7774 with the “number of AS hops”
set to one (to find the core ASes) and the “maximum path length” set to five. A summary
of some characteristics is shown in Table 3.2.
# ASes   # external links   graph degree (avg)   # core ASes
105      614                11.7                 3
Table 3.2: Properties of Topology 7774
In contrast to Topology 1140, it has more ASes, although the extraction was started with fewer
core ASes. The number of external links is lower than in Topology 1140, which results
in a lower average graph degree. However, the density functions in Figure 3.2 suggest that
the number of neighbor ASes follows a broad distribution.
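The average graph degrees reported for both topologies follow directly from the numbers of ASes and external links, because every link contributes one neighbor to each of its two endpoints. A short Python check, with a hypothetical helper name:

```python
def average_degree(num_ases, num_links):
    """Average number of neighbors per AS; each link counts for both endpoints."""
    return 2 * num_links / num_ases


print(round(average_degree(95, 1145), 1))   # Topology 1140: 24.1
print(round(average_degree(105, 614), 1))   # Topology 7774: 11.7
```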
[Figure 3.2: Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774. Two panels (Topology 1140, Topology 7774); x-axis: node degree (number of neighbor ASes); y-axis: density.]
3.2 Generation of Link Failures
Up to now, we have only covered the static aspects of the simulator input, namely the generation
of the topology and the interconnections between ASes. However, dynamic events such as
link instabilities or the
advertisement of new routes and prefixes are an important part of the experiments. For the test scenarios in this work, it is sufficient
to have a means of simulating link failures at a specific time. Thanks to Hagen Böhm
[4], it is possible to let a link fail with the following DML extension:
link [ attach 1:1(1) attach 2:2(2) delay 0.0010 fail [ from 300 until 900 ] ]
This DML statement makes the link between router 1 in AS 1 and router 2 in AS 2 fail
at simulation time 300s, basically dropping all (IP) packets at one router interface. At time
900s the link recovers and transports data as usual. It should be mentioned that
link failures are generally not configured manually in the DML files but with the help of the
Topology Generator tool (see 2.3.2) or a special Perl script (cf. 3.2.2) which was developed
for this purpose.
Last but not least, we are interested in categorizing links in terms of how harmful their
failure is. The next subsection illustrates what is meant by such a classification of
external links, whereas subsection 3.2.2 presents a script for configuring link failures depending
on the desired “failure category”.
3.2.1 Link Categories
When discussing the characteristics of the Middle Topology in 3.1.1, we already alluded to the
fact that realistic topologies, e.g. the Internet, obey a certain hierarchy. Indeed, there
are research papers ([5] and [6]) which seem to confirm that the autonomous systems in the
Internet can be classified into different categories according to their commercial relationships.
By convention, ASes at the top of the hierarchy, which have no providers and peer only
with other “top ASes”, are called tier1. In our work we wanted to examine to what extent
the position of a link in this hierarchy is correlated with the harm that the link's failure
causes in terms of update propagation.
Before classifying the external links of a network, the ASes were associated with one of the
following categories:
• tier1-AS: All ASes which are not connected to any provider, and are thus at the top of
the hierarchy, are said to be in tier1.
• stub-AS: ASes which do not have any customers are at the bottom of our ranking and
are assigned to the category of stub entities.
• middle-AS: All ASes which do not belong to one of the first two groups fall into this
category.
Starting from these categories of ASes, the external links were assigned to one of the groups
below:
• tier1-tier1 : Links between two tier1-ASes.
• tier1-middle: Link between tier1 and middle-AS.
• middle-middle: Link between two middle ASes.
• middle-stub: Link between middle AS and stub AS.
• stub-stub: Link between stub ASes.
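The two classification steps above can be sketched in Python. The sketch assumes that the relationships are available as (provider, customer) pairs and that an AS without any provider is treated as tier1 even if it also has no customers; both the function names and this tie-breaking order are assumptions. A link between a tier1 AS and a stub AS, which is not among the five groups listed above, would be labeled tier1-stub by this sketch.

```python
RANK = {"tier1": 0, "middle": 1, "stub": 2}


def classify_ases(provider_customer, all_ases):
    """Map each AS to 'tier1', 'stub' or 'middle' from (provider, customer) pairs."""
    has_provider = {customer for _, customer in provider_customer}
    has_customer = {provider for provider, _ in provider_customer}
    category = {}
    for asn in all_ases:
        if asn not in has_provider:
            category[asn] = "tier1"      # no provider: top of the hierarchy
        elif asn not in has_customer:
            category[asn] = "stub"       # no customers: bottom of the hierarchy
        else:
            category[asn] = "middle"
    return category


def classify_link(a, b, category):
    """Return the link group name, e.g. 'tier1-middle', for the link a-b."""
    first, second = sorted((category[a], category[b]), key=RANK.get)
    return f"{first}-{second}"


# Hypothetical example: AS 1 and AS 2 are providers of AS 4, which serves AS 10.
pairs = {(1, 4), (2, 4), (4, 10)}
cat = classify_ases(pairs, {1, 2, 4, 10})
print(classify_link(1, 2, cat), classify_link(4, 1, cat), classify_link(4, 10, cat))
# tier1-tier1 tier1-middle middle-stub
```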
Table 3.3 shows the results of this classification for the topologies used in our simulations (it
is assumed that each AS pair is connected by only one link).
Topology   # links   tier1-tier1   tier1-middle   middle-middle   middle-stub   stub-stub
Middle     27        3             11             3               8             2
1140       1145      3             125            627             352           38
7774       614       15            116            242             219           22
Table 3.3: Categorization of external links for the used topologies
All topologies contain only few tier1-tier1 links, and in the two extracted topologies
(1140 and 7774) the middle-middle group is the largest one. This suggests that
the tier1 ASes as well as the stub ASes are probably situated more on the “edge” of the
topology graph. Note that the Middle Topology is much smaller than the other two
networks, having only 27 external links.
The configuration of link failures according to the categorization just described is automated
by the Perl script CreateLinkFails.pl, which is explained in the next subsection.
3.2.2 Failure Scenarios in the Experiments
One objective of this study was to examine the propagation of updates depending on the
category of the failing link. Choosing a link of the desired category can be done easily with
the Perl script CreateLinkFails.pl. As input this script requires the desired number of
links to fail, the category to which the failing links should belong, the DML input file
for the simulation, a failure time period and the time when the link should recover from the
failure state. According to these input parameters an appropriate failing link is configured in
the DML file as described in 3.2.
Internally, the script proceeds basically as described in the last subsection. After
classifying the ASes, the external links are assigned to categories; then the links to fail
are chosen randomly, as is the exact failure time within the specified failure period. In
most cases, this script is called by other control scripts in our experiments.
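The random selection step can be illustrated with the following Python sketch. It is not the code of CreateLinkFails.pl: the function name, the dictionary layout and the choice of a uniform distribution for the failure time are assumptions (the text only states that the choice is random), and writing the result into the DML file is left out.

```python
import random

def choose_link_failures(links_by_category, category, count, period, recover_at, seed=None):
    """Pick `count` distinct links of one category and a failure time for each.

    links_by_category: dict mapping a category name to a list of (as_a, as_b) links
    period:            (start, end) of the failure time period in seconds
    recover_at:        time at which every chosen link recovers
    """
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    candidates = sorted(links_by_category.get(category, []))
    if count > len(candidates):
        raise ValueError("not enough links in category " + category)
    start, end = period
    failures = []
    for link in rng.sample(candidates, count):
        # Assumption: failure times are drawn uniformly from the period.
        failures.append({"link": link,
                         "fail_at": rng.uniform(start, end),
                         "recover_at": recover_at})
    return failures
```

For example, choose_link_failures(links, "middle-stub", 2, (100.0, 120.0), 500.0, seed="abc") returns two distinct middle-stub links, each with a failure time inside the 20s window.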
3.3 Analysis of the Simulation Results
Investigating the propagation of updates and the influence of different MRAI timer settings
is an integral part of this project. From the logged BGP messages (see 2.4) the following
values are derived:
• Convergence times: After a link in the topology fails, BGP messages are exchanged
between BGP speakers until all routes which were leading over the broken link are
redirected to other paths. The time from when the first BGP message is sent after the
occurrence of the instability until the time when the last BGP update is received by
a router will be referred to as the convergence time. With the help of the logged time
stamps, it is possible to determine these convergence times.
• Number of affected ASes: Another interesting aspect is the spread
of instabilities across the topology. If a connection between two routers drops out, not
all BGP speakers will necessarily see this change, possibly because they
did not route any prefix over the broken link. By looking for all ASes in the log files
which received a BGP message as a consequence of one broken link, it is possible to
determine the number or percentage of ASes which are reached by the instability.
• Propagation Radius: Concerning the propagation of updates it is interesting to know
the distance of the affected ASes from the broken link. Basically, BGP messages spread
in all directions from where the instability occurs. By analyzing all logged messages it
is possible to trace back the intermediate hops along which updates have propagated
until reaching a given AS. The propagation distance or radius is the number of hops, not
including the source node of the instability.
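The first two values can be derived from the log with very little code, as the following Python sketch shows. The record layout (sent_at, received_at, sender_as, receiver_as) and the function name are hypothetical; the actual log format of the simulator is described in 2.4 and is not reproduced here.

```python
def summarize_run(messages, total_ases):
    """messages: list of (sent_at, received_at, sender_as, receiver_as) tuples
    for the BGP messages caused by one instability event (hypothetical layout)."""
    if not messages:
        return {"convergence_time": 0.0, "affected": set(), "ratio": 0.0}
    first_sent = min(m[0] for m in messages)      # first message sent after the failure
    last_received = max(m[1] for m in messages)   # last update received by any router
    affected = {m[3] for m in messages}           # every AS that received a message
    return {"convergence_time": last_received - first_sent,
            "affected": affected,
            "ratio": len(affected) / total_ases}
```

With three logged messages sent at 10.0, 10.2 and 12.0 and the last one received at 14.5, the sketch reports a convergence time of 4.5 seconds.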
Analyzing the logged data must be done for each simulation run and is automated by the
Perl script LogfileAnalyzer.pl. As input it takes a file with the logged simulator output and
additionally the DML file which was used by SSFNet. The output of this Perl script
is a text file containing, among other things, the result values just described: convergence
times, percentage of affected ASes and propagation radius.
It deserves mentioning that the script only takes into account BGP messages
whose timestamp lies in a specified time window. In this way, it can be ensured that all considered updates are associated exclusively with a specific instability event. The two ASes incident to the failing edge are assigned the update radius 0. The update distance of
the other ASes can then be determined recursively by defining it to be min{currentDistance, n+1}
whenever an update message is received from an AS which has the radius n.
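The min{currentDistance, n+1} rule can be sketched in Python as a repeated relaxation over the logged sender/receiver pairs. This is an illustrative sketch only: the function name and the input format are assumptions, and iterating until nothing changes is one possible way to make the result independent of the order of the log entries, not necessarily the way LogfileAnalyzer.pl proceeds.

```python
import math

def propagation_radius(update_edges, failed_link):
    """update_edges: list of (sender_as, receiver_as) pairs, one per logged update
    failed_link:  (as_a, as_b), the two ASes incident to the failing edge"""
    edges = list(update_edges)
    radius = {failed_link[0]: 0, failed_link[1]: 0}  # both end points start at 0
    changed = True
    while changed:                                   # relax until a fixed point is reached
        changed = False
        for sender, receiver in edges:
            if sender not in radius:
                continue                             # sender has no known distance yet
            candidate = radius[sender] + 1           # n + 1
            if candidate < radius.get(receiver, math.inf):
                radius[receiver] = candidate         # min{currentDistance, n + 1}
                changed = True
    return radius
```

The maximum of the returned values is the update radius of the event; the mean over all returned values includes the two incident ASes with radius 0.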
Usually, the script LogfileAnalyzer.pl is called by other control scripts and not started
manually. The two main control scripts will be introduced in the next section.
3.4 Taken Experiments
Having discussed the generation of the static and dynamic properties of our experiments
and the analysis of the simulation results, this section deals with the “high-level” view of how
we conducted our investigations. It clarifies what exact steps were taken to obtain
the results of Chapter 4. For both main goals of this study (the investigation of the update
propagation and the MRAI timer influence on convergence) Perl control scripts were written,
which are presented in the next two subsections.
3.4.1 Investigation of MRAI Timer Effects
Examining the properties of different MRAI timer values and the use of a per-peer versus
per-prefix timer is done by the Perl script mraiInvestigation.pl. Basically, we run
the simulations along four dimensions: different failing links, different MRAI timer values,
a per-peer or per-prefix timer basis and different random seeds for the initialization of the
random number generators. The general steps taken by this script are depicted in Figure 3.3.
At the beginning of a cycle the script configures a link failure with the help of the
CreateLinkFails.pl tool (compare 3.2.2). After adjusting the MRAI timer to a value out of
the set {4s, 5s, 10s, 15s, . . . , 55s, 60s}, the timer basis is set to either per-peer or
per-prefix. Finally, a seed for the random number generator of SSFNet is chosen
out of a given set of possible seeds, all being arbitrary text strings. When the dimension
parameters have been set, the control script starts the SSFNet simulation and has the
computed results analyzed by LogfileAnalyzer.pl. For each single simulation run,
mraiInvestigation.pl keeps some information about convergence time and number of exchanged updates, which is summarized in a text file after all simulations have finished.
[Flow chart: START -> configure link failures in DML file -> set the MRAI timer value -> change timer basis: per-peer and per-prefix -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]
Figure 3.3: Flow chart of the script mraiInvestigation.pl
Altogether, this control script contains four nested loops, each iterating over the parameters
of one so-called testing dimension. More detailed information, like the number of iterations
with different random seeds, is given in Chapter 4, as some settings might vary between
test series.
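The four nested loops can be pictured with the following Python sketch, which only enumerates the parameter combinations and does not start SSFNet. The function and key names are hypothetical, and the loop order (failure scenario outermost, seed innermost) is an assumption that follows the order of the steps described above.

```python
from itertools import product

# MRAI timer values in seconds: 4, 5, 10, 15, ..., 55, 60
MRAI_VALUES = [4] + list(range(5, 61, 5))

def parameter_grid(failure_scenarios, seeds, timer_bases=("per-peer", "per-prefix")):
    """Yield one dict per simulation run; equivalent to four nested loops."""
    for scenario, mrai, basis, seed in product(failure_scenarios, MRAI_VALUES,
                                               timer_bases, seeds):
        yield {"scenario": scenario, "mrai": mrai, "basis": basis, "seed": seed}
```

With four failure scenarios and three seeds, each combination of timer value and timer basis is covered by 4 x 3 = 12 runs.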
3.4.2 Investigation of Update Propagation
Examining update propagation properties (number of affected ASes or update radius) is
done analogously to the procedure described in the preceding subsection. Again a Perl script,
called updateRadius.pl, is responsible for testing along three dimensions: different categories
of link failures (cf. 3.2.1), different failing links within each link category and different seeds
for the random number generator of SSFNet. Figure 3.4 illustrates the basic steps during a
run of updateRadius.pl:
[Flow chart: START -> choose a failure category -> configure link failures in DML file -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]
Figure 3.4: Flow chart of the script updateRadius.pl
Here the script comprises three nested loops, iterating over the parameters of each so-called
dimension. First we determine to which category the failing link should belong (stub-stub,
tier1-middle, etc.). The CreateLinkFails.pl script (see 3.2.2) then configures a link to fail,
whereby different links are tested for each failure category (second loop). After choosing a seed
string for the random number generator (third loop), SSFNet is started and the results are
analyzed with LogfileAnalyzer.pl. Again, we record some results, such as the percentage
of affected ASes, for each simulation run in order to create a summary of the results before
updateRadius.pl terminates.
Chapter 4
Simulation results
In order to obtain a better understanding of the underlying mechanisms of BGP, a careful
and comprehensive sensitivity analysis of the protocol parameters is needed. Within the scope
of this study, we concentrated on two important aspects which we believe to be essential
for an evaluation of BGP in terms of scalability and convergence times: the propagation of
updates following a link failure and the influence of the MRAI timer on the convergence
times and the number of sent updates. This chapter describes the conducted experiments and
documents the results obtained.
4.1 Influence of MRAI Timer Settings
The protocol specification of BGP includes several configurable timers, one of which is the
Minimum Route Advertisement Interval (MRAI) timer. Since it limits the
number of updates sent by a BGP speaker to a peer or for a certain prefix, this timer might directly
influence the number of updates and the convergence times after a link failure.
This section covers different configuration settings for the MRAI timer and their effects
on the general convergence process. Particular attention is devoted to two important configuration options: choosing a per-peer or per-prefix timer, and what timer value to take.
Questions that arise include: How do convergence times and the number of updates change
with increasing value of the MRAI timer, and what advantages do per-prefix timers offer in
comparison to per-peer timers?
4.1.1 Varying the MRAI Timer Values
Every time a router sends a route advertisement to a neighbor it starts a new instance
of the MRAI timer, which does not allow this router to send another advertisement concerning the
same destination until the timer has expired. For this experiment we used a per-peer timer
basis and let 20 links fail at arbitrary locations within Topology 7774. The exact points in
time at which the link failures occur are chosen randomly from a time window with a length of
20s in order to avoid possible synchronous runs of different timer instances. All experiments
concerning the MRAI timer were conducted with our sample topologies having two external
links (multi-homing) between a pair of ASes, and with SSLD but no WRATE being used.
Always measuring the number of exchanged external updates (updates between ASes) and
the time from the first update sent after the instability event until the last update
was received, we conducted the experiment for MRAI timer values of 4s, 5s, 10s, 15s, . . . , 60s.
With the help of the Perl script mraiInvestigation.pl (see 3.4.1), the simulations were
initiated automatically, running each simulation with three different seeds for the random
number generators of SSFNet. Furthermore, we considered four different failure scenarios
with 20 broken links for every MRAI timer value configuration.
Figure 4.1 shows the results of these experiments. Note that all tests were done with a
per-peer timer basis and that only the means of the measured number of external updates
and of the convergence times for a specific MRAI timer value are displayed.
[Figure: panels "Convergence times" (y-axis: convergence time in sec (mean)) and "# External Updates" (y-axis: # external updates (mean)); x-axis: MRAI timer value in sec]
Figure 4.1: Convergence times and number of updates depending on different MRAI timer
values (per-peer) in Topology 7774
Regarding the convergence process it can be inferred from Figure 4.1 that increasing values
for the MRAI timer impose a penalty on the time needed until routing has reached a steady state
again. Whereas an MRAI timer of 60s requires 110s to converge, setting the
timer to 4s leads to a period of only about 40s until all updates after the link failures have
been exchanged. The growth of the observed convergence times seems to be approximately
linear with respect to increasing MRAI values in this experiment.
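As a rough worked estimate, and assuming that the growth is exactly linear between the two end points quoted above (about 40s at an MRAI of 4s and 110s at an MRAI of 60s), the slope can be approximated as follows. The symbols T_conv and T_MRAI are introduced here only for illustration.

```latex
\[
\frac{\Delta T_{\mathrm{conv}}}{\Delta T_{\mathrm{MRAI}}}
\approx \frac{110\,\mathrm{s} - 40\,\mathrm{s}}{60\,\mathrm{s} - 4\,\mathrm{s}}
= \frac{70}{56} = 1.25
\]
```

Under this assumption, each additional second of MRAI would cost roughly 1.25s of convergence time in this experiment.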
These results suggest that the rate-limiting mechanism of the MRAI timer adds more delay
to the messages for higher timer values. However, we point out that related
work [1] found that very low MRAI timer values can cause a high workload in BGP
routers, thus inducing an increase in convergence times again.
The second part of Figure 4.1 depicts the number of external updates as a function of the
timer value. Here we observe the converse trend: with increasing MRAI value,
the number of exchanged BGP messages decreases from about 1600 to about 1400. We
explain this observation by the dampening of some of the route oscillations which are inherent
in the path-vector protocol BGP. A BGP-speaking router can collect and evaluate alternative
paths for a certain prefix before it advertises the best path to its neighbors. Therefore
its neighbors are not exposed to every intermediate path but only to the best path within a
period of time.
It might be asked whether it is justified to use the means of the measured data values for
drawing general conclusions on the influence of the MRAI timer parameters. Strong fluctuations could possibly weaken the explanatory power of our results. The standard deviations for
the measured number of external updates and the convergence times are depicted in Figure
4.2.
[Figure: panels "# External Updates" (y-axis: # external updates) and "Convergence times" (y-axis: convergence time in sec); x-axis: MRAI timer value in sec]
Figure 4.2: Standard deviations for the measured data values in Figure 4.1
The standard deviations for the number of updates are less than 35% of the computed means
for all MRAI timer configurations, whereas the deviations are never larger than 10% of the
means in the case of the convergence times. Although the number of updates shows a higher
variability, the calculation and use of the means seems to be justified considering the fact that
twelve simulation runs were made for every MRAI timer value.
In closing, we summarize that with increasing values for a per-peer MRAI, the number of
external updates is decreasing at the cost of higher convergence times.
4.1.2 Per-peer and per-prefix MRAI Timers
The question was already raised whether per-prefix MRAI timers have any advantages in
comparison to per-peer timers. One might expect that keeping a separate timer for each
single prefix being advertised to a neighboring peer does not impose such high penalties on
convergence times as using timers on a per-peer basis. The following experiment tries to shed
light on questions concerning the use of per-prefix MRAI timers.
In analogy to the simulation described in 4.1.1 we generate 20 link failures at arbitrary
locations of Topology 1140 and Topology 7774 which occur randomly within a time window
of 20s. Again, we measure the number of exchanged external updates and the time from the
first update sent after the instability event until the last update was received by a
host (referred to as convergence time). The MRAI value was varied in the same way as in
4.1.1 but this time simulations are run for both a per-peer and a per-prefix MRAI timer. In
order to enhance the explanatory power of our conclusions, we perform simulations on the
two “big” topologies: Topology 1140 and Topology 7774. With the help of the Perl script
mraiInvestigation.pl twelve simulations are run for every fixed MRAI timer value and
fixed timer basis (per-peer and per-prefix) for the two topologies, as three different seeds for
the random number generator and four different failure scenarios are used in each case.
The diagrams in Figure 4.3 illustrate the convergence times and the number of external
updates depending on the used MRAI value for a per-peer and a per-prefix timer basis. Note
that the means for a specific timer value are always plotted.
[Figure: four panels, "Topology 1140: Convergence times", "Topology 7774: Convergence times", "Topology 1140: # External Updates" and "Topology 7774: # External Updates"; y-axes: convergence time in sec (mean) and # external updates (mean); x-axis: MRAI timer value in sec; curves: per-peer, per-prefix]
Figure 4.3: Comparison of per-peer and per-prefix MRAI timers in terms of convergence times
and number of external updates
Concerning the dependence between convergence times or number of updates and different
timer values for a per-peer MRAI in Topology 1140, the reader is referred to 4.1.1, as the
observations and conclusions for this case are basically the same: increasing values for the
MRAI seem to lead to fewer external updates but longer convergence times.
If corresponding convergence times for per-peer and per-prefix MRAI timers are compared
with each other, it seems that per-prefix implementations offer slight advantages over the
default per-peer timers. Whereas in Topology 1140 the convergence process is always some
seconds faster for a per-prefix timer basis, this is only true up to a timer value of 25s in
Topology 7774.
However, Figure 4.4 suggests that the standard deviations for the convergence times in
Topology 7774 are more than 10s for per-prefix timers set to values higher than 25s. Possibly, this could explain why per-prefix MRAI timers show worse convergence behavior than
per-peer timers in that case.
In most failure scenarios per-prefix timers will have slight advantages in terms of convergence
times over keeping one timer for every neighboring AS. Holding back all update messages to
a peer independent of the concerned prefixes, a per-peer MRAI timer imposes penalties on
convergence times in comparison to per-prefix timers. This is due to the fact that timers
on a per-prefix basis can “react” to each advertised prefix individually in the case of several
overlapping link failures.
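The difference between the two timer bases comes down to the key under which a running timer is stored, as the following simplified Python sketch shows. It is not the SSFNet implementation: the class and method names are hypothetical, and updates that are held back are simply refused here instead of being queued and sent when the timer expires.

```python
class MraiLimiter:
    """Simplified MRAI rate limiter for one BGP speaker (illustrative sketch)."""

    def __init__(self, interval, per_prefix=False):
        self.interval = interval      # MRAI timer value in seconds
        self.per_prefix = per_prefix  # False: one timer per peer
        self._expires = {}            # timer key -> time at which the timer expires

    def _key(self, peer, prefix):
        # Per-peer timers ignore the prefix, per-prefix timers keep
        # one timer for every (peer, prefix) pair.
        return (peer, prefix) if self.per_prefix else peer

    def try_send(self, now, peer, prefix):
        """Return True and restart the timer if an advertisement may be sent now."""
        key = self._key(peer, prefix)
        if now < self._expires.get(key, float("-inf")):
            return False              # timer still running: hold the update back
        self._expires[key] = now + self.interval
        return True
```

With a 30s per-peer timer, an advertisement for a second prefix to the same peer one second after the first is held back; with a per-prefix timer it is sent immediately, and only a repeated advertisement for the same prefix is delayed.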
[Figure: panels "# External Updates" (y-axis: # external updates) and "Convergence times" (y-axis: convergence time in sec); x-axis: MRAI timer value in sec]
Figure 4.4: Standard deviations for the data values measured for the per-prefix MRAI timer
in Topology 7774 (see Figure 4.3)
Continuing our discussion of the results, we note that in both topologies the mean of the
number of updates remains more or less constant while varying the values of the per-prefix
MRAI timers. However, strong fluctuations in the number of exchanged messages across the
single simulation runs of a specific MRAI timer setting can be observed in Figure 4.4. The
standard deviation is about 35% of the computed means, possibly due to the statistical nature
of router interactions.
Taking into account the number of BGP updates following instability events, per-peer
timers seem to be the better choice. Compared to per-peer timers, the use of MRAI
timers on a per-prefix basis produces significantly more external updates. For Topology 1140
we observe an average increase of 29% in the number of updates when using a per-prefix instead of
a per-peer timer; in Topology 7774 the increase is even 44%.
This might be explained by the fact that a per-prefix timer does not hold back update messages
for the same neighboring AS if different prefixes are concerned. In such a scenario a timer
on the per-peer basis could reduce the number of BGP messages which are passed on to the
neighbors.
If the results also proved true in additional experiments and for other topologies, this would
justify the configuration of the MRAI timers as per-peer, being the default setting in the
widespread Cisco and Juniper routers. Nonetheless, it should be questioned whether the
default configuration of the timer value to be near 30s is the best one possible for balancing
between a low router workload and fast convergence times.
4.2 Propagation of Updates
Scalability is an important issue in many fields of network research. Especially
for distributed protocols like BGP, which is responsible for maintaining connectivity between
autonomous systems in the Internet, it is of great importance to gain an understanding of
how protocol behavior changes with increasing size of the network. This section deals with
the investigation of update propagation after a link has failed somewhere in the topology.
Questions that arise include: how many ASes receive an update message following
a link failure, and how far away from the broken link can the instability still be perceived?
Finally, we want to investigate whether the classification of links according to their
commercial relationships reflects how harmful a failure of the link is in terms of the number
of affected ASes.
4.2.1 Experiment Description
For this experiment we always produced one single link failure such that all updates sent
afterwards must be related to this instability. As described in section 3.3, the number of
affected ASes can be easily determined by considering all ASes receiving a BGP message
following the failure event. The second point of interest is the distance updates propagate
through the topology when a connection between two ASes breaks. We refer to that distance
as the propagation radius, measuring it in the number of AS hops not including the nodes
incident on the failed edge.
In order to approximate the circumstances in the Internet where Cisco routers are the overwhelming majority, we use per-peer MRAI timers (timer value normally distributed between
25s and 31s), SSLD but no WRATE. All simulations take place in our sample topologies
Middle Topology, Topology 1140 and Topology 7774. By only permitting one link between a
pair of ASes (no multi-homing), it is ensured that each failure of an external link leads to a
change in the inter-AS routing of prefixes observable by other ASes.
Configuring a link failure in the DML file is done by the script CreateLinkFails.pl (refer to
3.2.2), making it possible to choose the failing edge systematically such that it belongs to one
of the categories described in 3.2.1. CreateLinkFails.pl is invoked from the general control
script updateRadius.pl (see 3.4.2), which runs simulations in every failure category
for ten different failure scenarios. In order to average out random effects, all simulations are
started with three different seeds for the random number generator with otherwise identical
parameters.
The results of these experiments are discussed in the following two subsections.
4.2.2 Number of affected ASes after a link failure
Figure 4.5 depicts the ratio of ASes receiving an update following a
link failure. For the topologies used, we distinguished between the types of the failing link and
plotted only the mean value of all simulation runs for a specific failure category.
[Figure: bar plot; y-axis: ratio of reached ASes (mean); groups: Middle, 1140, 7774; bars: tier1-tier1, tier1-middle, middle-middle, middle-stub, stub-stub]
Figure 4.5: Average percentage of ASes receiving updates after a failure broken down by
different link categories
First of all, it can be stated that in general not all ASes of the investigated networks are
affected by the link failure. Prefixes which have been routed over the broken link must be
redirected to new AS paths which possibly do not differ completely from the original path
but have some ASes in common. Taking into account the maximum of the means of all five
link categories, the percentage of affected ASes is not larger than 84% in Middle Topology,
38% in Topology 1140 and 59% in Topology 7774.
However, one has to be very careful in drawing definite conclusions. Figure 4.6 suggests that
there are strong fluctuations in the number of reached ASes for all failure categories. Using
Topology 7774 as an example, histograms illustrate how the mean values for the
percentage of affected ASes in Figure 4.5 arise from the values measured in the different
simulation runs for each link failure category. The distribution of the ratios of ASes receiving
updates after the instability event shows similar deviations in Topology 1140 and is therefore
not presented here.
Furthermore, we can read from Figure 4.5 that if an external link between two stub ASes fails,
the ratio of reached ASes is very low for all topologies. In fact, examining the results of the
single simulations shows that in such a case only two ASes are affected by the broken link:
the ASes incident on the failed edge. This is because the connection between two
stub ASes is a peering link, which is not supposed to be used by other ASes and on which
therefore no prefixes are advertised. As always only two ASes are affected by the instability
event, we did not generate a histogram for the stub-stub link failure in Figure 4.6. Note that
Middle Topology is very small compared to the other networks, leading to a higher percentage
of ASes receiving updates after a link failure in Figure 4.5.
[Figure: four histograms, "tier1-tier1 links", "tier1-middle links", "middle-middle links" and "middle-stub links"; x-axis: ratio of ASes receiving updates after a link failure; y-axis: Frequency]
Figure 4.6: Histograms of the ratio of ASes receiving updates after a link failure in Topology
7774 broken down by different link categories
We already mentioned that it is problematic to predict the ratio of affected ASes based on the
category to which the failing link belongs. Studying Figure 4.6, only very vague
statements can be made on the relationship between the harmfulness of links if they should
fail and their membership in one of our link categories. For example, it seems that in Topology
7774 broken connections between middle ASes are noticed by slightly fewer ASes than would
be the case if a link to a tier-1 AS failed.
Furthermore, it is surprising that in many simulation runs, link failures between a stub AS
and a middle AS are very harmful, inducing the propagation of updates to nearly all ASes,
whereas in many other simulations for the same failure category basically no ASes are reached
at all by updates. Similar behavior is seen for the other link failure categories.
The question arises whether it makes sense to investigate the propagation of updates broken
down by different link categories, as for all categories strong fluctuations in terms of the
ASes affected by the failure event were observed. We believe it to be essential that further
experiments with different and probably more realistic topologies are conducted before a final
statement on this issue can be made.
Speculating on the reasons for these strong fluctuations, it might be interesting to examine the
relationship between the degrees of the nodes incident on the failing edge and the percentage
of affected ASes. In a highly meshed network there exist many alternative paths to the same
destination, with the result that possibly only very few ASes have selected a best route which
runs over the broken link. Are the node degrees a better metric for predicting the
harmfulness of a link failure than the classification presented in this work?
In trying to answer this question it might be helpful to consult Table 4.1. It contains the average
number of neighbors of the ASes, broken down by AS category. After each
table entry, the standard deviation is given in brackets.
topology   total         tier1         middle        stub
Middle     2.9 (1.1)     4.0 (1.0)     3.3 (0.9)     2.0 (0.5)
1140       24.1 (17.0)   20.0 (11.3)   29.9 (17.1)   14.7 (13.4)
7774       11.7 (13.2)   28.6 (13.1)   20.9 (14.2)   5.0 (6.6)
Table 4.1: Average node degrees (standard deviations in brackets) broken down by categories
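Values of the kind shown in Table 4.1 could be computed along the lines of the following Python sketch. The function name and input format are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def degree_stats(links, category_of):
    """links:       list of (as_a, as_b) external links, one per AS pair
    category_of: dict mapping each AS to "tier1", "middle" or "stub"
    Returns {category: (mean degree, standard deviation)} plus a "total" entry."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = defaultdict(list)
    for asn, nbrs in neighbors.items():
        degrees[category_of[asn]].append(len(nbrs))  # node degree = number of neighbor ASes
        degrees["total"].append(len(nbrs))
    # Assumption: population standard deviation (pstdev), not the sample estimate.
    return {cat: (mean(d), pstdev(d)) for cat, d in degrees.items()}
```

ASes without any external link do not appear in the result of this sketch.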
First of all, the average total node degrees show that the used topologies differ strongly
in their meshing degree. While Topology 1140 is more highly meshed with a mean of
24.1 neighbors per AS compared to 11.7 neighboring ASes in Topology 7774, the very small
Middle Topology only has an average node degree of 2.9. Further investigation is needed to
determine whether these meshing degrees influence the ratio of affected ASes after a link failure.
Comparing the average node degrees broken down by categories for Topology 7774 and Topology
1140, it is interesting that they show converse characteristics for the tier-1 ASes. Whereas
the node degrees of the ASes which we assigned to the tier-1 group are on average higher
than those of the so-called middle ASes (28.6 compared to 20.9) in Topology 7774, it is
just the other way round in Topology 1140 (here 20.0 compared to 29.9). In general, it is believed
that ASes in the tier-1 level are situated in the center of the network graph, neighboring
more ASes than non-tier-1 ASes. Future work should pay more attention to modeling the
hierarchical structure of the Internet in such a way that topological characteristics like the
meshing degree are not distorted.
The standard deviations printed in brackets after each data value of Table 4.1 already
indicate that there are again some fluctuations in the number of neighbors which ASes within
a certain category have. These deviations are illustrated in more detail by
Figure 4.7.
Estimated density functions of the node degree distribution are drawn for the different AS
categories (tier-1, stub and middle) for Topology 1140 and Topology 7774. The main conclusion to be drawn from these density plots is possibly the insight that it is dangerous to consider
only the mean values, as strongly varying node degrees can be observed.
Nonetheless, we point out that tier-1 ASes in Topology 7774 are connected to 28.6 neighbors
on average whereas the node degree for the same category is only 20.0 in Topology 1140 (see
Table 4.1). If the theory holds that the high number of alternative paths to the same destination in
a highly meshed part of a network has the consequence that only very few ASes have selected
a best route which runs over the broken link, we could conclude that this is the reason
why a tier-1 link failure is much more harmful in Topology 7774 than in Topology
1140 (compare Figure 4.5). But again, it is not possible to make a definite statement, as for example
the results for the middle-stub links contradict this theory.
[Figure: two density plots, "Topology 1140" and "Topology 7774"; x-axis: node degree (number of neighbor ASes); y-axis: Density; curves: tier1 ASes, middle ASes, stub ASes]
Figure 4.7: Density functions of the node degrees (number of neighbor ASes) broken down
by categories
In closing, we summarize that further investigations are needed to explore the relationship
between the commercial classification of external links and their harmfulness in terms of
the number of affected ASes if the link should fail. New criteria may need to be
developed for a more meaningful categorization of external connections between ASes.
4.2.3 Propagation Radius
Besides the ratio of affected ASes, the second point of interest is the distance that updates
propagate through the topology when a connection between two ASes breaks. How this so-called update radius can be measured was already described in Section 3.3. Figure 4.8 depicts the
computed distances in number of AS hops, not counting the nodes incident on the failed
edge. Again, 10 different failure scenarios were tested for each link failure category, each
with three different seeds for the random number generator. The left bar plot of Figure 4.8
shows the mean values over all simulation runs, while the right diagram shows the
maximum update radius observed in the series of experiments for each failure category.
Perhaps the most conspicuous result is that the maximum update radius is always less than 4,
meaning that in no case do updates spread more than 4 AS hops away from the source of the
instability event. We attribute this to the high average meshing degrees of our topologies
(compare Table 4.1: 11.7 for Topology 7774 and 24.1 for Topology 1140), where possibly not
many “best” routes were using the broken link. Due to its small size and synthetic nature,
Middle Topology allows only limited conclusions.
Figure 4.8: Propagation distance of updates in the case of a link failure (measured by the
number of hops). [Two bar plots for Middle Topology, Topology 1140 and Topology 7774; left: # AS hops (mean); right: # AS hops (max); bars for the link categories tier1-tier1, tier1-middle, middle-middle, middle-stub and stub-stub.]
The mean values for the update radius, shown in the left bar plot, always lie
between 0.53 and 1.56 AS hops for all failure categories. In this context we point out that the
mean values are computed only over the ASes that receive an update following
a link failure. If only a small ratio of ASes is affected, the ASes incident
on the failing edge carry a very high weight in the computation of the means, leading to an average
propagation distance of less than one AS hop. Again we emphasize the
short mean distances that updates propagate through the topologies after a link failure, but point
out at the same time the strong fluctuations between the single experiments.
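The computation of the mean and maximum update radius described above can be sketched as follows. This is a minimal illustration and not the script used for the experiments. It assumes that the two ASes incident on the failed edge are at distance 0 and count among the ASes that received updates. The small topology, the AS names and the set of updated ASes are hypothetical.

```python
from collections import deque
from statistics import mean


def hop_distances(adjacency, failed_link):
    """Breadth-first search from both endpoints of the failed link.

    The endpoints are at distance 0 (assumption); every other AS gets its
    minimum number of AS hops to the nearer endpoint.
    """
    dist = {node: 0 for node in failed_link}
    queue = deque(failed_link)
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def update_radius(adjacency, failed_link, updated_ases):
    """Mean and maximum distance over the ASes that received an update."""
    dist = hop_distances(adjacency, failed_link)
    radii = [dist[asn] for asn in updated_ases]
    return mean(radii), max(radii)


# Hypothetical AS-level topology; the link A-B fails.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}
# Hypothetical outcome: only A, B and C see updates.
mean_radius, max_radius = update_radius(adjacency, ("A", "B"), {"A", "B", "C"})
print(mean_radius, max_radius)
```

In this example only one AS beyond the two endpoints is affected, so the endpoints dominate the average and the mean radius is about 0.33 AS hops although the maximum is 1.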
Concerning the distinction between different failure categories, it seems difficult to derive
from the diagrams any trends in terms of their harmfulness. More research is needed to explore
this issue in more detail.
Last but not least, the left diagram in Figure 4.8 can be compared with Figure 4.5:
the relationship between the mean ratios of reached ASes for the different failure categories is
predominantly reflected in the relationship between the average propagation radii. For example,
if the mean percentage of reached ASes after the failure of a tier-1-middle link is higher than
that for a broken tier-1-tier-1 link in Topology 1140, this shows up as a higher mean
propagation radius for the tier-1-middle link category in Figure 4.8.
In closing, we point out that all these observations need to be confirmed by further investigations.
Chapter 5
Conclusions and directions for future work
In closing, we summarize the results of the sensitivity analysis of BGP convergence
and scalability that we performed using the SSFNet simulator.
One of the main objectives of this work was to examine the influence of the MRAI
timer configuration on convergence times and on the number of external updates sent. The results
from Chapter 4 basically confirmed that a higher timer value leads to fewer exchanged update
messages at the cost of higher convergence times. In most cases, per-prefix timers
offer only slight advantages in convergence time compared to per-peer
timers. However, the number of external updates increases considerably when timers are kept
on a per-prefix basis. Further investigations have to show whether these findings justify the
default configuration of MRAI timers on a per-peer basis, as used by the market leaders
Cisco and Juniper.
Besides the influence of the MRAI timer, we tried to explore how far updates propagate through the
topology after a link failure. We found that in our test topologies updates are never seen
more than 4 AS hops away from the broken edge. Altogether, instabilities do not seem to
spread very far and remain relatively local. Concerning the number of ASes affected
by the failure of a link, strong fluctuations depending on the broken link were observed. Our
categorization of links according to the commercial relationship between the connected ASes
therefore does not seem very promising. However, future experiments have to confirm these
results.
Altogether, a lot of work remains to be done. Using MRAI timers on a per-prefix instead of
a per-peer basis imposes a higher workload on the routers, as separate timer instances have
to be kept for all the different prefixes. It would be interesting to examine the joint influence of
workload and timer basis on the overall convergence process. The use of per-peer
timers might then be even more justified.
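The difference in bookkeeping between the two timer bases can be illustrated with a minimal sketch. It is a strong simplification and neither the SSFNet implementation nor any vendor's: a held-back update is simply reported as not sent instead of being queued until the timer expires. The class and function names, the peer and prefix labels, and the timer value of 30 seconds are assumptions chosen for illustration.

```python
class MraiLimiter:
    """Rate limiter with one timer per peer or per (peer, prefix) pair."""

    def __init__(self, mrai_seconds, per_prefix):
        self.mrai = mrai_seconds
        self.per_prefix = per_prefix
        # timer key -> time at which the next advertisement is allowed
        self.expiry = {}

    def _key(self, peer, prefix):
        return (peer, prefix) if self.per_prefix else peer

    def try_send(self, now, peer, prefix):
        """Return True if an update may be sent now, restarting the timer."""
        key = self._key(peer, prefix)
        if now < self.expiry.get(key, 0.0):
            return False  # timer still running: the update has to wait
        self.expiry[key] = now + self.mrai
        return True

    def timer_count(self):
        """Number of timer instances the router has to keep."""
        return len(self.expiry)


def burst(per_prefix, peers=("peer1", "peer2"), prefixes=("p1", "p2", "p3")):
    """Offer one update per (peer, prefix) at time 0; return (sent, timers)."""
    limiter = MraiLimiter(mrai_seconds=30.0, per_prefix=per_prefix)
    sent = sum(
        limiter.try_send(0.0, peer, prefix)
        for peer in peers
        for prefix in prefixes
    )
    return sent, limiter.timer_count()


print("per-peer:  ", burst(per_prefix=False))
print("per-prefix:", burst(per_prefix=True))
```

With two peers and three prefixes, the per-peer variant sends 2 updates immediately and keeps 2 timers, whereas the per-prefix variant sends all 6 and keeps 6 timers. The number of timer instances thus grows with the number of prefixes, which is the workload effect referred to above.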
The SSFNet BGP implementation contains some simplifications which might be relevant for
a comprehensive analysis. Particular emphasis could be placed on the investigation of
route flap damping. This mechanism has already been added to the BGP implementation used here, and it could be worthwhile to study whether route flap damping is only invoked by network
instabilities or also by oscillations which are inherent in the BGP protocol.
In our view, the generation of realistic topologies is essential for an analysis of BGP
behavior. In that respect, efforts can be made to improve the internal and external structure
of our sample networks. Up to now, the BGP meshes within autonomous systems are rather
static, always consisting of a ring of route reflectors and some border routers connecting to
other ASes. Other structures within the ASes could be considered in future experiments.
Regarding the external topology, it is desirable to reproduce the hierarchy inherent in the
Internet as faithfully as possible in our test topologies. For example, the distribution of node
degrees (the number of neighbor ASes of an AS) should follow the same patterns as in the
Internet.
Considering all the simplifications made in our models and the remaining open questions,
a lot of research remains to be done in this field.
List of Figures
2.1 Structural overview of the SSFNet simulator
2.2 Generation of DML files
3.1 Middle Topology (green lines are peering links)
3.2 Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774
3.3 Flow chart of the script mraiInvestigation.pl
3.4 Flow chart of the script updateRadius.pl
4.1 Convergence times and number of updates depending on different MRAI timer values (per-peer) in Topology 7774
4.2 Standard deviations for the measured data values in Figure 4.1
4.3 Comparison of per-peer and per-prefix MRAI timers in terms of convergence times and number of external updates
4.4 Standard deviations for the data values measured for the per-prefix MRAI timer in Topology 7774 (see Figure 4.3)
4.5 Average percentage of ASes receiving updates after a failure broken down by different link categories
4.6 Histograms of the ratio of ASes receiving updates after a link failure in Topology 7774 broken down by different link categories
4.7 Density functions of the node degrees (number of neighbor ASes) broken down by categories
4.8 Propagation distance of updates in the case of a link failure (measured by the number of hops)
List of Tables
3.1 Properties of Topology 1140
3.2 Properties of Topology 7774
3.3 Categorization of external links for the used topologies
4.1 Average node degrees (standard deviations in brackets) broken down by categories