Sensibility analysis of BGP convergence and scalability using network simulation

System Development Project (Systementwicklungsprojekt, SEP)
Institut für Informatik
Technische Universität München
85748 Garching bei München

Supervisor: Prof. Anja Feldmann, PhD
Advisor: Olaf Maennel
By Wolfgang Mühlbauer ([email protected])
Submission date: August 30, 2004

Abstract

The Border Gateway Protocol (BGP) is the de facto standard for routing between autonomous systems in the Internet. Instabilities in the topology, such as a failing link, can lead to considerable convergence delays. It is therefore necessary to gain a better understanding of the global dynamics and underlying mechanisms of BGP. In this work we perform a sensitivity analysis of convergence times and of the number of exchanged updates with respect to the settings of BGP parameters. In particular, the influence of the Minimum Route Advertisement Interval (MRAI) timer is investigated. Further experiments shed light on the propagation of updates following the failure of a link. We examine scalability questions such as how many autonomous systems are affected by the instability and how far update messages spread from the broken link. All experiments are conducted using the SSFNet network simulator.

Contents

1 Introduction
  1.1 Motivation
  1.2 Goals of this Study
    1.2.1 Influence of MRAI Timer Settings
    1.2.2 Propagation of updates
  1.3 Guide to the Reader
2 Using the SSFNet Simulator
  2.1 General Overview
  2.2 Extensions to SSFNet
  2.3 Generation of DML Files
    2.3.1 Subgraph Extraction
    2.3.2 Automatic Generation of the DML File
  2.4 Simulator Output
3 Setting up the Experiments
  3.1 Simulation Topologies
    3.1.1 Middle Topology
    3.1.2 Topology 1140
    3.1.3 Topology 7774
  3.2 Generation of Link Failures
    3.2.1 Link Categories
    3.2.2 Failure Scenarios in the Experiments
  3.3 Analysis of the Simulation Results
  3.4 Taken Experiments
    3.4.1 Investigation of MRAI Timer Effects
    3.4.2 Investigation of Update Propagation
4 Simulation results
  4.1 Influence of MRAI Timer Settings
    4.1.1 Varying the MRAI Timer Values
    4.1.2 Per-peer and per-prefix MRAI Timers
  4.2 Propagation of Updates
    4.2.1 Experiment Description
    4.2.2 Number of affected ASes after a link failure
    4.2.3 Propagation Radius
5 Conclusions and directions for future work
List of Figures
List of Tables
Bibliography

Chapter 1 Introduction

1.1 Motivation

"QWERTYUIOP!" This strange-looking collection of characters is said to be the content of the first electronic message, sent by the engineer Ray Tomlinson in 1971 from one computer to another computer sitting right beside it. Of course, Tomlinson and the other network pioneers of that time could not anticipate the tremendous development of networking that resulted in a worldwide mesh of connections, the Internet.

Complex issues arise with the increasing size of networks. Take routing in the Internet as an example: how do packets find their way to a specific destination in this distributed environment, where no router has knowledge of the global topology and of all available network links? It is the task of so-called intra-domain routing protocols (e.g. OSPF, RIP) and inter-domain routing protocols (BGP) to provide a solution to this problem.

However, existing routing protocol implementations are far from perfect. The Border Gateway Protocol (BGP), which is responsible for maintaining connectivity between autonomous systems (ASes) in the Internet, sometimes cannot prevent considerable delays in the convergence process after instabilities have occurred in the network. Unfortunately, the underlying mechanisms of BGP are not yet understood well enough to improve specific aspects of the existing protocol implementation. Therefore, a careful analysis of the status quo is indispensable.

1.2 Goals of this Study

The main objective of this work is to explore, using the SSFNet simulator, the scalability of BGP and the influence of configuration parameters on convergence times and on the number of exchanged protocol messages. For this purpose, we concentrate on two aspects, which are introduced briefly below: the settings of the MRAI timer and the propagation of updates.

1.2.1 Influence of MRAI Timer Settings ¹

In order to build their routing tables, BGP-speaking routers exchange messages in a way similar to other distance vector protocols. These advertisement messages are rate-limited using timers associated with the value Minimum Route Advertisement Interval (MRAI). Whenever a router advertises a route for a certain destination to a neighbor autonomous system (AS), a new instance of this timer is started. Afterwards, the router must not send another advertisement concerning this destination to that neighbor until the associated timer has expired after MRAI seconds. This rate limiting is supposed to dampen some of the oscillations inherent in a distance vector protocol. While waiting for an MRAI timer to expire, a BGP router does not expose its connected neighbor ASes to every intermediate step in finding the best path to a certain destination. Thus rate limiting can be expected to reduce the number of updates needed for convergence, at the cost of adding some delay to the sent messages. One main objective of this work is to perform a sensitivity analysis of the MRAI timer parameters.

1.2.2 Propagation of updates

After a link has broken in a given network, update messages must be exchanged between the autonomous systems (ASes) until new best paths for all affected routes have been installed. It can be assumed that, in general, some ASes will not notice the instability, meaning that they do not receive any BGP update messages. This might happen if prefixes that were routed over the broken link are redirected to new paths which do not differ completely from the original path but have some nodes (ASes) in common with it. Investigating the propagation of updates involves making a statement on the number or ratio of ASes that receive update messages as a consequence of a broken link, and on the propagation radius.
By propagation radius we mean the distance over which updates spread within the topology, starting from the source of the instability. Put more simply: how far away from the broken link can the instability still be observed? Altogether, this second main aspect of our work could be of great importance for drawing conclusions about the scalability of BGP.

1.3 Guide to the Reader

This document is structured as follows. Chapter 2 describes the usage of SSFNet and the extensions made to this network simulator. After giving an overview of how the experiments were conducted in Chapter 3, we present the results of our simulations in Chapter 4. We close with a short summary and some suggestions for future work in this area.

¹ This section has been adopted from [1] with marginal modifications.

Chapter 2 Using the SSFNet Simulator

2.1 General Overview

Examining the dynamics of updates in a distributed protocol like BGP is a challenging task. For a complete understanding and analysis, it is desirable to have a global view of, and control over, all routers involved in the protocol communication. In the case of BGP, knowledge of all messages sent or received by a BGP speaker makes it possible to find out which prefixes are advertised to which neighbor, and thus to deduce the routing in the inter-domain topology. Additionally, it is possible to impose specific events such as link failures or the advertisement of new prefixes. This global view and control is most easily achieved by simulation.

We decided to use the Scalable Simulation Framework (SSF) [2] mainly for three reasons. First, this framework already provides an implementation of the BGP4 protocol, made available by B.J. Premore [1]. Second, the basic BGP implementation has been extended with many new features by members of the research group of Prof. Anja Feldmann in the past (see section 2.2).
Finally, the modular and concise structure of SSF, in comparison to other network simulators like ns-2, facilitates the addition of the new features needed for our investigations.

A general overview of the SSF network simulator is given in Figure 2.1. It consists of a discrete-event simulation kernel which deals with all the fundamental aspects of a simulation. Based on this kernel, SSFNet provides a collection of Java-based components which model the Internet protocols and network topologies and which can easily be extended with new features and new components. Within the SSFNet packages, the most important parts are SSF.OS and SSF.Net. Whereas SSF.Net models the network topologies (links, nodes, network connectivity), SSF.OS is responsible for modeling the numerous protocols (e.g. TCP, BGP, OSPF). The simulation parameters, the network topologies and the simulation dynamics are defined in text files written in the Domain Modeling Language (DML) syntax [2].

[Figure 2.1: Structural overview of the SSFNet simulator. The SSFNet simulation models are based on the SSF simulation kernel and are configured by DML files (configuration files); SSF.OS (protocol simulation) and SSF.Net (topology simulation) are parts of SSFNet.]

Under the assumption that all needed protocols and features are already implemented in SSFNet, the major task is to generate the DML files. The following excerpt illustrates the concept of the Domain Modeling Language in a very simple manner.

    Net [
      host [
        id 1
        interface [id 0 bitrate 100000000 latency 0.0]
        graph [
          ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
          ProtocolSession [name ip use SSF.OS.IP]
        ]
      ]
      host [
        id 2
        interface [id 0 bitrate 100000000 latency 0.0]
        graph [
          ProtocolSession [name tcp use SSF.OS.TCP.tcpSessionMaster]
          ProtocolSession [name ip use SSF.OS.IP]
        ]
      ]
      link [attach 1(0) attach 2(0) delay 0.002]
    ]

This DML snippet describes a network with two hosts, each running a TCP session over the IP protocol. Host 1 is connected to host 2 by a link with delay 0.002. As the DML files become very large with increasing size and complexity of the simulation topologies, they are generated automatically in most cases (see section 2.3). The user does not have to care about the assignment of unique identifiers for hosts and interfaces (NHI addresses) and their corresponding IP addresses, because this is done by the simulator itself.

Issuing the following command runs the simulation for 2000 seconds, provided that all path variables have been set correctly (refer to [2]) and that the configuration has been written into the DML file myModel.dml.

    java SSF.Net.Net 2000 myModel.dml

2.2 Extensions to SSFNet

As mentioned in the last section, an implementation of BGP4 is already included in the SSFNet package (primary author B.J. Premore). For our work we used SSFNet version 1.4, which closely follows the recommendations of RFC 1771 but still lacks some BGP functionality such as route flap damping. More detailed information and a summary of the implemented features can be found in [2] and [3]. However, numerous features have been added to the BGP implementation of SSFNet in the last few years by members of the research group of Prof. Anja Feldmann.
In the scope of his diploma thesis “Analysis of OSPFv2-BGP4 Interactions Using the SSFNet Simulator” [4], Hagen Böhm extended SSFNet with the OSPFv2 protocol and with a scanning process for BGP, including the possibility of simulating link failures. Another diploma thesis, by Andreas Hartl [3], investigated the dynamics of BGP updates in realistic topologies, which made it necessary to add new functionality to the existing BGP implementation in SSFNet. The most important changes were the following:

• MRAI timer: With Cisco routers forming the overwhelming majority in today's networks, the MRAI timer implementation was modified such that it exactly models the behavior of Cisco BGP speakers. The necessary changes were the normal distribution of the MRAI timer value (it takes values between 25s and 31s) and the use of a per-peer timer instead of a per-prefix MRAI timer (Cisco routers keep a separate timer only for each neighbor, not for each prefix advertised to a neighbor).

• Best Path Selection Process: Here again, the strategy of Cisco routers for finding the “best” route to a destination differs from that of the RFC specification. In particular, more emphasis is placed on the length of the AS PATH attribute in the tie-breaking mechanism.

• Community Values: The SSFNet BGP implementation was modified such that it understands BGP community values as specified in RFC 1997. With the help of community values, it is possible to reproduce peering or customer-provider relations between autonomous systems (refer to [5] and [6] for more information on AS relationships).

• Workload Generation: Factors like the number of BGP sessions on a host, the size of the routing tables or the number of updates in the input queue were taken into account in order to create a more realistic workload.

A comprehensive description of these adaptations can be found in [3]. For the investigation of BGP convergence and scalability, a mechanism was needed to create instabilities in a given topology.
Modified SSFNet classes, implemented by Hagen Böhm, were merged with the normal BGP implementation, making it possible to configure link failures conveniently in the DML files. Though not used for the simulations in this work, we added the route flap damping mechanism (taken from a later SSFNet version) and the option of inserting “dummy prefixes” into the network.

2.3 Generation of DML Files

The topology, the simulation parameters and the simulation dynamics are all defined in DML files. The main objective of this work is to examine BGP update dynamics in realistic topologies. With the high complexity of realistic networks, however, it becomes more and more difficult to build the input files manually. That is why we need a way of creating DML files automatically. The general procedure is depicted in Figure 2.2.

[Figure 2.2: Generation of DML files. AS relationship pairs (extracted from authentic BGP routing tables) -> AS relationship pairs (extracted subgraph) -> DML configuration file (simulator input).]

The source for the automated generation of DML files are so-called AS relationship pairs. These pairs are extracted from authentic BGP tables (for example from RIPE), together with some kind of seed information (e.g. a tier1 provider), by a tool from Arne Wichmann [7], and they reflect the commercial relationships between interconnected ASes. For our study we distinguish between provider-customer relationships (the customer pays its provider for connectivity to the rest of the Internet) and peering links (neighboring ASes agree to exchange traffic free of charge). The following excerpt shows a possible AS relationship pairs input:

    1234 > 2401
    2401 = 3110
    3110 < 1234

This short listing denotes that AS 1234 is a provider of AS 2401 (>), that AS 2401 shares a peering link with AS 3110 (=) and that AS 3110 is a customer of AS 1234 (<).
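
The AS relationship pair notation above can be read mechanically. The following Python sketch is only an illustration and not part of the tools used in this study: the function name `parse_as_pairs`, the tuple layout and the assumption that the input holds exactly one whitespace-separated pair per line are ours.

```python
def parse_as_pairs(text):
    """Parse AS relationship pairs into normalized tuples.

    Assumed input: one pair per line, e.g. '1234 > 2401'.
    Returns tuples ('p2c', provider, customer) or ('peer', as_a, as_b).
    """
    relations = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        left, symbol, right = line.split()
        a, b = int(left), int(right)
        if symbol == ">":      # left AS is provider of right AS
            relations.append(("p2c", a, b))
        elif symbol == "<":    # left AS is customer of right AS
            relations.append(("p2c", b, a))
        elif symbol == "=":    # peering link, stored in sorted order
            relations.append(("peer", min(a, b), max(a, b)))
        else:
            raise ValueError("unknown relationship symbol: " + symbol)
    return relations


example = """1234 > 2401
2401 = 3110
3110 < 1234"""
relations = parse_as_pairs(example)
```

Normalizing both `>` and `<` to a single provider-to-customer tuple keeps later processing steps free of case distinctions on the direction of the symbol.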
Unfortunately, it is not possible to use the complete topology graph resulting from all AS pairs obtained from RIPE or other BGP sources. Because of the high memory demands of such a network in the simulations, we extract a complete subtree of a specified AS from the original graph. The Topology Extraction tool from Andreas Hartl [3] is explained briefly in 2.3.1. After extracting a subgraph, we have a reduced topology with fewer ASes. However, our input is still in the syntax of the original AS relationship pairs. It is the task of the Topology Conversion tool to convert this “AS-Pairs syntax” into the DML format, which can then be passed to the SSFNet simulator as input. A short description of this tool together with our extensions can be found in 2.3.2.

2.3.1 Subgraph Extraction

In order to reduce the complexity and size of the networks used for our simulations, a subtree of a specified AS can be extracted from the original graph given in the form of AS pairs. The basic idea of the Topology Extraction tool [3] is to first perform a kind of depth-first search up to a certain depth, which finds all core ASes that are not more than a specified number of AS hops away from the starting AS (the tool refers to this parameter as “number of AS hops”). Afterwards it searches all paths between the core ASes up to a specified length (referred to as “maximum path length” by the tool) and adds all intermediate ASes on these paths which have not been visited yet. It should be mentioned that the extraction of the core ASes takes into account certain redistribution policies arising from the commercial relationships (peering, customer-provider). Considering all paths between the core ASes up to a certain length ensures that most propagation effects which can be observed in the complete network should also appear in our extracted topology.
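
The two extraction steps described above can be sketched as follows. This is only an illustrative Python sketch and not the Topology Extraction tool of [3]: the function name `extract_subgraph`, the adjacency-dictionary input and the reading of the path length as the number of ASes on a path are assumptions. The redistribution policies are ignored entirely, and a breadth-first search is used for the first step instead of the depth-first variant mentioned in the text.

```python
from collections import deque


def extract_subgraph(adjacency, start, max_hops, max_path_len):
    """Return the set of ASes kept by a simplified two-step extraction.

    adjacency: dict mapping an AS to an iterable of neighbor ASes.
    Policies (peering, customer-provider) are NOT considered here.
    """
    # Step 1: core ASes are at most max_hops away from the start AS.
    core = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in core:
                core.add(neighbor)
                frontier.append((neighbor, depth + 1))

    # Step 2: add all ASes on simple paths between two core ASes
    # that contain at most max_path_len ASes (assumed interpretation).
    selected = set(core)

    def walk(path):
        last = path[-1]
        if len(path) > 1 and last in core:
            selected.update(path)  # reached another core AS
            return
        if len(path) == max_path_len:
            return                 # length limit reached
        for neighbor in adjacency.get(last, ()):
            if neighbor not in path:
                walk(path + [neighbor])

    for core_as in core:
        walk([core_as])
    return selected


toy = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
kept = extract_subgraph(toy, start=1, max_hops=1, max_path_len=3)
```

In the toy graph, AS 4 is kept because it lies on the path 2, 4, 3 between two core ASes, whereas AS 5 is dropped because no path between core ASes passes through it. The path enumeration is exponential in the worst case, so the sketch is meant for small examples only.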
For a comprehensive explanation of the subtree extraction, we refer to section 4.2.1 of [3]. The extraction tool asks for all parameters and is started by typing:

    make extract

2.3.2 Automatic Generation of the DML File

Figure 2.2 showed that we generate the DML files needed for our simulations on the basis of AS relationship pairs. Even if a subtree is extracted from the complete topology, the input is still in the form of AS pairs and must be converted to the DML syntax before SSFNet can use it. This task is fulfilled by another tool called Topology Converter. A detailed description of the functionality of this tool can be found in section 4.2.2 of [3].

In the generation of a DML file from AS pairs, two important parts can be distinguished: the external structure of the topology (links between ASes) and the internal structure (I-BGP mesh within an AS). Whereas the external structure is built upon the information given in the form of AS pairs, the interior of an AS is generated according to the wishes of the user. For example, the user can determine the number of route reflectors within an AS or the number of border routers which connect to customer, provider and peering ASes. However, the internal structure is the same for all ASes in the topology; it is not possible to generate a different I-BGP mesh for each AS.

The original Topology Converter was modified slightly. The changed version is called Topology Generator and contains the following additional features:

• Per-prefix MRAI, WRATE, SSLD: The user is asked whether to use per-prefix MRAI timers instead of per-peer MRAI timers and whether to activate WRATE (withdrawal rate limiting) and SSLD (sender side loop detection).

• Route Flap Damping: The route flap damping mechanism of a later SSFNet version (1.5) was merged into the SSFNet implementation used here (version 1.4).
  If the modified SSFNet version is used, the Topology Generator tool can enable route flap damping with different parameter settings (default Cisco or Juniper settings, or manual specification of the parameters). However, route flap damping was not used for the simulations in this study.

• Dummy Prefixes: A specified number of dummy prefixes, starting from a dummy AS, can be inserted into the network. In this way, the BGP routing tables become larger and have more entries.

• Link Failures: The specification of link failures has been extended. It is now possible to define the number of links to fail, a time interval within which the link failures occur at a random time, and a time at which all broken links are supposed to recover.

The Topology Conversion or Topology Generator tool is started by typing

    make convert

in the appropriate directory.

2.4 Simulator Output

An essential part of this study consists of analyzing the simulation output and drawing conclusions from the results of the simulation. During a simulation run, all sent and received updates are logged together with the sending or receiving time, the sender and the receiver, the type of the update message, the affected prefix and the AS PATH attribute of the BGP message. The following two lines give an idea of the logged data:

    45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)
    45.710960373 receive 4:10 4:2 rte 0.0.1.0/26 (3 1)

Both lines belong to the same BGP message from interface 4:10 to 4:2. The first line shows the time at which the message was sent by the source, the second line the arrival time at the destination. Here the prefix 0.0.1.0/26 is advertised within AS 4, because sender and receiver are both part of AS 4 (4:10 and 4:2). Originally, the prefix was announced by AS 3 and has propagated via AS 1 to AS 4, which is indicated by the AS PATH attribute (3 1).
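
To illustrate how such log lines can be processed, the following Python sketch splits a line into the fields described above. It is a hypothetical helper, not the analysis script of this study; the names `LogEntry`, `parse_log_line` and `as_number` are our own, and the pattern only covers lines shaped like the two examples shown (other message types may be logged differently).

```python
import re
from typing import NamedTuple, Tuple


class LogEntry(NamedTuple):
    time: float                # simulation time in seconds
    direction: str             # "send" or "receive"
    sender: str                # e.g. "4:10"
    receiver: str              # e.g. "4:2"
    msg_type: str              # e.g. "rte"
    prefix: str                # e.g. "0.0.1.0/26"
    as_path: Tuple[int, ...]   # e.g. (3, 1)


# Assumed line layout, derived only from the two example lines.
LINE_RE = re.compile(
    r"^(\d+(?:\.\d+)?)\s+(send|receive)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
    r"\s+\(([^)]*)\)\s*$"
)


def parse_log_line(line):
    """Turn one logged line into a LogEntry, or raise ValueError."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected log line: " + line)
    time, direction, sender, receiver, msg_type, prefix, path = match.groups()
    as_path = tuple(int(asn) for asn in path.split())
    return LogEntry(float(time), direction, sender, receiver,
                    msg_type, prefix, as_path)


def as_number(address):
    """Return the AS part of an address such as '4:10'."""
    return int(address.split(":")[0])


entry = parse_log_line("45.709774161 send 4:10 4:2 rte 0.0.1.0/26 (3 1)")
same_as = as_number(entry.sender) == as_number(entry.receiver)
```

For the example line, `same_as` is true, which corresponds to the observation that the prefix is advertised within AS 4.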
With all this information it is possible to obtain the desired complete view of, and control over, all BGP speakers in the network, which enables us to perform a comprehensive analysis of update dynamics.

Chapter 3 Setting up the Experiments

Whereas the last chapter dealt with all relevant aspects of the SSFNet network simulator, this chapter is dedicated to the setup of the experiments. It is essential to know how an experiment was conducted, what assumptions were made, what testing environment was used, etc. The sections below explain in detail our investigations of the influence of the MRAI timer on convergence times and of the propagation of updates.

3.1 Simulation Topologies

Running simulations with the SSFNet simulator requires a file in the DML format as input. The DML files describe not only the simulation parameters and dynamics (e.g. link failures) but also the network topology, i.e. the graph of ASes. One of the main objectives of this study is to examine BGP behavior in realistic networks which approach the structure of the Internet as closely as possible. For that reason, choosing the simulation topologies is a critical task.

In order to verify the correctness of our extensions to SSFNet and of certain auxiliary Perl scripts, we developed some simple test networks, which are of no further importance for the results of this study. The more complex and realistic networks were all generated automatically as described in section 2.3. We now introduce the topologies relevant to the experiments.

3.1.1 Middle Topology

The so-called Middle Topology (taken from [3]) is pictured in Figure 3.1. This topology was created manually by specifying the commercial relationships between the ASes in the AS pair format and then running the Topology Conversion tool (see 2.3.2). In contrast to the next two topologies, it was not generated from AS relationship pairs from RIPE or other Internet sources and is thus a more synthetic network.
Nonetheless, it already shows some characteristics which can be found in realistic networks, too. For example, the graph already contains a certain hierarchy of top-level tier1 ASes (here AS 1, 2 and 3), ASes which are more in the middle of the graph (AS 4 to 9) and ASes at the bottom of the graph (AS 10 to 15), which we frequently refer to as stub ASes. It turns out that this distinction between different levels (tiers) makes sense in the Internet, too. Furthermore, there are some ASes in the graph which are multi-homed, meaning that they are connected to more than one provider.

[Figure 3.1: Middle Topology with AS 1 to AS 15 (green lines are peering links)]

For the Middle Topology as well as for the other topologies used in this work, it must be pointed out that the number of external links between a pair of ASes is varied according to the needs of the specific experiment. However, this will be mentioned clearly in each case.

3.1.2 Topology 1140

The main objective of investigating BGP update dynamics in a realistic environment requires that more or less realistic test networks are used. For this purpose, a subtree of a small German ISP [8] was extracted with the Topology Extraction tool (see 2.3.1), based on the commercial relationships between ASes measured in 2003 by [7]. Due to memory limitations, it was necessary to restrict the “number of AS hops” to one (for finding the core ASes) and the “maximum path length” to five ASes. Table 3.1 summarizes some facts about the extracted network:

    # ASes   # external links   graph degree (avg)   # core ASes
    95       1145               24.1                 5

    Table 3.1: Properties of Topology 1140

Altogether we obtain 95 ASes, where each AS is composed of several routers organized in an I-BGP mesh.
Under the assumption that every pair of ASes is connected by only one link, we obtain 1145 external links. Since every link contributes to the degree of two ASes, this leads to an average graph degree (average number of neighbors per AS) of 2 · 1145 / 95 ≈ 24.1. Though the topology graph seems to be highly meshed, the extraction tool finds only 5 core ASes in the first step. The remaining conversion from the extracted subgraph to the DML syntax is done with the Topology Conversion tool.

3.1.3 Topology 7774

The procedure for generating Topology 7774 is basically the same as for Topology 1140. It mainly differs in the AS relationship pairs used [7], which here date from April 2004 and are thus more up to date. The extraction was started from AS 7774 with the “number of AS hops” set to one (to find the core ASes) and the “maximum path length” set to five. A summary of some characteristics is shown in Table 3.2.

    # ASes   # external links   graph degree (avg)   # core ASes
    105      614                11.7                 3

    Table 3.2: Properties of Topology 7774

In contrast to Topology 1140, it has more ASes, although the extraction started from fewer core ASes. The number of external links is lower than for Topology 1140, which results in a lower average graph degree (2 · 614 / 105 ≈ 11.7). However, the density functions in Figure 3.2 suggest that the number of neighbor ASes is subject to a broad distribution.

[Figure 3.2: Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774. One panel per topology; x-axis: node degree (number of neighbor ASes), y-axis: density.]

3.2 Generation of Link Failures

Up to now, we have only covered the static aspects of the simulator input, namely the generation of the topology and the interconnections between ASes. However, dynamic events such as link instabilities or the advertisement of new routes and prefixes are an important part of the experiments.
For the test scenarios in this work, it is sufficient to have a means of simulating link failures at a specific time. Thanks to Hagen Böhm [4], it is possible to let a link fail with the following DML extension:

    link [
      attach 1:1(1) attach 2:2(2) delay 0.0010
      fail [ from 300 until 900 ]
    ]

This DML statement makes the link between router 1 in AS 1 and router 2 in AS 2 fail at simulation time 300s, basically dropping all (IP) packets at one router interface. At 900s the link recovers and transports data as usual. It should be mentioned that link failures are generally not configured manually in the DML files but with the help of the Topology Generator tool (see 2.3.2) or a special Perl script (cf. 3.2.2) which was developed for this purpose.

Last but not least, we are interested in categorizing links in terms of how harmful their failure is. The next subsection illustrates what is understood by such a classification of external links, whereas subsection 3.2.2 presents a script for configuring link failures depending on the desired “failure category”.

3.2.1 Link Categories

When discussing the characteristics of the Middle Topology in 3.1.1, we already alluded to the fact that realistic topologies, e.g. the Internet, obey a certain hierarchy. Indeed, there are research papers ([5] and [6]) which seem to confirm that the autonomous systems in the Internet can be classified into different categories in terms of their commercial relationships. By convention, ASes which are at the top of the hierarchy, having no providers and only peering with other “top ASes”, are called tier1. In our work we wanted to examine to what extent the position of a link in this hierarchy is correlated with the impact that a failure of this link has on the propagation of updates.
Before classifying the external links of a network, the ASes were associated with one of the following categories:

• tier1-AS: All ASes which are not connected to any provider, and are thus at the top of the hierarchy, are said to be in tier1.

• stub-AS: ASes which do not have any customers are at the bottom of our ranking and are assigned to the category of stub entities.

• middle-AS: All ASes which do not belong to one of the first two groups fall into this category.

Starting from these categories of ASes, the external links were assigned to one of the groups below:

• tier1-tier1: Link between two tier1 ASes.

• tier1-middle: Link between a tier1 AS and a middle AS.

• middle-middle: Link between two middle ASes.

• middle-stub: Link between a middle AS and a stub AS.

• stub-stub: Link between two stub ASes.

Table 3.3 shows the results of this classification for the topologies used in our simulations (it is assumed that each AS pair is connected by only one link). All topologies have in common that they contain only very few tier1-tier1 links. In the two extracted topologies, the middle-middle group is the largest one. This suggests that the tier1 ASes as well as the stub ASes are probably situated more on the “edge” of the topology graph. Note that the Middle Topology is much smaller than the other two networks, having only 27 external links.

    Topology   # links   tier1-tier1   tier1-middle   middle-middle   middle-stub   stub-stub
    Middle     27        3             11             3               8             2
    1140       1145      3             125            627             352           38
    7774       614       15            116            242             219           22

    Table 3.3: Categorization of external links for the used topologies

The configuration of link failures according to the categorization just described is automated by the Perl script CreateLinkFails.pl, which is explained in the next subsection.

3.2.2 Failure Scenarios in the Experiments

It was one objective of this study to examine the propagation of updates depending on the category of the failing link.
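
The categorization of subsection 3.2.1 can be sketched in a few lines. The following Python code is only an illustration and is not taken from CreateLinkFails.pl: the function names, the input format (lists of provider-customer and peering pairs) and the choice to test the tier1 condition before the stub condition are assumptions.

```python
from collections import Counter

# Position in the hierarchy, used to order the two ends of a link.
RANK = {"tier1": 0, "middle": 1, "stub": 2}


def classify_ases(p2c, peers):
    """Map each AS to 'tier1', 'stub' or 'middle'.

    p2c: iterable of (provider, customer); peers: iterable of (a, b).
    """
    ases, with_provider, with_customer = set(), set(), set()
    for provider, customer in p2c:
        ases.update((provider, customer))
        with_customer.add(provider)
        with_provider.add(customer)
    for a, b in peers:
        ases.update((a, b))
    categories = {}
    for asn in ases:
        if asn not in with_provider:      # no provider: top of hierarchy
            categories[asn] = "tier1"
        elif asn not in with_customer:    # no customers: bottom
            categories[asn] = "stub"
        else:
            categories[asn] = "middle"
    return categories


def classify_link(a, b, categories):
    """Return a label such as 'tier1-middle' for the link (a, b).

    A tier1-stub link would be labelled 'tier1-stub', a combination
    that does not occur among the five groups listed in the text.
    """
    first, second = sorted((categories[a], categories[b]), key=RANK.get)
    return first + "-" + second


def count_link_categories(p2c, peers):
    categories = classify_ases(p2c, peers)
    links = list(p2c) + list(peers)
    return Counter(classify_link(a, b, categories) for a, b in links)


# Hypothetical toy hierarchy, not one of the topologies of this study.
toy_p2c = [(1, 4), (2, 5), (4, 10), (5, 11)]
toy_peers = [(1, 2), (4, 5), (10, 11)]
toy_counts = count_link_categories(toy_p2c, toy_peers)
```

In the toy hierarchy, AS 1 and AS 2 are tier1, AS 4 and AS 5 are middle, and AS 10 and AS 11 are stub ASes, so every one of the five link groups occurs at least once.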
Choosing a link of the desired category can be done easily with the Perl script CreateLinkFails.pl. As input this script requires the desired number of links to fail, the category to which the failing links should belong, the DML input file for the simulation, a failure time period and the time when the link should recover from the failure state. According to these input parameters an appropriate failing link is configured in the DML file as described in 3.2. Internally, the script proceeds basically as described in the last subsection: after classifying the ASes, the external links are assigned to categories, and then the links to fail as well as the exact failure time within the specified failure period are chosen randomly. In most cases, this script is called by other control scripts in our experiments.

3.3 Analysis of the Simulation Results

Investigating the propagation of updates and the influence of different MRAI timer settings is an integral part of this project. From the logged BGP messages (see 2.4) the following values are derived:

• Convergence times: After a link in the topology fails, BGP messages are exchanged between BGP speakers until all routes which were leading over the broken link are redirected to other paths. The time from when the first BGP message is sent after the occurrence of the instability until the time when the last BGP update is received by a router will be referred to as the convergence time. With the help of the logged time stamps, it is possible to determine these convergence times.
• Number of affected ASes: Another interesting aspect is the spread of instabilities across the topology. If a connection between two routers drops out, not necessarily all BGP speakers will see this change, possibly because they did not route any prefix over the broken link. By looking for all ASes in the log files which received a BGP message as a consequence of one broken link, it is possible to determine the number or percentage of ASes which are reached by the instability.
• Propagation Radius: Concerning the propagation of updates it is interesting to know the distance of the affected ASes from the broken link. Basically, BGP messages spread in all directions from where the instability occurs. By analyzing all logged messages it is possible to trace back the intermediate hops along which updates have propagated until reaching a given AS. The propagation distance or radius is the number of hops not including the source node of the instability.

Analyzing the logged data must be done for each simulation run and is automated by the Perl script LogfileAnalyzer.pl. As input it takes a file with the logged simulator output and additionally asks for the DML file which was used by SSFNet. The output of this Perl script is a text file containing, among other things, the result values just described: convergence times, percentage of affected ASes and propagation radius. It deserves mentioning that the script only takes into account BGP messages whose timestamp lies in a specified time window. In this way, it can be ensured that all considered updates are exclusively associated with a specific instability event. Assigning the value 0 for the update radius to the two ASes incident to the failing edge, the update distance for the other ASes can be recursively determined by defining it to be min{currentDistance, n+1} if an update message was received from an AS which has the radius n. Usually, the script LogfileAnalyzer.pl is called by other control scripts and not started manually. The two main control scripts will be introduced in the next section.
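The three values described in this section can be sketched in a few lines of Python. This is only an illustration of the definitions, not the code of LogfileAnalyzer.pl. The record layout (send time, receive time, sending AS, receiving AS), the function name and the choice to count the two incident ASes as affected are assumptions of this sketch.

```python
def analyze(records, failed_link, window, total_ases):
    """Derive convergence time, affected ASes and update radius.

    records:     (sent_at, received_at, sender_as, receiver_as) tuples
    failed_link: the two ASes incident to the failing edge
    window:      (start, end); only messages sent inside it are analyzed
    total_ases:  number of ASes in the topology
    """
    start, end = window
    msgs = [r for r in records if start <= r[0] <= end]
    if not msgs:
        return None

    # Time from the first message sent until the last update is received.
    convergence = max(r[1] for r in msgs) - min(r[0] for r in msgs)

    # Radius 0 for the two incident ASes; any other AS gets
    # min{currentDistance, n + 1} when it hears from an AS with radius n.
    radius = {asn: 0 for asn in failed_link}
    changed = True
    while changed:                       # repeat until no radius shrinks
        changed = False
        for _, _, sender, receiver in msgs:
            if sender not in radius:
                continue
            candidate = radius[sender] + 1
            if candidate < radius.get(receiver, float("inf")):
                radius[receiver] = candidate
                changed = True

    # Assumption: the incident ASes count as affected even if they only
    # detect the failure locally.
    affected = {r[3] for r in msgs} | set(failed_link)
    return {
        "convergence_time": convergence,
        "affected_ratio": len(affected) / total_ases,
        "radius": radius,
        "max_radius": max(radius.values()),
    }

# Hypothetical log: the link between AS 1 and AS 2 fails at 300s.
records = [
    (100.0, 100.1, 7, 8),   # outside the time window, ignored
    (300.0, 300.1, 1, 3),
    (300.0, 300.1, 2, 4),
    (300.5, 300.6, 3, 5),
    (301.0, 301.2, 4, 5),
    (330.0, 330.4, 5, 6),
]
result = analyze(records, failed_link=(1, 2), window=(300, 900), total_ases=10)
```

In the toy log, AS 5 hears from both AS 3 and AS 4 (radius 1 each) and therefore ends up with radius 2, while AS 6 is reached after three hops.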
3.4 Taken Experiments

Having discussed the generation of the static and dynamic properties of our experiments and the analysis of the simulation results, this section deals with the “high-level” view of how we conducted our investigations. It clarifies what exact steps were taken to obtain the results of Chapter 4. For both main goals of this study - the investigation of the update propagation and the MRAI timer influence on convergence - Perl control scripts were written which are presented in the next two subsections.

3.4.1 Investigation of MRAI Timer Effects

Examining the properties of different MRAI timer values and the use of a per-peer versus per-prefix timer is done by the Perl script mraiInvestigation.pl. Basically, we run the simulations along four dimensions: different failing links, different MRAI timer values, a per-peer or per-prefix timer basis and different random seeds for the initialization of the random number generators. The general steps taken by this script are depicted in Figure 3.3.

[Flow chart: START -> configure link failures in DML file -> set the MRAI timer value -> change timer basis: per-peer and per-prefix -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]
Figure 3.3: Flow chart of the script mraiInvestigation.pl

At the beginning of a cycle the script configures a link failure with the help of the CreateLinkFails.pl tool (compare 3.2.2). After adjusting the MRAI timer to a value out of the set {4s, 5s, 10s, 15s, . . . , 55s, 60s}, the timer basis is set to either per-peer or per-prefix. Last but not least, a seed for the random number generator of SSFNet is chosen out of a given set of possible seeds, all being arbitrary text strings. When the dimension parameters have been set, the control script initiates the SSFNet simulation and has the computed results analyzed by LogfileAnalyzer.pl.
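The loop structure of such a control script can be sketched as follows. The sketch is written in Python rather than Perl and is not the code of mraiInvestigation.pl. The callable run_one is a hypothetical stand-in for configuring the DML file, starting SSFNet and analyzing the log, and the seed strings and the numbers returned by fake_run are placeholders. Four failure scenarios and three seeds are the counts reported in Section 4.1.

```python
from itertools import product
from statistics import mean

MRAI_VALUES = [4] + list(range(5, 61, 5))     # 4s, 5s, 10s, ..., 60s
TIMER_BASES = ("per-peer", "per-prefix")

def run_series(scenarios, seeds, run_one):
    """Iterate over all four testing dimensions and average per setting.

    run_one(scenario, mrai, basis, seed) is a hypothetical callable that
    returns a pair (convergence_time, external_updates) for one run.
    """
    samples = {}
    # product() nests the loops in the order: failure scenario (outermost),
    # MRAI value, timer basis, seed (innermost).
    for scenario, mrai, basis, seed in product(
        scenarios, MRAI_VALUES, TIMER_BASES, seeds
    ):
        result = run_one(scenario, mrai, basis, seed)
        samples.setdefault((mrai, basis), []).append(result)

    # One summary entry per (MRAI value, timer basis):
    # mean convergence time, mean number of updates, number of runs.
    return {
        key: (mean(c for c, _ in runs), mean(u for _, u in runs), len(runs))
        for key, runs in samples.items()
    }

def fake_run(scenario, mrai, basis, seed):
    # Placeholder numbers only; a real run would invoke the simulator.
    return (float(mrai), 100)

summary = run_series(range(4), ["seed-a", "seed-b", "seed-c"], fake_run)
```

With four scenarios and three seeds, every combination of MRAI value and timer basis is covered by twelve runs.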
For each single simulation run, mraiInvestigation.pl keeps some information about convergence time and number of exchanged updates, which is summarized in a text file after all simulations (and the control script) have finished. Altogether, this control script contains four nested loops, each iterating over the parameters of one so-called testing dimension. More detailed information like the number of iterations with different random seeds is given in Chapter 4, as some settings vary between test series.

3.4.2 Investigation of Update Propagation

Examining update propagation properties - number of affected ASes or update radius - is done in a manner analogous to that described in the preceding subsection. Again a Perl script, called updateRadius.pl, is responsible for testing along three dimensions: different categories of link failures (cf. 3.2.1), different failing links within each link category and different seeds for the random number generator of SSFNet. Figure 3.4 illustrates the basic steps during a run of updateRadius.pl.

[Flow chart: START -> choose a failure category -> configure link failures in DML file -> choose seed for random number generators -> run simulation with SSFNet -> analyze the results (summary is stored to file) -> END]
Figure 3.4: Flow chart of the script updateRadius.pl

Here the script comprises three nested loops, iterating over the parameters for each so-called dimension. First we determine to which category the failing link should belong (stub-stub, tier1-middle, etc.). The CreateLinkFails.pl script (see 3.2.2) then configures a link to fail, whereby different links for each failure category are tested (second loop). After choosing a seed string for the random number generator (third loop), SSFNet is started and the results are analyzed with LogfileAnalyzer.pl.
Again, we record some results like the percentage of affected ASes for each simulation run in order to create a summary of the results before updateRadius.pl terminates.

Chapter 4 Simulation results

In order to obtain a better understanding of the underlying mechanisms of BGP, a careful and comprehensive sensitivity analysis of the protocol parameters is needed. Within the scope of this study, we concentrated on two important aspects which we believe to be essential for an evaluation of BGP in terms of scalability and convergence times: the propagation of updates following a link failure and the influence of the MRAI timer on the convergence times and the number of sent updates. This chapter describes the conducted experiments and documents the obtained results.

4.1 Influence of MRAI Timer Settings

The protocol specification of BGP includes several configurable timers, one of which is the Minimum Route Advertisement Interval (MRAI) timer. Being responsible for limiting the number of updates sent by a BGP speaker to a peer or for a certain prefix, this timer might have a direct influence on the number of updates and the convergence times after a link failure. This section covers different configuration settings for the MRAI timer and their effects on the general convergence process. Our main attention is devoted to two important configuration options: choosing a per-peer or per-prefix timer, and what timer value to take. Questions that arise are for example: How do convergence times and the number of updates change with increasing value of the MRAI timer, and what advantages do per-prefix timers offer in comparison to per-peer timers?

4.1.1 Varying the MRAI Timer Values

Every time a router sends a route advertisement to a neighbor it starts a new instance of the MRAI timer, not allowing this router to send another advertisement concerning the same destination until the timer has expired.
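The difference between the two timer bases can be made concrete with a small Python sketch. It is a simplified model and not the SSFNet implementation: a real BGP speaker holds a blocked advertisement back and sends it when the timer expires, whereas this sketch only reports whether sending is currently allowed. The class name, the method name and the peer and prefix strings are hypothetical.

```python
class MraiLimiter:
    """Minimal model of MRAI rate limiting (illustrative only)."""

    def __init__(self, interval, per_prefix=False):
        self.interval = interval          # MRAI timer value in seconds
        self.per_prefix = per_prefix      # False: one timer per peer
        self._next_allowed = {}           # timer key -> earliest send time

    def _key(self, peer, prefix):
        # Per-prefix keeps one timer per (peer, prefix) pair,
        # per-peer keeps a single timer for everything sent to a peer.
        return (peer, prefix) if self.per_prefix else peer

    def try_send(self, now, peer, prefix):
        """Return True and start the timer if an advertisement may go out."""
        key = self._key(peer, prefix)
        if now < self._next_allowed.get(key, float("-inf")):
            return False                  # timer still running
        self._next_allowed[key] = now + self.interval
        return True

per_peer = MraiLimiter(30)
a1 = per_peer.try_send(0, "AS2", "10.0.0.0/8")     # timer starts
a2 = per_peer.try_send(5, "AS2", "10.1.0.0/16")    # blocked: same peer
a3 = per_peer.try_send(30, "AS2", "10.1.0.0/16")   # timer has expired

per_prefix = MraiLimiter(30, per_prefix=True)
b1 = per_prefix.try_send(0, "AS2", "10.0.0.0/8")
b2 = per_prefix.try_send(5, "AS2", "10.1.0.0/16")  # other prefix: allowed
b3 = per_prefix.try_send(5, "AS2", "10.0.0.0/8")   # same prefix: blocked
```

In this model a per-peer timer blocks the second prefix for the full interval, while the per-prefix timer lets it through immediately.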
For this experiment we used a per-peer timer basis and let 20 links fail at arbitrary locations within Topology 7774. The exact points in time at which the link failures occur are chosen randomly out of a time window with a length of 20s in order to avoid possible synchronous runs of different timer instances. All experiments concerning the MRAI timer were conducted with our sample topologies having two external links (multi-homing) between a pair of ASes, with SSLD but no WRATE being used. Always measuring the number of exchanged external updates (updates between ASes) and the time from the first update sent after the instability event until the time the last update was received, we conducted the experiment for MRAI timer values of 4s, 5s, 10s, 15s, . . . , 60s. With the help of the Perl script mraiInvestigation.pl (see 3.4.1), the simulations were initiated automatically, running each simulation with three different seeds for the random number generators of SSFNet. Furthermore, we considered four different failure scenarios with 20 broken links for every MRAI timer value configuration. Figure 4.1 shows the results of these experiments. Note that all tests were done with a per-peer timer basis and that only the means of the measured number of external updates and of the convergence times for a specific MRAI timer value are displayed.

[Two panels: “Convergence times” (convergence time in sec, mean) and “# External Updates” (# external updates, mean), each plotted against the MRAI timer value in sec]
Figure 4.1: Convergence times and number of updates depending on different MRAI timer values (per-peer) in Topology 7774

Regarding the convergence process it can be inferred from Figure 4.1 that increasing values for the MRAI timer impose a penalty on the time needed until a steady routing state has been reached again.
Whereas an MRAI timer of 60s requires about 110s to converge, setting the timer to 4s only leads to a period of about 40s until all updates after the link failures have been exchanged. The growth of the observed convergence times seems to be approximately linear with respect to increasing MRAI values in this experiment. These results suggest that the rate-limiting mechanism of the MRAI timer adds more delay to the messages for higher timer values. However, related work [1] has found that very low MRAI timer values can cause a high workload in BGP routers, thus inducing an increase in convergence times again.

The second part of Figure 4.1 depicts the number of external updates as a function of the timer value. Here we observe the opposite trend: with increasing MRAI value, the number of exchanged BGP messages decreases from about 1600 to about 1400. We explain this observation by the dampening of some of the route oscillations which are inherent in the path-vector protocol BGP. While the timer is running, a BGP-speaking router can collect and evaluate alternative paths for a certain prefix before it advertises the best path to its neighbors. Therefore its neighbors are not exposed to every intermediate path but only to the best path within a period of time.

It might be asked whether it is justified to use the means of the measured data values for drawing general conclusions on the influence of the MRAI timer parameters. Strong fluctuations could possibly weaken the explanatory power of our results. The standard deviations for the measured number of external updates and the convergence times are depicted in Figure 4.2.
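Whether a mean is representative can be judged by relating the standard deviation to the mean. The following Python sketch shows this computation for one timer setting. The twelve values are invented for illustration and are not measurements from this study, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def relative_spread(samples):
    """Sample standard deviation expressed as a fraction of the mean."""
    return stdev(samples) / mean(samples)

# Hypothetical convergence times (seconds) of twelve runs for one MRAI value.
runs = [68, 72, 75, 70, 66, 74, 71, 69, 73, 77, 65, 72]
spread = relative_spread(runs)   # well below 0.10 for these invented values
```

A value below 0.10 would correspond to a standard deviation of less than 10% of the mean.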
[Two panels: “Convergence times” (convergence time in sec) and “# External Updates” (# external updates), each plotted against the MRAI timer value in sec]
Figure 4.2: Standard deviations for the measured data values in Figure 4.1

The standard deviations for the number of updates are less than 35% of the computed means for all MRAI timer configurations, whereas the deviations are never larger than 10% of the means in the case of the convergence times. Although the number of updates shows a higher variability, the calculation and use of the means seems to be justified considering the fact that twelve simulation runs were made for every MRAI timer value. In closing, we summarize that with increasing values for a per-peer MRAI, the number of external updates decreases at the cost of higher convergence times.

4.1.2 Per-peer and per-prefix MRAI Timers

The question was already raised whether per-prefix MRAI timers have any advantages in comparison to per-peer timers. One might expect that keeping a separate timer for each single prefix advertised to a neighboring peer does not impose penalties on convergence times as high as those of timers on a per-peer basis. The following experiment tries to shed light on questions concerning the use of per-prefix MRAI timers. In analogy to the simulation described in 4.1.1 we generate 20 link failures at arbitrary locations of Topology 1140 and Topology 7774 which occur randomly within a time window of 20s. Again, we measure the number of exchanged external updates and the time from the first update sent after the instability event until the time the last update was received by a router (referred to as convergence time). The MRAI value was varied in the same way as in 4.1.1, but this time simulations are run for both a per-peer and a per-prefix MRAI timer.
In order to enhance the explanatory power of our conclusions, we perform simulations on the two “big” topologies: Topology 1140 and Topology 7774. With the help of the Perl script mraiInvestigation.pl twelve simulations are run for every fixed MRAI timer value and fixed timer basis (per-peer and per-prefix) for the two topologies, as three different seeds for the random number generator and four different failure scenarios are used in each case. The diagrams in Figure 4.3 illustrate the convergence times and the number of external updates depending on the used MRAI value for a per-peer and a per-prefix timer basis. Note that the means for a specific timer value are always plotted.

[Four panels: “Topology 1140: Convergence times”, “Topology 7774: Convergence times”, “Topology 1140: # External Updates” and “Topology 7774: # External Updates”; convergence time in sec (mean) and # external updates (mean) plotted against the MRAI timer value in sec, one curve each for per-peer and per-prefix]
Figure 4.3: Comparison of per-peer and per-prefix MRAI timers in terms of convergence times and number of external updates

Concerning the dependence between convergence times or number of updates and different timer values for a per-peer MRAI in Topology 1140, the reader is referred to 4.1.1, as the observations and conclusions for this case are basically the same: increasing values for the MRAI seem to lead to fewer external updates but longer convergence times. If corresponding convergence times for per-peer and per-prefix MRAI timers are compared with each other, it seems that per-prefix implementations offer slight advantages over the default per-peer timers.
Whereas in Topology 1140 the convergence process is always some seconds faster for a per-prefix timer basis, this is only true up to a timer value of 25s in Topology 7774. However, Figure 4.4 suggests that the standard deviations for the convergence times in Topology 7774 are more than 10s for per-prefix timers set to values higher than 25s. Possibly, this could explain why per-prefix MRAI timers show worse convergence behavior than per-peer timers in that case. In most failure scenarios per-prefix timers will have slight advantages in terms of convergence times over keeping one timer for every neighboring AS. Holding back all update messages to a peer independent of the concerned prefixes, a per-peer MRAI timer imposes penalties on convergence times in comparison to per-prefix timers. This is due to the fact that timers on a per-prefix basis can “react” to each advertised prefix individually in the case of several overlapping link failures.

[Two panels: “Convergence times” (convergence time in sec) and “# External Updates” (# external updates), each plotted against the MRAI timer value in sec]
Figure 4.4: Standard deviations for the data values measured for the per-prefix MRAI timer in Topology 7774 (see Figure 4.3)

Continuing our discussion of the results, we note that in both topologies the mean of the number of updates remains more or less constant while varying the values of the per-prefix MRAI timers. However, strong fluctuations in the number of exchanged messages for the single simulation runs of a specific MRAI timer setting can be observed in Figure 4.4. The standard deviation is about 35% of the computed means, possibly due to the statistical nature of router interactions. Taking into account the number of BGP updates following instability events, per-peer timers seem to be the better choice.
Compared to per-peer timers, the use of MRAI timers on a per-prefix basis produces significantly more external updates. For Topology 1140 we observe an average increase of 29% in the number of updates when using a per-prefix instead of a per-peer timer; in Topology 7774 the increase is even 44%. This might be explained by the fact that a per-prefix timer does not hold back update messages for the same neighboring AS if different prefixes are concerned. In such a scenario a timer on a per-peer basis could reduce the number of BGP messages which are passed on to the neighbors. If the results also proved true in additional experiments and for other topologies, this would justify the configuration of the MRAI timers as per-peer, which is the default setting in the widespread Cisco and Juniper routers. Nonetheless, it should be questioned whether the default configuration of the timer value near 30s is the best one possible for balancing between a low router workload and fast convergence times.

4.2 Propagation of Updates

Scalability is an important issue inherent in many fields of network research. Especially for distributed protocols like BGP, which is responsible for maintaining connectivity between autonomous systems in the Internet, it is of great importance to gain an understanding of how protocol behavior changes with increasing size of the network. This section deals with the investigation of update propagation after a link has failed somewhere in the topology. Questions that arise are for example: how many ASes receive an update message following a link failure, and how far away from the broken link can the instability still be perceived? Last but not least, we want to investigate whether the classification of links according to their commercial relationships reflects the harmfulness in terms of the number of affected ASes if this link should fail.
4.2.1 Experiment Description

For this experiment we always produced one single link failure such that all updates sent afterwards must be related to this instability. As described in section 3.3, the number of affected ASes can be easily determined by considering all ASes receiving a BGP message following the failure event. The second point of interest is the distance updates propagate through the topology when a connection between two ASes breaks. We refer to that distance as the propagation radius, measuring it in the number of AS hops not including the nodes incident on the failed edge.

In order to approximate the circumstances in the Internet, where Cisco routers are the overwhelming majority, we use per-peer MRAI timers (timer value normally distributed between 25s and 31s), SSLD but no WRATE. All simulations take place in our sample topologies Middle Topology, Topology 1140 and Topology 7774. By only permitting one link between a pair of ASes (no multi-homing), it is ensured that each failure of an external link leads to a change in the inter-AS routing of prefixes observable by other ASes.

Configuring a link failure in the DML file is done by the script CreateLinkFails.pl (refer to 3.2.2), making it possible to systematically choose the failing edge such that it belongs to one of the categories described in 3.2.1. CreateLinkFails.pl is invoked from the general control script updateRadius.pl (see 3.4.2), which runs simulations in every failure category for ten different failure scenarios. In order to average out random effects, all simulations are started with three different seeds for the random number generator with otherwise identical parameters. The results of these experiments are discussed in the following two subsections.

4.2.2 Number of affected ASes after a link failure

Figure 4.5 depicts the ratio of ASes (in percentage) receiving an update following a link failure.
For the used topologies, we distinguished between the types of the failing link and plotted only the mean value of all simulation runs for a specific failure category.

[Bar plot: ratio of reached ASes (mean) for Middle Topology, Topology 1140 and Topology 7774, one bar per link category (tier1-tier1, tier1-middle, middle-middle, middle-stub, stub-stub)]
Figure 4.5: Average percentage of ASes receiving updates after a failure broken down by different link categories

First of all, it can be stated that in general not all ASes of the investigated networks are affected by the link failure. Prefixes which have been routed over the broken link must be redirected to new AS paths which possibly do not differ completely from the original path but have some ASes in common. Taking into account the maximum of the means of all five link categories, the percentage of affected ASes is not larger than 84% in Middle Topology, 38% in Topology 1140 and 59% in Topology 7774. However, one has to be very careful in drawing definite conclusions. Figure 4.6 suggests that there are strong fluctuations in the number of reached ASes for all failure categories. Using the example of Topology 7774, histograms illustrate how the computed mean values for the percentage of affected ASes in Figure 4.5 arise from the values measured in the different simulation runs for each link failure category. The distribution of the ratios of ASes receiving updates after the instability event shows similar deviations in Topology 1140 and is therefore not presented here.

Furthermore, we can read off Figure 4.5 that if an external link between two stub ASes fails, the ratio of reached ASes is very low for all topologies. Actually, examining the results of the single simulations shows that in such a case only two ASes are affected by the broken link: the ASes incident on the failed edge. This is due to the fact that the connection between two stub ASes is a peering link, which is not supposed to be used by other ASes and on which therefore no prefixes are advertised.
As always only two ASes are affected by the instability event, we didn’t generate a histogram for the stub-stub link failure in Figure 4.6. Note that Middle Topology is very small compared to the other networks, leading to a higher percentage of ASes receiving updates after a link failure in Figure 4.5.

[Four histograms (tier1-tier1 links, tier1-middle links, middle-middle links, middle-stub links): frequency over the ratio of ASes receiving updates after a link failure]
Figure 4.6: Histograms of the ratio of ASes receiving updates after a link failure in Topology 7774 broken down by different link categories

We already mentioned that it is problematic to predict the ratio of affected ASes based on the category to which the failing link belongs. Studying Figure 4.6, only very vague statements can be made on the relationship between the harmfulness of links if they should fail and their membership in one of our link categories. For example, it seems that in Topology 7774 broken connections between middle ASes are noticed by slightly fewer ASes than would be the case if a link to a tier-1 AS failed. Furthermore, it is surprising that in many simulation runs, link failures between a stub AS and a middle AS are very harmful, inducing the propagation of updates to nearly all ASes, whereas in many other simulations for the same failure category basically no ASes are reached at all by updates. Similar behavior is seen for the other link failure categories.
The question arises whether it makes sense to investigate the propagation of updates broken down by different link categories, as for all categories strong fluctuations in terms of the ASes affected by the failure event were observed. We believe it to be essential that further experiments with different and probably more realistic topologies are conducted before a final statement on this issue can be made.

Speculating on the reasons for these strong fluctuations, it might be interesting to examine the relationship between the degrees of the nodes incident on the failing edge and the percentage of affected ASes. In a highly meshed network there exist many alternative paths to the same destination, with the result that possibly only very few ASes have selected a best route which runs over the broken link. Are the node degrees a better metric for predicting the harmfulness of a link failure than the classification presented in this work? Trying to answer this question it might be helpful to consult Table 4.1. It contains the average number of neighbors of an AS, in total and broken down by AS category. Behind each table entry, the standard deviation is indicated in brackets.

topology   total         tier1         middle        stub
Middle      2.9 (1.1)     4.0 (1.0)     3.3 (0.9)     2.0 (0.5)
1140       24.1 (17.0)   20.0 (11.3)   29.9 (17.1)   14.7 (13.4)
7774       11.7 (13.2)   28.6 (13.1)   20.9 (14.2)    5.0 (6.6)

Table 4.1: Average node degrees (standard deviations in brackets) broken down by categories

First of all, the average total node degrees show that the used topologies differ strongly concerning their meshing degree. While Topology 1140 is more highly meshed with a mean of 24.1 neighbors per AS compared to 11.7 neighboring ASes in Topology 7774, the very small Middle Topology only has an average node degree of 2.9. Further investigation is needed to determine if these meshing degrees influence the ratio of affected ASes after a link failure.
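How values like those in Table 4.1 can be obtained from a link list is shown by the following Python sketch. The toy topology, the function name and the use of the population standard deviation are assumptions of this illustration. The text does not state which variant of the standard deviation was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def degree_stats(links, category):
    """Mean and standard deviation of the node degree per AS category.

    links:    iterable of (as_a, as_b) pairs, one per external link
    category: dict mapping each AS to "tier1", "middle" or "stub"
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    degrees = defaultdict(list)
    for asn, cat in category.items():
        degree = len(neighbors[asn])     # number of neighboring ASes
        degrees[cat].append(degree)
        degrees["total"].append(degree)

    return {cat: (mean(d), pstdev(d)) for cat, d in degrees.items()}

# Toy topology with five ASes (hypothetical).
links = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
category = {1: "tier1", 2: "tier1", 3: "middle", 4: "stub", 5: "stub"}
stats = degree_stats(links, category)
```

In the toy topology the middle AS has four neighbors and all other ASes have two, which gives a total mean degree of 2.4.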
Comparing the average node degree broken down by categories for Topology 7774 and Topology 1140, it seems interesting that they show opposite characteristics for the tier-1 ASes. Whereas the node degrees for the ASes which we assigned to the tier-1 group are on average higher than those of the so-called middle ASes (28.6 compared to 20.9) in Topology 7774, it is just the other way round in Topology 1140 (here 20.0 compared to 29.9). In general, it is believed that ASes in the tier-1 level are situated in the center of the network graph, neighboring more ASes than non-tier-1 ASes. Future work should pay more attention to modeling the hierarchical structure of the Internet in such a way that topological characteristics like the meshing degree are not distorted.

The standard deviations printed in brackets behind each data value of Table 4.1 already indicate that there are again some fluctuations in the number of neighbors which ASes within a certain category have. These deviations are illustrated in more detail by Figure 4.7. Estimated density functions of the node degree distribution are drawn for the different AS categories - tier-1, stub and middle - for Topology 1140 and Topology 7774. The main conclusion to be drawn from these density plots is possibly the insight that it is dangerous to consider only the mean values, as strongly varying node degrees can be observed. Nonetheless, we point out that tier-1 ASes in Topology 7774 are connected to 28.6 neighbors on average whereas the node degree for the same category is only 20.0 in Topology 1140 (see Table 4.1). If the theory holds that the high number of alternative paths to the same destination in a highly meshed part of a network has the consequence that only very few ASes have selected a best route which runs over the broken link, we could conclude that this is the reason for the fact that a tier-1 link failure is much more harmful in Topology 7774 than in Topology 1140 (compare Figure 4.5).
But again, it is not possible to make a definite statement, as for example the results for the middle-stub links contradict this theory.

[Two panels, “Topology 1140” and “Topology 7774”: estimated density over the node degree (number of neighbor ASes) for tier1 ASes, middle ASes and stub ASes]
Figure 4.7: Density functions of the node degrees (number of neighbor ASes) broken down by categories

In closing, we summarize that further investigations are needed to explore the relationship between the commercial classification of external links and their harmfulness in terms of the number of affected ASes if this link should fail. Possibly, new criteria need to be developed for a more meaningful categorization of external connections between ASes.

4.2.3 Propagation Radius

Besides the ratio of affected ASes, the second point of interest is the distance updates propagate through the topology when a connection between two ASes breaks. How this so-called update radius can be measured was already described in 3.3. Figure 4.8 depicts the computed distances in the number of AS hops not including the nodes incident on the failed edge. Again, 10 different failure scenarios were tested for all link failure categories, every time using three different seeds for the random number generator. Whereas the mean values of all simulation runs are plotted in the left bar-plot of Figure 4.8, the right diagram shows the maximum update radius observed for a series of experiments for a specific failure category. Maybe the most conspicuous result is that the maximum update radius is always less than 4, meaning that in no case updates spread more than 4 AS hops away from the source of the instability event.
We attribute this small update radius to the high average meshing degrees of our topologies (compare Table 4.1: 11.7 for Topology 7774 and 24.1 for Topology 1140), where possibly not many “best” routes were using the broken link. Due to its small size and synthetic nature, Middle Topology can only be used to a limited extent to draw significant conclusions.

The mean values for the update radius, shown in the left bar-plot, always lie between 0.53 and 1.56 AS hops for all failure categories. In this context we point out that the mean values are computed on the basis of only those ASes which receive an update following a link failure. If only a small ratio of ASes is affected, the ASes incident on the failing edge carry a very high weight in the computation of the mean, leading to an average propagation distance of less than one AS hop. Again, we emphasize the short mean distances that updates propagate through the topologies after a link failure, but point out at the same time the strong fluctuations between individual experiments.

[Figure 4.8: Propagation distance of updates in the case of a link failure (measured by the number of hops). Left panel: # AS hops (mean); right panel: # AS hops (max); bars for the link categories tier1-tier1, tier1-middle, middle-middle, middle-stub and stub-stub, grouped by Middle Topology, Topology 1140 and Topology 7774.]

Concerning the distinction between different failure categories, it seems difficult to derive any trends in terms of their harmfulness from the diagrams. More research is needed to explore this issue in more detail. Last but not least, the left diagram in Figure 4.8 can be compared with Figure 4.5: the relationship between the ratios of reached ASes (means) for different failure categories is predominantly reflected in the relationship between the average propagation radii.
For example, if the mean percentage of reached ASes after the failure of a tier-1-middle link is higher than that for a broken tier-1-tier-1 link in Topology 1140, this is reflected in a higher mean propagation radius for the tier-1-middle link category in Figure 4.8. In closing, we point out that all these observations need to be confirmed by further investigations.

Chapter 5
Conclusions and directions for future work

In closing, we summarize the results of the sensitivity analysis of BGP convergence and scalability that we performed using the SSFNet simulator. One of the main objectives of this work was to examine the influence of the MRAI timer configuration on convergence times and on the number of external updates sent. The results from Chapter 4 basically confirmed that a higher timer value leads to fewer exchanged update messages at the cost of higher convergence times. In most cases, per-prefix timers offer only slight advantages in convergence time compared to per-peer timers. However, the number of external updates increases considerably when timers are kept on a per-prefix basis. Further investigations have to show whether these findings justify the default configuration of MRAI timers on a per-peer basis, as is done by the market leaders Cisco and Juniper.

Besides the MRAI timer influences, we tried to explore how far updates propagate through the topology after a link failure. We found that for our test topologies updates are never seen more than 4 AS hops away from the broken edge. Altogether, instabilities do not seem to spread very far and remain relatively local. Concerning the number of ASes affected by the failure of a link, strong fluctuations depending on the broken link were observed. Our categorization of links according to the commercial relationship between the connected ASes therefore does not seem very promising. However, future experiments have to confirm these results.
Altogether, there remains a lot of work to do. Using MRAI timers on a per-prefix instead of a per-peer basis imposes a higher workload on the routers, as separate timer instances have to be kept for all the different prefixes. It would be interesting to examine the joint influence of workload and the timer basis on the overall convergence process. The use of per-peer timers may then be even more justified.

The SSFNet BGP implementation contains some simplifications which might be relevant for a comprehensive analysis. Particular emphasis could be placed on the investigation of route flap damping. This mechanism has already been added to the BGP implementation used here, and it could be worthwhile to study whether route flap damping is only invoked by network instabilities or also by oscillations which are inherent in the BGP protocol.

In our view, the generation of realistic topologies is essential for an analysis of BGP behavior. In that respect, efforts can be made to improve the internal and external structure of our sample networks. Up to now, the BGP meshes within autonomous systems are rather static, always consisting of a ring of route reflectors and some border routers connecting to other ASes. Other structures within the ASes could be considered in future tests. Regarding the external topology, it is desirable to reproduce the hierarchy inherent in the Internet as closely as possible in our test topologies. For example, the distribution of node degrees - the number of neighbor ASes of an AS - should follow the same patterns as in the Internet.

Considering all the simplifications made in our models and the remaining open questions, there remains a lot of research to do in this field.

List of Figures

2.1 Structural overview of the SSFNet simulator
2.2 Generation of DML files
3.1 Middle Topology (green lines are peering links)
3.2 Density functions of the node degrees (number of neighbor ASes) for Topology 1140 and Topology 7774
3.3 Flow chart of the script mraiInvestigation.pl
3.4 Flow chart of the script updateRadius.pl
4.1 Convergence times and number of updates depending on different MRAI timer values (per-peer) in Topology 7774
4.2 Standard deviations for the measured data values in Figure 4.1
4.3 Comparison of per-peer and per-prefix MRAI timers in terms of convergence times and number of external updates
4.4 Standard deviations for the data values measured for the per-prefix MRAI timer in Topology 7774 (see Figure 4.3)
4.5 Average percentage of ASes receiving updates after a failure broken down by different link categories
4.6 Histograms of the ratio of ASes receiving updates after a link failure in Topology 7774 broken down by different link categories
4.7 Density functions of the node degrees (number of neighbor ASes) broken down by categories
4.8 Propagation distance of updates in the case of a link failure (measured by the number of hops)

List of Tables

3.1 Properties of Topology 1140
3.2 Properties of Topology 7774
3.3 Categorization of external links for the used topologies
4.1 Average node degrees (standard deviations in brackets) broken down by categories