LHCOne Architecture Thoughts V2 (Jan 2012)

LHCONE:
1-12-12
Introduction:
The current understanding of the LHCONE environment, utilizing L3 VPNs, is
shown in this diagram:
[Diagram: Traffic Patterns. Labels in the diagram: ESnet AS 293 (Chi, NY, DC),
Internet2 AS 11537 (Sea, SLC, LA, KC, Hou, Chi, Clev, ATL, NY, DC), GEANT AS 20965,
MAN LAN, WIX, StarLight, NetherLight, Canarie, I2 Layer 2 Service, Fermi,
Brookhaven, SLAC, University, Other sites.]
Each domain (Internet2, ESnet and GEANT) will operate a separate L3 VPN. In the
case of Internet2 this VRF will be extended out to all of the core routers in the
Internet2 infrastructure. Universities will participate either by BGP peering
with the VRF or possibly by having the environment extended via Layer 2 to the
campus.
The goal here is to maximize the use of the Trans-Atlantic circuits, understanding
that these are best effort paths. For sites connected off of Exchange points or other
Layer 2 facilities, solutions will be implemented which keep their traffic local when
appropriate and send it through a VRF when appropriate.
It is straightforward for the domains to create these L3 VPNs within their domain.
It is equally straightforward to exchange traffic between these three L3 VPNs.
For sites not directly connected to a VRF, for example sites connected to exchange
points, the situation is more complex.
The solution we propose will be to obtain a distinct address block for each Exchange
Point (IXP). The IXP would create a single broadcast domain and use the addresses
to number the interfaces on the edges of the campuses.
The reason for using a distinct block of addresses instead of simply taking small
blocks of addresses from the campuses is that 3rd party next hop routing (which is
what would allow the traffic to cut through the IXP switch) is not possible if we use
the campus blocks.
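The addressing argument can be illustrated with a minimal Python sketch. All
addresses below are hypothetical documentation prefixes chosen for illustration;
they are not LHCONE allocations. The sketch only checks whether the edge interfaces
end up in one common subnet, which is the precondition for 3rd party next hop
routing.

```python
import ipaddress

# Hypothetical prefixes, for illustration only.
IXP_BLOCK = ipaddress.ip_network("192.0.2.0/24")  # distinct block obtained for the IXP


def share_subnet(interfaces):
    """Return True if every interface sits in the same IP subnet."""
    networks = {iface.network for iface in interfaces}
    return len(networks) == 1


# Plan 1: number every edge interface out of the distinct IXP block.
hosts = iter(IXP_BLOCK.hosts())
ixp_plan = {
    name: ipaddress.ip_interface(f"{next(hosts)}/{IXP_BLOCK.prefixlen}")
    for name in ("VRF provider", "University A", "University B")
}

# Plan 2: each campus donates a small block for its own link to the provider.
campus_plan = {
    "University A": ipaddress.ip_interface("198.51.100.2/30"),
    "University B": ipaddress.ip_interface("203.0.113.2/30"),
}

print(share_subnet(ixp_plan.values()))     # one subnet: cut-through is possible
print(share_subnet(campus_plan.values()))  # two subnets: cut-through is not possible
```

With the distinct IXP block all three interfaces fall in one subnet; with campus
blocks the two campus edges land in different subnets.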
Not all potential policy issues will be addressed here, though they will be identified.
This would include:
1) Policy about transiting traffic.
2) Re-announcing VRF routes to another VRF.
3) Determining whether Tier 1 to Tier 1 or Tier 0 traffic will be allowed on
this system.
4) Expectations regarding multiple sites connected to either an Exchange
point or other Layer 2 facility.
Interaction between the VRF Domains:
The 3 domains (or more, if more develop) will all peer with one another. For
example, Internet2 and ESnet will both peer with GEANT via MAN LAN and WIX,
utilizing the bonded 10G Trans-Atlantic circuits provided by the ACE project and
GEANT. Internet2 and ESnet will peer at multiple locations to be determined. The
VRF providers will not re-announce routes to the other VRF providers. Thus they will
not act as transit for the other VRFs. For instance, Internet2 would not announce
ESnet sites to GEANT. Other organizations may peer with one of Internet2, ESnet or
GEANT. Those three all agree to provide transit for any networks that they
connect with. The connecting organizations are not shown as having separate ASes in
these diagrams.
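The re-announcement rule can be sketched as a small export-policy check. This is a
simplified model written in Python, not router configuration; the function name
should_announce and the use of MREN as a sample connector are illustrative
assumptions.

```python
# The three VRF providers named in this document.
VRF_PROVIDERS = {"Internet2", "ESnet", "GEANT"}


def should_announce(learned_from, announce_to):
    """Decide whether a VRF provider re-announces a route.

    learned_from: neighbor the route was learned from
    announce_to:  neighbor the route would be announced to
    """
    if learned_from == announce_to:
        # Never send a route back to the neighbor it came from.
        return False
    if learned_from in VRF_PROVIDERS and announce_to in VRF_PROVIDERS:
        # No transit between VRF providers.
        return False
    # Routes to or from a connector are carried: the provider gives it transit.
    return True


# Seen from Internet2: ESnet routes are not passed on to GEANT ...
print(should_announce("ESnet", "GEANT"))
# ... but a connector's routes are, and the connector hears GEANT routes.
print(should_announce("MREN", "GEANT"))
print(should_announce("GEANT", "MREN"))
```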
Setting up the L3 VPN within Internet2 is well understood. The process of setting
this up will begin on Monday the 9th and it is anticipated it will be completed by the
middle of the week.
Other Policy Issues:
There are however three areas where questions remain.
Traffic Engineering:
How will path selection be made? There are at least 2 sets of Trans-Atlantic links
that are available for use between GEANT and Internet2 and ESnet. The three
parties will need to determine what policy they will apply to BGP to govern the way
traffic is divided across those links. This could range from only announcing certain
routes on each path to simply letting traffic flow as the shortest path would dictate. It is
of course possible that these policies can be developed as more operational
experience is gained.
Further, if other organizations contribute bandwidth, it will need to be understood
whether this is allocated for a specific set of routes or added to the general pool of
bandwidth.
These policies will develop and adapt over time.
Organizations connecting to a VRF:
The role of organizations like MREN and other groups that bring sites to LHCONE
needs to be understood. The general approach planned for the initial iteration of
LHCONE differs based on the sort of organization they are.
If they operate at Layer 3 the expectation is that they will peer with one of the
existing VRFs. The organization providing that VRF will in turn agree to transit the
connector's traffic to the other VRFs. If these connectors have diverse connections to
more than one of the VRFs they would be able to peer with each of those VRFs,
though they would have to work out the details of transit in those cases.
Exchange points and other Layer 2 Connectors:
The last and more difficult question concerns what will be done about Universities
or other organizations that connect to Layer 2 based facilities.
There will be participants in LHCONE who will have separate connections to either
an existing exchange point (IXP), a regional network connecting to an exchange point,
or some other Layer 2 facility connecting to one or more of those L3 VPNs.
The question is how traffic will flow between Universities connected in this fashion.
On the following diagrams two Universities are shown connected to an IXP. The core
question is: Will traffic between them go directly through the exchange point switch,
or will it go through the exchange point into the VRF provider and then back?
Clearly the latter is sub-optimal and should be avoided. The question is how
that situation will be prevented from occurring.
The solution we propose will be to obtain a distinct address block for each IXP. The
IXP would create a single broadcast domain and use the addresses to number the
interfaces on the edges of the campuses.
The reason for using a distinct block of addresses instead of simply taking small
blocks of addresses from the campuses is that 3rd party next hop routing (which is
what would allow the traffic to cut through the IXP switch) is not possible if we use
the campus blocks.
A more detailed explanation follows.
3rd Party next hop routing
[Diagram: the VRF Provider (x.x.x.y), University A (x.x.x.z) and University B
(x.x.x.w) all attach to the Exchange Point on vlan 100; an L2 Device sits on one of
the University connections.]
In this case the 2 Universities connected to the Exchange Point and the connection
to the VRF provider are in the same broadcast domain, shown here as vlan 100. The
interfaces on the borders of the campuses and the VRF provider are also in the same
subnet, shown here as a /24, though it can be any prefix length.
In this instance each University will speak BGP with the VRF provider, announcing
to it the prefixes that the University wants advertised to LHCONE. The VRF provider
will in turn announce to the Universities the LHCONE prefixes as well as the prefixes
of other Universities attached to the Exchange Point. In this case, when University
A's prefixes are announced by the VRF provider to University B, the next hop will be
x.x.x.z. This will allow traffic to flow between them without traversing the VRF
provider and without them having to set up a BGP session between each other.
Traffic flow is shown by the dotted line.
By having the sites in the same broadcast domain and in the same subnet, the VRF
provider is able to do 3rd party next hop routing for the IXP participants.
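The next hop decision described above can be sketched in Python. This is a
simplified model of the behaviour, not a BGP implementation: the function name
advertised_next_hop and all addresses (documentation prefixes standing in for
x.x.x.y, x.x.x.z and x.x.x.w) are assumptions made for illustration.

```python
import ipaddress


def advertised_next_hop(route_next_hop, speaker_iface, peer_addr):
    """Next hop the VRF provider places in an announcement sent to peer_addr.

    If the route's original next hop and the receiving peer are both on the
    subnet of the provider's peering interface, the original next hop is kept
    (3rd party next hop). Otherwise the provider announces itself.
    """
    subnet = speaker_iface.network
    if route_next_hop in subnet and peer_addr in subnet:
        return route_next_hop
    return speaker_iface.ip


# Shared subnet on the IXP: provider, University A, University B.
provider = ipaddress.ip_interface("192.0.2.1/24")
univ_a = ipaddress.ip_address("192.0.2.10")
univ_b = ipaddress.ip_address("192.0.2.20")

# University A's prefixes, announced to University B, keep A as next hop.
print(advertised_next_hop(univ_a, provider, univ_b))

# Separate subnets: the provider's interface toward B is on a different
# subnet than A's address, so the provider becomes the next hop.
provider_to_b = ipaddress.ip_interface("203.0.113.1/30")
univ_a_other = ipaddress.ip_address("198.51.100.2")
univ_b_other = ipaddress.ip_address("203.0.113.2")
print(advertised_next_hop(univ_a_other, provider_to_b, univ_b_other))
```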
No 3rd party next hop routing
[Diagram: University A and University B each reach the VRF Provider through the
Exchange Point on a separate vlan (vlan 100 and vlan 200); an L2 Device sits on one
of the University connections.]
In this instance the Universities have separate peerings with the VRF provider on
separate broadcast domains, shown here as vlans 100 and 200. Both University A
and B will have BGP sessions with the VRF provider where they announce the
prefixes relevant to LHCONE. However, in this case, as shown, their traffic to one
another will need to traverse the VRF provider's link to the Exchange point in both
directions.
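The difference between the two cases can be made concrete with a toy Python model
of the forwarding path. The function and device names are illustrative assumptions;
the model only records which devices a packet crosses, given who owns the BGP next
hop.

```python
def forwarding_path(src, dst, next_hop_owner):
    """Devices a packet crosses from src to dst through the exchange point.

    next_hop_owner is the device owning the BGP next hop that src learned
    for dst's prefixes.
    """
    path = [src, "IXP switch"]
    if next_hop_owner != dst:
        # Traffic is handed to the next hop owner, which sends it back
        # through the exchange point switch toward the destination.
        path += [next_hop_owner, "IXP switch"]
    path.append(dst)
    return path


def provider_link_crossings(path):
    """Each visit to the VRF provider crosses its IXP link twice (in and out)."""
    return 2 * path.count("VRF provider")


# 3rd party next hop: University B itself is the next hop.
direct = forwarding_path("University A", "University B", "University B")
# Separate vlans: the VRF provider is the next hop.
hairpin = forwarding_path("University A", "University B", "VRF provider")

print(direct, provider_link_crossings(direct))
print(hairpin, provider_link_crossings(hairpin))
```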
This is clearly undesirable. Where it is not possible to create a single broadcast
domain for connectors to an IXP, a policy decision will need to be made: will the VRF
provider require that the participants within the IXP all set up point to point vlans
and do BGP across those links, in addition to maintaining the BGP session with the
VRF provider, or not?
In the case of LHCONE-NA the intention is to require those peerings where it is not
possible to put a single broadcast domain and subnet in place.