Slides

THE DEFORESTATION OF L2
JamesMcCauley,MingjieZhao,EthanJ.Jackson,Barath Raghavan,
SylviaRatnasamy,ScottShenker
UCBerkeley,UESTC,andICSI
The Talk
• What isAXE?
• Why lookatthis?
• How doesitwork?
• Really?Thisactuallyworks?
2
The What
• AnredesignofL2 toreplaceEthernetand
SpanningTreeProtocol(anditsvariants)
• Targetsare“normal”enterprisenetworks,
machinerooms,smallprivateDCs
– Not theGoogles,Microsofts,Rackspaces
– Notnetworkswithincrediblyhighlyutilization
– Not managedbyafull-timeteamofexperts
3
The What: Goals
• Plug-and-play
– Ifnot,mightaswelljustuseL3
• Usealllinksforshortestpaths
– NumberoneshortcomingofSTP
• Fastrecoveryfromfailure
– NumbertwoshortcomingofSTP?
4
The What: Goals
• Plug-and-play
– Ifnot,mightaswelljustuseL3
• Usealllinksforshortestpaths
– NumberoneshortcomingofSTP
• Fast Packet-timescalerecoveryfromfailure
– NumbertwoshortcomingofSTP?
5
The What: Assumptions
• Failuredetectioncanbefast
– Nottraditionallythebottleneck
• Controlplane“hellos”weresufficient
– Needinterrupt-drivenLFS,BFD,etc.
• There’samarketforflood-and-learnL2
– Flooding/learnhassecurityimplications
– Noheavyunidirectionaltraffic
• Nomulti-accesslinks
– Everythingispoint-to-point
6
The Why: Is L2 still a problem?
• Stillmanylargely-unmanaged,small/medL2networks!
– TwoinourbuildinginBerkeley!
• Therehavebeenafewinterestingdevelopments…
– SPB,TRILL,SEATTLE,etc.
– Providevarioustradeoffs
• AXEattemptstostrikeadifferentbalance
– Focusontwokeyproblems
– Keepingthingsassimpleaspossible(nocontrolplane)
7
The Why: Context
STP
No STP (Tree)
TRILL/SPB
Plug-and-play
✓
✓
✓
IP (L3)
Custom
AXE
✓
Shortest Paths
Fast Recovery
✓
✓
✓
✓
✓
No Control Plane
✓
?
✓
✓
8
The How: Extend Ethernet
• Basicflood/learnEthernet
– Whenyouseeapacket:learn
– Whenyoudon’tknowwhattodo:flood
• ButAXEdoesnotneedatreetodealwithloops
– Meansfloodingworksforhandlingfailurestoo
• (becausealternatepathsareimmediatelyavailable)
– Meansthatflood/learnfindsshortpaths
• (becauseyouhaven’tremovedlinks)
9
The How: Treeless flooding
• Howdoyougetaroundtheloopproblem?
• Duplicate-packet-detection
• Multiplewaysofdoingit
• Ourfocus:hash-baseddeduplication filter
– Inshort:hashtablewhereyoureplaceuponcollision
– Straightforward
– Amenabletohardware/P4implementation
10
The How: What changes?
• Learningismoresubtle
– Sourceaddressseenonmultipleports
– Packetsmayevenbegoingbackwards!
• Respondingtofailuresismoresubtle
– Meanswehavetounlearn(outdated)state
11
The How: Extend Header
• Extendthepacketheaderbetween switches
– Nonce(per-switchsequencenumber)
• Usedforpacketdeduplication
– Hopcount
• Influenceslearning,also protectsfromloops
– Floodedflag:F
• Trackswhetherapacketisbeingflooded
– Learnableflag:L
• Trackswhetherpacketcanbelearnedfrom
12
The How: Separate queues
• Switcheshavefloodqueueandnormalqueue
– TheFloodedflagintheheaderdetermineswhich
– Floodqueuehashigherpriorityandisshorter
– Normalqueuesized…normally
• Intuition:
– Deliveringfloodsquicklystops floodingquickly
– Deduplication onlyappliestofloods,keeping
fewerfloodsinflightmakesdedup easier
13
The How: Overview
• Extendpacketheader
– Nonce,HopCount,Flooded/Learnableflags
• Learning/UnlearningPhase
– MaylearnportandHCtosrc
– Mayunlearnpathtodst iftroublewasobserved
• OutputPhase
– Ifpacketisaduplicate:drop
– Ifunknown-dst/path-failed/already-flooding:flood
– Otherwiseforwardaccordingtotable
14
The How: Example
•
AsendsapackettoB(L:True)
– DestinationBunknown;packetfloodedfromfirsthop(F:True)
• AllswitcheslearnhowtoreachA
•
Bsendsto A(L:True)
– DirectpathfollowingtableentriestoA(F:False)
• Switchesalongpathlearnhowtoreach B
•
•
Linkfails
A sendsanotherpackettoB (L:True)
– Followsalongpath…(F:False)
– ..untilithitsfailure(L:False F:True)
– Switchfloodspacketoutall ports(evenbackwards)
•
• Floodedpacketreaches B (Successfuldelivery!)
• AnotherduplicateoffloodedpacketreachesA’sfirsthop;unlearnB
Asendsanother packettoB (L:True)
– DestinationBunknown;packetfloodedfromfirsthop(F:True)
• AllswitcheslearnhowtoreachAagain
15
Really? Preliminaries
• Howmuchfloodingdofailurescause?
Campus
0.00%
0.02%
0.04%
0.06%
0.08%
0.10%
0.12%
0.14%
Cluster
0.16%
0.18%
• Howbigdoesthededuplication filterneedtobe?
– Lessthan1,000entriesinoursimulations
• Doesitrecoverfromoverload?
– Yes *
16
Really? Overview
• Thinkingbacktothatmatrix…
– Wewantplugandplay
– Wetosupportshortestpathsusingalllinks
– Wedon’twanttohaveacontrolplane
– Packet-timescalerecoveryfromfailures
17
Really? Failure benchmark
• Omniscient,randomized,shortest-pathrouting
• Failure→Adjustabledelay→Fixroutes
• Delayofzeroisoptimalrouting /anupperbound
• Nonzerodelaymeanttoroughlysimulate…
– OSPF,IS-IS,TRILL,SPB,etc.
– ..withoutneedingtomodeleachoneindetail
• Randomshortest-costtreerootedateachdestination
• Note:wedon’tcompareourselvestoSTPatall
18
Really? Failure recovery - UDP
• Sendtrafficonnetwork
withhighfailurerate
160ms
0.27%
80ms
0.19%
40ms
0.13%
20ms
• Metricisunnecessary
deliveryfailures –
packetsthatweren’t
deliveredeventhough
optimalroutingcould
havedeliveredthem
0.08%
10ms
0.05%
5ms
AXE
0.03%
0.00%
0.00%
0.05%
0.15%
0.20%
0.25%
0.30%
Delivery Failures (Campus)
160ms
0.021%
80ms
0.011%
0.008%
40ms
20ms
• AXEhasno unnecessary
deliveryfailures
0.10%
0.005%
10ms
0.003%
0.002%
5ms
AXE
0.000%
0.000%
0.005%
0.010%
0.015%
0.020%
0.025%
Delivery Failures (Cluster)
19
Really? Failure recovery - TCP
160ms
0.25%
80ms
0.11%
40ms
• Similarsetup,butwithTCP
0.14%
20ms
0.06%
10ms
0.01%
5ms
0%
AXE
0%
0.0%
0.1%
0.1%
0.2%
0.2%
0.3%
0.3%
Delayed FCTs (Campus)
160ms
0.036%
80ms
0.020%
40ms
0.015%
20ms
0.010%
10ms
0.007%
5ms
AXE
• Metricisnumberofflows
withsignificantlyworseFCT
thanoptimalrouting
• AXEhasno significantly
worseFCTs
0.002%
0%
0.00% 0.01% 0.01% 0.02% 0.02% 0.03% 0.03% 0.04% 0.04%
Delayed FCTs (Cluster)
20
The End: Not Mentioned Here
• MulticastAXE
– Onanychange(failure;join),flood+dedup andprune
– Floodedpacketshavealldataneededtobuildtree
• AXEwithHedera
– UseAXEformice&recovery
– CentralizedSDNroutingforelephantflows
• P4implementation
– AXEisexpressibleinP4
– Performanceonrealhardwareisopenquestion
21
THE END
22