Planck: Millisecond-scale Monitoring and Control for Commodity Networks Jeff Rasley, Brent Stephens,! Colin Dixon, Eric Rozner,! Wes Felter, Kanak Agarwal,! John Carter, Rodrigo Fonseca 1 Self-Tuning Networks Control Loop Examples! ! • • • Measurement Traffic Engineering! Failure Avoidance! Forwarding Behavior Verification How fast can we do this? Control Decision 2 Self-Tuning Networks 100 ms — 1 sec+ Control Loop Examples! ! • • • Measurement Traffic Engineering! Failure Avoidance! Forwarding Behavior Verification How fast can we do this? Control Decision 2 Self-Tuning Networks 100 ms — 1 sec+ Control Loop Examples! ! • • • Measurement Traffic Engineering! Failure Avoidance! Forwarding Behavior Verification ~100 µs How fast can we do this? Control Decision 2 Self-Tuning Networks 100 ms — 1 sec+ Control Loop Examples! ! • • • Measurement Traffic Engineering! Failure Avoidance! Forwarding Behavior Verification How fast can we do this? > 10 ms ~100 µs Control Decision 2 Self-Tuning Networks 100 ms — 1 sec+ Control Loop Examples! ! • • • Measurement Traffic Engineering! Failure Avoidance! Forwarding Behavior Verification How fast can we do this? > 10 ms ~100 µs Control Decision 2 State-of-the-Art Measurement System Measurement Speed (ms) Hedera (NSDI ’10) 5,000 DevoFlow Polling (Sigcomm ’11) 500–15,000 Mahout Polling (Infocom ’11) 190 sFlow/OpenSample (ICDCS ’14) 100 Helios (Sigcomm ’10) 77.4 Planck < 4.2 3 State-of-the-Art Measurement System Measurement Speed (ms) Hedera (NSDI ’10) 5,000 DevoFlow Polling (Sigcomm ’11) 500–15,000 Mahout Polling (Infocom ’11) 190 sFlow/OpenSample (ICDCS ’14) 100 Helios (Sigcomm ’10) 77.4 Planck < 4.2 4 Outline ❖ Motivation! ❖ Planck Architecture! ❖ Is Planck Feasible?! ❖ Is Planck Useful?! ❖ Microbenchmarks! ❖ Traffic Engineering 5 Architecture Goals ❖ Obtain very fast samples across all switches in the network! ❖ Use those samples to infer global state of the network! ❖ Flow throughput! ❖ Flow paths! ❖ Port congestion state 6 Our Solution: Repurpose Port Mirroring ❖ Modern switches support port-mirroring! ❖ ❖ Copies all packets e.g. going out a port to a designated mirror port! Production! Traffic Mirror all ports to a single mirror port! ❖ Intentional oversubscription! ❖ Drop behavior approximates sampling! ❖ Data-plane sampling much faster than control-plane based approaches Switch 7 Our Solution: Repurpose Port Mirroring ❖ Modern switches support port-mirroring! ❖ ❖ Copies all packets e.g. going out a port to a designated mirror port! Production! Traffic Mirror all ports to a single mirror port! ❖ ❖ ❖ Intentional oversubscription! Drop behavior approximates sampling! Data-plane sampling much faster than control-plane based approaches Mirror Port Switch 7 Our Solution: Repurpose Port Mirroring ❖ Modern switches support port-mirroring! ❖ ❖ Copies all packets e.g. going out a port to a designated mirror port! Production! Traffic Mirror all ports to a single mirror port! ❖ ❖ ❖ Intentional oversubscription! Drop behavior approximates sampling! Data-plane sampling much faster than control-plane based approaches Mirror Port Switch 7 Planck Architecture ❖ Oversubscribed port-mirroring as a primitive ! ❖ Collectors receive samples from mirror ports! ❖ Netmap for fast processing! ❖ Reconstruct flow information across all flows in the network! ❖ Collectors can interact with an SDN controller to implement various applications Planck Architecture ❖ Oversubscribed port-mirroring as a primitive ! ❖ Collectors receive samples from mirror ports! ❖ ❖ ❖ Netmap for fast processing! Reconstruct flow information across all flows in the network! Collectors can interact with an SDN controller to implement various applications Planck! Collector! Instance(s) Planck Architecture ❖ Oversubscribed port-mirroring as a primitive ! ❖ Collectors receive samples from mirror ports! ❖ ❖ ❖ SDN! Controller Netmap for fast processing! Reconstruct flow information across all flows in the network! Collectors can interact with an SDN controller to implement various applications Planck! Collector! Instance(s) Outline ❖ Motivation! ❖ Planck Architecture! ❖ Is Planck Feasible?! ❖ Is Planck Useful?! ❖ Microbenchmarks! ❖ Traffic Engineering 9 Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck! Collector! Instance(s) 10 Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck! Collector! Instance(s) 11 Experiment Setup ❖ Vary the number of congested ports! N number of: 2 senders paired with 1 receiver! ❖ ❖ Mirror Port Collector TCP will fill up the output buffer going to the receiver! 15 trials for each config with/without Planck-mirroring! ❖ Monitor latency, packet loss and throughput 1 2 … ❖ Switch N 12 Switches Share Buffers ❖ Modern switches use shared buffers! ❖ Independent queues per ports is not completely accurate! ❖ Memory consumed by an oversubscribed mirror port may use space that production traffic could use 13 Switches Share Buffers Switch r o r r Mi Modern switches use shared buffers! ❖ Independent queues per ports is not completely accurate! 1 ❖ Memory consumed by an oversubscribed mirror port may use space that production traffic could use 2 t r o P … ❖ Collector N 13 Production Traffic Latency Average Latency (ms) 3 Mirror 2.5 No Mirror 2 1.5 1 0.5 0 1 2 3 4 5 6 7 Congested Output Ports 8 9 10 14 Production Traffic Packet Loss 15 Median Throughput (Gbps) 0.1% Throughput (Gbps) Production Throughput 6 4 Mirror No Mirror Mirror No Mirror 2 0 6 4 2 0 1 2 3 4 5 6 7 Congested Output Ports 8 9 16 Production Throughput Median 0.1% Median 0.1% Throughput Throughput Throughput Throughput (Gbps) (Gbps) (Gbps) (Gbps) 6 6 4 4 Mirror Mirror No Mirror Mirror No Mirror Mirror No Mirror Mirror No 2 2 0 0 6 6 4 4 2 2 0 0 1 1 2 2 3 4 5 6 7 3 4 5 6 7 Congested Output Output Ports Ports Congested 8 8 9 9 16 Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck! Collector! Instance(s) 17 Throughput Estimation is Accurate ❖ Estimates are trivial if sampling rate is known! ❖ Leverage TCP seq# in packets! ❖ Smoothed estimates in 200–700 µs! ❖ See paper for more details (n*10 Gbps sent to mirror port) 18 Inter-arrival Length (MTUs) Sample Inter-Arrival Length 14 12 10 8 6 4 2 0 2 4 6 8 10 12 ❖ x13 10 Gbps flows! ❖ Grows roughly linearly! ❖ See paper for further results 14 Flows 19 Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck! Collector! Instance(s) Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck provides flow throughput, a flow just needs to be mapped to a port Planck! Collector! Instance(s) Is Planck Feasible? SDN! Controller ❖ Does Planck hurt production traffic?! ❖ Can Planck infer throughput?! ❖ Can Planck infer congested ports? Planck provides flow throughput, a flow just needs to be mapped to a port Planck! Collector! Instance(s) Outline ❖ Motivation! ❖ Planck Architecture! ❖ Is Planck Feasible?! ❖ Is Planck Useful?! ❖ Microbenchmarks! ❖ Traffic Engineering 21 Sample Latency t_2 H Switch t_1 Collector H H Sender Receiver Latency = t_2 - t_1 22 Sample Latency t_2 H Switch t_1 Collector H H Sender Receiver Latency = t_2 - t_1 Low Congestion ! Sample Latency: ! 75–150 µs 22 Sample Latency t_2 High Congestion Sample Latency H Switch t_1 H H Sender Receiver Latency = t_2 - t_1 Low Congestion ! Sample Latency: ! 75–150 µs CDF Collector 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 1 2 3 4 5 6 Measurement Latency (ms) IBM G8264 (10Gb) Pronto 3290 (1Gb) 22 Control Loop Times ❖ < 3.5 ms to obtain sample + < 700µs to get tput estimate = 4.2 ms worst-case measurement time for 10 Gb! ❖ Planck achieves measurement speeds 18x – 291x faster than recent approaches! ❖ Shadow MAC addresses [1] and some ARP tricks allow re-routing at < 3ms! ❖ See paper for more details [1] Shadow MACs: Scalable Label-switching ! for Commodity Ethernet (HotSDN ’14 ) 100 ms — 1 sec+ Measurement > 10 ms ~100 µs Control Decision 23 Control Loop Times ❖ Planck: < 4.2 ms 100 ms — 1 sec+ < 3.5 ms to obtain sample + < 700µs to get tput estimate = 4.2 ms worst-case measurement time for 10 Gb! ❖ Planck achieves measurement speeds 18x – 291x faster than recent approaches! ❖ Shadow MAC addresses [1] and some ARP tricks allow re-routing at < 3ms! ❖ See paper for more details [1] Shadow MACs: Scalable Label-switching ! for Commodity Ethernet (HotSDN ’14 ) Measurement > 10 ms ~100 µs Control Decision 23 Control Loop Times ❖ ❖ Planck achieves measurement speeds 18x – 291x faster than recent approaches! ❖ Shadow MAC addresses [1] and some ARP tricks allow re-routing at < 3ms! ❖ Planck: < 4.2 ms 100 ms — 1 sec+ < 3.5 ms to obtain sample + < 700µs to get tput estimate = 4.2 ms worst-case measurement time for 10 Gb! See paper for more details [1] Shadow MACs: Scalable Label-switching ! for Commodity Ethernet (HotSDN ’14 ) Measurement < 3 ms > 10 ms ~100 µs Control Decision 23 Planck as a Platform ❖ Vantage point mirroring! ❖ ❖ tcpdump for switches! Global view of the network! ❖ ❖ SDN! Controller flow data across all links! Traffic engineering! ❖ congested port notifications Planck! Collector! Instance(s) 24 Outline ❖ Motivation! ❖ Planck Architecture! ❖ Is Planck Feasible?! ❖ Is Planck Useful?! ❖ Microbenchmarks! ❖ Traffic Engineering 25 Testbed 16 Host Fat Tree ❖ Split x4 IBM G8264 (48-port) switches into 20 sub-switches! ❖ Routing via Floodlight plugin inspired by FlowVisor! ❖ x3 server machines with x8 10 GbE NICs each! ❖ x16 machines with x2 10 GbE NICs 26 Methodology Traffic Engineering! ❖ ❖ Floodlight-based module using Planck! Collectors notify a controller when ports become congested Workloads! Routing! ❖ Shuffle! ❖ Static [1]! ❖ Stride! ❖ Poll-100 ms! ❖ Random! ❖ PlanckTE! ❖ Random Bijection ❖ Optimal [1] PAST: Scalable Ethernet for Data Centers (CoNEXT ’12) Optimal Topology … 1 2 3 4 … 16 27 Traffic Engineering Stride(8) 100 MiB Workload! CDF of Flow Throughput Static [1] Optimal 28 Traffic Engineering Stride(8) 100 MiB Workload! CDF of Flow Throughput Static [1] Optimal Poll 100ms 28 Traffic Engineering Stride(8) 100 MiB Workload! CDF of Flow Throughput Static [1] Optimal Poll 100ms PlanckTE 28 Traffic Engineering Stride(8) Workload 29 Future Work ❖ ❖ Planck should be able to go much faster!! ❖ Limit mirror port buffer! ❖ Truncation of samples! ❖ Improve re-routing time (via ARP improvements)! Control loop of 100s of µs is possible 30 Conclusion ❖ Planck provides 1–2 orders of magnitude faster throughput measurements over recent approaches (< 4.2 ms today and 100s of µs possible)! ❖ Planck provides a platform for low-latency measurement! ❖ Planck traffic engineering yields near optimal results even for small flows! ❖ Measurements at these speeds prompt a re-thinking of how networks are managed 31
© Copyright 2026 Paperzz