Scalable and Crash-Tolerant Load Balancing based on Switch Migration for Multiple OpenFlow Controllers
Chu Liang, Ryota Kawashima, Hiroshi Matsuo
Nagoya Institute of Technology, Japan
Research Background
 The spread of Software-Defined Networking (SDN)
 Easier management and faster innovation
 A centralized controller and programmable network devices
 Controllers are becoming increasingly complex
 Load balancing, QoS control, security, …
 The size of networks continues to increase
 A growing number of OpenFlow-enabled devices
A centralized controller is a potential bottleneck
A distributed control plane has been proposed
Distributed OpenFlow Controllers
 OpenFlow : one of the most representative protocols for SDN
 Packet-in : a packet that matches no forwarding rule is forwarded to the controller
[Figure: Packet-in messages from OpenFlow switches in Group A and Group B are handled by a cluster of controllers (OFC 1, OFC 2) instead of a single OpenFlow controller]
Physically distributed controllers achieve better scalability
Problems in Distributed Controllers
 Load imbalance results in suboptimal performance
 Topology changes
 Varying user traffic
• Elastic resources, VM migration, varying utilization
[Figure: with a static switch-controller mapping, OFC 1 becomes highly loaded while OFC 2 remains lightly loaded across switch Groups A and B]
Dynamic load balancing among the controllers is required
Problems in OpenFlow Controllers
 Multiple Controllers in OpenFlow
 Roles : Master / Slave / Equal
 For each switch, a controller holds exactly one role at a time
[Figure: OFC 1 is master and OFC 2 is slave for switches S1-S3; when the master must change, the switch cannot tell which controller should take over]
The coordination of "role changing" is not provided by OpenFlow itself (see the sketch below)
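The role values themselves come from the OpenFlow 1.3 specification (OFPT_ROLE_REQUEST, including its generation_id field); the helper below is a minimal Java sketch of a controller claiming mastership, where OfConnection and claimMaster are hypothetical stand-ins rather than any real controller API.

// Role constants as defined by the OpenFlow 1.3 spec (ofp_controller_role).
public final class OfpControllerRole {
    public static final int OFPCR_ROLE_NOCHANGE = 0;
    public static final int OFPCR_ROLE_EQUAL    = 1;
    public static final int OFPCR_ROLE_MASTER   = 2;
    public static final int OFPCR_ROLE_SLAVE    = 3;

    // Hypothetical abstraction of a controller-to-switch connection.
    interface OfConnection { void sendRoleRequest(int role, long generationId); }

    // OpenFlow lets each controller set only its OWN role per switch; nothing
    // tells the controllers which of them should claim MASTER. That missing
    // coordination is the gap the proposed method fills.
    static void claimMaster(OfConnection conn, long generationId) {
        conn.sendRoleRequest(OFPCR_ROLE_MASTER, generationId);
    }
}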
Related Work
 Scalable OpenFlow Controller Redundancy Tackling Local and Global Recoveries
• Keisuke Kuroki, Nobutaka Matsumoto, Michiaki Hayashi. The Fifth International Conference on Advances in Future Internet, 2013.
Proposed a crash-tolerant method based on the multiple-controller support of OpenFlow 1.3
Does not support load balancing among the controllers
The role-management server can be a single point of failure
 Towards an Elastic Distributed SDN Controller
• Advait Dixit, Fang Hao, Sarit Mukherjee, T.V. Lakshman, Ramana Kompella. ACM SIGCOMM HotSDN, 2013.
Proposed a switch migration protocol driven by controller load
Complex, and does not provide crash tolerance for the master controller
Proposed Method
 Dynamically shift the load across the multiple controllers
 different controllers can be set as master for individual switches
 switch migration
 Crash tolerance for the controllers
 distributed architecture
 automatic failover in the event of a failure
 JGroups-based communication (a minimal sketch follows this list)
 Simplified switch management by grouping
 each controller only manages switches in the same group
 switch migration is performed within a group
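As a rough illustration of the JGroups-based communication, the following sketch assumes the JGroups 3.x API of the deck's era; the ControllerChannel class, the cluster name "ofc-cluster", and the load-report payload are illustrative assumptions, not the authors' actual code.

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Controllers join a shared JGroups channel, broadcast load reports,
// and learn about crashed peers through view changes.
public class ControllerChannel extends ReceiverAdapter {
    private JChannel channel;

    public void start() throws Exception {
        channel = new JChannel();        // default protocol stack
        channel.setReceiver(this);
        channel.connect("ofc-cluster");  // assumed cluster name
    }

    // Broadcast this controller's load score to all cluster members.
    public void reportLoad(double load) throws Exception {
        channel.send(new Message(null, Double.valueOf(load)));
    }

    @Override
    public void receive(Message msg) {
        System.out.println("load report from " + msg.getSrc() + ": " + msg.getObject());
    }

    // JGroups delivers a new View when members join or crash; automatic
    // failover can be triggered from this callback.
    @Override
    public void viewAccepted(View view) {
        System.out.println("cluster members: " + view.getMembers());
    }
}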
Proposed Architecture
[Figure: a global controller cluster with a Global DB sits above local controller clusters, each with a Local DB; the local clusters manage OpenFlow switches in Groups A, B, and C]
Global controller cluster
 Based on the global JGroups channel
 Shares global data
 tenant information, user data, etc.
 Provides a global view of the network to the upper layer
 can be regarded as a logically centralized control plane
Local controller cluster
 reduces network delay
 reduces communication traffic
 Synchronizes network status
 switch-controller mapping, links, ports
 Performs controller load scheduling
 Coordinates switch migration
 sets master/slave roles for switches
Dynamically shifts the load across the multiple controllers
Implementation
 Controller structure (OpenDaylight based)
[Figure: applications run on the OpenDaylight APIs; the proposed modules sit alongside the OpenDaylight Core (Link Discovery, Switch Manager, Host Manager) on top of the OpenFlow driver, with event notification via JGroups and a distributed key-value store via Infinispan]
 A : Load Monitoring Module (collects and calculates controller load)
 B : Load Scheduling Module (selects the master controller)
 C : Switch Migration Module (performs switch migration)
A sketch of the Infinispan-backed store follows.
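The deck names Infinispan as the distributed key-value store but shows no code, so this is only a plausible sketch assuming the clustered Infinispan Java API of that era; the cache name "network-state" and the stored mapping are illustrative.

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// A synchronously replicated cache so that every controller in the
// cluster sees the same switch-controller mapping.
public class SharedStore {
    public static void main(String[] args) {
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());
        manager.defineConfiguration("network-state",
                new ConfigurationBuilder()
                        .clustering().cacheMode(CacheMode.REPL_SYNC)
                        .build());
        Cache<String, String> state = manager.getCache("network-state");
        state.put("switch-T.master", "OFC-B");  // replicated to all nodes
        System.out.println(state.get("switch-T.master"));
    }
}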
A. Load Calculation
 Coordinator : collects and computes load information
 Controller load : switch metrics and server metrics
 number of active switches, Packet-in request rate (switch metrics)
 CPU, memory, and network bandwidth usage (server metrics)
[Figure: a coordinator in the local controller cluster gathers load reports from OFC 1-4]
A sketch of one possible scoring function follows.
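The deck lists the metrics but not the formula, so the following is only one plausible scoring function; the normalization capacities and the 0.5/0.5 and 0.6/0.4 weights are assumptions.

// Combines the deck's switch metrics and server metrics into one score;
// all weights and capacities below are assumed, not specified in the deck.
public class ControllerLoad {
    double activeSwitches;  // switch metric: number of active switches
    double packetInRate;    // switch metric: Packet-in requests per second
    double cpuUsage;        // server metric: CPU utilization, 0..1
    double memUsage;        // server metric: memory utilization, 0..1
    double netUsage;        // server metric: bandwidth utilization, 0..1

    // Normalize each metric against a capacity and take a weighted sum.
    double score(double maxSwitches, double maxPacketInRate) {
        double switchLoad = 0.5 * (activeSwitches / maxSwitches)
                          + 0.5 * (packetInRate / maxPacketInRate);
        double serverLoad = (cpuUsage + memUsage + netUsage) / 3.0;
        return 0.6 * switchLoad + 0.4 * serverLoad;  // assumed weighting
    }
}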
B. Load Scheduling
 When to migrate, and which controller should be elected as master?
 the lightest-loaded controller
 Which switches should be selected to migrate?
 dynamic round-trip-time (RTT) feedback based switch selection
[Figure: migration is triggered by load imbalance among OFC 1-3, by controller failover, or when new switches are added]
A sketch of both selection steps follows.
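A minimal Java sketch of the two decisions above; the data structures, and the rule that the candidate switch with the smallest measured RTT to the target controller wins, are assumptions about how the RTT feedback could be used.

import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Elect the lightest-loaded controller as the migration target,
// then pick a switch using measured RTT feedback.
public class LoadScheduler {
    // The controller with the smallest load score becomes the new master.
    static String electTarget(Map<String, Double> loadByController) {
        return loadByController.entrySet().stream()
                .min(Comparator.comparingDouble(e -> e.getValue()))
                .map(Map.Entry::getKey)
                .orElseThrow(IllegalStateException::new);
    }

    // Pick the candidate switch with the smallest RTT to the target controller.
    static String selectSwitch(List<String> candidates, Map<String, Double> rttMillis) {
        return candidates.stream()
                .min(Comparator.comparingDouble(s -> rttMillis.get(s)))
                .orElseThrow(IllegalStateException::new);
    }
}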
C. Switch Migration
[Figure: migration sequence. Controller A (initially heaviest-loaded) is master for switch T and Controller B (initially lightest-loaded) is slave; A sends a "Switch T migration" request, B takes over as master for T and replies, and A becomes slave for T]
A sketch of this handshake follows.
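The sequence diagram reduces to a two-message handshake; the sketch below models it with hypothetical RoleClient and MigrationLink interfaces standing in for the real role-request and cluster-messaging machinery.

// The heaviest-loaded master A hands switch T to the lightest-loaded slave B.
public class SwitchMigration {
    interface RoleClient { void setRole(String switchId, int role); }  // 2=MASTER, 3=SLAVE
    interface MigrationLink {
        void requestMigration(String switchId);  // A -> B
        void reply(String switchId);             // B -> A
    }

    // On controller A: start the handshake for the selected switch.
    static void startMigration(String switchId, MigrationLink link) {
        link.requestMigration(switchId);
    }

    // On controller B, when A's "Switch T migration" request arrives.
    static void onMigrationRequest(String switchId, RoleClient roles, MigrationLink link) {
        roles.setRole(switchId, 2);  // B: slave -> master for T
        link.reply(switchId);        // confirm the takeover to A
    }

    // On controller A, when B's reply arrives.
    static void onMigrationReply(String switchId, RoleClient roles) {
        roles.setRole(switchId, 3);  // A: master -> slave for T
    }
}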
C. Switch Migration
[Figure: the same migration sequence extended with failover: if Controller A crashes, Controller B still becomes master for T after the failover time]
Preliminary evaluation (1/2)
 The switch migration process
 The migration process takes about 2 ms
[Figure: in the measured sequence, the "Switch T migration" request/reply exchange between Controller A (initially heaviest-loaded) and Controller B (initially lightest-loaded) completes in about 2 ms]
Preliminary evaluation (2/2)
 The controller failover process
 The failover process takes about 20 ms on average
 mostly determined by the failure detection provided by JGroups
[Figure: when Controller A (master for T) crashes, the ~20 ms failover time is dominated by JGroups failure detection before Controller B (slave for T) takes over as master]
A sketch of this failover hook follows.
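To make the failover path concrete, here is an illustrative sketch assuming the JGroups 3.x Receiver API: a new view that no longer contains the current master triggers promotion of a surviving slave. FailoverHandler, RoleClient, and the switch identifier are hypothetical names, not the deck's code.

import org.jgroups.Address;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// A new JGroups view without the master of switch T triggers failover.
public class FailoverHandler extends ReceiverAdapter {
    interface RoleClient { void sendRoleRequest(String switchId, int role); }

    private Address masterOfT;       // cluster address of T's current master
    private final RoleClient roles;  // hypothetical role-request sender

    public FailoverHandler(Address initialMaster, RoleClient roles) {
        this.masterOfT = initialMaster;
        this.roles = roles;
    }

    @Override
    public void viewAccepted(View view) {
        if (masterOfT != null && !view.getMembers().contains(masterOfT)) {
            // The ~20 ms measured failover is mostly spent inside JGroups
            // failure detection before this callback fires.
            roles.sendRoleRequest("switch-T", 2 /* OFPCR_ROLE_MASTER */);
            masterOfT = null;  // reset until the cluster elects the new master
        }
    }
}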
Evaluation environment
[Figure: evaluation environment. Host 1 (OFC A) and Host 2 (OFC B) run the controllers; a Mininet network of eight OpenFlow switches (SW 1-8), each with an attached host; an iperf client (VM1, VM2) and an iperf server (Host 3); Host 4 acts as a traffic generator]
Evaluation
Three kinds of workloads (Packet-in load per switch):

             switches 1-2   switches 3-4   switches 5-6   switches 7-8
Workload A   1000 pps       2000 pps       2000 pps       4000 pps
Workload B   1000 pps       2000 pps       4000 pps       6000 pps
Workload C   1000 pps       2000 pps       6000 pps       8000 pps
Machine specifications:

                  Controller Node        Evaluation Node       Traffic Generator
OS                Ubuntu Server 12.04    CentOS 6.5 64-bit     CentOS 6.5 64-bit
CPU               Core i5 (4 cores)      Core i7 (1 core)      Core i5 (4 cores)
Memory            16 GB                  8 GB                  8 GB
Network           100 Mbps Ethernet      100 Mbps Ethernet     100 Mbps Ethernet
OpenFlow Switch   -                      Open vSwitch 1.10.0   -
Results : Throughput
 static : the existing static switch-controller mapping
 proposal : dynamic switch migration
[Figure: throughput (Mbit/sec) and controller load difference versus run time (sec) across Workloads A, B, and C; under the proposal the switch distribution is rebalanced over time (OFC A:7 / OFC B:1, OFC A:6 / OFC B:2, OFC A:5 / OFC B:3 switches), while the static mapping stays fixed]
Results : Response Time
 Response Time : VM to OFC B (ping)
[Figure: cumulative distribution function (CDF) of response time (0-35 msec) for Workloads A, B, and C under static and proposal; the static curves for the heavier workloads plateau below 1.0, indicating packet loss]
Conclusion & Future Work
 Conclusion
Proposed a scalable and crash-tolerant load-balancing method based on switch migration for multiple OpenFlow controllers
Enables the controllers to coordinate their actions and dynamically shift the load across the multiple controllers
Improves the throughput and response time of the control plane
 Future Work
Optimize the load scheduling module
Implement a topology-aware switch migration algorithm to improve scalability in real large-scale networks
Evaluate the performance with various applications and topologies under more realistic traffic