
Reinforcement Learning Based Virtual
Cluster Management
Anshul Gangwar
Dept. of Computer Science and Automation,
Indian Institute of Science
Virtualization:
The ability to run multiple operating
systems on a single physical system and
share the underlying hardware resources*.
Cloud Computing:
“The provisioning of services in a timely
(near on instant), on-demand manner, to
allow the scaling up and down of
resources”**.
* VMware white paper, Virtualization Overview
** Alan Williamson, quoted in Cloud BootCamp March 2009
The Traditional Server Concept
[Figure: n standalone servers (SERVER 1 ... SERVER n), each running one application (App 1 ... App n) on its own OS, with utilizations of roughly 20-50%]
[Figure: rate of server accesses over time, peaking 9 A.M. to 5 P.M. on weekdays (M-F) and dropping at all other times]

• Machine provisioning is done for peak demand
• Processors are under-utilized during off-peak hours
  • Wastage of resources
• Need technology and algorithms that allow allocation of only as many resources as are required
Server Consolidation Process
[Figure: the n standalone servers before consolidation, each running one application on its own OS at 20-50% utilization]

The consolidation process allows shutting down of idle PMs, saving operational costs.

[Figure: after consolidation, the n applications run in VMs (App i on a guest OS) packed onto a smaller number of servers (SERVER 1 ... SERVER m), each server running a hypervisor]
Load Distribution
[Figure: VMs 1 ... n (App i on a guest OS) distributed across hypervisors on SERVER 1, SERVER 2, ... SERVER m; two snapshots show that per-VM utilizations change over time (e.g. App 2 from 40% to 20%, App 3 from 25% to 45%)]
Live Migration
[Figure: live migration example — "Migrate VM 5" moves VM 5 from its current server to another PM (shown alongside VM 1 and VM 2 after the migration); the applications keep running on their guest OSes throughout]
Dynamic Resource Allocation
• Dynamic workload requires dynamic resource management
  • Allocation of resources to VMs in each PM
    • Resources such as CPU, memory, etc.
  • Allocation of VMs to PMs
    • Minimize the number of operational PMs
• Modern hypervisors (e.g. Xen) allow
  • Resource allocation within a PM
  • Dynamic allocation of VMs to PMs through live migration

Required: architecture and mechanisms for
• Determining the resource allocation to a VM within a PM
• Determining the deployment of VMs on PMs
so that:
• Capital and operational costs are minimized
• Application performance is maximized
Two Level Controller Architecture
PMA : Performance Measurement Agent
RAC : Resource Allocation Controller
[Figure: two-level controller architecture — each PM hosts its VMs together with a PMA and a RAC; performance measures and the determined resource requirements flow from the PMs up to a global VM Placement Controller, which sends migration decisions back to the PMs]

Note: PMs (physical machines/servers) are assumed to be homogeneous.
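As a loose sketch of how these roles could be expressed in code (interface and type names below are assumptions for illustration, not the actual implementation):

```java
/** Illustrative sketch of the two-level architecture's roles (names/signatures are assumptions). */
interface PerformanceMeasurementAgent {
    /** Collect performance measures (e.g. throughput, response time) for the VMs on this PM. */
    double[] measure();
}

interface ResourceAllocationController {
    /** Allocate resources (CPU, memory) to the VMs within this PM and report the determined requirements. */
    double[] allocateWithinPm(double[] performanceMeasures);
}

interface VmPlacementController {
    /** Use per-PM requirements and measures to decide which VM (if any) to migrate, and to which PM. */
    MigrationDecision decideMigration(double[][] resourceRequirements, double[][] performanceMeasures);
}

/** A migration decision: which VM to move and to which PM (a vmId of -1 could mean "no migration"). */
record MigrationDecision(int vmId, int targetPmId) { }
```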
Problem Definition
The VM Placement Controller has to make optimal migration decisions at regular intervals that result in
• Low SLA violations
• A reduction in the number of busy PMs

An SLA (Service Level Agreement) is a performance guarantee that the data center owner negotiates with the user. These performance guarantees can include average response time, maximum delay, maximum downtime, etc.

Idle PMs can be switched off or run in low-power mode.
Issues in Server Consolidation/Distribution

There are various issues involved in server consolidation:
• Interference of VMs: bad behavior of one application in a VM adversely affects (degrades the performance of) the other VMs on the same PM
• Delayed effects: resource configurations of a VM show their effects only after some delay
• Migration cost: live migration involves a cost (performance degradation)
• Workload pattern: not deterministic or known a priori

These difficulties motivate us to use a Reinforcement Learning approach.
Reinforcement Learning (RL)

[Figure: the agent-environment interaction in RL — the agent observes state s_n, takes action a_n, and receives reward r_{n+1} from the environment]

The goal of the agent is to maximize the cumulative long-term reward based on the immediate reward r_{n+1}.
RL has two major benefits:
• Does not require a model of the system
• Captures delayed effects in decision making
  • Can take action before a problem arises
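For reference, the textbook discounted objective behind "cumulative long-term reward" can be written as follows (the discount factor γ is an assumption; the slides do not state it):

$$
G_n = \sum_{k=0}^{\infty} \gamma^{k}\, r_{n+k+1}, \qquad 0 \le \gamma < 1,
\qquad
Q^{*}(s,a) = \max_{\pi}\ \mathbb{E}\big[\, G_n \mid s_n = s,\ a_n = a,\ \pi \,\big].
$$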
Problem Formulation for RL
Framework: System Assumptions
• M PMs: assumed to be homogeneous
• N VMs: each assumed to be running one application whose performance metrics of interest are throughput and response time
  • Response time is measured at the server level only (not as seen by the user)
• Workload per VM is assumed to be cyclic
• Resource requirement is assumed to be equal to the workload

Cyclic workload model: each time period is assumed to be divided into phases.
[Figure: rate of server accesses over one time period, divided into Phase 1, Phase 2, ...]
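A minimal sketch of such a cyclic workload model (phase rates and lengths are illustrative assumptions, not the experiment's values):

```java
/** Cyclic workload: the period is split into phases, each with its own access rate (sketch). */
public class CyclicWorkload {
    private final double[] phaseRates;   // requests/sec for each phase (illustrative values)
    private final long phaseLengthSec;   // length of one phase in seconds

    public CyclicWorkload(double[] phaseRates, long phaseLengthSec) {
        this.phaseRates = phaseRates;
        this.phaseLengthSec = phaseLengthSec;
    }

    /** Phase index at a given simulation time (wraps around, so the pattern repeats periodically). */
    public int phaseAt(long timeSec) {
        return (int) ((timeSec / phaseLengthSec) % phaseRates.length);
    }

    /** Arrival rate at a given simulation time. */
    public double rateAt(long timeSec) {
        return phaseRates[phaseAt(timeSec)];
    }
}
```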
Problem Formulation for RL
Framework
• N VMs, M PMs (M > 1), and P phases in the cyclic workload models
• State s_n at time n is (phase of the cyclic workload models, allocation vector of the VMs)
• Action a_n at time n is (id of the VM being migrated, id of the PM it is migrating to)
• Reward r(s_n, a_n) at time n is defined as
$$
r(s_n, a_n) =
\begin{cases}
\mathrm{score}(n+1) - \mathrm{powercost}(n+1) & \text{if } \mathrm{score}_i > 0,\ i = 1, \dots, N \\
-1 & \text{otherwise}
\end{cases}
$$

$$
\mathrm{powercost}(n) = \mathrm{busy}(n) \cdot \mathrm{fixedcost}
$$

$$
\mathrm{score}(n) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{weight}_i \cdot \mathrm{score}_i(n)
$$

$$
\mathrm{score}_i(n) =
\begin{cases}
\dfrac{\text{Throughput of VM } i\,(n)}{\text{Arrival rate of VM } i\,(n)} & \text{if response time}_i \le \mathrm{SLA}_i \\
-1 & \text{otherwise}
\end{cases}
$$

where SLA_i is the maximum allowable response time for VM i.
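A literal translation of this reward into code might look like the following sketch (field and method names are my own; the weights, fixed power cost, and SLA thresholds are assumed inputs):

```java
/** Reward r(s_n, a_n) computed from per-VM scores and the number of busy PMs (sketch). */
public final class RewardFunction {
    private final double[] weights;   // weight_i per VM (assumed given)
    private final double[] slaMax;    // SLA_i: maximum allowable response time per VM
    private final double fixedCost;   // power cost per busy PM

    public RewardFunction(double[] weights, double[] slaMax, double fixedCost) {
        this.weights = weights;
        this.slaMax = slaMax;
        this.fixedCost = fixedCost;
    }

    /** score_i(n): throughput/arrival rate if the SLA is met, otherwise -1. */
    double scoreOfVm(double throughput, double arrivalRate, double responseTime, int i) {
        return (responseTime <= slaMax[i]) ? throughput / arrivalRate : -1.0;
    }

    /** score(n+1) - powercost(n+1) if every per-VM score is positive, otherwise -1. */
    double reward(double[] throughput, double[] arrivalRate, double[] responseTime, int busyPms) {
        int n = weights.length;
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double s = scoreOfVm(throughput[i], arrivalRate[i], responseTime[i], i);
            if (s <= 0) {
                return -1.0;                    // any SLA violation collapses the reward to -1
            }
            total += weights[i] * s;
        }
        double score = total / n;               // weighted per-VM scores, averaged over N VMs
        double powerCost = busyPms * fixedCost; // powercost(n) = busy(n) * fixedcost
        return score - powerCost;
    }
}
```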
RL Agent implemented in CloudSim
CloudSim is a Java-based simulation toolkit for cloud computing.
We have implemented the following additions to it:
• Response time and throughput calculations
• Cyclic workload model
• Interference model
• RL agent which takes migration decisions
  • Implements Q-Learning with an ε-greedy policy (a rough sketch of the update follows below)
    1) Full state representation without batch updates
    2) Full state representation with batch updates of batch size 200
  • Implements Q-Learning with function approximation and an ε-greedy policy
    1) Function approximation without batch updates
    2) Function approximation with batch updates of batch size 200

We used CloudSim for the implementation of all our algorithms.
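As a rough illustration of the full-state variant, a tabular Q-Learning update with ε-greedy action selection could look like this (the string state/action encoding and hyperparameter values are illustrative, not the exact CloudSim implementation):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Tabular Q-Learning with epsilon-greedy exploration (illustrative sketch). */
public class TabularQAgent {
    private final Map<String, Double> q = new HashMap<>(); // key: state + "|" + action
    private final Random rng = new Random();
    private final double alpha = 0.1;    // learning rate (illustrative value)
    private final double gamma = 0.9;    // discount factor (illustrative value)
    private final double epsilon = 0.1;  // exploration probability (illustrative value)

    private double q(String s, String a) {
        return q.getOrDefault(s + "|" + a, 0.0);
    }

    /** Epsilon-greedy: explore with probability epsilon, otherwise pick the best known action. */
    public String selectAction(String state, List<String> actions) {
        if (rng.nextDouble() < epsilon) {
            return actions.get(rng.nextInt(actions.size()));
        }
        String best = actions.get(0);
        for (String a : actions) {
            if (q(state, a) > q(state, best)) best = a;
        }
        return best;
    }

    /** Q-Learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(String s, String a, double reward, String next, List<String> nextActions) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (String a2 : nextActions) {
            maxNext = Math.max(maxNext, q(next, a2));
        }
        if (nextActions.isEmpty()) maxNext = 0.0; // no successor actions
        q.put(s + "|" + a, q(s, a) + alpha * (reward + gamma * maxNext - q(s, a)));
    }
}
```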
Workload Model Graphs for
Experiments
[Figure: Workload Model 1 (wm1) and Workload Model 2 (wm2)]
The graphs show cyclic workloads with 5 phases that repeat periodically.
Experiment Setup
• 5 VMs, 3 PMs and 5 phases in the cyclic workload model (shown on the previous slide)
• Migration decisions are taken at the end of each phase
• Experiments:
  1) One VM has workload model wm1 and the others have workload model wm2
  2) Two VMs have workload model wm1 and the others have workload model wm2
• For each experiment (1) and (2), the following scenarios:
  1) All costs are negligible except power
  2) High interference cost, with VMs 4 and 5 interfering (high performance degradation due to interference of VMs)
  3) High migration cost, with VMs 4 and 5 interfering (high performance degradation due to VM migration)
  4) High migration cost and interference cost
Policy Generated after Convergence
• Initial state: all VMs on PM 1 in Phase 1, i.e. (1; ((1,2,3,4,5), (), ()))
• All costs are negligible except power

[Figure: converged policy plotted against the cyclic workload (per-VM utilization alternating between roughly 0.1 and 0.5 over phases 1-5); the policy alternately migrates VM 1 to PM 1 and to PM 2 as the phase changes]
Policy Generated after Convergence
• Initial state: all VMs on PM 1 in Phase 1, i.e. (1; ((1,2,3,4,5), (), ()))
• High migration cost and interference cost; VMs 3 and 4 are interfering

[Figure: converged policy plotted against the cyclic workload (per-VM utilization alternating between roughly 0.1 and 0.5 over phases 1-5); the only migration in the policy is "Migrate VM 1"]
Results with Full State Representation
• The algorithm was verified to converge most of the time within 15000 steps in case 1 and 80 steps in case 2
• The algorithm was verified to converge every time within 20000 steps in case 1 and 115 steps in case 2
Features Based Function
Approximation
• Full state representation leads to state-space explosion
  • 10 PMs, 10 VMs and 5 phases result in a number of states on the order of 10^11
  • This would require huge memory and a long time to converge to a good policy
  • Need approximation methods
• Approximate $Q(s,a)$ as $Q(s,a) \approx \theta^{T} \sigma(s,a)$, where $\sigma(s,a)$ is a d-dimensional feature (column) vector corresponding to the state-action tuple $(s,a)$ and $\theta$ is a d-dimensional tunable parameter vector
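With this linear parameterization, the standard Q-Learning parameter update for linear function approximation (the usual textbook form; the step size α_n and discount γ are not stated on the slides) is:

$$
\theta_{n+1} = \theta_n + \alpha_n \Big( r_{n+1} + \gamma \max_{a'} \theta_n^{T} \sigma(s_{n+1}, a') - \theta_n^{T} \sigma(s_n, a_n) \Big)\, \sigma(s_n, a_n).
$$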
Features for Function Approximation
• Let there be 5 VMs, 3 PMs and 2 phases in the cyclic workload model
• State = [1, (1,2,3), (4), (5)] and Action = (4,3)
• Next-state allocation = [1, (1,2,3), (), (4,5)]

The k features of a state-action tuple consist of:
• Phase indicators of the cyclic workload model (utilization level) — here (1, 0) for phase 1 of 2
• Fraction of busy PMs in the next state (power savings) — here 0.7 (2 of the 3 PMs are busy)
• Pairwise indicators of whether VMs (i,j) are allocated on the same PM in the next state (interference) — here (1,2) = 1, (1,3) = 1, (1,4) = 0, ..., (3,5) = 0, (4,5) = 1

In total, k features.
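A sketch of how these k features might be computed for a next-state allocation (the data layout and names are my own, not the implementation's):

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the k features for one (state, action) tuple: phase indicators,
 *  fraction of busy PMs, and pairwise same-PM indicators (illustrative sketch). */
public class FeatureBuilder {
    /** allocation[i] = PM index hosting VM i in the next state. */
    public static double[] features(int phase, int numPhases, int[] allocation, int numPms) {
        List<Double> f = new ArrayList<>();
        // Phase indicator of the cyclic workload model (one-hot over phases).
        for (int p = 0; p < numPhases; p++) {
            f.add(p == phase ? 1.0 : 0.0);
        }
        // Fraction of busy PMs in the next state (power savings).
        boolean[] busy = new boolean[numPms];
        for (int pm : allocation) busy[pm] = true;
        int busyCount = 0;
        for (boolean b : busy) if (b) busyCount++;
        f.add((double) busyCount / numPms);
        // Pairwise indicator whether VMs (i, j) share a PM in the next state (interference).
        for (int i = 0; i < allocation.length; i++) {
            for (int j = i + 1; j < allocation.length; j++) {
                f.add(allocation[i] == allocation[j] ? 1.0 : 0.0);
            }
        }
        double[] out = new double[f.size()];
        for (int i = 0; i < out.length; i++) out[i] = f.get(i);
        return out;
    }
}
```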
Features for Function Approximation
[Figure: the full feature vector is a concatenation of per-action blocks f1, f2, ..., f5 (one k-feature block per "Migrate VM i" action) plus a final block f6 for "No Migration"; for the example action, only block f4 is non-zero (it holds the k feature values, e.g. 0.7 for the busy-PM fraction), and every other block is a zero vector]

The position of the fi block captures the migration cost of migrating VM i.
• All blocks except f4 are zero vectors
• Store only the f4 features and their start index
• Perform multiplication and addition operations only for the k features starting from the start index
• The k features correspond to the three bullet points on the previous slide

This idea reduces the number of multiplication and addition operations by around five times the number of VMs. A sketch of the resulting sparse dot product follows below.
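A hedged sketch of the sparse dot product this enables — only the single non-zero block and its start index are stored (class and method names are illustrative):

```java
/** Sparse representation of sigma(s, a): a single non-zero block of k features
 *  starting at a known offset inside the full feature vector. */
public class SparseFeatureVector {
    final int startIndex;     // offset of the non-zero block within the full vector
    final double[] block;     // the k non-zero feature values

    public SparseFeatureVector(int startIndex, double[] block) {
        this.startIndex = startIndex;
        this.block = block;
    }

    /** theta^T sigma(s,a): touches only the k entries of theta under the non-zero block. */
    public double dot(double[] theta) {
        double sum = 0.0;
        for (int i = 0; i < block.length; i++) {
            sum += theta[startIndex + i] * block[i];
        }
        return sum;
    }

    /** Gradient step theta += step * sigma(s,a), again touching only k entries. */
    public void addScaledTo(double[] theta, double step) {
        for (int i = 0; i < block.length; i++) {
            theta[startIndex + i] += step * block[i];
        }
    }
}
```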
Results with Function Approximation
Feature based Q-Learning with Function Approximation Algorithm
• The features are found to be non-differentiating for some state-action (s,a) tuples. For example, consider the state-action tuples
  ((5; ((1,2,3,4), (5), ())); (1,2)) and ((5; ((2,3,4), (1,5), ())); (1,1))
• These tuples are differentiated by the pairwise indicators only: by indicator (1,5) in the first case, and by indicators (1,2), (1,3), (1,4) in the second
• As the next slide shows, action (1,2) is good for state (5; ((1,2,3,4), (5), ())) while action (1,1) is bad for state (5; ((2,3,4), (1,5), ())), which implies that pairs (1,3) and (1,4) are bad allocations while pair (1,5) is a good one, even though the two deployments are equivalent
Optimal Policy
• Initial state: all VMs on PM 1 in Phase 1, i.e. (1; ((1,2,3,4,5), (), ()))
• High migration cost and interference cost; VMs 3 and 4 are interfering

[Figure: optimal policy plotted against the cyclic workload (per-VM utilization alternating between roughly 0.1 and 0.5 over phases 1-5); the only migration in the policy is "Migrate VM 1"]
Conclusion and Future Work
We conclude from this project that:
• The RL algorithm with full state representation works very well, but suffers from a huge state space
• For these features to work with function approximation, we have to add more features for interference of VMs
  • This results in a huge feature set
  • The same problem as before

Future work involves the following three issues:
• Features that differentiate well between (s,a) tuples
• Fast convergence of the algorithm
• Scalability of the algorithm
References
1. Norman Wilde and Thomas Huber. Virtualization and Cloud Computing.
   http://uwf.edu/computerscience/seminar/Documents/20090911_VirtualizationAndCloud.ppt
2. VCONF: A Reinforcement Learning Approach to Virtual Machines Auto-Configuration.
   http://portal.acm.org/citation.cfm?id=1555263
3. L. A. Prashanth and Shalabh Bhatnagar. Reinforcement learning with function approximation for traffic signal control.
Thank You !